Sitecore xConnect purging tool keeps xDB data in check

One of the crucial components of Sitecore XP’s functionality is xDB (Experience Database), which stores and manages contact and interaction data for use in analytics and personalisation. By default, data is collected and stored for all human interactions and consequently data accumulates over time. It becomes imperative to efficiently manage and clean up stale contact and interaction data to maintain performance and in some cases comply with data privacy regulations.

Before the xConnect Purging Tool arrived (introduced in XP 10.1 and extended to support interactions in 10.2), cleaning up stale or irrelevant data from xDB was cumbersome. Custom scripts against the xConnect API or raw SQL deletes could be used, but more often than not cleanup was reactive, prompted by an issue that arose... at 2am... on a long weekend. The xConnect purging tool makes this task simple and efficient, so developers or administrators can plug it into proactive, regular maintenance plans or even CI/CD pipelines.

How it works

Purge requests are issued via a Web API endpoint, but Sitecore also provides a Sitecore CLI plugin that wraps the API for convenience. Architecturally, both approaches have the same data flow for a purge request:

[Diagram: xConnect purging tool data flow]

The purging tool takes advantage of the Cortex task processing engine. Historically this has been marketed and used as a base for registering tasks associated with machine learning, but it can be used for any task requiring data-heavy processing and scalability. The great thing is, you don’t really need to worry about it: it’s already running as part of a default XP install.

Both the CLI and Web API offer functionality to:

  • Purge contacts – Delete contacts and all related child data (e.g. interactions, contact facets etc) within specified parameters
  • Purge interactions – Delete interactions and all related child data (e.g. interaction facets etc) within specified parameters
  • Estimate contacts – Estimate the number of contacts that would be purged within specified parameters
  • Estimate interactions – Estimate the number of interactions that would be purged within specified parameters
  • Monitor task progress – Check the status of the task including whether it has completed or failed
  • Cancel a task – For those “Uh oh, I think I just purged all our contacts!” moments

As the tasks are queued and executed asynchronously, there is opportunity to tune task parameters such as batch sizes to best suit your needs and scaling requirements.

xConnect CLI plugin

Using the CLI tooling enables administrators to easily run ad hoc maintenance or integrate purging into automated processes. The xConnect DevEx CLI plugin is available for installation alongside your Sitecore CLI and surfaces the xDB purge commands.

Once the Sitecore CLI is installed (it’s likely you’re already using it for things like serialization and indexing) and you have authenticated with appropriate credentials (interactive and non-interactive flows are supported), the xConnect DevEx plugin can be installed by running:

> dotnet sitecore plugin add -n Sitecore.DevEx.Extensibility.XConnect

NB: At time of writing, the latest version of Sitecore.DevEx.Extensibility.XConnect is v3.0.44. Prior versions are NOT compatible with v5+ of the Sitecore CLI and .NET 6.

Upon successful install, running the xconnect command with the --help flag will show all sub-commands.

> dotnet sitecore xconnect --help
xconnect
  Manage xconnect commands.

Usage:
  sitecore.cli [options] xconnect [command]

Options:
  -?, -h, --help  Show help and usage information

Commands:
  purge     Manage data purge commands.
  estimate  Manage estimate commands.
  export    Manage data export commands.

NB: we’ll only be looking at the purge and estimate sub-commands here.

The --help flag is your friend for exploring the CLI and seeing what options are available with each sub-command.

Estimate

Estimates can be used to validate parameters before purging (a destructive and irreversible process. Always back up, folks!) or simply to check whether it’s worth executing a purge. Estimates take into account the same kinds of parameters used with the purge command (e.g. cutoffdays) and calculate the expected results. It is worth noting that estimates may vary from actual final purge results due to:

  • The rolling, time-based nature of the required cutoffdays and started-days-cutoff parameters
  • Custom conditions (which aren’t discussed here – a blog for another day!), which are executed at task runtime and so are not included in estimates
  • Errors – if a piece of data cannot be purged due to an integrity or access issue

Example – Contacts purge estimate

Estimate the number of contacts that would be purged that haven’t interacted or been modified in over 365 days:

> dotnet sitecore xconnect estimate contacts --cutoffdays 365
Estimated total count: 419

Example – Interactions purge estimate

Estimate the number of interactions that would be purged that are over 365 days old from the start of the interaction:

> dotnet sitecore xconnect estimate interactions --started-days-cutoff 365
Estimated total count: 68419
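
For automation, the estimate output can be read by a script to decide whether a purge is worth running at all. The following is a minimal Python sketch, assuming the CLI keeps printing the "Estimated total count: N" line shown above; the helper names and the 1000-record threshold are hypothetical and purely illustrative.

```python
import re
import subprocess


def parse_estimate(output: str) -> int:
    """Extract the count from CLI output such as 'Estimated total count: 419'.

    Assumes the output format shown in this post; it may differ between versions.
    """
    match = re.search(r"Estimated total count:\s*(\d+)", output)
    if match is None:
        raise ValueError("no estimate found in CLI output")
    return int(match.group(1))


def estimate_contacts(cutoff_days: int) -> int:
    """Run the contacts estimate command from the example above and return the count."""
    result = subprocess.run(
        ["dotnet", "sitecore", "xconnect", "estimate", "contacts",
         "--cutoffdays", str(cutoff_days)],
        capture_output=True, text=True, check=True,
    )
    return parse_estimate(result.stdout)


def worth_purging(count: int, threshold: int = 1000) -> bool:
    """Hypothetical policy: only bother purging once enough stale records exist."""
    return count >= threshold
```

A pipeline step could call estimate_contacts(365) and skip the purge when worth_purging returns False.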

Purge

WARNING: Purge commands are destructive! Don’t run these in an environment you hold dear without understanding the consequences and having appropriate backup and restore regimes in place. By default you will be asked for confirmation, but be careful 😬

Registering a purge task submits a request to begin asynchronously deleting, in batches, all contacts or interactions (and their associated data) that match the parameters provided. Successfully registering a purge task returns the task ID, which can then be used to check the status or cancel the task in flight.

Example – Contacts purge

Delete contacts and associated child entities that haven’t interacted or been modified in over 365 days:

> dotnet sitecore xconnect purge contacts start --cutoffdays 365

The start command deletes data. Are you sure you want to continue?
        [Y|y] - Yes
        [N|n] - No
y
Approved Task registration.
Registered task id: a240122a-fb94-4e21-9b7c-d69296f8863f
To get the status of the purge task, run the command: sitecore xconnect purge status --pti a240122a-fb94-4e21-9b7c-d69296f8863f
To cancel the purge task, run the command: sitecore xconnect purge cancel --pti a240122a-fb94-4e21-9b7c-d69296f8863f

NB: At the time of writing, v3.0.44 of the xConnect plugin doesn’t appear to support the -d short alias for --cutoffdays as described in the --help output and documentation 🐛🪲.

Example – Interactions purge

Delete interactions that are over 365 days old from the start of the interaction:

> dotnet sitecore xconnect purge interactions start --started-days-cutoff 365

The start command deletes data. Are you sure you want to continue?
        [Y|y] - Yes
        [N|n] - No
y
Approved Task registration.
Registered task id: 3d982367-89ff-4352-b448-2f662d60ca96
To get the status of the purge task, run the command: sitecore xconnect purge status --pti 3d982367-89ff-4352-b448-2f662d60ca96
To cancel the purge task, run the command: sitecore xconnect purge cancel --pti 3d982367-89ff-4352-b448-2f662d60ca96

Both purge requests require the basic parameter of cutoffdays (for contacts) or started-days-cutoff (for interactions) to indicate the point in time beyond which data is considered stale. Additionally, custom conditions can be used to create more specific rules around what gets purged beyond just the age of the record, but these are beyond the scope of this post.
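
The task ID printed when a purge is registered is what later status and cancel calls need, so a script will typically want to capture it. Here is a small Python sketch that pulls the ID out of captured CLI output, assuming the "Registered task id:" line keeps the format shown in the examples above; parse_task_id is a hypothetical helper, not part of any Sitecore tooling.

```python
import re
import uuid

# Matches the "Registered task id: <guid>" line from the purge output above.
TASK_ID_PATTERN = re.compile(r"Registered task id:\s*([0-9a-fA-F-]{36})")


def parse_task_id(output: str) -> uuid.UUID:
    """Return the registered task ID found in purge CLI output.

    Raises ValueError when no ID is present, for example because the
    confirmation prompt was declined or a validator rejected the request.
    """
    match = TASK_ID_PATTERN.search(output)
    if match is None:
        raise ValueError("no task id found; was the task registered?")
    return uuid.UUID(match.group(1))
```

Converting to uuid.UUID doubles as a sanity check that the captured text really is a GUID before it is passed on to other commands.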

Monitor status

As purge tasks are asynchronously executed in batches, the progress of a task can be checked using the status sub-command. This is particularly handy for large purge tasks or scripts that may want to halt until execution is complete before continuing.

Example – Progress of task

Check the progress of a task previously registered. The processing-task-id is a GUID provided in the CLI output when the purge task is registered.

> dotnet sitecore xconnect purge status --processing-task-id a240122a-fb94-4e21-9b7c-d69296f8863f
 ID       : a240122a-fb94-4e21-9b7c-d69296f8863f
 Created  : 08/21/2023 03:50:36
 Updated  : 08/21/2023 03:50:39
 Status   : Completed
 Progress : 419
 Total    : 419
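
For scripts that need to wait until a purge finishes, the status command can be polled. The sketch below is one possible approach in Python, assuming the "Key : value" output layout shown above. Only the "Completed" status appears in this post; the other terminal status names are guesses and should be checked against your own output. All function names are hypothetical.

```python
import subprocess
import time

# "Completed" is taken from the sample output; the rest are assumed names.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def parse_status(output: str) -> dict:
    """Turn lines like ' Status   : Completed' into {'Status': 'Completed'}."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, so timestamps keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def fetch_status_via_cli(task_id: str) -> str:
    """Run the status command from the example above and return its raw output."""
    result = subprocess.run(
        ["dotnet", "sitecore", "xconnect", "purge", "status",
         "--processing-task-id", task_id],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def wait_for_task(task_id, fetch=fetch_status_via_cli,
                  poll_seconds=30.0, timeout_seconds=3600.0) -> dict:
    """Poll until the task reports a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        fields = parse_status(fetch(task_id))
        if fields.get("Status") in TERMINAL_STATUSES:
            return fields
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in time")
        time.sleep(poll_seconds)
```

The fetch parameter is injectable so the polling logic can be exercised without a Sitecore instance; the poll interval and timeout are arbitrary defaults.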

Cancel task

Tasks can be cancelled if a mistake was made or there is an issue with execution. Progress will be halted; however, any batches already executed will not be rolled back.

> dotnet sitecore xconnect purge cancel --processing-task-id {abc123}

Web API for xConnect Data Tools

If, for one reason or another, using the CLI is not possible, all of the above tasks can be registered directly via the Web API.

Web API requests require an authentication token from an authorized user (administrator or member of the sitecore\Sitecore XConnect Data Admin role) to be included in the Authorization header.

The parameters and expected responses are well documented.

All estimate and purge requests are HTTP POSTs to the CM instance. For example, to start a purge of all contacts older than 365 days, make a POST request to https://cminstance.example.com/sitecore/api/datatools/purge/tasks/contacts with the following body:

{
    "CutoffDays": 365
}

Example response:

{
    "TaskId": "fbb25c27-fde7-4015-889e-4fc1ba1e0b0f"
}
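
As a rough illustration, the same request could be issued from Python using only the standard library. This is a sketch under assumptions: the post only says a token goes in the Authorization header, so the "Bearer" scheme here is a guess, and the function names are hypothetical. The endpoint path, request body and "TaskId" response field are taken from the example above.

```python
import json
import urllib.request


def build_contacts_purge_request(base_url: str, token: str,
                                 cutoff_days: int) -> urllib.request.Request:
    """Build (but do not send) the POST that registers a contacts purge task."""
    body = json.dumps({"CutoffDays": cutoff_days}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/sitecore/api/datatools/purge/tasks/contacts",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token scheme; check the Sitecore documentation.
            "Authorization": f"Bearer {token}",
        },
    )


def start_contacts_purge(base_url: str, token: str, cutoff_days: int) -> str:
    """Send the request and return the TaskId from the JSON response."""
    request = build_contacts_purge_request(base_url, token, cutoff_days)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["TaskId"]
```

Keeping request construction separate from sending makes it easy to inspect exactly what would be posted before pointing it at a real environment.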

Help prevent an accidental💥

The xConnect Data Tools and their data purging functionality are powerful and a core piece of xDB maintenance, but with great power comes the risk of accidentally purging data that you really wanted to keep. To minimise this risk, the Cortex processing engine exposes task validators. Out of the box, xConnect Data Tools registers validators that check MinimumContactCutoffDays and MinimumInteractionStartedDaysCutoff on purge tasks. Setting these minimum values via config validates purge task requests before they are registered, ensuring that accidental typos that would purge recent data cannot be executed. Both default to 180, but can be changed via config patches on the Cortex processing engine.

Example config patch for Cortex processing modifying the minimum values to be 365. Purge tasks attempting to use lower thresholds will be rejected.

<?xml version="1.0" encoding="utf-8" ?>
<Settings>
  <Sitecore>
    <Processing>
      <Services>
        <DataPurgeTaskOptionsValidatorOptions>
          <Options>
            <MinimumContactCutoffDays>365</MinimumContactCutoffDays>
            <MinimumInteractionStartedDaysCutoff>365</MinimumInteractionStartedDaysCutoff>
          </Options>
        </DataPurgeTaskOptionsValidatorOptions>
      </Services>
    </Processing>
  </Sitecore>
</Settings>
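
A pipeline could also mirror this check on the client side, failing fast before a request ever reaches Cortex. The Python sketch below is not how Sitecore implements its validators; it simply reads the minimum from a patch file shaped like the one above and applies the same rule (values below the minimum are rejected). The helper names are hypothetical, and the fallback of 180 reflects the default mentioned earlier.

```python
import xml.etree.ElementTree as ET

DEFAULT_MINIMUM_DAYS = 180  # default stated in this post


def read_minimum(config_xml: str, setting: str) -> int:
    """Read a minimum such as 'MinimumContactCutoffDays' from a config patch.

    Falls back to the default when the element is absent.
    """
    root = ET.fromstring(config_xml)
    text = root.findtext(f".//{setting}")
    return int(text) if text is not None else DEFAULT_MINIMUM_DAYS


def is_cutoff_allowed(requested_days: int, minimum_days: int) -> bool:
    """Client-side mirror of the rule: thresholds below the minimum are rejected."""
    return requested_days >= minimum_days
```

The server-side validators remain the real safeguard; a local pre-check like this only saves a round trip.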

Summary

The Sitecore xConnect purging tool makes maintaining xDB easy for administrators. It is scalable, flexible and can be customised to fit in with business requirements while maintaining performance and optimising storage costs. The xConnect CLI plugin gives first-class access to register purge tasks with the Cortex processing engine so that maintenance can be automated or run ad hoc. Given that purge tasks are destructive by design, always ensure you understand the commands being executed, have adequate backup regimes in place and understand any local legislation that may apply to the deletion of contact and interaction data in xDB.
