EXM comes out of the box with Sitecore 9+. That’s great, but how many emails can it send? How do you scale it to meet business requirements? These questions need to be answered well before rolling out your new email marketing campaigns, as some architectural or infrastructure changes may be required.
Determining the target audience is the first step to successfully tuning your setup. Consult with your marketing teams to work out your goals for volume and “timeliness” of dispatch. No system is going to send 1 million emails in a few seconds, so set an achievable and acceptable target. This is a metric you can test against and optimise for. E.g. say you have 250k contacts to send to and you’d like them dispatched in under 2 hours: 250,000 ÷ (2 × 3,600 s) ≈ 34.7 emails per second is your target magic number.
Secondly, this is all about performance optimisation so the game here is:
- Test (Dispatch incrementally larger lists)
- Measure (Identify bottlenecks)
- Optimise (Address bottlenecks)
- Rinse, repeat.
As with all performance testing, you want to run this on an environment “similar” to production. Since this is a trial-and-error process, you’re likely to go too heavy or too light to start with, so don’t do this in prod: it can severely impact live environments. <usual disclaimer>
To make the dispatch as realistic as possible, without spamming users and impacting delivery reputation, there are a couple of options:
- Create dummy contact lists with email addresses that will send to “sinkhole” addresses. Most Mail Transfer Agents (MTAs like SparkPost, SendGrid etc.) have a specially formatted address that will not bounce, but will also not get delivered or impact reputation. Since in most cases we use Sitecore Email Cloud (which uses SparkPost under the hood for delivery), we can use SparkPost’s sink server addresses (https://www.sparkpost.com/docs/faq/using-sink-server/). If your MTA doesn’t offer this (most do!), you may be able to use aliased addresses on a private email server (e.g. SiteCoreemail@example.com). Don’t use Gmail etc. for bulk sends: you will very quickly build a bad reputation!
- Emulate the MTA send. This option does not send any emails; instead it waits a specified amount of time to “fake” the sending time of each email. The delay could be based on statistics gathered from a small send via the real MTA. Emulation can be configured per campaign or, more likely when you’re testing, at a server level (Documentation here). Keep in mind these configs need to be set on any role doing dispatch (CM and/or DDS, depending on the setup). Patching the below into your config skips the send to your MTA for all campaigns and in its place delays between 200 ms and 400 ms at the usual time of dispatch. Setting FailProbability can also help create realistic scenarios. Remember to revert these config settings once testing is complete!
```xml
<!-- Specifies whether the message sending process is emulated without actual message transmission via MTA. -->
<setting name="MtaEmulation.Active" value="true" />
<!-- The minimum amount of time to emulate a single sending (milliseconds). -->
<setting name="MtaEmulation.MinSendTime" value="200" />
<!-- The maximum amount of time to emulate a single sending (milliseconds). -->
<setting name="MtaEmulation.MaxSendTime" value="400" />
<!-- The probability of a connection fail (%). -->
<setting name="MtaEmulation.FailProbability" value="0.01" />
```
Now that the dispatch process is sorted, the architecture should be looked at. As a separation of concerns, I like to have a separate DDS (Dedicated Dispatch Server) role. There is no hard requirement for a DDS, but it allows dispatching to be offloaded, minimising impact on the CM. By default, adding a DDS service is not enough: the CM will require config changes to ensure it does NOT do any dispatching. This also makes testing the dispatch process easier, as you can focus on the DDS. For a great how-to guide on setting this up and scaling EXM, take a look at Pete Navarra’s series https://sitecorehacker.com/2018/12/24/scaling-exm/.
To help identify bottlenecks, make sure you have access to basic metrics on your services. In our case we’re on PaaS, so I created a dashboard in the Azure portal to review after each dispatch:
- CM/DDS CPU utilisation
- CM/DDS Memory utilisation
- xDB shards, EXM Master, and Messaging (Rebus) database DTU or CPU/IO
This is the easy part. I like to send incrementally larger campaigns, working up to the target sizes; this lets you catch smaller issues early. Once comfortable sending to larger lists (to dummy contacts or via MTA emulation), it’s time to measure.
When dispatching, Sitecore gives you a handy stats page that may help you identify the parts of the process that are eating up valuable time. Load this page on the service(s) that are doing the dispatch (DDS or CM depending on your setup). It’s real(ish) time, so hit F5 and monitor progress.
Note the highlighted value. This is the overall throughput on this service. Remember, if you scaled out to have multiple Dispatch servers, add them all together. From here, we need to identify the bottlenecks and optimise until we reach the desired throughput (the magic target number).
Analysing this report in combination with the basic metrics from the dispatch services usually gives a great indication of what needs to be tweaked to ramp up the performance.
Based on the above analysis, optimisations can be implemented and then re-tested to cross-reference improvements (or degradations!). Warning: this can be increasingly tedious, yet satisfying once the target is hit!
There are some key areas to focus on that will improve performance and throughput, ideally getting to a stage where the remaining bottleneck is only addressable by scaling, or lies with a dependency such as the MTA.
The OOTB EXM threading and batch size settings tend to be “unoptimised”. There are a number of settings that can be tweaked to make the most of resources available. The Sitecore documentation for EXM configuration settings outlines all the options.
Problem: The queuing phase takes an excessive amount of time
Before dispatching begins, all contacts are queued. This process can be optimised by increasing:
- DispatchEnqueueBatchSize – Allows larger batches to be added to the queue at the expense of CPU, Memory and DB resources.
- DispatchEnqueueThreadsNumber – Number of threads available for queuing the above batches at the expense of CPU, Memory and DB resources.
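As a sketch, a Sitecore patch file bumping both queuing settings might look like the following. The values shown are illustrative starting points only (not recommendations); tune them against your own metrics.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Illustrative values only: increase gradually and watch CPU, memory and DB DTU. -->
      <setting name="DispatchEnqueueBatchSize">
        <patch:attribute name="value">300</patch:attribute>
      </setting>
      <setting name="DispatchEnqueueThreadsNumber">
        <patch:attribute name="value">4</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```

Deploy the patch to every role doing the enqueue work, then re-run a test dispatch and compare the queuing time in the stats page.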
Problem: Overall throughput while dispatching is low, but the CPU is underutilised
The overall throughput can be manipulated at the expense of CPU utilisation.
- NumberThreads – Increasing this will create more threads to do work, at the expense of CPU.
- MaxGenerationThreads – Increasing this will allow more threads to dispatch, at the expense of CPU. Max limit should be the value you have set in NumberThreads.
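The threading settings above can be patched the same way. Again, the values here are illustrative assumptions, not tuned recommendations; raise them in small steps while watching CPU on the dispatch role.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Illustrative values only: raise gradually and monitor CPU utilisation. -->
      <setting name="NumberThreads">
        <patch:attribute name="value">8</patch:attribute>
      </setting>
      <!-- Should not exceed the NumberThreads value above. -->
      <setting name="MaxGenerationThreads">
        <patch:attribute name="value">8</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```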
Check sleep settings
OOTB config settings implement a delay in the SendEmail pipeline after dispatching each email, presumably to avoid accidentally maxing out the CPU on the CM. Especially if you’re using a DDS, the sleep processor can be removed entirely or reduced to a much smaller value. Even a small delay can have a huge impact on throughput over time. The Sitecore 9.2 OOTB config has a default of 50 ms.
```xml
<processor type="Sitecore.EmailCampaign.Cm.Pipelines.SendEmail.Sleep, Sitecore.EmailCampaign.Cm">
  <!-- Number of milliseconds to put the thread to sleep for after an email has been sent. -->
  <sleep>50</sleep>
</processor>
```
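To remove the delay entirely, a `patch:delete` against the processor is one approach. This sketch assumes the pipeline is registered as `<SendEmail>` directly under `<pipelines>`; verify the exact path in your install’s Sitecore.EmailCampaign.Cm config before applying, and only remove the sleep if dispatch runs on a role (e.g. a DDS) where maxing the CPU is acceptable.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <SendEmail>
        <!-- Remove the post-send sleep; alternatively, patch the value down instead of deleting. -->
        <processor type="Sitecore.EmailCampaign.Cm.Pipelines.SendEmail.Sleep, Sitecore.EmailCampaign.Cm">
          <patch:delete />
        </processor>
      </SendEmail>
    </pipelines>
  </sitecore>
</configuration>
```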
Do you need Personalization?
When dispatching a new campaign, Sitecore asks if you want to render rule-based personalization. If you’re not using it, throughput will be far improved… but keep in mind this is likely one of the main reasons you want to use EXM! NB: this applies only to rule-based personalization; tokens will always be rendered.
The dispatch summary may indicate a bottleneck when rendering your page (check the GetPage stat). Like all other Sitecore code, page performance hugely depends on the performance of your code in renderings etc. If this is the case, spend some time optimising (cache things!).
EXM can be scaled both up and/or out by adding additional DDS services. Your first stop should be optimising your configs, but if that target is still eluding you, it may be time to throw some 💰 at it.
Scaling services (App Services or Databases) up can just be done in the usual ways in line with your existing practices.
Scaling the DDS service out is straightforward; however, if you are in Azure PaaS, keep in mind you cannot use the app service scale-out Azure provides. Each DDS service needs to be addressable and contactable directly from the CM, as the CM acts as an orchestration point. Create an app service for each DDS instance you require and modify the CM configs to address them directly.
Other performance considerations
Although not directly associated with dispatch, there are other performance considerations to keep in mind when you start pushing out high volume email messages:
- Media requests will increase due to images in the emails. This can be a huge drain on CDs if implemented poorly. Use a CDN to deliver your assets.
- CD, session and xConnect utilisation will increase through the handling of open pixels and click redirects. It’s important to factor this into load testing scenarios. Consider scaling up/out.