Sitecore 9: Azure search and index all fields

Index all fields

Out of the box Sitecore 9 configs will index all fields in each of the basic indexes (core,master,web, etc..). This is a great strategy to ensure that everything you add to your templates gets indexed down the line. However this also leads to a couple problems:

  • Exceeding the 1000 field maximum on Azure Search indexes
  • Over indexing

1000 field limit

Seen this error pop up in your logs when using the Azure search provider?

ManagedPoolThread #3 09:59:00 ERROR [Index=sitecore_master_index] Commit failed
Exception: System.AggregateException
Message: One or more errors occurred.
Source: mscorlib
at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](IEnumerable`1 source, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Action`3 bodyWithStateAndIndex, Func`4 bodyWithStateAndLocal, Func`5 bodyWithEverything, Func`1 localInit, Action`1 localFinally)
at System.Threading.Tasks.Parallel.ForEach[TSource](IEnumerable`1 source, Action`1 body)
at Sitecore.ContentSearch.Azure.Http.CompositeSearchService.PostDocuments(ICloudBatch batch)
at Sitecore.ContentSearch.Azure.CloudSearchUpdateContext.Commit()
Nested Exception
Exception: Sitecore.ContentSearch.Azure.Http.Exceptions.BadRequestException
Message: Error in the request URI, headers, or body
Source: Sitecore.ContentSearch.Azure
at Sitecore.ContentSearch.Azure.Http.SearchServiceClient.EnsureSuccessStatusCode(HttpResponseMessage response)
at Sitecore.ContentSearch.Azure.Http.SearchServiceClient.UpdateIndex(IndexDefinition indexDefinition)
at Sitecore.ContentSearch.Azure.Schema.SearchServiceSchemaSynchronizer.SyncRemoteService(IndexDefinition sourceIndexDefinition, IEnumerable`1 incomingFields)
at Sitecore.ContentSearch.Azure.Schema.SearchServiceSchemaSynchronizer.c__DisplayClass17_0.b__0()
at Sitecore.ContentSearch.Azure.Utils.Retryer.RetryPolicy.Execute(Action action)
at Sitecore.ContentSearch.Azure.Http.SearchService.PostDocumentsImpl(ICloudBatch batch)
at Sitecore.ContentSearch.Azure.Http.SearchService.PostDocuments(ICloudBatch batch)
at Sitecore.ContentSearch.Azure.Http.CompositeSearchService.c__DisplayClass15_0.b__0(ISearchService searchService)
at System.Threading.Tasks.Parallel.c__DisplayClass17_0`1.b__1()
at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
at System.Threading.Tasks.Task.c__DisplayClass176_0.b__0(Object )

Nested Exception
Exception: Sitecore.ContentSearch.Azure.Http.Exceptions.AzureSearchServiceRESTCallException
Message: {"error":{"code":"","message":"The request is invalid. Details: definition : Invalid index: The index contains 1033 field(s). An index can have at most 1000 fields.\r\n"}}

Uh oh, this doesn’t happen on Solr!

Azure search has a hard limit of 1000 fields per index 😢

This is confirmed by checking the limits and quotas for an S1 instance of Azure search service:

Azure search index limits
Source: https://docs.microsoft.com/en-us/azure/search/search-limits-quotas-capacity

An out if the box Sitecore 9.0.1 instance will create some indexes with more than 900 fields…so it works, but allows for very little head room when adding your own fields. As the last point indicates the default settings on the basic indexes index all fields, so any customisation (eg. adding a couple of fields to templates for items that gets indexed) pretty much leads to exceeding this limit. You’ll then start having a bad time when trying to add/modify items or rebuilding the index.

After confirmation from Sitecore support, the default index configs do explicitly include a list of required fields to support basic functionality. So you can change the indexAllFields flag in the default documentOptions to false to ensure all of your custom fields are not automatically added to the index. Patch in this change with a config patch like so:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:set="http://www.sitecore.net/xmlconfig/set/" xmlns:search="http://www.sitecore.net/xmlconfig/search/" xmlns:role="http://www.sitecore.net/xmlconfig/role/" >
<sitecore role:require="Standalone or ContentManagement or ContentDelivery" search:require="Azure">
<contentSearch>
<indexConfigurations>
<defaultCloudIndexConfiguration type="Sitecore.ContentSearch.Azure.CloudIndexConfiguration, Sitecore.ContentSearch.Azure">
<documentOptions type="Sitecore.ContentSearch.Azure.CloudSearchDocumentBuilderOptions,Sitecore.ContentSearch.Azure">
<indexAllFields>false</indexAllFields>
</documentOptions>
</defaultCloudIndexConfiguration>
</indexConfigurations>
</contentSearch>
</sitecore>
</configuration>

Rebuilding your index should now complete successfully.

If your solution does need additional fields, be picky. Only index what is needed! Change the default indexAllFields flag so that basic functionality still works, then add a new custom index with the fields your solution requires.  Keep it sub 1000 fields!

This will keep your indexes lean and mean as your project grows. *Hint: and also fix the concern below!

Over indexing

While it’s handy to have everything right there in your search indexes, it can lead to decreases in performance as your content grows.  To keep this in check, only index what you need.

  • Only crawl parts of the tree you need to search
  • Only index the fields you need.
  • Use custom indexes to help keep it organised and performant

 

Azure Search on Sitecore 9

Hello, is it me you’re looking for?

Sitecore has implemented Azure Search and Solr search providers for 9+ XP installs.  They both have their pros and cons. But the best way to evaluate and compare them is by getting your hands dirty and breaking some things on your local machine.

Azure Search is Microsoft’s Paas offering that is supported by Sitecore and is the Out Of The Box search provider for Sitecore 9+ destined for PaaS environments. The Sitecore Quickstart ARM templates for all Cloud XP installs will include the appropriate service infrastructure and configs.

Solr is the recommended option for XP on prem deployments.  It is battle tested and fully featured, but has a difficult time fitting in with the Azure Paas offerings many organisations are investing in.

When it comes to your dev box, you’ll likely have installed an on prem package which uses Solr as the default search provider.  Using Azure search for your dev instance full time may be cost prohibitive as you need a Standard instance at minimum. But as an inquisitive dev, it’s easy to have a play with it locally to create and consume custom indexes or just to get a feel for the differences.   Note there are some limitations with using the Azure Search provider as outlined in the Sitecore Azure Search documentation.   There are workarounds for most of these, so really comes down to your implementation and being aware of these limitations when you’ve got the headphones on and in “code mode”.

Azure Search on your dev machine

If you’re game and have a few Azure credits up your sleeve (remember you get a fair whack with your MSDN subscription), then you can get up and running quickly on any 9+ install.

NB: You’ll need a working Sitecore 9+ install (that is using Solr) up and running. If you don’t already have one you can install via SIFLess, for fast installs that won’t take all night long.  If you have an existing customised install there are a number of considerations to keep in mind, which I’ll address in a follow up post.

Create the Azure Search Service

You’ll need to create a search service in the Azure portal.

  1. New Service > Azure Search Service
  2. Give it a URL, it just needs to be unique
  3. Select the appropriate subscription and resource group for your account
  4. Pick the location closest to you to minimise the latency on queries (NB: Azure search is not available in all regions and prices/offerings do vary!)
  5.  Selects a standard pricing tier (you’ll require 15 indexes minimum)
  6. Create it and wait a short while for the service to provision.
  7. Make a note of the search service URL (on the Overview blade) and API key (on the Keys blade) for the following steps.

Create a search service

Modify the configs

The Sitecore 9+ configs allow for an easy switch between providers, assuming you haven’t modified the OOTB configs:

Add a new connection string called “cloud.search”

Using the Search URL and API key from the Azure portal, add a connection string to your ConnectionStrings.config like the below.

<ConnectionStrings>
...
<add name="cloud.search" connectionString="serviceUrl=https://YOURSEARCHURL;apiVersion=2015-02-28-preview;apiKey=YOURAPIKEY"/>
...
</ConnectionStrings>
Change the app search definition app setting in your web.config

Sitecore 9 allows for roles based configuration so, switching providers is just a matter of changing the search definition in app settings.

<AppSettings>
...
<add key="search:define" value="Azure" />
...
</AppSettings>

Rebuild the indexes

Your instance should now be “talking” to your Azure search service, but none of the appropriate data will be populated in the indexes.  Rebuild all your indexes by:

  • Logging into your sitecore instance
  • Go to the Control Panel > Indexing Manager
  • Rebuild Indexes

That’s about it. You’re dancing on the ceiling with searches running in the cloud.  Do some tests.  Have a play, maybe even implement a custom index for your site search.  Just remember to delete your Azure search instance in the portal once you’re done and save those valuable credits!