Sitecore 9: Azure search and index all fields

Index all fields

Out of the box, the Sitecore 9 configs index all fields in each of the basic indexes (core, master, web, etc.). This is a great strategy to ensure that everything you add to your templates gets indexed down the line. However, it also leads to a couple of problems:

  • Exceeding the 1000 field maximum on Azure Search indexes
  • Over indexing

1000 field limit

Seen this error pop up in your logs when using the Azure search provider?

ManagedPoolThread #3 09:59:00 ERROR [Index=sitecore_master_index] Commit failed
Exception: System.AggregateException
Message: One or more errors occurred.
Source: mscorlib
at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](IEnumerable`1 source, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Action`3 bodyWithStateAndIndex, Func`4 bodyWithStateAndLocal, Func`5 bodyWithEverything, Func`1 localInit, Action`1 localFinally)
at System.Threading.Tasks.Parallel.ForEach[TSource](IEnumerable`1 source, Action`1 body)
at Sitecore.ContentSearch.Azure.Http.CompositeSearchService.PostDocuments(ICloudBatch batch)
at Sitecore.ContentSearch.Azure.CloudSearchUpdateContext.Commit()
Nested Exception
Exception: Sitecore.ContentSearch.Azure.Http.Exceptions.BadRequestException
Message: Error in the request URI, headers, or body
Source: Sitecore.ContentSearch.Azure
at Sitecore.ContentSearch.Azure.Http.SearchServiceClient.EnsureSuccessStatusCode(HttpResponseMessage response)
at Sitecore.ContentSearch.Azure.Http.SearchServiceClient.UpdateIndex(IndexDefinition indexDefinition)
at Sitecore.ContentSearch.Azure.Schema.SearchServiceSchemaSynchronizer.SyncRemoteService(IndexDefinition sourceIndexDefinition, IEnumerable`1 incomingFields)
at Sitecore.ContentSearch.Azure.Schema.SearchServiceSchemaSynchronizer.c__DisplayClass17_0.b__0()
at Sitecore.ContentSearch.Azure.Utils.Retryer.RetryPolicy.Execute(Action action)
at Sitecore.ContentSearch.Azure.Http.SearchService.PostDocumentsImpl(ICloudBatch batch)
at Sitecore.ContentSearch.Azure.Http.SearchService.PostDocuments(ICloudBatch batch)
at Sitecore.ContentSearch.Azure.Http.CompositeSearchService.c__DisplayClass15_0.b__0(ISearchService searchService)
at System.Threading.Tasks.Parallel.c__DisplayClass17_0`1.b__1()
at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
at System.Threading.Tasks.Task.c__DisplayClass176_0.b__0(Object )

Nested Exception
Exception: Sitecore.ContentSearch.Azure.Http.Exceptions.AzureSearchServiceRESTCallException
Message: {"error":{"code":"","message":"The request is invalid. Details: definition : Invalid index: The index contains 1033 field(s). An index can have at most 1000 fields.\r\n"}}

Uh oh, this doesn’t happen on Solr!

Azure search has a hard limit of 1000 fields per index 😢

This is confirmed by checking the limits and quotas for an S1 instance of Azure search service:

Azure search index limits
Source: https://docs.microsoft.com/en-us/azure/search/search-limits-quotas-capacity

An out-of-the-box Sitecore 9.0.1 instance will create some indexes with more than 900 fields, so it works but leaves very little headroom for your own fields. As noted at the start, the default settings on the basic indexes index all fields, so almost any customisation (e.g. adding a couple of fields to templates for items that get indexed) pushes an index over this limit. You’ll then start having a bad time when adding or modifying items or rebuilding the index.

Sitecore support confirmed that the default index configs explicitly include a list of the fields required to support basic functionality. So you can set the indexAllFields flag in the default documentOptions to false, which stops all of your custom fields from being added to the index automatically. Apply the change with a config patch like so:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:set="http://www.sitecore.net/xmlconfig/set/" xmlns:search="http://www.sitecore.net/xmlconfig/search/" xmlns:role="http://www.sitecore.net/xmlconfig/role/" >
  <sitecore role:require="Standalone or ContentManagement or ContentDelivery" search:require="Azure">
    <contentSearch>
      <indexConfigurations>
        <defaultCloudIndexConfiguration type="Sitecore.ContentSearch.Azure.CloudIndexConfiguration, Sitecore.ContentSearch.Azure">
          <documentOptions type="Sitecore.ContentSearch.Azure.CloudSearchDocumentBuilderOptions,Sitecore.ContentSearch.Azure">
            <indexAllFields>false</indexAllFields>
          </documentOptions>
        </defaultCloudIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>

Rebuilding your index should now complete successfully.

If your solution does need additional fields, be picky. Only index what is needed! Set the default indexAllFields flag to false (basic functionality still works), then add a new custom index with the fields your solution requires. Keep every index within the 1000-field limit!

This will keep your indexes lean and mean as your project grows. Hint: it also fixes the concern below!
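
As a rough sketch of being picky: once indexAllFields is false, fields can be opted in one by one. The fragment below assumes the standard include list (hint list:AddIncludedField) on documentOptions. The element names and GUIDs are placeholders, not real field IDs, and the same list could sit in the documentOptions of a custom index's own configuration rather than the default one.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:search="http://www.sitecore.net/xmlconfig/search/" xmlns:role="http://www.sitecore.net/xmlconfig/role/">
  <sitecore role:require="Standalone or ContentManagement or ContentDelivery" search:require="Azure">
    <contentSearch>
      <indexConfigurations>
        <defaultCloudIndexConfiguration type="Sitecore.ContentSearch.Azure.CloudIndexConfiguration, Sitecore.ContentSearch.Azure">
          <documentOptions type="Sitecore.ContentSearch.Azure.CloudSearchDocumentBuilderOptions,Sitecore.ContentSearch.Azure">
            <!-- Stop indexing every template field automatically -->
            <indexAllFields>false</indexAllFields>
            <!-- Opt in only the custom fields the solution searches on -->
            <include hint="list:AddIncludedField">
              <!-- Hypothetical fields: replace the names and GUIDs with your own field IDs -->
              <articleSummary>{00000000-0000-0000-0000-000000000001}</articleSummary>
              <articleCategory>{00000000-0000-0000-0000-000000000002}</articleCategory>
            </include>
          </documentOptions>
        </defaultCloudIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>
```

Each included field still counts towards the 1000-field limit, so rebuild the index after patching and check the logs for the error above.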

Over indexing

While it’s handy to have everything right there in your search indexes, it can degrade performance as your content grows. To keep this in check, only index what you need.

  • Only crawl parts of the tree you need to search
  • Only index the fields you need
  • Use custom indexes to help keep it organised and performant
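
Limiting the crawl to part of the tree is done on the index's crawler. A minimal sketch, assuming an index over the web database and a hypothetical /sitecore/content/Home root; it shows only the locations section of an index definition, not a complete index:

```xml
<!-- Fragment of an index definition: the crawler decides which items are fed to the index -->
<locations hint="list:AddCrawler">
  <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
    <Database>web</Database>
    <!-- Hypothetical root: only items under this path are crawled -->
    <Root>/sitecore/content/Home</Root>
  </crawler>
</locations>
```

A narrower root means fewer documents to crawl and rebuild, which pairs well with a custom index that only carries the fields a given search feature needs.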

 
