Crawler Impact Rules in SharePoint 2010

In SharePoint 2010, you can control the content crawl rate at the search service application level by using Crawler Impact Rules. By default, the number of simultaneous requests changes dynamically based on the server hardware and utilization. Crawler impact rules are often used to throttle the request rate for external websites. You can manage crawler impact rules in Central Administration > Search Service Application > Crawler Impact Rules.

Let’s watch the Filtering Threads and Idle Threads from the OSS Search Gatherer category in Performance Monitor during a content crawl before any crawler impact rules are defined.

In my development VM, I see that the number of filtering threads stays at around 20 and the number of idle threads around 12 for the duration of the crawl which means that an average of 8 threads are being used by the gatherer process.

Next, let’s create a new crawler impact rule to limit the number of simultaneous requests to 2 using the * site name wildcard to apply the rule to all sites.

If we keep an eye on the performance counters this time, we can see that now the difference between the number of filtering and idle threads during a crawl equals to 2 (11 filtering threads and 9 idle threads in the example below).

The performance counters confirmed that our new crawler impact rule is working. Be careful when you delete a crawler impact rule though! I’ve seen it a number of times in different SharePoint farms that SharePoint remembers and keeps using the last deleted crawler impact rule. I verified it by monitoring the performance counters – the deleted crawler impact rule remains in effect until the the SharePoint Server Search 14 service is restarted (or a new rule is added that overrides the deleted rule settings). So remember – restart the SharePoint Server Search 14 windows service after deleting a crawler impact rule!