Crawling Publishing Sites in SharePoint 2010

Scenario

You have a publishing site with a number of pages that use web parts to display dynamic content based on a query string parameter value. You crawl the site using the SharePoint connector but all you can find is the static page content – the dynamic content generated by the web parts is not searchable.

Solution

The SharePoint connector indexes the content of the Pages library but it ignores “complex URLs” meaning that it ignores URLs that contain query string parameters. The fix is simple – create a Crawl Rule in Central Administration and make sure that the fields are configured as follows:

  • Path: http://hostname/*
  • Crawl Configuration: Include all items in this path
    • Crawl comlex URLs
    • Crawl SharePoint content as http pages

Run a Full Crawl after adding the crawl rule and the dynamic page content should now be searchable.

Example

Let’s say we have a marketing site used to promote a number of different products. We created a single publishing page to show product information for all of the different product models and added the following web part to the page to dynamically set the page title and add product details to the page.

[ToolboxItemAttribute(false)]
public class ProductInformation : WebPart
{
    protected override void CreateChildControls()
    {
        // get the model number from query string
        string modelNumber = Page.Request.QueryString["ModelNumber"];
        if (!string.IsNullOrEmpty(modelNumber))
        {
            // assign a product category based on the model number
            string productCategory = string.Empty;
            switch (modelNumber)
            {
                case "M300":
                case "M400":
                case "M500":
                case "X200":
                case "X250":
                    productCategory = "Digital Camera";
                    break;
                case "X300":
                case "X358":
                case "X400":
                case "X458":
                case "X500":
                    productCategory = "Digital SLR";
                    break;
            }

            // set the page title
            ContentPlaceHolder contentPlaceHolder = (ContentPlaceHolder)Page.Master.FindControl("PlaceHolderPageTitle");
            contentPlaceHolder.Controls.Clear();
            contentPlaceHolder.Controls.Add(new LiteralControl() { Text = string.Format("{0} {1}", modelNumber, productCategory) });

            // add the model number and product category to the page as an H2 heading
            Controls.Add(new LiteralControl() { Text = string.Format("<h2>{0} {1}</h2>", modelNumber, productCategory) });
        }
    }
}

There’s also a static rollup page with a link to each product information page.
ProductRollupPage

We run a full crawl using a SharePoint connector and search the site for one of the product model numbers. All we get back is a single result to the rollup page.
ProductSearchBefore

Navigate to Central Administration > Search Service Application > Crawl Rules and create a new crawl rule using the settings below.
CrawlRuleComplexURLs

Run another full content crawl and then search the site for the same product model number used previously. This time the product information page is included in the search results.
ProductSearchAfter