Web Page Exclusions

To configure this content crawler to avoid importing unwanted Web pages into your portal:

  1. By default, this content crawler follows the target site's robot exclusion recommendations (typically published in a robots.txt file at the root of the site), which tell automated crawlers which pages they should not visit. If you want to ignore these recommendations, clear the Obey the target site's robot exclusion protocols check box.

    In general, these recommendations help keep unwanted content out of the portal. However, some sites publish very restrictive recommendations. If your content crawler is not importing any content from a site, try clearing this check box.

  2. By default, this content crawler stores the URLs of imported Web pages using the same letter case as the source Web site. To convert the URLs to lower case, select the Convert all URLs to lower case check box.

  3. To avoid importing content from an area of a Web site or to avoid importing particular pages:

  4. By default, this content crawler neither crawls nor imports any pages that match the exclusions. If the content crawler must follow links on an excluded page to reach pages that are not excluded and should be imported, select Crawl excluded pages, but do not import them.

  5. To limit your crawl to an area of a Web site or a particular page:
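
The robot exclusion recommendations mentioned in step 1 are usually expressed in a robots.txt file. The following sketch uses Python's standard urllib.robotparser module to show how a crawler that obeys the protocol decides whether it may fetch a page. It illustrates the protocol only and is not the content crawler's own implementation; the robots.txt rules, the user-agent name ExampleCrawler, the helper names, and the URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content published by a target site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, url: str, obey_robots: bool = True,
              user_agent: str = "ExampleCrawler") -> bool:
    """Return True if the page may be fetched.

    obey_robots is a hypothetical stand-in for the "Obey the target site's
    robot exclusion protocols" check box: when False, robots.txt is ignored.
    """
    if not obey_robots:
        return True
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for page in ("https://www.example.com/products/index.html",
                 "https://www.example.com/private/report.html"):
        # Compare the decision with and without obeying robots.txt.
        print(page, may_crawl(rp, page), may_crawl(rp, page, obey_robots=False))
```

With these rules, the page under /private/ is skipped only while the recommendations are obeyed, which is why a site with very restrictive rules can appear to yield no content at all.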

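The difference between the default exclusion behavior and Crawl excluded pages, but do not import them (step 4) can be modeled as a small breadth-first crawl. This is a hypothetical sketch that assumes exclusions are plain URL prefixes; the actual exclusion syntax and matching rules are not described on this page, and the names decide, crawl, excluded_prefixes, and crawl_excluded are invented for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PageDecision:
    follow_links: bool  # visit the page to discover the links on it
    import_page: bool   # create a portal document for the page


def decide(url: str, excluded_prefixes: list[str], crawl_excluded: bool) -> PageDecision:
    """Hypothetical rule: a page is excluded if its URL starts with any prefix."""
    excluded = any(url.startswith(prefix) for prefix in excluded_prefixes)
    if not excluded:
        return PageDecision(follow_links=True, import_page=True)
    # Excluded pages are never imported; with crawl_excluded=True their links
    # are still followed so that non-excluded pages behind them can be reached.
    return PageDecision(follow_links=crawl_excluded, import_page=False)


# Hypothetical site: each URL maps to the URLs it links to.
SITE = {
    "https://www.example.com/": ["https://www.example.com/archive/"],
    "https://www.example.com/archive/": ["https://www.example.com/reports/2020.html"],
    "https://www.example.com/reports/2020.html": [],
}


def crawl(start: str, links: dict[str, list[str]],
          excluded_prefixes: list[str], crawl_excluded: bool) -> set[str]:
    """Breadth-first crawl returning the set of URLs that would be imported."""
    imported: set[str] = set()
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        decision = decide(url, excluded_prefixes, crawl_excluded)
        if decision.import_page:
            imported.add(url)
        if decision.follow_links:
            for link in links.get(url, []):
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
    return imported
```

If /archive/ is excluded, the default behavior imports only the home page, because the report is linked only from the excluded archive page. Setting crawl_excluded=True imports the home page and the report, while the archive page itself is still not imported.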

To open the Web Page Exclusions page:

  1. Click Administration.
  2. Open the Content Crawler Editor:
  3. On the left, under Edit Object Settings, click Web Page Exclusions.