Web Page Exclusions

To configure this content crawler to avoid importing unwanted Web pages into your portal:

By default, this content crawler follows the Web server's recommendations about which pages might be of value to automated crawlers. If you want to ignore these recommendations, clear the Obey the target site's robot exclusion protocols check box.

In general, these recommendations help limit unwanted content from being crawled into the portal. However, some sites offer very strict recommendations. If your content crawler is not importing any content from a site, try turning this option off.
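As an illustration only, the effect of this check box can be sketched in Python with the standard library's urllib.robotparser module. This is not the crawler's own implementation. The robots.txt content, the may_crawl function, and the example-crawler user agent name are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download this
# file from the target Web server.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_crawl(url: str, obey_robots: bool,
              user_agent: str = "example-crawler") -> bool:
    """Return True if the URL may be crawled under the given setting."""
    if not obey_robots:
        # Check box cleared: ignore the site's recommendations.
        return True
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these assumed rules, a page under /private/ is skipped while the option is selected and crawled once it is cleared, which is why clearing it can help when a site's recommendations block everything of interest.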
By default, this content crawler saves the URLs of imported Web pages in the case used on the source Web site. To convert the URLs to lower case, select Convert all URLs to lower case.
To avoid importing content from an area of a Web site or to avoid importing particular pages:

To specify an area to avoid, click Add exclusion filter; then, in the text box, type the URL to the area of the Web site that you want to avoid.

You can use wildcard notation (*) to make the exclusion more general. For example, to avoid crawling sales information from a site, you might type http://mycompany.com*sales. As a result, this crawler would not import any pages from mycompany.com that have "sales" anywhere in the URL.

Note: Wildcards are assumed on either side of your text. For example, if you type sales, the crawler will not import any page that has "sales" anywhere in its URL, regardless of which site reachable from the target URL the page belongs to.

Important: If you list exclusions and inclusions (described in step 5), the exclusions apply only to the included pages. For example, if you excluded sales and included http://mycompany.com, your crawler would import all pages from http://mycompany.com except for those pages that had "sales" anywhere in the URL.
To remove an exclusion filter, select it and click the remove icon.
To select or clear all exclusion filter check boxes, select or clear the box to the left of Exclusion Filters.

By default, this content crawler does not crawl or import any pages specified in the exclusions. If the content crawler must follow links on an excluded page to reach pages that are not excluded and that should be imported, choose Crawl excluded pages, but do not import them.
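The exclusion behavior described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the product's actual code. The matches_filter and crawl functions, the crawl_excluded parameter, and the LINKS graph are hypothetical, and matching is assumed to be case-sensitive.

```python
import re

# Hypothetical link graph standing in for a real Web site.
LINKS = {
    "http://mycompany.com/": ["http://mycompany.com/sales/index.html"],
    "http://mycompany.com/sales/index.html": [
        "http://mycompany.com/products/widget.html"
    ],
}


def matches_filter(url: str, pattern: str) -> bool:
    """True if the pattern occurs in the URL; * matches any run of characters.

    Using search() rather than a full match gives the implied wildcards
    on either side of the text.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.search(regex, url) is not None


def crawl(start, links, exclusions, crawl_excluded=False):
    """Walk the link graph breadth-first; return imported URLs in visit order."""
    imported, seen, queue = [], {start}, [start]
    while queue:
        url = queue.pop(0)
        excluded = any(matches_filter(url, p) for p in exclusions)
        if excluded and not crawl_excluded:
            continue  # default: neither crawl nor import excluded pages
        if not excluded:
            imported.append(url)
        # Follow links, including links found on excluded pages when allowed.
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return imported
```

In this sketch, crawl("http://mycompany.com/", LINKS, ["http://mycompany.com*sales"]) imports only the home page, because the products page is reachable only through the excluded sales page. Passing crawl_excluded=True lets the crawl pass through the sales page without importing it, so the products page is imported as well.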
To limit your crawl to an area of a Web site or a particular page:

To specify where this content crawler may crawl, click Add inclusion filter; then, in the text box, type the URL to the area of the Web site to which you want to restrict your crawl. Because Web sites can contain links to other sites, you might want to use inclusions to keep your content crawler on a particular site. To avoid crawling other sites, add the base URL of the site you want to crawl to the inclusion list; for example, http://mycompany.com.

You can use wildcard notation (*) to make the inclusion more general. For example, if you want to crawl only information on single sign-on (SSO), you might type http://mycompany.com*sso. As a result, this content crawler would import only pages from mycompany.com that have "sso" anywhere in the URL.

Note: Wildcards are assumed on either side of your text. For example, if you type sso, the content crawler will import any page that has "sso" anywhere in its URL, regardless of which site reachable from the target URL the page belongs to.

Important: If you list inclusions and exclusions, the exclusions apply only to the included pages. For example, if you included http://mycompany.com and excluded sso, your content crawler would import all pages from http://mycompany.com except for those pages that had "sso" anywhere in the URL.
To remove an inclusion filter, select it and click the remove icon.
To select or clear all inclusion filter check boxes, select or clear the box to the left of Inclusion Filters.
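The way inclusions and exclusions combine can be sketched in Python as follows. This is a hedged illustration rather than the crawler's real logic. The matches_filter and should_import functions are hypothetical names, matching is assumed to be case-sensitive, and an empty inclusion list is assumed to mean that the crawl is unrestricted.

```python
def matches_filter(url: str, pattern: str) -> bool:
    """True if the pieces of the pattern between * appear in the URL in order.

    Nothing anchors the first or last piece, which gives the implied
    wildcards on either side of the text.
    """
    pos = 0
    for part in pattern.split("*"):
        pos = url.find(part, pos)
        if pos == -1:
            return False
        pos += len(part)
    return True


def should_import(url, inclusions, exclusions):
    """Apply inclusions first, then exclusions to the included pages."""
    # Assumption: with no inclusion filters, every page is a candidate.
    if inclusions and not any(matches_filter(url, p) for p in inclusions):
        return False
    # Exclusions only narrow the set of pages that passed the inclusions.
    return not any(matches_filter(url, p) for p in exclusions)
```

For example, with the inclusion http://mycompany.com and the exclusion sso, should_import accepts http://mycompany.com/docs/index.html, rejects http://mycompany.com/sso/setup.html, and rejects any page on another site.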