About Content Crawlers

Create a content crawler to import content into your portal from external content repositories. You must run a job associated with the content crawler to search the external repository periodically and import its content. For information about jobs, see About Jobs.

Note: Content crawlers depend on content sources. For information on content sources, see About Content Sources.

This topic discusses the following information:

  - Web Content Crawlers
  - Remote Content Crawlers
  - Content Web Services
  - Importing Document Security
  - Troubleshooting the Results of a Crawl

To learn how to create or edit administrative objects (including content crawlers), click here.

Web Content Crawlers

A Web content crawler allows users to import content from the Web into the portal.
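The following sketch illustrates, in simplified Java, what a Web crawl conceptually does: starting from a URL, it fetches pages, follows the links it finds, and imports each page it visits. The start URL, page limit, and import behavior shown here are invented for illustration; in the portal these are governed by the settings of the Web content source and the content crawler.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashSet;
    import java.util.Set;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class WebCrawlSketch {
        // Naive link extraction; real crawlers also honor robots.txt, content types, and so on.
        private static final Pattern HREF = Pattern.compile("href=\"(http[^\"]+)\"");

        public static void main(String[] args) throws Exception {
            String startUrl = "http://example.com/";    // hypothetical start location
            int maxPages = 10;                          // hypothetical page limit

            Deque<String> queue = new ArrayDeque<>();
            Set<String> visited = new HashSet<>();
            queue.add(startUrl);

            while (!queue.isEmpty() && visited.size() < maxPages) {
                String url = queue.poll();
                if (!visited.add(url)) {
                    continue;                           // already visited this page
                }
                String html = fetch(url);
                // In the portal, the page would be imported as a document;
                // here we only report that it was visited.
                System.out.println("Imported: " + url);
                Matcher m = HREF.matcher(html);
                while (m.find()) {
                    queue.add(m.group(1));              // follow links found on the page
                }
            }
        }

        private static String fetch(String url) throws Exception {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            StringBuilder page = new StringBuilder();
            try (BufferedReader in =
                    new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    page.append(line).append('\n');
                }
            }
            return page.toString();
        }
    }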

To learn about the Web Content Crawler Editor, click one of the following editor pages:

Remote Content Crawlers

A remote content crawler allows users to import content from an external content repository into the portal.

Some crawl providers are installed with the portal and are readily available to portal users; others must be installed and configured manually. For example, Oracle provides the following crawl providers:

Note: For information on obtaining crawl providers, refer to the Oracle Technology Network at http://www.oracle.com/technology/index.html. For information on installing crawl providers, refer to the Installation Guide for Oracle WebCenter Interaction (available on the Oracle Technology Network at http://www.oracle.com/technology/documentation/bea.html) or the documentation that comes with your crawl provider, or contact your portal administrator.
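The sketch below suggests, in Java, the role a crawl provider plays: it exposes the folders (containers) and documents of the external repository so that a crawl job can walk them and import what it finds. The interfaces and the file-system example are invented for illustration only; the actual interfaces are defined by the crawl provider's development kit and the documentation that comes with your crawl provider.

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    public class CrawlProviderSketch {

        // Hypothetical view of the external repository as containers and documents.
        interface Container {
            String getName();
            List<Container> getChildren();
            List<String> getDocumentLocations();
        }

        // Example provider backed by the local file system, purely for illustration.
        static class FileSystemContainer implements Container {
            private final File dir;

            FileSystemContainer(File dir) { this.dir = dir; }

            public String getName() { return dir.getName(); }

            public List<Container> getChildren() {
                List<Container> children = new ArrayList<>();
                File[] subdirs = dir.listFiles(File::isDirectory);
                if (subdirs != null) {
                    for (File f : subdirs) children.add(new FileSystemContainer(f));
                }
                return children;
            }

            public List<String> getDocumentLocations() {
                List<String> docs = new ArrayList<>();
                File[] files = dir.listFiles(File::isFile);
                if (files != null) {
                    for (File f : files) docs.add(f.getAbsolutePath());
                }
                return docs;
            }
        }

        // Walk the repository the way a crawl job would: visit each container
        // and import every document it finds.
        static void crawl(Container container) {
            for (String location : container.getDocumentLocations()) {
                System.out.println("Import document: " + location);
            }
            for (Container child : container.getChildren()) {
                crawl(child);
            }
        }

        public static void main(String[] args) {
            crawl(new FileSystemContainer(new File(".")));
        }
    }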

To create a remote content crawler:

  1. Install the crawl provider on the portal computer or another computer.
  2. Create a remote server.
  3. Create a content Web service (discussed next).
  4. Create a remote content source.
  5. Create a remote content crawler.
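The following sketch, in Java, models how the objects created in steps 2 through 5 reference one another. The field names are invented for illustration; the actual settings are exposed on the pages of the corresponding editors.

    public class RemoteCrawlerSetupSketch {

        record RemoteServer(String baseUrl) { }                                            // step 2
        record ContentWebService(RemoteServer server, String servicePath) { }              // step 3
        record RemoteContentSource(ContentWebService service, String repositoryPath) { }   // step 4
        record RemoteContentCrawler(RemoteContentSource source, String destinationFolder) { } // step 5

        public static void main(String[] args) {
            RemoteServer server = new RemoteServer("http://crawlhost:8080");
            ContentWebService service = new ContentWebService(server, "/provider/crawler");
            RemoteContentSource source = new RemoteContentSource(service, "/shared/docs");
            RemoteContentCrawler crawler =
                    new RemoteContentCrawler(source, "Knowledge Directory/Imported Documents");

            System.out.println("Crawler imports from " + crawler.source().repositoryPath()
                    + " via " + crawler.source().service().server().baseUrl());
        }
    }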

To learn about the Remote Content Crawler Editor, click one of the following editor pages:

The following crawl providers, if installed, add at least one page to the Remote Content Crawler Editor:

Content Web Services

Content Web services allow you to specify general settings for your remote content repository, leaving the target and security settings to be set in the associated remote content source and remote content crawler. This allows you to crawl multiple locations of the same content repository without having to repeatedly specify all the settings.

Note: You create content Web services on which to base your remote content sources. For information on content sources, see About Content Sources.
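The sketch below illustrates this settings split, assuming invented setting names: the content Web service holds the general, repository-wide settings once, each remote content source adds only its own target location, and a content crawler effectively sees the union of the two.

    import java.util.HashMap;
    import java.util.Map;

    public class SettingsSplitSketch {
        public static void main(String[] args) {
            // General settings, defined once on the content Web service.
            Map<String, String> webServiceSettings = new HashMap<>();
            webServiceSettings.put("providerUrl", "http://crawlhost:8080/provider/crawler");
            webServiceSettings.put("timeoutSeconds", "120");

            // Two content sources crawling different locations of the same repository.
            Map<String, String> salesSource = targetSettings("/repository/sales", "svc-sales");
            Map<String, String> hrSource = targetSettings("/repository/hr", "svc-hr");

            System.out.println(effectiveSettings(webServiceSettings, salesSource));
            System.out.println(effectiveSettings(webServiceSettings, hrSource));
        }

        private static Map<String, String> targetSettings(String path, String account) {
            Map<String, String> m = new HashMap<>();
            m.put("targetPath", path);
            m.put("account", account);
            return m;
        }

        // A crawler sees the shared settings plus the per-source settings.
        private static Map<String, String> effectiveSettings(Map<String, String> shared,
                                                             Map<String, String> perSource) {
            Map<String, String> merged = new HashMap<>(shared);
            merged.putAll(perSource);
            return merged;
        }
    }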

To learn about the Content Web Service Editor, click one of the following editor pages:

Importing Document Security

Users can automatically be granted access to the content imported by some remote content crawlers. The Global ACL Sync Map tells these content crawlers how to map source document security to portal users and groups.

For an example of how importing security works, see Importing Security Example.
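As a rough illustration of the idea, the following Java sketch maps the principals named in a source document's ACL to portal users and groups. All of the names and the mapping behavior shown here are invented, not the portal's actual import logic.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class AclSyncSketch {
        public static void main(String[] args) {
            // Hypothetical mapping from source-repository principals to portal principals.
            Map<String, String> aclSyncMap = new HashMap<>();
            aclSyncMap.put("FILESERVER\\Engineering", "Engineering Group");
            aclSyncMap.put("FILESERVER\\jsmith", "jsmith");

            // ACL read from a crawled source document.
            List<String> sourceAcl = List.of("FILESERVER\\Engineering", "FILESERVER\\contractor7");

            for (String principal : sourceAcl) {
                String portalPrincipal = aclSyncMap.get(principal);
                if (portalPrincipal != null) {
                    System.out.println("Grant access to " + portalPrincipal);
                } else {
                    // Principals with no mapping are simply not granted access in the portal.
                    System.out.println("No portal mapping for " + principal + "; skipping");
                }
            }
        }
    }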

Troubleshooting the Results of a Crawl

You should check the following if your content crawler does not import the expected content: