Create a content crawler to import content into your portal from external content repositories. You must run a job associated with the content crawler to periodically search the external repository for content and import that content. For information about jobs, see About Jobs.
Note: Content crawlers depend on content sources. For information on content sources, see About Content Sources.
A Web content crawler allows users to import content from the Web into the portal.
A remote content crawler allows users to import content from an external content repository into the portal.
Some crawl providers are installed with the portal and are immediately available to portal users. Others you must install and configure manually; for example, Oracle provides several crawl providers that are installed separately.
Note: For information on obtaining crawl providers, refer to the Oracle Technology Network at http://www.oracle.com/technology/index.html. For information on installing crawl providers, refer to the Installation Guide for Oracle WebCenter Interaction (available on the Oracle Technology Network at http://www.oracle.com/technology/documentation/bea.html) or the documentation that comes with your crawl provider, or contact your portal administrator.
To create a remote content crawler:
Some crawl providers, if installed, add at least one extra page to the Remote Content Crawler Editor.
A content Web service stores the general settings for a remote content repository. Security settings are specified in the associated remote content source, and target settings in the remote content crawler. Because the general settings are defined only once, you can crawl multiple locations in the same content repository without re-entering them for each location.
Note: Each remote content source is based on a content Web service, so the content Web service must exist before you create the content source. For information on content sources, see About Content Sources.
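The layering described above can be pictured as a simple object model: one content Web service shared by a content source, which in turn is shared by several content crawlers. The sketch below is a hypothetical illustration only; the class and field names (`provider_url`, `username`, `start_location`, `target_folder`) are assumptions chosen for readability and are not portal API names.

```python
from dataclasses import dataclass

# Hypothetical model of the settings layers; names are illustrative only.
@dataclass(frozen=True)
class ContentWebService:
    name: str
    provider_url: str  # general setting shared by every source and crawler

@dataclass(frozen=True)
class ContentSource:
    web_service: ContentWebService
    username: str  # security (authentication) setting

@dataclass(frozen=True)
class ContentCrawler:
    source: ContentSource
    start_location: str  # target: which location in the repository to crawl
    target_folder: str   # portal folder that receives the imported content

def effective_settings(crawler: ContentCrawler) -> dict:
    """Combine the three layers into the settings a single crawl would use."""
    src = crawler.source
    return {
        "provider_url": src.web_service.provider_url,
        "username": src.username,
        "start_location": crawler.start_location,
        "target_folder": crawler.target_folder,
    }

# One Web service and one source reused by two crawlers of different locations.
ws = ContentWebService("Docs Repository WS", "http://crawler-host.example.com/ws")
src = ContentSource(ws, "svc_crawl")
hr = ContentCrawler(src, "/shares/hr", "HR Documents")
sales = ContentCrawler(src, "/shares/sales", "Sales Documents")
```

Only the crawler-level fields differ between `hr` and `sales`; the general and security settings come from the shared objects, which is the reuse the paragraph describes.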
Some remote content crawlers can import the security of source documents, so that users are automatically granted access to the imported content. The Global ACL Sync Map tells these content crawlers how to import source document security.
For an example of how importing security works, see Importing Security Example.
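To make the idea concrete, the following sketch treats a sync map as a lookup from a source security domain to a portal authentication source. This is an assumption-based illustration, not the portal's implementation: the map contents (`CORP`, `CorpAD`, and so on), the `DOMAIN\principal` entry format, and the rule that unmapped entries are skipped are all hypothetical.

```python
# Hypothetical sync map: source security domain -> portal authentication source.
ACL_SYNC_MAP = {"CORP": "CorpAD", "ENG": "EngLDAP"}

def map_acl(source_acl: list[str], sync_map: dict[str, str]) -> list[str]:
    """Translate source ACL entries of the form DOMAIN\\principal.

    Entries without a domain, or whose domain is not in the map, are
    skipped (an assumption made for this sketch).
    """
    mapped = []
    for entry in source_acl:
        domain, sep, principal = entry.partition("\\")
        if not sep:
            continue  # no domain prefix, so the entry cannot be resolved
        prefix = sync_map.get(domain.upper())
        if prefix is None:
            continue  # domain not covered by the sync map
        mapped.append(f"{prefix}\\{principal}")
    return mapped
```

For example, `map_acl(["CORP\\jdoe", "ENG\\Developers", "OTHER\\guest"], ACL_SYNC_MAP)` keeps the first two entries, rewritten with their portal authentication source, and drops `OTHER\guest` because its domain has no mapping.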
If your content crawler does not import the expected content, check the following:
Make sure your folder filters are correctly filtering content. To learn about testing your filters, see the Testing Filters section on the Main Settings (Filter) page.
Make sure your content crawler did not place unwanted content into the target folder. If a document does not filter into any subfolders, your content crawler might place the document in the target folder. This is determined by a setting on the Main Settings page of the Folder Editor.
Make sure the content crawler did not place content into the Unclassified Documents folder. If a document cannot be placed in any target folders or subfolders, your content crawler might place the document in the Unclassified Documents folder. This is determined by a setting on the Advanced Settings page of the Content Crawler Editor. If you have the correct permissions, you can view the Unclassified Documents folder when you are editing the Directory or by clicking Administration | Select Utility | Access Unclassified Documents.
Make sure you have at least Edit access to the target folder.
For Web content crawlers, make sure that the robot exclusion protocol (robots.txt) and any exclusions or inclusions you defined are not preventing your content crawler from importing the expected content. These are controlled by settings on the Web Page Exclusions page of the Content Crawler Editor.
Make sure the authentication information specified in the associated content source allows the portal to access content.
Review the job history for additional information.
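When a Web content crawler skips pages, one quick check outside the portal is whether the site's robots.txt disallows those URLs. The sketch below uses Python's standard `urllib.robotparser` module. The user-agent string `PortalCrawler` and the sample robots.txt are hypothetical; substitute the user agent your crawler actually sends and the site's real robots.txt. This checks only robots.txt rules, not the exclusions or inclusions configured on the Web Page Exclusions page.

```python
from urllib.robotparser import RobotFileParser

def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(user_agent, url)

# Sample robots.txt (hypothetical): every crawler is kept out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""
```

With this sample, `is_crawl_allowed(ROBOTS_TXT, "PortalCrawler", "http://www.example.com/private/report.html")` is `False`, while the same call for `/public/index.html` is `True`. If a URL you expected to import is disallowed here, robots.txt is the likely cause rather than the crawler configuration.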