About Content Sources

A content source provides access to an external content repository, allowing users to import content into the portal through content crawlers and document submission. Each content source is configured to access one document repository. For example, a content source for a secured Web site can be configured to fill out the Web form necessary to gain access to that site.

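This topic discusses:

  • Web Content Sources
  • Remote Content Sources
  • Securing Content Sources and Their Associated Documents
  • Deleting Content Sources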

Web Content Sources

A Web content source allows users to import content from the Web into the portal through Web content crawlers or Web document submission.

When you install the portal, the World Wide Web content source is created. This content source provides access to any unsecured Web site.

Remote Content Sources

A remote content source allows users to import content from an external content repository into the portal through remote content crawlers or remote document submission.

Some crawl providers are installed with the portal and are readily available to portal users; others must be installed and set up manually. Oracle provides a number of crawl providers.

Note: For information on obtaining crawl providers, refer to the Oracle Technology Network at http://www.oracle.com/technology/index.html. For information on installing crawl providers, refer to the Installation Guide for Oracle WebCenter Interaction (available on the Oracle Technology Network at http://www.oracle.com/technology/documentation/bea.html) or the documentation that comes with your crawl provider, or contact your portal administrator.

To create a remote content source:

  1. Install the crawl provider on the portal computer or another computer.
  2. Create a remote server.
  3. Create a crawler Web service.
  4. Create a remote content source.

Some crawl providers, if installed, add at least one extra page to the Remote Content Source Editor.

Securing Content Sources and Their Associated Documents

Depending on their permissions, users might be able to view, submit, or crawl documents from a content source.

Action: Access documents imported into the portal

Permissions needed:

  • Read access to the document link in the Directory
  • Read access to the Directory folder in which the link is stored
  • Read access to the content source used to import the document
  • If the document is not gatewayed, access to the document in the source repository

Action: Crawl documents into the portal

Permissions needed:

  • Edit access to the Directory folder into which they are crawling documents
  • Edit access to the administrative folder in which they are creating the content crawler
  • Select access to the content source
  • Access Administration activity right
  • Create Content Crawlers activity right
  • Either Select access to a job that can run the content crawler, or the Create Jobs activity right plus Edit access to an administrative folder that is registered to an Automation Service

Action: Submit a document into the portal

Permissions needed:

  • Edit access to the Directory folder into which they are submitting a document
  • Select access to a content source that supports document submission
  • If the associated content Web service does not support browsing, knowledge of the path to the document
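
The crawl requirements above combine several "all of" conditions with one "either/or" condition for the job. The following Python sketch is only an illustration of that logic, not portal code: the `User` class, the `can_crawl` function, and the assumption that access levels are ordered Read < Select < Edit are all hypothetical.

```python
from dataclasses import dataclass, field

# Assumed ordering of access levels for this sketch (weakest to strongest).
LEVELS = {"none": 0, "read": 1, "select": 2, "edit": 3}


@dataclass
class User:
    """Hypothetical model of a portal user's permissions."""
    access: dict = field(default_factory=dict)  # object name -> access level
    rights: set = field(default_factory=set)    # activity rights held

    def has(self, obj: str, level: str) -> bool:
        # True if the user's level on obj is at least the requested level.
        return LEVELS[self.access.get(obj, "none")] >= LEVELS[level]


def can_crawl(user, directory_folder, admin_folder, content_source,
              job=None, automation_folder=None) -> bool:
    """Return True if every crawl requirement listed above is met."""
    base = (
        user.has(directory_folder, "edit")
        and user.has(admin_folder, "edit")
        and user.has(content_source, "select")
        and "Access Administration" in user.rights
        and "Create Content Crawlers" in user.rights
    )
    # Either an existing job the user can select...
    existing_job = job is not None and user.has(job, "select")
    # ...or the right to create a job in a folder registered
    # to an Automation Service.
    new_job = (
        "Create Jobs" in user.rights
        and automation_folder is not None
        and user.has(automation_folder, "edit")
    )
    return base and (existing_job or new_job)
```

In this model, a user who meets the first five requirements but has neither a selectable job nor the Create Jobs activity right cannot crawl, which matches the last item in the list.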

If you have content sources that access sensitive information, be aware that any user who has access to the content source and the additional permissions listed above can access everything that the account impersonated by the content source can access. For this reason, you might want to create multiple content sources that access the same repository but use different authentication information, and grant different users access to each one.
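
As a rough illustration of this recommendation, the sketch below models two content sources that point at the same repository but impersonate different accounts. Everything in it is hypothetical: the `ContentSource` class, the repository path, the account names, and the group names are invented for the example and are not portal objects or APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentSource:
    """Hypothetical model of a content source and who may use it."""
    name: str
    repository: str
    impersonated_account: str  # repository account the source logs in as
    allowed_groups: frozenset  # portal groups granted access to the source


def reachable_accounts(sources, group):
    """Repository accounts whose access a portal group effectively inherits."""
    return {s.impersonated_account for s in sources if group in s.allowed_groups}


# Two sources for the same (made-up) repository with different credentials.
sources = [
    ContentSource("HR Share (restricted)", "//fileserver/hr",
                  "svc_hr_admin", frozenset({"HR Managers"})),
    ContentSource("HR Share (general)", "//fileserver/hr",
                  "svc_hr_public", frozenset({"HR Managers", "Employees"})),
]

print(reachable_accounts(sources, "Employees"))
```

Because the Employees group is granted access only to the second source, its members inherit only what the less privileged account can see in the repository.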

Deleting Content Sources

If you delete a content source from which documents have been imported into the portal, the links to the documents will still exist, but users will no longer be able to access these documents.
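
Before deleting a content source, it can help to know which document links would be left inaccessible. The sketch below assumes, purely for illustration, that each link records the name of the content source that imported it; the dictionary layout, the document titles, and the `orphaned_links` function are hypothetical and do not correspond to a portal API.

```python
def orphaned_links(links, content_source):
    """Return titles of links that were imported through content_source."""
    return [link["title"] for link in links
            if link["content_source"] == content_source]


# Hypothetical Directory contents.
links = [
    {"title": "Benefits Overview", "content_source": "HR File Share"},
    {"title": "Press Release", "content_source": "World Wide Web"},
    {"title": "Salary Bands", "content_source": "HR File Share"},
]

print(orphaned_links(links, "HR File Share"))
```

The links returned by this check would remain in the Directory after the deletion, but users could no longer open the documents behind them.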