Advanced Settings (Content Crawler)

To specify the language of content, what to do with rejected documents, and a content crawler tag:

  1. Under Content Language, in the drop-down list, choose the language in which the majority of content that you want to import is written.

  2. Under Rejected Documents, specify what to do with documents that do not successfully sort into a folder:

  3. If you are editing an existing content crawler, you see additional options under Rejected Documents that allow you to specify what to do when this content crawler finds a previously rejected document. The definition of "previously rejected" depends on the option you choose for what new links means (step 5, substep 2):

  4. Specify what to do with previously rejected documents:

    If absolutely necessary, you can delete the history of previously rejected documents. Again, the definition of "previously rejected" depends on the option you choose for what new links means (step 5, substep 2). If you chose "from this Content Source" there, you are deleting the rejection history for all content crawlers that import documents from this content source. If you are still sure that you must delete the history of previously rejected documents, click Clear Rejection History.

    Note: If a document does not sort into any folder but is placed into the Unclassified Documents folder, this does not count as being rejected. Rejected documents are documents that were not placed in any folder.

  5. If you are editing an existing content crawler, you see the section Importing Documents. Under Importing Documents, specify whether to import only new documents. By default, this content crawler attempts to import only new documents (those that have not been previously imported by this content crawler or other content crawlers that access this same content source). You can change the content crawler setting to import multiple copies of each document, which might be useful while testing your content crawlers.

    1. To import only new documents, select Import only new links; additional options then display. Otherwise, skip to step 6.

    2. To specify what new links means:

    3. Note: The option you choose here affects your actions in steps 3 and 4 and in substep 7 of this step.

    4. To refresh the previously imported documents as specified on the Document Settings page, select refresh them. Generally, refreshing documents is the job of the Document Refresh Agent; refreshing documents slows the content crawler down. However, if you changed the document settings for this content crawler or changed the property mappings in the associated content types, refreshing documents updates these settings for the previously imported documents.

      Note: If you are crawling an RSS feed, the refresh them option refreshes the properties (such as the title and description) with the values from the target documents, not the RSS feed. If you want to retain the properties from the RSS feed, do not select refresh them.

    5. If you created additional folders or applied different filters to destination folders, select try to sort them into additional folders to sort the previously imported documents into new Knowledge Directory folders.

      Another content crawler might have imported documents from the same content source but into different folders than the destination folders specified for this content crawler. Make sure you really want to re-sort those documents into the destination folders specified for this content crawler.

    6. To re-import documents that were previously deleted (manually, due to expiration, or due to missing source documents), select regenerate deleted links. This might re-import documents that were at one time deemed inappropriate for your portal.

    7. If absolutely necessary, you can delete the history of documents that have been deleted from the portal. "History" is defined by what you specified as new documents in substep 2 of this step:

    8. If you are still sure that you must delete the record of documents deleted from the portal, click Clear Deletion History.

  6. To mark imported documents with a content crawler tag, type a tag in the Mark imported documents with the following Content Crawler Tag box. This tag is used to differentiate documents imported by this content crawler from those imported by another content crawler.

  7. Under Runtime Configuration, set the following:

  8. The allowable ranges for these fields are set in the portal configuration file. The values set here are also limited by the maximum threads allowable in the automation service used for the job associated with this content crawler.
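
The Importing Documents options in step 5 interact in a way that can be easier to follow as a small decision function. The following Python sketch is only an illustration of the behavior described above, not portal code: the names Scope, ImportSettings, History, and decide are hypothetical, and the order in which the checks run is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    # Hypothetical names for the two meanings of "new links".
    THIS_CRAWLER = "this content crawler"
    THIS_CONTENT_SOURCE = "this content source"


@dataclass
class ImportSettings:
    # Field names are illustrative; they mirror the options in step 5.
    import_only_new: bool = True               # documented default
    scope: Scope = Scope.THIS_CONTENT_SOURCE   # default described in step 5
    refresh_them: bool = False
    sort_into_additional_folders: bool = False
    regenerate_deleted_links: bool = False


@dataclass
class History:
    # Illustrative record for one content source:
    # document location -> ids of the content crawlers involved.
    imported: dict = field(default_factory=dict)
    deleted: dict = field(default_factory=dict)


def _known(record, doc, crawler_id, scope):
    """Return True if the record covers this document within the chosen scope."""
    crawlers = record.get(doc, set())
    if scope is Scope.THIS_CRAWLER:
        return crawler_id in crawlers
    return bool(crawlers)


def decide(doc, crawler_id, settings, history):
    """Return the list of actions the sketch would take for one document."""
    if not settings.import_only_new:
        # Multiple copies are allowed, so every document is imported.
        return ["import"]
    if _known(history.deleted, doc, crawler_id, settings.scope):
        # Previously deleted documents come back only on request.
        return ["import"] if settings.regenerate_deleted_links else ["skip"]
    if not _known(history.imported, doc, crawler_id, settings.scope):
        return ["import"]
    actions = []
    if settings.refresh_them:
        actions.append("refresh")
    if settings.sort_into_additional_folders:
        actions.append("re-sort")
    return actions or ["skip"]


history = History(imported={"doc-1": {"crawler-B"}})
print(decide("doc-1", "crawler-A", ImportSettings(), history))  # ['skip']
```

In this sketch, changing the scope to Scope.THIS_CRAWLER makes crawler-A treat doc-1 as new, because only crawler-B imported it. That mirrors why the choice of what new links means also changes which documents count as previously rejected or deleted.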


  1. Click Administration.
  2. Open the Content Crawler Editor:
  3. On the left, under Edit Object Settings, click Advanced Settings.