
Documentation Index

Fetch the complete documentation index at: https://docs.open.cx/llms.txt

Use this file to discover all available pages before exploring further.

The Website source crawls a domain you control and ingests its pages into the AI’s retrieval index. Use it when the content you want the AI to know about is reachable at a public URL — a help center, a product docs site, a set of marketing pages — and you don’t have a direct integration for the system that serves it.

When to use the crawler vs a direct integration

| Situation | Use |
| --- | --- |
| You publish your help center through Zendesk, Intercom, or Front | The dedicated Zendesk / Intercom / Front source |
| Your content lives in Confluence, Notion, GitBook, or Freshdesk | The dedicated source for that tool |
| Your content is reachable at a public URL and nothing above applies | The crawler |
| The site is gated behind auth | Not yet supported: the crawler only fetches public URLs |
Don’t run both. If you’ve already connected Zendesk (or any other direct source), don’t also crawl the public help-center URL for the same content. The AI will double-index every article and surface duplicates in retrieval.

What the crawler does

  • Kicks off a crawl against the URL you provide.
  • Follows links within the domain up to page_limit pages (default 100, max 5000).
  • Extracts main content as Markdown, skipping navigation chrome and binary assets.
  • Tracks a content hash per page — re-crawls only re-index pages whose content changed.
  • Re-runs on a schedule you pick (crawl_interval_hours, default 168 hours / 7 days).
  • Exposes every discovered page so you can exclude individual URLs, re-include ones you excluded, or force-resync one page.
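The change-detection step above can be sketched in a few lines. This is a minimal illustration, not the product's actual implementation: the hash algorithm and the shape of the stored state are assumptions; the only grounded idea is that each page's extracted Markdown is hashed, and a re-crawl re-indexes only pages whose hash no longer matches the stored one.

```python
import hashlib

def content_hash(markdown: str) -> str:
    """Hash a page's extracted Markdown so re-crawls can detect changes.

    SHA-256 is an assumption; any stable digest works for this purpose.
    """
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()

def pages_to_reindex(stored_hashes: dict[str, str],
                     crawled: dict[str, str]) -> list[str]:
    """Return the URLs whose content changed since the last crawl.

    stored_hashes maps URL -> hash from the previous crawl;
    crawled maps URL -> freshly extracted Markdown. New URLs
    (no stored hash) are treated as changed and get indexed too.
    """
    return [
        url
        for url, markdown in crawled.items()
        if stored_hashes.get(url) != content_hash(markdown)
    ]
```

For example, a page whose Markdown is byte-identical to the previous crawl is skipped, while a new or edited page is returned for re-indexing.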

Connect a website

URL, limits, include/exclude paths, crawl interval.
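As a rough sketch of what those settings might look like together, here is a hypothetical configuration fragment. Only `page_limit` (default 100, max 5000) and `crawl_interval_hours` (default 168) are named in this page; the `url`, `include_paths`, and `exclude_paths` field names are illustrative assumptions — see the Connect a website page and the Crawl API reference for the actual shape.

```json
{
  "url": "https://help.example.com",
  "page_limit": 500,
  "crawl_interval_hours": 168,
  "include_paths": ["/docs/*"],
  "exclude_paths": ["/docs/changelog/*"]
}
```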

Troubleshooting

Stuck crawls, locale scoping, pages not indexing.

Crawl API

Programmatic control of datasources, crawls, and pages.

Connect a knowledge source

Decision matrix for all sources.