Overview
Website crawling enables:
- Automated Content Indexing - Automatically extract and index content from websites
- Knowledge Base Integration - Crawled content is added directly to your knowledge base
- Real-time Status Tracking - Monitor crawl progress and completion status
- Flexible Configuration - Control include/exclude paths, page limits, and crawl intervals
- Page Management - Exclude, include, delete, or resync individual pages
How It Works
- Create a Datasource - Provide a website URL and configuration options
- Crawl Starts Automatically - By default, a crawl begins immediately after creation
- Monitor Progress - Check crawl status and track page processing
- Manage Pages - Review crawled pages, exclude irrelevant ones, or resync outdated content
- Scheduled Recrawls - Datasources automatically recrawl on a configurable interval
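The configuration options mentioned in the steps above can be pictured as a small request body. The sketch below is only an illustration: every field name (`include_paths`, `exclude_paths`, `page_limit`, `crawl_interval_hours`, `start_crawl`) and every default value is an assumption made for this example, not the API's documented schema. Check the Create Datasource reference for the actual parameter names.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class DatasourceConfig:
    """Hypothetical request body for creating a website datasource."""

    url: str
    include_paths: list[str] = field(default_factory=list)  # assumed field name
    exclude_paths: list[str] = field(default_factory=list)  # assumed field name
    page_limit: int = 100              # assumed field name and default
    crawl_interval_hours: int = 24     # assumed field name and unit
    start_crawl: bool = True           # mirrors "a crawl begins immediately" by default

    def to_payload(self) -> dict:
        """Validate the basics locally and return a JSON-serializable dict."""
        if not self.url.startswith(("http://", "https://")):
            raise ValueError("url must be an absolute http(s) URL")
        if self.page_limit <= 0:
            raise ValueError("page_limit must be positive")
        return asdict(self)


# Example: crawl the docs section of a site but skip its blog.
config = DatasourceConfig(url="https://example.com/docs", exclude_paths=["/blog"])
payload = config.to_payload()
```

Validating locally before sending keeps obvious mistakes (a relative URL, a zero page limit) from ever starting a crawl.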
Crawl Job Statuses
- pending - Crawl job created, waiting to start
- scraping - Crawl is actively running and extracting content
- completed - Crawl finished successfully, content has been indexed
- failed - Crawl encountered an error and could not complete
- cancelled - Crawl was manually cancelled before completion
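A client that monitors a crawl typically polls until the job reaches one of the three final statuses (completed, failed, cancelled). The following is a minimal sketch of that loop, not an official client: `wait_for_crawl` and its `fetch_status` callback are hypothetical names, and the callback stands in for whatever request returns the job's current status string (for example, a call to the Get Crawl Job endpoint).

```python
import time
from typing import Callable

# Status strings as listed in this section.
ACTIVE_STATUSES = {"pending", "scraping"}
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_crawl(
    fetch_status: Callable[[], str],  # hypothetical: returns the job's current status
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> str:
    """Poll until the crawl job reaches a terminal status and return that status."""
    for attempt in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if status not in ACTIVE_STATUSES:
            raise ValueError(f"unexpected crawl status: {status!r}")
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    raise TimeoutError(f"crawl still running after {max_polls} polls")


# Usage with a stubbed status sequence instead of a real HTTP call.
fake_statuses = iter(["pending", "scraping", "scraping", "completed"])
final_status = wait_for_crawl(lambda: next(fake_statuses), poll_seconds=0)
```

Passing the status lookup in as a callback keeps the loop independent of any particular HTTP library and makes it easy to test with canned responses.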
Page Sync Statuses
- synced - Page content is indexed in the knowledge base
- pending - Page is waiting to be synced
- error - Page failed to sync
- excluded - Page is excluded from syncing
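When reviewing crawled pages, it can help to tally them by sync status and pull out the ones that failed. The sketch below assumes each page is a dict with `url` and `sync_status` keys; those key names and the helper functions are hypothetical and chosen only for this example, so adapt them to the actual response shape.

```python
from collections import Counter

# Status strings as listed in this section.
PAGE_SYNC_STATUSES = ("synced", "pending", "error", "excluded")


def summarize_pages(pages: list[dict]) -> dict[str, int]:
    """Count pages per sync status, reporting zero for statuses with no pages."""
    counts = Counter(page["sync_status"] for page in pages)
    unknown = set(counts) - set(PAGE_SYNC_STATUSES)
    if unknown:
        raise ValueError(f"unknown sync statuses: {sorted(unknown)}")
    return {status: counts.get(status, 0) for status in PAGE_SYNC_STATUSES}


def pages_to_resync(pages: list[dict]) -> list[str]:
    """Return the URLs of pages whose sync failed (candidates for a resync)."""
    return [page["url"] for page in pages if page["sync_status"] == "error"]


# Sample data in the assumed shape.
sample_pages = [
    {"url": "https://example.com/docs/intro", "sync_status": "synced"},
    {"url": "https://example.com/docs/setup", "sync_status": "error"},
    {"url": "https://example.com/blog/news", "sync_status": "excluded"},
    {"url": "https://example.com/docs/api", "sync_status": "pending"},
]
summary = summarize_pages(sample_pages)
failed_urls = pages_to_resync(sample_pages)
```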
Available Endpoints
Datasource Management
Create Datasource
Create a new website datasource and start crawling
List Datasources
List all website datasources for your organization
Get Datasource
Get datasource details with page stats
Update Datasource
Update datasource configuration
Delete Datasource
Delete a website datasource
Crawl Operations
Start Crawl
Start a new crawl for a datasource
Cancel Crawl
Cancel an active crawl
List Crawl Jobs
View crawl history for a datasource
Get Crawl Job
Check the status of a specific crawl job