The Crawl API allows you to programmatically crawl and index websites into your knowledge base. This enables your AI agents to access and reference content from your website, documentation, or any other web-based resources when responding to customer inquiries.

Overview

Website crawling enables:
  • Automated Content Indexing - Automatically extract and index content from websites
  • Knowledge Base Integration - Crawled content is added directly to your knowledge base
  • Real-time Status Tracking - Monitor crawl progress and completion status
  • Flexible Configuration - Control include/exclude paths, page limits, and crawl intervals
  • Page Management - Exclude, include, delete, or resync individual pages
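To make the configuration options above concrete, a datasource creation payload might look roughly like the following sketch. The field names (`include_paths`, `max_pages`, and so on) are illustrative assumptions, not the documented schema:

```python
import json

# Hypothetical datasource configuration illustrating the options above.
# Field names are assumptions for illustration, not the actual API schema.
datasource = {
    "url": "https://docs.example.com",
    "include_paths": ["/docs/*"],            # only crawl documentation pages
    "exclude_paths": ["/docs/changelog/*"],  # skip high-churn content
    "max_pages": 500,                        # page limit for the crawl
    "recrawl_interval_hours": 24,            # scheduled recrawl cadence
}
print(json.dumps(datasource, indent=2))
```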

How It Works

  1. Create a Datasource - Provide a website URL and configuration options
  2. Crawl Starts Automatically - By default, a crawl begins immediately after creation
  3. Monitor Progress - Check crawl status and track page processing
  4. Manage Pages - Review crawled pages, exclude irrelevant ones, or resync outdated content
  5. Scheduled Recrawls - Datasources automatically recrawl on a configurable interval
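The create-then-monitor flow above typically means polling the crawl job until it reaches a terminal status. A minimal sketch of that pattern, with the HTTP call abstracted behind a `get_status` callable (a stand-in for whatever request your client actually makes):

```python
import time

# Terminal states a crawl job can end in, per the status list below.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

def wait_for_crawl(get_status, poll_interval=5.0, timeout=600.0):
    """Poll a crawl job until it reaches a terminal status.

    `get_status` is any callable returning the job's current status
    string -- e.g. a thin wrapper around your HTTP client. It is a
    placeholder, not part of the actual API.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_interval)
    raise TimeoutError("crawl did not finish within the timeout")

# Demonstration with a simulated status sequence instead of real HTTP calls:
statuses = iter(["pending", "scraping", "scraping", "completed"])
result = wait_for_crawl(lambda: next(statuses), poll_interval=0.01)
print(result)  # completed
```

In a real client, `get_status` would issue the crawl-status request and extract the status field from the response.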

Crawl Job Statuses

  • pending - Crawl job created, waiting to start
  • scraping - Crawl is actively running and extracting content
  • completed - Crawl finished successfully, content has been indexed
  • failed - Crawl encountered an error and could not complete
  • cancelled - Crawl was manually cancelled before completion
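Client code often needs to decide what to do next for each of these statuses. One illustrative convention, using the status strings from the list above (the action labels themselves are assumptions, not part of the API):

```python
def next_action(status):
    """Suggest a client-side follow-up for each crawl job status.

    The status strings come from the Crawl API documentation; the
    action labels are just an illustrative convention.
    """
    actions = {
        "pending": "wait",
        "scraping": "wait",
        "completed": "done",
        "failed": "retry or inspect error",
        "cancelled": "restart if still needed",
    }
    return actions.get(status, "unknown status")

print(next_action("failed"))  # retry or inspect error
```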

Page Sync Statuses

  • synced - Page content is indexed in the knowledge base
  • pending - Page is waiting to be synced
  • error - Page failed to sync
  • excluded - Page is excluded from syncing

Note: Crawling large websites can take significant time and resources. Use include/exclude paths to focus on relevant content, and set appropriate page limits.
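When reviewing crawled pages, it can help to group them by sync status so that, for example, `error` pages can be resynced and `excluded` ones skipped. A sketch, assuming each page is a dict with `url` and `status` keys (an illustrative shape, not the documented response schema):

```python
from collections import defaultdict

def group_pages_by_status(pages):
    """Bucket crawled pages by their sync status.

    Each page is assumed to be a dict with 'url' and 'status' keys;
    the real API response shape may differ.
    """
    buckets = defaultdict(list)
    for page in pages:
        buckets[page["status"]].append(page["url"])
    return dict(buckets)

pages = [
    {"url": "/docs/intro", "status": "synced"},
    {"url": "/docs/setup", "status": "error"},
    {"url": "/internal/admin", "status": "excluded"},
]
grouped = group_pages_by_status(pages)
print(grouped["error"])  # ['/docs/setup']
```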

Available Endpoints

Datasource Management

Crawl Operations

Page Management