Domain Crawling

Our Domain Crawling API discovers URLs from any domain. Find and filter thousands of URLs in seconds, then optionally validate and scrape them all in one request—perfect for content audits, competitor analysis, and bulk data collection.

Why Domain Crawling?

Automated URL Discovery No need to manually browse websites or maintain URL lists. Our API discovers URLs from sitemaps and Common Crawl, giving broad coverage of any domain.

Smart Validation Optionally validate that discovered URLs are still accessible with parallel HEAD checks. Filter out broken links before you scrape, saving time and credits.

Integrated Scraping Combine discovery with scraping in a single request. Discovered URLs are automatically queued for scraping with your configured settings, respecting your concurrency limits.

Flexible Filtering Use regex patterns to target specific URL structures—find only blog posts, product pages, or any URL pattern you need.
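
For example, a request that limits discovery to blog posts might look like the sketch below; the regex value is purely illustrative, so adjust it to the URL structure you are targeting:

# "/blog/" is an illustrative url_pattern regex, not a required value
curl -X POST "https://scrape.evomi.com/api/v1/scraper/crawl" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "max_urls": 200,
    "url_pattern": "/blog/"
  }'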

Parameters Overview

Parameter Type Required Description
domain string Yes Domain to crawl (e.g., “example.com”)
sources array No Discovery sources: ["sitemap"], ["commoncrawl"], or both. Default: both
max_urls integer Yes Maximum URLs to return (1-10,000)
check_if_live boolean No Validate URLs with HEAD checks. Default: true
url_pattern string No Regex pattern to filter URLs during discovery
scraper_config object No Scraper configuration applied to discovered URLs
async boolean No Process in background and return a task ID. Default: false
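
As a sketch of the scraper_config parameter, the request below queues every discovered URL for scraping; the nested "format" option is an illustrative placeholder, since the supported settings are defined by the Scraper API rather than this endpoint:

# The scraper_config contents are illustrative; check the Scraper API docs
# for the options your subscription actually supports.
curl -X POST "https://scrape.evomi.com/api/v1/scraper/crawl" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "max_urls": 50,
    "scraper_config": {
      "format": "markdown"
    }
  }'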

Response Formats

Basic Discovery:

{
  "success": true,
  "domain": "example.com",
  "discovered_count": 150,
  "results": [
    {
      "url": "https://example.com/page1",
      "source": "sitemap"
    },
    {
      "url": "https://example.com/page2",
      "source": "commoncrawl"
    }
  ],
  "credits_used": 77.0,
  "credits_remaining": 923.0
}

With Scraper Integration:

{
  "success": true,
  "domain": "example.com",
  "discovered_count": 50,
  "scraper_tasks_submitted": 50,
  "results": [
    {
      "url": "https://example.com/page1",
      "source": "sitemap",
      "scrape_task_id": "abc-123",
      "scrape_check_url": "/api/v1/scraper/tasks/abc-123"
    }
  ],
  "credits_used": 27.0,
  "credits_remaining": 973.0
}

Each discovered URL with scraper integration includes a scrape_task_id for tracking individual scrape jobs.
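
You can then poll each job at its scrape_check_url; the request below uses the illustrative task ID from the response above, and the body it returns follows the Scraper API's task format:

# "abc-123" is the example scrape_task_id shown above
curl "https://scrape.evomi.com/api/v1/scraper/tasks/abc-123" \
  -H "x-api-key: YOUR_API_KEY"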

Error Handling

The API uses standard HTTP status codes:

Status Meaning Action
200 Success Results are ready
202 Accepted Async task is processing
400 Bad request Check your parameters
401 Unauthorized Verify your API key
402 Insufficient credits Add credits to your account
404 Task not found Invalid task ID
429 Rate limit exceeded Wait and retry

Sync vs Async Modes

Synchronous Mode (default) Returns results immediately. Best for small to medium domains (up to ~1000 URLs).

curl -X POST "https://scrape.evomi.com/api/v1/scraper/crawl" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "max_urls": 500
  }'

Asynchronous Mode Returns a task ID immediately. Poll the status endpoint to retrieve results when ready. Best for large domains (1000+ URLs) or when integrating scraping.

curl -X POST "https://scrape.evomi.com/api/v1/scraper/crawl" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "max_urls": 5000,
    "async": true
  }'

Response (202 Accepted):

{
  "success": true,
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "message": "Crawl task submitted for background processing",
  "check_url": "/api/v1/scraper/crawl/tasks/550e8400-e29b-41d4-a716-446655440000",
  "credits_reserved": 2502.5
}

Check Status:

curl "https://scrape.evomi.com/api/v1/scraper/crawl/tasks/550e8400-e29b-41d4-a716-446655440000" \
  -H "x-api-key: YOUR_API_KEY"

Transparent Pricing

Operation Cost
Sitemap discovery 2 credits
Common Crawl discovery 2 credits
URL validation (per URL) 0.5 credits

Examples:

  • Discover 100 URLs from sitemap + validate = 2 + (100 × 0.5) = 52 credits
  • Discover 500 URLs from both sources + validate = 4 + (500 × 0.5) = 254 credits
  • Discover 1000 URLs, no validation = 2-4 credits (2 credits per discovery source used)
⚠️
Scraping Integration: When using scraper_config to automatically scrape discovered URLs, each scrape is charged separately according to Scraper API pricing. Discovery costs and scraping costs are billed independently.
ℹ️
Smart Refunds: If fewer URLs are discovered than your max_urls limit, you’re only charged for actual validation performed. For example, if you request 1000 URLs but only 300 are found, you pay for 300 validations, not 1000.

Quick Start

Discover all URLs from a domain:

ℹ️
Your API key is available in the Evomi Dashboard. You need an active Scraper API subscription or credits.
curl -X POST "https://scrape.evomi.com/api/v1/scraper/crawl" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "sources": ["sitemap"],
    "max_urls": 100,
    "check_if_live": true
  }'

Response:

{
  "success": true,
  "domain": "example.com",
  "discovered_count": 87,
  "results": [
    {
      "url": "https://example.com/",
      "source": "sitemap"
    },
    {
      "url": "https://example.com/about",
      "source": "sitemap"
    }
  ],
  "credits_used": 45.5,
  "credits_remaining": 954.5
}
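
Because the response is plain JSON, you can pipe it through jq (assuming it is installed) to save the discovered URLs for later use:

curl -s -X POST "https://scrape.evomi.com/api/v1/scraper/crawl" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"domain": "example.com", "sources": ["sitemap"], "max_urls": 100}' \
  | jq -r '.results[].url' > urls.txt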

Base URL

https://scrape.evomi.com

All API requests use this base URL with the /api/v1/scraper/crawl endpoint.

Common Use Cases

Content Audits Discover all pages on your site to ensure completeness, check for broken links, or verify sitemap accuracy.

Competitor Analysis Map out competitor websites—find all product pages, blog posts, or landing pages for market research.

SEO Monitoring Track which URLs are indexed, validate they’re accessible, and monitor for 404 errors or redirects.

Bulk Data Collection Discover thousands of URLs and automatically scrape them with a single API call—perfect for large-scale data gathering.

Archive Research Access historical URLs from Common Crawl that may no longer appear in current sitemaps.

Discovery Sources

Sitemaps

Fast and structured. We automatically discover sitemaps from:

  • Common locations (/sitemap.xml, /sitemap_index.xml)
  • Robots.txt declarations
  • Sitemap indexes (recursive parsing)

Best for: Active websites with maintained sitemaps, complete coverage of current content.

Common Crawl

Historical web archive with billions of indexed URLs. Searches Common Crawl’s database for URLs matching your domain.

Best for: Finding historical URLs, discovering pages not in sitemaps, comprehensive competitor research.

ℹ️
You can use both sources in a single request for maximum coverage. The API automatically deduplicates URLs found in multiple sources.

Next Steps

Dive deeper into the Domain Crawling API with the Usage Examples.

⚠️
Start with small max_urls values to understand costs before scaling. Use check_if_live: false for initial discovery, then validate only the URLs you need.
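
For example, a low-cost first pass without validation could look like this; once you know which URLs matter, narrow the crawl with url_pattern and enable check_if_live on a follow-up request:

curl -X POST "https://scrape.evomi.com/api/v1/scraper/crawl" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "max_urls": 100,
    "check_if_live": false
  }'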