Domain Crawling
Our Domain Crawling API discovers URLs from any domain. Find and filter thousands of URLs in seconds, and optionally validate and scrape them all in one request—perfect for content audits, competitor analysis, and bulk data collection.
Why Domain Crawling?
Automated URL Discovery No need to manually browse websites or maintain URL lists. Our API discovers URLs from sitemaps and Common Crawl, giving broad coverage of any domain.
Smart Validation Optionally validate that discovered URLs are still accessible with parallel HEAD checks. Filter out broken links before you scrape, saving time and credits.
Integrated Scraping Combine discovery with scraping in a single request. Discovered URLs are automatically queued for scraping with your configured settings, respecting your concurrency limits.
Flexible Filtering Use regex patterns to target specific URL structures—find only blog posts, product pages, or any URL pattern you need.
Parameters Overview
| Parameter | Type | Required | Description |
|---|---|---|---|
| domain | string | Yes | Domain to crawl (e.g., "example.com") |
| sources | array | No | Discovery sources: ["sitemap"], ["commoncrawl"], or both. Default: both |
| max_urls | integer | Yes | Maximum URLs to return (1-10,000) |
| check_if_live | boolean | No | Validate URLs with HEAD checks. Default: true |
| url_pattern | string | No | Regex pattern to filter URLs during discovery |
| scraper_config | object | No | Scraper configuration applied to discovered URLs |
| async | boolean | No | Process in background and return a task ID. Default: false |
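For example, the request below combines several of these parameters to discover only blog URLs without live validation. The regex value is purely illustrative; adjust it to match the URL structure of your target site.

curl -X POST "https://scrape.evomi.com/api/v1/scraper/crawl" \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"domain": "example.com",
"sources": ["sitemap", "commoncrawl"],
"max_urls": 200,
"check_if_live": false,
"url_pattern": "/blog/.*"
}'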
Response Formats
Basic Discovery:
{
"success": true,
"domain": "example.com",
"discovered_count": 150,
"results": [
{
"url": "https://example.com/page1",
"source": "sitemap"
},
{
"url": "https://example.com/page2",
"source": "commoncrawl"
}
],
"credits_used": 77.0,
"credits_remaining": 923.0
}

With Scraper Integration:
{
"success": true,
"domain": "example.com",
"discovered_count": 50,
"scraper_tasks_submitted": 50,
"results": [
{
"url": "https://example.com/page1",
"source": "sitemap",
"scrape_task_id": "abc-123",
"scrape_check_url": "/api/v1/scraper/tasks/abc-123"
}
],
"credits_used": 27.0,
"credits_remaining": 973.0
}

Each discovered URL with scraper integration includes a scrape_task_id for tracking individual scrape jobs.
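A crawl request that produces this kind of response simply adds a scraper_config object. The exact fields accepted inside scraper_config follow the Scraper API's options, so the ones shown below are only illustrative:

# The scraper_config fields below are illustrative; see the Scraper API docs for supported options.
curl -X POST "https://scrape.evomi.com/api/v1/scraper/crawl" \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"domain": "example.com",
"max_urls": 50,
"scraper_config": {
"format": "markdown"
}
}'

You can then poll each scrape_check_url from the response to track the individual scrape jobs:

curl "https://scrape.evomi.com/api/v1/scraper/tasks/abc-123" \
-H "x-api-key: YOUR_API_KEY"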
Error Handling
The API uses standard HTTP status codes:
| Status | Meaning | Action |
|---|---|---|
| 200 | Success | Results are ready |
| 202 | Accepted | Async task is processing |
| 400 | Bad request | Check your parameters |
| 401 | Unauthorized | Verify your API key |
| 402 | Insufficient credits | Add credits to your account |
| 404 | Task not found | Invalid task ID |
| 429 | Rate limit exceeded | Wait and retry |
Sync vs Async Modes
Synchronous Mode (default) Returns results immediately. Best for small to medium domains (up to ~1000 URLs).
curl -X POST "https://scrape.evomi.com/api/v1/scraper/crawl" \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"domain": "example.com",
"max_urls": 500
}'

Asynchronous Mode Returns a task ID immediately. Poll the status endpoint to retrieve results when ready. Best for large domains (1000+ URLs) or when integrating scraping.
curl -X POST "https://scrape.evomi.com/api/v1/scraper/crawl" \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"domain": "example.com",
"max_urls": 5000,
"async": true
}'

Response (202 Accepted):
{
"success": true,
"task_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "processing",
"message": "Crawl task submitted for background processing",
"check_url": "/api/v1/scraper/crawl/tasks/550e8400-e29b-41d4-a716-446655440000",
"credits_reserved": 2502.5
}

Check Status:
curl "https://scrape.evomi.com/api/v1/scraper/crawl/tasks/550e8400-e29b-41d4-a716-446655440000" \
-H "x-api-key: YOUR_API_KEY"Transparent, Pricing
Transparent Pricing

| Operation | Cost |
|---|---|
| Sitemap discovery | 2 credits |
| Common Crawl discovery | 2 credits |
| URL validation (per URL) | 0.5 credits |
Examples:
- Discover 100 URLs from sitemap + validate = 2 + (100 × 0.5) = 52 credits
- Discover 500 URLs from both sources + validate = 4 + (500 × 0.5) = 254 credits
- Discover 1000 URLs, no validation = 2-4 credits (depending on whether one or both sources are used)
If you use scraper_config to automatically scrape discovered URLs, each scrape is charged separately according to Scraper API pricing. Discovery costs and scraping costs are billed independently.

If fewer URLs are discovered than your max_urls limit, you're only charged for the validation actually performed. For example, if you request 1000 URLs but only 300 are found, you pay for 300 validations, not 1000.

Quick Start
Discover all URLs from a domain:
curl -X POST "https://scrape.evomi.com/api/v1/scraper/crawl" \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"domain": "example.com",
"sources": ["sitemap"],
"max_urls": 100,
"check_if_live": true
}'

Response:
{
"success": true,
"domain": "example.com",
"discovered_count": 87,
"results": [
{
"url": "https://example.com/",
"source": "sitemap"
},
{
"url": "https://example.com/about",
"source": "sitemap"
}
],
"credits_used": 45.5,
"credits_remaining": 954.5
}

Base URL
https://scrape.evomi.com

All API requests use this base URL with the /api/v1/scraper/crawl endpoint.
Common Use Cases
Content Audits Discover all pages on your site to ensure completeness, check for broken links, or verify sitemap accuracy.
Competitor Analysis Map out competitor websites—find all product pages, blog posts, or landing pages for market research.
SEO Monitoring Track which URLs are indexed, validate they’re accessible, and monitor for 404 errors or redirects.
Bulk Data Collection Discover thousands of URLs and automatically scrape them with a single API call—perfect for large-scale data gathering.
Archive Research Access historical URLs from Common Crawl that may no longer appear in current sitemaps.
Discovery Sources
Sitemaps
Fast and structured. We automatically discover sitemaps from:
- Common locations (/sitemap.xml, /sitemap_index.xml)
- Robots.txt declarations
- Sitemap indexes (recursive parsing)
Best for: Active websites with maintained sitemaps, complete coverage of current content.
Common Crawl
Historical web archive with billions of indexed URLs. Searches Common Crawl’s database for URLs matching your domain.
Best for: Finding historical URLs, discovering pages not in sitemaps, comprehensive competitor research.
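To query only the archive, restrict sources to Common Crawl. The domain and limit below are placeholders; everything else uses the parameters documented above.

curl -X POST "https://scrape.evomi.com/api/v1/scraper/crawl" \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"domain": "example.com",
"sources": ["commoncrawl"],
"max_urls": 1000,
"check_if_live": false
}'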
Next Steps
Dive deeper into the Domain Crawling API with the Usage Examples.

Tip: Start with small max_urls values to understand costs before scaling. Use check_if_live: false for initial discovery, then validate only the URLs you need.
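One way to apply that tip with the documented parameters is to run a cheap discovery pass first, then re-crawl with validation enabled only for a narrower, filtered set. The url_pattern value below is illustrative.

# Step 1: broad discovery, no validation (cheapest pass).
curl -X POST "https://scrape.evomi.com/api/v1/scraper/crawl" \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"domain": "example.com",
"max_urls": 5000,
"check_if_live": false
}'

# Step 2: re-run on the subset you care about, with validation enabled.
curl -X POST "https://scrape.evomi.com/api/v1/scraper/crawl" \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"domain": "example.com",
"max_urls": 500,
"check_if_live": true,
"url_pattern": "/products/.*"
}'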