Url Discovery
Our Url Discovery API discovers URLs from any domain. Find and filter thousands of URLs in seconds and optionally validate them all in one request—perfect for content audits, competitor analysis, and bulk data collection.
What is Url Discovery?
Url Discovery finds URLs from a domain without scraping them. Choose how you want to discover URLs: from sitemaps, Common Crawl archive, or by actively crawling the site.
Returns URLs only — no content is scraped. Use this when you just need the URLs. View “Domain Crawling” Tab for domain scraping.
Discovery Methods
Choose one or more discovery sources:
| Source | Description |
|---|---|
sitemap |
Fast and structured. Automatically discovers sitemaps from common locations, robots.txt, and sitemap indexes. |
commoncrawl |
Historical archive with billions of URLs. Great for finding pages not in sitemaps. |
crawl |
Actively browse the site by following links. Discovers pages dynamically. |
You can use one source or combine multiple sources for maximum coverage.
Parameters Overview
| Parameter | Type | Required | Description |
|---|---|---|---|
domain |
string | Yes | Domain to discover (e.g., “example.com”) |
sources |
array | No | Sources to use: ["sitemap"], ["commoncrawl"], ["crawl"], or combinations. Default: ["sitemap", "commoncrawl"] |
max_urls |
integer | Yes | Maximum URLs to discover (1-10,000) |
check_if_live |
boolean | No | Validate URLs with HEAD checks. Default: true |
url_pattern |
string | No | Regex pattern to filter discovered URLs |
depth |
integer | No | Crawl depth when using crawl source. Default: 1 |
async |
boolean | No | Process in background and return task ID. Default: false |
Pricing
| Operation | Cost |
|---|---|
| Sitemap discovery | 2 credits |
| Common Crawl discovery | 2 credits |
| Crawl source | Scraper’s rates apply |
| URL validation (per URL) | 0.5 credits |
Examples:
- Discover 100 URLs from sitemap + validate = 2 + (100 × 0.5) = 52 credits
- Discover 500 URLs from both sources + validate = 4 + (500 × 0.5) = 254 credits
- Discover 1000 URLs, no validation = 2-4 credits
max_urls limit, you’re only charged for actual validation performed.Quick Start
Discover URLs from a domain:
curl -X POST "https://scrape.evomi.com/api/v1/scraper/map" \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"domain": "example.com",
"max_urls": 100
}'Response:
{
"success": true,
"domain": "example.com",
"discovered_count": 87,
"results": [
{
"url": "https://example.com/",
"source": "sitemap"
},
{
"url": "https://example.com/about",
"source": "sitemap"
}
],
"credits_used": 45.5,
"credits_remaining": 954.5
}Filter with URL Patterns
Use regex to find specific URLs:
curl -X POST "https://scrape.evomi.com/api/v1/scraper/map" \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"domain": "example.com",
"sources": ["sitemap"],
"url_pattern": "/blog/.*",
"max_urls": 50
}'Use Specific Sources
Sitemap only:
curl -X POST "https://scrape.evomi.com/api/v1/scraper/map" \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"domain": "example.com",
"sources": ["sitemap"],
"max_urls": 100
}'Common Crawl only:
curl -X POST "https://scrape.evomi.com/api/v1/scraper/map" \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"domain": "example.com",
"sources": ["commoncrawl"],
"max_urls": 100
}'Active crawl:
curl -X POST "https://scrape.evomi.com/api/v1/scraper/map" \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"domain": "example.com",
"sources": ["crawl"],
"depth": 2,
"max_urls": 100
}'Async Mode
For large discoveries, use async mode:
curl -X POST "https://scrape.evomi.com/api/v1/scraper/map" \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"domain": "example.com",
"max_urls": 5000,
"async": true
}'Response (202 Accepted):
{
"success": true,
"task_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "processing",
"message": "Map task submitted for background processing",
"check_url": "/api/v1/scraper/map/tasks/550e8400-e29b-41d4-a716-446655440000"
}Check status:
curl "https://scrape.evomi.com/api/v1/scraper/map/tasks/550e8400-e29b-41d4-a716-446655440000" \
-H "x-api-key: YOUR_API_KEY"Common Use Cases
Content Audits Discover all pages on your site to ensure completeness or check for broken links.
Competitor Analysis Map out competitor websites—find all product pages, blog posts, or landing pages.
SEO Monitoring Track which URLs are indexed and validate they’re accessible.
Bulk Collection Build a list of URLs to use with other tools or processes.
Base URL
https://scrape.evomi.comAll API requests use this base URL with the /api/v1/scraper/map endpoint.
Error Handling
| Status | Meaning |
|---|---|
| 200 | Success |
| 202 | Async task processing |
| 400 | Bad request |
| 401 | Unauthorized |
| 402 | Insufficient credits |
| 404 | Task not found |
| 429 | Rate limit exceeded |
Next Steps
- Usage Examples - More detailed examples
- Need to scrape too? Use Domain Crawling instead
max_urls values to understand costs. Use check_if_live: false for initial discovery.