URL Discovery

Our URL Discovery API discovers URLs from any domain. Find and filter thousands of URLs in seconds, and optionally validate them, all in one request: perfect for content audits, competitor analysis, and bulk data collection.

What is URL Discovery?

URL Discovery finds URLs from a domain without scraping them. Choose how you want to discover URLs: from sitemaps, from the Common Crawl archive, or by actively crawling the site.

This endpoint returns URLs only; no content is scraped. Use it when you just need the URLs. See the “Domain Crawling” tab if you want to scrape content from a domain.

Discovery Methods

Choose one or more discovery sources:

Source Description
sitemap Fast and structured. Automatically discovers sitemaps from common locations, robots.txt, and sitemap indexes.
commoncrawl Historical archive with billions of URLs. Great for finding pages not in sitemaps.
crawl Actively browse the site by following links. Discovers pages dynamically.

You can use one source or combine multiple sources for maximum coverage.

Parameters Overview

Parameter Type Required Description
domain string Yes Domain to discover (e.g., “example.com”)
sources array No Sources to use: ["sitemap"], ["commoncrawl"], ["crawl"], or combinations. Default: ["sitemap", "commoncrawl"]
max_urls integer Yes Maximum URLs to discover (1-10,000)
check_if_live boolean No Validate URLs with HEAD checks. Default: true
url_pattern string No Regex pattern to filter discovered URLs
depth integer No Crawl depth when using crawl source. Default: 1
async boolean No Process in background and return task ID. Default: false
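The parameter table above can be encoded as a small client-side helper that builds and sanity-checks a request body before you send it. This is a sketch based solely on the documented constraints; the field names match the table, but the validation logic is a client-side convenience, not something the API requires.

```python
# Sketch: build a request body for the map endpoint, validating it against
# the parameter table above. Purely client-side; no request is sent.

VALID_SOURCES = {"sitemap", "commoncrawl", "crawl"}

def build_map_payload(domain, max_urls, sources=None, check_if_live=True,
                      url_pattern=None, depth=1, run_async=False):
    if not domain:
        raise ValueError("domain is required")
    if not 1 <= max_urls <= 10_000:
        raise ValueError("max_urls must be between 1 and 10,000")
    sources = sources or ["sitemap", "commoncrawl"]  # documented default
    unknown = set(sources) - VALID_SOURCES
    if unknown:
        raise ValueError(f"unknown sources: {unknown}")
    payload = {"domain": domain, "max_urls": max_urls, "sources": sources,
               "check_if_live": check_if_live}
    if url_pattern:
        payload["url_pattern"] = url_pattern
    if "crawl" in sources:
        payload["depth"] = depth  # only meaningful for the crawl source
    if run_async:
        payload["async"] = True
    return payload

payload = build_map_payload("example.com", 100)
```

Serializing `payload` with `json.dumps` gives a body equivalent to the curl examples below.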

Pricing

Operation Cost
Sitemap discovery 2 credits
Common Crawl discovery 2 credits
Crawl source Scraper’s rates apply
URL validation (per URL) 0.5 credits

Examples:

  • Discover 100 URLs from sitemap + validate = 2 + (100 × 0.5) = 52 credits
  • Discover 500 URLs from both sources + validate = 4 + (500 × 0.5) = 254 credits
  • Discover 1000 URLs, no validation = 2-4 credits
ℹ️ Smart Refunds: If fewer URLs are discovered than your max_urls limit, you’re only charged for the validations actually performed.
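The pricing examples above follow directly from the table: a flat per-source discovery cost plus 0.5 credits per validated URL. A minimal cost estimator, covering only the sitemap and commoncrawl sources (the crawl source is excluded because its cost follows the scraper’s rates):

```python
# Sketch: estimate credits for a discovery request, per the pricing table.
# Excludes the crawl source, whose cost depends on scraper rates.

DISCOVERY_COST = {"sitemap": 2, "commoncrawl": 2}
VALIDATION_COST_PER_URL = 0.5

def estimate_credits(sources, discovered_urls, check_if_live=True):
    base = sum(DISCOVERY_COST[s] for s in sources)
    validation = discovered_urls * VALIDATION_COST_PER_URL if check_if_live else 0
    return base + validation

estimate_credits(["sitemap"], 100)                        # → 52.0
estimate_credits(["sitemap", "commoncrawl"], 500)         # → 254.0
estimate_credits(["sitemap"], 1000, check_if_live=False)  # → 2
```

Note that `discovered_urls` is the count actually returned, not `max_urls`, which is how Smart Refunds keep the validation charge at the real number of checks performed.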

Quick Start

Discover URLs from a domain:

curl -X POST "https://scrape.evomi.com/api/v1/scraper/map" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "max_urls": 100
  }'

Response:

{
  "success": true,
  "domain": "example.com",
  "discovered_count": 87,
  "results": [
    {
      "url": "https://example.com/",
      "source": "sitemap"
    },
    {
      "url": "https://example.com/about",
      "source": "sitemap"
    }
  ],
  "credits_used": 45.5,
  "credits_remaining": 954.5
}
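Each entry in `results` carries the URL and the source that found it, so a common first step is grouping the discovered URLs by source. A sketch using the sample response above:

```python
import json

# Parse the sample response shown above and group URLs by discovery source.
response = json.loads("""{
  "success": true,
  "domain": "example.com",
  "discovered_count": 87,
  "results": [
    {"url": "https://example.com/", "source": "sitemap"},
    {"url": "https://example.com/about", "source": "sitemap"}
  ],
  "credits_used": 45.5,
  "credits_remaining": 954.5
}""")

by_source = {}
for item in response["results"]:
    by_source.setdefault(item["source"], []).append(item["url"])

sitemap_urls = by_source["sitemap"]
# → ["https://example.com/", "https://example.com/about"]
```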

Filter with URL Patterns

Use regex to find specific URLs:

curl -X POST "https://scrape.evomi.com/api/v1/scraper/map" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "sources": ["sitemap"],
    "url_pattern": "/blog/.*",
    "max_urls": 50
  }'
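To illustrate what a pattern like `"/blog/.*"` selects, here is an equivalent client-side filter. The assumption (not confirmed by this page) is that the API matches the regex anywhere in the URL, i.e. `re.search` semantics; verify against your own results before relying on it.

```python
import re

# Client-side sketch of how a url_pattern like "/blog/.*" filters URLs.
# Assumes search-anywhere semantics, which is an assumption on our part.

def filter_urls(urls, url_pattern):
    pattern = re.compile(url_pattern)
    return [u for u in urls if pattern.search(u)]

urls = [
    "https://example.com/blog/first-post",
    "https://example.com/about",
    "https://example.com/blog/archive/2024",
]
filter_urls(urls, "/blog/.*")  # → the two /blog/ URLs
```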

Use Specific Sources

Sitemap only:

curl -X POST "https://scrape.evomi.com/api/v1/scraper/map" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "sources": ["sitemap"],
    "max_urls": 100
  }'

Common Crawl only:

curl -X POST "https://scrape.evomi.com/api/v1/scraper/map" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "sources": ["commoncrawl"],
    "max_urls": 100
  }'

Active crawl:

curl -X POST "https://scrape.evomi.com/api/v1/scraper/map" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "sources": ["crawl"],
    "depth": 2,
    "max_urls": 100
  }'

Async Mode

For large discoveries, use async mode:

curl -X POST "https://scrape.evomi.com/api/v1/scraper/map" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "max_urls": 5000,
    "async": true
  }'

Response (202 Accepted):

{
  "success": true,
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "message": "Map task submitted for background processing",
  "check_url": "/api/v1/scraper/map/tasks/550e8400-e29b-41d4-a716-446655440000"
}

Check status:

curl "https://scrape.evomi.com/api/v1/scraper/map/tasks/550e8400-e29b-41d4-a716-446655440000" \
  -H "x-api-key: YOUR_API_KEY"

Common Use Cases

Content Audits: Discover all pages on your site to check completeness or hunt for broken links.

Competitor Analysis: Map out competitor websites to find all product pages, blog posts, or landing pages.

SEO Monitoring: Track which URLs are indexed and validate that they’re accessible.

Bulk Collection: Build a list of URLs to feed into other tools or processes.

Base URL

https://scrape.evomi.com

All API requests use this base URL with the /api/v1/scraper/map endpoint.

Error Handling

Status Meaning
200 Success
202 Async task processing
400 Bad request
401 Unauthorized
402 Insufficient credits
404 Task not found
429 Rate limit exceeded

Next Steps

⚠️ Start with small max_urls values to understand costs, and use check_if_live: false for initial discovery.