URL Discovery

Our URL Discovery API discovers URLs from any domain. Find and filter thousands of URLs in seconds, and optionally validate them, all in one request: perfect for content audits, competitor analysis, and bulk data collection.

What is URL Discovery?

URL Discovery finds URLs from a domain without scraping them. Choose how you want to discover URLs: from sitemaps, from the Common Crawl archive, or by actively crawling the site.

This endpoint returns URLs only; no content is scraped. Use it when you just need the URLs. See the “Domain Crawling” tab if you want to scrape content from a domain.

Discovery Methods

Choose one or more discovery sources:

Source Description
sitemap Fast and structured. Automatically discovers sitemaps from common locations, robots.txt, and sitemap indexes.
commoncrawl Historical archive with billions of URLs. Great for finding pages not in sitemaps.
crawl Actively browse the site by following links. Discovers pages dynamically.

You can use one source or combine multiple sources for maximum coverage.

Parameters Overview

Parameter Type Required Description
domain string Yes Domain to discover (e.g., “example.com”)
sources array No Sources to use: ["sitemap"], ["commoncrawl"], ["crawl"], or combinations. Default: ["sitemap", "commoncrawl"]
max_urls integer Yes Maximum URLs to discover (1-10,000)
check_if_live boolean No Validate URLs with HEAD checks. Default: true
url_pattern string No Regex pattern to filter discovered URLs
depth integer No Crawl depth when using crawl source. Default: 1
async boolean No Process in background and return task ID. Default: false
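The parameter table above can be encoded as a small client-side helper that builds and sanity-checks a request body before you send it. This is a sketch based solely on the documented constraints; the field names match the table, but the validation logic is a client-side convenience, not something the API requires.

```python
# Sketch: build a request body for the map endpoint, validating it against
# the parameter table above. Purely client-side; no request is sent.

VALID_SOURCES = {"sitemap", "commoncrawl", "crawl"}

def build_map_payload(domain, max_urls, sources=None, check_if_live=True,
                      url_pattern=None, depth=1, run_async=False):
    if not domain:
        raise ValueError("domain is required")
    if not 1 <= max_urls <= 10_000:
        raise ValueError("max_urls must be between 1 and 10,000")
    sources = sources or ["sitemap", "commoncrawl"]  # documented default
    unknown = set(sources) - VALID_SOURCES
    if unknown:
        raise ValueError(f"unknown sources: {unknown}")
    payload = {"domain": domain, "max_urls": max_urls, "sources": sources,
               "check_if_live": check_if_live}
    if url_pattern:
        payload["url_pattern"] = url_pattern
    if "crawl" in sources:
        payload["depth"] = depth  # only meaningful for the crawl source
    if run_async:
        payload["async"] = True
    return payload

payload = build_map_payload("example.com", 100)
```

Serializing `payload` with `json.dumps` gives a body equivalent to the curl examples below.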

Pricing

Operation Cost
Sitemap discovery 2 credits
Common Crawl discovery 2 credits
Crawl source Scraper’s rates apply
URL validation (per URL) 0.5 credits

Examples:

  • Discover 100 URLs from sitemap + validate = 2 + (100 × 0.5) = 52 credits
  • Discover 500 URLs from both sources + validate = 4 + (500 × 0.5) = 254 credits
  • Discover 1000 URLs, no validation = 2-4 credits
ℹ️ Smart Refunds: If fewer URLs are discovered than your max_urls limit, you’re only charged for the validations actually performed.
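The pricing examples above follow directly from the table: a flat per-source discovery cost plus 0.5 credits per validated URL. A minimal cost estimator, covering only the sitemap and commoncrawl sources (the crawl source is excluded because its cost follows the scraper’s rates):

```python
# Sketch: estimate credits for a discovery request, per the pricing table.
# Excludes the crawl source, whose cost depends on scraper rates.

DISCOVERY_COST = {"sitemap": 2, "commoncrawl": 2}
VALIDATION_COST_PER_URL = 0.5

def estimate_credits(sources, discovered_urls, check_if_live=True):
    base = sum(DISCOVERY_COST[s] for s in sources)
    validation = discovered_urls * VALIDATION_COST_PER_URL if check_if_live else 0
    return base + validation

estimate_credits(["sitemap"], 100)                        # → 52.0
estimate_credits(["sitemap", "commoncrawl"], 500)         # → 254.0
estimate_credits(["sitemap"], 1000, check_if_live=False)  # → 2
```

Note that `discovered_urls` is the count actually returned, not `max_urls`, which is how Smart Refunds keep the validation charge at the real number of checks performed.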

Quick Start

Discover URLs from a domain:

curl -X POST "https://scrape.evomi.com/api/v1/scraper/map" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "max_urls": 100
  }'

Response:

{
  "success": true,
  "domain": "example.com",
  "discovered_count": 87,
  "results": [
    {
      "url": "https://example.com/",
      "source": "sitemap"
    },
    {
      "url": "https://example.com/about",
      "source": "sitemap"
    }
  ],
  "credits_used": 45.5,
  "credits_remaining": 954.5
}
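Each entry in `results` carries the URL and the source that found it, so a common first step is grouping the discovered URLs by source. A sketch using the sample response above:

```python
import json

# Parse the sample response shown above and group URLs by discovery source.
response = json.loads("""{
  "success": true,
  "domain": "example.com",
  "discovered_count": 87,
  "results": [
    {"url": "https://example.com/", "source": "sitemap"},
    {"url": "https://example.com/about", "source": "sitemap"}
  ],
  "credits_used": 45.5,
  "credits_remaining": 954.5
}""")

by_source = {}
for item in response["results"]:
    by_source.setdefault(item["source"], []).append(item["url"])

sitemap_urls = by_source["sitemap"]
# → ["https://example.com/", "https://example.com/about"]
```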

Filter with URL Patterns

Use regex to find specific URLs:

curl -X POST "https://scrape.evomi.com/api/v1/scraper/map" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "sources": ["sitemap"],
    "url_pattern": "/blog/.*",
    "max_urls": 50
  }'
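To illustrate what a pattern like `"/blog/.*"` selects, here is an equivalent client-side filter. The assumption (not confirmed by this page) is that the API matches the regex anywhere in the URL, i.e. `re.search` semantics; verify against your own results before relying on it.

```python
import re

# Client-side sketch of how a url_pattern like "/blog/.*" filters URLs.
# Assumes search-anywhere semantics, which is an assumption on our part.

def filter_urls(urls, url_pattern):
    pattern = re.compile(url_pattern)
    return [u for u in urls if pattern.search(u)]

urls = [
    "https://example.com/blog/first-post",
    "https://example.com/about",
    "https://example.com/blog/archive/2024",
]
filter_urls(urls, "/blog/.*")  # → the two /blog/ URLs
```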

Use Specific Sources

Sitemap only:

curl -X POST "https://scrape.evomi.com/api/v1/scraper/map" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "sources": ["sitemap"],
    "max_urls": 100
  }'

Common Crawl only:

curl -X POST "https://scrape.evomi.com/api/v1/scraper/map" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "sources": ["commoncrawl"],
    "max_urls": 100
  }'

Active crawl:

curl -X POST "https://scrape.evomi.com/api/v1/scraper/map" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "sources": ["crawl"],
    "depth": 2,
    "max_urls": 100
  }'

Async Mode

For large discoveries, use async mode:

curl -X POST "https://scrape.evomi.com/api/v1/scraper/map" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "max_urls": 5000,
    "async": true
  }'

Response (202 Accepted):

{
  "success": true,
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "message": "Map task submitted for background processing",
  "check_url": "/api/v1/scraper/map/tasks/550e8400-e29b-41d4-a716-446655440000"
}

Check status:

curl "https://scrape.evomi.com/api/v1/scraper/map/tasks/550e8400-e29b-41d4-a716-446655440000" \
  -H "x-api-key: YOUR_API_KEY"

Common Use Cases

Content Audits: Discover all pages on your site to check completeness or hunt for broken links.

Competitor Analysis: Map out competitor websites to find all product pages, blog posts, or landing pages.

SEO Monitoring: Track which URLs are indexed and validate that they’re accessible.

Bulk Collection: Build a list of URLs to feed into other tools or processes.

Base URL

https://scrape.evomi.com

All API requests use this base URL with the /api/v1/scraper/map endpoint.

Error Handling

Status Meaning
200 Success
202 Async task processing
400 Bad request
401 Unauthorized
402 Insufficient credits
404 Task not found
429 Rate limit exceeded

Next Steps

⚠️ Start with small max_urls values to understand costs, and use check_if_live: false for initial discovery.