Usage Examples

Practical examples showing request payloads and response formats for common URL Discovery scenarios. All examples use this endpoint:

POST https://scrape.evomi.com/api/v1/scraper/map

ℹ️ Include your API key in the request header: x-api-key: YOUR_API_KEY
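As a minimal sketch, the endpoint and header above could be wired together in Python using only the standard library. The helper names build_map_request and send_map_request are hypothetical, and the Content-Type header is an assumption (the page only documents x-api-key).

```python
import json
import urllib.request

MAP_ENDPOINT = "https://scrape.evomi.com/api/v1/scraper/map"


def build_map_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the map endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        MAP_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "x-api-key": api_key,
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
    )


def send_map_request(payload: dict, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_map_request(payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Any of the request payloads on this page can be passed as the payload dictionary. Note that urlopen raises urllib.error.HTTPError for 4xx responses, so the error bodies shown later would need to be read from that exception.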

Basic URL Discovery

Example 1: Discover URLs from Sitemap

Find all URLs from a domain’s sitemap without validation.

Request:

{
  "domain": "example.com",
  "sources": ["sitemap"],
  "max_urls": 100,
  "check_if_live": false
}

Response:

{
  "success": true,
  "domain": "example.com",
  "discovered_count": 87,
  "results": [
    {
      "url": "https://example.com/",
      "source": "sitemap"
    },
    {
      "url": "https://example.com/about",
      "source": "sitemap"
    },
    {
      "url": "https://example.com/products",
      "source": "sitemap"
    },
    {
      "url": "https://example.com/blog/post-1",
      "source": "sitemap"
    }
  ],
  "credits_used": 2.0,
  "credits_remaining": 998.0
}

Example 2: Discover from Both Sources

Use both sitemap and Common Crawl for maximum coverage.

Request:

{
  "domain": "example.com",
  "sources": ["sitemap", "commoncrawl"],
  "max_urls": 500,
  "check_if_live": false
}

Response:

{
  "success": true,
  "domain": "example.com",
  "discovered_count": 463,
  "results": [
    {
      "url": "https://example.com/",
      "source": "sitemap"
    },
    {
      "url": "https://example.com/old-page",
      "source": "commoncrawl"
    },
    {
      "url": "https://example.com/archived-content",
      "source": "commoncrawl"
    }
  ],
  "credits_used": 4.0,
  "credits_remaining": 996.0
}
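When combining sources, it can help to see how many URLs each source contributed. A small sketch, assuming the response has already been decoded into a Python dictionary shaped like the one above (count_by_source is a hypothetical helper, not part of the API):

```python
from collections import Counter


def count_by_source(response: dict) -> Counter:
    """Tally discovered URLs per source from a decoded map response."""
    return Counter(item["source"] for item in response.get("results", []))


# Abbreviated copy of the Example 2 response.
sample = {
    "success": True,
    "results": [
        {"url": "https://example.com/", "source": "sitemap"},
        {"url": "https://example.com/old-page", "source": "commoncrawl"},
        {"url": "https://example.com/archived-content", "source": "commoncrawl"},
    ],
}
counts = count_by_source(sample)
```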

URL Validation

Example 3: Discover and Validate URLs

Check that discovered URLs are still live and accessible.

Request:

{
  "domain": "example.com",
  "sources": ["sitemap"],
  "max_urls": 100,
  "check_if_live": true
}

Response:

{
  "success": true,
  "domain": "example.com",
  "discovered_count": 94,
  "results": [
    {
      "url": "https://example.com/",
      "source": "sitemap"
    },
    {
      "url": "https://example.com/about",
      "source": "sitemap"
    },
    {
      "url": "https://example.com/products",
      "source": "sitemap"
    }
  ],
  "credits_used": 49.0,
  "credits_remaining": 951.0
}

ℹ️ Note: Only 94 of the 100 discovered URLs passed the live check. Total cost: 2 credits (sitemap source) + 47 credits (94 × 0.5 for validation) = 49 credits.
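The arithmetic in the note can be sketched as a small estimator. The rates here (2 credits per source, 0.5 credits per validated URL) are inferred from the examples on this page rather than taken from a published price list, and estimate_credits is a hypothetical helper.

```python
# Rates inferred from the examples on this page (assumption, not an official price list).
SOURCE_CREDITS = 2.0      # charged once per source ("sitemap", "commoncrawl")
VALIDATION_CREDITS = 0.5  # charged per URL when check_if_live is true


def estimate_credits(num_sources: int, url_count: int, check_if_live: bool) -> float:
    """Estimate the credit cost of a discovery request."""
    cost = num_sources * SOURCE_CREDITS
    if check_if_live:
        cost += url_count * VALIDATION_CREDITS
    return cost


example_3 = estimate_credits(num_sources=1, url_count=94, check_if_live=True)
```

Passing max_urls as url_count gives an upper bound before a request is sent; passing discovered_count reproduces the credits_used values in the responses on this page.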

Pattern Filtering

Example 4: Filter for Blog Posts Only

Use regex to find only blog post URLs.

Request:

{
  "domain": "example.com",
  "sources": ["sitemap"],
  "url_pattern": "/blog/[^/]+/?$",
  "max_urls": 50,
  "check_if_live": true
}

Response:

{
  "success": true,
  "domain": "example.com",
  "discovered_count": 42,
  "results": [
    {
      "url": "https://example.com/blog/introduction-to-apis",
      "source": "sitemap"
    },
    {
      "url": "https://example.com/blog/advanced-scraping-techniques",
      "source": "sitemap"
    },
    {
      "url": "https://example.com/blog/proxy-best-practices",
      "source": "sitemap"
    }
  ],
  "credits_used": 23.0,
  "credits_remaining": 977.0
}

Example 5: Filter for Product Pages

Target product URLs with specific patterns.

Request:

{
  "domain": "shop.example.com",
  "sources": ["sitemap", "commoncrawl"],
  "url_pattern": "/products?/[a-z0-9-]+",
  "max_urls": 200,
  "check_if_live": true
}

Response:

{
  "success": true,
  "domain": "shop.example.com",
  "discovered_count": 187,
  "results": [
    {
      "url": "https://shop.example.com/product/laptop-stand",
      "source": "sitemap"
    },
    {
      "url": "https://shop.example.com/products/wireless-mouse",
      "source": "sitemap"
    },
    {
      "url": "https://shop.example.com/product/usb-hub",
      "source": "commoncrawl"
    }
  ],
  "credits_used": 97.5,
  "credits_remaining": 902.5
}
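Before spending credits, a url_pattern can be previewed locally against a few known URLs. This sketch assumes the API applies the pattern as an unanchored search over the full URL, similar to Python's re.search; the server's regex dialect may differ, and preview_pattern is a hypothetical helper.

```python
import re


def preview_pattern(pattern: str, urls: list[str]) -> list[str]:
    """Return the URLs that a url_pattern would keep under search semantics."""
    compiled = re.compile(pattern)  # raises re.error if the syntax is invalid
    return [url for url in urls if compiled.search(url)]


blog_matches = preview_pattern(
    "/blog/[^/]+/?$",
    [
        "https://example.com/blog/introduction-to-apis",
        "https://example.com/blog/2024/archive",  # nested path, not a single post
        "https://example.com/about",
    ],
)
```

Compiling the pattern locally also catches syntax errors early, before the API rejects the request.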

Asynchronous Processing

Example 6: Large Domain (Async Mode)

Process large domains in the background and poll for results.

Request:

{
  "domain": "largecorp.com",
  "sources": ["sitemap", "commoncrawl"],
  "max_urls": 5000,
  "check_if_live": true,
  "async": true
}

Response (202 Accepted):

{
  "success": true,
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "message": "Crawl task submitted for background processing",
  "check_url": "/api/v1/scraper/map/tasks/550e8400-e29b-41d4-a716-446655440000",
  "credits_reserved": 2504.0
}

Check Status:

GET https://scrape.evomi.com/api/v1/scraper/map/tasks/550e8400-e29b-41d4-a716-446655440000

Status Response (Still Processing):

{
  "success": true,
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "message": "Task is still processing"
}

Status Response (Completed):

{
  "success": true,
  "domain": "largecorp.com",
  "discovered_count": 4732,
  "results": [
    {
      "url": "https://largecorp.com/",
      "source": "sitemap"
    }
  ],
  "credits_used": 2370.0,
  "credits_remaining": 7630.0
}
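The submit-then-poll flow above might be sketched as follows. Several details are assumptions: that the status request needs the same x-api-key header, that check_url is relative to https://scrape.evomi.com, and that a task is finished once its status is no longer "processing". The interval and attempt limit are illustrative values, and both function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://scrape.evomi.com"


def fetch_status(check_url: str, api_key: str) -> dict:
    """GET the task status; check_url is the relative path from the 202 response."""
    request = urllib.request.Request(
        BASE_URL + check_url,
        headers={"x-api-key": api_key},  # assumption: same header as the POST
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_task(
    get_status: Callable[[], dict],
    interval: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call get_status until the task stops reporting "processing"."""
    for attempt in range(max_attempts):
        body = get_status()
        if body.get("status") != "processing":
            return body
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next poll
    raise TimeoutError(f"task still processing after {max_attempts} attempts")
```

A typical call would be wait_for_task(lambda: fetch_status(check_url, api_key)). Taking the status function as a parameter keeps the loop testable without network access.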

Real-World Scenarios

Example 7: Content Audit - Find All Site Pages

Comprehensive site audit for SEO and content management.

Request:

{
  "domain": "mycompany.com",
  "sources": ["sitemap", "commoncrawl"],
  "max_urls": 1000,
  "check_if_live": true,
  "async": true
}

Response (202 Accepted):

{
  "success": true,
  "task_id": "e5f6a7b8-c9d0-12e3-4f56-789abcdef012",
  "status": "processing",
  "message": "Map task submitted for background processing",
  "check_url": "/api/v1/scraper/map/tasks/e5f6a7b8-c9d0-12e3-4f56-789abcdef012",
  "credits_reserved": 504.0
}

Example 8: Competitor Analysis - Product Pages

Discover all product pages from a competitor’s site.

Request:

{
  "domain": "competitor.com",
  "sources": ["sitemap"],
  "url_pattern": "/(products?|shop|store)/",
  "max_urls": 500,
  "check_if_live": true
}

Response:

{
  "success": true,
  "domain": "competitor.com",
  "discovered_count": 387,
  "results": [
    {
      "url": "https://competitor.com/products/item-1",
      "source": "sitemap"
    },
    {
      "url": "https://competitor.com/shop/category/electronics",
      "source": "sitemap"
    }
  ],
  "credits_used": 195.5,
  "credits_remaining": 804.5
}

Example 9: Archive Research - Historical URLs

Find historical URLs no longer in active sitemaps.

Request:

{
  "domain": "oldsite.com",
  "sources": ["commoncrawl"],
  "max_urls": 1000,
  "check_if_live": false
}

Response:

{
  "success": true,
  "domain": "oldsite.com",
  "discovered_count": 856,
  "results": [
    {
      "url": "https://oldsite.com/archived-page-2020",
      "source": "commoncrawl"
    },
    {
      "url": "https://oldsite.com/removed-content",
      "source": "commoncrawl"
    }
  ],
  "credits_used": 2.0,
  "credits_remaining": 998.0
}

Error Responses

Insufficient Credits

Response (402 Payment Required):

{
  "success": false,
  "error": "Insufficient credits",
  "error_code": "INSUFFICIENT_CREDITS",
  "message": "Your account has insufficient credits. Required: 254.0, Available: 100.0",
  "credits_required": 254.0,
  "credits_available": 100.0
}

Invalid Domain

Response (400 Bad Request):

{
  "success": false,
  "error": "Invalid request parameters",
  "error_code": "VALIDATION_ERROR",
  "message": "domain is required and must be a valid domain name"
}

Invalid Pattern

Response (400 Bad Request):

{
  "success": false,
  "error": "Invalid regex pattern",
  "error_code": "VALIDATION_ERROR",
  "message": "url_pattern contains invalid regex syntax"
}
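One way to handle these error bodies in client code is to convert them into a single exception type. This is a sketch built on the fields shown above; MapApiError and check_response are hypothetical names, and other error codes may exist that are not listed on this page.

```python
from typing import Optional


class MapApiError(Exception):
    """Raised when a map response reports success: false or an HTTP error status."""

    def __init__(self, status: int, body: dict):
        self.status = status
        self.error_code = body.get("error_code", "UNKNOWN")
        self.message = body.get("message") or body.get("error", "")
        self.credits_required = body.get("credits_required")
        self.credits_available = body.get("credits_available")
        super().__init__(f"{status} {self.error_code}: {self.message}")

    @property
    def credit_shortfall(self) -> Optional[float]:
        """Credits missing for the request, when the body reports both figures."""
        if self.credits_required is None or self.credits_available is None:
            return None
        return self.credits_required - self.credits_available


def check_response(status: int, body: dict) -> dict:
    """Return the body unchanged on success, otherwise raise MapApiError."""
    if status >= 400 or body.get("success") is False:
        raise MapApiError(status, body)
    return body
```

For the 402 example above, credit_shortfall works out to 254.0 - 100.0 = 154.0.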

Best Practices

1. Start Small

Test with small max_urls values to understand costs:

{
  "domain": "example.com",
  "sources": ["sitemap"],
  "max_urls": 10,
  "check_if_live": false
}

2. Selective Validation

Discover first without validation, then validate only the URLs you need:

{
  "domain": "example.com",
  "sources": ["sitemap"],
  "max_urls": 1000,
  "check_if_live": false
}

3. Use Patterns Efficiently

Filter during discovery, not after:

{
  "domain": "blog.example.com",
  "url_pattern": "/blog/[0-9]{4}/",
  "max_urls": 100
}

4. Async for Large Jobs

Use async mode for jobs of 1,000+ URLs or when scraping is enabled:

{
  "domain": "largecorp.com",
  "max_urls": 5000,
  "async": true
}

5. Monitor Credits

Check response headers and JSON for credit usage:

X-Credits-Used: 52.0
X-Credits-Remaining: 948.0
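A short sketch of reading these headers in client code, assuming they arrive as a plain name-to-value mapping. Header names are matched case-insensitively because HTTP libraries differ in how they normalize them; read_credit_headers is a hypothetical helper.

```python
from typing import Mapping, Optional


def read_credit_headers(headers: Mapping[str, str]) -> dict:
    """Extract credit usage from response headers as floats (None when absent)."""
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_float(name: str) -> Optional[float]:
        raw = lowered.get(name)
        return float(raw) if raw is not None else None

    return {
        "used": as_float("x-credits-used"),
        "remaining": as_float("x-credits-remaining"),
    }


credits = read_credit_headers({"X-Credits-Used": "52.0", "X-Credits-Remaining": "948.0"})
```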

⚠️ Always test discovery costs before enabling scraping. Discovery itself is cheap (2-4 credits), but validation adds 0.5 credits per URL, so discovering and validating 1,000 URLs costs roughly 500 credits.