Usage Examples

Practical examples showing request payloads and response formats for common Domain Crawling scenarios. All examples use the endpoint:

POST https://scrape.evomi.com/api/v1/scraper/crawl
ℹ️
Include your API key in the header: x-api-key: YOUR_API_KEY
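The request pattern is the same for every example below: POST a JSON body to the crawl endpoint with the x-api-key header. A minimal Python sketch using only the standard library (the payload mirrors Example 1; the API key is a placeholder):

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder -- substitute your real key
ENDPOINT = "https://scrape.evomi.com/api/v1/scraper/crawl"

def crawl(payload: dict) -> dict:
    """POST a crawl request and return the parsed JSON response."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example 1 payload: sitemap discovery without validation
payload = {
    "domain": "example.com",
    "sources": ["sitemap"],
    "max_urls": 100,
    "check_if_live": False,
}
# result = crawl(payload)  # uncomment once a valid API key is set
```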

Basic URL Discovery

Example 1: Discover URLs from Sitemap

Find all URLs from a domain’s sitemap without validation.

Request:

{
  "domain": "example.com",
  "sources": ["sitemap"],
  "max_urls": 100,
  "check_if_live": false
}

Response:

{
  "success": true,
  "domain": "example.com",
  "discovered_count": 87,
  "results": [
    {
      "url": "https://example.com/",
      "source": "sitemap"
    },
    {
      "url": "https://example.com/about",
      "source": "sitemap"
    },
    {
      "url": "https://example.com/products",
      "source": "sitemap"
    },
    {
      "url": "https://example.com/blog/post-1",
      "source": "sitemap"
    }
  ],
  "credits_used": 2.0,
  "credits_remaining": 998.0
}

Example 2: Discover from Both Sources

Use both sitemap and Common Crawl for maximum coverage.

Request:

{
  "domain": "example.com",
  "sources": ["sitemap", "commoncrawl"],
  "max_urls": 500,
  "check_if_live": false
}

Response:

{
  "success": true,
  "domain": "example.com",
  "discovered_count": 463,
  "results": [
    {
      "url": "https://example.com/",
      "source": "sitemap"
    },
    {
      "url": "https://example.com/old-page",
      "source": "commoncrawl"
    },
    {
      "url": "https://example.com/archived-content",
      "source": "commoncrawl"
    }
  ],
  "credits_used": 4.0,
  "credits_remaining": 996.0
}

URL Validation

Example 3: Discover and Validate URLs

Check that discovered URLs are still live and accessible.

Request:

{
  "domain": "example.com",
  "sources": ["sitemap"],
  "max_urls": 100,
  "check_if_live": true
}

Response:

{
  "success": true,
  "domain": "example.com",
  "discovered_count": 94,
  "results": [
    {
      "url": "https://example.com/",
      "source": "sitemap"
    },
    {
      "url": "https://example.com/about",
      "source": "sitemap"
    },
    {
      "url": "https://example.com/products",
      "source": "sitemap"
    }
  ],
  "credits_used": 49.0,
  "credits_remaining": 951.0
}
ℹ️
Note: Only 94 of the 100 requested URLs passed the live check, so discovered_count reports 94. Validation costs: 2 credits (sitemap discovery) + 47 credits (94 live URLs × 0.5) = 49 credits total.
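The per-unit rates implied throughout these examples (2 credits per discovery source, plus 0.5 credits per live-checked URL) make costs easy to estimate up front. A rough estimator, with the caveat that the rates are inferred from the sample responses rather than an official price list:

```python
def estimate_crawl_credits(num_sources: int, validated_urls: int = 0) -> float:
    """Rough credit estimate: 2 credits per source + 0.5 per validated URL.

    Rates are inferred from the example responses in this guide (assumption);
    confirm against your account's actual pricing.
    """
    return 2.0 * num_sources + 0.5 * validated_urls

# Example 3: one source, 94 URLs pass the live check
print(estimate_crawl_credits(1, 94))   # 49.0
# Example 5: two sources, 187 validated URLs
print(estimate_crawl_credits(2, 187))  # 97.5
```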

Pattern Filtering

Example 4: Filter for Blog Posts Only

Use regex to find only blog post URLs.

Request:

{
  "domain": "example.com",
  "sources": ["sitemap"],
  "url_pattern": "/blog/[^/]+/?$",
  "max_urls": 50,
  "check_if_live": true
}

Response:

{
  "success": true,
  "domain": "example.com",
  "discovered_count": 42,
  "results": [
    {
      "url": "https://example.com/blog/introduction-to-apis",
      "source": "sitemap"
    },
    {
      "url": "https://example.com/blog/advanced-scraping-techniques",
      "source": "sitemap"
    },
    {
      "url": "https://example.com/blog/proxy-best-practices",
      "source": "sitemap"
    }
  ],
  "credits_used": 23.0,
  "credits_remaining": 977.0
}

Example 5: Filter for Product Pages

Target product URLs with specific patterns.

Request:

{
  "domain": "shop.example.com",
  "sources": ["sitemap", "commoncrawl"],
  "url_pattern": "/products?/[a-z0-9-]+",
  "max_urls": 200,
  "check_if_live": true
}

Response:

{
  "success": true,
  "domain": "shop.example.com",
  "discovered_count": 187,
  "results": [
    {
      "url": "https://shop.example.com/product/laptop-stand",
      "source": "sitemap"
    },
    {
      "url": "https://shop.example.com/products/wireless-mouse",
      "source": "sitemap"
    },
    {
      "url": "https://shop.example.com/product/usb-hub",
      "source": "commoncrawl"
    }
  ],
  "credits_used": 97.5,
  "credits_remaining": 902.5
}

Asynchronous Processing

Example 6: Large Domain (Async Mode)

Process large domains in the background and poll for results.

Request:

{
  "domain": "largecorp.com",
  "sources": ["sitemap", "commoncrawl"],
  "max_urls": 5000,
  "check_if_live": true,
  "async": true
}

Response (202 Accepted):

{
  "success": true,
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "message": "Crawl task submitted for background processing",
  "check_url": "/api/v1/scraper/crawl/tasks/550e8400-e29b-41d4-a716-446655440000",
  "credits_reserved": 2504.0
}

Check Status:

GET https://scrape.evomi.com/api/v1/scraper/crawl/tasks/550e8400-e29b-41d4-a716-446655440000

Status Response (Still Processing):

{
  "success": true,
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "message": "Task is still processing"
}

Status Response (Completed):

{
  "success": true,
  "domain": "largecorp.com",
  "discovered_count": 4732,
  "results": [
    {
      "url": "https://largecorp.com/",
      "source": "sitemap"
    }
  ],
  "credits_used": 2370.0,
  "credits_remaining": 7630.0
}

Integrated Scraping

Example 7: Discover and Scrape Automatically

Combine URL discovery with automatic scraping in one request.

Request:

{
  "domain": "blog.example.com",
  "sources": ["sitemap"],
  "max_urls": 20,
  "check_if_live": true,
  "scraper_config": {
    "mode": "auto",
    "content": "markdown",
    "proxy_type": "residential",
    "proxy_country": "US"
  }
}

Response:

{
  "success": true,
  "domain": "blog.example.com",
  "discovered_count": 18,
  "scraper_tasks_submitted": 18,
  "results": [
    {
      "url": "https://blog.example.com/post-1",
      "source": "sitemap",
      "scrape_task_id": "a1b2c3d4-e5f6-4789-0abc-def123456789",
      "scrape_check_url": "/api/v1/scraper/tasks/a1b2c3d4-e5f6-4789-0abc-def123456789"
    },
    {
      "url": "https://blog.example.com/post-2",
      "source": "sitemap",
      "scrape_task_id": "b2c3d4e5-f6a7-89b0-1cde-f234567890ab",
      "scrape_check_url": "/api/v1/scraper/tasks/b2c3d4e5-f6a7-89b0-1cde-f234567890ab"
    }
  ],
  "credits_used": 11.0,
  "credits_remaining": 989.0
}
⚠️
Scraping Credits: The response shows only discovery costs (11 credits). Each scrape job is billed separately. Check individual scrape task status endpoints to see scraping costs.
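Because each result carries its own scrape_task_id, the individual scrape tasks must be polled separately. A small helper that collects the per-URL status endpoints from a crawl response; BASE_URL is an assumption about how the relative scrape_check_url paths resolve:

```python
BASE_URL = "https://scrape.evomi.com"  # assumed host for the relative check paths

def scrape_check_endpoints(crawl_response: dict) -> dict:
    """Map each discovered URL to the absolute status endpoint of its scrape task."""
    return {
        item["url"]: BASE_URL + item["scrape_check_url"]
        for item in crawl_response.get("results", [])
        if "scrape_check_url" in item
    }

# Abridged Example 7 response
response = {
    "results": [
        {"url": "https://blog.example.com/post-1",
         "scrape_check_url": "/api/v1/scraper/tasks/a1b2c3d4-e5f6-4789-0abc-def123456789"},
    ]
}
print(scrape_check_endpoints(response))
```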

Example 8: Scrape with AI Enhancement

Discover URLs and automatically extract structured data using AI.

Request:

{
  "domain": "news.example.com",
  "sources": ["sitemap"],
  "url_pattern": "/articles/",
  "max_urls": 10,
  "check_if_live": true,
  "async": true,
  "scraper_config": {
    "mode": "auto",
    "content": "markdown",
    "ai_enhance": true,
    "ai_source": "markdown",
    "ai_prompt": "Extract: headline, author, publish_date, summary (2 sentences), tags. Return as JSON."
  }
}

Response (202 Accepted):

{
  "success": true,
  "task_id": "c3d4e5f6-a7b8-90c1-2def-3456789abcde",
  "status": "processing",
  "message": "Crawl task submitted for background processing",
  "check_url": "/api/v1/scraper/crawl/tasks/c3d4e5f6-a7b8-90c1-2def-3456789abcde",
  "credits_reserved": 7.0
}

Completed Response:

{
  "success": true,
  "domain": "news.example.com",
  "discovered_count": 8,
  "scraper_tasks_submitted": 8,
  "results": [
    {
      "url": "https://news.example.com/articles/tech-news-2026",
      "source": "sitemap",
      "scrape_task_id": "d4e5f6a7-b8c9-01d2-3ef4-56789abcdef0",
      "scrape_check_url": "/api/v1/scraper/tasks/d4e5f6a7-b8c9-01d2-3ef4-56789abcdef0"
    }
  ],
  "credits_used": 6.0,
  "credits_remaining": 994.0
}

Real-World Scenarios

Example 9: Content Audit - Find All Site Pages

Comprehensive site audit for SEO and content management.

Request:

{
  "domain": "mycompany.com",
  "sources": ["sitemap", "commoncrawl"],
  "max_urls": 1000,
  "check_if_live": true,
  "async": true
}

Response (202 Accepted):

{
  "success": true,
  "task_id": "e5f6a7b8-c9d0-12e3-4f56-789abcdef012",
  "status": "processing",
  "message": "Crawl task submitted for background processing",
  "check_url": "/api/v1/scraper/crawl/tasks/e5f6a7b8-c9d0-12e3-4f56-789abcdef012",
  "credits_reserved": 504.0
}

Example 10: Competitor Analysis - Product Pages

Discover all product pages from a competitor’s site.

Request:

{
  "domain": "competitor.com",
  "sources": ["sitemap"],
  "url_pattern": "/(products?|shop|store)/",
  "max_urls": 500,
  "check_if_live": true
}

Response:

{
  "success": true,
  "domain": "competitor.com",
  "discovered_count": 387,
  "results": [
    {
      "url": "https://competitor.com/products/item-1",
      "source": "sitemap"
    },
    {
      "url": "https://competitor.com/shop/category/electronics",
      "source": "sitemap"
    }
  ],
  "credits_used": 195.5,
  "credits_remaining": 804.5
}

Example 11: Archive Research - Historical URLs

Find historical URLs no longer in active sitemaps.

Request:

{
  "domain": "oldsite.com",
  "sources": ["commoncrawl"],
  "max_urls": 1000,
  "check_if_live": false
}

Response:

{
  "success": true,
  "domain": "oldsite.com",
  "discovered_count": 856,
  "results": [
    {
      "url": "https://oldsite.com/archived-page-2020",
      "source": "commoncrawl"
    },
    {
      "url": "https://oldsite.com/removed-content",
      "source": "commoncrawl"
    }
  ],
  "credits_used": 2.0,
  "credits_remaining": 998.0
}
ℹ️
No Validation: Historical URLs often no longer exist. Set check_if_live: false to avoid validation costs when researching archives.

Example 12: Bulk Scraping Workflow

Complete workflow: discover, validate, then scrape product data.

Request:

{
  "domain": "ecommerce.example.com",
  "sources": ["sitemap"],
  "url_pattern": "/products/[a-z0-9-]+$",
  "max_urls": 100,
  "check_if_live": true,
  "async": true,
  "scraper_config": {
    "mode": "auto",
    "proxy_type": "residential",
    "proxy_country": "US",
    "extract_scheme": [
      {
        "label": "product_details",
        "selector": ".product-info",
        "type": "nest",
        "fields": [
          {
            "label": "title",
            "selector": "h1.product-title",
            "type": "content"
          },
          {
            "label": "price",
            "selector": ".price",
            "type": "content"
          },
          {
            "label": "in_stock",
            "selector": "button.add-to-cart",
            "type": "exists"
          }
        ]
      }
    ]
  }
}

Response (202 Accepted):

{
  "success": true,
  "task_id": "f6a7b8c9-d0e1-23f4-5678-9abcdef01234",
  "status": "processing",
  "message": "Crawl task submitted for background processing",
  "check_url": "/api/v1/scraper/crawl/tasks/f6a7b8c9-d0e1-23f4-5678-9abcdef01234",
  "credits_reserved": 52.0
}

Error Responses

Insufficient Credits

Response (402 Payment Required):

{
  "success": false,
  "error": "Insufficient credits",
  "error_code": "INSUFFICIENT_CREDITS",
  "message": "Your account has insufficient credits. Required: 254.0, Available: 100.0",
  "credits_required": 254.0,
  "credits_available": 100.0
}

Invalid Domain

Response (400 Bad Request):

{
  "success": false,
  "error": "Invalid request parameters",
  "error_code": "VALIDATION_ERROR",
  "message": "domain is required and must be a valid domain name"
}

Invalid Pattern

Response (400 Bad Request):

{
  "success": false,
  "error": "Invalid regex pattern",
  "error_code": "VALIDATION_ERROR",
  "message": "url_pattern contains invalid regex syntax"
}

Best Practices

1. Start Small

Test with small max_urls values to understand costs:

{
  "domain": "example.com",
  "sources": ["sitemap"],
  "max_urls": 10,
  "check_if_live": false
}

2. Selective Validation

Discover first without validation, then validate only needed URLs:

{
  "domain": "example.com",
  "sources": ["sitemap"],
  "max_urls": 1000,
  "check_if_live": false
}

3. Use Patterns Efficiently

Filter during discovery, not after:

{
  "domain": "blog.example.com",
  "url_pattern": "/blog/[0-9]{4}/",
  "max_urls": 100
}

4. Async for Large Jobs

Use async mode for 1000+ URLs or when scraping:

{
  "domain": "largecorp.com",
  "max_urls": 5000,
  "async": true
}

5. Monitor Credits

Check response headers and JSON for credit usage:

X-Credits-Used: 52.0
X-Credits-Remaining: 948.0
⚠️
Always test discovery costs before enabling scraping. Discovery itself is cheap (2-4 credits, plus 0.5 credits per validated URL), but scraping 1000 URLs can consume significant credits.