Result Polling

When scraping tasks take longer than the timeout limit or when you submit async requests, you’ll need to poll for results. This guide covers how to retrieve results from background tasks.

When Polling Is Needed

Polling is required in these scenarios:

  1. Explicit async requests — You set async=true in your request
  2. Request timeouts — A synchronous request exceeds the timeout limit and auto-converts to async

Timeout Handling

Different modes have different timeout limits:

| Mode    | Timeout       | What happens after           |
|---------|---------------|------------------------------|
| request | 30 seconds    | Auto-converts to async task  |
| browser | 45 seconds    | Auto-converts to async task  |
| auto    | 30-45 seconds | Depends on the mode selected |

Timeout Response

When a synchronous request times out, you receive a 202 Accepted response:

{
  "success": true,
  "task_id": "task_abc123",
  "status": "processing",
  "message": "Task is taking longer than expected. Use task_id to check status.",
  "check_url": "/api/v1/scraper/tasks/task_abc123"
}
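When calling synchronously, it helps to branch on the 202 case before treating the body as a final result. A minimal sketch (the helper name is ours, not part of the API; field names follow the response above):

```python
def handle_scrape_response(status_code, body):
    """Split the two outcomes of a synchronous scrape call:
    200 -> the result is ready; 202 -> the task auto-converted to
    async and must be polled via the returned task_id."""
    if status_code == 202 and body.get("task_id"):
        return False, body["task_id"]  # not done yet: poll this task
    return True, body                  # done: body is the final result
```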

Submitting Async Requests

For long-running or batch jobs, submit async requests from the start:

Request

curl -X POST "https://scrape.evomi.com/api/v1/scraper/realtime" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "async": true
  }'

Response

{
  "task_id": "task_abc123",
  "status": "processing",
  "check_url": "/api/v1/scraper/tasks/task_abc123"
}

Checking Task Status

Request

curl "https://scrape.evomi.com/api/v1/scraper/tasks/task_abc123?api_key=YOUR_API_KEY"

Response (Processing)

{
  "task_id": "task_abc123",
  "status": "processing",
  "created_at": "2025-01-15T10:30:00Z",
  "elapsed_seconds": 12
}

Response (Completed)

{
  "task_id": "task_abc123",
  "status": "completed",
  "success": true,
  "url": "https://example.com",
  "domain": "example.com",
  "title": "Example Domain",
  "content": "<!DOCTYPE html>...",
  "status_code": 200,
  "credits_used": 1.0,
  "credits_remaining": 99.0,
  "mode_used": "auto (request)"
}

Response (Failed)

{
  "task_id": "task_abc123",
  "status": "failed",
  "success": false,
  "error": "Connection timeout after 30 seconds",
  "credits_used": 0.5
}
ℹ️
Task results are stored on our servers for a limited time and expire 10 minutes after completion. Enable Cloud Storage for persistent delivery; otherwise, retrieve your results promptly once the task completes.

Polling for Results

Python Example

import time
import requests

def wait_for_task(task_id, api_key, max_wait=120, poll_interval=2):
    """Poll task status until completion"""
    url = f"https://scrape.evomi.com/api/v1/scraper/tasks/{task_id}"
    headers = {"x-api-key": api_key}
    
    start_time = time.time()
    
    while time.time() - start_time < max_wait:
        response = requests.get(url, headers=headers)
        data = response.json()
        
        status = data.get("status")
        
        if status == "completed":
            return data
        elif status == "failed":
            raise Exception(f"Task failed: {data.get('error')}")
        
        # Still processing
        time.sleep(poll_interval)
    
    raise TimeoutError(f"Task {task_id} did not complete in {max_wait} seconds")

# Usage
result = wait_for_task("task_abc123", api_key)
print(result["content"])

JavaScript Example

async function waitForTask(taskId, apiKey, maxWait = 120000, pollInterval = 2000) {
  const url = `https://scrape.evomi.com/api/v1/scraper/tasks/${taskId}`;
  const headers = { 'x-api-key': apiKey };
  
  const startTime = Date.now();
  
  while (Date.now() - startTime < maxWait) {
    const response = await fetch(url, { headers });
    const data = await response.json();
    
    if (data.status === 'completed') {
      return data;
    } else if (data.status === 'failed') {
      throw new Error(`Task failed: ${data.error}`);
    }
    
    await new Promise(resolve => setTimeout(resolve, pollInterval));
  }
  
  throw new Error(`Task ${taskId} did not complete in time`);
}

// Usage
const result = await waitForTask('task_abc123', apiKey);
console.log(result.content);

Go Example

import (
    "encoding/json"
    "fmt"
    "net/http"
    "time"
)

func WaitForTask(taskID, apiKey string, maxWait time.Duration) (map[string]interface{}, error) {
    url := fmt.Sprintf("https://scrape.evomi.com/api/v1/scraper/tasks/%s?api_key=%s", taskID, apiKey)

    startTime := time.Now()
    pollInterval := 2 * time.Second

    for time.Since(startTime) < maxWait {
        resp, err := http.Get(url)
        if err != nil {
            return nil, err
        }

        var data map[string]interface{}
        if err := json.NewDecoder(resp.Body).Decode(&data); err != nil {
            resp.Body.Close()
            return nil, err
        }
        resp.Body.Close()

        // Use the comma-ok form so a missing or non-string status
        // doesn't panic; an empty status just means "keep polling".
        status, _ := data["status"].(string)

        switch status {
        case "completed":
            return data, nil
        case "failed":
            return nil, fmt.Errorf("task failed: %v", data["error"])
        }

        time.Sleep(pollInterval)
    }

    return nil, fmt.Errorf("task did not complete in %v", maxWait)
}

Raw vs JSON Responses

When polling for results, the response format depends on the delivery parameter you set in your original request.

Raw Delivery (Default)

If you used delivery=raw (or didn’t specify delivery), the polled result returns the raw content directly:

curl "https://scrape.evomi.com/api/v1/scraper/tasks/task_abc123?api_key=YOUR_API_KEY"

Response:

Content-Type: text/html; charset=utf-8
X-Credits-Used: 1.0
X-Credits-Remaining: 99.0

<!DOCTYPE html>
<html>
...
⚠️
With raw delivery, you get only the content—no metadata like title, status code, or credits used in the response body. Metadata is available only in response headers.
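Because raw delivery puts metadata only in headers, you can pull the credit counters out of them directly. A sketch (header names as in the example above):

```python
def credits_from_headers(headers):
    """Read raw-delivery metadata from response headers, falling
    back to 0.0 when a header is absent."""
    return {
        "credits_used": float(headers.get("X-Credits-Used", 0.0)),
        "credits_remaining": float(headers.get("X-Credits-Remaining", 0.0)),
    }
```

With the `requests` library, passing `response.headers` works as-is, since it is a case-insensitive dict-like object; with a plain dict, header names must match exactly.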

JSON Delivery

If you used delivery=json, the polled result returns a structured JSON response with metadata:

curl "https://scrape.evomi.com/api/v1/scraper/tasks/task_abc123?api_key=YOUR_API_KEY"

Response:

{
  "task_id": "task_abc123",
  "status": "completed",
  "success": true,
  "url": "https://example.com",
  "title": "Example Domain",
  "status_code": 200,
  "credits_used": 1.0
}
⚠️
With JSON delivery, you must set include_content=true in your original request to receive the scraped content. Without this parameter, only metadata is returned—no HTML, Markdown, or other content.

Using delivery and include_content

When using JSON delivery mode, you must set include_content=true to receive the scraped content in the response.

Without include_content

curl "https://scrape.evomi.com/api/v1/scraper/realtime?url=https://example.com&delivery=json&api_key=YOUR_API_KEY"

Response omits content:

{
  "success": true,
  "url": "https://example.com",
  "title": "Example Domain",
  "status_code": 200,
  "credits_used": 1.0,
  "hints": ["Content omitted by default. Set include_content=true to include it."]
}

With include_content

curl "https://scrape.evomi.com/api/v1/scraper/realtime?url=https://example.com&delivery=json&include_content=true&api_key=YOUR_API_KEY"

Response includes content:

{
  "success": true,
  "url": "https://example.com",
  "title": "Example Domain",
  "content": "<!DOCTYPE html>...",
  "status_code": 200,
  "credits_used": 1.0
}
⚠️
Important: When using delivery=json, always set include_content=true if you need the scraped content (HTML, Markdown, etc.) in the response. Without this parameter, only metadata is returned to save bandwidth.

Batch Processing with Async

For large batches, submit all tasks first, then poll for results:

import asyncio
import aiohttp

async def submit_and_wait(urls, api_key):
    async with aiohttp.ClientSession() as session:
        # Submit all tasks
        task_ids = []
        for url in urls:
            async with session.post(
                "https://scrape.evomi.com/api/v1/scraper/realtime",
                headers={"x-api-key": api_key},
                json={"url": url, "async": True}
            ) as resp:
                data = await resp.json()
                task_ids.append(data["task_id"])
        
        # Poll for all results
        results = []
        for task_id in task_ids:
            result = await poll_task(session, task_id, api_key)
            results.append(result)
        
        return results

async def poll_task(session, task_id, api_key, max_wait=120):
    url = f"https://scrape.evomi.com/api/v1/scraper/tasks/{task_id}"
    headers = {"x-api-key": api_key}
    
    start = asyncio.get_running_loop().time()
    
    while asyncio.get_running_loop().time() - start < max_wait:
        async with session.get(url, headers=headers) as resp:
            data = await resp.json()
            
            if data["status"] == "completed":
                return data
            elif data["status"] == "failed":
                raise Exception(f"Task failed: {data.get('error')}")
        
        await asyncio.sleep(2)
    
    raise TimeoutError(f"Task {task_id} timed out")

# Usage
urls = ["https://example1.com", "https://example2.com", "https://example3.com"]
results = asyncio.run(submit_and_wait(urls, api_key))

Webhook Alternative

Instead of polling, you can use webhooks to receive notifications when tasks complete. See Webhooks for details.

{
  "url": "https://example.com",
  "async": true,
  "webhook": {
    "url": "https://your-server.com/webhook",
    "webhook_type": "custom",
    "events": ["completed", "failed"]
  }
}

Best Practices

  1. Use appropriate poll intervals — Poll every 2-3 seconds, not faster
  2. Set reasonable timeouts — Most tasks complete within 60 seconds
  3. Handle failures gracefully — Check the error field in failed responses
  4. Retrieve results promptly — Results expire after 10 minutes
  5. Use webhooks for batches — Avoid polling hundreds of tasks simultaneously
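The fixed 2-second interval in the examples above is fine for short tasks; for longer-running ones, a gentle backoff with jitter keeps request volume down. A sketch (the parameter values are our choices, not API requirements):

```python
import random

def poll_intervals(base=2.0, factor=1.5, cap=10.0, jitter=0.5):
    """Yield poll delays: start at `base` seconds, grow by `factor`
    up to `cap`, plus a little jitter to de-synchronize clients."""
    delay = base
    while True:
        yield delay + random.uniform(0.0, jitter)
        delay = min(delay * factor, cap)
```

Inside a polling loop, iterate the generator and pass each value to `time.sleep()` instead of sleeping a constant interval.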