
Scraper API results are only stored temporarily by us and are removed after delivery. To ensure long-term data persistence and seamless data pipelines, we can upload results directly to cloud storage upon completion.

Receive your scraped results directly in Amazon S3, Google Cloud Storage (GCS), Azure Blob Storage, or any S3-compatible cloud provider.

Storage Management

Manage your storage configuration through our REST API endpoints. Instead of defining your storage logic with every request, you’ll save a configuration once and reference it by ID.

⚠️ Note: All sensitive credentials (access_key, secret_key) are encrypted when stored.

Endpoint Reference

| Endpoint | Method | Action |
|---|---|---|
| /api/v1/account/storage | POST | Create a new storage configuration |
| /api/v1/account/storage | GET | List all saved storage configurations |
| /api/v1/account/storage/{storage_id} | PUT | Update an existing storage configuration |
| /api/v1/account/storage/{storage_id} | DELETE | Remove a storage configuration |

Authentication: Include your API key in the x-api-key header for all requests.
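
For example, a minimal Python sketch that lists your saved configurations. The base URL https://api.example.com is a placeholder; substitute your actual API host.

import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.example.com"  # placeholder; replace with the actual API host

# List all saved storage configurations
response = requests.get(
    f"{BASE_URL}/api/v1/account/storage",
    headers={"x-api-key": API_KEY},
)
response.raise_for_status()
print(response.json())  # exact response shape depends on the API; inspect it before relying on fields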


Storage Creation

| Parameter | Type | Description |
|---|---|---|
| name | String | A unique identifier for your storage configuration. |
| storage_type | String | The provider type: s3_compatible, gcs, or azure_blob. |
| config | Object | Provider-specific credentials. See the "Storage Providers" tab. |
| set_as_default | Boolean | Whether to set this configuration as the default. |
Example request body:

{
  "name": "My AWS S3",
  "storage_type": "s3_compatible",
  "config": {
    "bucket": "my-bucket",
    "region": "us-east-1",
    "access_key": "AKIAIOSFODNN7EXAMPLE",
    "secret_key": "wJalrXUt...",
    "path_prefix": "scrapers/",
    "public_read": false
  },
  "set_as_default": false
}
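
As a sketch, the same configuration can be created from Python by POSTing this body to the create endpoint. The base URL is a placeholder, and the id field read from the response is an assumption; check the actual response schema.

import requests

payload = {
    "name": "My AWS S3",
    "storage_type": "s3_compatible",
    "config": {
        "bucket": "my-bucket",
        "region": "us-east-1",
        "access_key": "AKIAIOSFODNN7EXAMPLE",
        "secret_key": "wJalrXUt...",
        "path_prefix": "scrapers/",
        "public_read": False,
    },
    "set_as_default": False,
}

response = requests.post(
    "https://api.example.com/api/v1/account/storage",  # placeholder host
    headers={"x-api-key": "YOUR_API_KEY"},
    json=payload,
)
response.raise_for_status()
storage_id = response.json().get("id")  # assumed response field; verify against the real response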

Scraper API Integration

Specify your storage destination during the scrape request via query parameters:

Option A: Use Default Storage ID

?url=https://example.com&use_default_storage=true

Option B: Specify Storage ID

?url=https://example.com&storage_id={encoded_id}

Option C: Temporary Storage

?url=https://example.com
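
Putting Options A and B together, here is a minimal Python sketch of a scrape request that routes the result to cloud storage. The scrape endpoint path /api/v1/scrape and the host are placeholders, and authenticating the scrape request with the x-api-key header is an assumption; substitute the actual Scraper API endpoint and auth.

import requests

SCRAPE_URL = "https://api.example.com/api/v1/scrape"  # hypothetical endpoint; use the real one

params = {
    "url": "https://example.com",
    "use_default_storage": "true",   # Option A: route the result to your default configuration
    # "storage_id": "<encoded_id>",  # Option B: reference a specific configuration instead
}

response = requests.get(
    SCRAPE_URL,
    params=params,
    headers={"x-api-key": "YOUR_API_KEY"},  # assumed auth header for the scrape endpoint
)
response.raise_for_status()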

When cloud storage is configured, we upload the output you requested. Understanding how each output type is handled ensures your data lands exactly where, and in the format, you need it.

Delivery Modes

The Scraper API's delivery parameter controls how results are formatted and uploaded to storage:

| Mode | Description | Storage Upload |
|---|---|---|
| raw | Returns content directly (HTML, Markdown, AI response, etc.) | Uploads the raw response, as selected by the Scraper API's content= parameter |
| json | Returns a structured JSON envelope with all metadata | Uploads the complete result as JSON |

Examples

Screenshots

?url=https://example.com&delivery=raw&use_default_storage=true&content=screenshot

Uploads: Screenshot as .png

Markdown

?url=https://example.com&delivery=raw&use_default_storage=true&content=markdown

Uploads: Markdown content as .md

Default HTML

?url=https://example.com&delivery=raw&use_default_storage=true

Uploads: HTML content as .html
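
For comparison, per the delivery modes table above, requesting the structured envelope instead of raw content uploads the complete result as .json:

JSON Envelope

?url=https://example.com&delivery=json&use_default_storage=true

Uploads: Complete result envelope (with metadata) as .json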

File Name Templating

Organize your scraped data at scale using dynamic path templates. Instead of manually organizing files, use template variables to automatically structure your storage based on domains, dates, and other metadata.

Template Variables

Template variables use double curly brace syntax: {{variable_name}}

| Variable | Description | Example Output |
|---|---|---|
| {{task_id}} | Unique task identifier | abc123xyz |
| {{url}} | Full URL being scraped | https://example.com/page |
| {{domain}} | Extracted domain name | example.com |
| {{status}} | Task completion status | success, failure |
| {{timestamp}} | Full timestamp (YYYYMMDD_HHMMSS) | 20260205_141530 |
| {{date}} | Date only (YYYY-MM-DD) | 2026-02-05 |
| {{time}} | Time only (HH-MM-SS) | 14-15-30 |
| {{year}} | Four-digit year | 2026 |
| {{month}} | Two-digit month | 02 |
| {{day}} | Two-digit day | 05 |
| {{extension}} | File extension based on output type | json, png, pdf, html, md, txt |
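
To preview how a template will expand, you can substitute the example values above locally. The following Python sketch is purely illustrative; the actual substitution happens on our side when the result is uploaded.

import re

# Example values taken from the table above
values = {
    "task_id": "abc123xyz",
    "domain": "example.com",
    "date": "2026-02-05",
    "extension": "json",
}

def expand(template: str, values: dict) -> str:
    # Replace each {{variable}} with its value, leaving unknown variables untouched
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(values.get(m.group(1), m.group(0))),
        template,
    )

print(expand("scrapes/{{domain}}/{{date}}/{{task_id}}.{{extension}}", values))
# -> scrapes/example.com/2026-02-05/abc123xyz.json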

Template Examples

Domain & Date Organization

{
  "path_prefix": "scrapes/{{domain}}/{{date}}/{{task_id}}.{{extension}}"
}

Result: scrapes/example.com/2026-02-05/abc123xyz.json

Year/Month/Day Hierarchy

{
  "path_prefix": "data/{{year}}/{{month}}/{{day}}/{{domain}}_{{task_id}}.{{extension}}"
}

Result: data/2026/02/05/example.com_abc123xyz.json

Status-based Filtering

{
  "path_prefix": "results/{{status}}/{{domain}}/{{timestamp}}.{{extension}}"
}

Result: results/success/example.com/20260205_141530.json

Simple Domain Organization

{
  "path_prefix": "{{domain}}/{{task_id}}.{{extension}}"
}

Result: example.com/abc123xyz.json

Time-series Data Collection

{
  "path_prefix": "timeseries/{{domain}}/{{year}}-{{month}}/{{day}}_{{time}}.{{extension}}"
}

Result: timeseries/example.com/2026-02/05_14-15-30.json
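
To put a template into effect, set it as the path_prefix of a storage configuration, for example by updating an existing configuration through the PUT endpoint. A minimal Python sketch follows; the base URL is a placeholder, and whether a partial config object is accepted is an assumption, so you may need to resend the full config.

import requests

storage_id = "YOUR_STORAGE_ID"
payload = {
    "config": {
        # Assumption: partial updates may not be supported; resend the full config if needed
        "path_prefix": "scrapes/{{domain}}/{{date}}/{{task_id}}.{{extension}}",
    }
}

response = requests.put(
    f"https://api.example.com/api/v1/account/storage/{storage_id}",  # placeholder host
    headers={"x-api-key": "YOUR_API_KEY"},
    json=payload,
)
response.raise_for_status()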