We store Scraper API results only temporarily and delete them after delivery. For long-term persistence and seamless data pipelines, we can upload results directly to your cloud storage as soon as a task completes.
Receive your scraped results directly in Amazon S3, Google Cloud Storage (GCS), Azure Blob Storage, or any S3-compatible cloud provider.
Storage Management
Manage your storage configuration through our REST API endpoints. Instead of defining your storage logic with every request, you’ll save a configuration once and reference it by ID.
Sensitive credentials (access_key, secret_key) are encrypted when stored.
Endpoint Reference
| Endpoint | Method | Action |
|---|---|---|
| /api/v1/account/storage | POST | Create a new storage configuration |
| /api/v1/account/storage | GET | List all saved storage configurations |
| /api/v1/account/storage/{storage_id} | PUT | Update an existing storage configuration |
| /api/v1/account/storage/{storage_id} | DELETE | Remove a storage configuration |
Authentication: Include your API key in the x-api-key header for all requests.
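All four endpoints share one path pattern and the x-api-key header, so a single helper can build any of them. The following is a minimal sketch using only the Python standard library. The base URL is a hypothetical placeholder (the real host is not given in this section), storage_request is a hypothetical helper name, and sending JSON with a Content-Type: application/json header is an assumption based on the JSON creation example.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Hypothetical host: the real base URL is not given in this section.
BASE_URL = "https://api.example.com"
STORAGE_PATH = "/api/v1/account/storage"


def storage_request(api_key: str, method: str,
                    storage_id: Optional[str] = None,
                    body: Optional[dict] = None) -> urllib.request.Request:
    """Build (without sending) a request for one of the storage endpoints."""
    url = BASE_URL + STORAGE_PATH
    if storage_id is not None:
        # PUT and DELETE address a single configuration by its ID
        url += "/" + urllib.parse.quote(storage_id, safe="")
    headers = {"x-api-key": api_key}  # required on every request
    data = None
    if body is not None:
        # POST and PUT carry a JSON body (assumed content type)
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

To actually send a request, pass the returned object to urllib.request.urlopen. The shape of the response body is not documented in this section, so parse it defensively.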
Storage Creation
| Parameter | Type | Description |
|---|---|---|
| name | String | A unique name for this storage configuration. |
| storage_type | String | The provider type: s3_compatible, gcs, or azure_blob. |
| config | Object | Provider-specific credentials and settings. See the Storage Providers tab. |
| set_as_default | Boolean | Whether to make this the default storage configuration. |
{
"name": "My AWS S3",
"storage_type": "s3_compatible",
"config": {
"bucket": "my-bucket",
"region": "us-east-1",
"access_key": "AKIAIOSFODNN7EXAMPLE",
"secret_key": "wJalrXUt...",
"path_prefix": "scrapers/",
"public_read": false
},
"set_as_default": false
}
Scraper API Integration
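Catching an invalid storage_type locally gives a clearer error than a rejected API call. This is a minimal sketch of assembling the creation body shown above; creation_payload is a hypothetical helper, and defaulting set_as_default to False mirrors the example rather than a documented server default. The contents of config are passed through untouched because the required keys differ per provider (see the Storage Providers tab).

```python
from typing import Any, Dict

# The three storage_type values listed in the Storage Creation table
STORAGE_TYPES = {"s3_compatible", "gcs", "azure_blob"}


def creation_payload(name: str, storage_type: str, config: Dict[str, Any],
                     set_as_default: bool = False) -> Dict[str, Any]:
    """Assemble a POST /api/v1/account/storage body, rejecting obvious mistakes early."""
    if storage_type not in STORAGE_TYPES:
        raise ValueError(f"unsupported storage_type: {storage_type!r}")
    if not isinstance(set_as_default, bool):
        raise TypeError("set_as_default must be a boolean")
    return {
        "name": name,
        "storage_type": storage_type,
        "config": config,  # provider-specific keys, passed through unchanged
        "set_as_default": set_as_default,
    }
```

The resulting dictionary can be serialized with json.dumps and sent as the request body.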
Specify your storage destination during the scrape request via query parameters:
Option A: Use Default Storage
?url=https://example.com&use_default_storage=true
Option B: Specify Storage ID
?url=https://example.com&storage_id={encoded_id}
Option C: Temporary Storage
?url=https://example.com
With no storage parameter, results are kept only temporarily and removed after delivery.
When using cloud storage, we upload exactly the output you requested. Understanding how each output type is handled helps ensure your data arrives where you need it.
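Options A, B, and C differ only in which storage parameter is added to the query string. A minimal sketch of building those strings follows; scrape_query is a hypothetical helper, treating use_default_storage and storage_id as mutually exclusive is an assumption, and reading {encoded_id} as "the URL-encoded storage ID" is also an assumption. Note that urlencode percent-encodes the target URL too, whereas the examples above show it unencoded; encoding is the safer choice when the target URL has its own query string.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def scrape_query(target_url: str, storage_id: Optional[str] = None,
                 use_default_storage: bool = False, **extra: str) -> str:
    """Build the query string for a scrape request (Options A, B, and C)."""
    if use_default_storage and storage_id is not None:
        # Assumption: the two options are alternatives, not combinable
        raise ValueError("pass either storage_id or use_default_storage, not both")
    params: Dict[str, str] = {"url": target_url}
    if use_default_storage:
        params["use_default_storage"] = "true"  # Option A
    elif storage_id is not None:
        params["storage_id"] = storage_id  # Option B, percent-encoded by urlencode
    # Neither set: Option C, temporary storage only
    params.update(extra)  # e.g. delivery="raw", content="markdown"
    return "?" + urlencode(params)
```

For example, scrape_query("https://example.com", use_default_storage=True, delivery="raw", content="markdown") produces the Markdown example from the Examples list below, with the target URL encoded.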
Delivery Modes
The Scraper API's delivery parameter controls how results are formatted and uploaded to storage:
| Mode | Description | Storage Upload |
|---|---|---|
| raw | Returns content directly (HTML, markdown, AI response, etc.) | Uploads the raw response, in the format selected by the Scraper API's content parameter |
| json | Returns a structured JSON envelope with all metadata | Uploads the complete result as JSON |
Examples
Screenshots
?url=https://example.com&delivery=raw&use_default_storage=true&content=screenshot
Uploads: Screenshot as .png
Markdown
?url=https://example.com&delivery=raw&use_default_storage=true&content=markdown
Uploads: Markdown content as .md
Default HTML
?url=https://example.com&delivery=raw&use_default_storage=true
Uploads: HTML content as .html
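The examples above imply a mapping from delivery and content to the uploaded file's extension. The sketch below encodes only the cases documented on this page; expected_extension is a hypothetical helper, and other content values (for instance whatever produces the pdf and txt extensions listed under Template Variables) are deliberately rejected rather than guessed.

```python
from typing import Optional

# Mapping drawn only from the documented raw-mode examples
RAW_EXTENSIONS = {None: "html", "markdown": "md", "screenshot": "png"}


def expected_extension(delivery: str, content: Optional[str] = None) -> str:
    """Predict the uploaded file's extension for the documented cases."""
    if delivery == "json":
        return "json"  # the complete JSON envelope is uploaded
    if delivery == "raw":
        try:
            return RAW_EXTENSIONS[content]
        except KeyError:
            raise ValueError(f"no documented extension for content={content!r}") from None
    raise ValueError(f"unknown delivery mode: {delivery!r}")
```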
File Name Templating
Organize your scraped data at scale using dynamic path templates. Instead of manually organizing files, use template variables to automatically structure your storage based on domains, dates, and other metadata.
Template Variables
Template variables use double curly brace syntax: {{variable_name}}
| Variable | Description | Example Output |
|---|---|---|
| {{task_id}} | Unique task identifier | abc123xyz |
| {{url}} | Full URL being scraped | https://example.com/page |
| {{domain}} | Extracted domain name | example.com |
| {{status}} | Task completion status | success, failure |
| {{timestamp}} | Full timestamp (YYYYMMDD_HHMMSS) | 20260205_141530 |
| {{date}} | Date only (YYYY-MM-DD) | 2026-02-05 |
| {{time}} | Time only (HH-MM-SS) | 14-15-30 |
| {{year}} | Four-digit year | 2026 |
| {{month}} | Two-digit month | 02 |
| {{day}} | Two-digit day | 05 |
| {{extension}} | File extension based on output type | json, png, pdf, html, md, txt |
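How the service substitutes these variables is not described here. The sketch below is a local approximation, useful for previewing where a file will land before saving a configuration. It assumes the time-based variables come from the task's completion time (timezone unspecified) and that {{domain}} is the URL's hostname; template_values and render_path are hypothetical helpers.

```python
import re
from datetime import datetime
from typing import Dict
from urllib.parse import urlparse

_VAR = re.compile(r"\{\{(\w+)\}\}")  # matches {{variable_name}}


def template_values(task_id: str, url: str, status: str,
                    extension: str, finished_at: datetime) -> Dict[str, str]:
    """Compute every documented variable, using the formats from the table."""
    return {
        "task_id": task_id,
        "url": url,
        "domain": urlparse(url).hostname or "",  # assumption: hostname only
        "status": status,
        "timestamp": finished_at.strftime("%Y%m%d_%H%M%S"),
        "date": finished_at.strftime("%Y-%m-%d"),
        "time": finished_at.strftime("%H-%M-%S"),
        "year": finished_at.strftime("%Y"),
        "month": finished_at.strftime("%m"),
        "day": finished_at.strftime("%d"),
        "extension": extension,
    }


def render_path(template: str, values: Dict[str, str]) -> str:
    """Replace each {{name}}; an unknown name raises KeyError."""
    return _VAR.sub(lambda m: values[m.group(1)], template)
```

With the example values from the table (task abc123xyz on https://example.com/page, finishing at 2026-02-05 14:15:30), rendering "scrapes/{{domain}}/{{date}}/{{task_id}}.{{extension}}" with extension json yields the first result shown under Template Examples.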
Template Examples
Domain & Date Organization
{
"path_prefix": "scrapes/{{domain}}/{{date}}/{{task_id}}.{{extension}}"
}
Result: scrapes/example.com/2026-02-05/abc123xyz.json
Year/Month/Day Hierarchy
{
"path_prefix": "data/{{year}}/{{month}}/{{day}}/{{domain}}_{{task_id}}.{{extension}}"
}
Result: data/2026/02/05/example.com_abc123xyz.json
Status-based Filtering
{
"path_prefix": "results/{{status}}/{{domain}}/{{timestamp}}.{{extension}}"
}
Result: results/success/example.com/20260205_141530.json
Simple Domain Organization
{
"path_prefix": "{{domain}}/{{task_id}}.{{extension}}"
}
Result: example.com/abc123xyz.json
Time-series Data Collection
{
"path_prefix": "timeseries/{{domain}}/{{year}}-{{month}}/{{day}}_{{time}}.{{extension}}"
}
Result: timeseries/example.com/2026-02/05_14-15-30.json
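Because a mistyped variable name is easy to miss, it can help to check a template before saving it. The hypothetical check_template helper below flags names outside the variable table and also warns when {{task_id}} is absent: {{timestamp}} and {{time}} have one-second resolution, so in templates like the status-based and time-series examples, two tasks for the same domain that finish within the same second can render the same path. Whether a second upload then replaces the first depends on the provider and bucket settings.

```python
import re
from typing import List

# Variables listed in the Template Variables table
DOCUMENTED_VARS = frozenset({
    "task_id", "url", "domain", "status", "timestamp",
    "date", "time", "year", "month", "day", "extension",
})
_VAR = re.compile(r"\{\{(\w+)\}\}")  # matches {{variable_name}}


def check_template(template: str) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    names = set(_VAR.findall(template))
    problems = [f"unknown variable: {n}" for n in sorted(names - DOCUMENTED_VARS)]
    if "task_id" not in names:
        # Plain string, so the braces are literal
        problems.append("no {{task_id}}: tasks finishing in the same second may share a path")
    return problems
```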