# Storage Providers
| Provider | Type | Authentication Method |
|---|---|---|
| Amazon S3 | S3-Compatible | Access Key & Secret Key |
| Cloudflare R2 | S3-Compatible | Access Key & Secret Key |
| DigitalOcean Spaces | S3-Compatible | Access Key & Secret Key |
| Google Cloud Storage | Native | Service Account JSON |
| Azure Blob Storage | Native | Connection String |
| Backblaze B2 | S3-Compatible | Application Key & Key ID |
| Other (Wasabi, MinIO) | S3-Compatible | Access Key & Secret Key |
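The required `config` keys differ by `storage_type`. The following sketch shows one way to validate a configuration object against the shapes used in this guide; the `REQUIRED_KEYS` mapping and `validate_config` helper are illustrative, not part of any official API:

```python
# Hypothetical validator for the provider configuration objects in this guide.
REQUIRED_KEYS = {
    # S3-compatible providers (S3, R2, Spaces, B2, Wasabi, MinIO) share one shape.
    "s3_compatible": {"bucket", "region", "access_key", "secret_key", "endpoint_url"},
    "gcs": {"bucket", "project_id", "credentials_json"},
    "azure_blob": {"container", "connection_string"},
}

def validate_config(storage_type: str, config: dict) -> list:
    """Return the sorted list of missing required keys (empty means valid)."""
    required = REQUIRED_KEYS.get(storage_type)
    if required is None:
        raise ValueError(f"unknown storage_type: {storage_type}")
    return sorted(required - config.keys())
```

For example, `validate_config("azure_blob", {"container": "x"})` reports that `connection_string` is still missing.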
## Amazon S3
### Setup Steps

1. **Create an S3 Bucket**
   - Log into the AWS Console
   - Navigate to S3
   - Create a new bucket in your desired region
   - Note the bucket name and region

2. **Create an IAM User**
   - Navigate to IAM → Users → Create User
   - Enable “Access key - Programmatic access”
   - Attach the `AmazonS3FullAccess` policy, or create a custom policy

3. **Generate Access Keys**
   - Complete user creation
   - Save the Access Key ID and Secret Access Key
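If you prefer a custom policy over `AmazonS3FullAccess`, a minimal sketch scoped to a single bucket might look like the following (the bucket name `my-scraper-data` matches the configuration example below; adjust the actions to your needs):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-scraper-data/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-scraper-data"
    }
  ]
}
```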
### Configuration Example

```json
{
  "name": "AWS S3 Production",
  "storage_type": "s3_compatible",
  "config": {
    "bucket": "my-scraper-data",
    "region": "us-east-1",
    "access_key": "AKIAIOSFODNN7EXAMPLE",
    "secret_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    "endpoint_url": "https://s3.amazonaws.com",
    "path_prefix": "scrapes/{{domain}}/{{date}}/{{task_id}}.{{extension}}"
  }
}
```

## Cloudflare R2
### Setup Steps

1. **Enable R2 in the Cloudflare Dashboard**
   - Log into the Cloudflare Dashboard
   - Navigate to R2
   - Create a new bucket

2. **Generate API Tokens**
   - Go to R2 → Manage R2 API Tokens
   - Create an API token with “Object Read & Write” permissions
   - Save the Access Key ID and Secret Access Key

3. **Get Your Account ID**
   - Find your Account ID on the R2 overview page
   - Note your endpoint URL format
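The R2 endpoint URL is derived from your Account ID. A small sketch (the `r2_endpoint` helper is illustrative, not part of Cloudflare's tooling):

```python
def r2_endpoint(account_id: str) -> str:
    """Build the S3-compatible endpoint URL for a Cloudflare R2 account."""
    return f"https://{account_id}.r2.cloudflarestorage.com"
```

Note that R2's S3-compatible API does not use regional endpoints, which is why the configuration example below sets `region` to `"auto"`.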
### Configuration Example

```json
{
  "name": "Cloudflare R2 Storage",
  "storage_type": "s3_compatible",
  "config": {
    "bucket": "scraper-data",
    "region": "auto",
    "access_key": "YOUR_R2_ACCESS_KEY",
    "secret_key": "YOUR_R2_SECRET_KEY",
    "endpoint_url": "https://YOUR_ACCOUNT_ID.r2.cloudflarestorage.com",
    "path_prefix": "production/{{domain}}/{{date}}/{{task_id}}.{{extension}}"
  }
}
```

## DigitalOcean Spaces
### Setup Steps

1. **Create a Space**
   - Log into DigitalOcean
   - Navigate to Spaces
   - Create a new Space in your preferred region
   - Note the Space name and region

2. **Generate Spaces Access Keys**
   - Go to API → Spaces Keys
   - Generate a new key
   - Save the Access Key and Secret Key
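The Spaces endpoint follows directly from the region you chose for the Space. A sketch (the `spaces_endpoint` helper is illustrative):

```python
def spaces_endpoint(region: str) -> str:
    """Build the endpoint URL for a DigitalOcean Space in a given region."""
    return f"https://{region}.digitaloceanspaces.com"
```

For the `nyc3` region used in the example below, this yields `https://nyc3.digitaloceanspaces.com`.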
### Configuration Example

```json
{
  "name": "DigitalOcean Spaces",
  "storage_type": "s3_compatible",
  "config": {
    "bucket": "my-scraper-space",
    "region": "nyc3",
    "access_key": "YOUR_SPACES_ACCESS_KEY",
    "secret_key": "YOUR_SPACES_SECRET_KEY",
    "endpoint_url": "https://nyc3.digitaloceanspaces.com",
    "path_prefix": "scrapes/{{year}}/{{month}}/{{domain}}/{{task_id}}.{{extension}}"
  }
}
```

## Google Cloud Storage (GCS)
### Setup Steps

1. **Create a GCS Bucket**
   - Log into the Google Cloud Console
   - Navigate to Cloud Storage
   - Create a new bucket
   - Choose the location type and storage class
   - Note the bucket name

2. **Create a Service Account**
   - Navigate to IAM & Admin → Service Accounts
   - Create a new service account
   - Grant the “Storage Object Admin” role
   - Create and download a JSON key

3. **Note the Project ID**
   - Find your Project ID in the GCS console
   - You’ll need this for authentication
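Because `credentials_json` is stored as an escaped JSON string, a quick sanity check before saving the configuration can catch copy-paste damage. A sketch, assuming the fields that appear in the example below (the `check_service_account` helper is hypothetical):

```python
import json

# Fields of a service-account key that the configuration example relies on.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def check_service_account(credentials_json: str) -> dict:
    """Parse a service-account key string and verify the expected fields."""
    info = json.loads(credentials_json)
    missing = REQUIRED_FIELDS - info.keys()
    if missing:
        raise ValueError(f"service account key missing fields: {sorted(missing)}")
    if info["type"] != "service_account":
        raise ValueError(f"expected type 'service_account', got {info['type']!r}")
    return info
```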
### Configuration Example

```json
{
  "name": "Google Cloud Storage",
  "storage_type": "gcs",
  "config": {
    "bucket": "my-scraper-bucket",
    "project_id": "your-project-id",
    "credentials_json": "{\"type\":\"service_account\",\"project_id\":\"your-project\",\"private_key_id\":\"...\",\"private_key\":\"-----BEGIN PRIVATE KEY-----\\n...\\n-----END PRIVATE KEY-----\\n\",\"client_email\":\"[email protected]\",\"client_id\":\"...\",\"auth_uri\":\"https://accounts.google.com/o/oauth2/auth\",\"token_uri\":\"https://oauth2.googleapis.com/token\"}",
    "path_prefix": "scrapes/{{domain}}/{{date}}/{{task_id}}.{{extension}}"
  }
}
```

## Azure Blob Storage
### Setup Steps

1. **Create a Storage Account**
   - Log into the Azure Portal
   - Create a new Storage Account
   - Choose a performance tier (Standard or Premium)
   - Select a redundancy option (LRS, GRS, etc.)
   - Note the account name

2. **Get the Connection String**
   - Navigate to your Storage Account
   - Go to Security + networking → Access keys
   - Click the “Show keys” button
   - Copy the “Connection string” from key1 or key2

3. **Create a Container**
   - Navigate to Containers (under Data storage)
   - Click “+ Container”
   - Create a new container (equivalent to an S3 bucket)
   - Set the public access level (Private recommended)
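The connection string is a semicolon-separated list of `key=value` pairs. A sketch of splitting it apart, e.g. to confirm the account name before saving (the `parse_connection_string` helper is illustrative, not part of the Azure SDK):

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split an Azure storage connection string into its key=value parts."""
    parts = {}
    for segment in conn_str.split(";"):
        if not segment:
            continue
        # Split on the first "=" only: AccountKey values are base64 and may
        # themselves end in "=" padding characters.
        key, _, value = segment.partition("=")
        parts[key] = value
    return parts
```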
### Configuration Example

```json
{
  "name": "Azure Blob Storage",
  "storage_type": "azure_blob",
  "config": {
    "container": "scraper-results",
    "connection_string": "DefaultEndpointsProtocol=https;AccountName=youraccount;AccountKey=abc123...==;EndpointSuffix=core.windows.net",
    "path_prefix": "scrapes/{{domain}}/{{date}}/{{task_id}}.{{extension}}"
  }
}
```

## Backblaze B2
### Setup Steps

1. **Create a B2 Bucket**
   - Log into Backblaze
   - Navigate to B2 Cloud Storage
   - Create a new bucket
   - Note the bucket name

2. **Generate an Application Key**
   - Go to App Keys
   - Create a new key with access to your bucket
   - Save the keyID and applicationKey

3. **Get the Endpoint URL**
   - Find your S3-compatible endpoint in the bucket settings
   - Format: `https://s3.REGION.backblazeb2.com`
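The endpoint format above can be expressed as a small helper (illustrative, not part of Backblaze's tooling):

```python
def b2_endpoint(region: str) -> str:
    """Build the S3-compatible endpoint URL for a Backblaze B2 region."""
    return f"https://s3.{region}.backblazeb2.com"
```

For the `us-west-002` region used in the example below, this yields `https://s3.us-west-002.backblazeb2.com`.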
### Configuration Example

```json
{
  "name": "Backblaze B2",
  "storage_type": "s3_compatible",
  "config": {
    "bucket": "scraper-backup",
    "region": "us-west-002",
    "access_key": "YOUR_B2_KEY_ID",
    "secret_key": "YOUR_B2_APPLICATION_KEY",
    "endpoint_url": "https://s3.us-west-002.backblazeb2.com",
    "path_prefix": "archives/{{year}}/{{month}}/{{domain}}/{{task_id}}.{{extension}}"
  }
}
```
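Every configuration example in this guide uses `{{placeholder}}` tokens in `path_prefix`. A sketch of how such a template might expand — the placeholder names (`domain`, `date`, `task_id`, `extension`, `year`, `month`) come from the examples above, but the renderer itself is hypothetical:

```python
def render_path_prefix(template: str, values: dict) -> str:
    """Substitute {{name}} tokens in a path_prefix template with concrete values."""
    path = template
    for name, value in values.items():
        path = path.replace("{{" + name + "}}", str(value))
    return path

# Expanding the template from the Amazon S3 example:
render_path_prefix(
    "scrapes/{{domain}}/{{date}}/{{task_id}}.{{extension}}",
    {"domain": "example.com", "date": "2024-05-01",
     "task_id": "abc123", "extension": "json"},
)
# → "scrapes/example.com/2024-05-01/abc123.json"
```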