Storage Providers

Provider                Type            Authentication Method
Amazon S3               S3-Compatible   Access Key & Secret Key
Cloudflare R2           S3-Compatible   Access Key & Secret Key
DigitalOcean Spaces     S3-Compatible   Access Key & Secret Key
Google Cloud Storage    Native          Service Account JSON
Azure Blob Storage      Native          Connection String
Backblaze B2            S3-Compatible   Application Key & Key ID
Other (Wasabi, MinIO)   S3-Compatible   Access Key & Secret Key
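Each provider type expects a different set of config fields. The following is a minimal Python sketch of a pre-flight check, not part of the product: the field names are copied from the configuration examples on this page, while the helper name missing_fields and the decision to treat every listed field as required are assumptions.

```python
# Field names are copied from the configuration examples on this page.
# Treating each of them as required is an assumption of this sketch.
REQUIRED_FIELDS = {
    "s3_compatible": {"bucket", "region", "access_key", "secret_key", "endpoint_url"},
    "gcs": {"bucket", "project_id", "credentials_json"},
    "azure_blob": {"container", "connection_string"},
}


def missing_fields(provider: dict) -> list[str]:
    """Return the config fields that are absent or empty, sorted by name."""
    storage_type = provider.get("storage_type")
    if storage_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown storage_type: {storage_type!r}")
    config = provider.get("config", {})
    return sorted(f for f in REQUIRED_FIELDS[storage_type] if not config.get(f))


# Hypothetical, incomplete entry: only bucket and region are set.
incomplete = {"storage_type": "s3_compatible", "config": {"bucket": "b", "region": "auto"}}
print(missing_fields(incomplete))
```

An empty list means every field used in the matching example is present.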

Amazon S3

Setup Steps

  1. Create an S3 Bucket

    • Log into AWS Console
    • Navigate to S3
    • Create a new bucket in your desired region
    • Note the bucket name and region
  2. Create IAM User

    • Navigate to IAM → Users → Create User
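    • Enable “Access key - Programmatic access” (if your console does not show this option, create the access key from the user’s Security credentials tab after the user exists)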
    • Attach policy: AmazonS3FullAccess, or create a custom policy limited to your bucket for tighter access
  3. Generate Access Keys

    • Complete user creation
    • Save the Access Key ID and Secret Access Key
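If you choose the custom policy route in step 2, the policy can be limited to a single bucket. The sketch below builds such a policy document in Python. The helper name bucket_policy is hypothetical, and the list of actions is an assumption about what an uploader typically needs, so adjust it to what your setup actually requires.

```python
import json


def bucket_policy(bucket: str) -> dict:
    """Build an IAM policy document limited to one bucket.

    The chosen actions are an assumption, not a documented requirement.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading and writing apply to the objects inside it.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


print(json.dumps(bucket_policy("my-scraper-data"), indent=2))
```

The printed JSON can be pasted into the IAM policy editor in place of AmazonS3FullAccess.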

Configuration Example

{
  "name": "AWS S3 Production",
  "storage_type": "s3_compatible",
  "config": {
    "bucket": "my-scraper-data",
    "region": "us-east-1",
    "access_key": "AKIAIOSFODNN7EXAMPLE",
    "secret_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    "endpoint_url": "https://s3.amazonaws.com",
    "path_prefix": "scrapes/{{domain}}/{{date}}/{{task_id}}.{{extension}}"
  }
}
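The path_prefix value uses {{placeholder}} variables. How the service expands them is not described here, so the following Python sketch only illustrates the idea: render_path is a hypothetical helper, and the ISO date format, the zero-padded month, and the sample task ID are assumptions.

```python
import re
from datetime import date

# Matches {{name}} where name is made of letters, digits, or underscores.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_path(template: str, values: dict) -> str:
    """Replace each {{name}} with values[name]; raise KeyError on unknown names."""
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


day = date(2024, 1, 15)
values = {
    "domain": "example.com",
    "date": day.isoformat(),        # assumed format: YYYY-MM-DD
    "year": f"{day.year:04d}",
    "month": f"{day.month:02d}",    # assumed zero-padded
    "task_id": "task-123",          # hypothetical task ID
    "extension": "json",
}

print(render_path("scrapes/{{domain}}/{{date}}/{{task_id}}.{{extension}}", values))
# scrapes/example.com/2024-01-15/task-123.json
```

Previewing a template this way helps confirm that objects will land under the intended key before any data is written.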

Cloudflare R2

Setup Steps

  1. Enable R2 in Cloudflare Dashboard

    • Log into Cloudflare Dashboard
    • Navigate to R2
    • Create a new bucket
  2. Generate API Tokens

    • Go to R2 → Manage R2 API Tokens
    • Create API token with “Object Read & Write” permissions
    • Save the Access Key ID and Secret Access Key
  3. Get Account ID

    • Find your Account ID in the R2 overview page
    • Note your endpoint URL format: https://YOUR_ACCOUNT_ID.r2.cloudflarestorage.com

Configuration Example

{
  "name": "Cloudflare R2 Storage",
  "storage_type": "s3_compatible",
  "config": {
    "bucket": "scraper-data",
    "region": "auto",
    "access_key": "YOUR_R2_ACCESS_KEY",
    "secret_key": "YOUR_R2_SECRET_KEY",
    "endpoint_url": "https://YOUR_ACCOUNT_ID.r2.cloudflarestorage.com",
    "path_prefix": "production/{{domain}}/{{date}}/{{task_id}}.{{extension}}"
  }
}
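The R2 endpoint in the example above is derived from the account ID. A small Python sketch of that step follows. The helper name r2_endpoint is hypothetical, and the URL pattern is taken from the example rather than from a separate specification.

```python
def r2_endpoint(account_id: str) -> str:
    """Build the endpoint_url used in the example above from an account ID."""
    account_id = account_id.strip()
    if not account_id:
        raise ValueError("account_id must not be empty")
    return f"https://{account_id}.r2.cloudflarestorage.com"


# YOUR_ACCOUNT_ID is a placeholder; substitute the ID from the R2 overview page.
print(r2_endpoint("YOUR_ACCOUNT_ID"))
# https://YOUR_ACCOUNT_ID.r2.cloudflarestorage.com
```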

DigitalOcean Spaces

Setup Steps

  1. Create a Space

    • Log into DigitalOcean
    • Navigate to Spaces
    • Create a new Space in your preferred region
    • Note the Space name and region
  2. Generate Spaces Access Keys

    • Go to API → Spaces Keys
    • Generate New Key
    • Save the Access Key and Secret Key

Configuration Example

{
  "name": "DigitalOcean Spaces",
  "storage_type": "s3_compatible",
  "config": {
    "bucket": "my-scraper-space",
    "region": "nyc3",
    "access_key": "YOUR_SPACES_ACCESS_KEY",
    "secret_key": "YOUR_SPACES_SECRET_KEY",
    "endpoint_url": "https://nyc3.digitaloceanspaces.com",
    "path_prefix": "scrapes/{{year}}/{{month}}/{{domain}}/{{task_id}}.{{extension}}"
  }
}
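In the example above, the region appears twice: once in the region field and once as the first label of the endpoint host. The Python sketch below checks that the two agree. It assumes the REGION.digitaloceanspaces.com endpoint pattern shown in the example, and region_matches_endpoint is a hypothetical helper name.

```python
from urllib.parse import urlparse


def region_matches_endpoint(config: dict) -> bool:
    """True when the first host label of endpoint_url equals config['region']."""
    host = urlparse(config["endpoint_url"]).hostname or ""
    return host.split(".")[0] == config["region"]


config = {
    "region": "nyc3",
    "endpoint_url": "https://nyc3.digitaloceanspaces.com",
}
print(region_matches_endpoint(config))
# True
```

A mismatch usually means the region was changed in one field but not the other.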

Google Cloud Storage (GCS)

Setup Steps

  1. Create a GCS Bucket

    • Log into Google Cloud Console
    • Navigate to Cloud Storage
    • Create a new bucket
    • Choose location type and storage class
    • Note the bucket name
  2. Create Service Account

    • Navigate to IAM & Admin → Service Accounts
    • Create a new service account
    • Grant “Storage Object Admin” role
    • Create and download JSON key
  3. Note Project ID

    • Find your Project ID in the GCS console
    • You’ll need this for the project_id field in the configuration

Configuration Example

{
  "name": "Google Cloud Storage",
  "storage_type": "gcs",
  "config": {
    "bucket": "my-scraper-bucket",
    "project_id": "your-project-id",
    "credentials_json": "{\"type\":\"service_account\",\"project_id\":\"your-project\",\"private_key_id\":\"...\",\"private_key\":\"-----BEGIN PRIVATE KEY-----\\n...\\n-----END PRIVATE KEY-----\\n\",\"client_email\":\"SERVICE_ACCOUNT_NAME@your-project.iam.gserviceaccount.com\",\"client_id\":\"...\",\"auth_uri\":\"https://accounts.google.com/o/oauth2/auth\",\"token_uri\":\"https://oauth2.googleapis.com/token\"}",
    "path_prefix": "scrapes/{{domain}}/{{date}}/{{task_id}}.{{extension}}"
  }
}
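The credentials_json field holds the whole service account key as a JSON string inside JSON, which is easy to get wrong when escaping by hand. The following Python sketch produces the entry programmatically. The helper name gcs_provider is hypothetical, the key shown is a placeholder, and reusing the key's own project_id for the project_id field is an assumption.

```python
import json


def gcs_provider(key: dict, bucket: str) -> dict:
    """Build a provider entry from a parsed service account key."""
    return {
        "name": "Google Cloud Storage",
        "storage_type": "gcs",
        "config": {
            "bucket": bucket,
            # Assumption: the config project_id matches the one in the key.
            "project_id": key["project_id"],
            # Serialize the key to a compact string; it is escaped a second
            # time when the whole entry is dumped below.
            "credentials_json": json.dumps(key, separators=(",", ":")),
            "path_prefix": "scrapes/{{domain}}/{{date}}/{{task_id}}.{{extension}}",
        },
    }


# Placeholder key. In practice, load the downloaded file with json.load().
key = {
    "type": "service_account",
    "project_id": "your-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
}
print(json.dumps(gcs_provider(key, "my-scraper-bucket"), indent=2))
```

Letting json.dumps handle both levels of escaping keeps the newlines in private_key intact.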

Azure Blob Storage

Setup Steps

  1. Create Storage Account

    • Log into Azure Portal
    • Create a new Storage Account
    • Choose performance tier (Standard or Premium)
    • Select redundancy option (LRS, GRS, etc.)
    • Note the account name
  2. Get Connection String

    • Navigate to your Storage Account
    • Go to Security + networking → Access keys
    • Click “Show keys” button
    • Copy the “Connection string” from Key1 or Key2
  3. Create Container

    • Navigate to Containers (under Data storage)
    • Click “+ Container”
    • Create a new container (the Azure equivalent of an S3 bucket)
    • Set public access level (Private recommended)

Configuration Example

{
  "name": "Azure Blob Storage",
  "storage_type": "azure_blob",
  "config": {
    "container": "scraper-results",
    "connection_string": "DefaultEndpointsProtocol=https;AccountName=youraccount;AccountKey=abc123...==;EndpointSuffix=core.windows.net",
    "path_prefix": "scrapes/{{domain}}/{{date}}/{{task_id}}.{{extension}}"
  }
}
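A connection string is a semicolon-separated list of Key=Value pairs, and the account key itself usually ends in = padding. The Python sketch below splits one into its parts so a truncated copy can be spotted before saving the configuration. parse_connection_string is a hypothetical helper, not a function from the Azure SDK.

```python
def parse_connection_string(conn: str) -> dict:
    """Split 'Key=Value;Key=Value' into a dict.

    Only the first '=' in each segment separates key from value, so base64
    padding at the end of AccountKey is preserved.
    """
    parts = {}
    for segment in conn.split(";"):
        if not segment:
            continue  # tolerate a trailing semicolon
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"segment without '=': {segment!r}")
        parts[key] = value
    return parts


conn = (
    "DefaultEndpointsProtocol=https;AccountName=youraccount;"
    "AccountKey=abc123...==;EndpointSuffix=core.windows.net"
)
parts = parse_connection_string(conn)
print(parts["AccountName"])
# youraccount
```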

Backblaze B2

Setup Steps

  1. Create B2 Bucket

    • Log into Backblaze
    • Navigate to B2 Cloud Storage
    • Create a new bucket
    • Note the bucket name
  2. Generate Application Key

    • Go to App Keys
    • Create new key with access to your bucket
    • Save the keyID and applicationKey (used as access_key and secret_key in the configuration)
  3. Get Endpoint URL

    • Find your S3-compatible endpoint in bucket settings
    • Format: https://s3.REGION.backblazeb2.com

Configuration Example

{
  "name": "Backblaze B2",
  "storage_type": "s3_compatible",
  "config": {
    "bucket": "scraper-backup",
    "region": "us-west-002",
    "access_key": "YOUR_B2_KEY_ID",
    "secret_key": "YOUR_B2_APPLICATION_KEY",
    "endpoint_url": "https://s3.us-west-002.backblazeb2.com",
    "path_prefix": "archives/{{year}}/{{month}}/{{domain}}/{{task_id}}.{{extension}}"
  }
}
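The region value in the example above is the middle part of the S3-compatible endpoint. The Python sketch below extracts it from an endpoint copied out of the bucket settings. It assumes the https://s3.REGION.backblazeb2.com format given in step 3, and b2_region is a hypothetical helper name.

```python
import re

# Assumes the endpoint format from step 3: https://s3.REGION.backblazeb2.com
B2_ENDPOINT = re.compile(r"^https://s3\.([a-z0-9-]+)\.backblazeb2\.com/?$")


def b2_region(endpoint_url: str) -> str:
    """Return the REGION part of a Backblaze B2 S3-compatible endpoint."""
    match = B2_ENDPOINT.match(endpoint_url)
    if match is None:
        raise ValueError(f"not a B2 S3-compatible endpoint: {endpoint_url!r}")
    return match.group(1)


print(b2_region("https://s3.us-west-002.backblazeb2.com"))
# us-west-002
```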