Scrapy

This guide provides instructions on how to integrate Evomi’s proxies with Scrapy, a popular Python framework for web scraping.

Prerequisites

Before you begin, ensure you have the following:

  1. Python installed on your system
  2. Scrapy installed in your project
  3. Your Evomi proxy credentials (username and password)

Installation

If you haven’t already installed Scrapy, you can do so using pip:

pip install scrapy

Configuration

To use Evomi proxies with Scrapy, you need to configure the downloader middleware settings in your Scrapy project. Here’s how you can do it:

  1. In your Scrapy project’s settings.py file, add or modify the following settings:
# Proxy settings
PROXY_HOST = 'rp.evomi.com'
PROXY_PORT = '1000'
PROXY_USER = 'your_username'
PROXY_PASS = 'your_password_session-anychars_mode-speed'

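# Keep Scrapy's built-in proxy middleware enabled at its default priority (750).
# It converts the credentials in the proxy URL into a Proxy-Authorization header.
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 750,
}

# Proxy URL, read by the custom middleware created in the next step
PROXY_URL = f'http://{PROXY_USER}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}'

# Note: Scrapy does not read HTTP_PROXY or HTTPS_PROXY from settings.py.
# The built-in middleware only picks up the http_proxy / https_proxy
# environment variables, so the proxy is applied per request via PROXY_URL.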

Replace your_username and your_password with your actual Evomi proxy credentials, keeping the _session-anychars_mode-speed suffix on the password.
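If your username or password contains characters that have a special meaning in URLs (such as @, : or /), the f-string above can produce a proxy URL that is parsed incorrectly. The following is a minimal sketch that percent-encodes the credentials first. build_proxy_url is a hypothetical helper name used for illustration only, and the sketch assumes the proxy middleware decodes the credentials again before sending them:

```python
from urllib.parse import quote


def build_proxy_url(user, password, host, port, scheme="http"):
    """Return a proxy URL with percent-encoded credentials.

    Hypothetical helper for illustration; not part of Scrapy or Evomi.
    """
    # safe="" also encodes "/", which quote() leaves untouched by default
    user_enc = quote(user, safe="")
    pass_enc = quote(password, safe="")
    return f"{scheme}://{user_enc}:{pass_enc}@{host}:{port}"


# Placeholder credentials from the settings above
PROXY_URL = build_proxy_url(
    "your_username",
    "your_password_session-anychars_mode-speed",
    "rp.evomi.com",
    "1000",
)
```

Credentials made up only of letters, digits, hyphens, and underscores pass through unchanged, so the helper is safe to use even when no encoding is needed.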

  2. Create a custom middleware that sets the proxy for each request. Create a new file called proxy_middleware.py in your Scrapy project’s package directory (the one that contains settings.py):
from scrapy.exceptions import NotConfigured

class ProxyMiddleware:
    def __init__(self, proxy_url):
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        proxy_url = crawler.settings.get('PROXY_URL')
        if not proxy_url:
            raise NotConfigured('PROXY_URL not set')
        return cls(proxy_url)

    def process_request(self, request, spider):
        request.meta['proxy'] = self.proxy_url
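  3. Update your settings.py file to include this custom middleware. Keep the built-in HttpProxyMiddleware enabled: it reads request.meta['proxy'] and converts the credentials in the proxy URL into a Proxy-Authorization header, so the custom middleware must run before it (a lower number runs first):
DOWNLOADER_MIDDLEWARES = {
    'your_project_name.proxy_middleware.ProxyMiddleware': 350,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 750,
}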

Replace your_project_name with the actual name of your Scrapy project.
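The process_request logic can be checked without starting a crawl. The sketch below repeats a reduced version of the middleware and calls it with stand-in request objects. SimpleNamespace stands in for scrapy.Request purely for illustration, and both proxy URLs are placeholders. It also uses setdefault instead of plain assignment, which is an optional variation (not part of the middleware above) that lets a request keep a proxy it has already chosen:

```python
from types import SimpleNamespace


class ProxyMiddleware:
    def __init__(self, proxy_url):
        self.proxy_url = proxy_url

    def process_request(self, request, spider):
        # Variation: only set the proxy when the request has not
        # already picked one via request.meta['proxy'].
        request.meta.setdefault('proxy', self.proxy_url)


middleware = ProxyMiddleware('http://user:pass@rp.evomi.com:1000')

# Stand-ins for scrapy.Request objects (assumption for this sketch)
plain = SimpleNamespace(meta={})
custom = SimpleNamespace(meta={'proxy': 'http://other.example:8080'})

middleware.process_request(plain, spider=None)
middleware.process_request(custom, spider=None)

print(plain.meta['proxy'])   # the default proxy from the middleware
print(custom.meta['proxy'])  # the proxy the request set itself
```

Whether per-request overrides are desirable depends on the project. With plain assignment, as in the middleware above, every request always goes through the configured Evomi endpoint.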

Example Spider

Here’s an example of a simple Scrapy spider that uses the configured proxy:

import scrapy

class IPCheckSpider(scrapy.Spider):
    name = 'ipcheck'
    start_urls = ['https://ip.evomi.com/s']

    def parse(self, response):
        yield {
            'ip': response.text.strip()
        }

Explanation

Let’s break down the key parts of this configuration:

  1. We set up the proxy configuration in the settings.py file, including the host, port, username, and password.
  2. We create a custom ProxyMiddleware to apply the proxy settings to each request.
  3. We update the DOWNLOADER_MIDDLEWARES setting to use our custom middleware.
  4. The example spider visits https://ip.evomi.com/s to verify the proxy connection and yields the returned IP address as an item, which Scrapy writes to its log output.

Evomi Proxy Endpoints

Depending on the Evomi product you’re using, you’ll need to adjust the proxy host and port in your settings.py file. Here are the endpoints for different Evomi products:

Residential Proxies

  • HTTP Proxy: rp.evomi.com:1000
  • HTTPS Proxy: rp.evomi.com:1001
  • SOCKS5 Proxy: rp.evomi.com:1002

Mobile Proxies

  • HTTP Proxy: mp.evomi.com:3000
  • HTTPS Proxy: mp.evomi.com:3001
  • SOCKS5 Proxy: mp.evomi.com:3002

Datacenter Proxies

  • HTTP Proxy: dcp.evomi.com:2000
  • HTTPS Proxy: dcp.evomi.com:2001
  • SOCKS5 Proxy: dcp.evomi.com:2002

To use a different product or protocol, replace the PROXY_HOST and PROXY_PORT values in your settings.py file with the appropriate host and port from the list above. For SOCKS5, the scheme in PROXY_URL must change as well (see Tips and Troubleshooting).
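To avoid copying hosts and ports by hand, the endpoints can be kept in a small lookup table. This is a sketch only: the hosts and ports are taken from the list above, while EVOMI_ENDPOINTS, get_endpoint, and the product keys ("residential", "mobile", "datacenter") are hypothetical names chosen for this example:

```python
# Endpoints copied from the list above: product -> protocol -> (host, port)
EVOMI_ENDPOINTS = {
    "residential": {
        "http": ("rp.evomi.com", 1000),
        "https": ("rp.evomi.com", 1001),
        "socks5": ("rp.evomi.com", 1002),
    },
    "mobile": {
        "http": ("mp.evomi.com", 3000),
        "https": ("mp.evomi.com", 3001),
        "socks5": ("mp.evomi.com", 3002),
    },
    "datacenter": {
        "http": ("dcp.evomi.com", 2000),
        "https": ("dcp.evomi.com", 2001),
        "socks5": ("dcp.evomi.com", 2002),
    },
}


def get_endpoint(product, protocol="http"):
    """Return (host, port) for a product and protocol, or raise ValueError."""
    try:
        return EVOMI_ENDPOINTS[product][protocol]
    except KeyError:
        raise ValueError(
            f"Unknown product/protocol: {product}/{protocol}"
        ) from None


# Example: switch the settings to the mobile HTTP endpoint
PROXY_HOST, PROXY_PORT = get_endpoint("mobile", "http")
```

Raising ValueError for an unknown combination makes a typo in the product name fail when the settings load, rather than later as a connection error.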

Running the Spider

To run the example spider, use the following command in your Scrapy project directory:

scrapy crawl ipcheck

If everything is set up correctly, you should see the IP address of the Evomi proxy you’re using in the output.
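If the spider does not return an IP address, it can help to test the proxy credentials outside Scrapy first. The following sketch uses only Python’s standard library. make_opener and fetch_ip are hypothetical helper names, and the check URL is the one used by the example spider:

```python
import urllib.request


def make_opener(proxy_url):
    """Build a urllib opener that routes HTTP and HTTPS through proxy_url."""
    handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(handler)


def fetch_ip(proxy_url, check_url="https://ip.evomi.com/s", timeout=15):
    """Return the IP address reported by check_url when using the proxy."""
    opener = make_opener(proxy_url)
    with opener.open(check_url, timeout=timeout) as response:
        return response.read().decode().strip()


# Example call (performs a real network request, so it is left commented out):
# print(fetch_ip("http://your_username:your_password_session-anychars_mode-speed@rp.evomi.com:1000"))
```

If this call fails with an authentication or connection error, the problem lies in the credentials or endpoint rather than in the Scrapy configuration.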

Tips and Troubleshooting

  • If you’re having connection issues, double-check your proxy credentials and make sure you’re using the correct endpoint and port for your chosen product.
  • The proxy password format (your_password_session-anychars_mode-speed) includes additional parameters. Make sure to replace your_password with your actual password while keeping the session-anychars_mode-speed part intact.
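  • For HTTPS targets, no additional middleware or download handler is required. Scrapy tunnels HTTPS requests through the HTTP proxy using CONNECT, and HttpProxyMiddleware supplies the credentials. Scrapy’s default TLS context factory does not verify server certificates, so there is normally nothing to disable. Keep the default HTTP/1.1 download handler: Scrapy’s optional HTTP/2 handler (H2DownloadHandler) has not supported tunneling HTTPS requests through a proxy, so check the Scrapy documentation for your version before enabling it. A working middleware configuration is:
    DOWNLOADER_MIDDLEWARES = {
        'your_project_name.proxy_middleware.ProxyMiddleware': 350,
        'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 750,
    }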
  • If you need to use a SOCKS5 proxy, you’ll need to install the scrapy-socks package:
    pip install scrapy-socks
    Then update your settings.py file:
    PROXY_URL = f'socks5://{PROXY_USER}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}'
    DOWNLOAD_HANDLERS = {
        "http": "scrapy_socks.handlers.http.SOCKSDownloadHandler",
        "https": "scrapy_socks.handlers.http.SOCKSDownloadHandler",
    }
  • Remember to handle errors appropriately in your production code, as network requests can fail for various reasons.
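
As a starting point for error handling, Scrapy’s built-in retry and timeout settings can be tuned in settings.py. The setting names below are standard Scrapy settings, but the values are illustrative assumptions rather than Evomi recommendations:

```python
# settings.py (additions)

# Retry failed requests; Scrapy's RetryMiddleware is enabled by default.
RETRY_ENABLED = True

# Number of retries per request, in addition to the first attempt.
RETRY_TIMES = 3

# Give up on a request that takes longer than this many seconds.
DOWNLOAD_TIMEOUT = 30
```

A shorter timeout than Scrapy’s default makes a slow or unreachable proxy fail quickly, so the retry can happen sooner.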

By following this guide, you should now be able to successfully integrate Evomi’s proxies with your Scrapy projects for web scraping tasks.