Scrapy
This guide provides instructions on how to integrate Evomi’s proxies with Scrapy, a popular Python framework for web scraping.
Prerequisites
Before you begin, ensure you have the following:
- Python installed on your system
- Scrapy installed in your project
- Your Evomi proxy credentials (username and password)
Installation
If you haven’t already installed Scrapy, you can do so using pip:
```bash
pip install scrapy
```
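To confirm the installation, you can check the version from Python (assuming a standard install):

```python
import scrapy

# Print the installed Scrapy version to confirm the installation
print(scrapy.__version__)
```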
Configuration
To use Evomi proxies with Scrapy, you need to configure the downloader middleware settings in your Scrapy project. Here’s how you can do it:
- In your Scrapy project’s `settings.py` file, add or modify the following settings:
```python
# Proxy settings
PROXY_HOST = 'rp.evomi.com'
PROXY_PORT = '1000'
PROXY_USER = 'your_username'
PROXY_PASS = 'your_password_session-anychars_mode-speed'

# Enable the proxy middleware
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 1,
}

# Proxy URL
PROXY_URL = f'http://{PROXY_USER}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}'

# Set the proxy for Scrapy to use
HTTP_PROXY = PROXY_URL
HTTPS_PROXY = PROXY_URL
```
Replace `your_username` with your actual Evomi proxy username and `your_password` with your password, keeping the `session-anychars_mode-speed` suffix intact.
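If your Evomi username or password ever contains characters that aren’t URL-safe (such as `@` or `/`), the f-string above can produce an invalid proxy URL. A minimal sketch that percent-encodes the credentials with Python’s standard library before building `PROXY_URL`:

```python
from urllib.parse import quote

# Percent-encode the credentials so special characters don't break the URL
PROXY_URL = (
    f'http://{quote(PROXY_USER, safe="")}:{quote(PROXY_PASS, safe="")}'
    f'@{PROXY_HOST}:{PROXY_PORT}'
)
```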
- Create a custom middleware to set the proxy for each request. Create a new file called `proxy_middleware.py` in your Scrapy project’s directory:
```python
from scrapy.exceptions import NotConfigured


class ProxyMiddleware:
    def __init__(self, proxy_url):
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        proxy_url = crawler.settings.get('PROXY_URL')
        if not proxy_url:
            raise NotConfigured('PROXY_URL not set')
        return cls(proxy_url)

    def process_request(self, request, spider):
        request.meta['proxy'] = self.proxy_url
```
- Update your `settings.py` file to include this custom middleware:
```python
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
    'your_project_name.proxy_middleware.ProxyMiddleware': 350,
}
```
Replace `your_project_name` with the actual name of your Scrapy project.
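The proxy password configured above embeds a `session-anychars` segment. If Evomi treats that segment as a sticky-session identifier (an assumption here; check your dashboard for the exact parameter behavior), you can rotate sessions per request with a variation of the middleware above. The `PROXY_BASE_PASS` setting below is hypothetical, standing in for your password without the session and mode parts:

```python
import random
import string

from scrapy.exceptions import NotConfigured


class RotatingSessionProxyMiddleware:
    """Sketch: builds a proxy URL with a fresh random session id per request."""

    def __init__(self, host, port, user, base_password):
        self.host = host
        self.port = port
        self.user = user
        self.base_password = base_password  # e.g. 'your_password' without the _session-..._mode-... suffix

    @classmethod
    def from_crawler(cls, crawler):
        settings = crawler.settings
        if not settings.get('PROXY_HOST'):
            raise NotConfigured('PROXY_HOST not set')
        return cls(
            settings.get('PROXY_HOST'),
            settings.get('PROXY_PORT'),
            settings.get('PROXY_USER'),
            settings.get('PROXY_BASE_PASS'),  # hypothetical setting, not part of the guide above
        )

    def process_request(self, request, spider):
        # Random session id so each request can use a different sticky session
        session_id = ''.join(random.choices(string.ascii_lowercase + string.digits, k=8))
        password = f'{self.base_password}_session-{session_id}_mode-speed'
        request.meta['proxy'] = f'http://{self.user}:{password}@{self.host}:{self.port}'
```

Register it in `DOWNLOADER_MIDDLEWARES` in place of `ProxyMiddleware` if you want a new session, and potentially a different exit IP, for every request.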
Example Spider
Here’s an example of a simple Scrapy spider that uses the configured proxy:
```python
import scrapy


class IPCheckSpider(scrapy.Spider):
    name = 'ipcheck'
    start_urls = ['https://ip.evomi.com/s']

    def parse(self, response):
        yield {
            'ip': response.text.strip()
        }
```
Explanation
Let’s break down the key parts of this configuration:
- We set up the proxy configuration in the `settings.py` file, including the host, port, username, and password.
- We create a custom `ProxyMiddleware` to apply the proxy settings to each request.
- We update the `DOWNLOADER_MIDDLEWARES` setting to use our custom middleware.
- The example spider visits https://ip.evomi.com/s to verify the proxy connection and yields the proxy’s IP address as an item.
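If some requests should bypass the proxy entirely, one hedged variation is to subclass the middleware and honor an opt-out flag in `request.meta`; the `skip_proxy` flag name below is just an illustration, not a Scrapy or Evomi convention:

```python
from your_project_name.proxy_middleware import ProxyMiddleware


class OptionalProxyMiddleware(ProxyMiddleware):
    """Applies the proxy unless a request opts out or already carries one."""

    def process_request(self, request, spider):
        # Leave the request untouched if the spider flagged it or set its own proxy
        if request.meta.get('skip_proxy') or 'proxy' in request.meta:
            return None
        return super().process_request(request, spider)
```

Point the `DOWNLOADER_MIDDLEWARES` entry at this class instead, and yield requests with `meta={'skip_proxy': True}` where needed.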
Evomi Proxy Endpoints
Depending on the Evomi product you’re using, you’ll need to adjust the proxy host and port in your `settings.py` file. Here are the endpoints for different Evomi products:
Residential Proxies
- HTTP Proxy: rp.evomi.com:1000
- HTTPS Proxy: rp.evomi.com:1001
- SOCKS5 Proxy: rp.evomi.com:1002
Mobile Proxies
- HTTP Proxy: mp.evomi.com:3000
- HTTPS Proxy: mp.evomi.com:3001
- SOCKS5 Proxy: mp.evomi.com:3002
Datacenter Proxies
- HTTP Proxy: dcp.evomi.com:2000
- HTTPS Proxy: dcp.evomi.com:2001
- SOCKS5 Proxy: dcp.evomi.com:2002
To use a different product or protocol, simply replace the `PROXY_HOST` and `PROXY_PORT` variables in your `settings.py` file with the appropriate endpoint and port from the list above.
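For example, switching the configuration to Evomi’s datacenter proxies over HTTP only requires changing those two values (taken from the list above):

```python
# Datacenter proxies over plain HTTP
PROXY_HOST = 'dcp.evomi.com'
PROXY_PORT = '2000'
```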
Running the Spider
To run the example spider, use the following command in your Scrapy project directory:
```bash
scrapy crawl ipcheck
```
If everything is set up correctly, you should see the IP address of the Evomi proxy you’re using in the output.
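If you prefer launching the spider from a plain Python script rather than the `scrapy crawl` command, a minimal sketch using Scrapy’s `CrawlerProcess` looks like this; the spider’s import path is an assumption, so adjust it to wherever your spider module lives:

```python
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Adjust this import to match your project layout
from your_project_name.spiders.ipcheck import IPCheckSpider

# Load settings.py (including the proxy configuration) and run the spider
process = CrawlerProcess(get_project_settings())
process.crawl(IPCheckSpider)
process.start()
```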
Tips and Troubleshooting
- If you’re having connection issues, double-check your proxy credentials and make sure you’re using the correct endpoint and port for your chosen product.
- The proxy password format (`your_password_session-anychars_mode-speed`) includes additional parameters. Make sure to replace `your_password` with your actual password while keeping the `session-anychars_mode-speed` part intact.
- For HTTPS connections, you might need to disable SSL verification (use with caution in production):

```python
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
    'your_project_name.proxy_middleware.ProxyMiddleware': 350,
}

DOWNLOAD_HANDLERS = {
    "https": "scrapy.core.downloader.handlers.http2.H2DownloadHandler",
}
```
- If you need to use a SOCKS5 proxy, you’ll need to install the `scrapy-socks` package:

```bash
pip install scrapy-socks
```

Then update your `settings.py` file:

```python
PROXY_URL = f'socks5://{PROXY_USER}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}'

DOWNLOAD_HANDLERS = {
    "http": "scrapy_socks.handlers.http.SOCKSDownloadHandler",
    "https": "scrapy_socks.handlers.http.SOCKSDownloadHandler",
}
```
- Remember to handle errors appropriately in your production code, as network requests can fail for various reasons.
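As a starting point for that error handling, the sketch below attaches an `errback` to each request and tunes Scrapy’s built-in retry settings; the values are illustrative rather than recommendations:

```python
import scrapy


class ResilientIPCheckSpider(scrapy.Spider):
    name = 'ipcheck_resilient'

    # Illustrative retry settings; tune them for your own workload
    custom_settings = {
        'RETRY_ENABLED': True,
        'RETRY_TIMES': 3,
    }

    def start_requests(self):
        yield scrapy.Request(
            'https://ip.evomi.com/s',
            callback=self.parse,
            errback=self.on_error,
        )

    def parse(self, response):
        yield {'ip': response.text.strip()}

    def on_error(self, failure):
        # Log proxy and network failures instead of letting them pass silently
        self.logger.error('Request failed: %s', repr(failure))
```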
By following this guide, you should now be able to successfully integrate Evomi’s proxies with your Scrapy projects for web scraping tasks.