Beautiful Soup
This guide provides instructions on how to use Evomi’s proxies with Beautiful Soup, a popular Python library for web scraping and parsing HTML and XML documents.
Prerequisites
Before you begin, ensure you have the following:
- Python installed on your system
- Beautiful Soup and requests libraries installed
- Your Evomi proxy credentials (username and password)
Installation
If you haven’t already installed Beautiful Soup and requests, you can do so using pip:
pip install beautifulsoup4 requests
Configuration
To use Evomi proxies with Beautiful Soup, we’ll use the requests library to handle the HTTP requests. Here’s a basic setup:
import requests
from bs4 import BeautifulSoup
# Evomi proxy configuration
proxy_host = "rp.evomi.com"
proxy_port = "1000"
proxy_username = "your_username"
proxy_password = "your_password_session-anychars_mode-speed"
# Proxy URL
proxy_url = f"http://{proxy_username}:{proxy_password}@{proxy_host}:{proxy_port}"
# Proxy dictionary for requests
proxies = {
    "http": proxy_url,
    "https": proxy_url,
}
# Make a request through the proxy
url = "https://ip.evomi.com/s"
response = requests.get(url, proxies=proxies)
# Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')
# Print the IP address (which should be the proxy's IP)
print(soup.get_text().strip())
Replace your_username with your actual Evomi proxy username and your_password with your actual password.
Explanation
Let’s break down the key parts of this script:
- We import the necessary libraries: requests for making HTTP requests and BeautifulSoup for parsing HTML.
- We set up the proxy configuration with the Evomi proxy details.
- We create a proxy URL string that includes the authentication details.
- We create a proxies dictionary that requests will use to route traffic through the proxy.
- We make a GET request to https://ip.evomi.com/s using the configured proxy.
- We create a BeautifulSoup object from the response content.
- We print the text content of the page, which should show the IP address of the proxy we’re using.
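The steps above can be wrapped in a small helper. The sketch below is one possible way to do that, not an Evomi-provided API: the function name build_proxies and the environment variable names EVOMI_USERNAME and EVOMI_PASSWORD are assumptions made for illustration. It percent-encodes the credentials so that characters such as @ or : in a password do not break the proxy URL.

```python
import os
from urllib.parse import quote


def build_proxies(username, password, host="rp.evomi.com", port=1000, scheme="http"):
    """Return a proxies dictionary for requests (hypothetical helper)."""
    # Percent-encode credentials so reserved characters stay inside the userinfo part
    user = quote(username, safe="")
    secret = quote(password, safe="")
    proxy_url = f"{scheme}://{user}:{secret}@{host}:{port}"
    # The same proxy URL is used for both plain HTTP and HTTPS targets
    return {"http": proxy_url, "https": proxy_url}


if __name__ == "__main__":
    # EVOMI_USERNAME and EVOMI_PASSWORD are assumed names, not an Evomi convention
    proxies = build_proxies(
        os.environ.get("EVOMI_USERNAME", "your_username"),
        os.environ.get("EVOMI_PASSWORD", "your_password_session-anychars_mode-speed"),
    )
    print(proxies["https"])
```

The returned dictionary can be passed directly as the proxies argument of requests.get. Reading credentials from the environment keeps them out of the source file.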
Evomi Proxy Endpoints
Depending on the Evomi product you’re using, you’ll need to adjust the proxy host and port in your code. Here are the endpoints for different Evomi products:
Residential Proxies
- HTTP Proxy: rp.evomi.com:1000
- HTTPS Proxy: rp.evomi.com:1001
- SOCKS5 Proxy: rp.evomi.com:1002
Mobile Proxies
- HTTP Proxy: mp.evomi.com:3000
- HTTPS Proxy: mp.evomi.com:3001
- SOCKS5 Proxy: mp.evomi.com:3002
Datacenter Proxies
- HTTP Proxy: dcp.evomi.com:2000
- HTTPS Proxy: dcp.evomi.com:2001
- SOCKS5 Proxy: dcp.evomi.com:2002
To use a different product or protocol, simply replace the proxy_host and proxy_port variables in your code with the appropriate endpoint and port from the list above.
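If you switch between products often, the endpoint list can be kept in one place. The following is a minimal sketch: the hosts and ports are copied from the list above, while the EVOMI_ENDPOINTS dictionary, its key names, and the get_endpoint function are hypothetical names chosen for this example.

```python
# Hosts and ports copied from the endpoint list; the layout is an assumption
EVOMI_ENDPOINTS = {
    "residential": ("rp.evomi.com", {"http": 1000, "https": 1001, "socks5": 1002}),
    "mobile": ("mp.evomi.com", {"http": 3000, "https": 3001, "socks5": 3002}),
    "datacenter": ("dcp.evomi.com", {"http": 2000, "https": 2001, "socks5": 2002}),
}


def get_endpoint(product, protocol="http"):
    """Return (host, port) for a product and protocol (hypothetical helper)."""
    if product not in EVOMI_ENDPOINTS:
        raise ValueError(f"Unknown product: {product!r}")
    host, ports = EVOMI_ENDPOINTS[product]
    if protocol not in ports:
        raise ValueError(f"Unknown protocol: {protocol!r}")
    return host, ports[protocol]


if __name__ == "__main__":
    proxy_host, proxy_port = get_endpoint("mobile", "socks5")
    print(f"{proxy_host}:{proxy_port}")
```

The returned values can be assigned to proxy_host and proxy_port in the basic setup.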
Advanced Usage
Here’s an example of how to use Beautiful Soup with Evomi proxies to scrape a website:
import requests
from bs4 import BeautifulSoup
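def get_soup(url, proxies):
    # Fetch the page through the proxy and fail early on HTTP errors
    response = requests.get(url, proxies=proxies, timeout=10)
    response.raise_for_status()
    return BeautifulSoup(response.content, 'html.parser')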
# Evomi proxy configuration
proxy_url = "http://your_username:your_password_session-anychars_mode-speed@rp.evomi.com:1000"
proxies = {"http": proxy_url, "https": proxy_url}
# Example: Scraping Python.org
url = "https://www.python.org"
soup = get_soup(url, proxies)
# Find all 'a' tags with class 'reference internal'
links = soup.find_all('a', class_='reference internal')
# Print the text and href of each link
for link in links[:5]:  # Limiting to first 5 for brevity
    print(f"Text: {link.text}, URL: {link['href']}")
This script demonstrates how to use Beautiful Soup to parse and extract information from a web page while using an Evomi proxy.
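To see how the link extraction behaves without making a network request, the parsing step can be tried on an inline HTML snippet. This is a sketch under stated assumptions: SAMPLE_HTML and the extract_links function are made up for illustration and do not reflect the markup of any real site. It uses link.get("href") so that anchors without an href are skipped instead of raising a KeyError, and urljoin to turn relative links into absolute ones.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Made-up markup for illustration only
SAMPLE_HTML = """
<ul>
  <li><a class="reference internal" href="/docs/intro.html">Intro</a></li>
  <li><a class="reference internal" href="https://example.com/api">API</a></li>
  <li><a class="external" href="https://example.org/">Elsewhere</a></li>
  <li><a class="reference internal">No href</a></li>
</ul>
"""


def extract_links(html, base_url, css_class="reference internal"):
    """Return (text, absolute_url) pairs for matching links (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for link in soup.find_all("a", class_=css_class):
        href = link.get("href")  # None when the tag has no href attribute
        if href is None:
            continue
        results.append((link.get_text(strip=True), urljoin(base_url, href)))
    return results


if __name__ == "__main__":
    for text, url in extract_links(SAMPLE_HTML, "https://example.com/"):
        print(f"Text: {text}, URL: {url}")
```

The same function can be given response.content from a proxied request in place of SAMPLE_HTML.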
Tips and Troubleshooting
- If you’re having connection issues, double-check your proxy credentials and make sure you’re using the correct endpoint and port for your chosen product.
- The proxy password format (your_password_session-anychars_mode-speed) includes additional parameters. Make sure to replace your_password with your actual password while keeping the session-anychars_mode-speed part intact.
- For HTTPS connections, you might need to disable SSL verification (use with caution in production):
response = requests.get(url, proxies=proxies, verify=False)
- If you need to use a SOCKS5 proxy, you’ll need to install the requests[socks] extra:
pip install requests[socks]
Then update your proxy URL to use the SOCKS5 protocol:
proxy_url = f"socks5://{proxy_username}:{proxy_password}@{proxy_host}:{proxy_port}"
- Remember to handle errors appropriately in your production code, as network requests can fail for various reasons.
- Be respectful of websites’ robots.txt files and implement proper rate limiting to avoid getting your IP blocked.
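The last two tips can be sketched with the standard library alone. The example below is one possible approach, not a prescribed one: allowed_by_robots and fetch_with_retries are hypothetical helper names, and the retry count and delays are arbitrary defaults. The retry helper catches OSError by default, which also covers requests.RequestException because that class derives from it.

```python
import time
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, user_agent, url):
    """Check a URL against the text of a robots.txt file (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def fetch_with_retries(fetch, url, attempts=3, base_delay=1.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: let the caller see the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    print(allowed_by_robots(rules, "my-scraper", "https://example.com/private/page"))
    print(allowed_by_robots(rules, "my-scraper", "https://example.com/public"))
```

With requests, the fetch argument could be something like lambda u: requests.get(u, proxies=proxies, timeout=10). The robots.txt text would first need to be downloaded from the target site.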
By following this guide, you should now be able to successfully integrate Evomi’s proxies with Beautiful Soup for your web scraping tasks.