Beautiful Soup

This guide shows how to use Evomi’s proxies with Beautiful Soup, a popular Python library for parsing HTML and XML documents that is widely used for web scraping.

Prerequisites

Before you begin, ensure you have the following:

  1. Python installed on your system
  2. Beautiful Soup and requests libraries installed
  3. Your Evomi proxy credentials (username and password)

Installation

If you haven’t already installed Beautiful Soup and requests, you can do so using pip:

pip install beautifulsoup4 requests

Configuration

To use Evomi proxies with Beautiful Soup, we’ll use the requests library to handle the HTTP requests. Here’s a basic setup:

import requests
from bs4 import BeautifulSoup

# Evomi proxy configuration
proxy_host = "rp.evomi.com"
proxy_port = "1000"
proxy_username = "your_username"
proxy_password = "your_password_session-anychars_mode-speed"

# Proxy URL
proxy_url = f"http://{proxy_username}:{proxy_password}@{proxy_host}:{proxy_port}"

# Proxy dictionary for requests
proxies = {
    "http": proxy_url,
    "https": proxy_url
}

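# Make a request through the proxy; the timeout keeps an unresponsive proxy from hanging the script
url = "https://ip.evomi.com/s"
response = requests.get(url, proxies=proxies, timeout=10)
response.raise_for_status()  # Raise an exception on 4xx/5xx responses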

# Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')

# Print the IP address (which should be the proxy's IP)
print(soup.get_text().strip())

Replace your_username with your actual Evomi proxy username and your_password with your actual password, keeping the _session-anychars_mode-speed suffix in place.
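
Hardcoding credentials in a script makes them easy to leak through version control. As a minimal sketch, the values can be read from environment variables instead. The variable names EVOMI_USERNAME and EVOMI_PASSWORD and the helper load_credentials are assumptions made for this example, not something Evomi defines:

```python
import os

# Parameters appended to the account password, as in the script above.
PASSWORD_SUFFIX = "_session-anychars_mode-speed"

def load_credentials(env=os.environ):
    """Return (username, password_with_suffix) read from environment variables.

    EVOMI_USERNAME and EVOMI_PASSWORD are hypothetical names chosen for this sketch.
    """
    try:
        username = env["EVOMI_USERNAME"]
        password = env["EVOMI_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None
    return username, password + PASSWORD_SUFFIX
```

Calling proxy_username, proxy_password = load_credentials() can then stand in for the two hardcoded assignments in the script.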

Explanation

Let’s break down the key parts of this script:

  1. We import the necessary libraries: requests for making HTTP requests and BeautifulSoup for parsing HTML.
  2. We set up the proxy configuration with the Evomi proxy details.
  3. We create a proxy URL string that includes the authentication details.
  4. We create a proxies dictionary that requests will use to route traffic through the proxy.
  5. We make a GET request to https://ip.evomi.com/s using the configured proxy.
  6. We create a BeautifulSoup object from the response content.
  7. We print the text content of the page, which should show the IP address of the proxy we’re using.
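
Step 3 embeds the credentials directly in the proxy URL. If a username or password contains characters such as @, : or /, the URL can be misparsed, so it is safer to percent-encode them first. The sketch below uses urllib.parse.quote from the standard library; build_proxy_url is a hypothetical helper name, and the default scheme is an assumption:

```python
from urllib.parse import quote

def build_proxy_url(username, password, host, port, scheme="http"):
    """Build a proxy URL with percent-encoded credentials (hypothetical helper)."""
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"{scheme}://{user}:{secret}@{host}:{port}"

proxy_url = build_proxy_url(
    "your_username",
    "your_password_session-anychars_mode-speed",
    "rp.evomi.com",
    1000,
)
proxies = {"http": proxy_url, "https": proxy_url}
```

Letters, digits, underscores, and hyphens pass through unchanged, so the placeholder credentials above produce the same URL as the original script.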

Evomi Proxy Endpoints

Depending on the Evomi product you’re using, you’ll need to adjust the proxy host and port in your code. Here are the endpoints for different Evomi products:

Residential Proxies

  • HTTP Proxy: rp.evomi.com:1000
  • HTTPS Proxy: rp.evomi.com:1001
  • SOCKS5 Proxy: rp.evomi.com:1002

Mobile Proxies

  • HTTP Proxy: mp.evomi.com:3000
  • HTTPS Proxy: mp.evomi.com:3001
  • SOCKS5 Proxy: mp.evomi.com:3002

Datacenter Proxies

  • HTTP Proxy: dcp.evomi.com:2000
  • HTTPS Proxy: dcp.evomi.com:2001
  • SOCKS5 Proxy: dcp.evomi.com:2002

To use a different product or protocol, simply replace the proxy_host and proxy_port variables in your code with the appropriate endpoint and port from the list above.
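
One way to avoid editing host and port by hand is to keep the endpoints in a lookup table. The hosts and ports below are copied from the lists above; the product keys ("residential", "mobile", "datacenter") and the get_endpoint helper are naming choices made for this sketch:

```python
# Endpoints copied from the lists above: product -> (host, {protocol: port}).
ENDPOINTS = {
    "residential": ("rp.evomi.com", {"http": 1000, "https": 1001, "socks5": 1002}),
    "mobile": ("mp.evomi.com", {"http": 3000, "https": 3001, "socks5": 3002}),
    "datacenter": ("dcp.evomi.com", {"http": 2000, "https": 2001, "socks5": 2002}),
}

def get_endpoint(product, protocol="http"):
    """Return (host, port) for a product/protocol pair (hypothetical helper)."""
    try:
        host, ports = ENDPOINTS[product]
        return host, ports[protocol]
    except KeyError:
        raise ValueError(
            f"Unknown product or protocol: {product!r}, {protocol!r}"
        ) from None

proxy_host, proxy_port = get_endpoint("mobile", "socks5")
print(proxy_host, proxy_port)  # mp.evomi.com 3002
```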

Advanced Usage

Here’s an example of how to use Beautiful Soup with Evomi proxies to scrape a website:

import requests
from bs4 import BeautifulSoup

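def get_soup(url, proxies):
    # Fetch the page through the proxy and fail early on HTTP errors
    response = requests.get(url, proxies=proxies, timeout=10)
    response.raise_for_status()
    return BeautifulSoup(response.content, 'html.parser')

# Evomi proxy configuration
proxy_url = "http://your_username:your_password_session-anychars_mode-speed@rp.evomi.com:1000"
proxies = {"http": proxy_url, "https": proxy_url}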

# Example: Scraping Python.org
url = "https://www.python.org"
soup = get_soup(url, proxies)

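# Find all 'a' tags with class 'reference internal'
# (adjust the class to match the markup of the page you are scraping)
links = soup.find_all('a', class_='reference internal')

# Print the text and href of each link
for link in links[:5]:  # Limiting to first 5 for brevity
    # .get() returns None instead of raising KeyError when a tag has no href
    print(f"Text: {link.get_text(strip=True)}, URL: {link.get('href')}")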

This script demonstrates how to use Beautiful Soup to parse and extract information from a web page while using an Evomi proxy.
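
Scraped href values are often relative (for example /about/ or downloads/), so they usually need to be resolved against the page URL before they can be requested. A minimal sketch using urllib.parse.urljoin from the standard library follows; absolute_links is a hypothetical helper, and the sample hrefs are made up for illustration:

```python
from urllib.parse import urljoin

def absolute_links(page_url, hrefs):
    """Resolve hrefs against page_url, skipping missing (None) or empty values."""
    return [urljoin(page_url, href) for href in hrefs if href]

sample_hrefs = ["/about/", "downloads/", None, "https://docs.python.org/3/"]
for link_url in absolute_links("https://www.python.org/", sample_hrefs):
    print(link_url)
```

With the script above, the input list could be built as [link.get('href') for link in links].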

Tips and Troubleshooting

  • If you’re having connection issues, double-check your proxy credentials and make sure you’re using the correct endpoint and port for your chosen product.
  • The proxy password format (your_password_session-anychars_mode-speed) includes additional parameters. Make sure to replace your_password with your actual password while keeping the session-anychars_mode-speed part intact.
  • If HTTPS requests fail with certificate errors, you can disable SSL verification while debugging. This removes protection against tampered or intercepted responses, so avoid it in production:
    response = requests.get(url, proxies=proxies, verify=False)
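  • If you need to use a SOCKS5 proxy, install the requests[socks] extra (the quotes keep shells such as zsh from expanding the brackets):
    pip install "requests[socks]"
    Then set proxy_port to the SOCKS5 port for your product (for example, 1002 for residential proxies) and update your proxy URL to use the SOCKS5 protocol:
    proxy_url = f"socks5://{proxy_username}:{proxy_password}@{proxy_host}:{proxy_port}"
    Use socks5h:// instead of socks5:// if you want hostnames resolved by the proxy rather than on your machine.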
  • Remember to handle errors appropriately in your production code, as network requests can fail for various reasons.
  • Be respectful of websites’ robots.txt files and implement proper rate limiting to avoid getting your IP blocked.
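
The last two tips can be combined in a small wrapper that retries failed requests with exponential backoff, which also spaces out repeated attempts. This is a sketch rather than a complete rate limiter: fetch_with_retries and its parameters are hypothetical, and the fetch callable is passed in so the same wrapper works with requests.get or any other function:

```python
import time

def fetch_with_retries(fetch, url, attempts=3, base_delay=1.0,
                       exceptions=(Exception,), sleep=time.sleep):
    """Call fetch(url); on failure wait base_delay * 2**n seconds and retry.

    Re-raises the last exception once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except exceptions:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)

# Example with requests, assuming `proxies` is defined as in the Configuration section:
#
#   def fetch(url):
#       response = requests.get(url, proxies=proxies, timeout=10)
#       response.raise_for_status()
#       return response
#
#   response = fetch_with_retries(
#       fetch, "https://ip.evomi.com/s",
#       exceptions=(requests.exceptions.RequestException,),
#   )
```

Narrowing exceptions to requests.exceptions.RequestException, as in the commented example, keeps programming errors from being retried silently.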

By following this guide, you should now be able to successfully integrate Evomi’s proxies with Beautiful Soup for your web scraping tasks.