Roundproxies Blog

Why your Cloudflare Bypass always fails
Fri, 30 Jan 2026 01:19:55 GMT

Everyone's been lying to you about Cloudflare.

Not maliciously. They just don't understand what they're dealing with.

Every guide says the same thing: rotate your proxies, randomize your user-agent, add some delays. Maybe throw in a headless browser if you're feeling fancy.


Then you try it. And you're still getting blocked.

I spent months figuring out why. What I found changed everything I thought I knew about bypassing anti-bot systems.

The answer wasn't in my headers. It wasn't in my proxies. It was happening before my HTTP request even reached the server.

The Handshake That Betrays You

Here's what nobody tells you about Cloudflare:

Your scraper fails during the TLS handshake.

Before your beautifully crafted headers arrive. Before your residential IP impresses anyone. Before any of your careful work matters.

When your client initiates an HTTPS connection, it sends a "ClientHello" message. This message contains your TLS version, cipher suites, extensions, and elliptic curves.

Cloudflare hashes these values into something called a JA3 fingerprint.

Python's requests library produces a JA3 hash that looks nothing like Chrome's. Your scraper announces itself as a bot before sending a single byte of actual data.

It's like showing up to a costume party. You've spent hours on your outfit. Perfect mask. Perfect shoes. But your car is parked out front with "I'M NOT ACTUALLY INVITED" spray-painted on the side.

That's what your TLS fingerprint does.
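Concretely, a JA3 hash is nothing more than an MD5 over those ClientHello fields, joined in a fixed order. A minimal sketch — the field values below are illustrative, not a real capture:

```python
import hashlib

# Illustrative ClientHello fields as decimal IDs (a real capture differs)
tls_version = "771"            # TLS 1.2 on the wire
ciphers = "4865-4866-4867"     # offered cipher suites, in order
extensions = "0-23-65281-10"   # extension IDs, in order sent
curves = "29-23-24"            # supported elliptic curves
point_formats = "0"

# JA3: comma-join the five fields, then MD5 the result
ja3_string = ",".join([tls_version, ciphers, extensions, curves, point_formats])
ja3_hash = hashlib.md5(ja3_string.encode()).hexdigest()

print(ja3_string)
print(ja3_hash)
```

Because the field order and contents come from your TLS library, not your headers, two clients sending identical HTTP requests can still hash completely differently.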

The Five-Layer Problem

Cloudflare doesn't have one detection system. It has five. Maybe more.

Each layer feeds into a bot score between 1 and 99. Every single request gets scored.

Layer 1: TLS Fingerprinting

Your JA3/JA4 hash reveals your client implementation. Chrome, Firefox, Safari, and Python all produce different fingerprints. Cloudflare maintains a database of legitimate browser signatures.

Miss this layer? Game over before you start.

Layer 2: IP Reputation

Not just "is this a datacenter IP." Cloudflare's v8 machine learning model now classifies 17 million unique residential proxy IPs every hour.

They're specifically training on residential proxy traffic patterns.

That "just use residential proxies" advice? Increasingly worthless.

Layer 3: HTTP/2 Fingerprinting

Browsers send HTTP/2 settings in a specific order with specific values. Your HTTP client probably doesn't match.

This happens after TLS but before your actual request.

Another opportunity to fail.
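To make this concrete, here's a sketch comparing a client's HTTP/2 SETTINGS frame against Chrome-like values. The numbers are approximate and drift between Chrome releases — treat them as an assumption, not a reference:

```python
# Chrome-like HTTP/2 SETTINGS values (approximate; varies by version)
CHROME_SETTINGS = {
    "HEADER_TABLE_SIZE": 65536,
    "ENABLE_PUSH": 0,
    "INITIAL_WINDOW_SIZE": 6291456,
    "MAX_HEADER_LIST_SIZE": 262144,
}

def settings_mismatches(client_settings):
    """Return the keys where a client's SETTINGS differ from Chrome's."""
    return [k for k, v in CHROME_SETTINGS.items()
            if client_settings.get(k) != v]

# A typical Python HTTP/2 library announces very different defaults
python_client = {"HEADER_TABLE_SIZE": 4096, "ENABLE_PUSH": 1,
                 "INITIAL_WINDOW_SIZE": 65535}
print(settings_mismatches(python_client))
```

Every key mismatches here — and that's before the server even looks at the order the settings arrived in, which browsers also keep fixed.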

Layer 4: JavaScript Detection

Cloudflare injects scripts that check for navigator.webdriver, canvas fingerprints, WebGL data, installed plugins, and timezone information.

Headless browsers often expose automation markers. Missing browser APIs. Wrong property values. Timing differences in how scripts execute.

Layer 5: Behavioral Analysis

Request frequency. Mouse movements. Click patterns. Navigation flow.

Bots request pages too fast. They skip CSS and images. They follow unnatural paths.

Even with perfect fingerprinting, requesting 50 pages per second gets you flagged.

Why "Rotate Proxies" Is Dead Advice

The standard guidance sounds reasonable: rotate through proxy IPs so Cloudflare can't track you.

Here's the problem.

Cloudflare correlates signals across requests. Same JA3 hash from different IPs? That's a bot moving between proxies. Same request patterns from different IPs? Coordinated automation.

Their machine learning doesn't need to see the same IP twice.

It recognizes your fingerprint across your entire proxy pool.

Worse: residential proxy providers route traffic through networks that Cloudflare specifically monitors. The v8 model I mentioned earlier? It's designed to catch exactly this pattern.

You're paying premium prices to use IPs that are already flagged.

The Consistency Problem Nobody Solves

Here's what actually matters: consistency across layers.

Your TLS fingerprint must match your User-Agent header. Your HTTP/2 settings must match your claimed browser. Your JavaScript environment must pass the checks your fingerprint claims you can pass.

One mismatch kills everything.

Consider this scenario:

You use curl_cffi to spoof a Chrome TLS fingerprint. Smart move. But you send Firefox headers. Or your HTTP/2 settings don't match Chrome's defaults.

Cloudflare sees a Chrome TLS handshake followed by Firefox behavior.

Instant flag.

The same applies to session persistence. Say you solve a Cloudflare challenge and get a cf_clearance cookie. That cookie is bound to your session's fingerprint.

Use it with a different TLS client? Invalid.

Use it from a different IP? Potentially invalid.

Send it with different headers? Invalid.

This is why so many bypass attempts fail intermittently. The solution worked once. Then something subtle changed. The session fingerprint no longer matches.
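One defensive pattern is to pin the whole identity as a single unit at session start and refuse to send anything that drifts from it. A sketch — the field names are illustrative, not any library's API:

```python
# Pin the cross-layer identity once per session; never vary it mid-session.
SESSION_IDENTITY = {
    "tls_profile": "chrome",       # e.g. a curl_cffi impersonate target
    "user_agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/120.0.0.0 Safari/537.36"),
    "proxy": "203.0.113.7:8080",   # documentation-range IP, placeholder
}

def check_identity(request_identity):
    """Raise if any layer of an outgoing request drifts from the pinned session."""
    drift = [k for k, v in SESSION_IDENTITY.items()
             if request_identity.get(k) != v]
    if drift:
        raise ValueError(f"fingerprint drift on: {drift}")

check_identity(dict(SESSION_IDENTITY))  # consistent -> passes silently
```

The point isn't the helper itself — it's that "same identity" has to mean the same TLS profile, the same headers, and the same exit IP simultaneously, for every request in the session.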


What Actually Works

Let me be direct about the options.

Browser Automation With Stealth

Tools like Puppeteer or Playwright launch real browsers. Real browsers produce real fingerprints. The TLS handshake is authentic because it's Chrome actually doing it.

The tradeoff: massive resource consumption.

A real browser uses 10-50x more CPU and memory than an HTTP request. For large-scale scraping, this gets expensive fast.

Add puppeteer-extra-plugin-stealth or equivalent. It patches obvious automation markers like navigator.webdriver. Without it, Cloudflare's JavaScript detection catches you.

But stealth plugins aren't magic. They hide surface-level markers. Advanced fingerprinting still catches subtle differences in how automated browsers behave.

TLS-Impersonating HTTP Clients

Libraries like curl_cffi and tls-client wrap browser-like TLS implementations.

They let you make HTTP requests that produce authentic JA3 fingerprints without running a full browser.

This works for lighter Cloudflare protection. Bot Fight Mode. Super Bot Fight Mode sometimes.

It fails against Enterprise Bot Management. The JavaScript detection layer requires actual JavaScript execution. HTTP clients can't provide that.

FlareSolverr and Similar Tools

FlareSolverr runs Selenium with undetected-chromedriver to solve Cloudflare challenges. You send requests through it as a proxy.

It works. Sometimes.

The "sometimes" matters. Cloudflare continuously updates detection. Tools that work today break tomorrow. The cat-and-mouse game never ends.

Know Which Protection You're Facing

This part is critical.

Cloudflare offers different protection tiers:

  • Bot Fight Mode (free plans)
  • Super Bot Fight Mode (Pro/Business)
  • Enterprise Bot Management

The techniques that bypass one tier fail against another.

Bot Fight Mode might fall to TLS impersonation alone. Enterprise Bot Management requires full browser automation with behavioral mimicry.

Figure out what you're facing before choosing your approach.
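There's no official API that tells you the tier, but the block response gives hints. A rough classifier — `cf-mitigated: challenge` is a real Cloudflare header on challenge responses; the rest is heuristic:

```python
def classify_cloudflare_response(status, headers):
    """Rough guess at what Cloudflare did with a request (heuristic only)."""
    h = {k.lower(): v.lower() for k, v in headers.items()}
    if h.get("cf-mitigated") == "challenge":
        return "challenge"      # a JS/managed challenge page was served
    if status in (403, 503) and "cloudflare" in h.get("server", ""):
        return "blocked"        # hard block, no challenge offered
    if "cf-ray" in h:
        return "passed"         # reached the origin through Cloudflare
    return "not-cloudflare"

print(classify_cloudflare_response(403, {"Server": "cloudflare", "CF-RAY": "abc"}))  # "blocked"
```

Frequent challenges suggest a tier you can still solve with browser automation; unconditional 403s from day one suggest your fingerprint or IP pool is already burned.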

The Uncomfortable Truth

Cloudflare protects about 20% of the web. Their systems process hundreds of billions of requests daily. Their machine learning trains on this data continuously.

They're getting better faster than most bypass techniques improve.

This isn't a problem you solve once.

The solution you build today will need maintenance. Updates. Adaptation. Cloudflare pushes changes constantly. Detection methods that worked last month might fail next week.

If you need reliable, long-term access to Cloudflare-protected sites, you have three realistic options:

  1. Build and maintain a sophisticated bypass system. Budget for ongoing development. Expect failures. Plan for recovery.
  2. Use official APIs when available. Many sites offer APIs for legitimate data access. It's less glamorous but vastly more reliable.
  3. Accept the resource cost of browser automation. Real browsers with proper fingerprinting have the highest success rate. Pay for the compute.

There's no magic trick. No single library that just works. No proxy provider that guarantees success.

Anyone selling you a simple solution either doesn't understand the problem or doesn't care if their solution actually works for you.

Practical Starting Points

If you're going to try this anyway, here's where to start.

Check your TLS fingerprint first.

Visit https://tls.peet.ws/api/all with your scraper. Compare the JA3 hash against known browser signatures.

If it doesn't match a browser, everything else is wasted effort.
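You can see your own fingerprint from plain Python: urllib uses Python's ssl defaults, so the JA3 reported back is exactly the non-browser hash Cloudflare would see. The field names here assume the service's current JSON layout:

```python
import json
from urllib.request import urlopen

def my_ja3(url="https://tls.peet.ws/api/all"):
    """Fetch this client's TLS fingerprint as the server observed it."""
    with urlopen(url, timeout=10) as resp:
        data = json.load(resp)
    return data.get("tls", {}).get("ja3_hash")

# print(my_ja3())  # compare against a real browser visiting the same URL
```

Run it once with urllib and once through your actual scraping stack; if the two hashes match each other but not a browser's, TLS is your blocker.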

For Python, start with curl_cffi:

from curl_cffi import requests

response = requests.get(
    "https://target-site.com",
    impersonate="chrome"
)

This produces a Chrome-like TLS fingerprint. Good for testing whether TLS is your primary blocker.

For browser automation, use stealth plugins:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());

const browser = await puppeteer.launch({ headless: 'new' });
const page = await browser.newPage();
await page.goto('https://target-site.com');

Better fingerprinting but higher resource cost.

Add human-like delays:

import random
import time

delay = random.uniform(2.0, 5.0)
time.sleep(delay)

Consistent timing patterns expose automation. Randomize everything.

Maintain session consistency:

Keep the same proxy IP, TLS fingerprint, and headers throughout a session. Switching mid-session invalidates solved challenges.

The Real Lesson

Cloudflare bypass isn't about tricking one system.

It's about maintaining perfect consistency across five or more detection layers that all feed into a single bot score.

Miss any layer and you fail.

Most guides focus on one technique because it's easier to explain. Easier to write a tutorial. Easier to sell a product.

But Cloudflare doesn't use one technique.

They use machine learning trained on global traffic. Heuristics updated constantly. JavaScript challenges that evolve. Behavioral analysis that learns patterns.

The guides that promise easy solutions aren't lying on purpose. They're just describing a fight against yesterday's defenses.

Cloudflare moved on. You need to as well.

Key Takeaways

The TLS handshake reveals your client before HTTP requests arrive. Fix this first or nothing else matters.

Cloudflare uses multiple detection layers. Success requires consistency across all of them.

Residential proxies are increasingly detected. Machine learning specifically targets these networks now.

Browser automation has the highest success rate but costs significantly more resources.

Session fingerprints must remain consistent. Changing IP, TLS client, or headers mid-session invalidates solved challenges.

No solution is permanent. Cloudflare updates continuously. Budget for ongoing maintenance.

Web Scraping with Gologin tutorial [2026]
Tue, 27 Jan 2026 22:37:15 GMT

Ever spent hours building the perfect scraper only to watch it get blocked within minutes? That's the frustrating reality of scraping protected websites in 2026.

Modern anti-bot systems don't just check your IP address. They analyze browser fingerprints, detect automation patterns, and flag anything that doesn't look like a genuine user.

Gologin solves this problem by letting you create multiple browser profiles, each with unique fingerprints that pass detection. If you're scraping sites protected by Cloudflare, DataDome, or PerimeterX, this tool can be a game-changer.

In this guide, I'll show you how to set up Gologin for web scraping. You'll learn to integrate it with Selenium, Playwright, and Puppeteer—complete with working code examples.

What is Gologin?

Gologin is an antidetect browser designed for managing multiple browser profiles with unique digital fingerprints. Each profile appears as a completely different user to websites, making it nearly impossible for anti-bot systems to connect your scraping activities.

In practice, Gologin lets you:

  • Create hundreds of isolated browser profiles
  • Spoof browser fingerprints including canvas, WebGL, and audio
  • Integrate residential or datacenter proxies per profile
  • Connect to automation tools via Selenium, Playwright, or Puppeteer
  • Run profiles locally or through Gologin cloud infrastructure
  • Save and restore session data including cookies and localStorage

Gologin is popular among web scrapers, affiliate marketers, and anyone managing multiple online accounts. Founded in 2019, it's become one of the most accessible antidetect browsers thanks to its Python SDK and competitive pricing.

The key differentiator? Gologin uses its custom Orbita browser engine built on Chromium. This engine is specifically designed to resist fingerprinting detection—something standard Chrome with automation flags simply can't match.

Why use Gologin for web scraping?

Standard scraping approaches fail against modern protection for one simple reason: they look automated.

Even with rotating proxies and random user agents, sites detect patterns in how your browser behaves. Canvas fingerprinting, WebGL hashes, and navigator properties all reveal automation.

Gologin addresses this at the browser level. Instead of trying to mask an automated browser, you're running what appears to be a legitimate user's browser.

Here's when Gologin makes sense:

Sites with aggressive bot protection. Cloudflare, Akamai, and PerimeterX all analyze browser fingerprints. Gologin profiles pass these checks consistently.

Multi-account operations. Need to scrape from multiple logged-in accounts? Each Gologin profile maintains separate cookies and sessions.

Long-running scraping sessions. Standard automation gets flagged over time. Gologin profiles maintain realistic fingerprints across sessions.

Sites requiring human-like behavior. Some targets need mouse movements, scrolling, and interaction patterns. Gologin combined with automation tools handles this well.

The tradeoff is complexity and cost. For simple targets without bot protection, requests or basic Selenium works fine. Gologin shines when those approaches fail.

Gologin vs other approaches

Before committing to Gologin, consider how it compares to alternatives:

Approach | Best For | Fingerprint Protection | Cost | Complexity
Gologin | Protected sites, multi-account | Excellent | $24-199/mo | Medium
Selenium + undetected-chromedriver | Moderate protection | Good | Free | Low
Playwright stealth | JS-heavy sites | Moderate | Free | Low
Residential proxies only | IP-based blocking | None | $10-50/GB | Low
Multilogin | Enterprise, maximum stealth | Excellent | $99-399/mo | Medium
Kameleo | Mobile fingerprints | Excellent | $59-199/mo | Medium

Choose Gologin when:

  • Free tools like undetected-chromedriver still get blocked
  • You need multiple persistent browser profiles
  • You want a balance of capability and cost
  • Python is your primary language

Consider alternatives when:

  • You're scraping unprotected sites (use requests or Scrapy)
  • Budget is extremely limited (try undetected-chromedriver first)
  • You need enterprise features (Multilogin offers more)

Getting started: Installation and setup

Prerequisites

Before starting, ensure you have:

  • Python 3.8 or higher
  • A Gologin account (free tier available)
  • Your Gologin API token

Step 1: Create a Gologin account

Head to gologin.com and sign up. The free plan includes 3 browser profiles—enough to test your scraping workflow.

After registration, you'll get a 7-day trial of premium features with access to 1,000 profiles.

Step 2: Get your API token

Your API token authenticates Python scripts with Gologin's service.

  1. Log into Gologin
  2. Navigate to Settings → API
  3. Click "New Token"
  4. Copy and save the token securely

Never commit this token to version control. Use environment variables instead.

Step 3: Install the Python SDK

Install Gologin's official package:

pip install gologin

For Selenium integration, also install:

pip install selenium webdriver-manager

For Playwright:

pip install playwright
playwright install chromium

Step 4: Verify installation

Create a test script to confirm everything works:

import os
from gologin import GoLogin

# Load token from environment variable
token = os.environ.get('GL_API_TOKEN')

gl = GoLogin({
    'token': token
})

# Create a test profile
profile = gl.create({
    'name': 'Test Profile',
    'os': 'win',
    'navigator': {
        'language': 'en-US',
        'platform': 'Win32'
    }
})

print(f"Created profile: {profile['id']}")

Run with your token:

GL_API_TOKEN=your_token_here python test_gologin.py

If you see a profile ID, you're ready to scrape.

Gologin core concepts

Before writing scrapers, understand these key concepts:

Browser profiles

A profile is an isolated browser environment with its own fingerprint, cookies, and settings. Think of each profile as a separate person's computer.

Profiles persist across sessions. Close the browser, start it later, and you're still "logged in" with the same cookies and history.

Browser fingerprints

A fingerprint is the unique combination of browser properties websites use to identify you. This includes:

  • User agent string
  • Screen resolution
  • Installed fonts
  • Canvas rendering patterns
  • WebGL capabilities
  • Audio context properties
  • Hardware concurrency
  • Device memory

Gologin generates realistic fingerprints for each profile. The values come from real device databases, not random generation.

Orbita browser

Orbita is Gologin's custom browser engine based on Chromium. It's modified to prevent the fingerprint leakage that occurs in standard Chrome automation.

When you connect via Selenium or Playwright, you're controlling an Orbita instance—not regular Chrome.

Local vs cloud profiles

Local profiles run on your machine. The browser opens visibly (or headless) and you control it directly.

Cloud profiles run on Gologin's servers. You connect via WebSocket and control a remote browser. This is useful for scaling or when you can't run browsers locally.

Debugger address

When you start a profile, Gologin returns a debugger address like 127.0.0.1:35421. This is the Chrome DevTools Protocol endpoint your automation tools connect to.

Your first Gologin scraper

Let's build a complete scraper that extracts product data from a protected eCommerce site.

Step 1: Create a browser profile

First, create a profile configured for scraping:

import os
from gologin import GoLogin

token = os.environ.get('GL_API_TOKEN')

gl = GoLogin({
    'token': token
})

# Create profile with scraping-optimized settings
profile = gl.create({
    'name': 'Scraper Profile 1',
    'os': 'win',
    'navigator': {
        'language': 'en-US',
        'platform': 'Win32',
        'userAgent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    },
    'proxyEnabled': False,  # We'll add proxy later
    'webRTC': {
        'mode': 'alerted',
        'enabled': True
    }
})

profile_id = profile['id']
print(f"Profile created: {profile_id}")

Save the profile ID—you'll use it to launch the browser.

Step 2: Start the profile and connect Selenium

Now connect Selenium to the running Gologin profile:

import os
import time
from gologin import GoLogin
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

token = os.environ.get('GL_API_TOKEN')
profile_id = 'your_profile_id_here'

# Initialize GoLogin
gl = GoLogin({
    'token': token,
    'profile_id': profile_id
})

# Start the browser and get debugger address
debugger_address = gl.start()
print(f"Browser started at: {debugger_address}")

# Connect Selenium to the running browser
chrome_options = Options()
chrome_options.add_experimental_option('debuggerAddress', debugger_address)

# Get matching ChromeDriver version
chromium_version = gl.get_chromium_version()
service = Service(
    ChromeDriverManager(driver_version=chromium_version).install()
)

driver = webdriver.Chrome(service=service, options=chrome_options)

Step 3: Scrape the target site

With Selenium connected, scrape like normal:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Navigate to target
driver.get('https://example-store.com/products')

# Wait for products to load
wait = WebDriverWait(driver, 10)
products = wait.until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.product-card'))
)

# Extract data
results = []
for product in products:
    name = product.find_element(By.CSS_SELECTOR, '.product-name').text
    price = product.find_element(By.CSS_SELECTOR, '.product-price').text
    
    results.append({
        'name': name,
        'price': price
    })

print(f"Scraped {len(results)} products")

Step 4: Clean up properly

Always stop the profile when finished:

# Close Selenium
driver.quit()

# Wait briefly for clean shutdown
time.sleep(2)

# Stop the GoLogin profile
gl.stop()
print("Profile stopped successfully")

Failing to stop profiles leaves orphan processes and can consume your account limits.
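A try/finally wrapper makes the stop unconditional, even when the scrape raises mid-run. This is a pure-Python sketch around the SDK calls used above:

```python
def run_with_profile(gl, work):
    """Start a GoLogin profile, run work(debugger_address), always stop it."""
    debugger_address = gl.start()
    try:
        return work(debugger_address)
    finally:
        gl.stop()   # runs on success, exception, or early return
```

Here `gl` is the initialized GoLogin instance from Step 2, and `work` is any callable that attaches Selenium (or Playwright) to the debugger address.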

Selenium integration

For production scrapers, wrap Gologin in a reusable class:

import os
import time
from gologin import GoLogin
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager


class GoLoginScraper:
    """Wrapper for GoLogin + Selenium scraping."""
    
    def __init__(self, profile_id, token=None):
        self.token = token or os.environ.get('GL_API_TOKEN')
        self.profile_id = profile_id
        self.driver = None
        self.gl = None
    
    def start(self):
        """Start browser and return Selenium driver."""
        self.gl = GoLogin({
            'token': self.token,
            'profile_id': self.profile_id
        })
        
        debugger_address = self.gl.start()
        
        chrome_options = Options()
        chrome_options.add_experimental_option(
            'debuggerAddress', 
            debugger_address
        )
        
        chromium_version = self.gl.get_chromium_version()
        service = Service(
            ChromeDriverManager(driver_version=chromium_version).install()
        )
        
        self.driver = webdriver.Chrome(
            service=service, 
            options=chrome_options
        )
        
        return self.driver
    
    def stop(self):
        """Clean shutdown of browser and profile."""
        if self.driver:
            self.driver.quit()
            time.sleep(1)
        
        if self.gl:
            self.gl.stop()
    
    def __enter__(self):
        return self.start()
    
    def __exit__(self, exc_type, exc_val, exc_tb):
        self.stop()
        return False


# Usage with context manager
with GoLoginScraper('your_profile_id') as driver:
    driver.get('https://example.com')
    print(driver.title)
# Profile automatically stopped

Handling dynamic content

Many protected sites load content via JavaScript. Wait for elements properly:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException


def scrape_with_waits(driver, url):
    """Scrape page with proper wait handling."""
    driver.get(url)
    
    wait = WebDriverWait(driver, 15)
    
    try:
        # Wait for specific element indicating page loaded
        wait.until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, '[data-loaded="true"]')
            )
        )
    except TimeoutException:
        print("Page load timeout - continuing anyway")
    
    # Additional wait for any lazy-loaded content
    time.sleep(2)
    
    return driver.page_source

Running multiple profiles concurrently

Scale your scraping with multiple profiles:

from concurrent.futures import ThreadPoolExecutor, as_completed


def scrape_url(profile_id, url):
    """Scrape single URL with dedicated profile."""
    with GoLoginScraper(profile_id) as driver:
        driver.get(url)
        title = driver.title
        return {'url': url, 'title': title}


# Define your profiles and URLs
profiles = ['profile_1', 'profile_2', 'profile_3']
urls = [
    'https://site1.com',
    'https://site2.com', 
    'https://site3.com'
]

# Map URLs to profiles
tasks = list(zip(profiles, urls))

# Run concurrently
results = []
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = {
        executor.submit(scrape_url, pid, url): (pid, url)
        for pid, url in tasks
    }
    
    for future in as_completed(futures):
        try:
            result = future.result()
            results.append(result)
        except Exception as e:
            print(f"Error: {e}")

print(f"Completed {len(results)} scrapes")

Playwright integration

Playwright offers better performance and auto-waiting compared to Selenium. Here's how to integrate it with Gologin:

import os
from gologin import GoLogin
from playwright.sync_api import sync_playwright


def scrape_with_playwright(profile_id, url):
    """Use Playwright with GoLogin profile."""
    token = os.environ.get('GL_API_TOKEN')
    
    gl = GoLogin({
        'token': token,
        'profile_id': profile_id
    })
    
    # Start profile and get the CDP debugger address
    debugger_address = gl.start()
    
    with sync_playwright() as p:
        # Connect to running GoLogin browser
        browser = p.chromium.connect_over_cdp(
            f"http://{debugger_address}"
        )
        
        # Use existing context (preserves cookies)
        context = browser.contexts[0]
        page = context.pages[0] if context.pages else context.new_page()
        
        # Scrape
        page.goto(url)
        page.wait_for_load_state('networkidle')
        
        title = page.title()
        content = page.content()
        
        browser.close()
    
    gl.stop()
    
    return {'title': title, 'html': content}

Async Playwright for better performance

For high-volume scraping, use async Playwright:

import asyncio
import os
from gologin import GoLogin
from playwright.async_api import async_playwright


async def async_scrape(profile_id, url):
    """Async Playwright scraping with GoLogin."""
    token = os.environ.get('GL_API_TOKEN')
    
    gl = GoLogin({
        'token': token,
        'profile_id': profile_id
    })
    
    debugger_address = gl.start()
    
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(
            f"http://{debugger_address}"
        )
        
        context = browser.contexts[0]
        page = await context.new_page()
        
        await page.goto(url)
        await page.wait_for_load_state('networkidle')
        
        data = await page.evaluate('''() => {
            return {
                title: document.title,
                h1: document.querySelector('h1')?.innerText
            }
        }''')
        
        await browser.close()
    
    gl.stop()
    
    return data


# Run async scraping
async def main():
    result = await async_scrape('your_profile_id', 'https://example.com')
    print(result)

asyncio.run(main())

Puppeteer integration

For Node.js projects, connect Puppeteer to Gologin:

const puppeteer = require('puppeteer-core');

async function scrapeWithGoLogin(profileId, token, url) {
    // GoLogin cloud browser endpoint
    const wsEndpoint = `wss://cloudbrowser.gologin.com/connect?token=${token}&profile=${profileId}`;
    
    const browser = await puppeteer.connect({
        browserWSEndpoint: wsEndpoint,
        defaultViewport: null
    });
    
    const page = await browser.newPage();
    
    await page.goto(url, { 
        waitUntil: 'networkidle2',
        timeout: 30000 
    });
    
    const data = await page.evaluate(() => ({
        title: document.title,
        url: window.location.href
    }));
    
    await browser.close();
    
    return data;
}

// Usage
const token = process.env.GL_API_TOKEN;
const profileId = 'your_profile_id';

scrapeWithGoLogin(profileId, token, 'https://example.com')
    .then(console.log)
    .catch(console.error);

The cloud browser approach is useful when you can't install the GoLogin desktop app.

Configuring proxies in Gologin

Proxies are essential for avoiding IP-based blocks. Gologin supports several proxy types.

Adding a proxy to a profile

from gologin import GoLogin

gl = GoLogin({'token': token})

# Create profile with proxy
profile = gl.create({
    'name': 'Proxied Profile',
    'os': 'win',
    'proxy': {
        'mode': 'http',
        'host': '192.168.1.1',
        'port': 8080,
        'username': 'user',
        'password': 'pass'
    },
    'proxyEnabled': True
})

Using GoLogin's built-in proxies

GoLogin offers free proxies for testing. Add them programmatically:

# Add GoLogin proxy to existing profile
gl.addGologinProxyToProfile(profile_id, 'us')  # US proxy
gl.addGologinProxyToProfile(profile_id, 'uk')  # UK proxy

Available country codes include: us, uk, de, fr, ca, au, and more.

Rotating proxies per session

For large-scale scraping, rotate proxies:

import random


def get_random_proxy(proxy_list):
    """Select random proxy from list."""
    proxy = random.choice(proxy_list)
    return {
        'mode': 'http',
        'host': proxy['host'],
        'port': proxy['port'],
        'username': proxy.get('username', ''),
        'password': proxy.get('password', '')
    }


def create_profile_with_rotation(gl, proxy_list):
    """Create profile with random proxy."""
    proxy = get_random_proxy(proxy_list)
    
    profile = gl.create({
        'name': 'Rotated Profile',
        'os': 'win',
        'proxy': proxy,
        'proxyEnabled': True
    })
    
    return profile

Advanced techniques

Headless mode

Run profiles without visible browser windows:

gl = GoLogin({
    'token': token,
    'profile_id': profile_id,
    'extra_params': ['--headless=new']
})

debugger_address = gl.start()

Note: Some sites detect headless mode. Test thoroughly before production use.

Persisting sessions across runs

Gologin automatically saves cookies and localStorage. To ensure persistence:

# Start profile - previous session data loads automatically
gl = GoLogin({
    'token': token,
    'profile_id': profile_id
})

debugger_address = gl.start()

# ... do work ...

# Stop profile - session data persists for next run
gl.stop()

Custom fingerprint parameters

Override specific fingerprint values:

profile = gl.create({
    'name': 'Custom Fingerprint',
    'os': 'mac',
    'navigator': {
        'userAgent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...',
        'resolution': '1920x1080',
        'language': 'en-US',
        'platform': 'MacIntel',
        'hardwareConcurrency': 8,
        'deviceMemory': 8,
        'maxTouchPoints': 0
    },
    'webGL': {
        'mode': 'noise',  # Adds noise to WebGL fingerprint
    },
    'canvas': {
        'mode': 'noise'  # Adds noise to canvas fingerprint
    }
})

Handling CAPTCHAs

Gologin reduces CAPTCHA frequency but doesn't eliminate them. For sites that still trigger CAPTCHAs:

  1. Slow down requests. Add delays between page loads.
  2. Simulate human behavior. Random scrolling and mouse movements help.
  3. Use residential proxies. Datacenter IPs trigger more CAPTCHAs.
  4. Integrate CAPTCHA solvers. Services like 2Captcha work with any browser automation.

Helper snippets for the first two points:

import random
import time


def human_like_delay():
    """Random delay mimicking human behavior."""
    time.sleep(random.uniform(1.5, 4.0))


def random_scroll(driver):
    """Scroll randomly like a human would."""
    scroll_amount = random.randint(300, 700)
    driver.execute_script(f"window.scrollBy(0, {scroll_amount})")
    time.sleep(random.uniform(0.5, 1.5))

Common errors and troubleshooting

"Profile not found"

Cause: Invalid profile ID or profile was deleted.

Fix: List your profiles and verify the ID:

profiles = gl.getProfiles()
for p in profiles:
    print(f"{p['id']}: {p['name']}")

"Token invalid or expired"

Cause: API token was revoked or incorrectly copied.

Fix: Generate a new token from Settings → API in Gologin dashboard.

"Connection refused" when connecting Selenium

Cause: Browser didn't start or wrong debugger address.

Fix: Ensure gl.start() completed successfully and use the returned address:

address = gl.start()
print(f"Connect to: {address}")  # Should show host:port

ChromeDriver version mismatch

Cause: Gologin Orbita version doesn't match your ChromeDriver.

Fix: Use the version from Gologin:

chromium_version = gl.get_chromium_version()
service = Service(
    ChromeDriverManager(driver_version=chromium_version).install()
)

Profile takes too long to start

Cause: Large profile with many cookies or slow network.

Fix: Create fresh profiles periodically and clear unnecessary data:

# Delete old profile
gl.delete(old_profile_id)

# Create clean replacement
new_profile = gl.create({...})

Best practices

Rotate profiles for large jobs

Don't hammer one profile. Spread requests across multiple:

def get_profile_for_request(request_num, profiles):
    """Round-robin profile selection."""
    index = request_num % len(profiles)
    return profiles[index]
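To see the rotation in action, here's a quick self-contained run with made-up profile IDs (the IDs are placeholders, not real Gologin profiles):

```python
def get_profile_for_request(request_num, profiles):
    """Round-robin profile selection."""
    return profiles[request_num % len(profiles)]


# Placeholder profile IDs -- substitute real Gologin profile IDs
profiles = ['profile-a', 'profile-b', 'profile-c']

assignments = [get_profile_for_request(i, profiles) for i in range(6)]
print(assignments)
# → ['profile-a', 'profile-b', 'profile-c', 'profile-a', 'profile-b', 'profile-c']
```

Each request lands on the next profile in order, so no single profile takes the whole load.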

Implement polite scraping delays

Even with fingerprint protection, aggressive scraping gets noticed:

import random
import time


def polite_request(driver, url, min_delay=2, max_delay=5):
    """Request with random delay."""
    driver.get(url)
    delay = random.uniform(min_delay, max_delay)
    time.sleep(delay)
    return driver.page_source

Monitor profile health

Track success rates per profile:

from collections import defaultdict

profile_stats = defaultdict(lambda: {'success': 0, 'failed': 0})


def record_result(profile_id, success):
    if success:
        profile_stats[profile_id]['success'] += 1
    else:
        profile_stats[profile_id]['failed'] += 1


def get_healthy_profiles(min_success_rate=0.8):
    """Return profiles with good success rates."""
    healthy = []
    
    for pid, stats in profile_stats.items():
        total = stats['success'] + stats['failed']
        if total > 10:  # Minimum sample size
            rate = stats['success'] / total
            if rate >= min_success_rate:
                healthy.append(pid)
    
    return healthy
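The helpers above can be sanity-checked with synthetic results before wiring them into a real scraper — the profile names here are invented for illustration:

```python
from collections import defaultdict

profile_stats = defaultdict(lambda: {'success': 0, 'failed': 0})


def record_result(profile_id, success):
    key = 'success' if success else 'failed'
    profile_stats[profile_id][key] += 1


def get_healthy_profiles(min_success_rate=0.8):
    healthy = []
    for pid, stats in profile_stats.items():
        total = stats['success'] + stats['failed']
        if total > 10:  # Minimum sample size
            if stats['success'] / total >= min_success_rate:
                healthy.append(pid)
    return healthy


# Synthetic data: "good" succeeds 11/12 times, "bad" only 2/12
for _ in range(11):
    record_result('good', True)
record_result('good', False)
for _ in range(2):
    record_result('bad', True)
for _ in range(10):
    record_result('bad', False)

print(get_healthy_profiles())  # → ['good']
```

Only "good" clears the 80% threshold, so it alone would keep receiving traffic.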

Handle failures gracefully

Implement retry logic with exponential backoff:

import time
from functools import wraps


def retry_on_failure(max_retries=3, base_delay=1):
    """Decorator for retry logic."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    if attempt == max_retries - 1:
                        raise
                    delay = base_delay * (2 ** attempt)
                    print(f"Attempt {attempt + 1} failed: {e}")
                    print(f"Retrying in {delay}s...")
                    time.sleep(delay)
        return wrapper
    return decorator


@retry_on_failure(max_retries=3)
def scrape_page(driver, url):
    driver.get(url)
    return driver.page_source

Save data incrementally

Don't wait until the end to save results:

import json


def save_result(data, filename='results.jsonl'):
    """Append single result to file."""
    with open(filename, 'a') as f:
        f.write(json.dumps(data) + '\n')


# In your scraping loop
for url in urls:
    result = scrape_page(driver, url)
    save_result(result)  # Saved immediately

Gologin pricing

Gologin offers several pricing tiers (as of 2026):

Plan         | Profiles | Team Members | Price (Monthly) | Price (Annual)
Forever Free | 3        | 1            | $0              | $0
Professional | 100      | 1            | $79             | $24/mo
Business     | 300      | 1            | $119            | $59/mo
Enterprise   | 1,000    | 1            | $299            | $149/mo
Custom       | 2,000+   | 1            | $499+           | $224+

All paid plans include:

  • API access for automation
  • Cloud browser profiles
  • Proxy management
  • Profile sharing
  • Priority support

The free plan works for testing but limits you to 3 profiles with no API access. For serious scraping, the Professional plan at $24/month (annual) offers good value.

FAQs

Is Gologin legal?

Gologin itself is legal. The legality depends on what you scrape and how.

Scraping public data is generally legal, but violating a site's terms of service can have consequences. Always check the target site's ToS and robots.txt.

Can Gologin bypass all anti-bot protection?

No tool bypasses everything. Gologin handles most protection including Cloudflare's JavaScript challenges, but extremely aggressive systems may still detect automated patterns.

Combine Gologin with realistic behavior patterns and quality residential proxies for best results.

How many profiles can I run simultaneously?

This depends on your hardware. Each profile is a separate browser process.

On a typical 8GB RAM machine, expect to run 3-5 profiles comfortably. For more, use cloud profiles or scale horizontally across machines.

Does Gologin work with headless mode?

Yes, but with caveats. Some sites specifically detect headless browsers. Gologin's Orbita browser resists some headless detection, but test thoroughly.

For maximum stealth, run headed (visible) browsers when possible.

Can I use my existing proxies with Gologin?

Yes. Gologin supports HTTP, HTTPS, and SOCKS5 proxies. You can configure proxies per profile or use Gologin's built-in free proxies.

For best results, residential proxies outperform datacenter proxies on protected sites.

Conclusion

Gologin fills an important gap in the web scraping toolkit. When standard approaches fail against fingerprint-based detection, it provides a reliable way to make your scrapers appear as legitimate users.

The key takeaways:

  • Use Gologin when simpler tools get blocked
  • Create separate profiles for different scraping jobs
  • Combine with quality proxies for IP rotation
  • Implement polite delays and human-like behavior
  • Monitor profile health and rotate as needed

Start with the free tier to test against your target sites. If Gologin consistently bypasses their protection, upgrade to Professional for API access and more profiles.

For sites without aggressive protection, stick with simpler tools. Gologin adds complexity—only use it when you need to.

]]>
<![CDATA[How to bypass Cloudflare with Selenium (2026 easy guide)]]>https://roundproxies.com/blog/selenium-cloudflare-bypass/6978a2b126f439f88a95b571Tue, 27 Jan 2026 11:54:39 GMTYour scraper works perfectly on test pages. Then you point it at a real website and hit the dreaded Cloudflare challenge page.

Cloudflare protects over 20% of all websites. If you're building any kind of automation or data collection tool, you'll encounter it constantly.

SeleniumBase UC Mode offers the most reliable free solution for bypassing Cloudflare's Turnstile CAPTCHA in 2026. Unlike deprecated tools like puppeteer-stealth, it's actively maintained and handles the tricky parts automatically.

In this guide, I'll show you how to use SeleniumBase UC Mode to bypass Cloudflare protection. You'll get working code examples, troubleshooting tips, and a ready-to-use tool from GitHub.

What is Cloudflare Turnstile and why does it block scrapers?

Cloudflare Turnstile is Cloudflare's CAPTCHA replacement introduced in 2022. It runs invisible challenges in the background to detect automated traffic without disrupting human users.

Unlike traditional image-based CAPTCHAs, Turnstile doesn't require users to identify traffic lights or crosswalks. Instead, it uses browser fingerprinting and behavioral analysis to determine if a visitor is human.

This makes Turnstile harder to bypass than older CAPTCHAs. You can't just send an image to a solving service. The system requires actual browser execution and realistic interaction patterns.

When you visit a Turnstile-protected site, Cloudflare analyzes multiple signals:

Browser fingerprinting. Cloudflare checks navigator properties, WebGL rendering, canvas fingerprints, and dozens of other browser characteristics. Headless browsers expose telltale signs like navigator.webdriver = true.

JavaScript execution. Turnstile runs cryptographic challenges that require real browser JavaScript execution. Simple HTTP requests fail immediately.

Behavioral analysis. Mouse movements, click patterns, and typing rhythms get analyzed. Bot-like behavior triggers challenges.

IP reputation. Datacenter IPs and known proxy ranges face stricter scrutiny than residential connections.

Standard Selenium fails these checks because ChromeDriver leaves detectable traces. The webdriver property gets set to true, automation-related console variables exist, and the browser fingerprint looks artificial.

Turnstile has three protection modes:

Non-interactive. Runs completely in the background. No checkbox appears if you pass the initial checks.

Invisible. Shows a brief "Verifying you are human" message for 1-2 seconds while running background checks.

Interactive. Requires clicking a checkbox when trust scores are low or suspicious patterns are detected.

What is SeleniumBase UC Mode?

SeleniumBase UC Mode (Undetected ChromeDriver Mode) patches Selenium to evade bot detection. It's built on top of undetected-chromedriver but adds critical improvements.

UC Mode works through three primary techniques:

First, it renames Chrome DevTools Console variables that anti-bot systems scan for. Standard Selenium exposes automation indicators that Cloudflare specifically looks for.

Second, UC Mode launches Chrome browsers before attaching ChromeDriver. Normal Selenium does the reverse, creating detectable browser configurations.

Third, it disconnects ChromeDriver during sensitive actions like page loads and button clicks. Websites typically check for automation during these specific events.

The result is a browser that passes most fingerprint checks and Turnstile challenges that would block standard Selenium.

SeleniumBase UC Mode vs alternatives

Why use SeleniumBase over other options?

vs raw undetected-chromedriver: SeleniumBase handles driver downloads and version matching automatically. It includes specialized CAPTCHA-handling methods that undetected-chromedriver lacks. The SB manager format provides context management and cleanup.

vs Puppeteer Stealth: As of February 2025, puppeteer-stealth is no longer maintained. The original maintainer stopped updates, and Cloudflare has adapted to detect it. SeleniumBase remains actively maintained with regular updates.

vs Nodriver: Nodriver offers slightly better evasion but less stability. SeleniumBase is more mature with better documentation. Choose Nodriver for maximum stealth, SeleniumBase for reliability.

vs Camoufox: Camoufox uses Firefox instead of Chrome, which can bypass Chrome-specific detection. However, some sites work better with Chrome. SeleniumBase offers broader compatibility.

SeleniumBase includes specialized uc_*() methods for handling CAPTCHAs:

sb.uc_open_with_reconnect(url, reconnect_time)  # Opens URL with stealth
sb.uc_gui_click_captcha()  # Clicks Turnstile checkbox via PyAutoGUI
sb.uc_gui_handle_captcha()  # Auto-detects and handles CAPTCHA
sb.uc_click(selector)  # Stealthy element clicking

These methods time the ChromeDriver disconnection precisely to avoid triggering detection.

Quick start: Basic Cloudflare bypass

Let's start with the simplest working example.

Install SeleniumBase:

pip install seleniumbase

Create a basic bypass script:

from seleniumbase import SB

with SB(uc=True) as sb:
    url = "https://www.scrapingcourse.com/cloudflare-challenge"
    
    # Open URL with 4-second reconnect time
    sb.uc_open_with_reconnect(url, reconnect_time=4)
    
    # Handle Turnstile if it appears
    sb.uc_gui_click_captcha()
    
    # Wait for page content
    sb.sleep(2)
    
    # Now you can scrape
    print(sb.get_page_source())

The reconnect_time parameter controls how long ChromeDriver stays disconnected after navigating. This window lets the page load and run initial bot checks without ChromeDriver being attached.

Four seconds works for most sites. Increase it to 6-8 seconds for heavily protected pages.
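Rather than hardcoding the longest window, you can escalate reconnect_time only when shorter windows fail. This is a sketch; open_fn is a hypothetical wrapper you would write around uc_open_with_reconnect() plus a check that the challenge page is gone:

```python
def bypass_with_escalating_reconnect(open_fn, url, reconnect_times=(4, 6, 8)):
    """Try progressively longer reconnect windows until one succeeds.

    open_fn(url, reconnect_time) must return True once the challenge
    is passed -- e.g. a wrapper around sb.uc_open_with_reconnect().
    """
    for rt in reconnect_times:
        if open_fn(url, rt):
            return rt  # The window that worked
    return None  # All attempts failed


# Demo with a fake opener that only "passes" at 6+ seconds
attempts = []

def fake_open(url, rt):
    attempts.append(rt)
    return rt >= 6

print(bypass_with_escalating_reconnect(fake_open, "https://example.com"))  # → 6
```

This keeps fast sites fast while still giving heavily protected pages the longer window they need.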

Method 1: Simple bypass with Driver format

The Driver format gives you raw WebDriver access with UC Mode enabled.

Best for: Quick scripts, testing, simple scraping tasks
Difficulty: Easy
Success rate: High for basic Cloudflare protection

from seleniumbase import Driver

# Initialize driver with UC Mode
driver = Driver(uc=True)

try:
    url = "https://example.com"
    
    # Navigate with stealth
    driver.uc_open_with_reconnect(url, 4)
    
    # Click CAPTCHA if present
    driver.uc_gui_click_captcha()
    
    # Your scraping logic here
    title = driver.title
    print(f"Page title: {title}")
    
finally:
    driver.quit()

Pros:

  • Familiar Selenium syntax
  • Minimal code changes from regular Selenium
  • Full WebDriver access

Cons:

  • No virtual display support on Linux servers
  • Manual cleanup required
  • Less error handling

Use the Driver format when you're prototyping or need direct WebDriver control.

Method 2: Production bypass with SB Manager format

The SB Manager format provides context management and additional features.

Best for: Production scripts, Linux servers, robust applications
Difficulty: Easy
Success rate: High to Very High

from seleniumbase import SB

with SB(uc=True, test=True) as sb:
    url = "https://example.com"
    
    # Open with reconnect
    sb.uc_open_with_reconnect(url, reconnect_time=4)
    
    # Handle CAPTCHA
    sb.uc_gui_click_captcha()
    
    # Wait for content
    sb.sleep(2)
    
    # Scrape data
    content = sb.get_text("body")
    print(content[:500])

The test=True parameter enables additional logging and screenshots on failure.

Pros:

  • Automatic resource cleanup
  • Virtual display support for headless Linux
  • Better error handling
  • More utility methods

Cons:

  • Slightly different syntax than raw Selenium
  • Learning curve for advanced features

This is the recommended format for most use cases.

Method 3: Handling Turnstile on forms

Some sites show Turnstile after form submission, not on page load. This requires a different approach.

Best for: Login forms, checkout pages, any page with embedded Turnstile
Difficulty: Medium
Success rate: High

from seleniumbase import SB

with SB(uc=True, test=True, incognito=True, locale="en") as sb:
    url = "https://example.com/login"
    
    # Open the page
    sb.uc_open_with_reconnect(url)
    
    # Fill the form
    sb.type('input[name="username"]', "your_username")
    sb.type('input[name="password"]', "your_password")
    
    # Click submit with stealthy click and reconnect time
    sb.uc_click('button[type="submit"]', reconnect_time=3.25)
    
    # Handle the CAPTCHA that appears after submission
    sb.uc_gui_click_captcha()
    
    # Wait for redirect
    sb.sleep(2)

The uc_click() method schedules the click, disconnects ChromeDriver, waits, and reconnects. This prevents detection during the critical form submission moment.

Key points for form handling:

  • Use incognito=True to maximize anti-detection
  • Set locale="en" for consistent behavior
  • Adjust reconnect_time based on server response speed
  • Call uc_gui_click_captcha() after the CAPTCHA appears

Method 4: CDP Mode for advanced protection

Some sites have detection that UC Mode alone can't bypass. CDP Mode (Chrome DevTools Protocol Mode) offers stronger evasion.

Best for: Heavily protected sites, sites that detect UC Mode
Difficulty: Hard
Success rate: Very High

from seleniumbase import SB

with SB(uc=True, test=True, locale="en") as sb:
    url = "https://heavily-protected-site.com"
    
    # Activate CDP Mode
    sb.activate_cdp_mode(url)
    
    # Wait for page load
    sb.sleep(2)
    
    # Use CDP methods for interaction
    sb.cdp.gui_click_element("#turnstile-widget div")
    
    # Wait for verification
    sb.sleep(2)
    
    # Continue with scraping
    content = sb.cdp.get_text("body")

CDP Mode maintains a persistent connection to Chrome without the usual WebDriver attachment. This makes detection significantly harder.

For Turnstile challenges in CDP Mode, use sb.cdp.gui_click_element() with the parent selector above the shadow root:

# Find the Turnstile widget container
sb.cdp.gui_click_element("#turnstile-widget div")

Method 5: Using the cloudflare-bypass-2026 tool

The cloudflare-bypass-2026 repository provides a ready-to-use tool built on SeleniumBase UC Mode.

Best for: Quick deployment, cookie harvesting, parallel bypass
Difficulty: Easy
Success rate: High

Installation

git clone https://github.com/1837620622/cloudflare-bypass-2026.git
cd cloudflare-bypass-2026
pip install -r requirements.txt

For Linux servers, run the install script:

sudo bash install_linux.sh

Basic usage

# Simple bypass
python bypass.py https://example.com

# With proxy
python bypass.py https://example.com -p http://127.0.0.1:7890

# With custom timeout
python bypass.py https://example.com -t 60

Parallel mode

For higher throughput, use parallel mode:

# Run 3 browsers simultaneously
python simple_bypass.py https://example.com -P -b 3 -t 60

# With proxy rotation
python simple_bypass.py https://example.com -P -c -b 3 -n 30 -f proxy.txt

Python API usage

from bypass import bypass_cloudflare

result = bypass_cloudflare("https://example.com")

if result["success"]:
    print(f"cf_clearance: {result['cf_clearance']}")
    print(f"User-Agent: {result['user_agent']}")

The tool exports cookies in both JSON and Netscape formats to the output/cookies/ directory.
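If you only have the JSON export and need the Netscape format (for example, to feed curl -b), the conversion is a few lines. This sketch follows the standard Netscape cookie-file layout and assumes Selenium-style cookie dicts; it is not part of the cloudflare-bypass-2026 tool itself:

```python
def to_netscape(cookies):
    """Convert Selenium-style cookie dicts into Netscape cookie-file text."""
    lines = ["# Netscape HTTP Cookie File"]
    for c in cookies:
        domain = c.get("domain", "")
        lines.append("\t".join([
            domain,
            "TRUE" if domain.startswith(".") else "FALSE",  # include subdomains
            c.get("path", "/"),
            "TRUE" if c.get("secure") else "FALSE",
            str(int(c.get("expiry", 0))),  # 0 marks a session cookie
            c["name"],
            c["value"],
        ]))
    return "\n".join(lines) + "\n"


cookies = [{"domain": ".example.com", "path": "/", "secure": True,
            "expiry": 1767225600, "name": "cf_clearance", "value": "abc123"}]
print(to_netscape(cookies))
```

The resulting file can be passed directly to curl with -b cookies.txt.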

Method 6: Using proxies for IP rotation

Cloudflare tracks IP reputation. Rotating proxies helps avoid IP-based blocking.

Best for: Large-scale scraping, avoiding rate limits
Difficulty: Medium
Success rate: Depends on proxy quality

from seleniumbase import SB

# Residential proxy (recommended)
proxy = "http://user:pass@residential-proxy.com:8080"

with SB(uc=True, proxy=proxy) as sb:
    sb.uc_open_with_reconnect("https://example.com", 4)
    sb.uc_gui_click_captcha()
    sb.sleep(2)

Proxy type matters significantly:

Datacenter proxies. Cheap but easily detected. Success rate around 30-50% against Cloudflare.

Residential proxies. Real ISP IPs with higher trust scores. Success rate 70-90%.

Mobile proxies. Highest trust but most expensive. Success rate 85-95%.

For the cloudflare-bypass-2026 tool, create a proxy.txt file:

http://127.0.0.1:7890
socks5://127.0.0.1:1080
http://user:pass@host:port

Then use proxy rotation:

python simple_bypass.py https://example.com -r -f proxy.txt

Handling cf_clearance cookies

Successful Cloudflare bypass generates a cf_clearance cookie. This cookie acts as a pass-through ticket for subsequent requests.

The cf_clearance cookie typically remains valid for 15 minutes to several hours. Validity depends on:

  • Site configuration
  • Your behavioral patterns
  • IP consistency
  • Browser fingerprint consistency

Reusing cookies

Extract and reuse cookies to avoid repeated bypasses:

from seleniumbase import SB
import json

with SB(uc=True) as sb:
    sb.uc_open_with_reconnect("https://example.com", 4)
    sb.uc_gui_click_captcha()
    sb.sleep(2)
    
    # Extract cookies
    cookies = sb.get_cookies()
    
    # Find cf_clearance
    cf_clearance = None
    for cookie in cookies:
        if cookie['name'] == 'cf_clearance':
            cf_clearance = cookie['value']
            break
    
    # Save for later use
    with open('cookies.json', 'w') as f:
        json.dump(cookies, f)

Use saved cookies with requests:

import requests
import json

with open('cookies.json', 'r') as f:
    cookies_list = json.load(f)

# Convert to requests format
session = requests.Session()
for cookie in cookies_list:
    session.cookies.set(cookie['name'], cookie['value'])

# The User-Agent must exactly match the browser that generated the cookie
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}

response = session.get("https://example.com/api/data", headers=headers)

Linux server configuration

Running UC Mode on Linux servers requires special setup because GUI methods need a display.

Install dependencies

# Install Xvfb for virtual display
sudo apt-get update
sudo apt-get install -y xvfb

# Install Chrome dependencies
sudo apt-get install -y \
    libglib2.0-0 libnss3 libatk1.0-0 libatk-bridge2.0-0 \
    libcups2 libdrm2 libxkbcommon0 libgbm1 libasound2

# Install Chrome
wget -q https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb
sudo apt-get install -f -y

# Install Python dependencies
pip install seleniumbase pyvirtualdisplay

Using xvfb mode

SeleniumBase automatically handles virtual display with the xvfb parameter:

from seleniumbase import SB

with SB(uc=True, xvfb=True) as sb:
    sb.uc_open_with_reconnect("https://example.com", 4)
    sb.uc_gui_click_captcha()

The xvfb=True parameter creates a virtual display that allows PyAutoGUI to function on headless servers.

Set display resolution:

with SB(uc=True, xvfb=True, xvfb_metrics="1920,1080") as sb:
    sb.set_window_size(1920, 1080)
    # ...

Troubleshooting common issues

CAPTCHA not clicking

Symptom: uc_gui_click_captcha() runs but the checkbox isn't clicked.

Fixes:

  1. Update SeleniumBase to the latest version:
pip install --upgrade seleniumbase
  2. Use uc_gui_click_captcha() instead of uc_gui_handle_captcha() for stricter implementations.
  3. Try explicit coordinates if auto-detection fails:
sb.uc_gui_click_x_y(x, y)
  4. Check if the CAPTCHA is in a different position. Some sites right-align the checkbox:
# Debug: take a screenshot to see CAPTCHA position
sb.save_screenshot("debug.png")

"X11 display failed" error

Symptom: Error about display or X11 on Linux.

Fix: Install Xvfb and use xvfb=True:

sudo apt-get install -y xvfb
with SB(uc=True, xvfb=True) as sb:
    # ...

PyAutoGUI screen size mismatch

Symptom: Error like "PyAutoGUI cannot click on point (x, y) outside screen"

Fix: Set explicit xvfb metrics:

with SB(uc=True, xvfb=True, xvfb_metrics="1920,1080") as sb:
    sb.set_window_size(1920, 1080)
    # ...

Blocked after first request

Symptom: First request works, subsequent requests get blocked.

Fixes:

  1. Reuse the same browser session instead of creating new ones.
  2. Maintain consistent User-Agent and cookies across requests.
  3. Add delays between requests:
sb.sleep(2)  # Wait between actions
  4. Avoid rapid-fire clicking. Add small delays between interactions:
sb.sleep(random.uniform(0.3, 0.8))

Turnstile loops endlessly

Symptom: Turnstile keeps showing "Verifying..." but never completes.

Fixes:

  1. Don't use headless mode. UC Mode is detectable in headless mode.
  2. Use incognito mode:
with SB(uc=True, incognito=True) as sb:
    # ...
  3. Switch to CDP Mode for better evasion:
sb.activate_cdp_mode(url)
  4. Check your IP reputation. Try a different network or proxy.

Proxy not working

Symptom: Connection errors or immediate blocking with proxy.

Fixes:

  1. Test proxy connectivity first:
curl -x http://proxy:port https://httpbin.org/ip
  2. Use residential proxies. Most public proxies don't support HTTPS tunneling.
  3. Ensure proxy format is correct:
proxy = "http://user:pass@host:port"  # Correct
proxy = "host:port"  # May not work
  4. Try SOCKS5 format for better compatibility:
proxy = "socks5://host:port"

WebDriver detected despite UC Mode

Symptom: Site shows "Automated browser detected" or similar.

Fixes:

  1. Enable incognito mode:
with SB(uc=True, incognito=True) as sb:
  2. Switch to CDP Mode:
sb.activate_cdp_mode(url)
  3. Verify no other extensions or flags are leaking automation signals.
  4. Try a fresh Chrome profile:
with SB(uc=True, user_data_dir=None) as sb:

Which method should you use?

Your choice depends on your specific situation.

Situation               | Recommended Method
Quick testing           | Method 1: Driver format
Production scripts      | Method 2: SB Manager
Form submission         | Method 3: Form handling
Heavily protected sites | Method 4: CDP Mode
Cookie harvesting       | Method 5: cloudflare-bypass-2026
Large-scale scraping    | Method 6: Proxy rotation

Start with Method 2 (SB Manager format). It works for most sites and handles cleanup automatically.

If you're getting blocked, try Method 4 (CDP Mode) before adding proxies. CDP Mode has stronger evasion capabilities and often succeeds where regular UC Mode fails.

For production deployments, combine Method 2 with Method 6 (proxy rotation) for best results. Residential proxies significantly improve success rates.

Decision flowchart

Is the site Cloudflare-protected?
├── No → Use regular Selenium
└── Yes → Does basic UC Mode work?
    ├── Yes → Use Method 2 (SB Manager)
    └── No → Does CDP Mode work?
        ├── Yes → Use Method 4
        └── No → Add proxy rotation (Method 6)
            └── Still blocked? → Try residential proxies

Performance considerations

Each method has different resource requirements:

UC Mode (Methods 1-3): Uses approximately 200-400MB RAM per browser instance. Suitable for moderate-scale scraping.

CDP Mode (Method 4): Slightly lower memory usage but requires more careful session management.

Parallel mode (Method 5): Memory scales linearly with browser count. Three browsers need approximately 600-1200MB.

For high-volume scraping, consider harvesting cf_clearance cookies with browser methods, then using them with lightweight HTTP requests.

Best practices for reliable bypass

Following these practices improves your success rate significantly.

Timing and delays

Don't rush through pages. Real users don't click instantly after page load.

from seleniumbase import SB
import random

with SB(uc=True) as sb:
    sb.uc_open_with_reconnect(url, 4)
    
    # Random delay mimics human behavior
    sb.sleep(random.uniform(1.5, 3.0))
    
    sb.uc_gui_click_captcha()
    
    # Wait between actions
    sb.sleep(random.uniform(0.5, 1.5))

Randomized delays between 1-3 seconds make your automation less predictable.

Session consistency

Keep fingerprints consistent within a session. Changing User-Agent or other browser properties mid-session raises red flags.

# Use the same browser instance for related requests
with SB(uc=True) as sb:
    sb.uc_open_with_reconnect("https://example.com", 4)
    sb.uc_gui_click_captcha()
    
    # Stay in the same session
    sb.click("a.next-page")  # Internal navigation
    sb.sleep(2)
    
    sb.click("a.next-page")  # Same session

Error handling

Cloudflare bypass isn't 100% reliable. Build retry logic into your scripts.

from seleniumbase import SB
import time

def bypass_with_retry(url, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            with SB(uc=True) as sb:
                sb.uc_open_with_reconnect(url, 4)
                sb.uc_gui_click_captcha()
                sb.sleep(2)
                
                # Verify we passed the challenge
                if "challenge" not in sb.get_current_url().lower():
                    return sb.get_page_source()
                    
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            time.sleep(5)  # Wait before retry
    
    raise Exception("Failed after all attempts")

Rate limiting yourself

Even successful bypasses can lead to blocks if you scrape too aggressively.

import time
from collections import deque

class RateLimiter:
    def __init__(self, requests_per_minute=10):
        self.requests_per_minute = requests_per_minute
        self.timestamps = deque()
    
    def wait(self):
        now = time.time()
        
        # Remove timestamps older than 1 minute
        while self.timestamps and now - self.timestamps[0] > 60:
            self.timestamps.popleft()
        
        if len(self.timestamps) >= self.requests_per_minute:
            sleep_time = 60 - (now - self.timestamps[0])
            time.sleep(sleep_time)
        
        self.timestamps.append(time.time())

# Usage
limiter = RateLimiter(requests_per_minute=10)

for url in urls:
    limiter.wait()
    # Make request...

Ethical considerations

Bypassing Cloudflare protection should be done responsibly.

Do:

  • Scrape public data for legitimate purposes
  • Respect rate limits even when you can bypass them
  • Cache data to minimize requests
  • Identify yourself with a contact email when appropriate

Don't:

  • Scrape personal or private data without consent
  • Overload servers with requests
  • Bypass protections on sensitive sites (financial, healthcare, government)
  • Resell scraped data without proper rights

Check the website's robots.txt and Terms of Service. Many sites make public data viewable but still prohibit automated access in their ToS.

Frequently asked questions

Can SeleniumBase bypass all Cloudflare protection?

No tool bypasses Cloudflare 100% of the time. SeleniumBase UC Mode works reliably against basic and intermediate Cloudflare protection. For enterprise-level protection with per-customer ML models, you may need to combine it with residential proxies or try CDP Mode.

Success rates depend on the specific site configuration, your IP reputation, and Cloudflare's current detection methods. Expect 70-90% success on most sites.

Does SeleniumBase work in headless mode?

UC Mode is detectable in true headless mode. Cloudflare specifically checks for headless browser indicators.

Instead, use xvfb=True on Linux servers to run a headed browser in a virtual display. This maintains the stealth benefits while running without a physical screen.

# Correct approach for servers
with SB(uc=True, xvfb=True) as sb:
    # ...

How long does the cf_clearance cookie last?

The cf_clearance cookie validity varies from 15 minutes to several hours. Most commonly, it lasts 30 minutes to 2 hours.

Factors affecting validity:

  • Site configuration
  • Your behavioral patterns
  • IP consistency
  • Session activity

For long-running scrapers, implement cookie refresh logic that re-bypasses before expiration.

Is bypassing Cloudflare legal?

The legality depends on your jurisdiction, the website's terms of service, and what you do with the data.

Generally, scraping publicly available data for personal use, research, or journalism is legal in most jurisdictions. However, violating a website's Terms of Service could have civil consequences even if not criminal.

Key legal considerations:

  • CFAA in the United States
  • GDPR in Europe for personal data
  • Local computer access laws

Consult legal counsel for commercial applications.

Why don't simple HTTP requests work against Cloudflare?

Cloudflare requires JavaScript execution to verify visitors. Simple HTTP libraries like requests or curl can't execute JavaScript.

The Turnstile challenge runs cryptographic proofs in the browser. Without executing that JavaScript, you can't generate the valid response token.

Browser automation tools like SeleniumBase execute JavaScript, allowing them to complete these challenges.

SeleniumBase vs undetected-chromedriver: Which is better?

SeleniumBase is generally better for most use cases.

SeleniumBase advantages:

  • Automatic driver management
  • Built-in CAPTCHA handling methods
  • Context management (SB manager)
  • Virtual display support
  • Better documentation
  • Active maintenance

undetected-chromedriver advantages:

  • Lighter weight
  • Simpler if you need raw control
  • No extra abstractions

Use SeleniumBase unless you have specific reasons to need raw undetected-chromedriver.

How do I handle multiple Cloudflare challenges on one site?

Some sites have Turnstile on both the landing page and after form submission.

from seleniumbase import SB

with SB(uc=True) as sb:
    # First challenge on page load
    sb.uc_open_with_reconnect(url, 4)
    sb.uc_gui_click_captcha()
    sb.sleep(2)
    
    # Fill and submit form
    sb.type('input[name="email"]', "test@example.com")
    sb.uc_click('button[type="submit"]', reconnect_time=3)
    
    # Second challenge after submission
    sb.uc_gui_click_captcha()
    sb.sleep(2)

Call uc_gui_click_captcha() after each point where a challenge might appear.

Conclusion

SeleniumBase UC Mode provides the most reliable free method for bypassing Cloudflare Turnstile in 2026. The combination of ChromeDriver patching, strategic disconnection, and PyAutoGUI-based CAPTCHA clicking handles most protection levels.

Quick summary:

  1. Install SeleniumBase: pip install seleniumbase
  2. Use SB(uc=True) for UC Mode
  3. Open pages with uc_open_with_reconnect(url, 4)
  4. Handle CAPTCHAs with uc_gui_click_captcha()
  5. Switch to CDP Mode for tough cases

For a ready-to-use solution, check out the cloudflare-bypass-2026 repository on GitHub.

Remember that Cloudflare continuously updates their detection. Keep SeleniumBase updated and monitor for changes in bypass effectiveness.

]]>
<![CDATA[How to use XLogin.us for web scraping]]>https://roundproxies.com/blog/xloginus-web-scraping/697874f926f439f88a95b4feTue, 27 Jan 2026 08:25:12 GMTManaging multiple browser profiles for web scraping used to mean juggling virtual machines or getting constantly blocked. XLogin.us changed that for me.

Whether I'm scraping e-commerce prices across regions, collecting data from sites with aggressive anti-bot protection, or running parallel data collection tasks, XLogin.us keeps each session isolated with unique fingerprints.

If you're new to XLogin.us and want to use it for web scraping, you're in the right place. This guide covers everything from installation to building automated scrapers with Selenium.

What is XLogin.us?

XLogin.us is an antidetect browser designed for managing multiple browser profiles, each with a unique digital fingerprint. It creates isolated browsing environments where cookies, local storage, and cache files stay completely separate between profiles.

Unlike regular browsers that expose your real device information, XLogin.us replaces your browser fingerprint with custom values. This includes:

  • Canvas and WebGL fingerprints
  • Audio context fingerprints
  • Screen resolution and color depth
  • Timezone and language settings
  • Hardware concurrency and device memory
  • User agent strings

For web scraping, this means you can run multiple concurrent sessions without websites linking them together or detecting automation patterns.

XLogin.us supports automation through Selenium WebDriver and provides a REST API running on http://127.0.0.1:35000 for programmatic profile management.

Why use XLogin.us for web scraping?

Standard Selenium scrapers get detected fast. Websites check browser fingerprints, and a headless Chrome instance screams "bot" to any anti-bot system.

XLogin.us solves this by making each browser profile appear as a legitimate, unique user.

XLogin.us vs. regular Selenium

Feature | Regular Selenium | Selenium + XLogin.us
Fingerprint consistency | Obvious automation markers | Realistic human fingerprints
Multi-session support | All sessions linked | Each profile is isolated
Proxy integration | Manual per-session config | Built-in per-profile proxies
Cookie persistence | Lost on restart | Saved per profile
Detection resistance | Low | High

When to use XLogin.us

XLogin makes sense when you need to:

  • Scrape sites with anti-bot protection like Cloudflare or DataDome
  • Run multiple accounts or sessions in parallel
  • Maintain persistent sessions across scraping runs
  • Collect data from different geographic regions
  • Avoid IP bans and fingerprint-based blocking

When to skip it

For simple, low-volume scraping of static pages without protection, XLogin adds unnecessary complexity. Use basic Requests + BeautifulSoup instead.

How to install and set up XLogin.us

XLogin currently runs only on Windows. Here's how to get started.

Step 1: Download XLogin.us

Visit xlogin.us and download the installer.

The free trial gives you 3 days with 5 browser profiles, unlimited fingerprints, and full API access.

Step 2: Create an account

Launch XLogin.us and register a new account. You'll need a valid email for verification.

Step 3: Enable browser automation

This step is critical for Selenium integration.

  1. Open XLogin settings (gear icon)
  2. Navigate to "Browser Automation"
  3. Enable "Launch the browser automation port"
  4. Set the port to 35000 (default)
  5. Save settings

Without this, your Python scripts can't connect to XLogin profiles.
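A quick way to confirm the setting took effect is to probe the port before starting a scrape. This is a minimal sketch using only the standard library:

```python
import socket

def automation_port_open(host="127.0.0.1", port=35000, timeout=2.0):
    """Return True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

if __name__ == "__main__":
    if automation_port_open():
        print("XLogin automation port is reachable")
    else:
        print("Port 35000 closed - enable browser automation in XLogin settings")
```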

Step 4: Install Python dependencies

Open your terminal and install the required packages:

pip install selenium requests

For newer Selenium versions, the driver manager handles ChromeDriver automatically. But since we're connecting to XLogin's browser instances, we won't need it.

Creating your first browser profile

Before scraping, you need at least one browser profile.

Manual profile creation

  1. Click "New Browser Profile" in XLogin
  2. Enter a profile name (e.g., "scraper-profile-1")
  3. Choose the browser kernel version (latest Chrome recommended)
  4. Configure basic settings:
    • Operating System: Windows 10/11
    • Screen Resolution: Common values like 1920x1080
    • Language: Match your target site's region
  5. Leave fingerprint settings on "Auto" for realistic values
  6. Click "Create"

Creating profiles via API

For automated setup, use XLogin's REST API:

import requests

# XLogin API endpoint
API_BASE = "http://127.0.0.1:35000/api/v1"

def create_profile(name, proxy_config=None):
    """
    Create a new browser profile via XLogin API.
    
    Args:
        name: Profile display name
        proxy_config: Optional proxy string (type/host/port/user/pass)
    
    Returns:
        Profile ID if successful, None otherwise
    """
    endpoint = f"{API_BASE}/profile/create"
    
    params = {
        "name": name,
        "kernel": "chrome",  # Browser engine
        "os": "win",         # Operating system
    }
    
    if proxy_config:
        params["proxy"] = proxy_config
    
    response = requests.get(endpoint, params=params)
    
    if response.status_code == 200:
        data = response.json()
        if data.get("status") == "OK":
            return data.get("value")
    
    return None

# Create a new profile
profile_id = create_profile("scraper-profile-api")
print(f"Created profile: {profile_id}")

The API returns a unique profile ID that you'll use for all subsequent operations.

Configuring proxies for scraping

Every serious scraping setup needs proxy rotation. XLogin.us lets you assign proxies at the profile level.

Setting proxies in the UI

  1. Select your profile in XLogin.us
  2. Click "Edit Profile"
  3. Scroll to "Proxy Server"
  4. Enable "Use Proxy"
  5. Choose proxy type: HTTP, HTTPS, or SOCKS5
  6. Enter proxy details:
    • Host: proxy.example.com
    • Port: 8080
    • Username and Password (if authenticated)
  7. Click "Check Proxy" to verify connectivity
  8. Save the profile

Setting proxies via API

def create_profile_with_proxy(name, proxy_type, host, port, username=None, password=None):
    """
    Create a profile with proxy configuration.
    
    Args:
        name: Profile name
        proxy_type: http, https, or socks5
        host: Proxy server hostname
        port: Proxy server port
        username: Auth username (optional)
        password: Auth password (optional)
    
    Returns:
        API response JSON (the new profile ID is under "value" on success)
    """
    # Build proxy string: type/host/port/user/pass
    if username and password:
        proxy_string = f"{proxy_type}/{host}/{port}/{username}/{password}"
    else:
        proxy_string = f"{proxy_type}/{host}/{port}"
    
    endpoint = f"{API_BASE}/profile/create_start"
    
    params = {
        "name": name,
        "proxy": proxy_string
    }
    
    response = requests.get(endpoint, params=params)
    return response.json()

# Example: Create profile with residential proxy
result = create_profile_with_proxy(
    name="geo-profile-us",
    proxy_type="http",
    host="us.residential-proxy.com",
    port="8080",
    username="user123",
    password="pass456"
)

Proxy rotation strategy

For large-scale scraping, create multiple profiles with different proxies:

proxies = [
    {"host": "us1.proxy.com", "port": "8080"},
    {"host": "us2.proxy.com", "port": "8080"},
    {"host": "uk1.proxy.com", "port": "8080"},
    {"host": "de1.proxy.com", "port": "8080"},
]

profiles = []
for i, proxy in enumerate(proxies):
    result = create_profile_with_proxy(
        name=f"scraper-{i}",
        proxy_type="http",
        host=proxy["host"],
        port=proxy["port"]
    )
    if result.get("status") == "OK":
        profiles.append(result.get("value"))

print(f"Created {len(profiles)} profiles with different proxies")

Automating XLogin.us with Selenium

Here's where XLogin.us shines. You can connect Selenium to any XLogin profile and control it programmatically.

Understanding the connection flow

  1. Start an XLogin.us profile via API
  2. API returns a WebDriver port (e.g., http://127.0.0.1:XXXXX)
  3. Connect Selenium's Remote WebDriver to that port
  4. Control the browser as normal
  5. Stop the profile when done

Basic Selenium connection

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import requests
import time

API_BASE = "http://127.0.0.1:35000/api/v1"

def start_profile(profile_id):
    """
    Start an XLogin profile and return the WebDriver URL.
    
    Args:
        profile_id: UUID of the profile to launch
    
    Returns:
        WebDriver URL string or None
    """
    endpoint = f"{API_BASE}/profile/start"
    params = {
        "automation": "true",
        "profileId": profile_id
    }
    
    response = requests.get(endpoint, params=params)
    
    if response.status_code == 200:
        data = response.json()
        if data.get("status") == "OK":
            return data.get("value")
    
    return None

def stop_profile(profile_id):
    """Stop a running XLogin profile."""
    endpoint = f"{API_BASE}/profile/stop"
    params = {"profileId": profile_id}
    requests.get(endpoint, params=params)

def connect_selenium(webdriver_url):
    """
    Connect Selenium to an XLogin browser instance.
    
    Args:
        webdriver_url: URL returned by start_profile()
    
    Returns:
        Selenium WebDriver instance
    """
    options = Options()
    
    driver = webdriver.Remote(
        command_executor=webdriver_url,
        options=options
    )
    
    return driver

# Example usage
profile_id = "YOUR-PROFILE-ID-HERE"

# Start the profile
webdriver_url = start_profile(profile_id)
print(f"WebDriver URL: {webdriver_url}")

# Give the browser time to fully launch
time.sleep(3)

# Connect Selenium
driver = connect_selenium(webdriver_url)

# Navigate to a page
driver.get("https://httpbin.org/headers")
print(driver.page_source)

# Clean up
driver.quit()
stop_profile(profile_id)

The key insight: XLogin's browser instance exposes a WebDriver-compatible endpoint. You connect to it exactly like you'd connect to Selenium Grid.

Building a complete web scraper

Let's build a practical scraper that extracts product data from an e-commerce site.

Project structure

xlogin-scraper/
├── config.py          # Profile IDs and settings
├── xlogin_client.py   # XLogin API wrapper
├── scraper.py         # Main scraping logic
└── requirements.txt   # Dependencies

XLogin.us client wrapper

Create a reusable client for XLogin.us operations:

# xlogin_client.py
import requests
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

class XLoginClient:
    """Wrapper for XLogin API and Selenium integration."""
    
    def __init__(self, api_base="http://127.0.0.1:35000/api/v1"):
        self.api_base = api_base
        self.active_profiles = {}
    
    def start_profile(self, profile_id, wait_time=3):
        """
        Start a profile and return a connected WebDriver.
        
        Args:
            profile_id: XLogin profile UUID
            wait_time: Seconds to wait for browser launch
        
        Returns:
            Selenium WebDriver instance
        """
        endpoint = f"{self.api_base}/profile/start"
        params = {
            "automation": "true",
            "profileId": profile_id
        }
        
        response = requests.get(endpoint, params=params)
        data = response.json()
        
        if data.get("status") != "OK":
            raise Exception(f"Failed to start profile: {data}")
        
        webdriver_url = data.get("value")
        time.sleep(wait_time)
        
        options = Options()
        driver = webdriver.Remote(
            command_executor=webdriver_url,
            options=options
        )
        
        self.active_profiles[profile_id] = driver
        return driver
    
    def stop_profile(self, profile_id):
        """Stop a profile and close its WebDriver."""
        if profile_id in self.active_profiles:
            try:
                self.active_profiles[profile_id].quit()
            except Exception:
                pass
            del self.active_profiles[profile_id]
        
        endpoint = f"{self.api_base}/profile/stop"
        params = {"profileId": profile_id}
        requests.get(endpoint, params=params)
    
    def stop_all(self):
        """Stop all active profiles."""
        for profile_id in list(self.active_profiles.keys()):
            self.stop_profile(profile_id)
    
    def get_profile_list(self):
        """Get all available profiles."""
        endpoint = f"{self.api_base}/profile/list"
        response = requests.get(endpoint)
        return response.json()

Main scraper implementation

# scraper.py
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import json
import time
import random

from xlogin_client import XLoginClient

class ProductScraper:
    """Scrape product data using XLogin profiles."""
    
    def __init__(self, profile_id):
        self.client = XLoginClient()
        self.profile_id = profile_id
        self.driver = None
    
    def __enter__(self):
        self.driver = self.client.start_profile(self.profile_id)
        return self
    
    def __exit__(self, exc_type, exc_val, exc_tb):
        self.client.stop_profile(self.profile_id)
    
    def random_delay(self, min_sec=1, max_sec=3):
        """Add human-like delays between actions."""
        time.sleep(random.uniform(min_sec, max_sec))
    
    def wait_for_element(self, by, value, timeout=10):
        """Wait for an element to be present."""
        try:
            element = WebDriverWait(self.driver, timeout).until(
                EC.presence_of_element_located((by, value))
            )
            return element
        except TimeoutException:
            return None
    
    def scrape_product_page(self, url):
        """
        Extract product data from a single page.
        
        Args:
            url: Product page URL
        
        Returns:
            Dictionary with product data
        """
        self.driver.get(url)
        self.random_delay(2, 4)
        
        product = {"url": url}
        
        # Wait for page to load
        self.wait_for_element(By.TAG_NAME, "body")
        
        # Extract title
        try:
            title_elem = self.driver.find_element(By.CSS_SELECTOR, "h1.product-title")
            product["title"] = title_elem.text.strip()
        except Exception:
            product["title"] = None
        
        # Extract price
        try:
            price_elem = self.driver.find_element(By.CSS_SELECTOR, ".price-current")
            product["price"] = price_elem.text.strip()
        except Exception:
            product["price"] = None
        
        # Extract description
        try:
            desc_elem = self.driver.find_element(By.CSS_SELECTOR, ".product-description")
            product["description"] = desc_elem.text.strip()
        except Exception:
            product["description"] = None
        
        # Extract availability
        try:
            avail_elem = self.driver.find_element(By.CSS_SELECTOR, ".stock-status")
            product["in_stock"] = "in stock" in avail_elem.text.lower()
        except Exception:
            product["in_stock"] = None
        
        return product
    
    def scrape_multiple(self, urls):
        """
        Scrape multiple product pages.
        
        Args:
            urls: List of product URLs
        
        Returns:
            List of product dictionaries
        """
        products = []
        
        for i, url in enumerate(urls):
            print(f"Scraping {i+1}/{len(urls)}: {url}")
            
            try:
                product = self.scrape_product_page(url)
                products.append(product)
            except Exception as e:
                print(f"Error scraping {url}: {e}")
                products.append({"url": url, "error": str(e)})
            
            # Random delay between pages
            if i < len(urls) - 1:
                self.random_delay(3, 7)
        
        return products


def main():
    """Run the scraper."""
    profile_id = "YOUR-PROFILE-ID"
    
    urls = [
        "https://example-shop.com/product/1",
        "https://example-shop.com/product/2",
        "https://example-shop.com/product/3",
    ]
    
    with ProductScraper(profile_id) as scraper:
        products = scraper.scrape_multiple(urls)
    
    # Save results
    with open("products.json", "w") as f:
        json.dump(products, f, indent=2)
    
    print(f"Scraped {len(products)} products")


if __name__ == "__main__":
    main()

This scraper uses context managers for clean resource handling and includes human-like delays to avoid detection.

Advanced techniques

Running multiple profiles in parallel

For faster scraping, run several profiles simultaneously:

from concurrent.futures import ThreadPoolExecutor, as_completed
from xlogin_client import XLoginClient

def scrape_with_profile(profile_id, urls):
    """Scrape URLs using a specific profile."""
    client = XLoginClient()
    results = []
    
    try:
        driver = client.start_profile(profile_id)
        
        for url in urls:
            driver.get(url)
            # ... extraction logic ...
            results.append({"url": url, "data": "..."})
        
    finally:
        client.stop_profile(profile_id)
    
    return results

def parallel_scrape(profile_ids, all_urls, max_workers=4):
    """
    Distribute URLs across multiple profiles.
    
    Args:
        profile_ids: List of XLogin profile UUIDs
        all_urls: Complete list of URLs to scrape
        max_workers: Number of concurrent profiles
    
    Returns:
        Combined results from all profiles
    """
    # Split URLs among profiles
    chunks = [[] for _ in profile_ids]
    for i, url in enumerate(all_urls):
        chunks[i % len(profile_ids)].append(url)
    
    all_results = []
    
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {
            executor.submit(scrape_with_profile, pid, chunk): pid
            for pid, chunk in zip(profile_ids, chunks)
            if chunk  # Skip empty chunks
        }
        
        for future in as_completed(futures):
            profile_id = futures[future]
            try:
                results = future.result()
                all_results.extend(results)
                print(f"Profile {profile_id[:8]}... completed {len(results)} URLs")
            except Exception as e:
                print(f"Profile {profile_id[:8]}... failed: {e}")
    
    return all_results

# Usage
profiles = ["profile-1-uuid", "profile-2-uuid", "profile-3-uuid"]
urls = ["https://site.com/page/" + str(i) for i in range(100)]

results = parallel_scrape(profiles, urls, max_workers=3)

Importing and managing cookies

Maintain login sessions by importing cookies:

import base64
import json

def import_cookies(profile_id, cookies):
    """
    Import cookies into an XLogin profile.
    
    Args:
        profile_id: Target profile UUID
        cookies: List of cookie dictionaries
    """
    # XLogin expects base64-encoded JSON
    cookies_json = json.dumps(cookies)
    cookies_b64 = base64.b64encode(cookies_json.encode()).decode()
    
    endpoint = f"{API_BASE}/profile/cookies/import"
    params = {
        "profileId": profile_id,
        "cookies": cookies_b64
    }
    
    response = requests.post(endpoint, data=params)
    return response.json()

def export_cookies(profile_id):
    """Export cookies from a profile."""
    endpoint = f"{API_BASE}/profile/cookies/export"
    params = {"profileId": profile_id}
    
    response = requests.get(endpoint, params=params)
    data = response.json()
    
    if data.get("status") == "OK":
        cookies_b64 = data.get("value")
        cookies_json = base64.b64decode(cookies_b64).decode()
        return json.loads(cookies_json)
    
    return None

# Export cookies after manual login
cookies = export_cookies("logged-in-profile-id")

# Import to a new profile
import_cookies("new-profile-id", cookies)

Batch profile creation

Create many profiles at once for large-scale operations:

def batch_create_profiles(count, name_prefix, proxy_list=None):
    """
    Create multiple profiles with optional proxy rotation.
    
    Args:
        count: Number of profiles to create
        name_prefix: Prefix for profile names
        proxy_list: Optional list of proxy configs
    
    Returns:
        List of created profile IDs
    """
    created = []
    
    for i in range(count):
        name = f"{name_prefix}-{i:03d}"
        
        proxy = None
        if proxy_list:
            proxy = proxy_list[i % len(proxy_list)]
        
        endpoint = f"{API_BASE}/profile/create"
        params = {"name": name}
        
        if proxy:
            params["proxy"] = f"http/{proxy['host']}/{proxy['port']}"
        
        response = requests.get(endpoint, params=params)
        data = response.json()
        
        if data.get("status") == "OK":
            created.append(data.get("value"))
            print(f"Created: {name}")
        else:
            print(f"Failed: {name}")
    
    return created

# Create 10 profiles with rotating proxies
proxies = [
    {"host": "proxy1.com", "port": "8080"},
    {"host": "proxy2.com", "port": "8080"},
]

profile_ids = batch_create_profiles(10, "scraper", proxies)

Common errors and troubleshooting

"Connection refused" on port 35000

Cause: XLogin.us isn't running or automation isn't enabled.

Fix:

  1. Make sure XLogin.us is open
  2. Go to Settings → Browser Automation
  3. Enable "Launch the browser automation port"
  4. Restart XLogin.us

"Profile not found" error

Cause: Invalid profile ID or profile was deleted.

Fix:

# List all profiles to find the correct ID
response = requests.get(f"{API_BASE}/profile/list")
profiles = response.json()
print(json.dumps(profiles, indent=2))

Selenium times out connecting

Cause: Profile didn't fully launch before Selenium tried to connect.

Fix: Increase the wait time after starting the profile:

webdriver_url = start_profile(profile_id)
time.sleep(5)  # Wait longer for slow systems
driver = connect_selenium(webdriver_url)

"WebDriver not reachable" after stopping profile

Cause: Profile was stopped but WebDriver reference wasn't cleaned up.

Fix: Always call driver.quit() before stopping the profile:

try:
    driver.quit()
except Exception:
    pass
finally:
    stop_profile(profile_id)

Profile fingerprint detected

Cause: Some sites use advanced fingerprinting that detects inconsistencies.

Fix:

  1. Use "Auto" fingerprint settings instead of manual
  2. Ensure timezone matches your proxy's location
  3. Set language and locale to match the target region
  4. Keep the browser kernel updated

Best practices

1. One proxy per profile

Never share proxies between profiles. If two profiles use the same IP, sites can link them despite different fingerprints.
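A simple guard for this rule is to validate your profile-to-proxy map before launching anything (a sketch; the proxy-string format follows the examples earlier in this guide):

```python
def validate_proxy_assignments(assignments):
    """Raise if any two profiles share the same proxy.

    assignments: dict mapping profile_id -> proxy string
    """
    seen = {}  # proxy string -> first profile that claimed it
    for profile_id, proxy in assignments.items():
        if proxy in seen:
            raise ValueError(
                f"Proxy {proxy!r} shared by {seen[proxy]} and {profile_id}"
            )
        seen[proxy] = profile_id
```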

2. Match fingerprint to proxy location

If your proxy is in Germany, set the profile's timezone, language, and locale to German settings. Mismatches trigger detection.
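One way to keep the two in sync is a small lookup table keyed by the proxy's country. The entries here are illustrative — extend the table for the regions you actually target:

```python
# Illustrative mapping of proxy country -> matching profile settings
LOCALE_BY_COUNTRY = {
    "de": {"timezone": "Europe/Berlin", "language": "de-DE"},
    "us": {"timezone": "America/New_York", "language": "en-US"},
    "gb": {"timezone": "Europe/London", "language": "en-GB"},
}

def locale_for_proxy(country_code):
    """Return timezone/language settings that match the proxy's country."""
    settings = LOCALE_BY_COUNTRY.get(country_code.lower())
    if settings is None:
        raise ValueError(f"No locale mapping for country {country_code!r}")
    return settings
```

Apply the returned values when creating or editing the profile, so a German proxy never ships with a US timezone.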

3. Add realistic delays

Scraping at machine speed gets you blocked. Add random delays:

import random
time.sleep(random.uniform(2, 5))

4. Rotate user agents occasionally

Even with XLogin's fingerprint protection, rotating user agents across sessions adds another layer:

user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120...",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/119...",
]

5. Handle failures gracefully

Sites go down. Connections fail. Build retry logic:

def scrape_with_retry(url, max_retries=3):
    for attempt in range(max_retries):
        try:
            return scrape_page(url)
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            time.sleep(5 * (attempt + 1))

6. Save cookies regularly

Export cookies after important actions (login, session refresh) to maintain state:

# After successful login
cookies = driver.get_cookies()
save_cookies_to_profile(profile_id, cookies)

7. Keep XLogin.us updated

XLogin.us regularly updates browser kernels and fingerprint databases. Outdated versions get detected more easily.

FAQs

Is XLogin free?

XLogin offers a 3-day free trial with 5 browser profiles and full API access. Paid plans start at $99/month for 200 profiles.

Can XLogin.us bypass Cloudflare?

XLogin.us helps by providing realistic fingerprints, but Cloudflare's advanced challenges may still require additional techniques like residential proxies and human-like behavior patterns.

How many profiles can I run simultaneously?

This depends on your hardware. Each profile consumes RAM and CPU. A typical machine handles 5-10 concurrent profiles comfortably. For more, you'll need beefier specs or distributed setups.

Is using XLogin.us for scraping legal?

XLogin.us itself is legal software. The legality of scraping depends on what you scrape, how you use the data, and your jurisdiction. Always check a site's Terms of Service and relevant laws like GDPR and CFAA.

Can I use XLogin.us with Puppeteer instead of Selenium?

Yes, XLogin.us supports Puppeteer automation. The connection process is similar—you start the profile via API, get the WebSocket URL, and connect Puppeteer to it.

Conclusion

XLogin.us transforms web scraping from a constant battle with detection systems into a manageable operation.

The key workflow is straightforward:

  1. Create profiles with unique fingerprints and proxies
  2. Start profiles via the REST API
  3. Connect Selenium to the WebDriver endpoint
  4. Scrape with human-like behavior
  5. Stop profiles and export cookies for persistence

Start with the free trial to test your use case. Once you've validated that XLogin.us works for your target sites, scale up with more profiles and parallel execution.

]]>
<![CDATA[How to scrape YouTube in 2026: 5 methods (+ working code)]]>https://roundproxies.com/blog/scrape-youtube/69515d5a26f439f88a95a95eSat, 24 Jan 2026 10:56:22 GMTYouTube holds a goldmine of data. Video metadata, engagement metrics, comments, transcripts—it's all there waiting to be extracted for market research, sentiment analysis, or training ML models.

I've spent years building scrapers for YouTube data. Whether I'm tracking trending topics, analyzing competitor channels, or gathering datasets for content recommendation systems, I keep coming back to the same proven methods.

If you want to scrape YouTube and are wondering where to start, you're in the right place.

In this guide, I'll walk you through five different methods to extract YouTube data—from quick metadata grabs to large-scale channel scraping.

What is YouTube Scraping?

YouTube scraping is the process of programmatically extracting data from YouTube pages. This includes video metadata, channel information, comments, transcripts, search results, and engagement metrics.

YouTube relies heavily on JavaScript to render content. This makes traditional HTTP-based scraping challenging.

However, YouTube also exposes hidden JSON endpoints and embeds structured data in its HTML. These provide easier extraction paths than parsing rendered HTML.
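For example, watch and search pages embed a ytInitialData JSON blob in a script tag that you can pull straight out of the raw HTML. This is a sketch — the variable name and layout reflect how YouTube serves pages at the time of writing and can change:

```python
import json
import re

def extract_initial_data(html):
    """Extract the ytInitialData JSON object embedded in a YouTube page."""
    match = re.search(r"var ytInitialData\s*=\s*(\{.*?\});", html, re.DOTALL)
    if not match:
        return None
    return json.loads(match.group(1))

# Example against a minimal stand-in for a real page:
sample = '<script>var ytInitialData = {"contents": {"items": []}};</script>'
print(extract_initial_data(sample))  # {'contents': {'items': []}}
```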

In practice, you can scrape YouTube to:

  • Extract video titles, descriptions, view counts, and like counts
  • Gather comments for sentiment analysis
  • Download transcripts for content analysis
  • Monitor channel growth and posting frequency
  • Track trending topics and keywords
  • Build datasets for machine learning projects

What Data Can You Extract from YouTube?

Before diving into methods, let's clarify what you can actually scrape from YouTube.

Video Data

Field | Description
Title | Video title
Description | Full description text
View count | Total views
Like count | Number of likes
Comment count | Total comments
Duration | Video length
Upload date | When published
Thumbnail URL | Video thumbnail image
Tags | Associated keywords
Category | Content category

Channel Data

Field | Description
Channel name | Display name
Subscriber count | Total subscribers
Video count | Number of uploads
View count | Total channel views
Description | About section
Join date | Channel creation date
Links | External links

Additional Data

  • Comments: Text, author, likes, replies
  • Transcripts: Auto-generated and manual captions
  • Search results: Videos matching keywords
  • Playlists: Video lists and metadata

5 Methods to Scrape YouTube

Let's explore each method with working code examples.

Method 1: yt-dlp Library

Best for: Quick metadata extraction without browser overhead

Difficulty: Easy | Cost: Free | Speed: Fast

yt-dlp is a command-line tool and Python library forked from youtube-dl. It's the fastest way to extract YouTube metadata without rendering JavaScript.

Installation

pip install yt-dlp

Extract Video Metadata

from yt_dlp import YoutubeDL

def get_video_info(video_url):
    """Extract metadata from a YouTube video."""
    
    ydl_opts = {
        'quiet': True,
        'no_warnings': True,
        'extract_flat': False,
    }
    
    with YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(video_url, download=False)
        
    return {
        'title': info.get('title'),
        'description': info.get('description'),
        'view_count': info.get('view_count'),
        'like_count': info.get('like_count'),
        'duration': info.get('duration'),
        'upload_date': info.get('upload_date'),
        'channel': info.get('channel'),
        'channel_id': info.get('channel_id'),
        'tags': info.get('tags', []),
    }

# Usage
video_url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
data = get_video_info(video_url)
print(data)

The download=False parameter prevents downloading the actual video file.

Extract Comments

from yt_dlp import YoutubeDL

def get_video_comments(video_url, max_comments=100):
    """Extract comments from a YouTube video."""
    
    ydl_opts = {
        'quiet': True,
        'no_warnings': True,
        'getcomments': True,
        'extractor_args': {
            'youtube': {
                'max_comments': [str(max_comments)]
            }
        }
    }
    
    with YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(video_url, download=False)
        
    comments = info.get('comments', [])
    
    return [{
        'text': c.get('text'),
        'author': c.get('author'),
        'likes': c.get('like_count'),
        'timestamp': c.get('timestamp'),
    } for c in comments]

# Usage
comments = get_video_comments(video_url)
for comment in comments[:5]:
    print(f"{comment['author']}: {comment['text'][:50]}...")

Scrape YouTube Search Results

from yt_dlp import YoutubeDL

def search_youtube(query, max_results=10):
    """Search YouTube and return video metadata."""
    
    ydl_opts = {
        'quiet': True,
        'no_warnings': True,
        'extract_flat': True,
        'playlistend': max_results,
    }
    
    search_url = f"ytsearch{max_results}:{query}"
    
    with YoutubeDL(ydl_opts) as ydl:
        results = ydl.extract_info(search_url, download=False)
        
    videos = []
    for entry in results.get('entries', []):
        videos.append({
            'title': entry.get('title'),
            'url': entry.get('url'),
            'duration': entry.get('duration'),
            'view_count': entry.get('view_count'),
            'channel': entry.get('channel'),
        })
    
    return videos

# Usage
results = search_youtube("python web scraping tutorial")
for video in results:
    print(f"{video['title']}")

Pros and Cons

Pros:

  • No browser required—very fast
  • Handles most anti-bot detection automatically
  • Extracts comprehensive metadata
  • Active development and updates

Cons:

  • Can trigger sign-in prompts at scale
  • Limited control over request headers
  • Comments extraction can be slow

Method 2: YouTube Data API v3

Best for: Reliable, structured data with official support

Difficulty: Easy | Cost: Free (with quota limits) | Speed: Fast

The YouTube Data API is the official way to access YouTube data. It's reliable and returns clean JSON responses.

The downside? You're limited to 10,000 quota units per day.

Setup

  1. Go to Google Cloud Console
  2. Create a new project
  3. Enable YouTube Data API v3
  4. Create an API key under Credentials

Installation

pip install google-api-python-client

Search Videos

from googleapiclient.discovery import build

API_KEY = "YOUR_API_KEY"

def search_videos(query, max_results=10):
    """Search YouTube using the official API."""
    
    youtube = build('youtube', 'v3', developerKey=API_KEY)
    
    request = youtube.search().list(
        q=query,
        part='id,snippet',
        maxResults=max_results,
        type='video'
    )
    
    response = request.execute()
    
    videos = []
    for item in response.get('items', []):
        videos.append({
            'video_id': item['id']['videoId'],
            'title': item['snippet']['title'],
            'description': item['snippet']['description'],
            'channel': item['snippet']['channelTitle'],
            'published_at': item['snippet']['publishedAt'],
            'thumbnail': item['snippet']['thumbnails']['high']['url'],
        })
    
    return videos

# Usage
results = search_videos("machine learning tutorial")
for video in results:
    print(f"{video['title']}")

Get Video Statistics

def get_video_stats(video_ids):
    """Get detailed statistics for videos."""
    
    youtube = build('youtube', 'v3', developerKey=API_KEY)
    
    # API accepts up to 50 IDs per request
    request = youtube.videos().list(
        id=','.join(video_ids),
        part='statistics,contentDetails,snippet'
    )
    
    response = request.execute()
    
    stats = []
    for item in response.get('items', []):
        stats.append({
            'video_id': item['id'],
            'title': item['snippet']['title'],
            'view_count': int(item['statistics'].get('viewCount', 0)),
            'like_count': int(item['statistics'].get('likeCount', 0)),
            'comment_count': int(item['statistics'].get('commentCount', 0)),
            'duration': item['contentDetails']['duration'],
        })
    
    return stats

# Usage
video_ids = ['dQw4w9WgXcQ', 'kJQP7kiw5Fk']
stats = get_video_stats(video_ids)
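Past 50 IDs, the list has to be split into batches. A small helper (the names `chunked` and `get_stats_for_many` are illustrative, not part of the API) can wrap the `get_video_stats` function above for any list size:

```python
def chunked(items, size=50):
    """Yield successive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def get_stats_for_many(video_ids):
    """Fetch statistics for any number of videos, 50 per API call."""
    all_stats = []
    for batch in chunked(video_ids, 50):
        all_stats.extend(get_video_stats(batch))
    return all_stats
```

Each batch costs one quota unit, so even a few thousand videos stay well within the daily limit.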

Get Channel Information

def get_channel_info(channel_id):
    """Get channel details and statistics."""
    
    youtube = build('youtube', 'v3', developerKey=API_KEY)
    
    request = youtube.channels().list(
        id=channel_id,
        part='snippet,statistics,contentDetails'
    )
    
    response = request.execute()
    
    if not response.get('items'):
        return None
    
    item = response['items'][0]
    
    return {
        'channel_id': item['id'],
        'title': item['snippet']['title'],
        'description': item['snippet']['description'],
        'subscriber_count': int(item['statistics'].get('subscriberCount', 0)),
        'video_count': int(item['statistics'].get('videoCount', 0)),
        'view_count': int(item['statistics'].get('viewCount', 0)),
        'uploads_playlist': item['contentDetails']['relatedPlaylists']['uploads'],
    }

# Usage
channel = get_channel_info('UC8butISFwT-Wl7EV0hUK0BQ')
print(f"{channel['title']}: {channel['subscriber_count']} subscribers")

Quota Costs

Each API call consumes quota units:

Operation            Cost
search.list          100 units
videos.list          1 unit
channels.list        1 unit
commentThreads.list  1 unit

With 10,000 units daily, you can make roughly 100 searches or 10,000 video detail requests.
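You can sanity-check a scraping plan against the quota before running it. This is a simple cost model built from the unit prices above, not an official calculator:

```python
# Quota unit prices from the table above
QUOTA_COSTS = {
    'search.list': 100,
    'videos.list': 1,
    'channels.list': 1,
    'commentThreads.list': 1,
}

DAILY_QUOTA = 10_000

def plan_cost(calls):
    """Total quota units for a dict of {operation: call_count}."""
    return sum(QUOTA_COSTS[op] * n for op, n in calls.items())

def fits_in_quota(calls):
    return plan_cost(calls) <= DAILY_QUOTA

# 50 searches plus details for 2,000 videos (40 batched videos.list calls)
plan = {'search.list': 50, 'videos.list': 40}
```

Searches dominate the budget: the plan above spends 5,000 of its 5,040 units on search.list alone.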

Pros and Cons

Pros:

  • Official and reliable
  • Clean JSON responses
  • No blocking or CAPTCHAs
  • Well-documented

Cons:

  • 10,000 quota units daily limit
  • Search costs 100 units per call
  • Doesn't include all public data
  • Requires API key management

Method 3: Hidden JSON Endpoints

Best for: Bypassing API limits with direct data access

Difficulty: Medium | Cost: Free | Speed: Fast

YouTube embeds JSON data directly in its HTML pages. The ytInitialData and ytInitialPlayerResponse objects contain structured data you can parse without rendering JavaScript.

Extract ytInitialData

import requests
import re
import json

def extract_initial_data(url):
    """Extract ytInitialData from YouTube page."""
    
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'Accept-Language': 'en-US,en;q=0.9',
    }
    
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    
    # Find ytInitialData in script tags (DOTALL in case the JSON spans lines)
    pattern = r'var ytInitialData = ({.*?});'
    match = re.search(pattern, response.text, re.DOTALL)
    
    if not match:
        # Try alternative pattern
        pattern = r'ytInitialData\s*=\s*({.*?});'
        match = re.search(pattern, response.text, re.DOTALL)
    
    if match:
        return json.loads(match.group(1))
    
    return None

# Usage
url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
data = extract_initial_data(url)

Parse Video Page Data

def parse_video_data(initial_data):
    """Parse video information from ytInitialData."""
    
    try:
        # Navigate to video details
        contents = initial_data['contents']['twoColumnWatchNextResults']
        primary = contents['results']['results']['contents']
        
        video_info = {}
        
        for content in primary:
            if 'videoPrimaryInfoRenderer' in content:
                renderer = content['videoPrimaryInfoRenderer']
                video_info['title'] = renderer['title']['runs'][0]['text']
                video_info['views'] = renderer['viewCount']['videoViewCountRenderer']['viewCount']['simpleText']
                
            if 'videoSecondaryInfoRenderer' in content:
                renderer = content['videoSecondaryInfoRenderer']
                video_info['channel'] = renderer['owner']['videoOwnerRenderer']['title']['runs'][0]['text']
                video_info['description'] = renderer.get('attributedDescription', {}).get('content', '')
        
        return video_info
        
    except (KeyError, IndexError) as e:
        print(f"Parse error: {e}")
        return None

Scrape Search Results via Hidden API

from urllib.parse import quote_plus

def scrape_youtube_search(query):
    """Scrape search results from the results page's embedded JSON."""
    
    search_url = f"https://www.youtube.com/results?search_query={quote_plus(query)}"
    
    # extract_initial_data() already sends the request with browser headers
    initial_data = extract_initial_data(search_url)
    
    if not initial_data:
        return []
    
    videos = []
    
    try:
        contents = initial_data['contents']['twoColumnSearchResultsRenderer']
        items = contents['primaryContents']['sectionListRenderer']['contents'][0]
        results = items['itemSectionRenderer']['contents']
        
        for item in results:
            if 'videoRenderer' in item:
                renderer = item['videoRenderer']
                videos.append({
                    'video_id': renderer['videoId'],
                    'title': renderer['title']['runs'][0]['text'],
                    'channel': renderer['ownerText']['runs'][0]['text'],
                    'views': renderer.get('viewCountText', {}).get('simpleText', 'N/A'),
                    'duration': renderer.get('lengthText', {}).get('simpleText', 'N/A'),
                })
    
    except (KeyError, IndexError):
        pass
    
    return videos

Handle Pagination with Continuation Tokens

def get_continuation_data(continuation_token):
    """Fetch next page using continuation token."""
    
    api_url = "https://www.youtube.com/youtubei/v1/browse"
    
    headers = {
        'Content-Type': 'application/json',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    }
    
    payload = {
        'context': {
            'client': {
                'clientName': 'WEB',
                'clientVersion': '2.20240101.00.00',
            }
        },
        'continuation': continuation_token,
    }
    
    response = requests.post(api_url, headers=headers, json=payload)
    return response.json()
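Continuation tokens sit deep inside ytInitialData, typically under a continuationItemRenderer, and the exact path shifts between page layouts. A recursive search is more robust than hard-coded key paths; the sketch below assumes the current layout, where a `continuationCommand` object wraps the token:

```python
def find_continuation_token(data):
    """Recursively search parsed ytInitialData for a continuation token."""
    if isinstance(data, dict):
        # continuationCommand wraps the token in current layouts
        if 'continuationCommand' in data:
            return data['continuationCommand'].get('token')
        for value in data.values():
            token = find_continuation_token(value)
            if token:
                return token
    elif isinstance(data, list):
        for item in data:
            token = find_continuation_token(item)
            if token:
                return token
    return None
```

Pass its result as the `continuation` value to `get_continuation_data` above, and repeat until no token is found.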

Pros and Cons

Pros:

  • No API key required
  • Faster than browser automation
  • Access to data not in official API
  • No quota limits

Cons:

  • Endpoints change without notice
  • Requires understanding JSON structure
  • Can break with YouTube updates
  • More complex parsing logic

Method 4: Selenium Browser Automation

Best for: Dynamic content requiring JavaScript execution

Difficulty: Medium | Cost: Free | Speed: Slow

When hidden endpoints don't work, Selenium provides full browser control. It renders JavaScript and handles dynamic content like infinite scroll.

Installation

pip install selenium webdriver-manager

Basic Setup

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
import time

def create_driver():
    """Create a configured Chrome driver."""
    
    options = Options()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36')
    
    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=options)
    
    return driver

Scrape Channel Videos

def scrape_channel_videos(channel_url, max_videos=50):
    """Scrape all videos from a YouTube channel."""
    
    driver = create_driver()
    videos = []
    
    try:
        # Navigate to channel videos tab
        videos_url = f"{channel_url}/videos"
        driver.get(videos_url)
        
        # Wait for content to load
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "contents"))
        )
        
        # Scroll to load more videos
        last_height = driver.execute_script("return document.documentElement.scrollHeight")
        
        while len(videos) < max_videos:
            # Extract the video elements loaded so far
            video_elements = driver.find_elements(By.CSS_SELECTOR, "ytd-rich-item-renderer")
            
            for element in video_elements:
                if len(videos) >= max_videos:
                    break
                    
                try:
                    title_elem = element.find_element(By.CSS_SELECTOR, "#video-title")
                    views_elem = element.find_element(By.CSS_SELECTOR, "#metadata-line span:first-child")
                    
                    video_data = {
                        'title': title_elem.text,
                        'url': title_elem.get_attribute('href'),
                        'views': views_elem.text,
                    }
                    
                    if video_data not in videos:
                        videos.append(video_data)
                        
                except Exception:
                    continue
            
            # Scroll down to trigger lazy loading
            driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
            time.sleep(2)
            
            # Stop when the page height no longer grows
            new_height = driver.execute_script("return document.documentElement.scrollHeight")
            if new_height == last_height:
                break
            last_height = new_height
        
        return videos
        
    finally:
        driver.quit()

Extract Video Details

def scrape_video_details(video_url):
    """Scrape detailed information from a video page."""
    
    driver = create_driver()
    
    try:
        driver.get(video_url)
        
        # Wait for video info to load
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "h1.ytd-watch-metadata"))
        )
        
        # Expand description
        try:
            expand_btn = driver.find_element(By.CSS_SELECTOR, "#expand")
            expand_btn.click()
            time.sleep(1)
        except Exception:
            pass
        
        # Extract data
        title = driver.find_element(By.CSS_SELECTOR, "h1.ytd-watch-metadata").text
        
        # Get view count from info section
        info_text = driver.find_element(By.CSS_SELECTOR, "#info-container").text
        
        # Get channel name
        channel = driver.find_element(By.CSS_SELECTOR, "#channel-name a").text
        
        # Get description
        description = driver.find_element(By.CSS_SELECTOR, "#description-inner").text
        
        return {
            'title': title,
            'channel': channel,
            'description': description,
            'info': info_text,
        }
        
    finally:
        driver.quit()

Pros and Cons

Pros:

  • Handles any JavaScript-rendered content
  • Full browser capabilities
  • Can interact with page elements
  • Works when other methods fail

Cons:

  • Slowest method
  • High resource usage
  • More likely to trigger detection
  • Complex to maintain

Method 5: Playwright with Stealth

Best for: Evading bot detection while automating browsers

Difficulty: Hard | Cost: Free | Speed: Medium

Playwright offers better stealth capabilities than Selenium. Combined with anti-detection techniques, it can bypass most bot detection systems.

Installation

pip install playwright playwright-stealth
playwright install chromium

Stealth Configuration

from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

def create_stealth_browser():
    """Create a browser with stealth mode enabled."""
    
    playwright = sync_playwright().start()
    
    browser = playwright.chromium.launch(
        headless=True,
        args=[
            '--disable-blink-features=AutomationControlled',
            '--no-sandbox',
        ]
    )
    
    context = browser.new_context(
        viewport={'width': 1920, 'height': 1080},
        user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        locale='en-US',
    )
    
    page = context.new_page()
    stealth_sync(page)
    
    return playwright, browser, page

Scrape with Stealth

def scrape_youtube_stealth(url):
    """Scrape YouTube with stealth mode to avoid detection."""
    
    playwright, browser, page = create_stealth_browser()
    
    try:
        page.goto(url, wait_until='networkidle')
        
        # Handle cookie consent if present
        try:
            consent_button = page.locator('button:has-text("Accept all")')
            if consent_button.is_visible():
                consent_button.click()
                page.wait_for_timeout(1000)
        except Exception:
            pass
        
        # Wait for content
        page.wait_for_selector('#contents', timeout=10000)
        
        # Extract data using JavaScript
        data = page.evaluate('''() => {
            const videos = [];
            const items = document.querySelectorAll('ytd-video-renderer, ytd-rich-item-renderer');
            
            items.forEach(item => {
                const titleEl = item.querySelector('#video-title');
                const viewsEl = item.querySelector('#metadata-line span');
                
                if (titleEl) {
                    videos.push({
                        title: titleEl.textContent.trim(),
                        url: titleEl.href,
                        views: viewsEl ? viewsEl.textContent.trim() : 'N/A'
                    });
                }
            });
            
            return videos;
        }''')
        
        return data
        
    finally:
        browser.close()
        playwright.stop()

Block Unnecessary Resources

def scrape_fast_stealth(url):
    """Scrape with resource blocking for faster loads."""
    
    playwright, browser, page = create_stealth_browser()
    
    # Block images, videos, and fonts
    page.route('**/*.{png,jpg,jpeg,gif,webp,svg,mp4,webm,woff,woff2}', 
               lambda route: route.abort())
    
    page.route('**/googlevideo.com/**', lambda route: route.abort())
    
    try:
        page.goto(url, wait_until='domcontentloaded')
        page.wait_for_selector('#contents', timeout=10000)
        
        # Extract data...
        return page.content()
        
    finally:
        browser.close()
        playwright.stop()

Pros and Cons

Pros:

  • Best anti-detection capabilities
  • Modern API design
  • Auto-waiting for elements
  • Supports multiple browsers

Cons:

  • Steeper learning curve
  • Requires additional setup
  • Still slower than direct HTTP
  • Can still be detected at scale

Comparison: Which Method Should You Use?

Method        Speed    Difficulty  Anti-Bot Handling  Best For
yt-dlp        Fast     Easy        Good               Quick metadata extraction
YouTube API   Fast     Easy        N/A                Reliable structured data
Hidden JSON   Fast     Medium      Manual             Bypassing API limits
Selenium      Slow     Medium      Poor               Legacy systems
Playwright    Medium   Hard        Good               Stealth scraping

Decision Guide

Choose yt-dlp if:

  • You need video metadata quickly
  • You're scraping fewer than 1,000 videos
  • You want the simplest solution

Choose YouTube API if:

  • You need reliable, official data
  • Your daily needs fit within quota
  • You want clean, structured responses

Choose Hidden JSON if:

  • API quotas are insufficient
  • You understand JSON parsing
  • You can maintain code when endpoints change

Choose Selenium/Playwright if:

  • Other methods are blocked
  • You need to interact with page elements
  • You're scraping dynamic content

Handling Anti-Bot Detection

YouTube actively detects and blocks automated access. Here's how to stay under the radar.

Use Rotating Proxies

Residential proxies distribute requests across real IP addresses.

import requests

proxy = {
    'http': 'http://user:pass@proxy-server:port',
    'https': 'http://user:pass@proxy-server:port',
}

response = requests.get(url, proxies=proxy)

For high-volume YouTube scraping, residential proxies from providers like Roundproxies significantly reduce blocking.
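The snippet above pins a single proxy. Actual rotation means switching proxies between requests; a minimal round-robin with itertools.cycle looks like this (the proxy URLs are placeholders for your provider's endpoints):

```python
import itertools

# Placeholder proxy URLs - substitute your provider's endpoints
PROXIES = [
    'http://user:pass@proxy1.example.com:8000',
    'http://user:pass@proxy2.example.com:8000',
    'http://user:pass@proxy3.example.com:8000',
]

proxy_pool = itertools.cycle(PROXIES)

def next_proxy():
    """Return a requests-style proxies dict for the next proxy in rotation."""
    proxy_url = next(proxy_pool)
    return {'http': proxy_url, 'https': proxy_url}
```

Each call to requests.get(url, proxies=next_proxy()) then goes out through the next IP in the pool.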

Add Request Delays

Never hammer YouTube with rapid requests.

import time
import random

def scrape_with_delay(urls):
    results = []
    
    for url in urls:
        result = scrape_url(url)
        results.append(result)
        
        # Random delay between 2-5 seconds
        delay = random.uniform(2, 5)
        time.sleep(delay)
    
    return results

Rotate User Agents

import random

USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15',
]

headers = {
    'User-Agent': random.choice(USER_AGENTS),
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
}

Common Errors and Troubleshooting

"Sign in to confirm you're not a bot"

Cause: YouTube detected automated access.

Fix: Use yt-dlp with cookies from a logged-in session:

# Export cookies from browser
yt-dlp --cookies-from-browser chrome "VIDEO_URL"

# Or use a cookies file
yt-dlp --cookies cookies.txt "VIDEO_URL"
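The same cookie options are available from Python. `cookiefile` and `cookiesfrombrowser` are the YoutubeDL option names corresponding to the CLI flags above:

```python
# Option dicts for yt-dlp's YoutubeDL, e.g. YoutubeDL(opts_file)

# Reuse cookies exported to a Netscape-format cookies.txt file
opts_file = {
    'quiet': True,
    'cookiefile': 'cookies.txt',
}

# Or read cookies straight from an installed browser's profile
opts_browser = {
    'quiet': True,
    'cookiesfrombrowser': ('chrome',),
}
```

Combine either dict with the extraction options from Method 1 and the sign-in prompts usually disappear.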

403 Forbidden Error

Cause: Request blocked by YouTube.

Fix:

  • Add realistic headers
  • Use residential proxies
  • Reduce request frequency

Empty ytInitialData

Cause: Page loaded with different structure or region restriction.

Fix:

  • Check if content requires sign-in
  • Try different Accept-Language headers
  • Use a VPN for region-locked content

Selenium Timeout Errors

Cause: Element not loading in time.

Fix:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Increase timeout
element = WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.ID, "contents"))
)

Best Practices

1. Respect Rate Limits

Even without explicit limits, excessive requests harm YouTube's servers.

  • Add 2-5 second delays between requests
  • Limit concurrent connections
  • Implement exponential backoff on errors

2. Cache Responses

Don't re-scrape data you already have.

import hashlib
import json
import os

def get_cached_or_fetch(url):
    cache_dir = '.cache'
    os.makedirs(cache_dir, exist_ok=True)
    
    # Create cache key from URL
    cache_key = hashlib.md5(url.encode()).hexdigest()
    cache_file = f'{cache_dir}/{cache_key}.json'
    
    # Check cache
    if os.path.exists(cache_file):
        with open(cache_file) as f:
            return json.load(f)
    
    # Fetch and cache
    data = fetch_data(url)
    with open(cache_file, 'w') as f:
        json.dump(data, f)
    
    return data

3. Handle Errors Gracefully

import logging
from tenacity import retry, stop_after_attempt, wait_exponential

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
def fetch_with_retry(url):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return response
    except requests.RequestException as e:
        logger.warning(f"Request failed: {e}")
        raise

4. Save Raw Responses

Always save original data before parsing.

import os
import requests
from datetime import datetime

def scrape_and_save(url, output_dir='raw_data'):
    os.makedirs(output_dir, exist_ok=True)
    
    response = requests.get(url)
    
    # Save raw response
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    filename = f'{output_dir}/response_{timestamp}.html'
    
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(response.text)
    
    # Then parse
    return parse_response(response.text)

Legal and Ethical Considerations

Before you scrape YouTube, understand the legal landscape.

Terms of Service

YouTube's Terms of Service prohibit automated access. However, courts have generally ruled that scraping publicly available data is legal.

Key considerations:

  • Don't scrape private or logged-in content
  • Don't circumvent technical protection measures
  • Don't use scraped data to compete with YouTube
  • Respect robots.txt (advisory)
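Checking robots.txt can be automated with the standard library. urllib.robotparser reads the same rule format; here the rules are supplied inline as an illustrative subset (in practice you would call rp.set_url(...) and rp.read() against the live file):

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()

# In practice: rp.set_url('https://www.youtube.com/robots.txt'); rp.read()
# The rules below are an illustrative subset, not YouTube's actual file.
rp.parse([
    'User-agent: *',
    'Disallow: /comment',
    'Allow: /',
])

def allowed(url, agent='MyScraper'):
    """Return True if the parsed robots.txt permits fetching this URL."""
    return rp.can_fetch(agent, url)
```

Gating your fetch function on allowed() keeps the scraper aligned with the advisory rules automatically.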

Ethical Guidelines

Do:

  • Scrape only public data
  • Identify your scraper with a contact email
  • Minimize server load
  • Cache data to reduce requests
  • Use data responsibly

Don't:

  • Scrape personal user data without consent
  • Republish copyrighted content
  • Overload YouTube's servers
  • Sell scraped data commercially without legal review

When to Use the Official API

For commercial projects or applications requiring reliability, use the YouTube Data API. It's designed for programmatic access and won't get you blocked.

FAQs

Is scraping YouTube legal?

Scraping publicly available data is generally legal, but violates YouTube's Terms of Service. Use at your own risk for personal or research purposes. For commercial use, consult legal counsel or use the official API.

Can I download YouTube videos with these methods?

Yes, yt-dlp supports video downloads. However, downloading copyrighted content may violate copyright laws. Only download videos you have rights to.

How do I scrape YouTube comments at scale?

Use yt-dlp with the --get-comments flag or the YouTube Data API's commentThreads.list endpoint. For large volumes, implement pagination and rate limiting.

Why does my scraper keep getting blocked?

YouTube blocks scrapers that:

  • Send too many requests too fast
  • Use datacenter IPs
  • Have bot-like fingerprints
  • Lack realistic headers

Use residential proxies, add delays, and rotate user agents to avoid detection.

What's the difference between yt-dlp and youtube-dl?

yt-dlp is an actively maintained fork of youtube-dl with better performance, more features, and faster bug fixes. Always use yt-dlp for new projects.

Conclusion

You now have five proven methods to scrape YouTube data:

  1. yt-dlp for quick metadata extraction
  2. YouTube Data API for official, reliable access
  3. Hidden JSON endpoints for bypassing quota limits
  4. Selenium for legacy automation needs
  5. Playwright for stealth scraping

Start with yt-dlp for simple tasks. Use the API for commercial projects. Fall back to browser automation only when necessary.

Remember to scrape responsibly, cache your data, and respect rate limits.

]]>
<![CDATA[How to bypass Reblaze in 2026: 7 best methods]]>
https://roundproxies.com/blog/bypass-reblaze/6973f96f26f439f88a95b3e9Fri, 23 Jan 2026 23:48:46 GMT

You've built your scraper. It runs flawlessly on test pages. Then you point it at a protected enterprise site and everything falls apart.

Your requests return cryptic challenge pages. Your IP gets flagged. The rbzid cookie never validates.

Welcome to Reblaze.

Reblaze protects thousands of enterprise websites, APIs, and web applications globally. It's deployed on AWS, Azure, and Google Cloud as a full-stack security layer.

If you're scraping financial services, travel sites, or enterprise SaaS platforms, you'll encounter it eventually.

In this guide, I'll show you 7 proven methods to bypass Reblaze—from simple header adjustments to advanced behavioral emulation. Each method has trade-offs, so I'll help you pick the right one for your specific situation.

What is Reblaze and Why Does It Block Scrapers?

Reblaze is a cloud-based Web Application Firewall (WAF) and bot mitigation platform. It sits in front of web servers as a reverse proxy, analyzing every request before it reaches the origin.

Unlike simpler protection systems, Reblaze uses a multi-layered detection approach. It doesn't rely on a single technique—it combines several methods simultaneously.

When your scraper sends a request to a Reblaze-protected site, the platform analyzes multiple signals:

IP and Network Analysis Reblaze checks IP reputation, detects VPNs, proxies, TOR exit nodes, and cloud platform IPs. Known datacenter ranges get flagged immediately.

Browser Environment Detection The platform injects JavaScript challenges that verify your browser environment. It checks for automation markers like navigator.webdriver and headless browser signatures.

Signature Detection Request patterns, header configurations, and known bot fingerprints trigger instant blocks. Default Selenium or Puppeteer configurations fail here.

Behavioral Analysis This is where Reblaze differs from competitors. It uses machine learning to build behavioral profiles—tracking mouse movements, click patterns, scroll behavior, typing speed, and session timing.

Cookie-Based Tracking Reblaze sets an rbzid cookie to track sessions. Requests without valid authentication cookies face additional challenges.

Why Standard Scrapers Fail

Standard scraping tools fail Reblaze checks because they lack legitimate browser characteristics.

A basic requests call doesn't execute JavaScript. Selenium exposes automation flags. Even headless browsers leak detectable signals through missing APIs and unnatural behavior patterns.

Reblaze identifies these gaps and blocks the request—sometimes silently, sometimes with a challenge page.

Reblaze Protection Levels

Reblaze offers different protection intensities:

ACL Filtering Basic IP and network-based filtering. Easiest to bypass with good proxies.

Active Challenges JavaScript redirects that require browser execution. Moderate difficulty.

Passive Challenges with Biometric Detection Full behavioral analysis including mouse tracking and interaction patterns. Hardest to bypass.

The methods below address each protection level.

7 Methods to Bypass Reblaze

Before diving into implementations, here's a quick comparison:

Method                       Difficulty  Cost  Best For                 Success Rate
Header Optimization          Easy        Free  Basic ACL filtering      Medium
Session & Cookie Management  Easy        Free  Maintaining auth state   Medium
Residential Proxy Rotation   Medium      $     IP-based blocks          High
Puppeteer Stealth            Medium      Free  JS challenges            High
Nodriver                     Medium      Free  Advanced detection       Very High
Behavioral Emulation         Hard        Free  Biometric checks         Very High
TLS Fingerprint Spoofing     Hard        Free  Advanced fingerprinting  High

Quick recommendation: Start with Method 1 (headers) plus Method 3 (residential proxies). If you're still blocked, move to Method 4 or 5 for browser automation.

Basic Methods

1. Header Optimization

Optimize your HTTP headers to mimic legitimate browser traffic.

Best for: Sites with minimal Reblaze protection
Difficulty: Easy
Cost: Free
Success rate: Medium (works against basic ACL filtering)

How it works

Reblaze analyzes HTTP headers to identify bot traffic. Default scraper headers are obvious red flags.

A real browser sends dozens of headers in a specific order. Missing headers, wrong values, or unusual ordering triggers suspicion.

The goal is making your requests indistinguishable from Chrome or Firefox traffic.

Implementation

import requests

def create_browser_headers():
    """
    Generate headers that mimic a real Chrome browser.
    Order matters - Reblaze checks header sequence.
    """
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'en-US,en;q=0.9',
        'Cache-Control': 'max-age=0',
        'Connection': 'keep-alive',
        'Sec-Ch-Ua': '"Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"',
        'Sec-Ch-Ua-Mobile': '?0',
        'Sec-Ch-Ua-Platform': '"Windows"',
        'Sec-Fetch-Dest': 'document',
        'Sec-Fetch-Mode': 'navigate',
        'Sec-Fetch-Site': 'none',
        'Sec-Fetch-User': '?1',
        'Upgrade-Insecure-Requests': '1',
        # Pinned to Chrome 122 so it agrees with the Sec-Ch-Ua hints above.
        # A randomized User-Agent (e.g. from fake_useragent) would contradict
        # the client hints and fail the cross-check described below.
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
    }
    
    return headers

def scrape_with_headers(url):
    """
    Make a request with optimized headers.
    """
    session = requests.Session()
    headers = create_browser_headers()
    
    response = session.get(
        url,
        headers=headers,
        timeout=30
    )
    
    return response

The code above creates a header set that matches Chrome 122. The Sec-Ch-Ua headers are client hints that modern browsers send.

Missing these headers immediately identifies your request as non-browser traffic.

Key headers explained

The Sec-Fetch-* headers tell the server about request context. Real browsers always include them.

Sec-Ch-Ua identifies the browser brand and version. Reblaze validates this against the User-Agent string.

Header order affects detection. Some WAFs flag requests where headers appear in unusual sequences.
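
Since `requests` preserves dict insertion order on Python 3.7+, you can inspect the exact sequence that will hit the wire before sending anything. A quick sketch; note that a `Session` injects its own default headers unless you clear them first:

```python
import requests

# A Session ships with defaults (User-Agent: python-requests/..., Accept,
# Accept-Encoding, Connection). Clearing them keeps the header sequence
# fully under your control.
session = requests.Session()
session.headers.clear()

request = requests.Request('GET', 'https://example.com', headers={
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Accept': 'text/html,application/xhtml+xml,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
})
prepared = session.prepare_request(request)

# This is the order the server will see
print(list(prepared.headers.keys()))
# → ['User-Agent', 'Accept', 'Accept-Language']
```

If the printed order differs from what a real browser sends, reorder the dict; there is no separate ordering API in requests.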

Pros and cons

Pros:

  • Zero cost
  • Easy to implement
  • Works for basic protection

Cons:

  • Fails against JavaScript challenges
  • Headers alone won't pass behavioral analysis
  • Requires regular updates as browser versions change

When to use this method

Use header optimization when:

  • You're scraping sites with light protection
  • Requests work in a browser but fail with basic scripts
  • You want a quick fix before trying advanced methods

Avoid this method if:

  • The site shows challenge pages
  • You're getting 403 responses even with good headers
  • The site requires JavaScript execution

2. Session & Cookie Management

Properly manage sessions and preserve the rbzid cookie across requests.

Best for: Maintaining authenticated state after passing initial challenges
Difficulty: Easy
Cost: Free
Success rate: Medium (essential complement to other methods)

How it works

Reblaze sets an rbzid cookie after successful challenge completion. This cookie identifies your session as verified.

Subsequent requests must include this cookie. Without it, you'll face challenges repeatedly.

A persistent session also maintains other cookies and connection state that Reblaze tracks.

Implementation

import requests
import pickle
import os

class ReblazeSessions:
    """
    Manage sessions with persistent cookie storage.
    Preserves rbzid and other auth cookies across requests.
    """
    
    def __init__(self, cookie_file='reblaze_cookies.pkl'):
        self.cookie_file = cookie_file
        self.session = requests.Session()
        self._load_cookies()
    
    def _load_cookies(self):
        """Load cookies from file if they exist."""
        if os.path.exists(self.cookie_file):
            with open(self.cookie_file, 'rb') as f:
                self.session.cookies.update(pickle.load(f))
    
    def _save_cookies(self):
        """Persist cookies to file for reuse."""
        with open(self.cookie_file, 'wb') as f:
            pickle.dump(self.session.cookies, f)
    
    def get(self, url, headers=None):
        """
        Make GET request with session persistence.
        """
        response = self.session.get(url, headers=headers, timeout=30)
        
        # Check if we received the rbzid cookie
        if 'rbzid' in self.session.cookies:
            print(f"[+] rbzid cookie acquired: {self.session.cookies['rbzid'][:20]}...")
            self._save_cookies()
        
        return response
    
    def has_valid_session(self):
        """Check if we have an rbzid cookie."""
        return 'rbzid' in self.session.cookies


# Usage example
scraper = ReblazeSessions()

# First request - may trigger challenge
response = scraper.get('https://target-site.com')

# If challenge passed, subsequent requests use saved cookies
if scraper.has_valid_session():
    response = scraper.get('https://target-site.com/data')

This code creates a session manager that persists cookies between runs. Once you pass a challenge (manually or through other methods), the session remains valid.

Reblaze cookies have expiration times. Your code should handle refreshes:

import time
from datetime import datetime, timedelta

class CookieManager:
    def __init__(self):
        self.session = requests.Session()
        self.last_refresh = None
        self.refresh_interval = timedelta(minutes=30)
    
    def needs_refresh(self):
        """Check if session needs refreshing."""
        if self.last_refresh is None:
            return True
        return datetime.now() - self.last_refresh > self.refresh_interval
    
    def refresh_session(self, url, browser_func):
        """
        Refresh session using browser automation.
        browser_func should return cookies from a real browser session.
        """
        if self.needs_refresh():
            new_cookies = browser_func(url)
            self.session.cookies.update(new_cookies)
            self.last_refresh = datetime.now()

Pros and cons

Pros:

  • Reduces challenge frequency
  • Works with any bypass method
  • Simple to implement

Cons:

  • Requires initial challenge bypass
  • Cookies expire and need refreshing
  • One session per IP/fingerprint

When to use this method

Use session management when:

  • You've successfully passed a challenge once
  • You're making multiple requests to the same site
  • You want to reduce detection triggers

This method complements other techniques—it's rarely sufficient alone.

Intermediate Methods

3. Residential Proxy Rotation

Route requests through residential IPs to bypass IP-based detection.

Best for: Avoiding IP blocks and datacenter blacklists
Difficulty: Medium
Cost: $-$$
Success rate: High

How it works

Reblaze maintains databases of datacenter IPs, VPN endpoints, and known proxy ranges. Requests from these sources face extra scrutiny.

Residential proxies use IPs assigned to real home internet connections. They appear as legitimate user traffic.

Rotating IPs prevents rate limiting and makes your requests look like distinct users.

Implementation

For residential proxies, I recommend Roundproxies.com which offers residential, datacenter, ISP, and mobile proxy options. Here's how to integrate rotating proxies:

import requests
import random
import time

class ProxyRotator:
    """
    Rotate through residential proxies for each request.
    Supports authenticated proxy endpoints.
    """
    
    def __init__(self, proxy_endpoint, username, password):
        self.proxy_url = f"http://{username}:{password}@{proxy_endpoint}"
        self.session = requests.Session()
        self.request_count = 0
    
    def get_proxy_config(self):
        """Return proxy configuration for requests."""
        return {
            'http': self.proxy_url,
            'https': self.proxy_url
        }
    
    def make_request(self, url, headers=None, retries=3):
        """
        Make request through rotating proxy.
        Each request gets a new IP from the pool.
        """
        proxies = self.get_proxy_config()
        
        try:
            response = self.session.get(
                url,
                headers=headers,
                proxies=proxies,
                timeout=30
            )
            self.request_count += 1
            return response
            
        except requests.exceptions.ProxyError as e:
            print(f"Proxy error: {e}")
            if retries <= 0:
                raise  # give up instead of recursing forever
            # Retry with delay
            time.sleep(random.uniform(2, 5))
            return self.make_request(url, headers, retries - 1)


# Usage
rotator = ProxyRotator(
    proxy_endpoint="gate.rproxies.com:10000",
    username="your_username",
    password="your_password"
)

response = rotator.make_request('https://target-site.com')

Geo-targeting for better results

Some sites are region-specific. Using proxies from the expected geography improves success rates:

def get_geo_proxy(country_code):
    """
    Get proxy endpoint for specific country.
    Most providers support country targeting.
    """
    # Example format - varies by provider
    return f"http://user-country-{country_code}:pass@proxy.example.com:port"

# Target US traffic
us_proxy = get_geo_proxy('US')

Handling proxy failures

Residential proxies occasionally fail. Build in retry logic:

def request_with_retry(url, proxies, max_retries=3):
    """Make request with automatic retry on proxy failure."""
    
    for attempt in range(max_retries):
        try:
            response = requests.get(
                url,
                proxies=proxies,
                timeout=30
            )
            
            # Check for block indicators
            if response.status_code == 403:
                print(f"Blocked on attempt {attempt + 1}, rotating...")
                time.sleep(random.uniform(1, 3))
                continue
                
            return response
            
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            time.sleep(random.uniform(2, 5))
    
    return None

Pros and cons

Pros:

  • High success against IP-based detection
  • IPs appear as legitimate users
  • Scales well for large scraping jobs

Cons:

  • Ongoing cost
  • Slower than direct connections
  • Still fails against JS/behavioral checks

When to use this method

Use residential proxies when:

  • You're getting blocked despite good headers
  • Target site heavily filters datacenter IPs
  • You need to scrape at scale

Combine with header optimization for best results.

4. Puppeteer Stealth

Use fortified headless browsers to pass JavaScript challenges.

Best for: Sites requiring JavaScript execution and browser verification
Difficulty: Medium
Cost: Free
Success rate: High

How it works

Reblaze injects JavaScript challenges that verify browser environments. Standard headless browsers fail these checks.

Puppeteer Stealth is a plugin that patches detection points. It modifies browser properties to match legitimate Chrome behavior.

The plugin handles navigator.webdriver, Chrome runtime objects, missing permissions APIs, and other giveaways.

Implementation

First install the required packages:

npm install puppeteer-extra puppeteer-extra-plugin-stealth

Then implement the stealth scraper:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// Apply stealth plugin
puppeteer.use(StealthPlugin());

async function scrapeWithStealth(url) {
    // Launch browser with stealth configuration
    const browser = await puppeteer.launch({
        headless: 'new',
        args: [
            '--no-sandbox',
            '--disable-setuid-sandbox',
            '--disable-blink-features=AutomationControlled',
            '--disable-features=site-per-process',
            '--window-size=1920,1080'
        ]
    });
    
    const page = await browser.newPage();
    
    // Set realistic viewport
    await page.setViewport({
        width: 1920,
        height: 1080,
        deviceScaleFactor: 1,
        hasTouch: false,
        isLandscape: true,
        isMobile: false
    });
    
    // Set user agent
    await page.setUserAgent(
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
        '(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36'
    );
    
    try {
        // Navigate and wait for network idle
        await page.goto(url, {
            waitUntil: 'networkidle2',
            timeout: 60000
        });
        
        // Wait for any challenge to resolve
        // (page.waitForTimeout was removed in Puppeteer v22+; use a plain delay)
        await new Promise(resolve => setTimeout(resolve, 3000));
        
        // Check for challenge indicators
        const content = await page.content();
        if (content.includes('window.rbzns')) {
            console.log('Challenge detected, waiting...');
            await new Promise(resolve => setTimeout(resolve, 5000));
        }
        
        // Get cookies including rbzid
        const cookies = await page.cookies();
        const rbzid = cookies.find(c => c.name === 'rbzid');
        
        if (rbzid) {
            console.log('Successfully obtained rbzid cookie');
        }
        
        // Extract page content
        const html = await page.content();
        
        return {
            html,
            cookies,
            success: true
        };
        
    } catch (error) {
        console.error('Scraping failed:', error.message);
        return { success: false, error: error.message };
        
    } finally {
        await browser.close();
    }
}

// Run scraper
scrapeWithStealth('https://target-site.com')
    .then(result => {
        if (result.success) {
            console.log('Page length:', result.html.length);
        }
    });

Adding proxy support

Combine stealth with residential proxies:

async function scrapeWithProxy(url, proxyUrl) {
    const browser = await puppeteer.launch({
        headless: 'new',
        args: [
            `--proxy-server=${proxyUrl}`,
            '--no-sandbox',
            '--disable-blink-features=AutomationControlled'
        ]
    });
    
    const page = await browser.newPage();
    
    // Authenticate proxy if needed
    await page.authenticate({
        username: 'proxy_user',
        password: 'proxy_pass'
    });
    
    // Continue with scraping...
}

Pros and cons

Pros:

  • Passes JavaScript challenges
  • Handles dynamic content
  • Active community maintaining evasions

Cons:

  • Slower than HTTP requests
  • Higher resource usage
  • May still fail biometric checks

When to use this method

Use Puppeteer Stealth when:

  • Simple requests return challenge pages
  • Target site heavily uses JavaScript
  • You need to interact with page elements

5. Nodriver

Use Nodriver for superior detection evasion compared to traditional headless browsers.

Best for: Advanced bot detection that catches standard automation
Difficulty: Medium
Cost: Free
Success rate: Very High

How it works

Nodriver is the successor to undetected-chromedriver. It takes a fundamentally different approach.

Instead of patching automation flags after they're set, Nodriver avoids setting them entirely. It communicates with Chrome without using the Chrome DevTools Protocol (CDP) in detectable ways.

This makes it significantly harder for Reblaze to identify automation.

Implementation

Install Nodriver:

pip install nodriver

Basic implementation:

import nodriver as uc
import asyncio

async def scrape_with_nodriver(url):
    """
    Scrape using Nodriver for maximum stealth.
    Nodriver avoids CDP detection patterns.
    """
    
    # Launch browser
    browser = await uc.start(
        headless=False,  # Headed mode is more stealthy
        browser_args=[
            '--window-size=1920,1080',
            '--disable-blink-features=AutomationControlled'
        ]
    )
    
    try:
        # Create new tab
        page = await browser.get(url)
        
        # Wait for page to fully load
        await page.sleep(3)
        
        # Check for Reblaze challenge
        content = await page.get_content()
        
        if 'rbzns' in content or 'challenge' in content.lower():
            print("Challenge detected, waiting for resolution...")
            await page.sleep(5)
            content = await page.get_content()
        
        # Extract cookies
        cookies = await browser.cookies.get_all()
        rbzid_cookie = next(
            (c for c in cookies if c.name == 'rbzid'), 
            None
        )
        
        if rbzid_cookie:
            print(f"rbzid acquired: {rbzid_cookie.value[:20]}...")
        
        return {
            'content': content,
            'cookies': cookies,
            'success': True
        }
        
    except Exception as e:
        print(f"Error: {e}")
        return {'success': False, 'error': str(e)}
        
    finally:
        browser.stop()


# Run the scraper
async def main():
    result = await scrape_with_nodriver('https://target-site.com')
    if result['success']:
        print(f"Content length: {len(result['content'])}")

asyncio.run(main())

Advanced configuration

For tougher sites, customize browser behavior:

async def advanced_nodriver_scrape(url):
    """
    Advanced Nodriver configuration for difficult targets.
    """
    
    browser = await uc.start(
        headless=False,
        browser_args=[
            '--window-size=1920,1080',
            '--start-maximized',
            '--disable-blink-features=AutomationControlled',
            '--disable-features=IsolateOrigins,site-per-process'
        ],
        lang='en-US'
    )
    
    page = await browser.get(url)
    
    # Simulate human-like behavior before interaction
    await page.sleep(2)
    
    # Scroll the page naturally
    await page.evaluate('''
        window.scrollTo({
            top: 300,
            behavior: 'smooth'
        });
    ''')
    
    await page.sleep(1)
    
    # Move mouse to a slightly randomized position
    # (needs `import random` at the top of the script)
    await page.mouse.move(
        x=500 + random.uniform(-50, 50),
        y=300 + random.uniform(-50, 50)
    )
    
    await page.sleep(3)
    
    content = await page.get_content()
    return content

Combining with proxies

Route Nodriver through residential proxies:

async def nodriver_with_proxy(url, proxy):
    """
    Use Nodriver with proxy rotation.
    """
    
    browser = await uc.start(
        headless=False,
        browser_args=[
            f'--proxy-server={proxy}',
            '--window-size=1920,1080'
        ]
    )
    
    page = await browser.get(url)
    # Continue scraping...

Pros and cons

Pros:

  • Higher success rate than Puppeteer/Selenium
  • Avoids CDP detection patterns
  • Actively maintained
  • Python-native (easier for data pipelines)

Cons:

  • Requires GUI environment (or Xvfb)
  • Newer tool with smaller community
  • Still resource-intensive

When to use this method

Use Nodriver when:

  • Puppeteer Stealth gets detected
  • Target uses advanced fingerprinting
  • You need the highest possible success rate

Advanced Methods

6. Behavioral Emulation

Simulate human-like interactions to pass biometric behavioral checks.

Best for: Sites using Reblaze's passive challenges with biometric verification
Difficulty: Hard
Cost: Free
Success rate: Very High

How it works

Reblaze's biometric detection tracks mouse movements, click patterns, scroll behavior, typing speed, and interaction timing.

Bots typically exhibit inhuman patterns—instant movements, perfect timing, lack of micro-movements.

Behavioral emulation generates realistic human patterns using libraries like ghost-cursor for mouse movements and randomized timing.

Implementation

Install dependencies:

npm install puppeteer-extra puppeteer-extra-plugin-stealth ghost-cursor

Implement behavioral emulation:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
const { createCursor } = require('ghost-cursor');

puppeteer.use(StealthPlugin());

async function humanLikeScrape(url) {
    const browser = await puppeteer.launch({
        headless: false,  // Headed for behavioral tracking
        args: [
            '--window-size=1920,1080',
            '--disable-blink-features=AutomationControlled'
        ]
    });
    
    const page = await browser.newPage();
    
    // Create cursor instance for human-like movements
    const cursor = createCursor(page);
    
    await page.setViewport({ width: 1920, height: 1080 });
    
    // Navigate to page
    await page.goto(url, { waitUntil: 'networkidle2' });
    
    // Wait random time like a human would
    await randomDelay(1000, 3000);
    
    // Move mouse to random position with human-like curve
    await cursor.moveTo({
        x: randomInt(200, 800),
        y: randomInt(200, 600)
    });
    
    await randomDelay(500, 1500);
    
    // Scroll down naturally
    await smoothScroll(page, 300);
    
    await randomDelay(1000, 2000);
    
    // Move mouse again
    await cursor.moveTo({
        x: randomInt(400, 1000),
        y: randomInt(300, 700)
    });
    
    // Random click (if appropriate)
    await cursor.click();
    
    await randomDelay(2000, 4000);
    
    const content = await page.content();
    await browser.close();
    
    return content;
}

// Helper functions
function randomInt(min, max) {
    return Math.floor(Math.random() * (max - min + 1)) + min;
}

function randomDelay(min, max) {
    return new Promise(resolve => 
        setTimeout(resolve, randomInt(min, max))
    );
}

async function smoothScroll(page, distance) {
    await page.evaluate((dist) => {
        return new Promise((resolve) => {
            let scrolled = 0;
            const step = 10;
            const interval = setInterval(() => {
                window.scrollBy(0, step);
                scrolled += step;
                if (scrolled >= dist) {
                    clearInterval(interval);
                    resolve();
                }
            }, 20 + Math.random() * 30);
        });
    }, distance);
}

Keyboard input emulation

For forms or search boxes:

async function humanLikeType(page, selector, text) {
    /**
     * Type text with human-like delays between keystrokes.
     * Varies speed like a real typist.
     */
    
    await page.click(selector);
    await randomDelay(200, 500);
    
    for (const char of text) {
        await page.keyboard.type(char, {
            delay: randomInt(50, 150)
        });
        
        // Occasional longer pause (like thinking)
        if (Math.random() < 0.1) {
            await randomDelay(200, 500);
        }
    }
}

Session recording patterns

Study legitimate user patterns on the target site:

async function recordSessionPatterns(page) {
    /**
     * Record mouse/keyboard events to analyze patterns.
     * Use this data to improve emulation.
     */
    
    await page.evaluate(() => {
        window.sessionEvents = [];
        
        document.addEventListener('mousemove', (e) => {
            window.sessionEvents.push({
                type: 'move',
                x: e.clientX,
                y: e.clientY,
                time: Date.now()
            });
        });
        
        document.addEventListener('click', (e) => {
            window.sessionEvents.push({
                type: 'click',
                x: e.clientX,
                y: e.clientY,
                time: Date.now()
            });
        });
        
        document.addEventListener('scroll', () => {
            window.sessionEvents.push({
                type: 'scroll',
                y: window.scrollY,
                time: Date.now()
            });
        });
    });
}

Pros and cons

Pros:

  • Defeats biometric behavioral analysis
  • Combined with stealth browsers, very effective
  • No ongoing costs

Cons:

  • Significantly slower
  • Complex to implement well
  • Requires headed browser (more resources)

When to use this method

Use behavioral emulation when:

  • Other methods get blocked after initial success
  • Target site uses passive challenges
  • You see patterns suggesting behavioral analysis

7. TLS Fingerprint Spoofing

Spoof TLS fingerprints to match legitimate browsers.

Best for: Bypassing TLS fingerprinting that flags automation libraries
Difficulty: Hard
Cost: Free
Success rate: High

How it works

Every HTTPS connection begins with a TLS handshake. The client sends a "ClientHello" message containing supported cipher suites, extensions, and version information.

Each browser (and HTTP library) has a unique TLS fingerprint. Python's requests library has a different fingerprint than Chrome.

Reblaze can identify non-browser connections through these fingerprints alone.
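
For intuition on what JA3-based fingerprinting actually computes: the fingerprint is an MD5 hash over five comma-joined ClientHello fields, each field a dash-joined list of decimal values. A minimal sketch; the numeric values below are illustrative, not a real browser's:

```python
import hashlib

def ja3(tls_version, ciphers, extensions, curves, point_formats):
    """JA3 = MD5("version,ciphers,extensions,curves,point_formats"),
    each list joined by dashes. Values passed in are illustrative."""
    parts = [
        str(tls_version),
        "-".join(str(c) for c in ciphers),
        "-".join(str(e) for e in extensions),
        "-".join(str(c) for c in curves),
        "-".join(str(p) for p in point_formats),
    ]
    return hashlib.md5(",".join(parts).encode()).hexdigest()

# Dropping a single extension changes the hash completely, which is
# why requests and Chrome are trivially distinguishable at this layer.
client_a = ja3(771, [4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
client_b = ja3(771, [4865, 4866, 4867], [0, 23, 65281, 10], [29, 23, 24], [0])
print(client_a != client_b)  # → True
```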

Libraries like curl_cffi and tls_client spoof browser TLS fingerprints.

Implementation

Install tls-client:

pip install tls-client

Use it to match Chrome's fingerprint:

import tls_client

def scrape_with_tls_spoofing(url):
    """
    Use TLS fingerprint spoofing to match Chrome.
    This bypasses TLS-based bot detection.
    """
    
    # Create session with Chrome fingerprint
    session = tls_client.Session(
        client_identifier="chrome_120",  # Match Chrome 120
        random_tls_extension_order=True
    )
    
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept-Encoding': 'gzip, deflate, br',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'Sec-Ch-Ua': '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
        'Sec-Ch-Ua-Mobile': '?0',
        'Sec-Ch-Ua-Platform': '"Windows"',
    }
    
    response = session.get(url, headers=headers)
    
    return response


# Usage
response = scrape_with_tls_spoofing('https://target-site.com')
print(f"Status: {response.status_code}")
print(f"Content length: {len(response.text)}")

Using curl_cffi (alternative)

Another option with good browser impersonation:

from curl_cffi import requests

def scrape_with_curl_cffi(url):
    """
    Use curl_cffi for browser-like TLS fingerprints.
    Impersonates various browser versions.
    """
    
    response = requests.get(
        url,
        impersonate="chrome120",  # Options: chrome, safari, firefox
        headers={
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.9',
        }
    )
    
    return response

Combining with proxies

TLS spoofing works well with proxy rotation:

import tls_client

def tls_spoof_with_proxy(url, proxy_url):
    """
    Combine TLS spoofing with residential proxies.
    """
    
    session = tls_client.Session(
        client_identifier="chrome_120"
    )
    
    # tls_client takes proxies on the session, requests-style
    session.proxies = {
        "http": proxy_url,
        "https": proxy_url
    }
    
    response = session.get(
        url,
        headers=create_browser_headers()  # from Method 1
    )
    
    return response

Pros and cons

Pros:

  • Bypasses TLS fingerprinting
  • Fast (no browser overhead)
  • Works for non-JS pages

Cons:

  • Doesn't execute JavaScript
  • Can't pass browser challenges
  • Library support varies

When to use this method

Use TLS fingerprint spoofing when:

  • Basic requests fail despite good headers
  • You don't need JavaScript execution
  • Speed is important

Combine with browser automation for complete coverage.
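
One common hybrid pattern: solve the challenge once in a real browser (Method 4 or 5), export the cookies, and hand them to a fast HTTP session for bulk requests. A sketch using `requests` as a stand-in (swap in a `tls_client.Session` for fingerprint-safe traffic); the cookie value and domain here are placeholders:

```python
import requests

def session_from_browser_cookies(browser_cookies, domain='target-site.com'):
    """Seed a plain HTTP session with cookies captured from a browser run
    (e.g. after Nodriver passed the challenge), so bulk requests skip the
    challenge for as long as the rbzid cookie stays valid."""
    session = requests.Session()
    for name, value in browser_cookies.items():
        session.cookies.set(name, value, domain=domain)
    return session

# Placeholder token; in practice this comes from page.cookies() / get_all()
session = session_from_browser_cookies({'rbzid': 'example-token'})
print(session.cookies.get('rbzid'))  # → example-token
```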

Which Bypass Method Should You Use?

Choosing the right method depends on your target site's protection level and your requirements.

Decision flowchart

Is the site returning HTML with basic requests?
├── Yes → Use Method 1 (Headers) + Method 3 (Proxies)
└── No → Does the response contain JavaScript challenges?
    ├── Yes → Use Method 4 (Puppeteer) or Method 5 (Nodriver)
    └── No → Getting 403/blocked instantly?
        ├── Yes → Use Method 7 (TLS) + Method 3 (Proxies)
        └── No → Blocked after initial success?
            └── Yes → Add Method 6 (Behavioral Emulation)
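
The flowchart above can be encoded as a small helper, assuming you can classify the failure mode from the response; method numbers refer to the sections above, and the final fallback branch is my own reading of the chart:

```python
def pick_methods(returns_html, js_challenge, blocked_instantly, blocked_after_success):
    """Translate observed symptoms into a method stack.
    Flags mirror the decision flowchart above."""
    if returns_html:
        stack = [1, 3]     # headers + residential proxies
    elif js_challenge:
        stack = [4, 3]     # browser automation (or Method 5) + proxies
    elif blocked_instantly:
        stack = [7, 3]     # TLS spoofing + proxies
    else:
        stack = [5, 3]     # assumed fallback: stealth browser + proxies
    if blocked_after_success:
        stack.append(6)    # add behavioral emulation
    return stack

print(pick_methods(False, True, False, True))  # → [4, 3, 6]
```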

Quick reference by situation

Situation Recommended Methods
Basic blocks, fast scraping needed 1 + 3 + 7
JavaScript challenges 4 or 5 + 3
Getting caught after passing challenge 5 + 6 + 3
Maximum stealth required 5 + 6 + 3 + 7

Troubleshooting Common Issues

"403 Forbidden" immediately

Cause: IP blacklisted or TLS fingerprint flagged.

Fix:

  • Switch to residential proxies
  • Add TLS fingerprint spoofing
  • Verify headers match browser exactly

Challenge page never resolves

Cause: JavaScript challenge failing or biometric check triggered.

Fix:

  • Use Nodriver instead of Puppeteer
  • Add behavioral emulation
  • Run in headed mode (not headless)

Session invalidated between requests

Cause: rbzid cookie not persisting or expired.

Fix:

  • Implement proper cookie persistence
  • Refresh sessions before expiration
  • Ensure cookies are sent with every request

Blocked after several successful requests

Cause: Rate limiting or behavioral anomaly detection.

Fix:

  • Increase delays between requests
  • Add random variation to timing
  • Rotate IPs more frequently
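
For the timing fixes above, a jittered delay is worth standardizing so that no two requests ever share an exact cadence:

```python
import random
import time

def jittered_sleep(base=2.0, jitter=1.5):
    """Sleep for `base` seconds plus a random 0..jitter extra.
    Fixed intervals are a classic automation tell; jitter breaks them."""
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay

# Tiny values for demonstration; use seconds-scale delays in practice
d = jittered_sleep(base=0.01, jitter=0.02)
print(0.01 <= d <= 0.03)  # → True
```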

"Server: Reblaze Secure Web Gateway" but page loads

Cause: Reblaze is present but monitoring only.

Fix:

  • Continue with current method
  • Monitor for changes
  • The site may have light protection

Ethical Considerations

Before bypassing Reblaze or any protection system, consider:

Respect robots.txt and ToS

Most sites have Terms of Service that address scraping. Violating these may have legal implications.

Check robots.txt for scraping guidelines. Many sites explicitly allow certain scrapers.

Use responsibly

Do:

  • Scrape public data for legitimate purposes
  • Implement rate limiting even when you can avoid it
  • Cache data to minimize requests
  • Identify yourself with a contact email when appropriate

Don't:

  • Scrape personal or private data without consent
  • Overload servers with requests
  • Resell scraped data without rights
  • Bypass protection on government, healthcare, or financial sites inappropriately

When to use official APIs instead

If a site offers an API, use it. APIs are:

  • Faster and more reliable
  • Legal and ToS-compliant
  • Often free for reasonable usage

Libraries and frameworks

Tool Language Best For
Nodriver Python Maximum stealth browser automation
Puppeteer Stealth Node.js JavaScript challenge bypass
tls_client Python TLS fingerprint spoofing
curl_cffi Python Browser impersonation
ghost-cursor Node.js Human-like mouse movements
fake-useragent Python User-Agent rotation

Proxy providers

For residential proxies, consider providers that offer:

  • Large IP pools
  • Geographic targeting
  • Session control
  • Competitive pricing

Conclusion

Reblaze is a sophisticated WAF with multi-layered detection. Bypassing it requires combining multiple techniques.

Start simple: Headers + residential proxies work for many sites.

Escalate when needed: Add Nodriver or Puppeteer Stealth for JavaScript challenges.

Go advanced sparingly: Behavioral emulation is powerful but slow—save it for tough targets.

Quick reference

Protection Level Solution Stack
Light (ACL only) Headers + Proxies
Medium (JS challenges) Nodriver + Proxies
Heavy (Biometric) Nodriver + Behavioral + Proxies

The key is starting with the simplest effective method and escalating only when necessary.

]]>
<![CDATA[The 4 best Mobile Proxy Providers in 2026]]>https://roundproxies.com/blog/best-mobile-proxy/696948ff26f439f88a95b266Fri, 23 Jan 2026 01:40:51 GMTMobile proxy providers route your traffic through real mobile devices connected to carrier networks like Verizon, AT&T, and Vodafone. This makes your requests look like they're coming from actual smartphone users—which websites almost never block.

I've spent weeks testing different providers for web scraping, social media management, and ad verification. In this guide, I'll break down the four best mobile proxy services that actually deliver in 2026, including real-world performance data and honest assessments of each option.

What Makes Mobile Proxies Different?

Mobile proxies provide IP addresses assigned by cellular carriers to real devices. Unlike datacenter proxies (which websites easily detect) or residential proxies (which come from home ISPs), mobile IPs carry the highest trust scores across virtually all platforms.

Here's why that matters. Mobile carrier networks use something called CGNAT (Carrier-Grade Network Address Translation). This means thousands of legitimate users share the same IP pools. When Instagram or Amazon sees a mobile IP, they know blocking it would affect countless real customers.

The result? Success rates above 99% on even the most aggressive anti-bot systems.

Social platforms like Instagram, TikTok, and Facebook are particularly lenient with mobile traffic. Most of their users access these apps from phones anyway. Your automated requests blend in with millions of real smartphone sessions happening simultaneously.

The 4 Best Mobile Proxy Providers at a Glance

| Provider | Best For | IP Pool | Starting Price | Free Trial |
|---|---|---|---|---|
| Roundproxies | All-around performance & value | 10M+ | $5/GB | 3 days |
| Oxylabs | Premium enterprise performance | 20M+ | $5.40/GB | 7 days |
| SOAX | Budget-friendly flexibility | 33M+ | $4/GB | $1.99 trial |
| IPRoyal | Unlimited bandwidth needs | 4.5M+ | $10.11/day | 24 hours |

How I Tested These Providers

I evaluated each provider across several criteria that actually matter for real-world usage.

Success rate testing: I ran 1,000+ requests against social media platforms, e-commerce sites, and Google SERPs. Any provider with success rates below 97% didn't make this list.

Speed measurements: Response times were recorded across multiple geographic locations. Anything consistently above 2 seconds got flagged.

IP quality checks: I verified that IPs actually came from real carrier networks—not datacenter IPs masquerading as mobile.

Support responsiveness: I opened tickets with fake issues to test response times and helpfulness.

Value assessment: Raw pricing means nothing without context. I calculated cost per successful request, factoring in retry rates.
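The value math can be sketched in a few lines. This is a simple model with illustrative prices and success rates, not measured values:

```python
# Effective cost per *successful* request, retries included.
# Prices, page sizes, and success rates below are illustrative.

def cost_per_successful_request(price_per_gb, mb_per_request, success_rate):
    """Failed attempts still burn bandwidth, so divide by the success rate."""
    cost_per_request = price_per_gb * (mb_per_request / 1024)
    return cost_per_request / success_rate

cheap = cost_per_successful_request(price_per_gb=4.0, mb_per_request=2, success_rate=0.90)
premium = cost_per_successful_request(price_per_gb=5.4, mb_per_request=2, success_rate=0.999)
print(f"cheap: ${cheap:.5f}  premium: ${premium:.5f}")
```

The point of the exercise: a lower per-GB price can be partly eaten by retries, so compare providers on cost per successful request, not sticker price.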

Now let's dig into each provider.


1. Roundproxies — Best All-Around Mobile Proxy Provider

Roundproxies has built a reputation for delivering reliable mobile proxy infrastructure at competitive pricing. Their network balances performance, geographic coverage, and affordability in a way that works for both individual users and businesses.

Roundproxies pros:

  • Excellent price-to-performance ratio
  • 10M+ mobile IPs across 150+ countries
  • True 4G/5G carrier connections
  • Responsive 24/7 customer support
  • Clean, intuitive dashboard

Roundproxies cons:

  • Smaller pool than some enterprise competitors
  • Advanced API features still expanding

Network Details

Roundproxies sources mobile IPs directly from carrier networks across North America, Europe, and Asia-Pacific regions. Their pool of over 10 million addresses covers 4G and 5G connections from major carriers like Verizon, AT&T, T-Mobile, Vodafone, and regional providers.

The targeting options hit the sweet spot for most use cases. Filter by country, state, city, or carrier. Session control allows sticky connections up to 30 minutes—long enough for most workflows without risking stale connections.

What sets Roundproxies apart is their focus on IP quality over raw quantity. They actively monitor and rotate out flagged addresses, keeping success rates consistently high across demanding targets.

Performance in Testing

In my testing against Instagram, TikTok, Amazon, and Google, Roundproxies delivered success rates above 99%. Response times averaged around 0.8 seconds—competitive with premium providers costing significantly more.

The mobile proxy network handled burst traffic well. Even when pushing 100+ concurrent connections, performance stayed stable without the timeout errors I experienced with some competitors.

Who Should Use Roundproxies

Roundproxies is my top recommendation for users who want reliable mobile proxy access without enterprise pricing. Whether you're managing social media accounts, scraping e-commerce data, or running ad verification, their network handles it.

The 3-day free trial gives you enough time to validate performance against your specific targets. No credit card required for the trial—just sign up and start testing.

Pricing: Starts at $5/GB for pay-as-you-go. They also offer Residential Proxies, Datacenter Proxies, and ISP Proxies if you need different proxy types for various use cases.

2. Oxylabs — Best Premium Performance

Oxylabs positions itself as the premium choice for businesses that need reliable mobile proxy access without the complexity of Bright Data's feature set.

Oxylabs pros:

  • 20M+ monthly mobile IPs
  • Excellent documentation and integration support
  • Strong performance with consistent sub-1-second response times
  • Award-winning customer support

Oxylabs cons:

  • Enterprise pricing excludes small users
  • City and ASN targeting can't be combined
  • Free trial requires business verification

Network Details

Oxylabs sources their mobile IPs from real devices—not SIM farms or emulators. This matters because some cheaper providers use artificial setups that websites have learned to detect.

The pool covers 195+ locations with filtering down to country, state, city, ASN, and even GPS coordinates. Session control lets you maintain the same IP for up to 24 hours—crucial for workflows that require persistent identity.

Performance in Testing

Oxylabs matched Bright Data in my testing. Success rates hit 99.94% across the same target sites, with average response times around 0.57 seconds.

What sets them apart is consistency. Bright Data occasionally threw timeout errors during high-load periods. Oxylabs maintained steady performance regardless of time of day or request volume.

Who Should Use Oxylabs

Oxylabs is my recommendation for mid-sized businesses that need premium performance without enterprise complexity. Their dashboard is cleaner than Bright Data's, and the documentation genuinely helps during integration.

The 7-day free trial requires business verification, so hobbyists and individual developers will hit friction. This is clearly a B2B service.

Pricing: Pay-as-you-go starts at $9/GB. Subscriptions start at $5.40/GB with a commitment.

3. SOAX — Best Value for Money

SOAX offers the largest advertised mobile proxy pool on this list—33 million IPs—at prices that undercut premium providers by nearly 50%.

SOAX pros:

  • Massive 33M+ IP pool in 195+ locations
  • Most affordable per-GB pricing
  • Flexible rotation options (90 seconds to 10 minutes)
  • HTTP/SOCKS5 with UDP support

SOAX cons:

  • Dashboard redesign left some features basic
  • Traffic tracking can be imprecise
  • Fewer integration guides than competitors

Network Details

SOAX's filtering options rival the premium providers. Target by country, region, city, carrier, or ASN. The rotation flexibility stands out—you can set IPs to rotate anywhere from 90 seconds to 10 minutes, or keep sticky sessions for up to an hour.

The pool diversity helps for tasks like multi-account management. With 33 million IPs, you're unlikely to hit the same address twice in rapid succession.

Performance in Testing

SOAX performed well, though not quite at the level of Bright Data or Oxylabs. Success rates averaged 99.61%, with response times around 1.39 seconds.

The slower response times come from their infrastructure's geographic distribution. If you're targeting US sites from European servers, expect some latency. Choosing closer proxy locations helps.

Who Should Use SOAX

SOAX is my top recommendation for price-conscious users who still need serious mobile proxy capability. The $4/GB starting price makes it accessible for smaller operations, while the pool size supports enterprise-level tasks.

The $1.99 trial (400MB over 3 days) lets you test without commitment. That's enough to validate performance against your specific targets before scaling up.

Pricing: Starts at $4/GB for pay-as-you-go. Enterprise plans (800GB+) include dedicated account managers.

4. IPRoyal — Best for Unlimited Bandwidth

IPRoyal — Best for Unlimited Bandwidth

IPRoyal takes a different approach to pricing: instead of charging per gigabyte, they rent you access to a dedicated mobile device with unlimited bandwidth.

IPRoyal pros:

  • Unlimited bandwidth on all plans
  • True dedicated mobile IPs (not shared pools)
  • Instant IP changes when needed
  • 99% uptime guarantee

IPRoyal cons:

  • Limited to 15 countries
  • Higher entry cost than per-GB models
  • You're renting a single device, not a pool

Network Details

IPRoyal's mobile proxies work differently. You're essentially renting exclusive access to a real mobile device in your target location. That device connects to carrier networks through actual SIM cards.

The limited coverage (15 countries across two continents) reflects this hardware-based approach. Scaling requires renting additional devices.

What you gain is true IP freshness. Since you're not sharing with other users, your IP hasn't been flagged by previous bad actors. For social media automation where account safety matters most, this is significant.

Performance in Testing

Response times varied more than pool-based providers—anywhere from 0.5 to 1.5 seconds depending on the specific device's network conditions. Success rates stayed above 99% consistently.

The unlimited bandwidth shines for data-heavy tasks: scraping video metadata, downloading images, or any workflow where per-GB pricing would break the budget.

Who Should Use IPRoyal

IPRoyal makes sense if you have predictable, high-bandwidth needs in supported locations. Social media managers running multiple accounts benefit from the dedicated IP approach.

The 24-hour trial at $10.11 includes unlimited bandwidth—enough to stress-test against your actual use case before committing to monthly plans.

Pricing: Starts at $10.11/day or $130/month for a dedicated mobile device.

Mobile Proxy Use Cases: Where They Actually Matter

Not every task requires mobile proxies. They're premium products, and using them when cheaper alternatives work is just wasting money.

When Mobile Proxies Are Essential

Social media automation: Platforms like Instagram and TikTok have become extremely aggressive at detecting non-mobile access patterns. If you're managing multiple accounts or running automation tools, mobile IPs are basically mandatory in 2026.

Ad verification: Checking how ads display across different mobile carriers and locations requires actual mobile network traffic. Datacenter proxies won't show you what real users see.

App testing: QA teams testing mobile applications need to simulate real network conditions. Mobile proxies provide authentic carrier connections that emulators can't replicate.

Sneaker and limited-release purchases: High-demand e-commerce sites actively block datacenter IPs. Mobile proxies offer the best success rates for checkout automation.

When Cheaper Alternatives Work

For general web scraping of non-sensitive targets, residential proxies often work just fine at lower costs. Roundproxies also offers Residential Proxies, Datacenter Proxies, and ISP Proxies—so you can mix proxy types based on each task's requirements. If success rates above 95% satisfy your requirements, you might not need mobile-tier trust scores.

Basic geo-unblocking for streaming or content access typically works with any proxy type. Mobile proxies are overkill here.

How Mobile Proxies Work: Technical Deep Dive

Understanding the technical architecture helps you make better decisions about provider selection and configuration.

Carrier-Grade NAT Explained

Mobile carriers don't assign unique public IPs to each device. Instead, they route thousands of subscribers through shared IP pools using CGNAT.

When you connect through a mobile proxy, you're inheriting this shared reputation. Websites see an IP that hundreds of legitimate users are actively using. Blocking that address would disrupt real customers—something no platform wants.
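For reference, the CGNAT shared address space is defined by RFC 6598 as 100.64.0.0/10. A short sketch with Python's standard ipaddress module shows how membership in that block is tested (note that the exit IP you see from a proxy is a public carrier IP, not an address in this internal range):

```python
# RFC 6598 reserves 100.64.0.0/10 for carrier-grade NAT internals.
import ipaddress

CGNAT_BLOCK = ipaddress.ip_network("100.64.0.0/10")

def in_cgnat_space(ip):
    """True if `ip` falls in the shared CGNAT address space."""
    return ipaddress.ip_address(ip) in CGNAT_BLOCK

print(in_cgnat_space("100.64.1.1"))   # True: inside the shared space
print(in_cgnat_space("8.8.8.8"))      # False: ordinary public address
```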

3G vs 4G vs 5G: Does It Matter?

Speed differences exist, but they rarely impact proxy performance meaningfully. Your actual throughput depends more on the proxy provider's infrastructure than the carrier network.

What does matter is IP pool composition. Providers with more 4G/5G IPs tend to have cleaner addresses because newer networks have had less time to accumulate flagged IPs.

Rotation vs Sticky Sessions

Rotating proxies assign a new IP with each request. Use these for:

  • High-volume scraping where you need thousands of unique identities
  • Tasks where session continuity doesn't matter
  • Spreading requests across maximum IP diversity

Sticky sessions maintain the same IP for extended periods (usually up to 24 hours). Use these for:

  • Account management where IP changes trigger security alerts
  • Multi-page workflows that require consistent identity
  • Checkout processes that track session continuity

Most providers let you configure rotation intervals. Finding the right balance depends on your specific targets.

Choosing the Right Mobile Proxy Provider

The best provider depends entirely on your use case. Here's a decision framework.

Choose Roundproxies if:

  • You want the best balance of price and performance
  • You need reliable mobile IPs without enterprise complexity
  • Geographic coverage in 150+ countries meets your needs
  • You're running social media, scraping, or ad verification tasks

Choose Oxylabs if:

  • You want premium performance with cleaner interface
  • Your business requires 24/7 professional support
  • You can verify business credentials for the free trial
  • Consistent sub-second response times matter

Choose SOAX if:

  • Price-per-GB is a significant factor
  • You need the largest IP pool for maximum diversity
  • Flexible rotation settings fit your workflow
  • The trial pricing works for your testing needs

Choose IPRoyal if:

  • Bandwidth costs would crush other pricing models
  • You need dedicated IPs for account safety
  • Coverage in 15 countries meets your requirements
  • Per-device pricing makes sense for your scale

Common Mobile Proxy Mistakes to Avoid

After testing dozens of providers over the years, I've seen the same mistakes repeatedly.

Buying on Price Alone

The cheapest mobile proxies often aren't mobile proxies at all. Some providers route through residential IPs and call them "mobile." Others use SIM farms that carriers have learned to identify and flag.

Always verify IP authenticity. Run a few test requests through IP checking services before committing to bulk purchases.
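A quick verification sketch: pull the exit IP's metadata from a lookup service and look for carrier attribution. The isp/org/mobile field names follow ip-api.com's JSON schema and the keyword list is illustrative, so check your lookup service's documentation before relying on either:

```python
# Heuristic check that an IP-lookup response looks like a mobile carrier IP.
# Field names assume ip-api.com's JSON schema; adapt to your lookup service.

CARRIER_KEYWORDS = ("verizon", "at&t", "t-mobile", "vodafone", "orange", "telefonica")

def looks_like_carrier(info):
    """`info` is the parsed JSON from an IP lookup service."""
    text = f"{info.get('isp', '')} {info.get('org', '')}".lower()
    return bool(info.get("mobile")) or any(k in text for k in CARRIER_KEYWORDS)

def check_exit_ip(ip):
    # Live lookup (requires network access):
    import requests
    resp = requests.get(f"http://ip-api.com/json/{ip}?fields=status,isp,org,mobile", timeout=10)
    return looks_like_carrier(resp.json())

print(looks_like_carrier({"isp": "Verizon Wireless", "mobile": True}))    # True
print(looks_like_carrier({"isp": "DigitalOcean, LLC", "mobile": False}))  # False
```

Run a handful of exit IPs through a check like this before committing to a bulk purchase; a "mobile" provider whose IPs all resolve to datacenter ASNs is a red flag.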

Over-rotating IPs

Changing IPs too frequently actually hurts performance. Websites track behavioral patterns. If every request comes from a different IP in a different city, that's more suspicious than consistent traffic from a single address.

Match rotation settings to natural user behavior.

Ignoring Session Requirements

Some workflows require sticky sessions. If you're logging into accounts or maintaining shopping carts, rapid IP rotation will break things.

Plan your session requirements before choosing between rotating pools and sticky options.

Skipping Geographic Matching

Using US mobile proxies to access region-locked UK content defeats the purpose. Check that your provider has coverage in locations you actually need.

How to Set Up Mobile Proxies

Configuration is straightforward once you understand the basics. Here's a quick walkthrough that applies to most providers.

Basic HTTP/HTTPS Setup

Most mobile proxy providers give you a connection string in this format:

host:port:username:password

For example: us.mobileproxy.example.com:10000:user123:pass456

In Python with the requests library:

import requests

proxies = {
    'http': 'http://user123:pass456@us.mobileproxy.example.com:10000',
    'https': 'http://user123:pass456@us.mobileproxy.example.com:10000'
}

response = requests.get('https://httpbin.org/ip', proxies=proxies)
print(response.json())

SOCKS5 Configuration

SOCKS5 proxies work at a lower network level and support more protocols. Most browsers and tools have native SOCKS5 support.

In curl:

curl --socks5 user123:pass456@us.mobileproxy.example.com:10000 https://httpbin.org/ip
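The same SOCKS5 endpoint works from Python once the requests[socks] extra (PySocks) is installed. This sketch just builds the proxy mapping; the host and credentials are the placeholder values from the curl example:

```python
# pip install requests[socks]

def socks5_proxies(user, password, host, port):
    # socks5h:// also resolves DNS through the proxy; use socks5:// for local DNS.
    url = f"socks5h://{user}:{password}@{host}:{port}"
    return {"http": url, "https": url}

proxies = socks5_proxies("user123", "pass456", "us.mobileproxy.example.com", 10000)
# Live request (needs a working proxy and PySocks installed):
# import requests
# print(requests.get("https://httpbin.org/ip", proxies=proxies, timeout=15).json())
print(proxies["https"])
```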

Rotation and Session Control

Different providers handle rotation differently. Usually, you append parameters to the username.

For rotating IPs (new IP each request):

user123-rotate:pass456

For sticky sessions (same IP for set duration):

user123-session-abc123:pass456

Check your provider's documentation for exact syntax. These patterns vary.
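As an illustration, a small helper can build both URL styles from the generic patterns above. The -rotate and -session-<id> suffixes are this guide's placeholder syntax, not any specific provider's:

```python
import secrets

def proxy_url(user, password, host, port, session_id=None, rotate=False):
    """Build a proxy URL using the generic username-suffix pattern above.

    The -rotate and -session-<id> suffixes are placeholders; real providers
    vary, so check your provider's documentation for exact syntax.
    """
    if rotate:
        user = f"{user}-rotate"
    elif session_id:
        user = f"{user}-session-{session_id}"
    return f"http://{user}:{password}@{host}:{port}"

# New IP on every request:
print(proxy_url("user123", "pass456", "proxy.example.com", 10000, rotate=True))

# Sticky session: reuse the same random id to keep the same exit IP.
sid = secrets.token_hex(4)
print(proxy_url("user123", "pass456", "proxy.example.com", 10000, session_id=sid))
```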

Red Flags: What to Avoid in Mobile Proxy Providers

The mobile proxy market has its share of questionable operators. Here's what should trigger your suspicion.

Suspiciously Low Pricing

Quality mobile IPs cost money to source and maintain. If someone offers mobile proxies at residential proxy prices, they're probably not delivering actual mobile IPs.

Run verification tests. Check IP lookups on multiple services to confirm carrier attribution.

No Clear IP Sourcing Information

Legitimate providers explain where their IPs come from. Vague language about "proprietary networks" or refusal to discuss sourcing methods suggests potential ethical issues.

The proxy industry has faced scrutiny over malware-infected devices being used as proxy nodes. Reputable providers demonstrate ethical sourcing practices.

Excessive Pool Size Claims

Some providers claim impossibly large mobile IP pools. Think critically about these numbers. There are only so many mobile devices connected to carrier networks in any given region.

If someone claims 100 million mobile IPs in a country with 50 million mobile subscriptions, the math doesn't work.

No Trial or Refund Policy

Any confident provider offers trial access or money-back guarantees. Refusal to let you test before committing suggests they don't trust their own service.

Frequently Asked Questions

Are mobile proxies legal?

Yes. Using proxies to route internet traffic is legal in most jurisdictions. What matters is what you do with them. Scraping public data? Legal. Breaking platform terms of service? Against their rules, but not illegal. Fraudulent activities? Obviously illegal regardless of proxy use.

How much bandwidth do I need?

Typical web scraping uses about 1-5MB per page, depending on content. Social media automation runs lighter—usually under 1MB per action. Calculate your expected request volume and multiply by average page size.

For reference: managing 10 social accounts with 100 actions per day each uses roughly 1GB monthly. Heavy scraping operations can consume 50-100GB monthly easily.
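As a sanity check, the estimate works out like this (the ~0.03 MB per action figure is an assumption consistent with "usually under 1MB per action"):

```python
def monthly_gb(accounts, actions_per_day, mb_per_action, days=30):
    """Rough monthly bandwidth estimate in GB."""
    return accounts * actions_per_day * mb_per_action * days / 1024

# 10 accounts x 100 light (~0.03 MB) actions per day: about 0.9 GB/month
print(round(monthly_gb(10, 100, 0.03), 2))

# A scrape pulling 1,000 pages/day at ~2 MB each: about 58.6 GB/month
print(round(monthly_gb(1, 1000, 2.0), 1))
```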

Can websites still detect mobile proxies?

Advanced detection systems can flag any proxy traffic. Mobile proxies just have the lowest detection rates. Quality providers constantly refresh IP pools and monitor for flagged addresses.

Detection typically combines IP reputation with behavioral analysis. A mobile IP making 1,000 requests per minute will still get flagged regardless of its trust score.
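To avoid that kind of metronomic request timing, a common trick is a randomized delay between actions. A minimal sketch, with illustrative base and jitter values you should tune to the target site:

```python
import random
import time

def human_delay(base=4.0, jitter=3.0):
    """Sleep for a randomized interval so request timing isn't machine-regular.

    base/jitter are illustrative defaults, not recommendations for any
    specific site.
    """
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay

# Typical loop shape:
# for url in urls:
#     fetch(url)       # your request through the mobile proxy
#     human_delay()
```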

What's the difference between mobile and residential proxies?

Residential proxies route through home ISP connections. Mobile proxies route through carrier networks. Mobile IPs generally have higher trust scores because of CGNAT sharing, but residential proxies offer broader location coverage at lower prices.

For most web scraping tasks, residential proxies work fine. Reserve mobile proxies for social platforms and other targets that specifically track mobile vs. desktop access patterns.

Do I need an antidetect browser with mobile proxies?

For social media automation, yes. Platforms fingerprint browser characteristics beyond just IP addresses. Mobile proxies hide your network identity, but antidetect browsers mask device fingerprints.

Without fingerprint masking, platforms can link multiple accounts to the same browser regardless of IP changes. The combination of mobile proxies plus antidetect browser provides complete anonymity.

How often should I rotate IPs?

It depends on your target. For scraping, rotate frequently—every request or every few seconds. For account management, avoid rotation entirely during active sessions.

A good rule: match rotation patterns to natural human behavior. Real users don't change IP addresses every 10 seconds.

Can I use mobile proxies with Selenium or Puppeteer?

Absolutely. Both tools support proxy configuration through browser launch options.

For Puppeteer:

const browser = await puppeteer.launch({
  args: ['--proxy-server=http://proxy.example.com:10000']
});
const page = await browser.newPage();
await page.authenticate({ username: 'user123', password: 'pass456' });

Proxy credentials are supplied through page.authenticate(), a browser extension, or an upstream proxy handler.

What happens if my mobile proxy IP gets banned?

With rotating proxies, you simply get a new IP on the next request. With sticky sessions, you'll need to request a new session or switch to a different exit node.

Quality providers monitor IP reputation and pull flagged addresses from active rotation. This happens automatically—you don't need to manage it manually.

Final Recommendations

For most users: Start with Roundproxies. Their balance of pricing, performance, and ease of use makes them the best choice for the majority of mobile proxy use cases. The free trial lets you validate against your specific targets risk-free.

For enterprise operations: Oxylabs provides the premium infrastructure and support that large organizations require. The compliance certifications and dedicated account managers justify the higher investment.

For budget-conscious projects: SOAX offers the largest IP pool at the lowest per-GB rates. If you're cost-sensitive but still need serious mobile proxy capability, they deliver.

For social media managers: Consider IPRoyal's dedicated device approach. The unlimited bandwidth and dedicated IPs reduce account-ban risks compared to shared pools.

Mobile proxies aren't cheap, but for the right use cases, they're essential. The providers on this list have proven reliability through 2025 and into 2026. Test your specific targets before scaling up, and don't assume that what worked last year will work forever—the proxy landscape evolves constantly.

]]>
<![CDATA[How to do Web Automation with Morelogin in 6 Steps]]>https://roundproxies.com/blog/morelogin-web-automation/6971ecef26f439f88a95b33eThu, 22 Jan 2026 09:44:59 GMTRunning automation scripts on standard browsers is a fast track to getting banned. Websites detect Selenium and Puppeteer almost instantly. That's where antidetect browsers change the game.

Morelogin automation works by exposing a local API and debug ports that automation frameworks can connect to. Instead of controlling a detectable Chrome instance, your scripts control isolated browser profiles with unique fingerprints.

This guide walks you through the complete setup process. You'll learn to automate Morelogin profiles using Python with Selenium, Node.js with Puppeteer, and even Playwright.

What You'll Learn

  • Setting up Morelogin's local API for automation
  • Connecting Selenium to Morelogin browser profiles
  • Automating with Puppeteer and Playwright
  • Running headless automation for server environments
  • Handling multiple profiles simultaneously

Why Use Morelogin for Web Automation?

Standard browser automation tools leave obvious fingerprints. Websites check for WebDriver properties, automation flags, and inconsistent browser fingerprints.

Morelogin creates isolated browser profiles with real fingerprint parameters. Each profile has its own Canvas hash, WebGL renderer, fonts, screen resolution, and dozens of other attributes.

When you connect Selenium or Puppeteer to a Morelogin profile, the automation framework controls a properly fingerprinted browser. The target website sees a regular user instead of a bot.

This matters for social media management, e-commerce tasks, data collection, and any scenario where detection means account bans.

Step 1: Install and Configure Morelogin

Download Morelogin from the official website at morelogin.com. The application runs on Windows and macOS.

After installation, create your Morelogin account and log in. New accounts receive 2 free permanent browser profiles.

Before running any automation, you need to enable the API interface. Open Morelogin, click the gear icon for Settings, then navigate to the API section.

The API page displays three critical pieces of information: your API ID, API Key, and the local API address. The default API endpoint is http://127.0.0.1:40000.

Write down or copy your API credentials. You'll need these for authentication in your automation scripts.

Important: The API only works while Morelogin is running. Keep the application open in the background during automation tasks.

Step 2: Create Browser Profiles for Automation

Each automation task needs a browser profile. You can create profiles manually through the interface or programmatically via the API.

For manual creation, click "New Profile" in Morelogin. Configure the following settings:

  • Browser Type: Chrome or Firefox (Chrome recommended for Puppeteer)
  • Operating System: Match your target platform (Windows/macOS/Android)
  • Proxy: Add your proxy if needed (residential proxies work best)
  • Fingerprint Settings: Leave on auto-generate for realistic fingerprints

After creating the profile, right-click it in the list. Select "Copy Browser Profile ID" to get the unique identifier you'll use in scripts.

Bulk Profile Creation via API

For automation at scale, create profiles through the API. Here's a Python example:

import requests

API_URL = "http://127.0.0.1:40000"
HEADERS = {
    "Content-Type": "application/json",
    "X-Api-Id": "YOUR_API_ID",
    "X-Api-Key": "YOUR_API_KEY"
}

def create_profiles(count):
    payload = {
        "browserTypeId": 1,      # 1 = Chrome, 2 = Firefox
        "operatorSystemId": 1,   # 1 = Windows, 2 = macOS
        "quantity": count
    }
    
    response = requests.post(
        f"{API_URL}/api/env/create/quick",
        json=payload,
        headers=HEADERS
    )
    
    result = response.json()
    if result["code"] == 0:
        return result["data"]  # Returns list of profile IDs
    else:
        print(f"Error: {result['msg']}")
        return []

# Create 5 new profiles
profile_ids = create_profiles(5)
print(f"Created profiles: {profile_ids}")

This script creates profiles with default fingerprint settings. The API returns an array of profile IDs.

Step 3: Start Profiles and Get Debug Ports

Before connecting automation frameworks, you must start the browser profile through the API. The start endpoint returns a debug port for WebSocket connections.

Here's how to start a profile and get its debug port:

import requests
import time

API_URL = "http://127.0.0.1:40000"
HEADERS = {
    "Content-Type": "application/json",
    "X-Api-Id": "YOUR_API_ID",
    "X-Api-Key": "YOUR_API_KEY"
}

def start_profile(profile_id):
    payload = {"envId": profile_id}
    
    response = requests.post(
        f"{API_URL}/api/env/start",
        json=payload,
        headers=HEADERS
    )
    
    result = response.json()
    if result["code"] == 0:
        debug_port = result["data"]["debugPort"]
        print(f"Profile started on debug port: {debug_port}")
        return debug_port
    else:
        print(f"Failed to start: {result['msg']}")
        return None

def stop_profile(profile_id):
    payload = {"envId": profile_id}
    
    response = requests.post(
        f"{API_URL}/api/env/close",
        json=payload,
        headers=HEADERS
    )
    
    result = response.json()
    return result["code"] == 0

The debugPort in the response is what Selenium, Puppeteer, and Playwright use to connect. Each profile gets its own unique port.

Pro Tip: Always check the profile status before connecting. Use the /api/env/status endpoint to verify the browser is fully loaded.

def check_profile_status(profile_id):
    payload = {"envId": profile_id}
    
    response = requests.post(
        f"{API_URL}/api/env/status",
        json=payload,
        headers=HEADERS
    )
    
    result = response.json()
    if result["code"] == 0:
        return result["data"]["status"]  # "running" or "stopped"
    return None
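Building on that, a small polling helper waits for the "running" status before your script connects. It takes the status function as a parameter, so you can pass in the check_profile_status() helper above:

```python
import time

def wait_until_running(profile_id, get_status, timeout=30, interval=1.0):
    """Poll until get_status(profile_id) returns "running".

    Pass the check_profile_status() helper from above as get_status.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if get_status(profile_id) == "running":
            return True
        time.sleep(interval)
    raise TimeoutError(f"Profile {profile_id} did not start within {timeout}s")

# Usage: wait_until_running("PROFILE_ID", check_profile_status)
```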

Step 4: Connect Selenium to Morelogin Profiles

Selenium works with Morelogin through the Remote WebDriver. Instead of launching a new browser, you connect to the already-running profile.

Install the required packages first:

pip install selenium requests

Here's a complete Morelogin automation script using Python and Selenium:

import requests
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Configuration
API_URL = "http://127.0.0.1:40000"
PROFILE_ID = "YOUR_PROFILE_ID"
HEADERS = {
    "Content-Type": "application/json",
    "X-Api-Id": "YOUR_API_ID",
    "X-Api-Key": "YOUR_API_KEY"
}

def start_profile(profile_id):
    """Start a Morelogin profile and return the debug port."""
    payload = {"envId": profile_id}
    response = requests.post(
        f"{API_URL}/api/env/start",
        json=payload,
        headers=HEADERS
    )
    result = response.json()
    
    if result["code"] == 0:
        return result["data"]["debugPort"]
    raise Exception(f"Failed to start profile: {result['msg']}")

def stop_profile(profile_id):
    """Stop a running Morelogin profile."""
    payload = {"envId": profile_id}
    requests.post(
        f"{API_URL}/api/env/close",
        json=payload,
        headers=HEADERS
    )

def create_driver(debug_port):
    """Create Selenium driver connected to Morelogin profile."""
    options = Options()
    options.debugger_address = f"127.0.0.1:{debug_port}"
    
    driver = webdriver.Chrome(options=options)
    return driver

def main():
    debug_port = None
    driver = None
    
    try:
        # Start the profile
        debug_port = start_profile(PROFILE_ID)
        time.sleep(3)  # Wait for browser to fully load
        
        # Connect Selenium
        driver = create_driver(debug_port)
        
        # Your automation logic here
        driver.get("https://example.com")
        print(f"Page title: {driver.title}")
        
        # Example: Fill a form
        # driver.find_element(By.ID, "username").send_keys("test")
        
        time.sleep(5)
        
    finally:
        if driver:
            driver.quit()
        stop_profile(PROFILE_ID)

if __name__ == "__main__":
    main()

The key is the debugger_address option. This tells Selenium to connect to an existing Chrome DevTools Protocol session instead of launching a new browser.

Handling ChromeDriver Versions

Morelogin uses its own Chromium core. The API response includes the path to a compatible WebDriver. Use this to avoid version mismatch errors:

def get_webdriver_path(profile_id):
    payload = {"envId": profile_id}
    response = requests.post(
        f"{API_URL}/api/env/status",
        json=payload,
        headers=HEADERS
    )
    result = response.json()
    
    if result["code"] == 0:
        return result["data"].get("webdriver")
    return None

Step 5: Automate with Puppeteer and Playwright

For Node.js projects, Puppeteer offers better performance than Selenium. Playwright provides additional browser support and modern features.

Puppeteer Setup

Install puppeteer-core (not the full puppeteer package):

npm install puppeteer-core axios

Here's a complete Puppeteer script for Morelogin automation:

const puppeteer = require('puppeteer-core');
const axios = require('axios');

const API_URL = 'http://127.0.0.1:40000';
const PROFILE_ID = 'YOUR_PROFILE_ID';
const HEADERS = {
    'Content-Type': 'application/json',
    'X-Api-Id': 'YOUR_API_ID',
    'X-Api-Key': 'YOUR_API_KEY'
};

async function startProfile(profileId) {
    const response = await axios.post(
        `${API_URL}/api/env/start`,
        { envId: profileId },
        { headers: HEADERS }
    );
    
    if (response.data.code === 0) {
        return response.data.data.debugPort;
    }
    throw new Error(`Failed to start: ${response.data.msg}`);
}

async function stopProfile(profileId) {
    await axios.post(
        `${API_URL}/api/env/close`,
        { envId: profileId },
        { headers: HEADERS }
    );
}

async function main() {
    let browser = null;
    
    try {
        // Start Morelogin profile
        const debugPort = await startProfile(PROFILE_ID);
        console.log(`Profile started on port: ${debugPort}`);
        
        // Wait for browser initialization
        await new Promise(r => setTimeout(r, 3000));
        
        // Connect Puppeteer
        const browserWSEndpoint = `ws://127.0.0.1:${debugPort}`;
        browser = await puppeteer.connect({
            browserWSEndpoint,
            defaultViewport: null
        });
        
        // Get existing page or create new one
        const pages = await browser.pages();
        const page = pages[0] || await browser.newPage();
        
        // Your automation logic
        await page.goto('https://example.com');
        console.log(`Page title: ${await page.title()}`);
        
        // Example: Take screenshot
        await page.screenshot({ path: 'screenshot.png' });
        
        await new Promise(r => setTimeout(r, 5000));
        
    } finally {
        if (browser) {
            await browser.disconnect();
        }
        await stopProfile(PROFILE_ID);
    }
}

main().catch(console.error);

The puppeteer.connect() method attaches to the running browser instead of launching one. This preserves all fingerprint settings from Morelogin.

Playwright Setup

Playwright works similarly. Install it with:

npm install playwright-core axios

const { chromium } = require('playwright-core');
const axios = require('axios');

// Same API configuration as Puppeteer example...

async function main() {
    const debugPort = await startProfile(PROFILE_ID);
    
    const browser = await chromium.connectOverCDP(
        `http://127.0.0.1:${debugPort}`
    );
    
    const context = browser.contexts()[0];
    const page = context.pages()[0] || await context.newPage();
    
    await page.goto('https://example.com');
    console.log(await page.title());
    
    await browser.close();
    await stopProfile(PROFILE_ID);
}

Playwright uses connectOverCDP instead of a WebSocket URL. The rest of the workflow remains identical.

Step 6: Run Headless Automation on Servers

For production environments and servers, you need headless execution. Morelogin supports a headless service mode that runs without a GUI.

Starting Headless Mode

Launch Morelogin in headless mode from the command line:

Windows:

cd "C:\Program Files\MoreLogin"
start /WAIT MoreLogin.exe --headless=true --port=40000

macOS:

"/Applications/MoreLogin.app/Contents/MacOS/MoreLogin" --headless=true --port=40000

The --port parameter specifies which port the API listens on. Make sure this matches your scripts.
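One way to keep the launch command and your scripts in sync is to derive the API URL from a shared environment variable. A small sketch; MORELOGIN_PORT is a variable name chosen here for illustration, not a Morelogin convention:

```python
import os

# Read the port from the environment so the headless launch command
# and the automation scripts share a single source of truth
API_PORT = os.environ.get("MORELOGIN_PORT", "40000")
API_URL = f"http://127.0.0.1:{API_PORT}"
```

Set `MORELOGIN_PORT` once in your shell, then pass the same value to `--port` when launching.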

Starting Profiles in Headless Mode

Profiles can also run headless. Add the isHeadless parameter when starting:

def start_headless_profile(profile_id):
    payload = {
        "envId": profile_id,
        "isHeadless": True
    }
    
    response = requests.post(
        f"{API_URL}/api/env/start",
        json=payload,
        headers=HEADERS
    )
    
    return response.json()

Headless mode requires Morelogin version 2.36.0 or higher.

Running Multiple Profiles Simultaneously

Morelogin automation really shines with multi-profile workflows. Here's a pattern for parallel execution:

import asyncio
import aiohttp
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def run_browser_tasks(debug_port):
    # Selenium is blocking, so this runs in a worker thread
    options = Options()
    options.debugger_address = f"127.0.0.1:{debug_port}"
    driver = webdriver.Chrome(options=options)
    
    # Run your tasks...
    driver.get("https://example.com")
    
    driver.quit()

async def start_and_automate(session, profile_id):
    # Start profile
    async with session.post(
        f"{API_URL}/api/env/start",
        json={"envId": profile_id},
        headers=HEADERS
    ) as response:
        result = await response.json()
        debug_port = result["data"]["debugPort"]
    
    # Give the browser time to initialize
    await asyncio.sleep(3)
    
    # Hand the blocking Selenium work to a thread so
    # profiles actually run in parallel
    await asyncio.to_thread(run_browser_tasks, debug_port)
    
    # Stop profile
    async with session.post(
        f"{API_URL}/api/env/close",
        json={"envId": profile_id},
        headers=HEADERS
    ) as response:
        pass

async def main():
    profile_ids = ["profile1", "profile2", "profile3"]
    
    async with aiohttp.ClientSession() as session:
        tasks = [
            start_and_automate(session, pid) 
            for pid in profile_ids
        ]
        await asyncio.gather(*tasks)

asyncio.run(main())

Keep in mind the API rate limit of 60 requests per minute per endpoint. Space out your profile starts if running many simultaneously.
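One simple way to stay under that limit is to space out the start calls. A minimal sketch; staggered_gather is a helper written for this guide, not part of aiohttp or the Morelogin API:

```python
import asyncio

async def staggered_gather(coro_fns, per_minute=60):
    """Launch coroutine factories spaced out to respect a per-minute cap."""
    interval = 60.0 / per_minute
    tasks = []
    for i, fn in enumerate(coro_fns):
        if i:
            await asyncio.sleep(interval)  # space out the API calls
        tasks.append(asyncio.create_task(fn()))
    return await asyncio.gather(*tasks)
```

Pass it factories like `lambda: start_and_automate(session, pid)` so each coroutine is created only when its slot comes up.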

Best Practices for Morelogin Automation

Use Proxies Per Profile

Each profile should have its own proxy IP. This prevents correlation between accounts. Configure proxies through the Morelogin interface or API before starting profiles.

Add Random Delays

Instant actions trigger bot detection. Add random pauses between interactions:

import random
import time

def human_delay(min_seconds=1, max_seconds=3):
    time.sleep(random.uniform(min_seconds, max_seconds))

Check Profile Status Before Connecting

Network delays can cause connection failures. Always verify the profile is running:

def wait_for_profile(profile_id, timeout=30):
    start_time = time.time()
    while time.time() - start_time < timeout:
        status = check_profile_status(profile_id)
        if status == "running":
            return True
        time.sleep(1)
    return False
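wait_for_profile above calls a check_profile_status helper that isn't shown. Here's a minimal sketch against the /api/env/status endpoint used earlier; the exact name of the status field in the response body is an assumption, so verify it against a real API response:

```python
import requests

API_URL = "http://127.0.0.1:40000"
HEADERS = {
    "Content-Type": "application/json",
    "X-Api-Id": "YOUR_API_ID",
    "X-Api-Key": "YOUR_API_KEY",
}

def parse_status(result):
    """Pull the status string out of an /api/env/status response body."""
    if result.get("code") == 0:
        return (result.get("data") or {}).get("status")
    return None

def check_profile_status(profile_id):
    response = requests.post(
        f"{API_URL}/api/env/status",
        json={"envId": profile_id},
        headers=HEADERS,
    )
    return parse_status(response.json())
```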

Clean Up Profiles

Clear cookies and cache periodically using the API:

def clear_profile_cache(profile_id):
    payload = {
        "envId": profile_id,
        "cookie": True,
        "localStorage": True,
        "indexedDB": True
    }
    
    requests.post(
        f"{API_URL}/api/env/removeLocalCache",
        json=payload,
        headers=HEADERS
    )

Common Issues and Fixes

Profile Won't Start

Check that Morelogin is running and logged in. The API requires an active session.

Selenium Can't Connect

The profile might not be fully loaded. Increase the delay after starting or use the status endpoint to wait for "running" state.

ChromeDriver Version Mismatch

Use the WebDriver path returned by the status API. Morelogin bundles compatible drivers for each Chromium version.

Rate Limiting

The API limits requests to 60 per minute per endpoint. Implement request queuing for high-volume operations.
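A simple sliding-window limiter covers most queuing needs. This RateLimiter class is a sketch written for this guide, not a Morelogin feature:

```python
import time
from collections import deque

class RateLimiter:
    """Block until a call fits under `limit` calls per `window` seconds."""

    def __init__(self, limit=60, window=60.0):
        self.limit = limit
        self.window = window
        self.calls = deque()  # timestamps of recent calls

    def wait(self):
        now = time.monotonic()
        # Drop timestamps that have aged out of the window
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            # Sleep until the oldest call leaves the window
            time.sleep(self.window - (now - self.calls[0]))
            self.calls.popleft()
        self.calls.append(time.monotonic())
```

Call `limiter.wait()` before each API request to an endpoint; use one limiter per endpoint.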

Final Thoughts

Morelogin automation gives you the detection evasion of an antidetect browser combined with the power of standard automation frameworks. The local API approach means your existing Selenium or Puppeteer skills transfer directly.

Start with a single profile to test your workflow. Once it works reliably, scale up to multi-profile automation with proper proxies and delays.

The key advantage over raw browser automation is fingerprint isolation. Each Morelogin profile maintains its own browser identity across sessions, making long-term account management practical.

For high-volume scraping or data collection tasks, consider pairing Morelogin with residential proxies from Roundproxies.com to further reduce detection risk.

FAQ

Does Morelogin automation work on Linux servers?

Morelogin currently supports Windows and macOS only. For Linux servers, run the Morelogin client on a Windows instance and connect your scripts via the API over your local network.

Can I use Firefox profiles with Puppeteer?

No, Puppeteer only supports Chromium-based browsers. Use Selenium or Playwright for Firefox profile automation with Morelogin.

How many profiles can run simultaneously?

This depends on your system resources and Morelogin subscription tier. Each profile runs a full browser instance, so RAM is usually the limiting factor.

Do I need to keep Morelogin open during automation?

Yes, the local API only functions while Morelogin is running and logged in. Use headless mode for server deployments.

]]>
<![CDATA[How to bypass TrustDecision in 2026: Step-by-step guide]]>
https://roundproxies.com/blog/bypass-trustdecision/6970b2e026f439f88a95b31a
Wed, 21 Jan 2026 11:09:01 GMT

TrustDecision uses device fingerprinting and behavioral analysis across 150+ data points to detect automated traffic. This guide shows you practical methods to bypass their protection using TLS fingerprinting, stealth browsers, and behavioral simulation.

Whether you're building a scraper for research purposes or testing your own applications, understanding how TrustDecision works is the first step to getting past it.

What is TrustDecision Anti-Bot?

TrustDecision anti-bot is a fraud prevention system that identifies automated traffic through device fingerprinting, behavioral analysis, and real-time risk scoring. It creates unique identifiers for each device by collecting hardware configurations, operating system details, and software settings.

The system processes over 6 billion device profiles globally. It serves more than 10,000 clients across financial services, e-commerce, and digital platforms.

Unlike simpler anti-bot solutions that rely primarily on IP reputation or basic CAPTCHA challenges, TrustDecision operates at multiple detection layers simultaneously.

How TrustDecision Detects Bots

Understanding the detection mechanisms is essential before attempting any bypass. TrustDecision employs several overlapping techniques.

Device Fingerprinting

TrustDecision collects over 150 device parameters to create a unique identifier. These include:

  • Hardware configuration (CPU cores, GPU details, screen resolution)
  • Operating system and version
  • Browser type and installed plugins
  • Timezone and language settings
  • Canvas and WebGL rendering characteristics
  • Audio context fingerprints

The system persists device identification even after factory resets or app reinstalls. This makes simple evasion tactics ineffective.
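Conceptually, fingerprinting boils down to folding many stable parameters into one identifier. A toy illustration of the idea (not TrustDecision's actual algorithm, which ships as obfuscated WebAssembly):

```python
import hashlib
import json

def device_fingerprint(params: dict) -> str:
    """Toy sketch: hash stable device parameters into one identifier."""
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

fp = device_fingerprint({
    "cpu_cores": 8,
    "screen": "1920x1080",
    "timezone": "America/New_York",
})
```

Changing any one parameter changes the whole identifier, which is why spoofing a few values while leaving others intact produces a new, and often suspicious, fingerprint.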

Behavioral Analysis

Beyond static fingerprints, TrustDecision monitors how users interact with applications. It tracks typing patterns, mouse movements, scroll behavior, and touch gestures on mobile devices.

Automated scripts typically produce unnaturally consistent timing. Real users show variation in their actions. The system flags requests that lack this natural randomness.
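You can see why this works with a toy version of the check: compute how much inter-action delays vary and flag values near zero. The 0.1 threshold here is purely illustrative:

```python
import statistics

def looks_scripted(intervals, cv_threshold=0.1):
    """Toy check: near-zero variation in action timing is a bot tell."""
    mean = statistics.mean(intervals)
    cv = statistics.stdev(intervals) / mean  # coefficient of variation
    return cv < cv_threshold

looks_scripted([1.0, 1.0, 1.01, 0.99])  # metronomic timing: True
looks_scripted([0.8, 2.3, 1.1, 4.0])    # human-like variation: False
```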

Risk Scoring

Each incoming request receives a real-time risk score based on multiple factors. The system assigns dynamic risk labels that update continuously.

High-risk indicators include:

  • Emulator or virtual machine detection
  • Modified device parameters
  • Group control tools (device farms)
  • VPN or proxy usage patterns
  • Inconsistent location data
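A toy illustration of how such a score might combine: sum the weights of whichever signals fired. The signal names and weights below are invented for illustration, not TrustDecision's real values:

```python
# Illustrative only: real risk engines weight hundreds of signals
SIGNAL_WEIGHTS = {
    "emulator_detected": 40,
    "modified_device_params": 25,
    "group_control_tool": 30,
    "proxy_or_vpn": 15,
    "location_mismatch": 20,
}

def risk_score(signals: dict) -> int:
    """Sum the weights of every high-risk signal that fired."""
    return sum(w for name, w in SIGNAL_WEIGHTS.items() if signals.get(name))

risk_score({"proxy_or_vpn": True, "location_mismatch": True})  # 35
```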

WebAssembly Protection

TrustDecision protects its client-side scripts using WebAssembly obfuscation. This makes reverse engineering significantly harder than with standard JavaScript fingerprinting libraries.

The code virtualization technology increases the cost and effort required to understand the detection logic.

Method 1: Bypass TLS Fingerprinting with curl_cffi

When your scraper connects over HTTPS, a TLS handshake reveals information about your client. Standard HTTP libraries like Python's requests produce fingerprints that differ noticeably from real browsers.

TrustDecision analyzes these TLS characteristics as part of its detection stack. The curl_cffi library solves this by impersonating real browser fingerprints.

Install curl_cffi

First, install the library using pip:

pip install curl_cffi

The package includes pre-compiled binaries for Windows, macOS, and Linux. No additional compilation is required.

Basic Usage

Replace your standard requests import with curl_cffi:

from curl_cffi import requests

# Impersonate Chrome browser including TLS fingerprint
response = requests.get(
    "https://target-site.com",
    impersonate="chrome"
)

print(response.status_code)
print(response.text)

The impersonate parameter tells curl_cffi which browser's TLS signature to mimic. This single change makes your requests appear identical to real Chrome traffic at the TLS layer.

Supported Browser Profiles

curl_cffi supports multiple browser versions. Use specific versions for more precise impersonation:

from curl_cffi import requests

# Specific Chrome version
response = requests.get(
    "https://target-site.com",
    impersonate="chrome124"
)

# Safari impersonation
response = requests.get(
    "https://target-site.com",
    impersonate="safari"
)

# Latest Chrome (auto-updates)
response = requests.get(
    "https://target-site.com",
    impersonate="chrome"
)

Available profiles include chrome99 through chrome131, safari, and safari_ios. The library updates profiles as browser fingerprints change.

Adding Proxy Support

Combine TLS fingerprinting with proxy rotation for better results:

from curl_cffi import requests

proxies = {
    "http": "http://user:pass@proxy-server:8080",
    "https": "http://user:pass@proxy-server:8080"
}

response = requests.get(
    "https://target-site.com",
    impersonate="chrome",
    proxies=proxies
)

Residential proxies work better than datacenter IPs against TrustDecision. The system cross-references IP reputation with device fingerprints.

Session Management

Maintain cookies and session state across multiple requests:

from curl_cffi import requests

session = requests.Session()

# First request establishes session
session.get(
    "https://target-site.com/login",
    impersonate="chrome"
)

# Subsequent requests reuse cookies
response = session.get(
    "https://target-site.com/protected-page",
    impersonate="chrome"
)

Consistent sessions help maintain a believable browsing pattern. Avoid creating new fingerprints for every request.

Limitations of curl_cffi

While curl_cffi handles TLS fingerprinting effectively, it cannot execute JavaScript. Sites that require JavaScript execution or browser interactions need a different approach.

Use curl_cffi for:

  • API endpoints
  • Pages without JavaScript challenges
  • Initial reconnaissance

Move to stealth browsers when JavaScript execution is required.

Method 2: Stealth Browser Automation with Camoufox

Camoufox is a Firefox-based anti-detect browser that masks fingerprints at the C++ implementation level. This approach defeats detection methods that inspect JavaScript properties.

Standard automation tools like Selenium expose the navigator.webdriver flag. Camoufox eliminates these telltale signs before JavaScript can even check them.

Install Camoufox

Install the library and download browser binaries:

pip install camoufox
python -m camoufox fetch

The fetch command downloads the modified Firefox binary. This step is required before first use.

Basic Scraping Example

Here's how to scrape a protected page:

from camoufox.sync_api import Camoufox

with Camoufox(headless=True) as browser:
    page = browser.new_page()
    page.goto("https://target-site.com")
    
    # Wait for content to load
    page.wait_for_load_state("networkidle")
    
    # Extract content
    content = page.content()
    print(content)

Camoufox automatically generates realistic fingerprints on each launch. The browser appears identical to a genuine Firefox installation.

Configure Fingerprint Properties

Override specific values when you need precise control:

from camoufox.sync_api import Camoufox

config = {
    'window.outerHeight': 1056,
    'window.outerWidth': 1920,
    'window.innerHeight': 1008,
    'window.innerWidth': 1920,
    'navigator.language': 'en-US',
    'navigator.languages': ['en-US'],
    'navigator.platform': 'Win32',
    'navigator.hardwareConcurrency': 8,
}

with Camoufox(
    config=config,
    i_know_what_im_doing=True
) as browser:
    page = browser.new_page()
    page.goto("https://target-site.com")

The i_know_what_im_doing flag acknowledges that custom configurations can create detectable inconsistencies. Only use custom values when necessary.

GeoIP Proxy Integration

Camoufox automatically matches location data to your proxy's geographic location:

from camoufox.sync_api import Camoufox

with Camoufox(
    headless=True,
    geoip=True,
    proxy={
        "server": "http://proxy-server:8080",
        "username": "user",
        "password": "pass"
    }
) as browser:
    page = browser.new_page()
    page.goto("https://target-site.com")

The geoip feature queries your proxy's IP and sets timezone, locale, and WebRTC IP accordingly. This prevents location inconsistencies that TrustDecision would flag.

Async Implementation for Scale

For large-scale scraping, use the async API:

from camoufox.async_api import AsyncCamoufox
import asyncio

async def scrape_page(browser, url):
    page = await browser.new_page()
    await page.goto(url)
    content = await page.content()
    await page.close()
    return content

async def main():
    urls = [
        "https://target-site.com/page1",
        "https://target-site.com/page2",
        "https://target-site.com/page3"
    ]
    
    async with AsyncCamoufox(headless=True) as browser:
        tasks = [scrape_page(browser, url) for url in urls]
        results = await asyncio.gather(*tasks)
        
        for url, content in zip(urls, results):
            print(f"Scraped {len(content)} chars from {url}")

asyncio.run(main())

Async mode handles concurrent page loads efficiently. Keep the browser instance alive to maintain consistent fingerprints across requests.

Why Camoufox Works Against TrustDecision

Camoufox achieves 0% detection rates on fingerprinting test sites like CreepJS and BrowserScan. The modifications happen at the C++ level, before any JavaScript can inspect browser properties.

TrustDecision's JavaScript-based fingerprinting cannot detect Camoufox because:

  1. The navigator.webdriver flag is genuinely absent
  2. Canvas fingerprints match real Firefox installations
  3. WebGL parameters are consistent with the spoofed OS
  4. Audio context fingerprints are properly masked

Method 3: Nodriver for CDP-Minimal Automation

Nodriver takes a fundamentally different approach from patched browsers. Instead of hiding automation signals, it avoids creating them in the first place.

Traditional tools like Selenium use the Chrome DevTools Protocol (CDP) extensively. Anti-bot systems learned to detect CDP patterns. Nodriver minimizes CDP usage while still providing automation capabilities.

Install Nodriver

pip install nodriver

Nodriver manages browser installation automatically. No separate driver downloads required.

Basic Example

import nodriver as uc

async def main():
    browser = await uc.start()
    page = await browser.get("https://target-site.com")
    
    # Wait for page to fully load
    await page.sleep(2)
    
    # Get page content
    content = await page.get_content()
    print(content)
    
    await browser.stop()

if __name__ == "__main__":
    uc.loop().run_until_complete(main())

The library launches Chrome with minimal automation traces. Most CDP interactions that trigger detection are avoided entirely.

Handling Form Submissions

Nodriver supports realistic user interactions:

import nodriver as uc

async def login_example():
    browser = await uc.start()
    page = await browser.get("https://target-site.com/login")
    
    # Find and fill username field
    username = await page.select("input[name='username']")
    await username.send_keys("your_username")
    
    # Find and fill password field
    password = await page.select("input[name='password']")
    await password.send_keys("your_password")
    
    # Click login button
    login_btn = await page.select("button[type='submit']")
    await login_btn.click()
    
    # Wait for navigation
    await page.sleep(3)
    
    print(await page.get_content())
    await browser.stop()

uc.loop().run_until_complete(login_example())

The interactions mimic real user behavior. No CDP-based input injection that would trigger detection.

Proxy Configuration

import nodriver as uc

async def main():
    browser = await uc.start(
        browser_args=[
            '--proxy-server=http://proxy-server:8080'
        ]
    )
    page = await browser.get("https://target-site.com")
    await browser.stop()

uc.loop().run_until_complete(main())

Combine Nodriver with residential proxies for best results against TrustDecision's IP reputation checks.

Method 4: Behavioral Simulation

TrustDecision monitors user behavior patterns. Even with perfect fingerprinting, robotic behavior can trigger detection. Simulate human-like interactions to avoid behavioral flags.

Random Delays

Never send requests at consistent intervals:

import random
import time
from curl_cffi import requests

def human_delay():
    """Generate random delay between 1-5 seconds"""
    delay = random.uniform(1.0, 5.0)
    time.sleep(delay)

session = requests.Session()

urls = ["https://site.com/page1", "https://site.com/page2"]

for url in urls:
    human_delay()
    response = session.get(url, impersonate="chrome")
    print(f"Fetched: {url}")

Real users don't navigate instantly between pages. The random delays make your traffic pattern more believable.

Mouse Movement Simulation

For browser automation, simulate realistic cursor paths:

from camoufox.sync_api import Camoufox
import random

def human_mouse_move(page, target_x, target_y):
    """Move cursor with human-like randomness"""
    current_x, current_y = 0, 0
    steps = random.randint(10, 25)
    
    for i in range(steps):
        progress = (i + 1) / steps
        # Add slight randomness to path
        jitter_x = random.uniform(-5, 5)
        jitter_y = random.uniform(-5, 5)
        
        next_x = current_x + (target_x - current_x) * progress + jitter_x
        next_y = current_y + (target_y - current_y) * progress + jitter_y
        
        page.mouse.move(next_x, next_y)
        page.wait_for_timeout(random.randint(10, 30))

with Camoufox(humanize=True) as browser:
    page = browser.new_page()
    page.goto("https://target-site.com")
    
    # Move to a button before clicking
    button = page.locator("button.submit")
    box = button.bounding_box()
    human_mouse_move(page, box['x'] + box['width']/2, box['y'] + box['height']/2)
    
    button.click()

Camoufox includes built-in humanization with the humanize=True parameter. This adds natural variations to interactions automatically.

Scroll Behavior

Humans scroll pages in distinct patterns:

from camoufox.sync_api import Camoufox
import random
import time

def human_scroll(page, total_scroll):
    """Scroll page with realistic behavior"""
    scrolled = 0
    
    while scrolled < total_scroll:
        # Variable scroll distance
        scroll_amount = random.randint(100, 400)
        
        page.evaluate(f"window.scrollBy(0, {scroll_amount})")
        scrolled += scroll_amount
        
        # Occasional pause to "read"
        if random.random() < 0.3:
            time.sleep(random.uniform(0.5, 2.0))
        else:
            time.sleep(random.uniform(0.1, 0.3))

with Camoufox(headless=True) as browser:
    page = browser.new_page()
    page.goto("https://target-site.com/long-page")
    
    # Scroll down the page naturally
    human_scroll(page, 2000)

Avoid scrolling at constant speeds or to exact pixel positions. Real users are imprecise.

Method 5: Proxy Rotation Strategy

TrustDecision correlates device fingerprints with IP addresses. Using the same IP with different fingerprints raises flags. A proper proxy strategy is essential.

Residential vs Datacenter Proxies

Datacenter proxies are cheap but easily detected. TrustDecision maintains IP reputation databases that flag known datacenter ranges.

Residential proxies route traffic through real user devices. They're harder to detect but more expensive.

For TrustDecision bypass:

  • Use residential proxies for sensitive operations
  • Rotate IPs between sessions, not requests
  • Match proxy location to your spoofed fingerprint location

Session-Based Rotation

Don't rotate IPs within a single user session:

from curl_cffi import requests
import random

class ProxyRotator:
    def __init__(self, proxy_list):
        self.proxies = proxy_list
        self.current_proxy = None
    
    def get_session_proxy(self):
        """Get proxy for entire session"""
        if not self.current_proxy:
            self.current_proxy = random.choice(self.proxies)
        return {
            "http": self.current_proxy,
            "https": self.current_proxy
        }
    
    def rotate(self):
        """Switch to new proxy for next session"""
        self.current_proxy = random.choice(self.proxies)

proxy_list = [
    "http://user:pass@proxy1:8080",
    "http://user:pass@proxy2:8080",
    "http://user:pass@proxy3:8080"
]

rotator = ProxyRotator(proxy_list)

# Use same proxy for entire browsing session
session = requests.Session()
proxies = rotator.get_session_proxy()

# Multiple requests with same proxy
for page in range(1, 5):
    response = session.get(
        f"https://site.com/page/{page}",
        impersonate="chrome",
        proxies=proxies
    )
    
# Rotate for next session
rotator.rotate()

Consistent IP within sessions mimics real user behavior. Changing IPs mid-session triggers anti-fraud alerts.

Putting It All Together

Here's a complete example combining multiple bypass techniques:

from camoufox.sync_api import Camoufox
import random
import time

def scrape_with_bypass(url, proxy_config):
    """
    Complete TrustDecision bypass scraper
    """
    with Camoufox(
        headless=True,
        humanize=True,
        geoip=True,
        proxy=proxy_config
    ) as browser:
        page = browser.new_page()
        
        # Navigate to target
        page.goto(url)
        
        # Wait for any JavaScript challenges to complete
        page.wait_for_load_state("networkidle")
        
        # Simulate reading behavior
        time.sleep(random.uniform(2, 4))
        
        # Scroll down naturally
        for _ in range(3):
            page.evaluate(f"window.scrollBy(0, {random.randint(200, 400)})")
            time.sleep(random.uniform(0.3, 0.8))
        
        # Extract data
        content = page.content()
        
        return content

# Configure proxy
proxy = {
    "server": "http://residential-proxy:8080",
    "username": "user",
    "password": "pass"
}

# Run scraper
result = scrape_with_bypass(
    "https://trustdecision-protected-site.com",
    proxy
)

print(f"Scraped {len(result)} characters")

This implementation addresses multiple detection layers:

  • TLS fingerprinting (Camoufox uses real Firefox)
  • Device fingerprinting (C++-level spoofing)
  • Behavioral analysis (humanize mode + manual simulation)
  • IP reputation (residential proxy with GeoIP matching)

Common Pitfalls to Avoid

Several mistakes commonly trigger TrustDecision detection even when using bypass tools.

Fingerprint Inconsistencies

Don't spoof a Windows fingerprint while using a macOS proxy location. TrustDecision cross-references all available signals.

Keep your configuration consistent:

  • OS fingerprint matches proxy location
  • Timezone aligns with IP geolocation
  • Language settings match expected region
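A small helper can enforce this by deriving the fingerprint settings from the proxy's country, so the two can never drift apart. The mapping values below are illustrative defaults, not required values:

```python
# Illustrative defaults; extend for the regions your proxy pool covers
REGION_DEFAULTS = {
    "US": {"timezone": "America/New_York", "locale": "en-US"},
    "DE": {"timezone": "Europe/Berlin", "locale": "de-DE"},
}

def fingerprint_for_proxy(country_code: str) -> dict:
    """Derive fingerprint settings from the proxy's country so they agree."""
    base = REGION_DEFAULTS.get(country_code.upper())
    if base is None:
        raise ValueError(f"No fingerprint defaults for {country_code}")
    return dict(base)
```

Feed the result into your Camoufox config so timezone and locale always match the IP you're exiting from.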

Excessive Request Rates

Even with perfect fingerprints, sending 100 requests per minute looks automated. Real users browse slowly.

Target 2-5 requests per minute for heavily protected sites. Monitor success rates and adjust accordingly.
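A jittered pacing helper makes that target easy to hit. pick_delay is written for this guide; pass its result to time.sleep between requests:

```python
import random

def pick_delay(requests_per_minute=4):
    """Return a jittered delay that averages out to the target rate."""
    base = 60.0 / requests_per_minute
    return base * random.uniform(0.7, 1.3)
```

The jitter keeps the average rate on target while avoiding the fixed intervals that behavioral analysis flags.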

Reusing Fingerprints Across IPs

Each IP should have a consistent, unique fingerprint. Don't use the same device ID with multiple IP addresses.

Generate new fingerprints when rotating proxies between sessions.

Ignoring JavaScript Execution

curl_cffi bypasses TLS fingerprinting but cannot handle JavaScript challenges. If a site requires browser execution, switch to Camoufox or Nodriver.

Watch for symptoms like:

  • Blank pages despite 200 status codes
  • "Please enable JavaScript" messages
  • Immediate redirects to challenge pages
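A quick heuristic can route responses showing these symptoms to a real browser. The marker strings and size threshold below are illustrative, not exhaustive:

```python
def needs_browser(status_code: int, body: str) -> bool:
    """Heuristic: does this response show JS-challenge symptoms?"""
    text = body.lower()
    if any(m in text for m in ("enable javascript", "checking your browser")):
        return True
    # A 200 with an almost-empty body usually means the real
    # content is rendered client-side
    return status_code == 200 and len(text.strip()) < 200
```

When it returns True, retry the URL with Camoufox or Nodriver instead of curl_cffi.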

Testing Your Setup

Before targeting production sites, verify your bypass configuration works correctly.

Fingerprint Test Sites

Check your fingerprint against these testing tools:

from camoufox.sync_api import Camoufox

test_urls = [
    "https://browserleaks.com/javascript",
    "https://browserscan.net/",
    "https://creepjs.com/"
]

with Camoufox(headless=False) as browser:
    for url in test_urls:
        page = browser.new_page()
        page.goto(url)
        input(f"Check {url} - Press Enter to continue")
        page.close()

Run in headful mode to manually inspect results. Look for:

  • Zero headless detection flags
  • Consistent fingerprint properties
  • No automation signals detected

TLS Fingerprint Verification

Confirm curl_cffi produces correct browser fingerprints:

from curl_cffi import requests

# Check JA3 fingerprint
response = requests.get(
    "https://tls.browserleaks.com/json",
    impersonate="chrome"
)

data = response.json()
print(f"JA3 Hash: {data.get('ja3_hash')}")
print(f"User Agent: {data.get('user_agent')}")

Compare the JA3 hash against known Chrome values. Mismatches indicate configuration issues.

Method 6: SeleniumBase UC Mode Alternative

SeleniumBase with Undetected ChromeDriver (UC) Mode provides another option for bypassing TrustDecision. This approach works well if you already have Selenium-based code.

Install SeleniumBase

pip install seleniumbase

Basic Usage with UC Mode

from seleniumbase import SB

with SB(uc=True) as sb:
    sb.open("https://target-site.com")
    
    # Wait for any challenges to complete
    sb.sleep(3)
    
    # Extract content
    content = sb.get_page_source()
    print(content)

The uc=True parameter enables undetected mode. The browser launches with modifications that hide automation signals from detection scripts.

Handling Turnstile Challenges

SeleniumBase includes built-in CAPTCHA handling:

from seleniumbase import SB

with SB(uc=True) as sb:
    sb.uc_open_with_reconnect("https://protected-site.com", 4)
    
    # If Turnstile appears
    if sb.is_element_visible("iframe[src*='turnstile']"):
        sb.uc_gui_click_captcha()
    
    # Continue after verification
    sb.click("button.proceed")

The uc_gui_click_captcha() method handles interactive Turnstile challenges automatically. This saves significant development time compared to manual solver integration.

Combining with Proxies

from seleniumbase import SB

proxy_string = "user:pass@proxy-server:8080"

with SB(uc=True, proxy=proxy_string) as sb:
    sb.open("https://target-site.com")
    content = sb.get_page_source()

SeleniumBase passes proxy configuration to the browser cleanly. The traffic routes through your proxy while maintaining stealth mode.

Troubleshooting Detection Issues

Even with proper bypass techniques, you may encounter blocks. Here's how to diagnose and fix common problems.

Check TLS Fingerprint First

If curl_cffi requests fail, verify your TLS fingerprint matches expectations:

from curl_cffi import requests

response = requests.get(
    "https://tls.browserleaks.com/json",
    impersonate="chrome124"
)

print(response.json())

Look for any anomalies in the JA3 hash or HTTP/2 settings. Mismatches indicate the impersonation isn't working correctly.

Verify Browser Fingerprint

For Camoufox or Nodriver issues, run fingerprint tests in headful mode:

from camoufox.sync_api import Camoufox

with Camoufox(headless=False) as browser:
    page = browser.new_page()
    page.goto("https://browserscan.net/")
    
    # Manually inspect the results
    input("Press Enter after checking results...")

Common issues include:

  • WebGL parameters inconsistent with spoofed OS
  • Audio context fingerprint anomalies
  • Canvas rendering differences

Monitor Request Headers

TrustDecision inspects HTTP headers for inconsistencies:

from curl_cffi import requests

response = requests.get(
    "https://httpbin.org/headers",
    impersonate="chrome"
)

print(response.json()['headers'])

Verify the User-Agent, Accept, and Accept-Language headers match a real browser. Missing or malformed headers trigger detection.

Check for IP Reputation Issues

Sometimes the proxy IP itself is flagged. Test with a fresh IP:

from curl_cffi import requests

# Test IP reputation
response = requests.get(
    "https://ipinfo.io/json",
    impersonate="chrome",
    proxies={"https": "http://proxy:port"}
)

data = response.json()
print(f"IP: {data.get('ip')}")
print(f"Org: {data.get('org', 'Unknown')}")
print(f"Location: {data.get('city')}, {data.get('country')}")

The org field names the network behind the IP. If it points to a cloud or hosting provider rather than a consumer ISP, TrustDecision's IP reputation filter will likely flag it. Switch to residential proxies if this occurs.

Debug JavaScript Challenges

If pages load but show challenge screens, JavaScript execution is required:

from camoufox.sync_api import Camoufox

with Camoufox(headless=False) as browser:
    page = browser.new_page()
    
    # Enable console logging
    page.on("console", lambda msg: print(f"Console: {msg.text}"))
    
    page.goto("https://target-site.com")
    
    # Watch for any errors or challenge triggers
    page.wait_for_timeout(10000)

JavaScript errors in the console often indicate detection. Common triggers include undefined properties that the fingerprinting script expects.

Performance Optimization Tips

Bypassing TrustDecision effectively requires balancing stealth with speed.

Connection Pooling

Reuse connections instead of creating new ones for each request:

from curl_cffi import requests

session = requests.Session()

# Connection is reused across requests
for i in range(10):
    response = session.get(
        f"https://site.com/page/{i}",
        impersonate="chrome"
    )

Connection reuse mimics real browser behavior and reduces overhead.

Browser Instance Reuse

For Camoufox, keep browser instances alive across multiple pages:

from camoufox.sync_api import Camoufox

# WRONG - Creates new fingerprint each time
for url in urls:
    with Camoufox() as browser:  # New instance per URL
        page = browser.new_page()
        page.goto(url)

# CORRECT - Consistent fingerprint for session
with Camoufox() as browser:
    for url in urls:
        page = browser.new_page()
        page.goto(url)
        page.close()

Creating new browser instances generates new fingerprints. Keep the instance alive to maintain consistency.

Parallel Processing Limits

Don't run too many concurrent browsers. Each instance consumes significant memory:

from camoufox.async_api import AsyncCamoufox
import asyncio

async def scrape_batch(urls, max_concurrent=3):
    semaphore = asyncio.Semaphore(max_concurrent)
    
    async def scrape_one(browser, url):
        async with semaphore:
            page = await browser.new_page()
            await page.goto(url)
            content = await page.content()
            await page.close()
            return content
    
    async with AsyncCamoufox(headless=True) as browser:
        tasks = [scrape_one(browser, url) for url in urls]
        return await asyncio.gather(*tasks)

Limit concurrent pages to prevent resource exhaustion. Three to five concurrent pages works well for most systems.

Frequently Asked Questions

What is TrustDecision anti-bot protection?

TrustDecision anti-bot is a fraud prevention system that detects automated traffic through device fingerprinting, behavioral analysis, and real-time risk scoring across 150+ data points. It serves over 10,000 clients and maintains profiles for 6 billion+ devices globally.

Can TrustDecision detect headless browsers?

Yes, TrustDecision can detect standard headless browsers through JavaScript property inspection and fingerprint analysis. Stealth browsers like Camoufox are far harder to catch because they modify properties at the C++ level before any JavaScript executes.

Which bypass method is most effective?

No single method works universally. For static content, curl_cffi with TLS fingerprinting often suffices. For JavaScript-heavy sites, Camoufox provides the best stealth. Combine multiple techniques with residential proxies for heavily protected targets.

Do I need residential proxies?

Residential proxies significantly improve success rates against TrustDecision. The system cross-references device fingerprints with IP reputation. Datacenter IPs are often flagged regardless of fingerprint quality.

How often should I rotate fingerprints?

Rotate fingerprints between sessions, not requests. Maintain the same fingerprint throughout a browsing session to mimic real user behavior. Generate new fingerprints when switching proxy IPs.
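That policy can be encoded directly: pair one proxy with one fingerprint for the lifetime of a session, and mint both together when the session rolls over. A sketch with a hypothetical proxy pool and a stand-in fingerprint seed:

```python
import itertools
import random

# Hypothetical proxy pool; substitute your provider's endpoints.
PROXIES = ["http://proxy-a:8000", "http://proxy-b:8000"]

_proxy_cycle = itertools.cycle(PROXIES)

def new_session():
    """Mint a proxy + fingerprint pair that stays fixed for one session."""
    return {
        "proxy": next(_proxy_cycle),
        # Stand-in for a real fingerprint generator (e.g. a Camoufox config)
        "fingerprint_seed": random.getrandbits(64),
    }

session = new_session()
# Every request in this session reuses session["proxy"] and
# session["fingerprint_seed"]; the next session gets a fresh pair:
next_session = new_session()
```

Tying the two values together means a proxy switch can never leak the old fingerprint, and vice versa.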

Is bypassing TrustDecision legal?

The legality depends on your use case and jurisdiction. Authorized security testing and accessing your own accounts is generally permitted. Scraping websites against their terms of service may violate computer fraud laws in some jurisdictions.

Conclusion

Bypassing TrustDecision requires a layered approach. No single technique works in isolation. Combine TLS fingerprinting with stealth browsers and realistic behavioral simulation for best results.

Start with curl_cffi for API endpoints and simple pages. Move to Camoufox when JavaScript execution is required. Add behavioral simulation and proxy rotation for heavily protected targets.

Monitor your success rates continuously. TrustDecision updates their detection methods regularly. What works today may need adjustment tomorrow.

The key principles to remember:

  1. Match your TLS fingerprint to a real browser using curl_cffi
  2. Use stealth browsers like Camoufox for JavaScript-heavy sites
  3. Simulate human behavior with realistic delays and interactions
  4. Keep fingerprints consistent within sessions
  5. Use residential proxies with matching geographic data

For large-scale operations, consider using residential proxies from providers that offer high-quality IPs suitable for anti-bot bypass.

]]>
<![CDATA[How to scrape DuckDuckGo: 3 working methods]]>https://roundproxies.com/blog/scrape-duckduckgo/696e557b26f439f88a95b2caMon, 19 Jan 2026 16:08:04 GMTDuckDuckGo handles over 100 million daily searches. Unlike Google, it doesn't track users or personalize results.

This makes DuckDuckGo a goldmine for unbiased search data.

In this guide, you'll learn exactly how to scrape DuckDuckGo search results using three different methods. I'll show you working code that doesn't rely on expensive third-party APIs.

Whether you need to monitor keyword rankings, gather SERP data for research, or build a search aggregator, these techniques will get you there.

What You Need to Scrape DuckDuckGo

DuckDuckGo scraping requires different approaches depending on which version you target. The search engine serves two distinct page types:

The static HTML version lives at html.duckduckgo.com. It renders without JavaScript and uses traditional pagination. This version is faster to scrape and requires fewer resources.

The dynamic version at duckduckgo.com requires JavaScript rendering. It includes features like AI-generated summaries and infinite scroll pagination. Scraping this version demands browser automation tools.

Feature               Static Version                 Dynamic Version
URL                   html.duckduckgo.com/html/?q=   duckduckgo.com/?q=
JavaScript Required   No                             Yes
Pagination            "Next" button                  "More Results" button
AI Summaries          No                             Yes
Scraping Difficulty   Easy                           Moderate

Most scraping projects work fine with the static version. The code runs faster and uses less memory.

Let's start with the simplest approach.

Method 1: Scrape DuckDuckGo With HTTP Requests

This method uses Python's requests library combined with BeautifulSoup for parsing. It targets the static HTML version and works well for most use cases.

Setting Up Your Environment

First, create a project folder and virtual environment:

mkdir duckduckgo-scraper
cd duckduckgo-scraper
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the required packages:

pip install requests beautifulsoup4

Building the Basic Scraper

Create a file named scraper.py and add the following imports:

import requests
from bs4 import BeautifulSoup
import csv
import time

The requests library handles HTTP connections. BeautifulSoup parses the HTML response into a searchable tree structure.

Now add the core scraping function:

def scrape_duckduckgo(query, num_pages=1):
    """
    Scrape DuckDuckGo search results for a given query.
    
    Args:
        query: Search term to look up
        num_pages: Number of result pages to scrape
    
    Returns:
        List of dictionaries containing scraped results
    """
    base_url = "https://html.duckduckgo.com/html/"
    
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    }
    
    all_results = []
    
    params = {"q": query}
    
    for page in range(num_pages):
        response = requests.get(base_url, params=params, headers=headers)
        
        if response.status_code != 200:
            print(f"Error: Received status code {response.status_code}")
            break
            
        results, next_params = parse_results(response.text)
        all_results.extend(results)
        
        if not next_params:
            break
            
        params = next_params
        time.sleep(1)  # Be respectful to the server
    
    return all_results

This function sends GET requests to DuckDuckGo's static search page. The User-Agent header makes the request look like it's coming from a real browser.

Without this header, DuckDuckGo returns a 403 Forbidden error.

Parsing Search Results

Add the parsing function that extracts data from the HTML:

def parse_results(html):
    """
    Parse DuckDuckGo HTML and extract search results.
    
    Args:
        html: Raw HTML string from the response
    
    Returns:
        Tuple of (results list, next page params)
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    
    # Find all result containers
    result_elements = soup.select("#links .result")
    
    for element in result_elements:
        # Extract the title and URL
        title_link = element.select_one(".result__a")
        if not title_link:
            continue
            
        title = title_link.get_text(strip=True)
        url = title_link.get("href", "")
        
        # DuckDuckGo uses protocol-relative URLs
        if url.startswith("//"):
            url = "https:" + url
        
        # Extract the display URL
        display_url_elem = element.select_one(".result__url")
        display_url = display_url_elem.get_text(strip=True) if display_url_elem else ""
        
        # Extract the snippet
        snippet_elem = element.select_one(".result__snippet")
        snippet = snippet_elem.get_text(strip=True) if snippet_elem else ""
        
        results.append({
            "title": title,
            "url": url,
            "display_url": display_url,
            "snippet": snippet
        })
    
    # Get next page parameters
    next_params = get_next_page_params(soup)
    
    return results, next_params

The CSS selectors target specific elements in DuckDuckGo's HTML structure. Each result sits inside a container with the result class.

Handling Pagination

DuckDuckGo's pagination works through form submissions. Add this function to extract the next page parameters:

def get_next_page_params(soup):
    """
    Extract parameters needed to fetch the next page.
    
    Args:
        soup: BeautifulSoup object of current page
    
    Returns:
        Dictionary of form parameters or None if no next page
    """
    next_form = soup.select_one(".nav-link form")
    
    if not next_form:
        return None
    
    params = {}
    
    for input_elem in next_form.select("input"):
        name = input_elem.get("name")
        value = input_elem.get("value", "")
        
        if name:
            params[name] = value
    
    return params

The static version uses a hidden form for pagination. This function extracts all form fields and passes them to the next request.

Saving Results to CSV

Add a function to export the scraped data:

def save_to_csv(results, filename):
    """
    Save scraped results to a CSV file.
    
    Args:
        results: List of result dictionaries
        filename: Output file path
    """
    if not results:
        print("No results to save")
        return
    
    fieldnames = results[0].keys()
    
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(results)
    
    print(f"Saved {len(results)} results to {filename}")

Running the Scraper

Add the main execution block:

if __name__ == "__main__":
    query = "python web scraping tutorial"
    results = scrape_duckduckgo(query, num_pages=3)
    save_to_csv(results, "duckduckgo_results.csv")
    
    # Print a sample
    for result in results[:5]:
        print(f"\nTitle: {result['title']}")
        print(f"URL: {result['url']}")
        print(f"Snippet: {result['snippet'][:100]}...")

Run it with:

python scraper.py

You'll get a CSV file containing titles, URLs, display URLs, and snippets from DuckDuckGo's search results.

Method 2: Scrape DuckDuckGo With Browser Automation

Some projects require the dynamic version with JavaScript-rendered content. Browser automation handles this by controlling a real browser instance.

Playwright offers a cleaner API than Selenium and runs faster. Let's build a scraper using it.

Installing Playwright

pip install playwright
playwright install chromium

The second command downloads the Chromium browser binary that Playwright controls.

Building the Browser-Based Scraper

Create browser_scraper.py:

from playwright.sync_api import sync_playwright
import json
import time

def scrape_duckduckgo_dynamic(query, max_results=30):
    """
    Scrape DuckDuckGo using browser automation.
    
    Args:
        query: Search term
        max_results: Maximum results to collect
    
    Returns:
        List of result dictionaries
    """
    results = []
    
    with sync_playwright() as p:
        # Launch browser in headless mode
        browser = p.chromium.launch(headless=True)
        
        context = browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        )
        
        page = context.new_page()
        
        # Navigate to DuckDuckGo (URL-encode the query so spaces and
        # special characters like & or # survive)
        from urllib.parse import quote_plus
        search_url = f"https://duckduckgo.com/?q={quote_plus(query)}"
        page.goto(search_url, wait_until="networkidle")
        
        # Wait for results to load
        page.wait_for_selector("[data-testid='result']", timeout=10000)
        
        while len(results) < max_results:
            # Extract visible results
            new_results = extract_results(page)
            
            for result in new_results:
                if result not in results:
                    results.append(result)
            
            if len(results) >= max_results:
                break
            
            # Click "More Results" if available
            more_button = page.query_selector("button:has-text('More Results')")
            
            if more_button:
                more_button.click()
                time.sleep(2)
            else:
                break
        
        browser.close()
    
    return results[:max_results]

Playwright waits for the network to become idle before proceeding. This ensures all JavaScript has finished executing.

Extracting Results From the Dynamic Page

def extract_results(page):
    """
    Extract search results from the current page state.
    
    Args:
        page: Playwright page object
    
    Returns:
        List of result dictionaries
    """
    results = []
    
    # The dynamic version uses data-testid attributes
    result_elements = page.query_selector_all("[data-testid='result']")
    
    for element in result_elements:
        try:
            title_elem = element.query_selector("h2 a")
            snippet_elem = element.query_selector("[data-result='snippet']")
            
            if not title_elem:
                continue
            
            title = title_elem.inner_text()
            url = title_elem.get_attribute("href")
            snippet = snippet_elem.inner_text() if snippet_elem else ""
            
            results.append({
                "title": title,
                "url": url,
                "snippet": snippet
            })
            
        except Exception as e:
            continue
    
    return results

The dynamic version's HTML structure differs from the static version. It uses data-testid attributes for testing, which also make scraping easier.

Running the Browser Scraper

if __name__ == "__main__":
    results = scrape_duckduckgo_dynamic("machine learning courses", max_results=50)
    
    print(f"Scraped {len(results)} results")
    
    with open("dynamic_results.json", "w") as f:
        json.dump(results, f, indent=2)

Browser automation uses more resources than HTTP requests. Reserve it for cases where you specifically need JavaScript-rendered content.

Method 3: Using the DDGS Python Library

DDGS (formerly duckduckgo-search) provides a high-level interface for DuckDuckGo scraping. It handles all the parsing logic internally.

Installing DDGS

pip install -U ddgs

Scraping With DDGS

The library supports both Python code and command-line usage:

from ddgs import DDGS

def search_with_ddgs(query, max_results=20):
    """
    Search DuckDuckGo using the DDGS library.
    
    Args:
        query: Search term
        max_results: Number of results to return
    
    Returns:
        List of result dictionaries
    """
    results = []
    
    with DDGS() as ddgs:
        for result in ddgs.text(query, max_results=max_results):
            results.append({
                "title": result.get("title"),
                "url": result.get("href"),
                "snippet": result.get("body")
            })
    
    return results

# Usage
results = search_with_ddgs("best python frameworks 2024", max_results=30)

DDGS also offers a command-line interface:

ddgs text -q "python web scraping" -m 20 -o results.csv

This outputs results directly to a CSV file without writing any code.

Additional DDGS Features

The library supports multiple search types:

from ddgs import DDGS

with DDGS() as ddgs:
    # Image search
    images = list(ddgs.images("sunset beach", max_results=10))
    
    # News search
    news = list(ddgs.news("tech industry", max_results=10))
    
    # Video search
    videos = list(ddgs.videos("python tutorial", max_results=10))

DDGS abstracts away the complexity but offers less flexibility than custom scrapers.

Avoiding Blocks When You Scrape DuckDuckGo

DuckDuckGo implements rate limiting to prevent abuse. Making too many requests from the same IP triggers blocks.

Signs You're Being Blocked

Watch for these indicators:

  • HTTP 403 Forbidden responses
  • CAPTCHA challenges appearing
  • Empty result pages
  • Longer response times followed by connection drops
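Those signals can be checked programmatically after each response. A sketch using simple heuristics (the marker strings are assumptions; tune them to what blocked responses actually contain):

```python
def looks_blocked(status_code, html):
    """Heuristically decide whether a DuckDuckGo response is a block."""
    if status_code == 403:
        return True
    lowered = html.lower()
    if "captcha" in lowered:
        return True
    # Static-version results carry the result__a class; its absence
    # suggests an empty result page
    if "result__a" not in lowered:
        return True
    return False

# Typical usage after a requests call:
# if looks_blocked(response.status_code, response.text):
#     back off, rotate the proxy, and retry later
```

Detecting blocks early lets you back off before the IP's reputation degrades further.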

Implementing Request Delays

Add delays between requests to reduce detection:

import random
import time

def respectful_request(url, params, headers):
    """Make a request with random delay."""
    # Random delay between 1-3 seconds
    delay = random.uniform(1, 3)
    time.sleep(delay)
    
    return requests.get(url, params=params, headers=headers)

Random delays look more natural than fixed intervals.

Rotating User Agents

Cycle through different user agent strings:

import random

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
]

def get_random_headers():
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        "Accept-Encoding": "gzip, deflate",
        "Connection": "keep-alive"
    }

Using Rotating Proxies for Scale

For large-scale scraping, rotating proxies are essential. Each request goes through a different IP address, making it much harder for DuckDuckGo to tie the traffic back to a single scraper.

Residential proxies work best because they use real home IP addresses. We offer residential, datacenter, ISP, and mobile proxy options that integrate easily with Python:

def scrape_with_proxy(query, proxy_url):
    """
    Make a request through a rotating proxy.
    
    Args:
        query: Search term
        proxy_url: Proxy connection string
    
    Returns:
        Response object
    """
    proxies = {
        "http": proxy_url,
        "https": proxy_url
    }
    
    base_url = "https://html.duckduckgo.com/html/"
    params = {"q": query}
    headers = get_random_headers()
    
    response = requests.get(
        base_url,
        params=params,
        headers=headers,
        proxies=proxies,
        timeout=30
    )
    
    return response

With rotating proxies, you can scrape thousands of queries without hitting rate limits.
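A round-robin rotation is enough for many jobs. The sketch below cycles a hypothetical pool and returns a requests-style proxies dict per call:

```python
import itertools

# Hypothetical pool; substitute real endpoints from your provider.
PROXY_POOL = [
    "http://user:pass@proxy1:8000",
    "http://user:pass@proxy2:8000",
    "http://user:pass@proxy3:8000",
]

_cycle = itertools.cycle(PROXY_POOL)

def next_proxies():
    """Advance the rotation and return a dict ready for requests.get()."""
    proxy = next(_cycle)
    return {"http": proxy, "https": proxy}

# Each query then leaves from a different IP:
# for query in queries:
#     requests.get(base_url, params={"q": query},
#                  headers=get_random_headers(),
#                  proxies=next_proxies(), timeout=30)
```

`itertools.cycle` wraps around automatically, so the pool size can change without touching the request loop.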

Handling CAPTCHAs

If you encounter CAPTCHAs frequently, consider these approaches:

  1. Reduce request frequency
  2. Use higher-quality residential proxies
  3. Implement exponential backoff on errors
  4. Switch to the static version, which triggers fewer CAPTCHAs

For transient failures, retry with exponential backoff:

def exponential_backoff(func, max_retries=5):
    """Retry with exponential backoff on failure."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            
            wait_time = (2 ** attempt) + random.uniform(0, 1)
            print(f"Attempt {attempt + 1} failed. Waiting {wait_time:.1f}s")
            time.sleep(wait_time)

Complete Production Scraper

Here's a complete script combining all the techniques:

import requests
from bs4 import BeautifulSoup
import csv
import time
import random
from typing import List, Dict, Optional

class DuckDuckGoScraper:
    """Production-ready DuckDuckGo scraper with anti-detection measures."""
    
    def __init__(self, proxy_url: Optional[str] = None):
        self.base_url = "https://html.duckduckgo.com/html/"
        self.proxy_url = proxy_url
        self.session = requests.Session()
        
        self.user_agents = [
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Firefox/121.0",
        ]
    
    def _get_headers(self) -> Dict[str, str]:
        return {
            "User-Agent": random.choice(self.user_agents),
            "Accept": "text/html,application/xhtml+xml",
            "Accept-Language": "en-US,en;q=0.5",
        }
    
    def _make_request(self, params: Dict) -> Optional[str]:
        proxies = None
        if self.proxy_url:
            proxies = {"http": self.proxy_url, "https": self.proxy_url}
        
        time.sleep(random.uniform(1, 2))
        
        try:
            response = self.session.get(
                self.base_url,
                params=params,
                headers=self._get_headers(),
                proxies=proxies,
                timeout=30
            )
            response.raise_for_status()
            return response.text
        except requests.RequestException as e:
            print(f"Request failed: {e}")
            return None
    
    def _parse_results(self, html: str) -> tuple:
        soup = BeautifulSoup(html, "html.parser")
        results = []
        
        for element in soup.select("#links .result"):
            title_link = element.select_one(".result__a")
            if not title_link:
                continue
            
            url = title_link.get("href", "")
            if url.startswith("//"):
                url = "https:" + url
            
            snippet_elem = element.select_one(".result__snippet")
            results.append({
                "title": title_link.get_text(strip=True),
                "url": url,
                "snippet": snippet_elem.get_text(strip=True) if snippet_elem else ""
            })
        
        # Get next page params
        next_form = soup.select_one(".nav-link form")
        next_params = None
        
        if next_form:
            next_params = {}
            for inp in next_form.select("input"):
                if inp.get("name"):
                    next_params[inp.get("name")] = inp.get("value", "")
        
        return results, next_params
    
    def scrape(self, query: str, max_pages: int = 1) -> List[Dict]:
        all_results = []
        params = {"q": query}
        
        for page in range(max_pages):
            html = self._make_request(params)
            if not html:
                break
            
            results, next_params = self._parse_results(html)
            all_results.extend(results)
            
            if not next_params:
                break
            
            params = next_params
            print(f"Scraped page {page + 1}, total results: {len(all_results)}")
        
        return all_results
    
    def save_csv(self, results: List[Dict], filename: str):
        if not results:
            return
        
        with open(filename, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=results[0].keys())
            writer.writeheader()
            writer.writerows(results)


if __name__ == "__main__":
    scraper = DuckDuckGoScraper()
    results = scraper.scrape("best programming languages 2024", max_pages=3)
    scraper.save_csv(results, "output.csv")
    print(f"Done! Scraped {len(results)} results")

This class-based approach keeps code organized and makes it easy to add features like proxy rotation.

Conclusion

You now have three reliable ways to scrape DuckDuckGo search results:

HTTP requests with BeautifulSoup work best for the static version. This approach is fast, lightweight, and handles most use cases.

Browser automation with Playwright handles the dynamic JavaScript version. Use this when you need AI summaries or other dynamic content.

The DDGS library provides a quick solution for simple scraping tasks. It's perfect for prototyping or one-off data collection.

For production scraping at scale, combine these techniques with rotating proxies and respect DuckDuckGo's servers with appropriate delays.

Start with the static version scraper. It covers 90% of use cases and runs much faster than browser automation.

FAQ

Is it legal to scrape DuckDuckGo?

Web scraping public information is generally legal. However, you should review DuckDuckGo's terms of service and robots.txt. Avoid overwhelming their servers with excessive requests.

Why do I get 403 errors when scraping DuckDuckGo?

DuckDuckGo returns 403 errors when it detects automated requests. Add a realistic User-Agent header to your requests. If blocks persist, implement request delays and consider using rotating proxies.

How many results can I scrape from DuckDuckGo?

The static version returns about 30 results per page. You can paginate through multiple pages to collect more. Practical limits depend on rate limiting and your proxy infrastructure.

Should I use the static or dynamic version?

Use the static version at html.duckduckgo.com unless you specifically need JavaScript-rendered features like AI summaries. The static version is faster and easier to scrape.

How do I avoid getting blocked?

Implement random delays between requests, rotate User-Agent strings, and use rotating residential proxies for larger projects. Keep request rates reasonable and handle errors gracefully with exponential backoff.

]]>
<![CDATA[How to Bypass Forter anti-bot in 2026]]>https://roundproxies.com/blog/bypass-forter/696c360626f439f88a95b28cSun, 18 Jan 2026 01:27:44 GMTForter stands apart from traditional anti-bot solutions like Cloudflare or Akamai. Unlike conventional WAFs that block requests at the network level, Forter operates as an identity intelligence platform embedded within e-commerce checkout flows.

This makes bypassing Forter particularly challenging.

The system analyzes over 6,000 data points per transaction, tracks behavioral patterns across 1.5 billion identities, and integrates directly with payment processors. Understanding how Forter works is essential before attempting any bypass strategy.

In this guide, you'll learn exactly how Forter detects automated activity and the proven techniques to bypass its protection systems.

Quick Answer: How to Bypass Forter

Bypassing Forter requires a multi-layered approach combining stealth browser automation, realistic behavioral simulation, consistent fingerprint spoofing, residential proxy usage, and proper session management. Unlike traditional anti-bots, Forter tracks behavioral patterns throughout your entire session, making simple request-level bypasses ineffective. Success requires mimicking legitimate human behavior from the first page load through checkout completion.

What Makes Forter Different From Traditional Anti-Bots?

Forter isn't a traditional web application firewall. It's a fraud prevention platform that evaluates trust at the transaction level rather than blocking requests outright.

Here's what sets it apart:

Forter's JavaScript SDK runs on every page of protected sites, collecting mouse movements, click patterns, scroll behavior, and keystroke dynamics. This behavioral data feeds into machine learning models trained on over one billion legitimate user sessions.

The system also performs deep device fingerprinting, VPN detection, IP reputation scoring, and cookie analysis. All this data gets processed in real-time to generate approve/decline decisions during checkout.

Traditional anti-bots block suspicious traffic immediately. Forter lets you browse freely but flags your session for review when you attempt a transaction.

This fundamental difference changes how you approach bypassing it.

How Forter Detects Automated Activity

Before diving into bypass methods, you need to understand Forter's detection layers:

JavaScript Token Generation

Forter's JavaScript snippet generates a ForterTokenCookie on every session. This cookie contains encrypted behavioral data and device fingerprints.

Without a valid token, transactions get flagged immediately.

The token tracks everything from how you move your mouse to how long you spend on each page. Even perfect fingerprint spoofing fails if the token contains suspicious behavioral patterns.

Behavioral Analysis

Forter monitors these behavioral signals in real-time:

  • Mouse movement patterns and velocity
  • Click timing and positioning
  • Scroll behavior and page navigation
  • Form-filling speed and patterns
  • Keystroke dynamics

Bots typically exhibit mechanical precision. Humans show natural variation.

This analysis runs continuously from landing page through checkout.

Device Fingerprinting

The system creates unique identifiers from:

  • Canvas rendering
  • WebGL parameters
  • Audio context fingerprints
  • Font enumeration
  • Screen resolution and color depth
  • Timezone and language settings

Inconsistencies between fingerprint components raise red flags.
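To make the idea concrete, here is a toy version of such a cross-check (the timezone-to-locale table is illustrative only; real systems correlate dozens of component pairs):

```python
# Illustrative mapping of timezones to locales that plausibly co-occur.
TIMEZONE_LOCALES = {
    "America/New_York": {"en-US"},
    "Europe/Paris": {"fr-FR", "en-US"},
    "Asia/Tokyo": {"ja-JP"},
}

def is_consistent(timezone, locale):
    """Flag fingerprints whose timezone and locale don't plausibly match."""
    expected = TIMEZONE_LOCALES.get(timezone)
    if expected is None:
        return True  # unknown timezone: no evidence either way
    return locale in expected

# An en-US browser reporting a Tokyo timezone is exactly the kind of
# mismatch that raises a fingerprint's risk score:
# is_consistent("Asia/Tokyo", "en-US") -> False
```

The lesson for bypass attempts: every spoofed component must agree with every other one, not just look plausible in isolation.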

Network Intelligence

Forter maintains databases of datacenter IP ranges, known VPN providers, residential proxy services, and previously flagged addresses.

They also analyze TLS fingerprints to detect automation tools that don't match their claimed browser identity.

Step 1: Set Up Stealth Browser Automation

Standard browser automation tools get detected within seconds. You need a fortified setup.

Why Regular Selenium Fails

Default Selenium and Puppeteer leave obvious traces. The navigator.webdriver property returns true, Chrome runs with automation flags visible, and missing browser attributes expose the automation.

Here's what happens with vanilla Puppeteer:

// This gets detected instantly
const puppeteer = require('puppeteer');

const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://target-site.com');
// Forter's JS immediately flags this session

The browser fingerprint screams "automation tool" before you even interact with the page.

Solution: Puppeteer Extra with Stealth Plugin

Install the stealth plugin to patch known detection vectors:

npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

Configure it properly:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// Apply stealth patches
puppeteer.use(StealthPlugin());

async function createStealthBrowser() {
    const browser = await puppeteer.launch({
        headless: false, // Headless mode is easier to detect
        args: [
            '--no-sandbox',
            '--disable-setuid-sandbox',
            '--disable-blink-features=AutomationControlled',
            '--disable-infobars',
            '--window-size=1920,1080'
        ]
    });
    
    return browser;
}

The stealth plugin patches over 30 detection vectors including navigator.webdriver, Chrome automation flags, missing browser plugins, and iframe contentWindow inconsistencies.
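Before trusting a setup, verify the patches actually took effect. A minimal sketch: collect a few high-signal properties from the page, then run them through a pure checker. The property names are standard browser globals; the helper itself is illustrative, not part of the stealth plugin:

```javascript
// Flags common automation leaks in a snapshot of navigator properties.
// Returns an array of failed check names (empty array = looks clean).
function findAutomationLeaks(snapshot) {
    const leaks = [];
    if (snapshot.webdriver) leaks.push('navigator.webdriver is true');
    if (snapshot.pluginCount === 0) leaks.push('no plugins (headless tell)');
    if (!snapshot.languages || snapshot.languages.length === 0) {
        leaks.push('empty navigator.languages');
    }
    if (/HeadlessChrome/.test(snapshot.userAgent || '')) {
        leaks.push('HeadlessChrome in user agent');
    }
    return leaks;
}

// With a live Puppeteer page, collect the snapshot like this:
// const snapshot = await page.evaluate(() => ({
//     webdriver: navigator.webdriver,
//     pluginCount: navigator.plugins.length,
//     languages: [...navigator.languages],
//     userAgent: navigator.userAgent
// }));
// console.log(findAutomationLeaks(snapshot));
```

Run this after every plugin or browser upgrade: stealth patches that worked on one Chrome version can silently stop working on the next.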

Alternative: Playwright with Stealth

Playwright offers better performance for some use cases:

npm install playwright playwright-extra puppeteer-extra-plugin-stealth

Implementation:

const { chromium } = require('playwright-extra');
// playwright-extra reuses the puppeteer stealth plugin
const stealth = require('puppeteer-extra-plugin-stealth')();

chromium.use(stealth);

async function createPlaywrightBrowser() {
    const browser = await chromium.launch({
        headless: false,
        channel: 'chrome' // Use real Chrome instead of bundled
    });
    
    const context = await browser.newContext({
        viewport: { width: 1920, height: 1080 },
        userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
    });
    
    return { browser, context };
}

Playwright's architecture makes certain fingerprint patches more reliable than Puppeteer's approach.

Step 2: Implement Realistic Behavioral Patterns

Forter's behavioral analysis catches most amateur bypass attempts. Mechanical precision is the biggest tell.

Mouse Movement Simulation

Real humans don't move cursors in straight lines. They exhibit micro-corrections, velocity changes, and occasional overshoots.

async function humanMouseMove(page, targetX, targetY) {
    const startPos = await page.evaluate(() => {
        // window.mouseX/mouseY are not standard properties; they only
        // exist if you track them with a mousemove listener, otherwise
        // the start position falls back to (0, 0)
        return { x: window.mouseX || 0, y: window.mouseY || 0 };
    });
    
    const steps = Math.floor(Math.random() * 20) + 25;
    const duration = Math.random() * 400 + 200;
    
    for (let i = 0; i <= steps; i++) {
        const progress = i / steps;
        
        // Add natural curve using bezier-like interpolation
        const noise = (Math.random() - 0.5) * 10;
        const easeProgress = easeOutQuad(progress);
        
        const currentX = startPos.x + (targetX - startPos.x) * easeProgress + noise;
        const currentY = startPos.y + (targetY - startPos.y) * easeProgress + noise;
        
        await page.mouse.move(currentX, currentY);
        await sleep(duration / steps);
    }
}

function easeOutQuad(t) {
    return t * (2 - t);
}

function sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}

The key elements here are variable step counts, curved trajectories, random micro-noise on positions, and natural easing functions.

Click Timing Variation

Clicks should vary in timing and land slightly off-center rather than always hitting the exact middle of an element:

async function humanClick(page, selector) {
    const element = await page.$(selector);
    if (!element) throw new Error(`Element not found: ${selector}`);
    const box = await element.boundingBox();
    
    // Don't always click dead center
    const offsetX = (Math.random() - 0.5) * box.width * 0.4;
    const offsetY = (Math.random() - 0.5) * box.height * 0.4;
    
    const targetX = box.x + box.width / 2 + offsetX;
    const targetY = box.y + box.height / 2 + offsetY;
    
    await humanMouseMove(page, targetX, targetY);
    
    // Variable delay before clicking
    await sleep(Math.random() * 150 + 50);
    
    await page.mouse.down();
    await sleep(Math.random() * 100 + 30); // Hold duration varies
    await page.mouse.up();
}

Real users have variable reaction times and don't release clicks instantaneously.

Form Input Simulation

Typing speed and rhythm matter. Each keystroke should have natural timing:

async function humanType(page, selector, text) {
    await page.focus(selector);
    await sleep(Math.random() * 200 + 100);
    
    for (const char of text) {
        const delay = getKeystrokeDelay(char);
        await page.keyboard.type(char, { delay });
    }
}

function getKeystrokeDelay(char) {
    // Different characters have different typing speeds
    const baseDelay = 80;
    const variance = 60;
    
    // Uppercase takes longer (shift key)
    if (char === char.toUpperCase() && char !== char.toLowerCase()) {
        return baseDelay + variance + Math.random() * 100;
    }
    
    // Numbers often slower
    if (/\d/.test(char)) {
        return baseDelay + Math.random() * variance + 30;
    }
    
    return baseDelay + Math.random() * variance;
}

The timing differences between character types mimic how humans actually type on physical keyboards.

Step 3: Spoof Device Fingerprints Consistently

Fingerprint spoofing requires consistency across all checked attributes. One mismatch invalidates your entire profile.

Canvas Fingerprint Modification

Canvas fingerprinting measures how your browser renders graphics. Override it with controlled noise:

async function applyCanvasSpoof(page) {
    await page.evaluateOnNewDocument(() => {
        const originalToDataURL = HTMLCanvasElement.prototype.toDataURL;
        const originalGetImageData = CanvasRenderingContext2D.prototype.getImageData;
        const noisedCanvases = new WeakSet();
        
        HTMLCanvasElement.prototype.toDataURL = function(type) {
            if (this.width === 0 || this.height === 0) {
                return originalToDataURL.apply(this, arguments);
            }
            
            const ctx = this.getContext('2d');
            // Noise each canvas only once; otherwise repeated toDataURL
            // calls return different hashes, which itself looks suspicious
            if (ctx && !noisedCanvases.has(this)) {
                noisedCanvases.add(this);
                const imageData = originalGetImageData.call(ctx, 0, 0, this.width, this.height);
                const seed = 12345; // Keep consistent per session
                
                for (let i = 0; i < imageData.data.length; i += 4) {
                    const noise = (seededRandom(seed + i) - 0.5) * 2;
                    imageData.data[i] = Math.min(255, Math.max(0, imageData.data[i] + noise));
                }
                
                ctx.putImageData(imageData, 0, 0);
            }
            
            return originalToDataURL.apply(this, arguments);
        };
        
        function seededRandom(seed) {
            const x = Math.sin(seed) * 10000;
            return x - Math.floor(x);
        }
    });
}

Use the same seed throughout a session for consistency.
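One way to keep the seed stable within a session but different across sessions is to derive it once at startup and pass it into the page as an argument to `evaluateOnNewDocument`, instead of hardcoding it. A minimal sketch (the seed-passing pattern is an illustration, not part of the code above):

```javascript
// Derive one integer seed per session; every spoof (canvas, audio, etc.)
// in that session should reuse the same value.
function makeSessionSeed() {
    // 31-bit positive integer, stable for the lifetime of this process
    return Math.floor(Math.random() * 0x7fffffff);
}

// Pass the seed into the page instead of hardcoding it:
// await page.evaluateOnNewDocument((seed) => {
//     /* ...same canvas/audio noise code, using `seed`... */
// }, makeSessionSeed());

const sessionSeed = makeSessionSeed();
```

This also makes it easy to persist the seed alongside cookies, so a "returning visitor" renders the same canvas hash on the next run.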

WebGL Fingerprint Handling

WebGL exposes GPU information. Mask it carefully:

async function applyWebGLSpoof(page) {
    await page.evaluateOnNewDocument(() => {
        const getParameterOld = WebGLRenderingContext.prototype.getParameter;
        
        WebGLRenderingContext.prototype.getParameter = function(parameter) {
            // Mask renderer and vendor
            if (parameter === 37445) { // UNMASKED_VENDOR_WEBGL
                return 'Google Inc. (NVIDIA)';
            }
            if (parameter === 37446) { // UNMASKED_RENDERER_WEBGL
                return 'ANGLE (NVIDIA, NVIDIA GeForce GTX 1060 Direct3D11 vs_5_0 ps_5_0)';
            }
            return getParameterOld.apply(this, arguments);
        };
        
        // Patch WebGL2 separately: its prototype does not inherit from
        // WebGL1's, so it needs its own saved original
        if (typeof WebGL2RenderingContext !== 'undefined') {
            const getParameterOld2 = WebGL2RenderingContext.prototype.getParameter;
            WebGL2RenderingContext.prototype.getParameter = function(parameter) {
                if (parameter === 37445) {
                    return 'Google Inc. (NVIDIA)';
                }
                if (parameter === 37446) {
                    return 'ANGLE (NVIDIA, NVIDIA GeForce GTX 1060 Direct3D11 vs_5_0 ps_5_0)';
                }
                return getParameterOld2.apply(this, arguments);
            };
        }
    });
}

Match the GPU info to a common consumer card that aligns with your User-Agent.
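The pairing matters: a macOS user agent should never report a Direct3D renderer string. A small lookup keeps the two in sync (the specific vendor/renderer strings below are illustrative examples; capture real values from browsers you control before production use):

```javascript
// Plausible WebGL vendor/renderer pairs per platform.
// These strings are examples, not an exhaustive or authoritative list.
const gpuProfiles = {
    windows: {
        vendor: 'Google Inc. (NVIDIA)',
        renderer: 'ANGLE (NVIDIA, NVIDIA GeForce GTX 1060 Direct3D11 vs_5_0 ps_5_0)'
    },
    mac: {
        vendor: 'Google Inc. (Apple)',
        renderer: 'ANGLE (Apple, Apple M1, OpenGL 4.1)'
    }
};

// Pick the GPU profile that matches the platform claimed by the user agent.
function gpuForUserAgent(userAgent) {
    return /Macintosh/.test(userAgent) ? gpuProfiles.mac : gpuProfiles.windows;
}
```

Feed the selected pair into the `getParameter` override so the WebGL answers always agree with the User-Agent you send.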

Audio Fingerprint Spoofing

Audio context fingerprinting is increasingly common:

async function applyAudioSpoof(page) {
    await page.evaluateOnNewDocument(() => {
        const originalGetChannelData = AudioBuffer.prototype.getChannelData;
        const noisedBuffers = new WeakSet();
        
        AudioBuffer.prototype.getChannelData = function(channel) {
            const results = originalGetChannelData.apply(this, arguments);
            
            // Noise each buffer only once: getChannelData returns the same
            // underlying array, so repeated calls would stack the noise
            if (!noisedBuffers.has(results)) {
                noisedBuffers.add(results);
                const seed = 54321;
                for (let i = 0; i < results.length; i++) {
                    const noise = (seededRandom(seed + i) - 0.5) * 0.0001;
                    results[i] = results[i] + noise;
                }
            }
            
            return results;
        };
        
        function seededRandom(seed) {
            const x = Math.sin(seed) * 10000;
            return x - Math.floor(x);
        }
    });
}

The noise level must be imperceptible to functionality but sufficient to alter the hash.

Step 4: Configure Residential Proxies

Datacenter IPs get flagged instantly by Forter's network intelligence. Residential proxies are essential.

Why Residential Proxies Matter

Forter maintains extensive databases of datacenter IP ranges and known proxy services. Datacenter IPs rarely appear in legitimate e-commerce traffic.

Residential proxies route through actual ISP networks, making traffic indistinguishable from real consumers.

For reliable residential proxy solutions, providers like Roundproxies.com offer datacenter, residential, ISP, and mobile proxy options that work well for e-commerce automation.

Proxy Implementation

Configure your browser to use rotating residential proxies:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

async function createProxiedBrowser() {
    // Chrome ignores credentials embedded in --proxy-server,
    // so pass only the host here and authenticate per page below
    const proxyServer = 'http://proxy.example.com:8080';
    
    const browser = await puppeteer.launch({
        headless: false,
        args: [
            `--proxy-server=${proxyServer}`,
            '--no-sandbox',
            '--disable-setuid-sandbox'
        ]
    });
    
    const page = await browser.newPage();
    await page.authenticate({ username: 'user', password: 'pass' });
    
    return { browser, page };
}

Proxy Session Management

Maintain the same IP throughout a complete session. IP changes mid-session trigger fraud alerts:

class ProxySession {
    constructor(proxyUrl) {
        this.proxyUrl = proxyUrl;
        this.sessionId = this.generateSessionId();
    }
    
    generateSessionId() {
        return `session_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
    }
    
    getProxyWithSession() {
        // Many proxy providers support sticky sessions
        const url = new URL(this.proxyUrl);
        url.username = `${url.username}-session-${this.sessionId}`;
        return url.toString();
    }
}

Sticky sessions ensure the same exit IP for your entire browsing session.
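Between full sessions, though, you do want a new identity. A thin pool on top of sticky sessions hands out a fresh sticky username per session while never rotating mid-session. A sketch, assuming the common `-session-<id>` username convention (the exact format varies by proxy provider):

```javascript
// Hands out one sticky proxy identity per logical session.
// The username suffix convention (-session-<id>) is provider-specific.
class StickyProxyPool {
    constructor(baseUrl) {
        this.baseUrl = baseUrl;
        this.counter = 0;
    }

    // Each call starts a new session with a fresh exit IP;
    // reuse the returned URL for every request in that session.
    nextSession() {
        this.counter += 1;
        const url = new URL(this.baseUrl);
        url.username = `${url.username}-session-${this.counter}`;
        return url.toString();
    }
}
```

One `nextSession()` call per browser launch gives you rotation across sessions and stickiness within them, which matches how Forter expects real consumers to behave.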

Step 5: Handle the ForterTokenCookie

The ForterTokenCookie is Forter's primary session identifier. It contains encrypted behavioral and fingerprint data.

Understanding Token Generation

Forter's JavaScript generates this cookie through client-side processing. The token includes session duration, pages visited, interaction counts, fingerprint hashes, and behavioral metrics.

You cannot simply copy a token from a legitimate session. The encrypted data must match your current session's behavior.

Ensuring Valid Token Generation

Allow the JavaScript to execute properly and generate authentic tokens:

async function establishForterSession(page, siteUrl) {
    // Navigate to homepage first
    await page.goto(siteUrl, { waitUntil: 'networkidle2' });
    
    // Wait for Forter script to load and execute
    await page.waitForFunction(() => {
        return document.cookie.includes('forterToken') || 
               document.cookie.includes('_forter');
    }, { timeout: 10000 });
    
    // Simulate realistic browsing behavior
    await simulateBrowsing(page);
    
    // Verify token exists
    const cookies = await page.cookies();
    const forterCookie = cookies.find(c => 
        c.name.toLowerCase().includes('forter')
    );
    
    if (!forterCookie) {
        throw new Error('Forter token not generated');
    }
    
    return forterCookie;
}

async function simulateBrowsing(page) {
    // Scroll naturally
    await page.evaluate(async () => {
        const scrollHeight = document.documentElement.scrollHeight;
        let currentPosition = 0;
        
        while (currentPosition < scrollHeight * 0.7) {
            currentPosition += Math.random() * 200 + 50;
            window.scrollTo(0, currentPosition);
            await new Promise(r => setTimeout(r, Math.random() * 300 + 100));
        }
    });
    
    // Random pauses
    await sleep(Math.random() * 2000 + 1000);
}

The token quality directly correlates with how natural your pre-checkout behavior appears.

Step 6: Build Complete Session Flow

All these components must work together in a coherent session flow.

Full Implementation Example

Here's a complete example combining all techniques:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

class ForterBypass {
    constructor(proxyUrl) {
        this.proxyUrl = proxyUrl;
        this.browser = null;
        this.page = null;
    }
    
    async initialize() {
        this.browser = await puppeteer.launch({
            headless: false,
            args: [
                `--proxy-server=${this.proxyUrl}`,
                '--no-sandbox',
                '--disable-setuid-sandbox',
                '--disable-blink-features=AutomationControlled',
                '--window-size=1920,1080'
            ]
        });
        
        this.page = await this.browser.newPage();
        
        // If the proxy requires credentials, authenticate here;
        // Chrome does not read user:pass from the --proxy-server flag
        // await this.page.authenticate({ username: 'user', password: 'pass' });
        
        // Apply all fingerprint spoofs
        await this.applyFingerprints();
        
        // Set realistic viewport
        await this.page.setViewport({
            width: 1920,
            height: 1080,
            deviceScaleFactor: 1
        });
    }
    
    async applyFingerprints() {
        await applyCanvasSpoof(this.page);
        await applyWebGLSpoof(this.page);
        await applyAudioSpoof(this.page);
    }
    
    async browseProduct(productUrl) {
        // Navigate naturally
        await this.page.goto(productUrl, { waitUntil: 'networkidle2' });
        
        // Establish behavioral baseline
        await this.simulateProductView();
        
        // Add to cart with human-like interaction
        await humanClick(this.page, '[data-add-to-cart]');
        
        await sleep(Math.random() * 1000 + 500);
    }
    
    async simulateProductView() {
        // Scroll through product images
        await this.page.evaluate(async () => {
            // querySelectorAll returns a NodeList, which has no .slice()
            const images = Array.from(document.querySelectorAll('img')).slice(0, 5);
            for (const img of images) {
                img.scrollIntoView({ behavior: 'smooth' });
                await new Promise(r => setTimeout(r, Math.random() * 500 + 200));
            }
        });
        
        // Simulate reading product description
        await sleep(Math.random() * 3000 + 2000);
    }
    
    async proceedToCheckout() {
        await humanClick(this.page, '[data-checkout-button]');
        await this.page.waitForNavigation({ waitUntil: 'networkidle2' });
        
        // Fill checkout form with natural timing
        await this.fillCheckoutForm();
    }
    
    async fillCheckoutForm() {
        const formData = {
            email: 'example@email.com',
            firstName: 'John',
            lastName: 'Smith',
            address: '123 Main Street',
            city: 'New York',
            zip: '10001'
        };
        
        for (const [field, value] of Object.entries(formData)) {
            const selector = `[name="${field}"], [id="${field}"]`;
            await humanClick(this.page, selector);
            await sleep(Math.random() * 300 + 100);
            await humanType(this.page, selector, value);
            await sleep(Math.random() * 500 + 200);
        }
    }
    
    async cleanup() {
        if (this.browser) {
            await this.browser.close();
        }
    }
}

This structure maintains behavioral consistency throughout the entire session.

Step 7: Handle TLS Fingerprinting

TLS fingerprinting is often overlooked but critically important. Your browser's TLS handshake reveals information that Forter uses for verification.

Understanding TLS Fingerprints

When your browser establishes an HTTPS connection, it sends a Client Hello message containing cipher suites, extensions, and protocol preferences.

Each browser and automation tool has a unique TLS fingerprint. Python's requests library looks nothing like Chrome. Selenium with ChromeDriver differs from actual Chrome.

Why TLS Matters for Forter

Forter correlates TLS fingerprints with User-Agent strings. If you claim to be Chrome but your TLS fingerprint matches a Python script, instant detection.

This happens because many developers spoof User-Agent headers but forget about lower-level network signatures.

Solution: Use Real Browser Network Stack

The safest approach is using actual browser automation rather than HTTP libraries:

// Good: Real browser TLS fingerprint
const browser = await puppeteer.launch();

// Bad: Python requests with spoofed User-Agent
// The TLS fingerprint still exposes you

For cases where browser automation isn't practical, libraries like curl-impersonate or tls-client can mimic browser TLS fingerprints:

# Python example using tls_client
import tls_client

session = tls_client.Session(
    client_identifier="chrome_120",
    random_tls_extension_order=True
)

response = session.get(
    "https://target-site.com",
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    }
)

This approach matches Chrome's TLS fingerprint while still using Python for automation logic.

Step 8: Implement Session Persistence

Forter tracks sessions across multiple visits. Proper session management prevents behavioral inconsistencies.

Save and restore cookies between sessions to maintain identity consistency:

const fs = require('fs');

class SessionManager {
    constructor(sessionFile) {
        this.sessionFile = sessionFile;
    }
    
    async saveCookies(page) {
        const cookies = await page.cookies();
        fs.writeFileSync(
            this.sessionFile, 
            JSON.stringify(cookies, null, 2)
        );
    }
    
    async loadCookies(page) {
        if (!fs.existsSync(this.sessionFile)) {
            return false;
        }
        
        const cookies = JSON.parse(
            fs.readFileSync(this.sessionFile, 'utf8')
        );
        
        // Filter expired cookies (Puppeteer reports session cookies
        // with expires === -1, which should be kept)
        const validCookies = cookies.filter(cookie => {
            if (!cookie.expires || cookie.expires === -1) return true;
            return cookie.expires > Date.now() / 1000;
        });
        
        await page.setCookie(...validCookies);
        return true;
    }
    
    async clearSession() {
        if (fs.existsSync(this.sessionFile)) {
            fs.unlinkSync(this.sessionFile);
        }
    }
}

Maintaining cookies shows returning visitor behavior, which Forter considers less risky.
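One subtlety when restoring: Puppeteer reports session cookies with `expires: -1`, and a naive `expires < now` check would silently drop them. A standalone filter that handles both cases (illustrative helper, separate from the SessionManager above):

```javascript
// Keep cookies that are either session cookies (expires === -1 in
// Puppeteer, or the field is missing) or not yet past their expiry.
function filterValidCookies(cookies, nowSeconds = Date.now() / 1000) {
    return cookies.filter(cookie => {
        if (cookie.expires === undefined || cookie.expires === -1) return true;
        return cookie.expires > nowSeconds;
    });
}
```

Dropping session cookies by accident is an easy way to lose the ForterTokenCookie between runs and look like a brand-new visitor every time.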

localStorage and sessionStorage

Some fingerprint data persists in browser storage. Handle it appropriately:

async function restoreStorageData(page, storageData) {
    await page.evaluateOnNewDocument((data) => {
        if (data.localStorage) {
            Object.keys(data.localStorage).forEach(key => {
                localStorage.setItem(key, data.localStorage[key]);
            });
        }
        if (data.sessionStorage) {
            Object.keys(data.sessionStorage).forEach(key => {
                sessionStorage.setItem(key, data.sessionStorage[key]);
            });
        }
    }, storageData);
}

async function saveStorageData(page) {
    return await page.evaluate(() => ({
        localStorage: { ...localStorage },
        sessionStorage: { ...sessionStorage }
    }));
}

This maintains consistency across browsing sessions.

Common Pitfalls to Avoid

Several mistakes consistently lead to detection.

Headless Mode Detection

Running in headless mode exposes multiple detection vectors. Even with stealth plugins, headed mode provides significantly better success rates.

If you must use headless, configure it properly:

const browser = await puppeteer.launch({
    headless: 'new', // Use new headless mode
    args: [
        '--disable-blink-features=AutomationControlled',
        '--disable-features=IsolateOrigins,site-per-process'
    ]
});

The newer headless mode has fewer detectable differences from headed browsers.

Timing Inconsistencies

Bots complete actions too quickly. Add realistic delays between all actions:

async function realisticDelay() {
    // Mix of short and long pauses
    const delays = [100, 200, 500, 800, 1500, 2000];
    const delay = delays[Math.floor(Math.random() * delays.length)];
    const variance = Math.random() * delay * 0.3;
    
    await sleep(delay + variance);
}

IP and Fingerprint Mismatch

Your IP geolocation must align with timezone, language settings, and other locale indicators. A US IP with European timezone settings triggers immediate flags.

Configure everything consistently:

await page.emulateTimezone('America/New_York');
await page.setExtraHTTPHeaders({
    'Accept-Language': 'en-US,en;q=0.9'
});
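A small lookup keeps these settings in sync with the proxy's exit country. The mappings shown are illustrative defaults; a real deployment needs one entry per country you actually route through:

```javascript
// Locale bundles keyed by proxy exit country (illustrative entries).
const localeByCountry = {
    US: { timezone: 'America/New_York', acceptLanguage: 'en-US,en;q=0.9' },
    GB: { timezone: 'Europe/London', acceptLanguage: 'en-GB,en;q=0.9' },
    DE: { timezone: 'Europe/Berlin', acceptLanguage: 'de-DE,de;q=0.9,en;q=0.8' }
};

// Fall back to US settings for unmapped countries.
function localeFor(countryCode) {
    return localeByCountry[countryCode] || localeByCountry.US;
}

// Applying it (assumes a Puppeteer page):
// const locale = localeFor('DE');
// await page.emulateTimezone(locale.timezone);
// await page.setExtraHTTPHeaders({ 'Accept-Language': locale.acceptLanguage });
```

Resolving the locale from one source of truth means the timezone and Accept-Language header can never drift apart when you switch proxy regions.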

Monitoring and Adaptation

Forter continuously updates its detection methods. What works today may fail tomorrow.

Testing Your Setup

Before production use, verify your bypass works:

async function testFingerprintConsistency(page) {
    // Navigate to a fingerprint testing site
    await page.goto('https://browserleaks.com/javascript');
    
    // Check for automation indicators
    const webdriver = await page.evaluate(() => navigator.webdriver);
    if (webdriver) {
        console.error('WebDriver detected!');
        return false;
    }
    
    // Verify plugin count
    const plugins = await page.evaluate(() => navigator.plugins.length);
    if (plugins === 0) {
        console.error('No plugins detected - suspicious');
        return false;
    }
    
    return true;
}

Run these checks before attempting actual bypasses.

Rotating Fingerprint Profiles

Use different fingerprint configurations for different sessions:

const profiles = [
    { 
        userAgent: 'Mozilla/5.0 (Windows NT 10.0...',
        viewport: { width: 1920, height: 1080 },
        timezone: 'America/New_York'
    },
    {
        userAgent: 'Mozilla/5.0 (Macintosh...',
        viewport: { width: 2560, height: 1440 },
        timezone: 'America/Los_Angeles'
    }
];

function getRandomProfile() {
    return profiles[Math.floor(Math.random() * profiles.length)];
}

Vary profiles to avoid pattern detection across sessions.
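Purely random selection has a downside: a "returning visitor" (same cookies, same sticky proxy session) can show up with a different fingerprint on its next run, which is itself a red flag. Hashing a stable identity key into the profile index keeps each identity consistent across runs. A self-contained sketch with a trimmed copy of the profile list:

```javascript
// Trimmed profile pool; mirror the fields your real profiles carry.
const profilePool = [
    { name: 'win-chrome', viewport: { width: 1920, height: 1080 }, timezone: 'America/New_York' },
    { name: 'mac-chrome', viewport: { width: 2560, height: 1440 }, timezone: 'America/Los_Angeles' }
];

// Deterministically map a stable identity id to a profile, so the
// same "visitor" always presents the same fingerprint.
function profileForIdentity(identityId) {
    // Simple string hash folded into the pool size
    let hash = 0;
    for (const ch of identityId) {
        hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
    }
    return profilePool[hash % profilePool.length];
}
```

Use a random key for each fresh identity, then persist that key alongside the identity's cookies so repeat sessions stay consistent.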

Final Thoughts

Bypassing Forter requires a comprehensive approach addressing every detection layer simultaneously.

The critical elements are stealth browser automation, realistic behavioral simulation, consistent fingerprint spoofing, quality residential proxies, proper cookie and token handling, and geographic consistency.

No single technique works in isolation. Success comes from combining all methods into a coherent system that mimics legitimate user behavior.

Forter's machine learning models improve continuously. Regular testing and adaptation of your approach remains essential for long-term success.

Frequently Asked Questions

What is Forter Anti-Bot?

Forter Anti-Bot is a fraud prevention and identity intelligence system used by e-commerce websites to detect automated activity and fraudulent transactions. Unlike traditional WAFs, Forter operates at the transaction level, analyzing behavioral patterns, device fingerprints, and network signals to approve or decline purchases in real-time.

Is Forter the Same as Cloudflare?

No. Cloudflare blocks suspicious traffic at the network level before it reaches the website. Forter operates differently by allowing traffic through but monitoring user behavior throughout the session. It makes approve/decline decisions during checkout rather than blocking access upfront.

Why Does Forter Block My Automation?

Forter blocks automation because bots exhibit mechanical behavior patterns. Common detection triggers include unrealistic mouse movement, instant form completion, headless browser signatures, datacenter IP addresses, and inconsistent fingerprint attributes.

Can I Use Free Proxies to Bypass Forter?

Free proxies almost never work against Forter. The system maintains databases of known proxy IP ranges and flags traffic from datacenter IPs. Residential proxies from reputable providers are essential for reliable bypass attempts.

Do VPNs Help Bypass Forter Detection?

Standard VPNs typically don't help. Forter detects VPN IP addresses through multiple methods including IP reputation databases and network analysis. VPN traffic patterns often differ from regular residential traffic, making detection straightforward.

How Long Does Forter Track Sessions?

Forter tracks sessions continuously from the first page load through checkout completion. The system also correlates data across multiple sessions using persistent identifiers, device fingerprints, and behavioral patterns.

What Programming Languages Work Best?

JavaScript with Puppeteer or Playwright provides the best results because these tools use real browser engines with authentic network stacks. Python solutions require additional libraries for TLS fingerprint spoofing to avoid detection.

Does Forter Use Machine Learning?

Yes. Forter employs machine learning models trained on over 1.5 billion legitimate user sessions. These models continuously improve at distinguishing automated activity from human behavior, making static bypass methods increasingly ineffective over time.

]]>
<![CDATA[What Is IP Rotation? How it works and why you need it]]>https://roundproxies.com/blog/ip-rotation/6951d5eb26f439f88a95ab0bSat, 17 Jan 2026 02:43:21 GMTIP rotation is the process of automatically changing your device's IP address at set intervals, after specific requests, or with each new connection. This technique prevents websites from tracking your online activity, helps you avoid IP-based blocks, and enables large-scale data collection without detection.

Whether you're scraping competitor prices, running SEO audits, or simply want more privacy while browsing, understanding how IP rotation works gives you a significant advantage online.

Table of Contents

  • What Is IP Rotation?
  • Why IP Rotation Matters
  • How IP Rotation Works
  • Types of IP Rotation Methods
  • How to Implement IP Rotation (With Code Examples)
  • Common Use Cases for IP Rotation
  • IP Rotation vs Proxy Rotation: What's the Difference?
  • Best Practices for Effective IP Rotation
  • FAQ
  • Final Thoughts

What Is IP Rotation?

An IP address is a unique numerical identifier assigned to every device connected to the internet. Think of it as your device's home address online.

IP rotation changes this address automatically based on predefined rules. Instead of sending all your requests from one static IP, your connection cycles through multiple addresses.

This makes your traffic appear to originate from different devices or locations.

There are several ways IP rotation can be triggered:

  • Time-based: Your IP changes every few minutes or hours
  • Request-based: A new IP is assigned after each request or batch of requests
  • Session-based: Your IP remains stable for a browsing session, then changes
  • Random rotation: IPs are selected randomly from a pool with each connection

The rotation process can be handled by your Internet Service Provider (ISP), a VPN service, or a proxy provider like Roundproxies.com.

Why IP Rotation Matters

Your IP address reveals more about you than most people realize. It exposes your approximate location, your ISP, and enough technical details for third parties to track your activity across sessions.

Here's why rotating your IP addresses matters:

Prevents Tracking and Profiling

Advertisers and data brokers use your IP to build behavioral profiles. By cycling through different addresses, you disrupt their ability to connect your browsing sessions together.

Each request appears to come from a different user entirely.

Avoids IP-Based Blocks

Websites implement rate limits to prevent automated access. If you send too many requests from one IP, you'll get blocked.

IP rotation distributes your requests across multiple addresses. This keeps each individual IP under the detection threshold.

Bypasses Geo-Restrictions

Some content is locked to specific regions. Rotating through IPs from different countries lets you access location-specific data for market research, price comparison, or competitive analysis.

Reduces Cyberattack Risk

A static IP makes you an easier target for DDoS attacks. Attackers can flood your specific address with traffic to disrupt your service.

Rotating IPs forces attackers to constantly identify your new address, making sustained attacks more difficult.

Enables Large-Scale Data Collection

For web scraping operations, IP rotation is essential. Without it, target websites quickly identify and block your scraper based on suspicious request patterns from a single IP.

How IP Rotation Works

The mechanics behind IP rotation are straightforward once you understand the components involved.

The Basic Process

  1. You send a request to access a website
  2. Your rotation system (VPN, proxy, or ISP) intercepts this request
  3. The system assigns an IP address from its available pool
  4. Your request reaches the target website with the assigned IP
  5. Based on your rotation rules, the system either keeps or changes the IP for subsequent requests

IP Pools

Rotation systems maintain pools of available IP addresses. These pools can contain:

  • Datacenter IPs: Fast and affordable, but easier to detect
  • Residential IPs: Assigned by real ISPs, appear as regular home users
  • Mobile IPs: From cellular networks, highest trust level
  • ISP Proxies: Static residential IPs combining speed with legitimacy

The quality and size of your IP pool directly impacts your success rate. Larger pools with diverse geographic distribution provide better results.

Rotation Triggers

Your system needs rules to determine when to switch IPs. Common triggers include:

  • Fixed time intervals (every 5 minutes, every hour)
  • After a specific number of requests
  • When a block or CAPTCHA is detected
  • At the start of each new session
  • Based on intelligent algorithms monitoring response patterns

Types of IP Rotation Methods

Different situations call for different rotation approaches. Here are the main methods you can implement:

VPN Rotation

VPN services mask your IP by routing traffic through their servers. Some VPNs offer automatic IP rotation every few minutes without disconnecting you.

This method encrypts your traffic while changing your IP. The downside is that VPN IPs are often shared among many users, which can trigger additional verification from some websites.

Proxy Rotation

Rotating proxies assign a fresh IP from their pool with each request or session. You get a single endpoint (gateway) that handles all the IP switching automatically.

Proxy providers like Roundproxies.com offer different proxy types including Residential Proxies, Datacenter Proxies, ISP Proxies, and Mobile Proxies. Each type has specific advantages depending on your use case.

Residential proxies route your requests through real household connections, making detection extremely difficult.

Sticky Sessions

Sometimes you need IP consistency within a session while still rotating between sessions. Sticky sessions maintain the same IP for a defined duration or until you complete a specific task.

This approach works well for activities requiring login persistence or multi-step processes that validate session consistency.

Burst Rotation

Burst rotation changes your IP after a set number of requests rather than a time interval. If a website blocks IPs after 50 requests, you can configure rotation to switch after every 40 requests.

This method optimizes your IP usage while staying below detection thresholds.

Intelligent Rotation

Advanced rotation systems use algorithms to determine optimal switching times. They monitor response patterns, detect blocks in real-time, and automatically rotate before problems occur.

Intelligent rotation adapts to each target website's anti-bot measures dynamically.
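The core of that idea fits in a few lines: watch a sliding window of recent outcomes and rotate when failures pile up. Production systems also weigh per-proxy history, response times, and CAPTCHA signatures; this sketch shows only the central loop.

```python
from collections import deque

class AdaptiveRotator:
    """Rotate to the next proxy when the recent failure rate climbs.

    Simplified sketch of 'intelligent' rotation: the window size and
    failure-rate threshold are illustrative defaults.
    """
    def __init__(self, proxies, window=20, max_failure_rate=0.3):
        self.proxies = proxies
        self.index = 0
        self.window = deque(maxlen=window)  # recent success/failure flags
        self.max_failure_rate = max_failure_rate

    @property
    def current(self):
        return self.proxies[self.index % len(self.proxies)]

    def record(self, ok):
        """Record one request outcome; rotate if failures accumulate."""
        self.window.append(ok)
        failures = self.window.count(False)
        # Wait for a minimal sample before reacting
        if len(self.window) >= 5 and failures / len(self.window) > self.max_failure_rate:
            self.index += 1       # move to the next proxy
            self.window.clear()   # start fresh on the new IP
```

Feed `record(response.ok)` after each request and read `current` before the next one.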

How to Implement IP Rotation (With Code Examples)

Let's look at practical implementations using Python. These examples demonstrate the core concepts you can adapt to your specific needs.

Method 1: Manual Proxy Rotation with Requests

The simplest approach uses the Python requests library with a list of proxies.

import requests
import random

# Define your proxy list
proxy_list = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
    "http://user:pass@proxy4.example.com:8080",
]

def get_random_proxy():
    """Select a random proxy from the list."""
    proxy = random.choice(proxy_list)
    return {"http": proxy, "https": proxy}

def make_request(url):
    """Make a request through a randomly selected proxy."""
    proxy = get_random_proxy()
    try:
        response = requests.get(url, proxies=proxy, timeout=10)
        return response.text
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None

# Example usage
target_url = "https://httpbin.org/ip"
for i in range(5):
    result = make_request(target_url)
    print(f"Request {i+1}: {result}")

This script randomly selects a proxy for each request, so the same proxy may occasionally repeat. The httpbin.org/ip endpoint returns the IP address that made the request, letting you verify rotation is working.

Method 2: Request-Based Rotation

For more control, you can implement rotation after a specific number of requests.

import requests
from itertools import cycle

class ProxyRotator:
    def __init__(self, proxy_list, rotate_every=10):
        self.proxy_cycle = cycle(proxy_list)
        self.rotate_every = rotate_every
        self.request_count = 0
        self.current_proxy = next(self.proxy_cycle)
    
    def get_proxy(self):
        """Get current proxy, rotating after set number of requests."""
        # Rotate *before* handing out the next proxy so every proxy,
        # including the first, serves exactly rotate_every requests
        if self.request_count >= self.rotate_every:
            self.current_proxy = next(self.proxy_cycle)
            self.request_count = 0
            print(f"Rotated to new proxy: {self.current_proxy}")
        
        self.request_count += 1
        return {"http": self.current_proxy, "https": self.current_proxy}
    
    def make_request(self, url):
        """Execute request with current proxy."""
        proxy = self.get_proxy()
        response = requests.get(url, proxies=proxy, timeout=10)
        return response

# Initialize rotator
proxies = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
]
rotator = ProxyRotator(proxies, rotate_every=5)

# Make 15 requests - proxy will rotate every 5
for i in range(15):
    response = rotator.make_request("https://httpbin.org/ip")
    print(f"Request {i+1}: {response.json()}")

The ProxyRotator class tracks request counts and automatically switches to the next proxy after the specified threshold.

Method 3: Browser-Based Rotation with Selenium

For JavaScript-heavy websites that require browser rendering, combine rotation with Selenium.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import random

def create_driver_with_proxy(proxy):
    """Create a Chrome driver configured with a proxy."""
    chrome_options = Options()
    chrome_options.add_argument(f'--proxy-server={proxy}')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    
    driver = webdriver.Chrome(options=chrome_options)
    return driver

def scrape_with_rotation(urls, proxy_list):
    """Scrape multiple URLs with proxy rotation."""
    results = []
    
    for url in urls:
        proxy = random.choice(proxy_list)
        driver = create_driver_with_proxy(proxy)
        
        try:
            driver.get(url)
            page_source = driver.page_source
            results.append({
                'url': url,
                'proxy': proxy,
                'content': page_source[:500]  # First 500 chars
            })
        finally:
            driver.quit()
    
    return results

# Example usage
proxy_list = [
    "proxy1.example.com:8080",
    "proxy2.example.com:8080",
]

urls_to_scrape = [
    "https://example.com/page1",
    "https://example.com/page2",
]

results = scrape_with_rotation(urls_to_scrape, proxy_list)

This approach creates a fresh browser instance with a new proxy for each URL. The browser is closed after each request to ensure a clean session.

Common Use Cases for IP Rotation

Different industries leverage IP rotation for various legitimate purposes.

Web Scraping and Data Collection

Businesses scrape competitor websites for pricing data, product information, and market trends. Without IP rotation, scrapers get blocked quickly.

Rotating residential IPs mimics organic user behavior, keeping your data collection running smoothly.

SEO Monitoring

SEO professionals need to check keyword rankings from multiple locations. Search engines personalize results based on location and user history.

IP rotation provides accurate, unbiased ranking data from different geographic perspectives.

Ad Verification

Advertisers verify their ads display correctly across different regions. They also check for fraudulent ad placements.

Rotating through IPs from target markets ensures accurate ad verification results.

Price Intelligence

E-commerce companies monitor competitor pricing to stay competitive. Many retailers show different prices based on user location or browsing history.

IP rotation reveals true pricing strategies without personalization bias.

Social Media Management

Marketing agencies manage multiple accounts across platforms. Using different IPs for each account prevents platform detection of multi-account usage.

Academic Research

Researchers collect public web data for analysis. IP rotation enables large-scale data gathering without disrupting target websites.

IP Rotation vs Proxy Rotation: What's the Difference?

These terms are often used interchangeably, but there's a subtle distinction.

IP rotation is the broader concept of changing your IP address regularly. This can happen through VPNs, your ISP's dynamic IP assignment, or proxy services.

Proxy rotation specifically refers to cycling through different proxy servers to achieve IP rotation. It's one method of implementing IP rotation.

In practice, when people discuss IP rotation for web scraping or data collection, they usually mean proxy rotation because dedicated proxy services offer the control and scale these activities require.

Best Practices for Effective IP Rotation

Follow these guidelines to maximize your success with IP rotation.

Combine with User-Agent Rotation

Websites track browser fingerprints alongside IP addresses. Rotate your user-agent string to appear as different browsers and devices.

import random

user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]

headers = {"User-Agent": random.choice(user_agents)}
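Putting both ideas together, each request can draw a fresh IP and a fresh user-agent from their pools. The proxy URLs below are placeholders:

```python
import random

user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
]
proxy_list = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
]

def request_kwargs():
    """Randomly pair a proxy with a user-agent for one request."""
    proxy = random.choice(proxy_list)
    return {
        "proxies": {"http": proxy, "https": proxy},
        "headers": {"User-Agent": random.choice(user_agents)},
    }

# Usage: requests.get(url, **request_kwargs(), timeout=10)
```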

Respect Rate Limits

Even with rotation, don't hammer websites with requests. Add random delays between requests to mimic human browsing patterns.
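A jittered delay between requests is a one-liner; the 2-6 second range here is illustrative:

```python
import random
import time

def polite_sleep(min_s=2.0, max_s=6.0):
    """Sleep a random, human-like interval between requests."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay
```

Call `polite_sleep()` between requests instead of firing them back to back.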

Use Quality IP Providers

Free proxies are unreliable and often blacklisted. Invest in reputable providers with large, diverse IP pools and proper infrastructure.

Monitor Success Rates

Track how many requests succeed versus fail. If your success rate drops, investigate whether IPs are getting flagged and adjust your rotation strategy.
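A small tracker makes this concrete. The 80% alert threshold and 20-request minimum sample below are arbitrary starting points, not recommendations:

```python
class SuccessMonitor:
    """Track request outcomes and flag a degrading success rate."""
    def __init__(self, alert_below=0.8):
        self.ok = 0
        self.failed = 0
        self.alert_below = alert_below

    def record(self, success):
        if success:
            self.ok += 1
        else:
            self.failed += 1

    @property
    def rate(self):
        total = self.ok + self.failed
        return self.ok / total if total else 1.0

    def needs_attention(self):
        # Require a minimal sample before raising alarms
        return (self.ok + self.failed) >= 20 and self.rate < self.alert_below
```

Record each outcome with `monitor.record(response.ok)` and check `needs_attention()` periodically.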

Match Proxy Type to Target

Datacenter proxies work for most basic scraping. High-security targets require residential or mobile proxies that appear as genuine user traffic.

FAQ

How often should I rotate my IP address?

The optimal rotation frequency depends on your target website. Start with rotating after every request, then test longer intervals. Some sites tolerate 10-20 requests per IP while others flag repeated access immediately.

Is IP rotation legal?

Rotating IPs itself is legal. However, how you use rotation matters. Scraping publicly available data is generally legal. Bypassing authentication, accessing private data, or violating terms of service can create legal issues. Always consult legal advice for your specific use case.

Can websites detect IP rotation?

Sophisticated websites analyze request patterns beyond just IP addresses. They look at browser fingerprints, request timing, mouse movements, and JavaScript execution. Effective rotation combines IP changes with other anti-detection measures.

What's the best type of proxy for IP rotation?

Residential proxies offer the best success rates because they use real ISP-assigned addresses. Datacenter proxies are faster and cheaper but easier to detect. Mobile proxies provide the highest trust but cost the most. Choose based on your target's detection sophistication.

Do VPNs provide IP rotation?

Some VPNs offer IP rotation features that change your address at set intervals. However, VPN IPs are often shared among thousands of users and may be flagged by websites. Dedicated proxy services provide more control for professional use cases.

Final Thoughts

IP rotation is a fundamental technique for anyone working with web data at scale. It protects your privacy, prevents blocks, and enables legitimate data collection activities.

The key to successful IP rotation lies in choosing the right method for your specific needs. Simple projects can start with basic proxy rotation using Python's requests library. More demanding applications require residential proxies with intelligent rotation algorithms.

Start with a quality proxy provider that offers the IP types matching your targets. Test different rotation intervals to find the sweet spot between speed and detection avoidance.

As websites continue improving their anti-bot measures, effective IP rotation becomes increasingly valuable. Master this technique now to stay ahead in your data collection efforts.

]]>
<![CDATA[How to use Blinko: Self-hosted AI notes guide]]>https://roundproxies.com/blog/blinko/695157ac26f439f88a95a8e5Thu, 15 Jan 2026 02:18:06 GMTTired of your notes scattered across apps that don't talk to each other? Blinko is an open-source, self-hosted note-taking tool that combines quick idea capture with AI-powered search.

In this guide, you'll learn how to install Blinko using Docker, configure its AI features with Ollama or OpenAI, and master its unique Blinko vs Notes workflow system.

What is Blinko and Why Should You Care?

Blinko is a self-hosted note-taking application that uses AI-powered RAG (Retrieval-Augmented Generation) to help you search and retrieve notes using natural language queries. Unlike cloud-based tools like Notion or Obsidian, your data stays on your own server.

Blinko Note Taking Application Self-Hosted

The app separates quick captures ("Blinkos") from permanent documentation ("Notes"). This dual-system approach prevents your knowledge base from becoming cluttered with temporary reminders.

Key features include:

  • AI-enhanced note retrieval via OpenAI or Ollama
  • Full Markdown support with rich text formatting
  • Automatic tagging and note summarization
  • RSS feed integration
  • Music player for focused writing sessions
  • Multi-platform support (web, desktop, Android)

Prerequisites Before Installing Blinko

Before diving into installation, make sure you have:

Docker installed on your machine. Visit docker.com to download and install Docker Desktop or Docker Engine.

Basic terminal knowledge for running commands and editing configuration files.

A secure secret key for authentication. Generate one with:

openssl rand -base64 32

This command outputs a random 32-byte string encoded in base64. Save this somewhere safe—you'll need it during setup.

Step 1: Install Blinko Using Docker Compose

The fastest way to deploy Blinko is with Docker Compose. This method spins up both the Blinko application and its PostgreSQL database in one go.

Create a new directory for your Blinko installation:

mkdir blinko && cd blinko

Now create a docker-compose.yml file:

nano docker-compose.yml

Paste this configuration:

networks:
  blinko-network:
    driver: bridge

services:
  blinko-website:
    image: blinkospace/blinko:latest
    container_name: blinko-website
    environment:
      NODE_ENV: production
      NEXTAUTH_SECRET: YOUR_SECURE_SECRET_HERE
      DATABASE_URL: postgresql://postgres:your_db_password@postgres:5432/postgres
    depends_on:
      postgres:
        condition: service_healthy
    volumes:
      - ./blinko-data:/app/.blinko
    restart: always
    logging:
      options:
        max-size: "10m"
        max-file: "3"
    ports:
      - 1111:1111
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://blinko-website:1111/"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    networks:
      - blinko-network

  postgres:
    image: postgres:14
    container_name: blinko-postgres
    restart: always
    ports:
      - 5435:5432
    environment:
      POSTGRES_DB: postgres
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: your_db_password
      TZ: UTC
    volumes:
      - ./postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "postgres", "-d", "postgres"]
      interval: 5s
      timeout: 10s
      retries: 5
    networks:
      - blinko-network

Replace YOUR_SECURE_SECRET_HERE with the secret you generated earlier. Also change your_db_password to a strong password.

Important security note: Never use default passwords in production. The NEXTAUTH_SECRET protects your authentication tokens—using a weak value exposes your instance to attacks.

Start the containers:

docker-compose up -d

The -d flag runs containers in detached mode (background). Docker pulls the necessary images and starts both services.

Check if everything is running:

docker-compose ps

You should see both blinko-website and blinko-postgres with status "Up".

Step 2: Create Your First Account

Open your browser and navigate to http://localhost:1111/signup (replace localhost with your server's IP if accessing remotely).

Enter a username and password. This creates the admin account for your Blinko instance.

After registration, you'll land on the main dashboard. The interface splits into a sidebar for navigation and a main area for writing.

Step 3: Understand the Blinko vs Notes System

Blinko uses a dual-content system that takes some getting used to. Here's the difference:

Blinkos (Quick Captures)

The Blinko section handles fleeting thoughts. Open the app, type something, hit enter. Done.

These entries support auto-archiving. You can configure them to automatically move to archive after a set period. Perfect for:

  • Daily tasks and reminders
  • Quick ideas you'll process later
  • Temporary information
  • Meeting notes that don't need permanent storage

Notes (Long-Form Content)

The Notes section stores permanent documentation. These entries support:

  • Rich text formatting
  • Hierarchical organization with tags
  • Permanent storage without auto-archiving

Use Notes for:

  • Project documentation
  • Research findings
  • Learning materials
  • Important reference documents

The key insight: dump everything into Blinkos first, then promote valuable content to Notes during your review sessions.

Step 4: Configure AI Features with Ollama

Blinko's AI features transform how you interact with your knowledge base. Instead of keyword searches, you can ask questions like "What were my thoughts on the marketing campaign last week?"

Option A: Connect to Ollama (Local AI)

If you want complete privacy, run AI locally with Ollama.

First, install Ollama on your machine:

curl -fsSL https://ollama.com/install.sh | sh

Pull a model. For general use, Llama 3.2 works well:

ollama pull llama3.2

For the embedding model (required for RAG):

ollama pull mxbai-embed-large

Start the Ollama server:

ollama serve

By default, Ollama listens on http://localhost:11434.

Now configure Blinko. Click your username in the top-left corner, then go to Settings > AI.

Set these values:

  • Model Provider: OpenAI (Ollama uses the OpenAI-compatible API)
  • API Endpoint: http://host.docker.internal:11434/v1 (for Docker on Mac/Windows) or http://YOUR_HOST_IP:11434/v1 (for Linux)
  • Model: Type your model name manually (e.g., llama3.2)
  • Text Embedding Model: mxbai-embed-large

Click "Check Connection" to verify. If successful, toggle the AI features on.

Common pitfall: If the connection fails, make sure you're using the /v1 suffix on the endpoint URL. Also verify Ollama is accessible from within Docker's network.
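You can verify the endpoint from a short script before touching Blinko's settings. This sketch assumes Ollama's OpenAI-compatible `/v1/models` route; it returns None instead of raising when the endpoint is unreachable:

```python
import json
import urllib.request
import urllib.error

def check_openai_endpoint(base_url, timeout=5):
    """Return the model ids served at an OpenAI-compatible endpoint
    (e.g. Ollama's http://localhost:11434/v1), or None if unreachable."""
    url = f"{base_url.rstrip('/')}/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = json.load(resp)
        return [m["id"] for m in data.get("data", [])]
    except (urllib.error.URLError, OSError, ValueError):
        return None

# Example: check_openai_endpoint("http://localhost:11434/v1")
```

If this returns a model list on the host but None from inside the Blinko container, the problem is Docker networking rather than Ollama itself.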

Option B: Connect to OpenAI

For faster responses without local hardware requirements, use OpenAI's API.

Get an API key from platform.openai.com.

In Blinko's AI settings:

  • Model Provider: OpenAI
  • API Key: Your OpenAI API key
  • Model: gpt-4o-mini (cost-effective) or gpt-4o (better quality)
  • Text Embedding Model: text-embedding-3-small

OpenAI charges per token. Expect costs of $0.01-0.10 per day with moderate usage.

Option C: Configure HTTP Proxy for AI Services

If you're in a network-restricted environment or need to route AI requests through a proxy server, Blinko has built-in HTTP proxy support.

This is useful when:

  • Your server can't directly reach OpenAI or other AI providers
  • You need to route traffic through a corporate proxy
  • You want to use residential or datacenter proxies for reliability
  • You're in a region with restricted access to AI services

Navigate to Settings > AI and scroll down to the Proxy Settings section.

Configure these fields:

  • Enable HTTP Proxy: Toggle this on
  • Proxy Host: Your proxy server address (e.g., proxy.example.com or 192.168.1.100)
  • Proxy Port: The proxy port number (e.g., 8080, 3128)
  • Proxy Username: (Optional) For authenticated proxies
  • Proxy Password: (Optional) For authenticated proxies

For example, if you're using a residential proxy service like Roundproxies.com, you would enter the proxy credentials provided by your service.

Example proxy configuration:

Host: gateway.your-provider.com
Port: 10000
Username: your_username
Password: your_password

After configuring the proxy, click "Test Connection" to verify that Blinko can reach the AI service through your proxy.

Pro tip: If you need rotating proxies for high-volume AI requests, configure a local proxy rotation service like mitmproxy or a proxy manager, then point Blinko to your local proxy endpoint.

Step 5: Build Your RAG Knowledge Base

RAG (Retrieval-Augmented Generation) lets the AI search through your notes and provide contextual answers.

After configuring AI, you need to build the embedding index. This converts your notes into vectors the AI can search.

Go to Settings > AI and click "Rebuild" or "Force Rebuild".

This process:

  1. Reads all your notes
  2. Splits them into chunks
  3. Generates embeddings for each chunk
  4. Stores embeddings in the database

For a few hundred notes, this takes under a minute. Larger collections may take longer.
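Under the hood, steps 1 through 4 are the standard RAG indexing loop. A toy sketch using word-count vectors in place of a real embedding model (Blinko's actual implementation will differ):

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a word-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_index(notes, chunk_words=50):
    """Split each note into chunks and embed every chunk."""
    index = []
    for note in notes:
        words = note.split()
        for i in range(0, len(words), chunk_words):
            chunk = " ".join(words[i:i + chunk_words])
            index.append((chunk, embed(chunk)))
    return index

def search(index, query, top_k=3):
    """Return the chunks most similar to the query embedding."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]
```

A real rebuild does the same thing with learned embeddings, which is why changing the embedding model in settings forces a full re-index.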

Once complete, use the AI chat feature. Type natural language queries:

  • "What did I write about project deadlines?"
  • "Find my notes on Docker configuration"
  • "Summarize my thoughts on the marketing proposal"

The AI retrieves relevant notes and generates responses based on your actual content.

Step 6: Use the Blinko Snap Desktop App

For quick captures without opening your browser, install Blinko Snap.

Download from the official releases page.

After installation, configure it to connect to your Blinko server:

  1. Right-click the system tray icon
  2. Enter your server URL (e.g., http://localhost:1111)
  3. Log in with your credentials

Now press Ctrl+Shift+Space (or Cmd+Shift+Space on Mac) to instantly open a capture window. Type your thought and press Enter.

The note syncs immediately with your Blinko server. No browser tab required.

Step 7: Set Up Tags and Organization

Tags help you categorize notes without rigid folder structures.

To add a tag, type #tagname anywhere in your note. Blinko recognizes it automatically.

For visual organization, add emojis to tags: #🎯goals, #💡ideas, #📚reading.

The sidebar shows all your tags. Click one to filter notes.

Pro tip: Enable AI auto-tagging in settings. The AI suggests tags based on note content, saving you manual work.

Step 8: Configure Backups and Data Export

Blinko stores everything in PostgreSQL. To back up your data:

docker exec blinko-postgres pg_dump -U postgres postgres > blinko_backup.sql

This creates a SQL dump of your entire database.

To restore from backup:

cat blinko_backup.sql | docker exec -i blinko-postgres psql -U postgres postgres

For automated backups, set up a cron job:

0 2 * * * docker exec blinko-postgres pg_dump -U postgres postgres > /backups/blinko_$(date +\%Y\%m\%d).sql

This runs daily at 2 AM, creating timestamped backup files.

Blinko also supports export through its web interface. Go to Settings and look for import/export options.

Step 9: Enable RSS Integration

Blinko can subscribe to RSS feeds, pulling external content directly into your knowledge base.

Go to Settings > RSS and add feed URLs.

New articles from subscribed feeds appear in a dedicated section. You can read them within Blinko and save interesting articles as notes.

This transforms Blinko from a personal note-taker into a full knowledge hub.

Step 10: Share Notes Publicly

Sometimes you need to share a note with someone who doesn't have a Blinko account.

Open any note, click the share icon, and enable public sharing. Blinko generates a unique URL.

Anyone with the link can view that specific note. Your other content remains private.

Disable sharing anytime by toggling it off in the note's settings.

Advanced: Access Blinko Remotely with Reverse Proxy

Running Blinko on your home server? Access it from anywhere with a reverse proxy.

Using Nginx, create a configuration:

server {
    listen 443 ssl;
    server_name blinko.yourdomain.com;

    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location / {
        proxy_pass http://localhost:1111;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

The WebSocket headers (Upgrade and Connection) are required for Blinko's real-time features.

After setting up DNS to point to your server, access Blinko via https://blinko.yourdomain.com.

Troubleshooting Common Issues

Container won't start

Check logs:

docker-compose logs blinko-website

Common causes:

  • Database not ready yet (wait for healthcheck)
  • Invalid DATABASE_URL format
  • Port 1111 already in use

AI connection fails

If using Ollama inside Docker:

  • Use host.docker.internal instead of localhost on Mac/Windows
  • On Linux, use your actual machine IP or add --network host to the Ollama container

Verify Ollama is running:

curl http://localhost:11434/api/tags

AI search returns poor or empty results

The embedding index needs rebuilding. Go to Settings > AI and click "Force Rebuild".

Slow performance

Check available disk space. PostgreSQL struggles when storage runs low.

Also verify Docker resource allocation. On Docker Desktop, increase memory to at least 4GB.

Final Thoughts

Blinko fills a gap in the note-taking landscape. It gives you Notion-like features with Obsidian-style privacy, all wrapped in an AI-powered interface.

The Blinko/Notes dual system takes adjustment. Start by dumping everything into Blinkos, then promote important content during weekly reviews.

AI features require setup effort but pay off quickly. Natural language search eliminates the "where did I put that?" problem that plagues large note collections.

For self-hosters concerned about data ownership, Blinko checks every box. Your notes stay on your hardware. AI processing can happen entirely locally with Ollama.

]]>
<![CDATA[What is Janitor AI? Features & how it works in 2026]]>https://roundproxies.com/blog/janitor-ai/6966d54726f439f88a95b224Tue, 13 Jan 2026 23:36:27 GMTJanitor AI is a chatbot platform that lets users create and interact with customizable AI characters for roleplay, storytelling, and personalized conversations. Unlike traditional chatbots designed for customer service or automation, this platform focuses on character-driven experiences with deep personality customization.

Launched in June 2023 by Jan Zoltkowski, the platform attracted over one million users within its first week. Today it remains one of the most popular AI character platforms, particularly among creative communities and roleplay enthusiasts. According to HackerNoon, approximately 70% of users are women, highlighting the platform's welcoming and inclusive environment.

This guide covers everything you need to know, including how it works, key features, pricing options, safety considerations, and practical tips for getting started.

Why Janitor AI Matters

The platform fills a unique gap in the AI chatbot space. Most chatbots follow rigid scripts or focus on task-based interactions. This platform instead prioritizes creative expression and emotional engagement.

Users can design virtual personas with specific backstories, speaking styles, and personality traits. This makes the platform valuable for writers testing dialogue, language learners practicing conversation, and anyone seeking more expressive AI interactions.

The platform's flexibility also appeals to developers and creators who want character-driven experiences without building everything from scratch. You can have meaningful, extended conversations that feel more human than typical AI interactions.

Content creators use it for brainstorming story ideas and refining character voices. Some educators explore it as a tool for language practice through low-stakes roleplay scenarios. The applications extend beyond simple entertainment.

How Janitor AI Works

The platform functions as an interface layer that connects to large language models (LLMs). It doesn't run its own advanced AI model. Instead, it manages character settings, conversation memory, and user preferences while external models handle response generation.

The process works like this: you provide input through the chat interface. The system sends that input along with your character settings to the connected LLM. The model generates a response, and the interface shapes how that response appears based on your configuration.

Users can connect several different model options:

JanitorLLM (Beta) provides free basic chat functionality. It works well for casual conversations but has limitations compared to more advanced models. This option requires no setup or payment.

OpenAI GPT models connect through API keys. These offer higher quality responses but require payment based on token usage. Most users find GPT-4 variants produce the most natural conversations.

Community and local models like KoboldAI give users more control and can run without ongoing costs once configured. These require more technical setup but eliminate recurring expenses.

Response quality depends entirely on which model you select. The platform controls the character behavior and interface, not the underlying intelligence driving the conversation.

Key Features of Janitor AI

Character Customization

The standout feature is deep character customization. You can define personality traits, dialogue tone, backstories, and behavioral rules without writing any code.

Character definitions use natural language. Describe how your character should act, what topics they care about, and how they should respond to different situations. The platform translates these instructions into consistent character behavior.

You can also upload character images, add tags for discovery, and choose content ratings for your creations.

Immersive Mode

Immersive Mode changes how conversations flow. When enabled, it removes certain interface elements like message editing and deletion. This keeps focus on the conversation and helps maintain narrative continuity.

The mode also enables text streaming in many setups. Instead of responses appearing all at once, you see them generate word by word. This mimics real-time typing and makes interactions feel more natural.

To enable Immersive Mode, open a chat and check the settings menu (usually top right). Toggle the option on, and messages will flow without editing controls.

Text Streaming

Text streaming displays AI responses as they generate rather than waiting for the complete response. The effect looks like the AI is typing in real time.

This feature improves pacing for longer conversations and makes the experience feel more dynamic. It also helps you follow the AI's reasoning as responses develop.

Not every configuration supports streaming. If you notice responses appearing all at once, check whether your current model and API settings support this feature.

Privacy Controls

Conversations stay private by default. Your chats aren't visible to others unless you manually toggle the "Make Chat Public" option.

The platform processes conversations through the connected LLM and stores chat data for continuity. However, you control visibility settings for each conversation.

Janitor AI Pricing

The platform operates on a flexible pricing structure. Your costs depend on which model you choose and how you configure your setup.

Free Options

JanitorLLM Beta costs nothing to use. Select "Janitor LLM" in your API settings and you can chat without any charges. The free model works for basic conversations but has limitations on response quality and consistency.

Some community proxies and reverse proxy setups also provide free access to alternative models. Popular free options include certain Gemini variants, Deepseek models, and open-source alternatives available through OpenRouter.

Paid Options

Connecting premium models through API keys involves costs based on token usage. Here's what typical pricing looks like:

Model             | Input (per 1M tokens) | Output (per 1M tokens)
GPT-3.5 Turbo     | $3.00                 | $6.00
GPT-4             | $30.00                | $60.00
GPT-4 Turbo       | $10.00                | $30.00
GPT-4o            | $5.00                 | $15.00
Claude 3.7 Sonnet | $10.00                | $20.00

OpenAI offers a $5 trial credit that covers approximately 500 messages for new accounts. After that, usage bills to your payment method based on actual token consumption.
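You can sanity-check the 500-message figure against the table above. Assuming a typical chat turn of roughly 500 input and 500 output tokens (an assumption, not a platform constant), the GPT-4o rates work out to:

```python
def cost_per_message(in_tokens, out_tokens, in_per_m, out_per_m):
    """Dollar cost of one chat turn at per-million-token rates."""
    return in_tokens * in_per_m / 1e6 + out_tokens * out_per_m / 1e6

# GPT-4o rates from the table; 500+500 tokens is an assumed turn size
per_msg = cost_per_message(500, 500, 5.00, 15.00)   # about $0.01
messages_on_trial = 5.00 / per_msg                  # about 500 messages
```

Longer character definitions and chat histories raise the input-token count per turn, so heavy roleplay sessions burn through credit faster.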

Local Hosting

KoboldAI and similar tools can run locally if you have adequate hardware. This eliminates ongoing costs once configured, though setup requires technical knowledge and appropriate GPU resources (typically 6-8 GB VRAM minimum).

How to Use Janitor AI

Getting started takes just a few minutes. Here's the basic process:

Step 1: Create an Account

Visit the Janitor AI website and click "Register." Enter your email and create a password, or sign up through Google, Discord, or X (Twitter).

Step 2: Configure Your Profile

In Settings, add a name and brief description about yourself. This helps AI characters understand how to interact with you.

Step 3: Browse Characters

Explore the character library using search, tags, or featured/trending sections. Click any character to view their profile, backstory, and traits.

Step 4: Start Chatting

Select a character and click "Start a new chat." Type your first message to begin the conversation. Many characters provide an opening scenario to help you jump in.

Step 5: Create Your Own Character

Click "Create a Character" to build a custom AI persona. Add an image, name, bio, personality definition, and example dialogues. Choose content ratings and publish when ready.

Setting Up API Connections

For better response quality, you can connect external models through API keys.

OpenAI Setup

  1. Create an account at platform.openai.com
  2. Navigate to Dashboard > API Keys
  3. Click "Create new secret key"
  4. Name your key and leave permissions on "All"
  5. Copy the generated key (you won't see it again)
  6. In Janitor AI, go to API Settings
  7. Select OpenAI and paste your key
  8. Choose your preferred model and save

Using OpenRouter for Alternative Models

OpenRouter provides access to many models through a single interface:

  1. Create a free OpenRouter account
  2. Generate an API key from the settings page
  3. Browse available models and copy the model name
  4. In Janitor AI API Settings, select "Proxy"
  5. Click "Add Configuration"
  6. Enter a name, paste the model name, set the URL to the OpenRouter endpoint, and add your API key
  7. Test the connection and activate if successful

This approach lets you try models like Deepseek, Gemini variants, or open-source options without separate accounts for each.
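Before pasting a key into Janitor AI, it can help to verify it works outside the platform. The sketch below sends one test message to OpenRouter's OpenAI-compatible chat completions endpoint; the model ID shown in the usage note is only an example — copy the exact name from OpenRouter's model list:

```python
import requests

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_key_check(api_key, model):
    """Assemble headers and body for a minimal one-message test request."""
    return {
        "headers": {"Authorization": f"Bearer {api_key}"},
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": "Say hello in five words."}],
        },
    }

def check_key(api_key, model):
    """Send the test request; a 200 status means the key and model ID work."""
    req = build_key_check(api_key, model)
    response = requests.post(OPENROUTER_URL, timeout=30, **req)
    return response.status_code
```

Calling check_key("sk-or-...", "deepseek/deepseek-chat") should return 200 for a working key; a 401 means the key is invalid, and a 4xx error usually indicates a mistyped model name.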

Is Janitor AI Safe?

The platform takes reasonable privacy precautions, but users should understand the limitations.

What the Platform Does Right

Conversations are private by default. Nothing gets shared unless you explicitly enable public visibility. The platform uses encryption for data transmission.

Considerations for Users

Janitor AI remains in beta, which means its stability and security features are still maturing. The platform isn't designed for enterprise-grade security or compliance requirements.

When connecting external models like OpenAI or Claude, you must follow those providers' terms of service. Using their models for content they prohibit (like explicit roleplay) can result in API account bans.

Best Practices

Avoid sharing sensitive personal information in conversations. Even private chats pass through external systems.

Use your own API keys rather than shared community keys when possible. This gives you more control over access and costs.

Stick to official sources and trusted community tools. Random proxies or keys from unverified sources can expose your data or steal credentials.

Review privacy policies to understand how your data gets stored and processed.

Janitor AI vs Traditional Chatbots

Understanding the differences helps you decide if Janitor AI fits your needs.

Feature                 Traditional Chatbots        Janitor AI
Response Style          Predefined scripts          Dynamic, AI-generated
Personalization         Basic, rule-based           Deep personality settings
Language Understanding  Keyword-based               Full natural language
Primary Use Cases       Customer service, FAQs      Roleplay, creative writing, entertainment
Setup Requirements      Manual conversation flows   Character definition through natural language

Traditional chatbots excel when you need predictable, controlled responses for specific tasks. Janitor AI works better when you want flexible, human-like interactions that adapt to context.

The platform isn't designed for business automation, data processing, or customer support workflows. Its strength lies in creative and conversational applications.

Common Issues and Troubleshooting

Server Status Problems

Janitor AI sometimes experiences downtime or performance issues due to high traffic. Check the official Discord or social media for status updates before assuming something is broken on your end.

API Key Errors

If responses fail to generate, verify your API key is correct and active. Keys can expire or get revoked, especially if usage exceeds limits or violates terms.

Configuration Issues

Incorrect settings cause many problems. Double-check that your model selection matches your API key provider. Verify endpoint URLs if using proxies or alternative models.

Model Performance

JanitorLLM Beta can produce repetitive or lower-quality responses during peak times. Connecting a paid model usually improves consistency and creativity.

Contact Support

For persistent issues, reach out to support@janitorai.com with details about your problem and configuration.

Alternatives to Janitor AI

If Janitor AI doesn't meet your needs, several alternatives exist:

Character.AI offers a large library of free characters with strong creative focus and unlimited free chatting. The platform has stricter content filters than Janitor AI.

Replika functions as a personal AI companion with emphasis on emotional support and relationship building. It's more focused on personal connection than roleplay.

SillyTavern provides powerful local control options for users comfortable with technical setup. It works well with various AI models and offers extensive customization.

Chai AI offers mobile-friendly AI chat experiences with good accessibility for casual users.

Each alternative involves tradeoffs in features, content restrictions, pricing, and technical requirements.

Frequently Asked Questions

Is Janitor AI free to use?

Yes, Janitor AI offers free access through JanitorLLM Beta. Set your API settings to "Janitor LLM" and you can create characters and chat without charges. Connecting external models like OpenAI requires paid API access.

Can people see my Janitor AI chats?

No, conversations are private by default. They only become visible if you manually enable the "Make Chat Public" option for specific chats.

What is immersive mode in Janitor AI?

Immersive mode removes interface elements like message editing and enables text streaming. This creates more natural, story-driven conversations without distractions.

How do I get an API key for Janitor AI?

For OpenAI, create an account at platform.openai.com, navigate to API Keys, and generate a new key. Community proxies and reverse proxy setups available through Discord can provide alternative access, though these vary in reliability.

What models work best with Janitor AI?

GPT-4 variants offer the highest quality for most users. Deepseek and Gemini models provide good alternatives at lower costs. JanitorLLM Beta works for casual use when budget is a concern.

Is Janitor AI appropriate for minors?

The platform contains content intended for adults and allows NSFW material in unrestricted modes. Most characters and features target users 18 and older.

Final Thoughts

Janitor AI stands out in the AI chatbot landscape by prioritizing creative expression and character-driven interactions. It combines accessible character creation tools with flexible model connections, making it valuable for writers, roleplay enthusiasts, and anyone seeking more personalized AI conversations.

Getting started requires minimal setup for free use, though connecting premium models unlocks significantly better response quality. The platform continues developing, so features and stability should improve over time.

The key advantage over competitors is the balance between accessibility and customization. You can start chatting immediately with the free model, then gradually explore more advanced options as you learn what works best for your use case.

Whether you want to build custom characters for storytelling, practice conversations in a low-pressure environment, or simply explore what character-based AI can do, this platform provides a solid starting point with room to grow.

For users who want proxies to enhance privacy or manage multiple accounts, services like Roundproxies.com offer residential and datacenter proxy options that work well with AI platforms.

]]>
<![CDATA[Python Web Scraping 2026: Step-by-step tutorial]]>https://roundproxies.com/blog/web-scraping-python/6952657226f439f88a95ac57Tue, 13 Jan 2026 11:11:35 GMTWant to automatically extract data from websites without copying and pasting for hours? Web scraping with Python lets you do exactly that.

Python remains the go-to language for scraping because of its readable syntax and massive library ecosystem. Whether you're collecting product prices, monitoring competitor websites, or building datasets for machine learning, this guide has you covered.

In this guide, you'll learn how to build scrapers that work on both static HTML pages and JavaScript-heavy dynamic sites. We'll cover multiple approaches from simple HTTP requests to full browser automation.

What Is Web Scraping with Python?

Web scraping with Python involves writing automated scripts that fetch web pages, parse their HTML content, and extract specific data elements. The process mimics what a human does when browsing—visiting URLs, reading content, and copying information—but at machine speed.

Python dominates the scraping landscape because it offers libraries for every step of the process. You can send HTTP requests with requests, parse HTML with BeautifulSoup, control browsers with Playwright or Selenium, and build full crawlers with Scrapy.

The scraping workflow breaks down into four stages. First, you connect to the target page. Second, you retrieve its HTML content. Third, you locate and extract the data you need. Fourth, you store that data in a usable format like CSV or JSON.

Static sites return complete HTML from the server. Dynamic sites load content through JavaScript after the initial page load. These two types require different scraping approaches.

Setting Up Your Python Scraping Environment

Before writing any scraping code, you need a properly configured Python environment. This prevents library conflicts and keeps your projects organized.

Prerequisites

You need Python 3.8 or newer installed on your system. Check your version by running this command in your terminal:

python3 --version

You should see output like Python 3.11.4 or similar. If Python isn't installed, download it from the official Python website.

Creating a Virtual Environment

Virtual environments isolate your project dependencies. Create one with these commands:

mkdir python-scraper
cd python-scraper
python3 -m venv venv

Activate the virtual environment. On macOS and Linux:

source venv/bin/activate

On Windows:

venv\Scripts\activate

Your terminal prompt should now show (venv) indicating the environment is active. All packages you install will stay within this environment.

Installing Core Libraries

Install the essential scraping libraries with pip:

pip install requests beautifulsoup4 lxml playwright selenium scrapy

This installs everything you need for both static and dynamic scraping. We'll cover each library in detail throughout this guide.

Method 1: Scraping Static Sites with Requests and Beautiful Soup

For static HTML pages, the combination of requests and BeautifulSoup provides the fastest and simplest approach. This method works when the data you want exists in the initial HTML response from the server.

Understanding Static vs Dynamic Content

Open your browser's developer tools and view the page source. If you can see the data you want in that raw HTML, the page is static.

If the source shows empty containers or JavaScript variables, the page loads content dynamically.
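You can automate this view-source check: fetch the page without executing any JavaScript and look for a piece of the data you want. A minimal sketch:

```python
import requests

def content_is_static(html, needle):
    """True if the target text already appears in the raw server HTML."""
    return needle in html

def page_is_static(url, needle):
    """Fetch url without executing any JavaScript and check for needle.

    True  -> requests + BeautifulSoup will work.
    False -> the content is probably injected by JavaScript,
             so you need a browser automation tool instead.
    """
    html = requests.get(url, timeout=30).text
    return content_is_static(html, needle)
```

For example, page_is_static("https://quotes.toscrape.com/", "Albert Einstein") returns True because that site ships its quotes in the initial HTML.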

Making HTTP Requests

The requests library handles HTTP communication. Here's how to fetch a webpage:

import requests

url = "https://quotes.toscrape.com/"
response = requests.get(url)

print(response.status_code)  # Should be 200
print(response.text[:500])   # First 500 characters of HTML

The get() method sends an HTTP GET request to the specified URL. The response object contains the status code, headers, and the page content in the text attribute.

Always check the status code before parsing. A 200 means success. Codes like 403 or 429 indicate blocking or rate limiting.
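requests can enforce this check for you: raise_for_status() converts any 4xx or 5xx response into an exception, so a block or rate limit can't slip past silently:

```python
import requests

def fetch_or_fail(url):
    """Fetch a page, raising requests.HTTPError for any 4xx/5xx status."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()   # turns 403, 429, 500, etc. into exceptions
    return response.text
```

Wrapping calls in try/except requests.HTTPError lets you log the failing URL instead of silently parsing an error page.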

Adding Request Headers

Websites can detect scrapers by examining request headers. Mimic a real browser by setting appropriate headers:

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
}

url = "https://quotes.toscrape.com/"
response = requests.get(url, headers=headers)

The User-Agent header tells the server which browser you're using. Using a current Chrome or Firefox user agent reduces the chance of being blocked.

Parsing HTML with Beautiful Soup

Beautiful Soup transforms raw HTML text into a navigable tree structure. You can then search for elements using tag names, CSS classes, or CSS selectors.

from bs4 import BeautifulSoup

html = response.text
soup = BeautifulSoup(html, "lxml")

# Find elements by tag
all_divs = soup.find_all("div")

# Find by class
quotes = soup.find_all("div", class_="quote")

# Find by CSS selector
authors = soup.select(".author")

The lxml parser offers better performance than Python's built-in html.parser. Install it with pip install lxml if you haven't already.

Extracting Data from Elements

Once you've located elements, extract their text content or attribute values:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "lxml")

# Get text content
quote_element = soup.select_one(".text")
quote_text = quote_element.get_text(strip=True)

# Get attribute value
link_element = soup.select_one("a")
href = link_element.get("href")

# Get all text from nested elements
full_text = soup.get_text(separator=" ", strip=True)

The get_text() method extracts visible text from an element and its children. The strip=True parameter removes leading and trailing whitespace.

Complete Static Scraping Example

Here's a full working scraper that extracts quotes from the practice site:

import requests
from bs4 import BeautifulSoup

def scrape_quotes():
    """Scrape all quotes from quotes.toscrape.com"""
    
    base_url = "https://quotes.toscrape.com"
    all_quotes = []
    page = 1
    
    while True:
        # Build URL for current page
        url = f"{base_url}/page/{page}/"
        
        # Fetch the page
        response = requests.get(url)
        
        # Check if page exists
        if response.status_code != 200:
            break
            
        # Parse HTML
        soup = BeautifulSoup(response.text, "lxml")
        
        # Find all quote containers
        quote_elements = soup.select(".quote")
        
        # Stop if no quotes found
        if not quote_elements:
            break
        
        # Extract data from each quote
        for quote in quote_elements:
            text = quote.select_one(".text").get_text(strip=True)
            author = quote.select_one(".author").get_text(strip=True)
            tags = [tag.get_text() for tag in quote.select(".tag")]
            
            all_quotes.append({
                "text": text,
                "author": author,
                "tags": ", ".join(tags)
            })
        
        print(f"Scraped page {page}: {len(quote_elements)} quotes")
        page += 1
    
    return all_quotes

# Run the scraper
quotes = scrape_quotes()
print(f"Total quotes collected: {len(quotes)}")

This scraper automatically handles pagination by incrementing the page number until no more quotes appear. The while loop continues until either the server returns an error or no quote elements exist on the page.

Method 2: Scraping Dynamic Sites with Playwright

Modern websites often load content through JavaScript after the initial page render. Traditional HTTP clients can't execute JavaScript, so you need a browser automation tool.

Playwright, developed by Microsoft, provides fast and reliable browser automation. It supports Chromium, Firefox, and WebKit, offering cross-browser compatibility.

Installing Playwright

Install Playwright and download the browser binaries:

pip install playwright
playwright install

The second command downloads Chromium, Firefox, and WebKit browsers that Playwright will control.

Basic Playwright Usage

Playwright launches a real browser and lets you interact with pages programmatically:

from playwright.sync_api import sync_playwright

def scrape_dynamic_page():
    with sync_playwright() as p:
        # Launch browser in headless mode
        browser = p.chromium.launch(headless=True)
        
        # Create new page
        page = browser.new_page()
        
        # Navigate to URL
        page.goto("https://quotes.toscrape.com/scroll")
        
        # Wait for content to load
        page.wait_for_selector(".quote")
        
        # Get page content
        content = page.content()
        print(content[:1000])
        
        # Close browser
        browser.close()

scrape_dynamic_page()

The headless=True parameter runs the browser without a visible window. Set it to False during development to watch your scraper work.

Waiting for Dynamic Content

Dynamic pages require explicit waits. The content might not exist immediately after navigation completes.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    
    page.goto("https://quotes.toscrape.com/scroll")
    
    # Wait for specific element
    page.wait_for_selector(".quote", timeout=10000)
    
    # Or wait for network to be idle
    page.wait_for_load_state("networkidle")
    
    # Extract data
    quotes = page.locator(".quote").all()
    
    for quote in quotes:
        text = quote.locator(".text").text_content()
        author = quote.locator(".author").text_content()
        print(f"{author}: {text[:50]}...")
    
    browser.close()

The timeout parameter specifies how long to wait in milliseconds. Increase it for slow-loading pages.

Handling Infinite Scroll

Some sites load more content as you scroll down. Automate scrolling to trigger these loads:

from playwright.sync_api import sync_playwright
import time

def scrape_infinite_scroll():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        
        page.goto("https://quotes.toscrape.com/scroll")
        
        # Scroll to load all content
        previous_height = 0
        while True:
            # Scroll to bottom
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            
            # Wait for new content
            time.sleep(2)
            
            # Get new height
            new_height = page.evaluate("document.body.scrollHeight")
            
            # Break if no new content loaded
            if new_height == previous_height:
                break
            
            previous_height = new_height
        
        # Now extract all loaded quotes
        quotes = page.locator(".quote").all()
        print(f"Found {len(quotes)} quotes after scrolling")
        
        browser.close()

scrape_infinite_scroll()

This script scrolls until the page height stops increasing, indicating all content has loaded.

Complete Playwright Scraping Example

Here's a production-ready scraper for dynamic content:

from playwright.sync_api import sync_playwright
import json
import time

def scrape_dynamic_quotes():
    """Scrape quotes from dynamic JavaScript-rendered page"""
    
    all_quotes = []
    
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        
        # Set viewport size
        page = browser.new_page(viewport={"width": 1920, "height": 1080})
        
        # Navigate to page
        page.goto("https://quotes.toscrape.com/scroll")
        
        # Wait for initial quotes
        page.wait_for_selector(".quote")
        
        # Scroll to load all quotes
        for _ in range(10):  # Maximum 10 scroll attempts
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            time.sleep(1)
        
        # Extract quotes using Playwright locators
        quote_elements = page.locator(".quote").all()
        
        for element in quote_elements:
            text = element.locator(".text").text_content()
            author = element.locator(".author").text_content()
            
            # Clean the text
            text = text.strip().strip('“”"')
            author = author.strip()
            
            all_quotes.append({
                "text": text,
                "author": author
            })
        
        browser.close()
    
    return all_quotes

# Run scraper
quotes = scrape_dynamic_quotes()

# Save to JSON
with open("quotes.json", "w", encoding="utf-8") as f:
    json.dump(quotes, f, indent=2, ensure_ascii=False)

print(f"Saved {len(quotes)} quotes to quotes.json")

Method 3: Browser Automation with Selenium

Selenium has been the industry standard for browser automation for over a decade. While Playwright is newer and often faster, Selenium remains widely used and well-documented.

Installing Selenium

Install Selenium with pip:

pip install selenium

Modern Selenium (version 4.6+) automatically downloads the correct browser driver. You just need Chrome or Firefox installed on your system.

Basic Selenium Usage

Selenium controls browsers through a WebDriver interface:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

def basic_selenium_scrape():
    # Configure Chrome options
    options = Options()
    options.add_argument("--headless")  # Run without GUI
    
    # Initialize driver
    driver = webdriver.Chrome(options=options)
    
    try:
        # Navigate to page
        driver.get("https://quotes.toscrape.com/")
        
        # Find elements
        quotes = driver.find_elements(By.CSS_SELECTOR, ".quote")
        
        for quote in quotes:
            text = quote.find_element(By.CSS_SELECTOR, ".text").text
            author = quote.find_element(By.CSS_SELECTOR, ".author").text
            print(f"{author}: {text[:50]}...")
    
    finally:
        # Always close the driver
        driver.quit()

basic_selenium_scrape()

The try/finally block ensures the browser closes even if an error occurs. Failing to close browsers leads to resource leaks.

Explicit Waits in Selenium

Selenium's explicit waits handle dynamic content loading:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def selenium_with_waits():
    options = Options()
    options.add_argument("--headless")
    
    driver = webdriver.Chrome(options=options)
    
    try:
        driver.get("https://quotes.toscrape.com/scroll")
        
        # Wait up to 10 seconds for quotes to appear
        wait = WebDriverWait(driver, 10)
        wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".quote")))
        
        quotes = driver.find_elements(By.CSS_SELECTOR, ".quote")
        print(f"Found {len(quotes)} quotes")
        
    finally:
        driver.quit()

selenium_with_waits()

Expected conditions include presence_of_element_located, visibility_of_element_located, element_to_be_clickable, and many more.

Interacting with Page Elements

Selenium can fill forms, click buttons, and simulate keyboard input:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

def selenium_interactions():
    options = Options()
    options.add_argument("--headless")
    
    driver = webdriver.Chrome(options=options)
    
    try:
        driver.get("https://quotes.toscrape.com/login")
        
        # Find and fill username field
        username = driver.find_element(By.ID, "username")
        username.send_keys("testuser")
        
        # Find and fill password field
        password = driver.find_element(By.ID, "password")
        password.send_keys("testpass")
        
        # Submit the form
        password.send_keys(Keys.RETURN)
        
        # Wait for redirect
        import time
        time.sleep(2)
        
        print(f"Current URL: {driver.current_url}")
        
    finally:
        driver.quit()

selenium_interactions()

Method 4: Using Scrapy for Large-Scale Scraping

Scrapy is a complete web crawling framework designed for extracting data from websites at scale. Unlike simple scripts, Scrapy handles concurrency, rate limiting, and data pipelines automatically.

Installing Scrapy

Install Scrapy with pip:

pip install scrapy

Creating a Scrapy Project

Scrapy uses a project structure. Create one with the command line tool:

scrapy startproject quotescraper
cd quotescraper
scrapy genspider quotes quotes.toscrape.com

This creates a directory structure with configuration files and a spider template.
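The generated layout looks roughly like this (details may vary slightly between Scrapy versions):

```
quotescraper/
├── scrapy.cfg            # deployment configuration
└── quotescraper/
    ├── __init__.py
    ├── items.py          # data models for scraped items
    ├── middlewares.py    # request/response hooks
    ├── pipelines.py      # post-processing for scraped items
    ├── settings.py       # crawl behavior configuration
    └── spiders/
        └── quotes.py     # the spider created by genspider
```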

Writing a Scrapy Spider

Spiders define how to crawl pages and extract data. Edit the generated spider file:

# quotescraper/spiders/quotes.py

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]
    
    def parse(self, response):
        # Extract quotes from current page
        for quote in response.css(".quote"):
            yield {
                "text": quote.css(".text::text").get(),
                "author": quote.css(".author::text").get(),
                "tags": quote.css(".tag::text").getall()
            }
        
        # Follow pagination links
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, self.parse)

The yield statement returns extracted data and continues crawling. Scrapy handles the asynchronous execution behind the scenes.

Running the Spider

Run your spider from the command line:

scrapy crawl quotes -o quotes.json

The -o flag specifies the output file. Scrapy supports JSON, CSV, and XML formats natively.

Configuring Scrapy Settings

Adjust settings in settings.py to control crawling behavior:

# quotescraper/settings.py

# Respect robots.txt
ROBOTSTXT_OBEY = True

# Add delay between requests
DOWNLOAD_DELAY = 1

# Limit concurrent requests
CONCURRENT_REQUESTS = 8

# Set user agent
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0"

# Enable auto-throttle
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1
AUTOTHROTTLE_MAX_DELAY = 10

The auto-throttle feature automatically adjusts request rates based on server response times. This helps avoid overloading target servers.

Handling Anti-Bot Protection

Websites increasingly deploy anti-bot measures to block scrapers. Understanding these defenses helps you build more resilient scrapers.

Common Anti-Bot Techniques

Rate limiting restricts how many requests an IP address can make within a time period. Exceed the limit and you'll receive 429 errors or temporary blocks.

JavaScript challenges require browsers to execute scripts before accessing content. Simple HTTP clients fail these checks immediately.

CAPTCHAs ask users to solve puzzles proving they're human. These appear after suspicious activity detection.

Fingerprinting analyzes browser characteristics like screen resolution, installed fonts, and WebGL capabilities. Headless browsers often have detectable fingerprints.
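For rate limiting specifically, many servers include a Retry-After header with the 429 response telling you how long to wait. A sketch that honors it when present (Retry-After can also be an HTTP date, which this sketch treats as non-numeric and falls back to exponential backoff):

```python
import time
import requests

def compute_wait(retry_after, attempt):
    """Seconds to sleep: the numeric Retry-After value if given, else 2**attempt."""
    if retry_after and retry_after.isdigit():
        return int(retry_after)
    return 2 ** attempt

def get_with_rate_limit_handling(url, max_attempts=3):
    """Retry on 429 responses, honoring Retry-After when the server sends one."""
    for attempt in range(max_attempts):
        response = requests.get(url, timeout=30)
        if response.status_code != 429:
            return response
        time.sleep(compute_wait(response.headers.get("Retry-After"), attempt))
    return response
```

Respecting the server's own wait hint gets you unblocked faster than blind retries, which can extend a temporary ban.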

Rotating User Agents

Vary your User-Agent header to appear as different browsers:

import random
import requests

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 Safari/605.1.15",
]

def get_random_headers():
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9",
    }

response = requests.get(url, headers=get_random_headers())

Adding Request Delays

Avoid triggering rate limits by spacing out requests:

import time
import random

def respectful_scrape(urls):
    results = []
    
    for url in urls:
        response = requests.get(url, headers=get_random_headers())
        results.append(response.text)
        
        # Random delay between 1 and 3 seconds
        delay = random.uniform(1, 3)
        time.sleep(delay)
    
    return results

Random delays appear more natural than fixed intervals.

Handling Request Failures

Build retry logic for transient failures:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_session_with_retries():
    session = requests.Session()
    
    retries = Retry(
        total=3,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504]
    )
    
    adapter = HTTPAdapter(max_retries=retries)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    
    return session

session = create_session_with_retries()
response = session.get(url)

The backoff factor increases the wait between retries exponentially: with backoff_factor=1, urllib3 sleeps roughly 1, 2, then 4 seconds before successive attempts (the exact formula varies slightly between urllib3 versions).

Using Proxies for Web Scraping

Proxies route your requests through different IP addresses. This helps avoid IP-based blocks and rate limits.

Types of Proxies

Datacenter proxies come from cloud servers. They're fast and cheap but easily detected.

Residential proxies use IP addresses from real ISP customers. They appear legitimate but cost more.

Mobile proxies route through cellular networks. They're hardest to detect but most expensive.

Rotating proxies automatically switch IPs for each request. This provides the best protection against blocks.

If you need reliable proxies for your scraping projects, providers like Roundproxies.com offer residential, datacenter, ISP, and mobile proxy options with automatic rotation.

Implementing Proxy Rotation

Here's how to rotate proxies with requests:

import requests
import random

PROXIES = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]

def get_with_proxy(url):
    proxy = random.choice(PROXIES)
    proxies = {
        "http": proxy,
        "https": proxy
    }
    
    response = requests.get(url, proxies=proxies, timeout=30)
    return response

response = get_with_proxy("https://example.com")

Using Proxies with Playwright

Playwright supports proxy configuration at browser launch:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=True,
        proxy={
            "server": "http://proxy.example.com:8080",
            "username": "user",
            "password": "pass"
        }
    )
    
    page = browser.new_page()
    page.goto("https://example.com")
    
    browser.close()

Using Proxies with Selenium

Configure Selenium to use proxies through Chrome options:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def selenium_with_proxy(proxy_url):
    options = Options()
    options.add_argument("--headless")
    options.add_argument(f"--proxy-server={proxy_url}")
    
    driver = webdriver.Chrome(options=options)
    return driver

driver = selenium_with_proxy("http://proxy.example.com:8080")
driver.get("https://example.com")
driver.quit()

Storing Scraped Data

Once you've extracted data, you need to store it in a useful format. The choice depends on data structure and intended use.

Saving to CSV

CSV works well for tabular data with consistent fields:

import csv

def save_to_csv(data, filename):
    if not data:
        return
    
    fieldnames = data[0].keys()
    
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(data)

quotes = [
    {"text": "Quote 1", "author": "Author 1"},
    {"text": "Quote 2", "author": "Author 2"}
]

save_to_csv(quotes, "quotes.csv")

Saving to JSON

JSON preserves nested structures and mixed data types:

import json

def save_to_json(data, filename):
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)

quotes = [
    {"text": "Quote 1", "author": "Author 1", "tags": ["wisdom", "life"]},
    {"text": "Quote 2", "author": "Author 2", "tags": ["humor"]}
]

save_to_json(quotes, "quotes.json")

Saving to SQLite Database

Databases handle large datasets and enable complex queries:

import sqlite3

def save_to_sqlite(data, db_name, table_name):
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    
    # Create table (table_name is interpolated directly, so only pass
    # trusted values here; user-supplied data goes through ? placeholders)
    cursor.execute(f"""
        CREATE TABLE IF NOT EXISTS {table_name} (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            text TEXT,
            author TEXT,
            tags TEXT
        )
    """)
    
    # Insert data
    for item in data:
        cursor.execute(
            f"INSERT INTO {table_name} (text, author, tags) VALUES (?, ?, ?)",
            (item["text"], item["author"], ", ".join(item.get("tags", [])))
        )
    
    conn.commit()
    conn.close()

save_to_sqlite(quotes, "quotes.db", "quotes")
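The payoff of a database is being able to query it later. A small sketch that reads rows back by author, assuming the table schema created above (the function name is illustrative):

```python
import sqlite3

def query_by_author(db_name, table_name, author):
    """Return all rows for a given author."""
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    # The author value goes through a ? placeholder, never an f-string
    cursor.execute(
        f"SELECT text, author, tags FROM {table_name} WHERE author = ?",
        (author,),
    )
    rows = cursor.fetchall()
    conn.close()
    return rows

# rows = query_by_author("quotes.db", "quotes", "Author 1")
```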

Common Mistakes and How to Avoid Them

Years of experience with web scraping in Python reveal recurring patterns in what goes wrong. Learning from these mistakes saves hours of debugging.

Not Handling Errors Gracefully

Network requests fail. Servers go down. Elements disappear. Wrap critical code in try/except blocks:

import requests
from bs4 import BeautifulSoup

def safe_scrape(url):
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        
        soup = BeautifulSoup(response.text, "lxml")
        title = soup.select_one("h1")
        
        return title.get_text() if title else None
        
    except requests.RequestException as e:
        print(f"Request failed: {e}")
        return None
    except Exception as e:
        print(f"Parsing failed: {e}")
        return None

Ignoring robots.txt

The robots.txt file specifies which pages scrapers should avoid. Respecting it demonstrates good citizenship:

from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def can_scrape(url, user_agent="*"):
    parser = RobotFileParser()
    
    # Build the robots.txt URL from the page URL
    parsed = urlparse(url)
    robots_url = f"{parsed.scheme}://{parsed.netloc}/robots.txt"
    
    parser.set_url(robots_url)
    parser.read()
    
    return parser.can_fetch(user_agent, url)

if can_scrape("https://example.com/page"):
    # Proceed with scraping
    pass

Scraping Too Aggressively

Hammering servers with requests can get you blocked and may even cause legal issues. Always add delays and respect rate limits.
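A minimal throttle that enforces a minimum interval between consecutive requests can be sketched like this (the 2-second interval is an example, not a universal rule):

```python
import time

class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval=2.0):
        self.min_interval = min_interval
        self.last_request = 0.0

    def wait(self):
        # Sleep only if the previous request was too recent
        elapsed = time.monotonic() - self.last_request
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_request = time.monotonic()

throttle = Throttle(min_interval=2.0)
# Call throttle.wait() before each request:
# throttle.wait(); response = requests.get(url)
```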

Not Validating Extracted Data

Empty or malformed data causes downstream problems. Validate before saving:

def validate_quote(quote):
    if not quote.get("text"):
        return False
    if not quote.get("author"):
        return False
    if len(quote["text"]) < 10:
        return False
    return True

valid_quotes = [q for q in quotes if validate_quote(q)]

Hardcoding Selectors

Websites change their HTML structure frequently. Make selectors configurable:

SELECTORS = {
    "quote_container": ".quote",
    "quote_text": ".text",
    "quote_author": ".author",
    "next_page": "li.next a"
}

def scrape_with_config(soup, selectors):
    quotes = []
    
    for container in soup.select(selectors["quote_container"]):
        text_el = container.select_one(selectors["quote_text"])
        author_el = container.select_one(selectors["quote_author"])
        
        if text_el and author_el:
            quotes.append({
                "text": text_el.get_text(strip=True),
                "author": author_el.get_text(strip=True)
            })
    
    return quotes

Advanced Techniques for 2026

The scraping landscape continues to evolve. Here are cutting-edge approaches that will define web scraping with Python in 2026 and beyond.

Async Scraping with HTTPX

HTTPX offers async capabilities that dramatically speed up scraping multiple pages:

import httpx
import asyncio
from bs4 import BeautifulSoup

async def fetch_page(client, url):
    """Fetch a single page asynchronously"""
    response = await client.get(url)
    return response.text

async def scrape_multiple_pages(urls):
    """Scrape multiple pages concurrently"""
    
    async with httpx.AsyncClient() as client:
        # Create tasks for all URLs
        tasks = [fetch_page(client, url) for url in urls]
        
        # Execute all requests concurrently
        pages = await asyncio.gather(*tasks)
        
        results = []
        for html in pages:
            soup = BeautifulSoup(html, "lxml")
            # Extract data from each page
            title = soup.select_one("h1")
            if title:
                results.append(title.get_text(strip=True))
        
        return results

# Generate URLs for first 10 pages
urls = [f"https://quotes.toscrape.com/page/{i}/" for i in range(1, 11)]

# Run async scraper
results = asyncio.run(scrape_multiple_pages(urls))
print(f"Scraped {len(results)} pages")

Async scraping can be 5-10x faster than synchronous approaches when scraping many pages.

AI-Powered Data Extraction

Large language models can extract structured data from messy HTML without brittle CSS selectors:

from openai import OpenAI
from bs4 import BeautifulSoup

client = OpenAI()

def extract_with_ai(html, prompt):
    """Use AI to extract structured data from HTML"""
    
    # Clean HTML to reduce tokens
    soup = BeautifulSoup(html, "lxml")
    text = soup.get_text(separator="\n", strip=True)
    
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Extract structured data from the provided text."},
            {"role": "user", "content": f"{prompt}\n\nText:\n{text[:4000]}"}
        ]
    )
    
    return response.choices[0].message.content

# Example usage (html comes from an earlier request)
prompt = "Extract all product names and prices as JSON"
result = extract_with_ai(html, prompt)

This approach handles layout changes gracefully since it focuses on semantic meaning rather than DOM structure.
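Model replies often wrap the JSON in prose or Markdown code fences, so parse defensively. A small stdlib-only helper (a sketch, not tied to any particular model):

```python
import json
import re

def parse_json_reply(reply):
    """Extract the first JSON object or array from a model reply."""
    # Strip Markdown code fences if present
    reply = re.sub(r"```(?:json)?", "", reply).strip()
    # Find where the JSON payload starts
    match = re.search(r"[\[{]", reply)
    if not match:
        return None
    decoder = json.JSONDecoder()
    try:
        # raw_decode tolerates trailing prose after the JSON value
        obj, _ = decoder.raw_decode(reply[match.start():])
        return obj
    except json.JSONDecodeError:
        return None

reply = 'Here you go:\n```json\n[{"name": "Widget", "price": 9.99}]\n```'
print(parse_json_reply(reply))
```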

Headless Browser Stealth Mode

Modern anti-bot systems detect headless browsers through various fingerprinting techniques. Use stealth plugins to appear more human:

from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

def stealth_scrape(url):
    """Scrape with stealth mode to avoid detection"""
    
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        
        # Apply stealth settings
        stealth_sync(page)
        
        # Now navigate
        page.goto(url)
        
        content = page.content()
        browser.close()
        
        return content

Install the stealth plugin with pip install playwright-stealth.

Building Resilient Scrapers

Production scrapers need monitoring and automatic recovery:

import logging
import time
from datetime import datetime

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('scraper.log'),
        logging.StreamHandler()
    ]
)

class ResilientScraper:
    def __init__(self, max_retries=3, backoff_factor=2):
        self.max_retries = max_retries
        self.backoff_factor = backoff_factor
        self.stats = {
            "success": 0,
            "failed": 0,
            "retries": 0
        }
    
    def scrape_with_retry(self, url, scrape_func):
        """Attempt scraping with exponential backoff"""
        
        for attempt in range(self.max_retries):
            try:
                result = scrape_func(url)
                self.stats["success"] += 1
                logging.info(f"Successfully scraped: {url}")
                return result
                
            except Exception as e:
                self.stats["retries"] += 1
                wait_time = self.backoff_factor ** attempt
                
                logging.warning(
                    f"Attempt {attempt + 1} failed for {url}: {e}. "
                    f"Retrying in {wait_time}s"
                )
                
                time.sleep(wait_time)
        
        self.stats["failed"] += 1
        logging.error(f"All attempts failed for: {url}")
        return None
    
    def get_stats(self):
        return self.stats

Distributed Scraping Architecture

For very large projects, distribute scraping across multiple machines:

# Simple task queue using Redis
import json
from datetime import datetime

import redis

class ScrapingQueue:
    def __init__(self, redis_url="redis://localhost:6379"):
        self.redis = redis.from_url(redis_url)
        self.queue_name = "scraping_tasks"
        self.results_name = "scraping_results"
    
    def add_task(self, url):
        """Add URL to scraping queue"""
        task = {"url": url, "added_at": datetime.now().isoformat()}
        self.redis.lpush(self.queue_name, json.dumps(task))
    
    def get_task(self):
        """Get next URL from queue"""
        task_json = self.redis.rpop(self.queue_name)
        if task_json:
            return json.loads(task_json)
        return None
    
    def save_result(self, url, data):
        """Save scraping result"""
        result = {"url": url, "data": data}
        self.redis.lpush(self.results_name, json.dumps(result))
    
    def pending_count(self):
        """Get count of pending tasks"""
        return self.redis.llen(self.queue_name)

Run multiple worker processes that pull URLs from the shared queue. This scales horizontally across machines.
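A worker is just a loop that drains the shared queue. A minimal sketch, where `scrape_page` stands in for your actual fetch logic and `queue` is any object with the ScrapingQueue interface above:

```python
def run_worker(queue, scrape_page):
    """Pull tasks from the shared queue until it is empty."""
    while True:
        task = queue.get_task()
        if task is None:
            break  # queue drained; a long-running worker could sleep and retry
        try:
            data = scrape_page(task["url"])
            queue.save_result(task["url"], data)
        except Exception as e:
            print(f"Failed {task['url']}: {e}")
```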

Comparing Python Scraping Libraries

Here's a quick comparison to help you choose the right tool:

| Library | Best For | Learning Curve | Speed | Dynamic Content |
|---|---|---|---|---|
| Requests + BS4 | Static sites | Easy | Fast | No |
| HTTPX | Async static scraping | Medium | Very Fast | No |
| Playwright | Dynamic sites, stealth | Medium | Medium | Yes |
| Selenium | Legacy projects, testing | Medium | Slow | Yes |
| Scrapy | Large-scale crawling | Hard | Fast | With plugins |

Choose based on your specific requirements. Start simple and add complexity only when needed.

Ethical Scraping Best Practices

Responsible scraping benefits everyone. Follow these guidelines to maintain good relationships with target sites.

Respecting Server Resources

Your scraper impacts real infrastructure. Space out requests to avoid overwhelming servers:

import time
import random

def polite_scraper(urls, min_delay=1, max_delay=3):
    """Scrape with respectful delays"""
    
    for url in urls:
        # scrape_page stands in for your page-fetching function
        yield scrape_page(url)
        
        # Random delay appears more natural
        delay = random.uniform(min_delay, max_delay)
        time.sleep(delay)

Checking Terms of Service

Many websites explicitly address scraping in their terms. Review these before starting any project. Some sites offer official APIs that provide cleaner data access.

Identifying Your Scraper

Include contact information in your User-Agent so site owners can reach you:

USER_AGENT = "MyScraperBot/1.0 (+https://mysite.com/scraper-info; contact@mysite.com)"

This transparency builds trust and often prevents blocks.
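With requests, attaching that identity to a Session applies it to every request automatically:

```python
import requests

USER_AGENT = "MyScraperBot/1.0 (+https://mysite.com/scraper-info; contact@mysite.com)"

session = requests.Session()
# Every request made through this session now carries the identifying header
session.headers.update({"User-Agent": USER_AGENT})

# response = session.get("https://example.com")
```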

Caching Responses

Don't re-scrape data you already have. Cache responses to reduce server load:

import hashlib
import os
import json

class ResponseCache:
    def __init__(self, cache_dir="cache"):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)
    
    def _get_cache_path(self, url):
        """Generate cache file path from URL"""
        url_hash = hashlib.md5(url.encode()).hexdigest()
        return os.path.join(self.cache_dir, f"{url_hash}.json")
    
    def get(self, url):
        """Retrieve cached response"""
        path = self._get_cache_path(url)
        if os.path.exists(path):
            with open(path, "r") as f:
                return json.load(f)
        return None
    
    def set(self, url, data):
        """Cache response data"""
        path = self._get_cache_path(url)
        with open(path, "w") as f:
            json.dump(data, f)

cache = ResponseCache()

# Check cache before requesting
cached = cache.get(url)
if cached:
    data = cached
else:
    data = scrape_page(url)
    cache.set(url, data)

FAQ

Is web scraping legal?

Web scraping with Python exists in a legal gray area. Scraping publicly available data is generally acceptable, but violating a site's terms of service or scraping private information can create legal issues. Always check the target site's terms before scraping.

What's the best Python library for web scraping?

The best tool for web scraping with Python depends on your needs. For static HTML pages, requests plus BeautifulSoup offers the simplest solution. For JavaScript-heavy sites, Playwright provides modern browser automation. Scrapy excels at large-scale crawling projects.

How do I avoid getting blocked while scraping?

Use realistic request headers, add random delays between requests, rotate IP addresses with proxies, and respect the site's robots.txt file. Avoid patterns that distinguish bots from humans.

Can I scrape websites that require login?

Yes. Browser automation tools like Playwright and Selenium can fill login forms and maintain session cookies. For API-based authentication, requests can handle cookies and tokens directly.

How often should I run my scraper?

That depends on how frequently the target data changes. For real-time prices, you might scrape hourly. For static content, weekly or monthly might suffice. Always consider server impact when scheduling frequent scrapes.

What's the difference between web scraping and web crawling?

Scraping extracts specific data from pages you've identified. Crawling discovers pages by following links across a website. Many projects combine both—crawling to find pages, then scraping to extract data.

Conclusion

Web scraping with Python remains one of the most powerful ways to collect data from the internet in 2026. Whether you're building a price monitoring tool, gathering research data, or feeding machine learning models, Python's ecosystem has you covered.

For static sites, start with requests and BeautifulSoup. They're simple, fast, and handle most use cases. When you encounter JavaScript-rendered content, switch to Playwright or Selenium for full browser automation.

Large-scale projects benefit from Scrapy's built-in handling of concurrency, retries, and data pipelines. Its learning curve pays off quickly when you're crawling thousands of pages.

Remember to scrape responsibly. Respect robots.txt files, add delays between requests, and use proxies when needed. Aggressive scraping damages websites and gets your IP banned.

The techniques in this guide form a solid foundation. Practice on the Quotes to Scrape sandbox site until the concepts become second nature. Then apply what you've learned to real-world projects.

]]>