Cloudflare is widely used to prevent automated website access, often presenting a barrier to scrapers with JavaScript challenges, fingerprint detection, and CAPTCHAs. Puppeteer, a powerful browser automation tool built by the Chrome team, can mimic real user behavior and execute JavaScript in a full browser environment. In this guide, you'll learn how to configure Puppeteer to evade Cloudflare defenses and build a reliable scraper using stealth plugins, proxy rotation, and CAPTCHA handling.
Puppeteer gives you low-level control over Chromium, but Cloudflare is built to detect exactly that kind of automation. If you're running a basic script without modifying fingerprints, you'll likely get blocked quickly. Cloudflare doesn't rely on one method; it combines several signals to decide whether a session looks real or automated.
One of the first layers is JavaScript execution. Cloudflare runs in-browser challenges that check whether scripts are evaluated the way a real browser would evaluate them. If Puppeteer runs in headless mode or hasn't been patched with stealth techniques, it can fail these tests: inconsistent property values, such as a `HeadlessChrome` user-agent string or a missing `window.chrome` object, often give it away.
Fingerprinting is another area where Puppeteer needs extra configuration. Cloudflare checks for markers like `navigator.webdriver`, a lack of browser plugins, and unusual timezone offsets. A fresh Puppeteer instance in headless mode will expose several of these immediately unless you're using `puppeteer-extra-plugin-stealth` or manually patching the browser environment.
Then there's the TLS fingerprint from the SSL handshake. Each real browser produces a characteristic JA3/JA4 fingerprint when negotiating a secure connection. Because Puppeteer drives an actual Chromium build, its handshake normally looks like that Chromium version; problems start when your user agent claims a different browser or version than the handshake reveals, or when a proxy intercepts and re-establishes the TLS connection. That mismatch can signal automation to Cloudflare before the page even finishes loading.
Cloudflare also tracks network-level behavior. If your Puppeteer scraper makes too many requests from the same IP, loads pages too quickly, or skips normal user interactions like scrolling or clicking, you'll hit rate limits or CAPTCHAs. Combining stealth techniques with proxy rotation and human-like timing helps reduce how often that happens.
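The "human-like timing" part can be as simple as a randomized pause between actions. A minimal sketch: `randomDelay` and `sleep` are hypothetical helpers (not part of Puppeteer), and the 1.5 to 4 second range in the usage comment is an arbitrary example, not a tuned value.

```javascript
// Hypothetical helper: pick a random pause in [minMs, maxMs].
// A fixed interval between requests is easy to spot; a jittered one is less regular.
function randomDelay(minMs, maxMs, rng = Math.random) {
  if (minMs > maxMs) throw new RangeError('minMs must be <= maxMs');
  // rng() returns a float in [0, 1), so the rounded result lies in [minMs, maxMs]
  return Math.round(minMs + rng() * (maxMs - minMs));
}

// Hypothetical helper: resolve after `ms` milliseconds
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Example (inside an async function, with a Puppeteer `page`):
//   await page.goto(url);
//   await sleep(randomDelay(1500, 4000));
//   await page.mouse.wheel({ deltaY: 400 }); // scroll a little before the next action
```

Injecting `rng` keeps the helper deterministic in tests while defaulting to `Math.random` in real runs.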
Puppeteer is easy to detect on its own. Out of the box, it sets off multiple red flags, such as `navigator.webdriver` being `true`, missing browser plugins, and a fixed 800x600 default viewport that doesn't match common devices. You'll want to use `puppeteer-extra` with the stealth plugin to get around that. This combination patches most of the obvious indicators that automation is running.
Start by installing the packages:
```bash
npm install puppeteer-extra puppeteer puppeteer-extra-plugin-stealth
```
Once installed, you can load the plugin like this:
```javascript
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// Apply the stealth plugin, which patches many automation signals like navigator.webdriver, WebGL, plugins, etc.
puppeteer.use(StealthPlugin());
```
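The stealth plugin automatically masks a lot of what Cloudflare checks. For example, it makes `navigator.webdriver` return `false`, simulates installed plugins, sets plausible `navigator.languages` values, and patches WebGL vendor metadata. It does not change the timezone, though; if you route traffic through a proxy in another region, set a matching timezone yourself with `page.emulateTimezone()`.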
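Timezone, language, and proxy location should tell a consistent story, and the `Accept-Language` header should agree with `navigator.languages`. A minimal sketch, assuming you keep these values together in one locale profile: `acceptLanguage` is a hypothetical helper that turns an ordered language list into a header value, and the commented usage relies on Puppeteer's `page.emulateTimezone()` and `page.setExtraHTTPHeaders()`. The Berlin profile is purely illustrative.

```javascript
// Hypothetical helper: build an Accept-Language header from an ordered list of
// languages, giving each later entry a lower q-value (e.g. "en-US,en;q=0.9").
function acceptLanguage(languages) {
  return languages
    .map((lang, i) => (i === 0 ? lang : `${lang};q=${Math.max(0.1, 1 - i * 0.1).toFixed(1)}`))
    .join(',');
}

// Example (inside an async function), keeping one profile per proxy region:
//   const profile = { timezone: 'Europe/Berlin', languages: ['de-DE', 'de', 'en-US', 'en'] };
//   await page.emulateTimezone(profile.timezone);
//   await page.setExtraHTTPHeaders({ 'Accept-Language': acceptLanguage(profile.languages) });
```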
You'll also want to set a realistic user agent and vary the viewport size manually, since both values are frequently used to fingerprint bots. Here's how you can set them on a new page:
```javascript
// Run inside an async function (top-level await requires an ES module)
const browser = await puppeteer.launch({ headless: false }); // Headed mode is harder to detect
const page = await browser.newPage();

// Set a realistic user agent; keep its Chrome version in line with the Chromium build you launch
await page.setUserAgent(
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
);

// Randomize viewport slightly to avoid fingerprinting from consistent dimensions
await page.setViewport({
  width: Math.floor(1024 + Math.random() * 100),
  height: Math.floor(768 + Math.random() * 100),
});
```
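A hard-coded string like `Chrome/114` can drift away from the Chromium build Puppeteer actually launches, which is the kind of user-agent versus TLS mismatch described earlier. One option is to compare the two before setting the user agent. A minimal sketch, assuming only the major version matters; `chromeMajor` and `uaMatchesBrowser` are hypothetical helpers. Puppeteer's `browser.version()` returns strings such as `Chrome/120.0.6099.109` (or `HeadlessChrome/...` in old headless mode), and `browser.userAgent()` returns the browser's default user agent.

```javascript
// Hypothetical helper: extract the Chrome major version from a user-agent string
// or from browser.version() output. Also matches "HeadlessChrome/120.0...".
function chromeMajor(str) {
  const match = /Chrome\/(\d+)\./.exec(str);
  return match ? Number(match[1]) : null;
}

// Hypothetical helper: true when the UA claims the same major version the browser reports
function uaMatchesBrowser(userAgent, browserVersion) {
  const uaMajor = chromeMajor(userAgent);
  return uaMajor !== null && uaMajor === chromeMajor(browserVersion);
}

// Example (inside an async function):
//   const ua = (await browser.userAgent()).replace('HeadlessChrome', 'Chrome');
//   if (uaMatchesBrowser(ua, await browser.version())) await page.setUserAgent(ua);
```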
Randomizing values slightly each session helps reduce the chance of being fingerprinted across multiple visits. If the website you're targeting requires authentication or tends to challenge new sessions, you can also preload cookies and local storage. This makes Puppeteer behave more like a returning visitor than a fresh instance every time.
For harder challenges, especially ones that use Cloudflare's Turnstile, you might need to run Chromium in headed (non-headless) mode. While Chrome's newer headless mode is much closer to a regular browser, some sites still flag it. You can toggle this when launching:
```javascript
const browser = await puppeteer.launch({
  headless: false, // Avoid headless if site fingerprinting is aggressive
  args: ['--no-sandbox'], // Required in some CI or container environments; omit it for local runs
});
```
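Rather than editing `--no-sandbox` in and out between local and CI runs, you could let a small helper decide. A sketch under the assumption that your CI sets the conventional `CI` environment variable (many providers do, but check yours); `buildLaunchOptions` is a hypothetical name, not a Puppeteer API.

```javascript
// Hypothetical helper: build puppeteer.launch() options, adding --no-sandbox
// only when the CI environment variable is set (and not "false").
function buildLaunchOptions({ headless = false, env = process.env, extraArgs = [] } = {}) {
  const inCI = Boolean(env.CI) && env.CI !== 'false';
  const args = [...extraArgs];
  if (inCI) args.push('--no-sandbox');
  return { headless, args };
}

// Example (inside an async function):
//   const browser = await puppeteer.launch(buildLaunchOptions());
```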
Combining these techniques gives you a stronger baseline for bypassing detection. You're not invisible, but you're much less obvious. From here, you can start layering on proxy support and CAPTCHA handling to stay under the radar.
Rotating IP addresses is one of the most effective ways to avoid being flagged when scraping Cloudflare-protected sites. If you send too many requests from a single IP, Cloudflare will start issuing challenges or block you outright. Using a pool of proxies, especially residential ones, makes it harder for detection systems to link traffic back to automation. These can be passed to Puppeteer during browser launch using the `--proxy-server` flag.
Here's an example of launching Puppeteer with a single proxy:
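```javascript
// Chromium's --proxy-server flag ignores credentials embedded in the URL,
// so pass only the scheme, host, and port here
const proxyServer = 'http://proxyhost:port'; // Format: http(s)://ip:port
const browser = await puppeteer.launch({
  headless: false,
  args: [`--proxy-server=${proxyServer}`],
});
const page = await browser.newPage();
// Supply the proxy credentials through Puppeteer instead
await page.authenticate({ username: 'user', password: 'pass' });
```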
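If your proxy list stores entries in the `http://user:pass@host:port` form, they need splitting before launch, since Chromium ignores credentials placed in `--proxy-server`. A minimal sketch using Node's standard `URL` class; `parseProxy` is a hypothetical helper, and `203.0.113.5` is a documentation-range placeholder address.

```javascript
// Hypothetical helper: split "http://user:pass@host:port" into the pieces Puppeteer
// needs: a server string for --proxy-server and credentials for page.authenticate().
function parseProxy(proxyUrl) {
  const url = new URL(proxyUrl); // throws a TypeError on malformed input
  const server = `${url.protocol}//${url.host}`; // url.host includes the port
  const credentials = url.username
    ? { username: decodeURIComponent(url.username), password: decodeURIComponent(url.password) }
    : null;
  return { server, credentials };
}

// Example (inside an async function):
//   const { server, credentials } = parseProxy('http://user:pass@203.0.113.5:8080');
//   const browser = await puppeteer.launch({ args: [`--proxy-server=${server}`] });
//   const page = await browser.newPage();
//   if (credentials) await page.authenticate(credentials);
```

Percent-decoding the username and password lets credentials containing characters like `@` be stored safely in URL form.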
To rotate proxies, loop through a list of proxy addresses in your script and launch a new browser instance for each one. For higher-volume scraping, you'll want to carefully manage concurrency and session state to avoid reinitializing the browser too often.
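Here is a minimal round-robin sketch of that loop's bookkeeping. `ProxyRotator` is a hypothetical class, not a library API: it hands out proxies in order and drops any proxy that keeps getting blocked or challenged. Launching a browser per proxy is left to the caller.

```javascript
// Hypothetical round-robin proxy pool with a consecutive-failure counter.
class ProxyRotator {
  constructor(proxies, maxFailures = 3) {
    if (!proxies.length) throw new Error('At least one proxy is required');
    this.proxies = [...proxies];
    this.maxFailures = maxFailures;
    this.failures = new Map(); // proxy -> consecutive failure count
    this.index = 0; // invariant: 0 <= index < proxies.length while the pool is non-empty
  }

  // Return the next proxy in order, wrapping around at the end of the list
  next() {
    if (this.proxies.length === 0) throw new Error('No usable proxies left');
    const proxy = this.proxies[this.index];
    this.index = (this.index + 1) % this.proxies.length;
    return proxy;
  }

  // Record a block or CAPTCHA; remove the proxy once it reaches maxFailures
  reportFailure(proxy) {
    const count = (this.failures.get(proxy) || 0) + 1;
    this.failures.set(proxy, count);
    const pos = this.proxies.indexOf(proxy);
    if (count >= this.maxFailures && pos !== -1) {
      this.proxies.splice(pos, 1);
      if (pos < this.index) this.index -= 1; // keep pointing at the same "next" proxy
      if (this.index >= this.proxies.length) this.index = 0;
    }
  }

  // A successful page load resets the counter
  reportSuccess(proxy) {
    this.failures.delete(proxy);
  }
}

// Example loop (inside an async function):
//   const rotator = new ProxyRotator(proxyList);
//   for (const url of urls) {
//     const proxy = rotator.next();
//     // launch with this proxy, scrape `url`, then call reportSuccess or reportFailure
//   }
```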
Cloudflare doesn't just block based on IP; it also challenges behavior that looks automated. That's where CAPTCHA detection and solving come in. Cloudflare's own challenges now use Turnstile, and some sites also embed hCaptcha or reCAPTCHA. You can detect whether a CAPTCHA is being shown by checking for specific elements on the page, like iframe sources that include "captcha" or forms that block submission.
Example check for a CAPTCHA:
```javascript
// Look for iframes likely tied to CAPTCHA providers (Cloudflare, hCaptcha, Turnstile, etc.)
const isCaptcha = await page.$('iframe[src*="captcha"], iframe[src*="turnstile"]');
if (isCaptcha) {
  console.log('CAPTCHA triggered');
  // You may want to skip, retry with a new proxy, or solve it with a 3rd-party service
}
```
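The "retry with a new proxy" option can be expressed as a small loop. In this sketch, `withProxyRetries` is a hypothetical helper, and `attempt(proxy)` is a callback you supply (for example, one that launches a browser with that proxy, loads the page, and runs the iframe check above) that resolves to `{ captcha, result }`.

```javascript
// Hypothetical retry policy: try a task through successive proxies and move on
// whenever the attempt reports a CAPTCHA. Stops after maxAttempts proxies.
async function withProxyRetries(attempt, proxies, maxAttempts = proxies.length) {
  const tried = [];
  for (let i = 0; i < Math.min(maxAttempts, proxies.length); i++) {
    const proxy = proxies[i];
    tried.push(proxy);
    const { captcha, result } = await attempt(proxy);
    if (!captcha) return { ok: true, proxy, result, tried };
  }
  return { ok: false, proxy: null, result: null, tried };
}
```

Returning the list of tried proxies makes it easy to feed failures back into whatever rotation logic you use.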
For solving, you can integrate with services like SoCaptcha or 2Captcha. These services typically take a sitekey and URL, solve the challenge externally, and return a token you inject into the page to move forward.
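Both services need the sitekey, which you can usually read from the page. A minimal sketch, assuming the widget uses the common `data-sitekey` attribute (as Turnstile, hCaptcha, and reCAPTCHA embeds typically do); widgets rendered purely from JavaScript won't be found this way. `extractSitekey` is a hypothetical helper that works on the HTML returned by `page.content()`.

```javascript
// Hypothetical helper: find a CAPTCHA sitekey in page HTML.
// Assumes the implicit data-sitekey markup; returns null if none is present.
function extractSitekey(html) {
  const match = /data-sitekey\s*=\s*["']([^"']+)["']/i.exec(html);
  return match ? match[1] : null;
}

// Example (inside an async function):
//   const sitekey = extractSitekey(await page.content());
//   // or, directly in the page:
//   // await page.$eval('[data-sitekey]', (el) => el.getAttribute('data-sitekey'));
```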
Once you solve a CAPTCHA or pass Cloudflare's JS checks, persisting session data (cookies and local storage) is especially helpful. This reduces the chance of running into another challenge immediately after. You can save session state with Puppeteer like this:
```javascript
// Save session cookies and localStorage for reuse across scraping sessions
const cookies = await page.cookies();
const localStorageData = await page.evaluate(() => {
  const data = {};
  for (let i = 0; i < localStorage.length; i++) {
    const key = localStorage.key(i);
    data[key] = localStorage.getItem(key);
  }
  return data;
});

// You'll want to write these to disk to reuse across runs
// fs.writeFileSync('cookies.json', JSON.stringify(cookies))
// fs.writeFileSync('localStorage.json', JSON.stringify(localStorageData))
```
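To close the loop, the saved files need to be loaded back before the next run. A minimal sketch using Node's built-in `fs` and `path` modules; `saveSession`, `loadSession`, and the file names are hypothetical. The restore step in the usage comment relies on Puppeteer's `page.setCookie()` and `page.evaluateOnNewDocument()`.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: write cookies and localStorage data to a session directory
function saveSession(dir, cookies, localStorageData) {
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(path.join(dir, 'cookies.json'), JSON.stringify(cookies, null, 2));
  fs.writeFileSync(path.join(dir, 'localStorage.json'), JSON.stringify(localStorageData, null, 2));
}

// Hypothetical helper: read them back; returns null if no saved session exists yet
function loadSession(dir) {
  const cookiePath = path.join(dir, 'cookies.json');
  const storagePath = path.join(dir, 'localStorage.json');
  if (!fs.existsSync(cookiePath) || !fs.existsSync(storagePath)) return null;
  return {
    cookies: JSON.parse(fs.readFileSync(cookiePath, 'utf8')),
    localStorageData: JSON.parse(fs.readFileSync(storagePath, 'utf8')),
  };
}

// Example restore (inside an async function, before navigating):
//   const session = loadSession('./session');
//   if (session) {
//     await page.setCookie(...session.cookies);
//     // Runs in every new document; localStorage is per-origin, so you may
//     // want to check location.hostname before writing
//     await page.evaluateOnNewDocument((data) => {
//       for (const [key, value] of Object.entries(data)) localStorage.setItem(key, value);
//     }, session.localStorageData);
//   }
```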
Keeping session data allows your scraper to appear more consistent across visits. When combined with rotating IPs and solving CAPTCHAs, it becomes much easier to move through Cloudflare defenses without being flagged repeatedly.
When dealing with complex Cloudflare challenges, especially Turnstile CAPTCHAs, integrating a professional CAPTCHA solving service like SoCaptcha can significantly improve your success rate. SoCaptcha specializes in solving various CAPTCHA types including Cloudflare Turnstile, hCaptcha, and reCAPTCHA.
Here's how to integrate SoCaptcha with your Puppeteer scraper:
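```javascript
const axios = require('axios');

// Read the API key from an environment variable instead of hard-coding it
const SOCAPTCHA_API_KEY = process.env.SOCAPTCHA_API_KEY;

async function solveCaptchaWithSoCaptcha(page, sitekey, pageUrl) {
  try {
    // Submit the Turnstile task to SoCaptcha
    const submitResponse = await axios.post('https://api.socaptcha.com/createTask', {
      clientKey: SOCAPTCHA_API_KEY,
      task: {
        type: 'TurnstileTaskProxyless',
        websiteURL: pageUrl,
        websiteKey: sitekey,
      },
    });
    const taskId = submitResponse.data.taskId;
    if (!taskId) {
      console.error('Task was not created:', submitResponse.data);
      return false;
    }

    // Poll for the solution: up to 30 attempts, 2 seconds apart
    let solution = null;
    for (let i = 0; i < 30; i++) {
      await new Promise((resolve) => setTimeout(resolve, 2000));
      const resultResponse = await axios.post('https://api.socaptcha.com/getTaskResult', {
        clientKey: SOCAPTCHA_API_KEY,
        taskId,
      });
      if (resultResponse.data.status === 'ready') {
        solution = resultResponse.data.solution.token;
        break;
      }
    }

    if (solution) {
      // Inject the token: fill Turnstile's hidden response field, then call the
      // site's callback if one exists (the callback name varies by site)
      await page.evaluate((token) => {
        const input = document.querySelector('input[name="cf-turnstile-response"]');
        if (input) input.value = token;
        const callback = window.turnstileCallback || window.captchaCallback;
        if (typeof callback === 'function') callback(token);
      }, solution);
      return true;
    }
  } catch (error) {
    console.error('CAPTCHA solving failed:', error);
  }
  return false;
}
```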
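The fixed 30-attempt, 2-second loop in `solveCaptchaWithSoCaptcha` can be factored into a reusable helper with an overall timeout. A minimal sketch; `pollUntil` is a hypothetical name, and the `check` callback would wrap the `getTaskResult` request, returning the token when ready and `null` otherwise.

```javascript
// Hypothetical polling helper: call `check` every `intervalMs` until it returns
// a non-null value, or reject once `timeoutMs` has elapsed.
async function pollUntil(check, { intervalMs = 2000, timeoutMs = 60000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const value = await check();
    if (value !== null && value !== undefined) return value;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Timed out after ${timeoutMs} ms`);
}

// Example (inside solveCaptchaWithSoCaptcha):
//   const token = await pollUntil(async () => {
//     const { data } = await axios.post('https://api.socaptcha.com/getTaskResult', {
//       clientKey: SOCAPTCHA_API_KEY,
//       taskId,
//     });
//     return data.status === 'ready' ? data.solution.token : null;
//   });
```

Throwing on timeout, rather than returning `null`, lets the caller's existing `try/catch` handle a slow solve the same way as a network error.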
To maximize your success rate when bypassing Cloudflare with Puppeteer, follow these best practices:
When implementing Cloudflare bypass techniques, avoid these common mistakes:
Bypassing Cloudflare with Puppeteer requires a multi-layered approach combining stealth techniques, proxy rotation, and professional CAPTCHA solving services. While the landscape continues to evolve, the techniques outlined in this guide provide a solid foundation for successful web scraping in 2025.
For the most reliable results, especially when dealing with complex Turnstile challenges, consider using professional services like SoCaptcha that specialize in CAPTCHA solving and provide high success rates with minimal setup.