It’s not a tutorial but it’s a simple idea ; )
GitHub repo: Link
First thing: Cloudflare sits in the middle, between you and the target host (it acts as a reverse proxy).
How Cloudflare works: it holds you at a checkpoint before you can reach the target. There it validates your IP, browser, user-agent, JavaScript execution, etc.
- IP validation: No problem if you use a legit VPN (e.g. NordVPN)
- Browser: I want to automate the process, which can be a problem… or maybe not 😁
- User-Agent: Very easy. P.S. Using a fake user agent could work, but the validation process might fail (I haven’t tested this yet)
- JavaScript: Easy
I use Python to write a simple script.
pip install undetected_chromedriver
It’s a patched Selenium ChromeDriver that tries to avoid being detected as an automated browser.
NEXT
options.add_argument('--disable-blink-features=AutomationControlled')
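This is the CORE. This option stops Chrome from flagging itself as automated: navigator.webdriver no longer reports true (it reads as undefined or false, depending on the Chrome version).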
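Here is a minimal sketch of that setup, not the exact script from the repo. The helper names (chrome_arguments, build_driver, looks_automated) are made up for illustration; the only things taken from the post are the package and the flag. It assumes undetected_chromedriver and a local Chrome are installed.

```python
# The flag from the post; everything else in this block is an illustrative assumption.
AUTOMATION_FLAG = '--disable-blink-features=AutomationControlled'


def chrome_arguments(extra=()):
    """Return the Chrome flags to use, always starting with the core one."""
    return [AUTOMATION_FLAG, *extra]


def build_driver(extra=()):
    """Start Chrome through undetected_chromedriver with the flags above."""
    # Imported lazily so the helpers in this file load without the package.
    import undetected_chromedriver as uc

    options = uc.ChromeOptions()
    for argument in chrome_arguments(extra):
        options.add_argument(argument)
    return uc.Chrome(options=options)


def looks_automated(driver):
    """True only if the page still sees navigator.webdriver === true."""
    # Selenium maps a JavaScript `undefined` to Python's None.
    return driver.execute_script("return navigator.webdriver") is True
```

After build_driver(), calling looks_automated(driver) is a quick sanity check that the flag did its job before you point the browser at the target.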
Next, we start Selenium, open the browser on the target, let Cloudflare validate it, and grab the cookies the target set once we passed the check.
With these cookies, we create a Session with the requests library and we can scrape the rest without launching the browser for every page.
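The cookie hand-off could look roughly like this. It is a sketch under assumptions: wait_for_cookie, copy_cookies and make_session are hypothetical helper names, and I assume the clearance cookie is called cf_clearance (the name Cloudflare normally uses, but check what your target actually sets). The driver only needs Selenium's get_cookies() method.

```python
import time


def wait_for_cookie(driver, name="cf_clearance", timeout=30.0, poll=1.0):
    """Poll the browser until a cookie called `name` shows up, then return all cookies."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        cookies = driver.get_cookies()  # Selenium returns a list of dicts
        if any(cookie["name"] == name for cookie in cookies):
            return cookies
        time.sleep(poll)
    raise TimeoutError(f"cookie {name!r} did not appear within {timeout} seconds")


def copy_cookies(session, cookies):
    """Copy Selenium-style cookie dicts into a requests-style session."""
    for cookie in cookies:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session


def make_session(cookies):
    """Create a requests.Session that carries the validated cookies."""
    # Imported lazily so the two helpers above work without requests installed.
    import requests

    return copy_cookies(requests.Session(), cookies)
```

Typical use: driver.get(target_url), then session = make_session(wait_for_cookie(driver)), then driver.quit().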
P.S.
headers = {
'User-Agent': driver.execute_script("return navigator.userAgent")
}
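Thanks to this, we pass the user-agent from Selenium to the requests library. The cookies were issued to that browser, so it is safer for the user-agent on our requests to match it.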
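Put together, the last step might look like the sketch below. headers_from_driver and fetch are hypothetical names, and the session is assumed to be a requests.Session that already holds the cookies taken from the browser.

```python
def headers_from_driver(driver):
    """Read the real user-agent out of the running browser."""
    return {"User-Agent": driver.execute_script("return navigator.userAgent")}


def fetch(session, url, headers, timeout=15):
    """GET a page with the browser's user-agent and return its HTML."""
    response = session.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # a 403 here usually means the check was not passed
    return response.text
```

Grab the headers while the browser is still open, then reuse them for every fetch(session, url, headers) call.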
AND WE SCRAPE EVERYTHING
XOXO Jate