# undetected_chromedriver
https://github.com/ultrafunkamsterdam/undetected-chromedriver

Optimized Selenium Chromedriver patch which does not trigger anti-bot services like Distil Networks / Imperva / DataDome / Botprotect.io
Automatically downloads the driver binary and patches it.

* **Tested on versions 75, 76, 77, 78, 79, 80, 81, 83, 84, 85, 86**
* **patching also works on MS Edge (chromium-based) webdriver binary**
## fixed botprotect.io / perimeterX ##
<img src="https://i.imgur.com/WO4yA60.png" width="400">
<img src="https://i.imgur.com/62dpHG9.png" width="400">

[https://imgur.com/a/nqeq7bd](https://imgur.com/a/nqeq7bd)

## New ##
By default, the console log function is disabled to prevent certain detections.
Until a cleaner solution is found, use the following to enable it manually:
```python
import undetected_chromedriver as uc
driver = uc.Chrome(enable_console_log=True)
```
## Installation ##
```
pip install undetected-chromedriver
```
## Usage ##
To prevent unnecessary hair-pulling and issue-raising, please mind the **[important note at the end of this document](#important-note).**
<br>

#### the easy way (recommended) ####
```python
import undetected_chromedriver as uc
driver = uc.Chrome()
driver.get('https://distilnetworks.com')

# To target a specific version, set it before creating the driver
import undetected_chromedriver as uc
uc.TARGET_VERSION = 85
driver = uc.Chrome()
```
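The examples in this README set `uc.TARGET_VERSION` to a major version number. If you want to derive that number from a full version string (for example, one reported by your browser), a small helper along these lines could do the parsing. This is only a sketch: `parse_major_version` is a hypothetical name and not part of undetected_chromedriver, and the sample strings are illustrative.

```python
import re


def parse_major_version(version_string):
    """Return the major version from a string such as 'Google Chrome 85.0.4183.102'.

    Hypothetical helper for illustration; not part of undetected_chromedriver.
    """
    # Take the first dotted number in the string and keep its leading component.
    match = re.search(r"(\d+)\.\d+", version_string)
    if match is None:
        raise ValueError(f"no version number found in {version_string!r}")
    return int(match.group(1))
```

The result can then be assigned before the driver is created, e.g. `uc.TARGET_VERSION = parse_major_version('Google Chrome 85.0.4183.102')`.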
#### patches selenium module ####
This needs to be done before importing anything from the selenium package.
```python
import undetected_chromedriver as uc
uc.install()
from selenium.webdriver import Chrome
driver = Chrome()
driver.get('https://distilnetworks.com')
```
#### the customized way ####
```python
import undetected_chromedriver as uc

# specify chromedriver version to download and patch
# this did not work correctly until 1.2.1
uc.TARGET_VERSION = 78

# or specify your own chromedriver binary to patch
uc.install(
    executable_path='c:/users/user1/chromedriver.exe',
)

# import from selenium only after uc.install() has run
from selenium.webdriver import Chrome, ChromeOptions
opts = ChromeOptions()
opts.add_argument('--proxy-server=socks5://127.0.0.1:9050')
driver = Chrome(options=opts)
driver.get('https://distilnetworks.com')
```
### datadome.co ###
These guys actually have a powerful product, and they link to this repo, which makes me wanna test their product.
Make sure you use a "clean" IP for this one.
```python
# STANDARD chromedriver
from selenium import webdriver
chrome = webdriver.Chrome()
chrome.get('https://datadome.co/customers-stories/toppreise-ends-web-scraping-and-content-theft-with-datadome/')
chrome.save_screenshot('datadome_regular_webdriver.png')  # returns True
# After this detection you'll keep being nagged with puzzles, even if you use
# another machine on the same network (they use a very tight but effective
# regime, possibly a combination of fingerprinting and ip-flagging).

# UNDETECTED chromedriver (headless, even)
import undetected_chromedriver as uc
options = uc.ChromeOptions()
options.headless = True
options.add_argument('--headless')
chrome = uc.Chrome(options=options)
chrome.get('https://datadome.co/customers-stories/toppreise-ends-web-scraping-and-content-theft-with-datadome/')
chrome.save_screenshot('datadome_undetected_webdriver.png')
```
**Check both saved screenshots [here](https://imgur.com/a/fEmqadP)**
## important note ##
The default blank page on start plays a BIG role in the anti-detection workings of the module. You only become undetectable from the moment you use `driver.get(url)` to navigate to some url (and the next, and the next). This means that if you enter a url in the browser window by hand right after launch, you are NOT protected! New tabs: same story. If you really need multiple tabs, open the tab with the blank page (hint: the url is `data:,` including the comma, and yes, the driver accepts it) and do your thing as usual. If you follow these "rules" (which is actually the default behaviour), you will have a great time, for now.

TL;DR and for the visual-minded:
```python
In [1]: import undetected_chromedriver as uc
In [2]: driver = uc.Chrome()
In [3]: driver.execute_script('return navigator.webdriver')
Out[3]: True # Detectable
In [4]: driver.get('https://distilnetworks.com') # starts magic
In [5]: driver.execute_script('return navigator.webdriver')
Out[5]: None # Undetectable!
```
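The multi-tab advice in this note could be wrapped in a helper like the one below. It is a minimal sketch, assuming a regular Selenium driver object: `open_in_new_tab` and `webdriver_flag` are hypothetical names, not part of undetected_chromedriver, and only standard Selenium calls (`execute_script`, `window_handles`, `switch_to.window`, `get`) are used. Opening an empty tab first and then loading `data:,` through `driver.get()` is one interpretation of the advice, not a confirmed recipe.

```python
BLANK_PAGE = "data:,"  # the blank start page mentioned in the note, comma included


def open_in_new_tab(driver, url):
    """Open a new tab, load the blank page in it, then navigate with driver.get()."""
    known = set(driver.window_handles)
    # Open an empty tab; the target url is deliberately NOT passed to window.open().
    driver.execute_script("window.open('');")
    new_handle = next(h for h in driver.window_handles if h not in known)
    driver.switch_to.window(new_handle)
    # Start from the blank page, then navigate through driver.get() as the note requires.
    driver.get(BLANK_PAGE)
    driver.get(url)
    return new_handle


def webdriver_flag(driver):
    """Return navigator.webdriver for the current page (same check as the TL;DR)."""
    return driver.execute_script("return navigator.webdriver")
```

With a driver created via `uc.Chrome()`, calling `open_in_new_tab(driver, 'https://distilnetworks.com')` followed by `webdriver_flag(driver)` repeats the TL;DR check in the new tab.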
## end important note ##