# undetected_chromedriver #
https://github.com/ultrafunkamsterdam/undetected-chromedriver
Optimized Selenium Chromedriver patch which does not trigger anti-bot services like Distil Networks / Imperva / DataDome / Botprotect.io.
Automatically downloads the driver binary and patches it.
* **Tested up to current Chrome beta versions**
* **Also works on Brave Browser and many other Chromium-based browsers**
* **Python 3.6+**
## Installation ##
```
pip install undetected-chromedriver
```
## Usage ##
To prevent unnecessary hair-pulling and issue-raising, please mind the **[important note at the end of this document](#important-note-v1-old-stuff)**.
<br>
#### The Version 2 way ####
Literally, this is all you have to do. Settings are included, and your browser executable is found automagically.
```python
import undetected_chromedriver.v2 as uc
driver = uc.Chrome()
with driver:
    driver.get('https://coinfaucet.eu')  # known url using cloudflare's "under attack mode"
```
<br>
<br>
#### the easy way (v1 old stuff) ####
```python
import undetected_chromedriver as uc
driver = uc.Chrome()
driver.get('https://distilnetworks.com')
```
#### target specific chrome version (v1 old stuff) ####
```python
import undetected_chromedriver as uc
uc.TARGET_VERSION = 85
driver = uc.Chrome()
```
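If you're not sure which major version your browser is, a small hypothetical helper (`chrome_major_version` is not part of undetected_chromedriver) can pull the major number out of a full version string, such as the one shown on `chrome://version` or printed by `google-chrome --version`. It's a sketch that assumes the usual four-part `major.minor.build.patch` format.

```python
import re


def chrome_major_version(version_string):
    """Extract the major version from e.g. 'Google Chrome 85.0.4183.102'.

    Hypothetical helper; assumes a four-part major.minor.build.patch version.
    """
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_string)
    if match is None:
        raise ValueError(f"no Chrome version found in {version_string!r}")
    # The first captured group is the major version, e.g. '85'.
    return int(match.group(1))
```

You could then set `uc.TARGET_VERSION = chrome_major_version('85.0.4183.102')`, which gives the same `85` as in the example above.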
#### monkeypatch mode (v1 old stuff) ####
This needs to be done before importing from the selenium package.
```python
import undetected_chromedriver as uc
uc.install()
from selenium.webdriver import Chrome
driver = Chrome()
driver.get('https://distilnetworks.com')
```
#### the customized way (v1 old stuff) ####
```python
import undetected_chromedriver as uc
# specify chromedriver version to download and patch
uc.TARGET_VERSION = 78
# or specify your own chromedriver binary (why you would need this, I don't know)
uc.install(
    executable_path='c:/users/user1/chromedriver.exe',
)
opts = uc.ChromeOptions()
opts.add_argument('--proxy-server=socks5://127.0.0.1:9050')
driver = uc.Chrome(options=opts)
driver.get('https://distilnetworks.com')
```
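The proxy above is passed as a plain Chrome command-line switch. If you build that switch from config or user input, a small hypothetical helper like `proxy_server_argument` (not part of undetected_chromedriver) can catch typos early. It's a sketch that assumes you only need the `http`, `https`, `socks4` and `socks5` schemes and a host with an explicit port.

```python
from urllib.parse import urlsplit

# Schemes this sketch accepts for Chrome's --proxy-server switch (assumption).
ALLOWED_SCHEMES = {"http", "https", "socks4", "socks5"}


def proxy_server_argument(proxy_url):
    """Turn e.g. 'socks5://127.0.0.1:9050' into a --proxy-server argument.

    Hypothetical helper; validates scheme, host and port before building it.
    """
    parts = urlsplit(proxy_url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme: {parts.scheme!r}")
    # .port itself raises ValueError if the port is not a valid number
    if not parts.hostname or parts.port is None:
        raise ValueError(f"proxy needs a host and a port: {proxy_url!r}")
    return f"--proxy-server={parts.scheme}://{parts.hostname}:{parts.port}"
```

With it, `opts.add_argument(proxy_server_argument('socks5://127.0.0.1:9050'))` adds the same argument as the hardcoded string in the example.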
#### datadome.co example (v1 old stuff) ####
These guys actually have a powerful product, and a link to this repo, which makes me want to test their product.
Make sure you use a "clean" IP for this one.
```python
#
# STANDARD selenium Chromedriver
#
from selenium import webdriver
chrome = webdriver.Chrome()
chrome.get('https://datadome.co/customers-stories/toppreise-ends-web-scraping-and-content-theft-with-datadome/')
chrome.save_screenshot('datadome_regular_webdriver.png')
# returns True; it caused my IP to be flagged, unfortunately
#
# UNDETECTED chromedriver (headless, even)
#
import undetected_chromedriver as uc
options = uc.ChromeOptions()
options.headless = True
options.add_argument('--headless')
chrome = uc.Chrome(options=options)
chrome.get('https://datadome.co/customers-stories/toppreise-ends-web-scraping-and-content-theft-with-datadome/')
chrome.save_screenshot('datadome_undetected_webdriver.png')
```
**Check both saved screenshots [here](https://imgur.com/a/fEmqadP)**
## important note (v1 old stuff) ##
Due to the inner workings of the module, you need to browse programmatically (i.e. using `.get(url)`). Never use the GUI to navigate: using your keyboard and mouse for navigation can cause detection! New tabs: same story. If you really need multiple tabs, open the tab with the blank page (hint: the URL is `data:,`, including the comma, and yes, the driver accepts it) and do your thing as usual. If you follow these "rules" (which is actually the default behaviour), you will have a great time, for now.
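Here's a minimal sketch of that multi-tab advice, assuming a Selenium-style driver (such as `uc.Chrome()`) that exposes `window_handles`, `execute_script`, `switch_to.window` and `get`, and assuming script-opened popups aren't blocked. The helper name `open_blank_tab` is hypothetical. It opens the tab from script instead of the GUI, switches to it, and loads `data:,` through the driver; after that you navigate with `.get(url)` as usual.

```python
def open_blank_tab(driver):
    """Open a new tab from script, switch to it and load the blank page `data:,`.

    Hypothetical helper; `driver` is assumed to be a Selenium-style WebDriver.
    """
    before = set(driver.window_handles)
    # Open the tab programmatically, never through the GUI.
    driver.execute_script("window.open('');")
    new_handles = [h for h in driver.window_handles if h not in before]
    if not new_handles:
        raise RuntimeError("no new tab appeared")
    handle = new_handles[0]
    driver.switch_to.window(handle)
    # Load the blank page through the driver itself.
    driver.get("data:,")
    return handle
```

Usage would be `open_blank_tab(driver)` followed by a regular `driver.get('https://distilnetworks.com')` in the new tab.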
TL;DR and for the visual-minded:
```python
In [1]: import undetected_chromedriver as uc
In [2]: driver = uc.Chrome()
In [3]: driver.execute_script('return navigator.webdriver')
Out[3]: True # Detectable
In [4]: driver.get('https://distilnetworks.com') # starts magic
In [5]: driver.execute_script('return navigator.webdriver')
Out[5]: None  # Undetectable!
```
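To make the check above repeatable, for example in a quick smoke test, you could wrap it in a tiny hypothetical helper (`webdriver_flag_hidden` is not part of the library). It only looks at `navigator.webdriver`, which is one signal among many that anti-bot services can use, so a `True` result here does not guarantee you won't be detected.

```python
def webdriver_flag_hidden(driver):
    """Return True unless navigator.webdriver reports automation.

    Hypothetical helper. In the session above the flag was True before the
    first .get() and None afterwards; a regular browser reports False.
    """
    value = driver.execute_script("return navigator.webdriver")
    # Only an explicit True counts as "detectable" here.
    return value is not True
```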
## end important note ##