Merge branch 'ultrafunkamsterdam:master' into master

Sohag Das Sourav 2023-07-05 16:59:27 +05:30 committed by GitHub
commit 4219281b1f
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
6 changed files with 277 additions and 33 deletions

.github/workflows/workflow.yml (new file, 51 additions)

@ -0,0 +1,51 @@
name: Python package
on:
  push:
    branches: [ "master" ]
  pull_request:
    branches: [ "master" ]
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.8", "3.9", "3.10","3.11"]
    steps:
      - uses: actions/checkout@v3
      - name: Setup Chrome
        uses: browser-actions/setup-chrome@v1.2.0
        with:
          chrome-version: stable
      - name: set chrome in path
        run: |
          echo "/opt/hostedtoolcache/chromium/stable/x64" >> $GITHUB_PATH
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v3
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install package
        run: |
          python -m pip install --upgrade pip
          if [ -f requirements.txt ]; then pip install -r requirements.txt; else pip install -U . ; fi
      - name: run example
        run: |
          python example/test_workflow.py
      - name: Upload a Build Artifact
        uses: actions/upload-artifact@v3.1.2
        with:
          # Artifact name
          name: screenshots
          # A file, directory or wildcard pattern that describes what to upload
          path: /home/runner/work/_temp/*p*
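The upload step's `path` ends in the wildcard `*p*`. A quick way to see what that catches is Python's `fnmatch`, used here only as an approximation of the glob matching that `actions/upload-artifact` performs (an assumption: the action uses its own glob library, but a single `*p*` segment should behave the same). The file names are the ones `example/test_workflow.py` writes into `/home/runner/work/_temp/`.

```python
from fnmatch import fnmatch

# File names written by example/test_workflow.py into /home/runner/work/_temp/
written = ["versioninfo.png", "helpinfo.png", "google.com.png", "report.pdf", "nowsecure.png"]

# The upload step's pattern: any file name that contains a "p"
pattern = "*p*"
uploaded = [name for name in written if fnmatch(name, pattern)]
print(uploaded)

# The pattern is broad: an unrelated file with a "p" in its name is matched as well
print(fnmatch("setup.log", pattern))
```

So the one pattern picks up both the PNG screenshots and `report.pdf`; a narrower pattern such as `*.png` would leave the PDF report out.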


@ -7,9 +7,22 @@ Optimized Selenium Chromedriver patch which does not trigger anti-bot services l
 Automatically downloads the driver binary and patches it.
 * Tested until current chrome beta versions
-* Works also on Brave Browser and many other Chromium based browsers, some tweaking
+* Works also on Brave Browser and many other Chromium-based browsers, but you need to know what you're doing and it needs some tweaking.
 * Python 3.6++**
+## Installation ##
+```
+pip install undetected-chromedriver
+```
+or, if you're feeling adventurous, install directly from GitHub:
+```
+pip install git+https://www.github.com/ultrafunkamsterdam/undetected-chromedriver@master # replace @master with @branchname for other branches
+```
+- - - - - -
 ## Message for all ##
 I will be putting limits on the issue tracker. It has been abused for too long.
@ -17,6 +30,33 @@ any good news?
 Yes, I've opened [Undetected-Discussions](https://github.com/ultrafunkamsterdam/undetected-chromedriver/discussions), which I think will help us better in the long run.
+- - - - - -
+What this is not
+---
+**THIS PACKAGE DOES NOT, and I repeat DOES NOT, hide your IP address. So when running from a datacenter (even a smaller one), chances are high that you will not pass! Also, if your home IP has a low reputation, you won't pass!**
+Running the following code from home, and from a datacenter:
+```python
+import undetected_chromedriver as uc
+driver = uc.Chrome(headless=True,use_subprocess=False)
+driver.get('https://nowsecure.nl')
+driver.save_screenshot('nowsecure.png')
+```
+<div style="display:flex;flex-direction:row">
+<img src="https://github.com/ultrafunkamsterdam/undetected-chromedriver/assets/21027969/262dad3e-33e9-4d67-b061-b30bc74ac9bc" width="720"/>
+<img src="https://github.com/ultrafunkamsterdam/undetected-chromedriver/assets/21027969/5e1d463b-3f88-496a-9a43-a39830f909da" width="720"/>
+</div>
+<!-- ![nowscure_local](https://github.com/ultrafunkamsterdam/undetected-chromedriver/assets/21027969/262dad3e-33e9-4d67-b061-b30bc74ac9bc) -->
+<!-- ![nowsecure_dc](https://github.com/ultrafunkamsterdam/undetected-chromedriver/assets/21027969/5e1d463b-3f88-496a-9a43-a39830f909da) -->
+## 3.5.0 ##
+- Selenium 4.10 caused some issues. 3.5.0 is compatible and pins selenium 4.9 or above; I can't support versions below 4.9 any longer.
+- Removed some kwargs from the constructor: service_args, service_creationflags, service_log_path.
+- Added the find_elements_recursive generator function. It is mostly a convenience function: lots of websites serve different content from different frames, which makes it hard to use find_elements.
 ## 3.4.5 ##
 - What a week. I thought I had the recent advancements in automation-detection algorithms pwned with 3.4.0, but apparently, on some OSes this caused an error when interacting with elements. I had to revert to a different method and fix bugs, and eventually I was still able to stick to the initial idea (+ fixing bugs).
 - The update to Chrome 110 caused another surprise, this time for HEADLESS users.
@ -143,11 +183,7 @@ the solution is simple:
 **newsflash: https://github.com/ultrafunkamsterdam/undetected-chromedriver/pull/255**
-## Installation ##
-```
-pip install undetected-chromedriver
-```
 ## Usage ##


@ -1,10 +1,13 @@
 import time
+import logging
+logging.basicConfig(level=10)
 from selenium.common.exceptions import WebDriverException
 from selenium.webdriver.remote.webdriver import By
 import selenium.webdriver.support.expected_conditions as EC # noqa
 from selenium.webdriver.support.wait import WebDriverWait
 import undetected_chromedriver as uc
@ -164,7 +167,8 @@ def main(args=None):
     print("lets go to UC project page")
     driver.get("https://www.github.com/ultrafunkamsterdam/undetected-chromedriver")
+    input("press a key if you have RTFM")
+    sleep(2)
     driver.quit()
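The `logging.basicConfig(level=10)` line added to the example uses the numeric level 10, which is the value of `logging.DEBUG`. A minimal equivalent using the named constant; the logger name `example` is only illustrative, and `force=True` needs Python 3.8 or newer.

```python
import logging

# The example passes level=10, which is the numeric value of logging.DEBUG
print(logging.DEBUG)  # 10

# Same configuration with the named constant; force=True replaces any handlers
# that an earlier basicConfig call may already have installed
logging.basicConfig(level=logging.DEBUG, force=True)

logger = logging.getLogger("example")
logger.debug("debug records are now emitted")
print(logger.isEnabledFor(logging.DEBUG))  # True
```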

example/test_workflow.py (new file, 121 additions)

@ -0,0 +1,121 @@
# coding: utf-8
import time
import logging
import os
from selenium.webdriver.support.wait import WebDriverWait
import selenium.webdriver.support.expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import undetected_chromedriver as uc
from pathlib import Path
logging.basicConfig(level=10)
logger = logging.getLogger('test')
def main():
    ####
    # this block is a dirty helper since
    # on the action runner devices several chrome versions exist
    # and i need to ensure it takes the one which is installed
    # by the task.
    ####
    for k,v in os.environ.items():
        logger.info("%s = %s" % (k,v))
    logger.info('==== END ENV ==== ')
    tmp = Path('/tmp').resolve()
    for item in tmp.rglob('**'):
        logger.info('found %s ' % item)
        if item.is_dir():
            if 'chrome-' in item.name:
                logger.info('adding %s to PATH' % str(item))
                logger.info('current PATH: %s' % str(os.environ.get('PATH')))
                path_list = os.environ['PATH'].split(os.pathsep)
                path_list.insert(0, str(item))
                os.environ['PATH'] = os.pathsep.join(path_list)
                logger.info('new PATH %s:' % str(os.environ.get('PATH')))
                browser_executable_path = str(item / 'chrome')
                break
    ####
    # test really starts here
    ####
    driver = uc.Chrome(headless=True, browser_executable_path=browser_executable_path)
    logging.getLogger().setLevel(10)
    driver.get('chrome://version')
    driver.save_screenshot('/home/runner/work/_temp/versioninfo.png')
    driver.get('chrome://settings/help')
    driver.save_screenshot('/home/runner/work/_temp/helpinfo.png')
    driver.get('https://www.google.com')
    driver.save_screenshot('/home/runner/work/_temp/google.com.png')
    driver.get('https://bot.incolumitas.com/#botChallenge')
    pdfdata = driver.execute_cdp_cmd('Page.printToPDF', {})
    if pdfdata:
        if 'data' in pdfdata:
            data = pdfdata['data']
            import base64
            buffer = base64.b64decode(data)
            with open('/home/runner/work/_temp/report.pdf', 'w+b') as f:
                f.write(buffer)
    driver.get('https://www.nowsecure.nl')
    logger.info('current url %s' % driver.current_url)
    try:
        WebDriverWait(driver,15).until(EC.title_contains('moment'))
    except TimeoutException:
        pass
    logger.info('current page source:\n%s' % driver.page_source)
    logger.info('current url %s' % driver.current_url)
    try:
        WebDriverWait(driver,15).until(EC.title_contains('nowSecure'))
        logger.info('PASSED CLOUDFLARE!')
    except TimeoutException:
        logger.info('timeout')
        print(driver.current_url)
    logger.info('current page source:\n%s\n' % driver.page_source)
    #logger.info('trying to save a screenshot via imgur')
    driver.save_screenshot('/home/runner/work/_temp/nowsecure.png')
    #driver.get('https://imgur.com/upload')
    #driver.find_element('css selector', 'input').send_keys('/home/runner/work/_temp/nowsecure.png')
    #time.sleep(1)
    #logger.info('current url %s' % driver.current_url)
    #time.sleep(1)
    #logger.info(f'A SCREENSHOT IS SAVED ON {driver.current_url} <<< if this ends only with /upload then it failed. after all, we are running from a datacenter no human being would ever surf the internet from ')
    #time.sleep(5)
    driver.quit()
if __name__ == "__main__":
    main()
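As written, `browser_executable_path` is only assigned inside the loop over `/tmp`, so when no directory with `chrome-` in its name exists, the `uc.Chrome(...)` call fails with an `UnboundLocalError` rather than a clear message. A stdlib-only sketch of the same lookup with an explicit "not found" branch; `find_chrome_dir` and `prepend_to_path` are hypothetical helpers, and the `chrome-linux64` directory is made up purely to simulate a runner layout.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def find_chrome_dir(root: Path) -> Optional[Path]:
    """Return the first directory below root whose name contains 'chrome-', or None."""
    # sorted() makes the pick deterministic if several Chrome builds are present
    for item in sorted(root.rglob("*")):
        if item.is_dir() and "chrome-" in item.name:
            return item
    return None


def prepend_to_path(directory: Path) -> None:
    """Put directory first on PATH so its binaries win over other installs."""
    parts = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    os.environ["PATH"] = os.pathsep.join([str(directory)] + parts)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Simulated layout (hypothetical); the real directory name on the runner may differ
    (root / "setup-chrome" / "chrome-linux64").mkdir(parents=True)

    found = find_chrome_dir(root)
    if found is None:
        raise SystemExit("no 'chrome-' directory found, not guessing a browser path")
    prepend_to_path(found)
    browser_executable_path = str(found / "chrome")

print(browser_executable_path)
```

Failing loudly when nothing is found keeps a CI log readable; the original loop instead relies on filesystem iteration order and stops at whichever match comes first.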


@ -41,7 +41,7 @@ setup(
     version=version,
     packages=["undetected_chromedriver"],
     install_requires=[
-        "selenium>=4.0.0",
+        "selenium>=4.9.0",
         "requests",
         "websockets",
     ],
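The install requirement moves from `selenium>=4.0.0` to `selenium>=4.9.0`. When checking a version at runtime, comparing the strings directly goes wrong once a component reaches two digits (`"4.10.0" < "4.9.0"` is true as strings). A stdlib-only sketch that compares numeric tuples instead; `version_tuple` and `meets_minimum` are hypothetical names, and `packaging.version.Version` is the more complete tool when that package is available.

```python
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """'4.10.0' -> (4, 10, 0); stops at the first non-numeric piece such as '0rc1'."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "4.9.0") -> bool:
    a, b = version_tuple(installed), version_tuple(minimum)
    # Pad with zeros so "4.9" and "4.9.0" compare as equal
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    # Tuple comparison is numeric per component, so 4.10 sorts after 4.9
    return a >= b


print("4.10.0" < "4.9.0")       # True: plain string comparison gets this wrong
print(meets_minimum("4.10.0"))  # True
print(meets_minimum("4.9"))     # True
print(meets_minimum("4.8.3"))   # False

try:
    print("installed selenium:", metadata.version("selenium"))
except metadata.PackageNotFoundError:
    print("selenium is not installed in this environment")
```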


@ -17,7 +17,7 @@ by UltrafunkAmsterdam (https://github.com/ultrafunkamsterdam)
 from __future__ import annotations
-__version__ = "3.4.7"
+__version__ = "3.5.0"
 import json
 import logging
@ -33,7 +33,7 @@ from weakref import finalize
 import selenium.webdriver.chrome.service
 import selenium.webdriver.chrome.webdriver
 from selenium.webdriver.common.by import By
-import selenium.webdriver.common.service
+import selenium.webdriver.chromium.service
 import selenium.webdriver.remote.command
 import selenium.webdriver.remote.webdriver
@ -109,11 +109,11 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
         browser_executable_path=None,
         port=0,
         enable_cdp_events=False,
-        service_args=None,
-        service_creationflags=None,
+        # service_args=None,
+        # service_creationflags=None,
         desired_capabilities=None,
         advanced_elements=False,
-        service_log_path=None,
+        # service_log_path=None,
         keep_alive=True,
         log_level=0,
         headless=False,
@ -383,9 +383,15 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
         options.arguments.extend(["--no-sandbox", "--test-type"])
         if headless or options.headless:
-            if self.patcher.version_main < 108:
-                options.add_argument("--headless=chrome")
-            elif self.patcher.version_main >= 108:
+            #workaround until a better checking is found
+            try:
+                if self.patcher.version_main < 108:
+                    options.add_argument("--headless=chrome")
+                elif self.patcher.version_main >= 108:
+                    options.add_argument("--headless=new")
+            except:
+                logger.warning("could not detect version_main."
+                               "therefore, we are assuming it is chrome 108 or higher")
                 options.add_argument("--headless=new")
         options.add_argument("--window-size=1920,1080")
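The new headless branch picks `--headless=chrome` for Chrome major versions below 108 and `--headless=new` otherwise, falling back to `--headless=new` when `version_main` can't be determined. The same decision as a standalone function; `headless_argument` is a hypothetical name, and the sketch assumes an undetected version shows up as `None`, which makes the `<` comparison raise `TypeError`.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def headless_argument(version_main: Optional[int]) -> str:
    """Choose the headless flag for a Chrome major version (None = unknown)."""
    try:
        # Chrome builds before 108 get the old headless mode
        if version_main < 108:
            return "--headless=chrome"
        return "--headless=new"
    except TypeError:
        # None < 108 raises TypeError: treat an unknown version as 108 or higher
        logger.warning("could not detect version_main; assuming chrome 108 or higher")
        return "--headless=new"


for version in (107, 108, 114, None):
    print(version, headless_argument(version))
```

The commit uses a bare `except:`, which also swallows other failures such as a missing `version_main` attribute; catching only `TypeError` here is a deliberate narrowing for the sketch, not what the commit does.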
@ -439,26 +445,15 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
         )
         self.browser_pid = browser.pid
-        if service_creationflags:
-            service = selenium.webdriver.common.service.Service(
-                self.patcher.executable_path, port, service_args, service_log_path
-            )
-            for attr_name in ("creationflags", "creation_flags"):
-                if hasattr(service, attr_name):
-                    setattr(service, attr_name, service_creationflags)
-                    break
-        else:
-            service = None
+        service = selenium.webdriver.chromium.service.ChromiumService(
+            self.patcher.executable_path
+        )
         super(Chrome, self).__init__(
-            executable_path=self.patcher.executable_path,
-            port=port,
+            service=service,
             options=options,
-            service_args=service_args,
-            desired_capabilities=desired_capabilities,
-            service_log_path=service_log_path,
             keep_alive=keep_alive,
-            service=service, # needed or the service will be re-created
         )
         self.reactor = None
@ -714,10 +709,45 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
         if not capabilities:
             capabilities = self.options.to_capabilities()
         super(selenium.webdriver.chrome.webdriver.WebDriver, self).start_session(
-            capabilities, browser_profile
+            capabilities
         )
         # super(Chrome, self).start_session(capabilities, browser_profile)
+    def find_elements_recursive(self, by, value):
+        """
+        find elements in all frames
+        this is a generator function, which is needed
+        since if it would return a list of elements, they
+        will be stale on arrival.
+        using generator, when the element is returned we are in the correct frame
+        to use it directly
+        Args:
+            by: By
+            value: str
+        Returns: Generator[webelement.WebElement]
+        """
+        def search_frame(f=None):
+            if not f:
+                # ensure we are on main content frame
+                self.switch_to.default_content()
+            else:
+                self.switch_to.frame(f)
+            for elem in self.find_elements(by, value):
+                yield elem
+            # switch back to main content, otherwise we will get StaleElementReferenceException
+            self.switch_to.default_content()
+        # search root frame
+        for elem in search_frame():
+            yield elem
+        # get iframes
+        frames = self.find_elements('css selector', 'iframe')
+        # search per frame
+        for f in frames:
+            for elem in search_frame(f):
+                yield elem
     def quit(self):
         try:
             self.service.process.kill()
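`find_elements_recursive` is a generator so that each element is handed out while the driver is still switched into the frame that contains it; a returned list would hold elements from frames the driver has already left. The behaviour can be seen without a browser using a hypothetical `FakeDriver` that only tracks the active frame. It is not Selenium's API, just enough surface to run the method body from the hunk above.

```python
from typing import Dict, Iterator, List, Optional


class FakeSwitchTo:
    """Stand-in for driver.switch_to: only records which frame is active."""

    def __init__(self, driver: "FakeDriver") -> None:
        self._driver = driver

    def default_content(self) -> None:
        self._driver.current_frame = None

    def frame(self, name: str) -> None:
        self._driver.current_frame = name


class FakeDriver:
    """Hypothetical driver: maps a frame name (None = top document) to element labels."""

    def __init__(self, content: Dict[Optional[str], List[str]]) -> None:
        self.content = content
        self.current_frame: Optional[str] = None
        self.switch_to = FakeSwitchTo(self)

    def find_elements(self, by: str, value: str) -> List[str]:
        if value == "iframe":
            # In this sketch only the top document contains iframes
            if self.current_frame is None:
                return [name for name in self.content if name is not None]
            return []
        # Only elements of the currently active frame are visible
        return list(self.content.get(self.current_frame, []))

    def find_elements_recursive(self, by: str, value: str) -> Iterator[str]:
        # Same structure as the method added in the hunk above
        def search_frame(f=None):
            if not f:
                self.switch_to.default_content()
            else:
                self.switch_to.frame(f)
            for elem in self.find_elements(by, value):
                yield elem
            self.switch_to.default_content()

        for elem in search_frame():
            yield elem
        for f in self.find_elements("css selector", "iframe"):
            for elem in search_frame(f):
                yield elem


driver = FakeDriver({None: ["top-link"], "ads": ["ad-link"], "login": ["login-link"]})
seen = []
for elem in driver.find_elements_recursive("css selector", "a"):
    # Record which frame is active at the moment each element is handed out
    seen.append((elem, driver.current_frame))
print(seen)
# [('top-link', None), ('ad-link', 'ads'), ('login-link', 'login')]
```

Because the frame switch happens lazily, each element should be used inside the loop body; wrapping the call in `list(...)` would bring back the problem the docstring describes. Only iframes found in the top-level document are searched, so elements inside nested iframes are not reached. Against a real driver the call would look like `for elem in driver.find_elements_recursive(By.CSS_SELECTOR, "a"): print(elem.text)`.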
@ -733,7 +763,7 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
             os.kill(self.browser_pid, 15)
             logger.debug("gracefully closed browser")
         except Exception as e: # noqa
-            logger.debug(e, exc_info=True)
+            pass
         if (
             hasattr(self, "keep_user_data_dir")
             and hasattr(self, "user_data_dir")
@ -852,5 +882,7 @@ def find_chrome_executable():
                 ):
                     candidates.add(os.sep.join((item, subitem, "chrome.exe")))
     for candidate in candidates:
+        logger.debug('checking if %s exists and is executable' % candidate)
         if os.path.exists(candidate) and os.access(candidate, os.X_OK):
+            logger.debug('found! using %s' % candidate)
             return os.path.normpath(candidate)
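The two new debug lines make the executable search traceable. A standalone sketch of the same check, run against a throwaway directory; `first_executable` is a hypothetical name, and the example assumes a POSIX system where execute permission comes from the file mode bits.

```python
import logging
import os
import stat
import tempfile
from typing import Iterable, Optional

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("chrome-lookup")


def first_executable(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate that exists and is executable, normalised; else None."""
    for candidate in candidates:
        # Lazy %-style arguments: the message is only formatted if DEBUG is enabled
        logger.debug("checking if %s exists and is executable", candidate)
        if os.path.exists(candidate) and os.access(candidate, os.X_OK):
            logger.debug("found! using %s", candidate)
            return os.path.normpath(candidate)
    return None


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "bin"))
    chrome = os.path.join(tmp, "chrome")
    with open(chrome, "w") as fh:
        fh.write("#!/bin/sh\n")
    os.chmod(chrome, stat.S_IRWXU)  # owner may read, write and execute

    plain = os.path.join(tmp, "plain-file")
    open(plain, "w").close()  # exists, but has no execute bit

    candidates = [
        os.path.join(tmp, "missing", "chrome"),    # does not exist
        plain,                                     # exists, not executable
        os.path.join(tmp, "bin", "..", "chrome"),  # executable, not normalised
    ]
    result = first_executable(candidates)

print(result == chrome)
```

Passing `candidate` as a logging argument, instead of pre-formatting with `%` as the commit does, defers the string formatting until a handler actually emits the record.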