How can I scrape data with Selenium in Python without being detected as a bot?

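I'm new to Selenium and confused about why this isn't working. I'm trying to log in to the site first, because an account is required to view its articles. I think I have that part working. However, when I then try to view the article, the site tells me I can't view it because I'm a bot.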

My current code is:

from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

CHROMEDRIVER_PATH = './chromedriver'

chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("start-maximized")
chrome_options.add_argument("--disable-blink-features")
chrome_options.add_argument("--disable-blink-features=AutomationControlled")

LOGIN_PAGE = "https://www.seekingalpha.com/login"
ACCOUNT = "ACCOUNT"
PASSWORD = "PASSWORD"

# Selenium 4 takes the driver path through Service and the options through `options=`
driver = webdriver.Chrome(service=Service(CHROMEDRIVER_PATH), options=chrome_options)
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})

wait = WebDriverWait(driver, 30)
driver.get(LOGIN_PAGE)
wait.until(EC.element_to_be_clickable((By.NAME, "email"))).send_keys(ACCOUNT)
wait.until(EC.element_to_be_clickable((By.ID, "signInPasswordField"))).send_keys(PASSWORD)
wait.until(EC.element_to_be_clickable((By.XPATH, "//button[text()='Sign in']"))).click()

driver.get("https://seekingalpha.com/article/4414043-agenus-inc-agen-ceo-garo-armen-on-q4-2020-results-earnings-call-transcript")
# find_elements_by_xpath was removed in Selenium 4; use find_elements(By.XPATH, ...)
text_elements = driver.find_elements(By.XPATH, '//*')

for element in text_elements:
    print(element.text)

I get:

Is this happening to you frequently? Please report it on our feedback forum.
If you have an ad-blocker enabled you may be blocked from proceeding. Please disable your ad-blocker and refresh.
Reference ID: cbbe4cb0-b4c7-11eb-87a2-97a8b0029776
To continue, please prove you are not a robot
...

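I'm not sure this is legal. If you have a legitimate reason to use Selenium on that site, consult the site's administrator.

There can be many reasons why you keep getting flagged as a bot. I can't give you a definitive answer, because every site can implement its own detection techniques. Scraping can come down to trial-and-error investigation. Still, here are some hints about what might be going wrong: you are running the script from a server known to belong to a data center or similar; your IP address was flagged while you were first developing your script and is still blacklisted; the way you navigate the site (delays, mouse movements, and so on) is not representative of a human; etc.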
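As a small aid for that trial-and-error investigation, here is a minimal sketch of two helpers: one that recognizes the block page quoted in the question so the script can stop instead of printing it, and one that spaces out page loads. The names `BLOCK_MARKERS`, `looks_blocked`, and `pause_between_pages` are hypothetical, the marker phrases are taken only from the output shown above, and the delay bounds are arbitrary placeholders. Nothing here is known to satisfy this site's checks, and it does not replace asking the site for permission.

```python
import random
import time

# Phrases copied from the block page shown in the question.
# Assumption: other sites (or a later version of this one) will word it differently.
BLOCK_MARKERS = (
    "prove you are not a robot",
    "you may be blocked from proceeding",
)


def looks_blocked(page_text):
    """Return True if the page text contains one of the known block-page phrases."""
    lowered = page_text.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


def pause_between_pages(min_seconds=2.0, max_seconds=6.0, sleep=time.sleep):
    """Sleep for a random interval so page loads are not fired back to back.

    The default bounds are placeholder values, not tuned for any site.
    Returns the delay that was used so the caller can log it.
    """
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

With the script from the question, one possible use is to call `pause_between_pages()` before each `driver.get(...)`, then pass `driver.find_element(By.TAG_NAME, "body").text` to `looks_blocked` and stop the run when it returns `True`, rather than retrying against the challenge page.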