Python 3.x: defining a user agent string when scraping with a web driver


When initializing the class, my implementation code looks like the following. Running this line raises an error:

FirefoxOptions options = new FirefoxOptions();
                         ^
SyntaxError: invalid syntax
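Please help me initialize the webdriver with a user agent. I hope this will keep my scraper from being flagged as a bot. I want to use "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0" as the user agent.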
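The SyntaxError comes from writing Java in a Python file: Python has no variable type declarations and no `new` keyword, so the equivalent line is simply `options = webdriver.FirefoxOptions()`. Below is a minimal sketch, assuming Selenium 3.8+ or 4.x with geckodriver available on PATH. It sets the Firefox preference `general.useragent.override` to the user agent from the question and runs headless so no window appears. The names `make_firefox_driver` and `FIREFOX_UA` are hypothetical helpers, not part of Selenium.

```python
# User agent requested in the question (Firefox 47 on Windows 7).
FIREFOX_UA = (
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) "
    "Gecko/20100101 Firefox/47.0"
)


def make_firefox_driver(user_agent=FIREFOX_UA, headless=True):
    """Hypothetical helper: start Firefox with an overridden user agent."""
    # Imported here so this file can be loaded without Selenium installed.
    from selenium import webdriver

    # Python equivalent of the Java line: no type declaration and no `new`.
    options = webdriver.FirefoxOptions()
    # Firefox preference that replaces the default User-Agent string.
    options.set_preference("general.useragent.override", user_agent)
    if headless:
        # Run Firefox without opening a visible browser window.
        options.add_argument("-headless")
    return webdriver.Firefox(options=options)
```

Usage: `driver = make_firefox_driver()`, then `driver.get(...)` as usual, and call `driver.quit()` when finished so the geckodriver and Firefox processes exit.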


After some searching, I found something that works for me. How can I check whether the user agent has actually been set?

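from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from bs4 import BeautifulSoup
import pandas as pd

# Copy the PhantomJS defaults so the shared class attribute is not modified,
# then override the user agent PhantomJS sends with each request.
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 "
"(KHTML, like Gecko) Chrome/15.0.87")

# webdriver.PhantomJS exists only in Selenium 3.x; Selenium 4 removed it.
driver = webdriver.PhantomJS(desired_capabilities=dcap,
                             executable_path=r"C:/PathtoExec/phantomjs.exe")
try:
    driver.get("https://www.webpagecontainingtables.com")
    soup = BeautifulSoup(driver.page_source, 'lxml')
    table = soup.find_all('table')[4]  # the fifth <table> on the page
    # read_html returns a list of DataFrames; take the one for this table.
    df = pd.read_html(str(table), header=0)[0]
    print(df)
finally:
    driver.quit()  # shut down the PhantomJS process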
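There are two ways to check the user agent. From inside the driver, `driver.execute_script("return navigator.userAgent")` returns what scripts on the page see. To check the HTTP header the server actually receives, point the driver at a page that echoes that header back. Below is a minimal sketch of such an echo server using only the standard library. `start_echo_server` and `fetch_with_user_agent` are hypothetical helper names, and `urllib` stands in for the driver so the check can run without a browser.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoUserAgentHandler(BaseHTTPRequestHandler):
    """Replies with the User-Agent header the client actually sent."""

    def do_GET(self):
        body = self.headers.get("User-Agent", "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_echo_server():
    """Serve on 127.0.0.1 in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", 0), EchoUserAgentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_with_user_agent(url, user_agent):
    """Request `url` with a custom User-Agent and return the response text."""
    # An empty ProxyHandler ignores proxy environment variables for localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with opener.open(request, timeout=5) as response:
        return response.read().decode("utf-8")
```

With the PhantomJS driver from the question, you would call `driver.get(...)` on the echo server's URL and check that the user agent string appears in `driver.page_source`. The browser will likely wrap the plain-text reply in HTML, so a substring test is safer than strict equality.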

chromedriver and geckodriver pop up a browser window, which I don't want, so I prefer to use PhantomJS.
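PhantomJS is no longer maintained and Selenium 4 dropped its PhantomJS driver, so headless Chrome may be worth considering: the `--headless` switch keeps chromedriver from opening a visible window, and `--user-agent=...` sets the user agent. Below is a sketch, assuming Selenium 3.8+ or 4.x with chromedriver available. `chrome_arguments`, `make_chrome_driver`, and `CHROME_UA` are hypothetical names, and the user agent string is the one from the PhantomJS code above.

```python
# User agent reused from the PhantomJS example in the question.
CHROME_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 "
    "(KHTML, like Gecko) Chrome/15.0.87"
)


def chrome_arguments(user_agent, headless=True):
    """Build the Chrome command-line switches (hypothetical helper)."""
    # Each switch is passed as one argv entry, so spaces need no quoting.
    args = ["--user-agent=" + user_agent]
    if headless:
        # Run Chrome without a visible window.
        args.append("--headless")
    return args


def make_chrome_driver(user_agent=CHROME_UA, headless=True):
    """Start Chrome through chromedriver with the switches above."""
    # Imported here so this file can be loaded without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(user_agent, headless):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Keeping the switch list in a plain function makes it easy to inspect or reuse with another driver, while `make_chrome_driver` only wires it into `ChromeOptions`.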