How can I scrape YouTube comments with Selenium in Python?
Tags: python, selenium

I am trying to scrape YouTube comments so that each row contains the video title, the comment author, and the comment itself. As the code below shows, I manage to open the driver, dismiss the sign-in and cookie prompts, and scroll far enough to load the first comments. After that, I still cannot get the comment text via XPath, as shown below.
import csv
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

csv_file = open('funda_youtube_comments.csv', 'w', encoding="UTF-8", newline="")
writer = csv.writer(csv_file)
writer.writerow(['title', 'comment', 'author'])
PATH = r"C:\Users\veiza\OneDrive\Desktop\AUAS\University\Quarter 2\Online Data Mining\Project1test\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.implicitly_wait(10)
driver.get("https://www.youtube.com/watch?v=VWQaP9txG6M&t=76s")
driver.maximize_window()
time.sleep(2)
driver.execute_script('window.scrollTo(0,700);')
# Dismiss the sign-in prompt
wait = WebDriverWait(driver, 20)
wait.until(EC.presence_of_element_located((By.XPATH, "//div[@id='dismiss-button']"))).click()
time.sleep(2)
# Accept the cookie consent dialog, which lives inside an iframe
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[src^='https://consent.google.com']")))
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"//div[@id='introAgreeButton']"))).click()
time.sleep(2)
title = driver.title
print(title)
time.sleep(5)
# Read at most the first 50 loaded comments
totalcomments= len(driver.find_elements_by_xpath("""//*[@id="content-text"]"""))
if totalcomments < 50:
    index = totalcomments
else:
    index = 50
youtube_dict ={}
ccount = 0
while ccount < index:
    try:
        comment = driver.find_elements_by_xpath('//*[@id="content-text"]')[ccount].text
    except Exception:
        comment = ""
    try:
        authors = driver.find_elements_by_xpath('//a[@id="author-text"]/span')[ccount].text
    except Exception:
        authors = ""
    # Fill the dict in the same order as the header row: title, comment, author
    youtube_dict['video title'] = title
    youtube_dict['comment'] = comment
    youtube_dict['author'] = authors
    writer.writerow(youtube_dict.values())
    ccount = ccount + 1
    print(youtube_dict)
driver.close()
csv_file.close()
What am I doing wrong?

If you want to keep it simple, you can use tube_dl:
pip install tube_dl
This module has a Comments class that helps you work with comments.
Here is a simple usage example:
from tube_dl.comments import Comments
comments = Comments('yt url').process_comments()
# If you only need a limited number of comments, you can specify that. Example: process_comments(count=45)
Feel free to open issues at github.com/shekharchander/tube_dl. I will be happy to resolve them.

I was able to get the comments from YouTube. You can see the solution below.
import time

from selenium import webdriver
from selenium.common import exceptions
from selenium.webdriver.chrome.options import Options

# Note: `response.url` and `yield` mean this fragment runs inside a generator
# callback that receives a `response` object; the enclosing function is not
# shown in the original answer.
options = Options()
options.add_argument("--headless")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
PATH = r"C:\Users\veiza\OneDrive\Desktop\AUAS\University\Quarter 2\Online Data " \
       r"Mining\Project1test\chromedriver.exe "
driver = webdriver.Chrome(executable_path=PATH, options=options)
driver.get(response.url)
time.sleep(5)
try:
    title = driver.find_element_by_xpath('//*[@id="container"]/h1/yt-formatted-string').text
    comment_section = driver.find_element_by_xpath('//*[@id="comments"]')
except exceptions.NoSuchElementException:
    error = "Error: Double check selector OR "
    error += "element may not yet be on the screen at the time of the find operation"
    print(error)
# Bring the comment section into view so YouTube starts loading comments
driver.execute_script("arguments[0].scrollIntoView();", comment_section)
time.sleep(7)
# Keep scrolling to the bottom until the page height stops growing
last_height = driver.execute_script("return document.documentElement.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
    time.sleep(2)
    new_height = driver.execute_script("return document.documentElement.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
try:
    accounts_elems = driver.find_elements_by_xpath('//*[@id="author-text"]')
    comment_elems = driver.find_elements_by_xpath('//*[@id="content-text"]')
except exceptions.NoSuchElementException:
    error = "Error: Double check selector OR "
    error += "element may not yet be on the screen at the time of the find operation"
    print(error)
accounts = [elem.text for elem in accounts_elems]
comments = [elem.text for elem in comment_elems]
# Emit one record per comment, pairing author and text by position
for comment_index in range(len(comment_elems)):
    yield {
        'title': title,
        'url': driver.current_url,
        'account': accounts[comment_index],
        'comment': comments[comment_index]
    }
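
The scroll-until-the-height-stops-changing loop above can be pulled out into a small helper, which makes it easier to reuse and to cap. The following is only a sketch, not part of the original answer: the names scroll_until_stable and pair_comments, and the pause and max_rounds parameters, are hypothetical. It assumes nothing about the driver beyond the execute_script method already used above. Pairing authors and texts with zip stops at the shorter list, so a mismatch in the two element counts does not raise an IndexError.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=50):
    """Scroll to the bottom until the page height stops growing.

    `pause` and `max_rounds` are illustrative defaults (assumptions).
    Returns the number of scroll rounds performed.
    """
    get_height = "return document.documentElement.scrollHeight"
    scroll_down = "window.scrollTo(0, document.documentElement.scrollHeight);"
    last_height = driver.execute_script(get_height)
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script(scroll_down)
        time.sleep(pause)  # give lazily loaded comments time to render
        rounds += 1
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # nothing new was loaded, so we reached the end
        last_height = new_height
    return rounds


def pair_comments(title, url, accounts, comments):
    """Build one dict per comment, pairing authors and texts by position."""
    return [
        {"title": title, "url": url, "account": account, "comment": comment}
        for account, comment in zip(accounts, comments)
    ]
```

With a real driver, one would call scroll_until_stable(driver) after scrolling the comment section into view, then pass the extracted author and comment texts to pair_comments. The max_rounds cap is there so that a video with a very long comment thread cannot keep the loop running indefinitely.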
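
Comments:
"I still cannot get the comment text via XPath." What does that mean? Do you get an exception? Do you get empty values?
I get empty values @missioned
When something is hard to do with Selenium, it usually means it is not allowed. If you really want the data, using their API is usually much faster than insisting on Selenium as your hammer.
@ConradB Actually, I did manage to collect the YouTube comments. You will find my solution below.
For taking the time to paste the solution and write readable code, this answer deserves 10 points. Well-organized example code.
@ConradB Haha, thank you.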