How to scrape YouTube comments with Selenium in Python?

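I'm trying to scrape YouTube comments so that each row contains the video title, the comment author and the comment itself. As the code below shows, I manage to open the driver and get past the sign-in and cookie consent pop-ups, and the page scrolls far enough to load the first comments. After that, however, I still can't get the comment text via XPath: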

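import csv
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

csv_file = open('funda_youtube_comments.csv', 'w', encoding="UTF-8", newline="")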
writer = csv.writer(csv_file)

writer.writerow(['title', 'comment', 'author'])

PATH = r"C:\Users\veiza\OneDrive\Desktop\AUAS\University\Quarter 2\Online Data Mining\Project1test\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.implicitly_wait(10)
driver.get("https://www.youtube.com/watch?v=VWQaP9txG6M&t=76s")
driver.maximize_window()
time.sleep(2)
driver.execute_script('window.scrollTo(0,700);')
wait = WebDriverWait(driver, 20)
wait.until(EC.presence_of_element_located((By.XPATH, "//div[@id='dismiss-button']"))).click()
time.sleep(2)
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[src^='https://consent.google.com']")))
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"//div[@id='introAgreeButton']"))).click()
time.sleep(2)
title = driver.title
print(title)
time.sleep(5)

totalcomments= len(driver.find_elements_by_xpath("""//*[@id="content-text"]"""))
if totalcomments < 50:
    index = totalcomments
else:
    index = 50

youtube_dict ={}

ccount = 0
while ccount < index:
    try:
        comment = driver.find_elements_by_xpath('//*[@id="content-text"]')[ccount].text
    except:
        comment = ""
    try:
        authors = driver.find_elements_by_xpath('//a[@id="author-text"]/span')[ccount].text
    except:
        authors = ""
    try:
        title = title
    except:
        title = ""

    youtube_dict['comment'] = comment
    youtube_dict['author'] = authors
    youtube_dict['video title'] = title

    writer.writerow(youtube_dict.values())
    ccount = ccount + 1

print(youtube_dict)
driver.close()
csv_file.close()

What am I doing wrong?
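Independently of the missing comment text, the code above writes the header as title, comment, author, but every row comes from youtube_dict.values(), whose insertion order is comment, author, video title, so the columns do not line up with the header. A minimal sketch using csv.DictWriter keeps each value under its own header. The helper name write_comment_rows is hypothetical, and pairing authors with comments via zip assumes the two scraped lists are parallel.

```python
import csv
import io


def write_comment_rows(out, title, authors, comments):
    """Write a header plus one CSV row per comment, in header order.

    Hypothetical helper: `authors` and `comments` are assumed to be
    parallel lists of strings already scraped from the page.
    """
    writer = csv.DictWriter(out, fieldnames=['title', 'comment', 'author'])
    writer.writeheader()
    # zip() pairs the i-th author with the i-th comment and stops at the shorter list.
    for author, comment in zip(authors, comments):
        writer.writerow({'title': title, 'comment': comment, 'author': author})


# Usage with an in-memory buffer; for a real file, pass
# open('funda_youtube_comments.csv', 'w', encoding='UTF-8', newline='') instead.
buffer = io.StringIO()
write_comment_rows(buffer, "Some video", ["alice", "bob"], ["Great video!", "Thanks"])
print(buffer.getvalue())
```

Because DictWriter looks values up by key, reordering the dict or the header can no longer shift data into the wrong column.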

If you want to keep it simple, you can use tube_dl:

pip install tube_dl
This module has a Comments class that helps you work with comments. Here is a simple usage example:

from tube_dl.comments import Comments
comments = Comments('yt url').process_comments() 
# If you only need a limited number of comments, you can specify it, e.g. process_comments(count=45)


Feel free to open issues at github.com/shekharchander/tube_dl. I'll be happy to fix them.

I was able to get the comments from YouTube. You can see my solution below:

import time

from selenium import webdriver
from selenium.common import exceptions
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By


# Assumed context: the snippet uses `response.url` and `yield`, so it is
# written here as the parse() callback of a Scrapy spider.
def parse(self, response):
    options = Options()
    options.add_argument("--headless")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    PATH = r"C:\Users\veiza\OneDrive\Desktop\AUAS\University\Quarter 2\Online Data " \
           r"Mining\Project1test\chromedriver.exe"
    driver = webdriver.Chrome(service=Service(PATH), options=options)
    driver.get(response.url)
    time.sleep(5)

    try:
        title = driver.find_element(By.XPATH, '//*[@id="container"]/h1/yt-formatted-string').text
        comment_section = driver.find_element(By.XPATH, '//*[@id="comments"]')
    except exceptions.NoSuchElementException:
        error = "Error: Double check selector OR "
        error += "element may not yet be on the screen at the time of the find operation"
        print(error)
        driver.quit()
        return

    # Bring the comment section into view so YouTube starts loading comments.
    driver.execute_script("arguments[0].scrollIntoView();", comment_section)
    time.sleep(7)

    last_height = driver.execute_script("return document.documentElement.scrollHeight")

    # Keep scrolling to the bottom until the page height stops growing,
    # i.e. no more comments are being lazy-loaded.
    while True:
        driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
        time.sleep(2)
        new_height = driver.execute_script("return document.documentElement.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height

    # find_elements() returns an empty list rather than raising, so no try/except is needed.
    accounts = [elem.text for elem in driver.find_elements(By.XPATH, '//*[@id="author-text"]')]
    comments = [elem.text for elem in driver.find_elements(By.XPATH, '//*[@id="content-text"]')]
    url = driver.current_url
    driver.quit()

    # zip() stops at the shorter list, so a missing author cannot raise IndexError.
    for account, comment in zip(accounts, comments):
        yield {
            'title': title,
            'url': url,
            'account': account,
            'comment': comment,
        }
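The scrolling loop in this answer stops once document.documentElement.scrollHeight no longer grows between two scrolls. Below is a sketch of the same stopping rule factored into a helper whose name and parameters are hypothetical: the scroll and height-reading steps are passed in as callables (with Selenium they would wrap driver.execute_script), and the max_rounds cap is an addition so that a page that keeps loading cannot loop forever. Because nothing in it touches a browser, the stopping logic can be checked on its own.

```python
import time
from typing import Callable


def scroll_until_stable(
    scroll: Callable[[], None],
    get_height: Callable[[], int],
    pause: float = 2.0,
    max_rounds: int = 100,
) -> int:
    """Scroll repeatedly until the page height stops changing.

    Hypothetical helper mirroring the answer's while-loop.
    Returns the number of scrolls performed (at most max_rounds).
    """
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll()
        time.sleep(pause)  # give lazy-loaded comments time to render
        new_height = get_height()
        if new_height == last_height:
            return rounds
        last_height = new_height
    return max_rounds


# With Selenium this would be called roughly as:
# scroll_until_stable(
#     lambda: driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);"),
#     lambda: driver.execute_script("return document.documentElement.scrollHeight"),
# )
```

Passing the browser actions in as callables keeps the termination rule separate from Selenium, so it can be exercised with fake heights.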

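>> I still can't get the comment text via XPath. What does that mean? Are you getting an exception? Are you getting empty values?
I'm getting empty values.
@missioned When something is hard to do with Selenium, that usually means it isn't allowed. If you really want to do this, use their API; that is usually much faster than insisting on Selenium as your hammer.
@ConradB Actually, I managed to scrape the YouTube comments. You'll find the solution below.
Thanks for taking the time to paste the solution and write readable code; this answer deserves 10 points. Well-organized example code.
@ConradB Haha, thank you.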
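Following up on the suggestion to use YouTube's API: the YouTube Data API v3 lists top-level comments through its commentThreads endpoint. The sketch below is minimal and assumes you have an API key (the "API_KEY" string is a placeholder) and that the response has the documented commentThreads.list shape, with items[].snippet.topLevelComment.snippet.authorDisplayName and textDisplay plus a top-level nextPageToken for paging. It only builds the request URL and parses a decoded response; the HTTP call is left out, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def build_comment_threads_url(video_id, api_key, page_token=None, max_results=100):
    """Build a commentThreads.list URL for one page of top-level comments."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": max_results,  # the API accepts 1-100 per page
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return API_URL + "?" + urlencode(params)


def extract_comments(response):
    """Return (author, comment) pairs and the next page token from a decoded JSON response."""
    pairs = []
    for item in response.get("items", []):
        snippet = item["snippet"]["topLevelComment"]["snippet"]
        pairs.append((snippet["authorDisplayName"], snippet["textDisplay"]))
    return pairs, response.get("nextPageToken")


print(build_comment_threads_url("VWQaP9txG6M", "API_KEY"))
```

To fetch everything, you would request a URL, decode the JSON (for example with urllib.request.urlopen and json.load), and repeat with the returned page token until it is None. The video title is not part of this response and would need a separate request.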