How to scrape YouTube comments using Selenium in Python?


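I'm trying to scrape YouTube comments so that each row contains the video title, the comment author, and the comment itself. As the code below shows, I successfully open the driver and dismiss the sign-in and cookie prompts, then scroll far enough to load the first comments. After that, I still can't get the comment text via XPath, as shown below.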

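import csv
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

csv_file = open('funda_youtube_comments.csv', 'w', encoding="UTF-8", newline="")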
writer = csv.writer(csv_file)

writer.writerow(['title', 'comment', 'author'])

PATH = r"C:\Users\veiza\OneDrive\Desktop\AUAS\University\Quarter 2\Online Data Mining\Project1test\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.implicitly_wait(10)
driver.get("https://www.youtube.com/watch?v=VWQaP9txG6M&t=76s")
driver.maximize_window()
time.sleep(2)
driver.execute_script('window.scrollTo(0,700);')
wait = WebDriverWait(driver, 20)
wait.until(EC.presence_of_element_located((By.XPATH, "//div[@id='dismiss-button']"))).click()
time.sleep(2)
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[src^='https://consent.google.com']")))
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"//div[@id='introAgreeButton']"))).click()
time.sleep(2)
title = driver.title
print(title)
time.sleep(5)

totalcomments= len(driver.find_elements_by_xpath("""//*[@id="content-text"]"""))
if totalcomments < 50:
    index = totalcomments
else:
    index = 50

youtube_dict ={}

ccount = 0
while ccount < index:
    try:
        comment = driver.find_elements_by_xpath('//*[@id="content-text"]')[ccount].text
    except:
        comment = ""
    try:
        authors = driver.find_elements_by_xpath('//a[@id="author-text"]/span')[ccount].text
    except:
        authors = ""
    try:
        title = title
    except:
        title = ""

    youtube_dict['comment'] = comment
    youtube_dict['author'] = authors
    youtube_dict['video title'] = title

    writer.writerow(youtube_dict.values())
    ccount = ccount + 1

print(youtube_dict)
driver.close()

What am I doing wrong?
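
Separately from the XPath problem, one thing stands out in the code above: the header row is written as title, comment, author, while each data row is written in the order comment, author, video title, so the columns would end up mislabeled. Below is a minimal sketch of one way to avoid that, using `csv.DictWriter` so the column order comes from the field names rather than from dict insertion order. The helper name `write_rows`, the sample row, and the in-memory buffer are made up for illustration.

```python
import csv
import io

# Column order is defined once and used for both the header and the rows.
FIELDNAMES = ["title", "author", "comment"]


def write_rows(handle, rows):
    """Write dict rows to an open text handle as CSV, in FIELDNAMES order."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Made-up sample row; its keys are deliberately in a different order.
sample_rows = [{"comment": "Nice video", "author": "alice", "title": "Demo"}]
buffer = io.StringIO()
write_rows(buffer, sample_rows)
csv_text = buffer.getvalue()
```

With a real file, the handle would be opened with `newline=""`, as the original code already does.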

If you want to keep it simple, you can use tube_dl:

pip install tube_dl

This module has a Comments class that helps you work with comments. Here is a simple usage example:

from tube_dl.comments import Comments
comments = Comments('yt url').process_comments()

# If you only need a limited number of comments, you can specify a count. Example: process_comments(count=45)

Feel free to raise issues at github.com/shekharchander/tube_dl. I'd be happy to resolve them.

I was able to get the comments from YouTube. You can see the solution below.

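        # Assumed module-level imports for this method body:
        #   import time
        #   from selenium import webdriver
        #   from selenium.common import exceptions
        #   from selenium.webdriver.chrome.options import Options
        options = Options()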
        options.add_argument("--headless")
        options.add_experimental_option("excludeSwitches", ["enable-automation"])
        options.add_experimental_option('useAutomationExtension', False)
        PATH = r"C:\Users\veiza\OneDrive\Desktop\AUAS\University\Quarter 2\Online Data " \
               r"Mining\Project1test\chromedriver.exe"
        driver = webdriver.Chrome(executable_path=PATH, options=options)
        driver.get(response.url)
        time.sleep(5)

        try:
            title = driver.find_element_by_xpath('//*[@id="container"]/h1/yt-formatted-string').text
            comment_section = driver.find_element_by_xpath('//*[@id="comments"]')
        except exceptions.NoSuchElementException:
            error = "Error: Double check selector OR "
            error += "element may not yet be on the screen at the time of the find operation"
            print(error)
            return  # title and comment_section are undefined past this point

        driver.execute_script("arguments[0].scrollIntoView();", comment_section)
        time.sleep(7)

        last_height = driver.execute_script("return document.documentElement.scrollHeight")

        while True:
            driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
            time.sleep(2)
            new_height = driver.execute_script("return document.documentElement.scrollHeight")
            if new_height == last_height:
                break
            last_height = new_height

        driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")

        try:
            accounts_elems = driver.find_elements_by_xpath('//*[@id="author-text"]')
            comment_elems = driver.find_elements_by_xpath('//*[@id="content-text"]')
        except exceptions.NoSuchElementException:
            error = "Error: Double check selector OR "
            error += "element may not yet be on the screen at the time of the find operation"
            print(error)

        accounts = [elem.text for elem in accounts_elems]
        comments = [elem.text for elem in comment_elems]

        for comment_index in range(len(comment_elems)):
            yield {
                'title': title,
                'url': driver.current_url,
                'account': accounts[comment_index],
                'comment': comments[comment_index]
            }
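
The scroll-until-the-page-stops-growing loop and the index-based pairing above can be sketched as two small helpers. This is a hedged sketch, not the answerer's code: the names `scroll_until_stable` and `collect_rows` and the `max_rounds` cap are assumptions added for illustration. It uses the `find_elements(by, value)` form, because the `find_elements_by_xpath` helpers were removed in Selenium 4. The locator strategy is passed as the plain string `"xpath"`, which is the value of `By.XPATH`. The XPath selectors are the ones from the answer and may no longer match YouTube's current markup.

```python
import time

AUTHOR_XPATH = '//*[@id="author-text"]'
COMMENT_XPATH = '//*[@id="content-text"]'
HEIGHT_JS = "return document.documentElement.scrollHeight"
SCROLL_JS = "window.scrollTo(0, document.documentElement.scrollHeight);"


def scroll_until_stable(driver, pause=2.0, max_rounds=50):
    """Scroll to the bottom until the page height stops changing.

    max_rounds is an added safety cap (an assumption, not in the original)
    so the loop cannot run forever on a page that keeps growing.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script(HEIGHT_JS)
    for rounds in range(1, max_rounds + 1):
        driver.execute_script(SCROLL_JS)
        time.sleep(pause)  # give lazily loaded comments time to render
        new_height = driver.execute_script(HEIGHT_JS)
        if new_height == last_height:
            return rounds
        last_height = new_height
    return max_rounds


def collect_rows(driver, title):
    """Pair each author element with its comment element and build dict rows."""
    # "xpath" is the string value of Selenium's By.XPATH.
    authors = driver.find_elements("xpath", AUTHOR_XPATH)
    comments = driver.find_elements("xpath", COMMENT_XPATH)
    # zip stops at the shorter list, so a length mismatch cannot raise IndexError.
    return [
        {"title": title, "account": a.text.strip(), "comment": c.text}
        for a, c in zip(authors, comments)
    ]
```

Both helpers only need an object that exposes `execute_script` and `find_elements`, so they can be exercised with a stub driver before pointing them at a real browser session.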

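"I still can't get the comment text via XPath": what does that mean? Do you get an exception? Do you get empty values?

I get empty values @missioned

When something is hard to do with Selenium, it usually means it is illegal. If you really want the data, using their API is usually much faster than insisting on Selenium as a hammer.

@ConradB Actually, I managed to scrape the YouTube comments. You will find my solution below.

For taking the time to paste the solution and write readable code, this answer deserves 10 points. Well-organized example code.

@ConradB Haha, thank you.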