
Python: scraping dynamic HTML (YouTube comments)


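With the Beautiful Soup and Requests libraries I can scrape static HTML content, but not content that is loaded by JavaScript or AJAX calls.

How can I simulate this from a Python script? YouTube comments, for instance, are only loaded as you scroll down the page. I found two approaches, one using Selenium and the other using lxml with requests, but I don't understand either of them.

Example ():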


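You need to use Selenium.

There is a trick here: YouTube only loads the comments once you scroll down to the area just below the video. If you jump straight to the bottom of the page, or scroll anywhere else, the comments are not loaded. So scroll to that area first, wait for the comments to load, and then scroll on to the bottom, or as far as you need: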

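import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.youtube.com/watch?v=iFPMz36std4')

# Scroll down a little so that YouTube starts loading the comment section
driver.execute_script('window.scrollTo(1, 500);')

# Now wait and let the comments load
time.sleep(5)

driver.execute_script('window.scrollTo(1, 3000);')

# find_element_by_xpath was removed in Selenium 4; use find_element(By.XPATH, ...)
comment_div = driver.find_element(By.XPATH, '//*[@id="contents"]')
# Note: an XPath starting with // searches the whole document, not just comment_div
comments = comment_div.find_elements(By.XPATH, '//*[@id="content-text"]')
for comment in comments:
    print(comment.text)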
Partial output:

# can't post the full output, it's too long
I love Kygo's Stranger Things and Netflix's Stranger Things <3
Stranger Things, Kygo and OneRepublic, could it be better?
Amazing Vibe!!!!!!!!!
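The fixed `time.sleep(5)` in the answer above either wastes time or is too short on a slow connection. A minimal sketch of the alternative is to poll until the comment elements actually appear. The helper name `wait_for_comments` and its `timeout` and `poll` parameters are hypothetical; the XPath is the one used in the answer. It assumes a Selenium 4 driver, where `find_elements` accepts the locator strategy as the plain string `"xpath"`.

```python
import time


def wait_for_comments(driver, timeout=15.0, poll=0.5):
    """Poll until at least one comment element exists or the timeout expires.

    Returns the list of matching elements, or an empty list on timeout.
    `driver` only needs a Selenium-style find_elements(by, value) method.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Same XPath as in the answer above; "xpath" is the value of By.XPATH
        found = driver.find_elements("xpath", '//*[@id="content-text"]')
        if found:
            return found
        if time.monotonic() >= deadline:
            return []
        time.sleep(poll)
```

It would be called as `comments = wait_for_comments(driver)` right after the first scroll. Selenium also ships `WebDriverWait` with `expected_conditions.presence_of_element_located`, which serves the same purpose and is probably the better choice in real code.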

Using Selenium would do the trick.

I scroll down differently, though. The function below repeatedly runs JavaScript to scroll to the bottom of the page, and stops once the page height no longer changes between two consecutive scrolls.

import time

from bs4 import BeautifulSoup
from selenium import webdriver


def scrollDown(pause, driver):
    """
    Scroll down until the page height stops growing.
    """
    lastHeight = driver.execute_script("return document.body.scrollHeight")

    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give newly requested content time to load
        time.sleep(pause)
        newHeight = driver.execute_script("return document.body.scrollHeight")
        if newHeight == lastHeight:
            break
        lastHeight = newHeight

# Main Code
# Instantiate browser and navigate to page
driver = webdriver.Chrome()
driver.get('https://www.youtube.com/watch?v=iFPMz36std4')
scrollDown(6, driver)

# Page soup
soup = BeautifulSoup(driver.page_source, "html.parser")
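The scroll loop above runs until the height stops changing, which can take very long on a page with thousands of comments. The following is a sketch of a bounded variant, not the answerer's code: `scroll_to_end`, `max_rounds`, and `HEIGHT_JS` are hypothetical names. It also takes the larger of `document.body.scrollHeight` and `document.documentElement.scrollHeight`, on the assumption that some page layouts report the scrollable height only on the latter.

```python
import time

# Assumption: depending on the layout, the scrollable height is reported on
# either document.body or document.documentElement, so use the larger value.
HEIGHT_JS = (
    "return Math.max(document.body.scrollHeight, "
    "document.documentElement.scrollHeight);"
)


def scroll_to_end(driver, pause=2.0, max_rounds=50):
    """Scroll down until the height stops growing or max_rounds is reached.

    Returns the number of scroll rounds performed. `driver` only needs a
    Selenium-style execute_script(script, *args) method.
    """
    last_height = driver.execute_script(HEIGHT_JS)
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        # Selenium exposes extra arguments to the script as arguments[0], ...
        driver.execute_script("window.scrollTo(0, arguments[0]);", last_height)
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script(HEIGHT_JS)
        if new_height == last_height:
            break
        last_height = new_height
    return rounds
```

Calling `scroll_to_end(driver, pause=6, max_rounds=20)` in place of `scrollDown(6, driver)` caps the run at roughly two minutes, at the cost of possibly missing the oldest comments.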


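Please post code, errors, sample data, or text output here as plain text rather than as images, which can be hard to read, cannot be copy-pasted to help test code or use in answers, and are unfriendly to people who use screen readers. You can edit your question to add the code to the question body. Use the {} button to format any code block, or indent with four spaces for the same effect. We can't run your screenshot as code.

You have to use a browser instance such as PhantomJS or headless Chrome to load the page and render the dynamic content.

"I don't understand either of them": that is your basic difficulty, right there. Selenium can most likely do what you want. However, this is not a tutorial site. You need to go and find a tutorial, and learn to write some code that attempts what you want to do.

Can we do it without Selenium (without opening a browser and scrolling down), for example to scrape all the comments of a video?

You should consider YouTube's own API, which lets you pull this kind of data easily. Read more here: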
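As a rough illustration of the API route suggested in the last comment, the sketch below pages through the `commentThreads` endpoint of the YouTube Data API v3 using only the standard library. The function names `fetch_json` and `iter_top_level_comments`, the `max_pages` cap, and the `"YOUR_API_KEY"` placeholder are assumptions made for this example; a real API key from a Google Cloud project is required, and only top-level comments (no replies) are returned.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def fetch_json(url):
    """Download a URL and decode the JSON response."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_top_level_comments(video_id, api_key, fetch=fetch_json, max_pages=5):
    """Yield the text of top-level comments, following nextPageToken.

    `fetch` is injectable so the paging logic can be tried without network
    access; `max_pages` is a hypothetical safety cap for this sketch.
    """
    page_token = None
    for _ in range(max_pages):
        params = {
            "part": "snippet",
            "videoId": video_id,
            "key": api_key,
            "maxResults": 100,          # the API allows at most 100 per page
            "textFormat": "plainText",
        }
        if page_token:
            params["pageToken"] = page_token
        data = fetch(API_URL + "?" + urllib.parse.urlencode(params))
        for item in data.get("items", []):
            yield item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        page_token = data.get("nextPageToken")
        if not page_token:
            break
```

Usage would look like `for text in iter_top_level_comments("iFPMz36std4", "YOUR_API_KEY"): print(text)`. No browser or scrolling is involved, which addresses the question raised in the comments, though API quota limits apply.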