Web scraping: how can I be the first to scrape new YouTube videos from a specific channel?

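I have a channel from which I want to get each newly uploaded video as quickly as possible. What is the best way to do that? I know of two options:

  • Use the YouTube API
  • Access the channel directly via its URL

With option 1, I have to call the API to get the video list. Because of the quota, I don't think I can make API calls often enough. I think option 2 is the better choice, because I can request the URL whenever I want.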


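Are new videos made available through the API first? Or are videos accessed via the URL made available at different times depending on the region the user is in? I built a URL scraper myself and request the URL once a minute. Someone else still got the video 8 minutes before I did. I don't understand why this happens.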

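You can try the RSS feed of the channel you are interested in. It lists new videos with UTC timestamps (so the time zone issue you mentioned does not apply).

The RSS link for a channel's videos can be found in the source of the channel page. Open the page source and search for "rssUrl".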
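The search for "rssUrl" can also be done in code. The following is a minimal sketch, assuming the page source embeds the feed address as a JSON string of the form "rssUrl":"..."; that layout is an assumption and may change. The helper name find_rss_url is hypothetical.

```python
import json
import re

# Assumed layout: the page source contains  "rssUrl":"<feed address>"  as JSON.
RSS_URL_PATTERN = re.compile(r'"rssUrl"\s*:\s*("(?:[^"\\]|\\.)*")')


def find_rss_url(page_source):
    """Return the first rssUrl value found in the page source, or None."""
    match = RSS_URL_PATTERN.search(page_source)
    if match is None:
        return None
    # The captured group is a quoted JSON string; json.loads decodes
    # escapes such as \/ or \u0026 if they are present.
    return json.loads(match.group(1))
```

Passing the downloaded channel page as a string to find_rss_url returns the feed address if the pattern is present, and None otherwise.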

To expand a little on what MadRay wrote, you can build the feed URL yourself with some simple string substitution.

Using the channel ID:

    "https://www.youtube.com/feeds/videos.xml?channel_id=UCXuqSBlHAE6Xw-yeJA0Tunw"
    
Using the channel name (the user parameter):

    https://www.youtube.com/feeds/videos.xml?user=LinusTechTips
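The string substitution for the two URL forms above could be wrapped in a small helper. This is only a sketch; the function name feed_url and its keyword arguments are made up for illustration, and only the two URL shapes shown above are taken from the answer.

```python
from urllib.parse import urlencode

FEED_BASE = "https://www.youtube.com/feeds/videos.xml"


def feed_url(channel_id=None, user=None):
    """Build a feed URL from either a channel ID or a user name (not both)."""
    if (channel_id is None) == (user is None):
        raise ValueError("pass exactly one of channel_id or user")
    if channel_id is not None:
        params = {"channel_id": channel_id}
    else:
        params = {"user": user}
    # urlencode escapes any characters that are not safe in a query string.
    return FEED_BASE + "?" + urlencode(params)
```

For example, feed_url(user="LinusTechTips") produces the second URL shown above.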
    
I took the liberty of parsing it for you:

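    from bs4 import BeautifulSoup
    import requests

    url = "https://www.youtube.com/feeds/videos.xml?user=LinusTechTips"
    # Send a User-Agent header and set a timeout so the request cannot hang.
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()  # stop early on HTTP errors
    # The "lxml" parser requires the lxml package to be installed.
    soup = BeautifulSoup(response.text, "lxml")

    # Each <entry> in the feed describes one video.
    for entry in soup.find_all("entry"):
        for title in entry.find_all("title"):
            print(title.text)
        for link in entry.find_all("link"):
            print(link["href"])
        for name in entry.find_all("name"):
            print(name.text)
        for pub in entry.find_all("published"):
            print(pub.text)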
    
Output:

    FINALLY Wireless Headphones that Sound GREAT
    https://www.youtube.com/watch?v=rei5vMQmD4Q
    Linus Tech Tips
    2020-01-30T20:04:37+00:00
    Don't give Apple your MONEY - Mac Pro Upgrade Adventure
    https://www.youtube.com/watch?v=zcLbSCinX3U
    Linus Tech Tips
    2020-01-29T19:59:56+00:00
    We got the Kick-Proof TV from China!
    https://www.youtube.com/watch?v=4eSADWuZskk
    Linus Tech Tips
    2020-01-28T19:46:09+00:00
    Everything went wrong... Water Cooled 8K Camera Final Test
    https://www.youtube.com/watch?v=OEUCNh5g-2I
    Linus Tech Tips
    2020-01-27T20:08:27+00:00
    I'm Returning my Mac Pro
    https://www.youtube.com/watch?v=mIB389tqzCI
    Linus Tech Tips
    2020-01-26T19:59:45+00:00
    The RGB HDMI cable ISN'T as dumb as you'd think...
    https://www.youtube.com/watch?v=nva6oPszm60
    Linus Tech Tips
    2020-01-25T20:06:23+00:00
    I am NOT Retiring... yet - WAN Show Jan 24, 2020
    https://www.youtube.com/watch?v=cxjhTVR_dJw
    Linus Tech Tips
    2020-01-25T02:29:50+00:00
    The Best VR Headset... got BETTER!?
    https://www.youtube.com/watch?v=AGScX_8plYw
    Linus Tech Tips
    2020-01-23T19:52:00+00:00
    I've been thinking of retiring.
    https://www.youtube.com/watch?v=hAsZCTL__lo
    Linus Tech Tips
    2020-01-23T06:35:25+00:00
    It’s time to upgrade your GPU - RX 5600 XT
    https://www.youtube.com/watch?v=rKn-vWDMkwQ
    Linus Tech Tips
    2020-01-22T19:59:36+00:00
    WE FINALLY DID IT!! - Water Cooling the 8K Camera!
    https://www.youtube.com/watch?v=imJ9QgOJHzY
    Linus Tech Tips
    2020-01-21T19:59:47+00:00
    We Water Cooled an SSD!!
    https://www.youtube.com/watch?v=lQmI5A27Iv8
    Linus Tech Tips
    2020-01-20T20:17:22+00:00
    Should you buy a $50 CPU??
    https://www.youtube.com/watch?v=JISJ_YTI9s0
    Linus Tech Tips
    2020-01-19T20:19:02+00:00
    Apple’s Pro Display XDR – A PC Guy’s Perspective
    https://www.youtube.com/watch?v=X089oYPc5Pg
    Linus Tech Tips
    2020-01-18T19:59:29+00:00
    The NSA is Giving Out It's Hacks for Free! - WAN Show Jan 17, 2020
    https://www.youtube.com/watch?v=af6FBA-n7eA
    Linus Tech Tips
    2020-01-18T03:00:04+00:00
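The published values in this output are ISO 8601 timestamps with a +00:00 (UTC) offset, so they can be compared against the time at which the scraper saw the video. A minimal sketch, with seconds_since_published as a hypothetical helper name:

```python
from datetime import datetime, timezone


def seconds_since_published(published, now=None):
    """Return how many seconds ago a feed entry was published.

    `published` is a timestamp string as printed above,
    e.g. "2020-01-30T20:04:37+00:00".
    """
    published_at = datetime.fromisoformat(published)
    if now is None:
        # Use an aware UTC datetime so the subtraction is well defined.
        now = datetime.now(timezone.utc)
    return (now - published_at).total_seconds()
```

Logging this value each time a new entry is first seen gives a rough measure of how far behind the feed timestamp the scraper is running.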
    

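However, remember to send headers with your requests, and be careful not to hit the YouTube backend too many times at once, because your IP will be suspended for 12 hours. Good luck!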
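To notice a new upload between two polls, the scraper has to remember which videos it has already seen. The sketch below uses only the standard library; it assumes the feed is Atom and that each entry carries a yt:videoId element in the namespace shown (both the namespace URI and the User-Agent string are assumptions to verify against a real feed). The function names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
YT = "{http://www.youtube.com/xml/schemas/2015}"  # assumed namespace for yt:videoId


def fetch_feed(url, timeout=10.0):
    """Download the feed, sending an explicit User-Agent header."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "Mozilla/5.0 (feed-watcher sketch)"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def parse_entries(feed_xml):
    """Return one dict per <entry>, in feed order."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.iter(ATOM + "entry"):
        entries.append({
            "video_id": entry.findtext(YT + "videoId"),
            "title": entry.findtext(ATOM + "title"),
            "published": entry.findtext(ATOM + "published"),
        })
    return entries


def find_new(entries, seen_ids):
    """Return entries whose video_id is not in seen_ids, then record them."""
    fresh = [e for e in entries if e["video_id"] not in seen_ids]
    seen_ids.update(e["video_id"] for e in fresh)
    return fresh
```

A polling loop would call find_new(parse_entries(fetch_feed(url)), seen_ids) once per interval and act on whatever it returns; on the first call every entry counts as new, so the set is usually primed once before alerts are enabled.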

What counts as "too many"? Is once a minute OK?

Once a minute is fine. I'd say 12 to 16 requests per second is what will get you into trouble. That may seem like plenty when you are just starting out, but once you are doing heavy data collection you will have to get creative with rotating proxies. Enjoy web scraping!
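One way to stay at a fixed request rate, such as the once-a-minute polling discussed here, is to enforce a minimum gap between requests. The class below is an illustrative sketch, not part of any library; the name MinIntervalThrottle and its parameters are made up, and the clock and sleep functions are injectable only so the behavior can be checked without waiting.

```python
import time


class MinIntervalThrottle:
    """Block so that consecutive wait() calls are at least min_interval apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous wait() returned

    def wait(self):
        """Sleep if needed and return the number of seconds slept."""
        waited = 0.0
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last = self._clock()
        return waited
```

Calling throttle.wait() immediately before each request keeps a scraper at or below one request per min_interval seconds, regardless of how long the request itself takes.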