Web scraping: How do I get new YouTube videos from a specific channel first?
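I have a channel from which I want to get newly uploaded videos as soon as possible. What is the best way to do this? I know of two options:
1. Use the YouTube API
2. Access the channel directly through its URL
With option 1, I need to call the API to get the list of videos. Since there is a quota, I think I would not be able to make API calls often enough. I think option 2 is the best choice, because I can request the URL at any time.
Are new videos made available through the API first? Or are videos accessed through the URL made available to users at different times, depending on the region the user comes from? I built a URL scraper myself, and it requests the URL once a minute. Someone else still got the video 8 minutes before I did. I don't understand why this happens.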
You can try the RSS feed of the channel you are interested in. It contains new videos with UTC timestamps (so the time zones you mentioned are not an issue).
The RSS link for a channel's videos can be found in the source of the channel page. Open the page source and search for "rssUrl".
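If you would rather not search the page source by hand, here is a minimal sketch of doing it in code. It assumes the page source embeds the link as a JSON fragment of the form "rssUrl":"..." (that shape is an assumption on my part, and YouTube can change it at any time). The function name `extract_rss_url` is made up for this example.

```python
import json
import re
from typing import Optional

# Assumed shape of the fragment in the page source:
#   "rssUrl":"https://www.youtube.com/feeds/videos.xml?channel_id=..."
RSS_URL_PATTERN = re.compile(r'"rssUrl"\s*:\s*("(?:[^"\\]|\\.)*")')


def extract_rss_url(page_source: str) -> Optional[str]:
    """Return the first rssUrl value found in the page source, or None."""
    match = RSS_URL_PATTERN.search(page_source)
    if match is None:
        return None
    # The captured group is a JSON string literal, so json.loads undoes
    # escapes such as \/ or \u0026.
    return json.loads(match.group(1))


# Offline demo on a hand-written fragment (not real page source).
sample = (
    '{"title":"x","rssUrl":"https://www.youtube.com/feeds/videos.xml'
    '?channel_id=UCXuqSBlHAE6Xw-yeJA0Tunw"}'
)
print(extract_rss_url(sample))
```

You would pass in the text of the channel page, however you downloaded it. If the function returns None, the assumed fragment was not found and you are back to looking at the source manually.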
Just to expand a bit on what MadRay wrote: you can build this URL yourself with some simple string substitution.
Using the channel ID:
"https://www.youtube.com/feeds/videos.xml?channel_id=UCXuqSBlHAE6Xw-yeJA0Tunw"
Using the channel name:
https://www.youtube.com/feeds/videos.xml?user=LinusTechTips
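The string substitution for the two URL forms could be wrapped in a small helper like the one below. This is only a sketch: `feed_url` and `FEED_BASE` are names I made up, and the two query parameters are the ones shown in the URLs here.

```python
from urllib.parse import urlencode

FEED_BASE = "https://www.youtube.com/feeds/videos.xml"


def feed_url(*, channel_id=None, user=None):
    """Build the feed URL from exactly one of channel_id or user."""
    if (channel_id is None) == (user is None):
        raise ValueError("pass exactly one of channel_id or user")
    if channel_id is not None:
        params = {"channel_id": channel_id}
    else:
        params = {"user": user}
    # urlencode escapes any characters that are not safe in a query string.
    return FEED_BASE + "?" + urlencode(params)


print(feed_url(channel_id="UCXuqSBlHAE6Xw-yeJA0Tunw"))
print(feed_url(user="LinusTechTips"))
```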
I took the liberty of parsing it for you:
from bs4 import BeautifulSoup
import requests

url = "https://www.youtube.com/feeds/videos.xml?user=LinusTechTips"
html = requests.get(url)
# The "lxml" parser requires the lxml package to be installed.
soup = BeautifulSoup(html.text, "lxml")

# Each <entry> element in the feed describes one video.
for entry in soup.find_all("entry"):
    for title in entry.find_all("title"):
        print(title.text)
    for link in entry.find_all("link"):
        print(link["href"])
    for name in entry.find_all("name"):
        print(name.text)
    for pub in entry.find_all("published"):
        print(pub.text)
Output:
FINALLY Wireless Headphones that Sound GREAT
https://www.youtube.com/watch?v=rei5vMQmD4Q
Linus Tech Tips
2020-01-30T20:04:37+00:00
Don't give Apple your MONEY - Mac Pro Upgrade Adventure
https://www.youtube.com/watch?v=zcLbSCinX3U
Linus Tech Tips
2020-01-29T19:59:56+00:00
We got the Kick-Proof TV from China!
https://www.youtube.com/watch?v=4eSADWuZskk
Linus Tech Tips
2020-01-28T19:46:09+00:00
Everything went wrong... Water Cooled 8K Camera Final Test
https://www.youtube.com/watch?v=OEUCNh5g-2I
Linus Tech Tips
2020-01-27T20:08:27+00:00
I'm Returning my Mac Pro
https://www.youtube.com/watch?v=mIB389tqzCI
Linus Tech Tips
2020-01-26T19:59:45+00:00
The RGB HDMI cable ISN'T as dumb as you'd think...
https://www.youtube.com/watch?v=nva6oPszm60
Linus Tech Tips
2020-01-25T20:06:23+00:00
I am NOT Retiring... yet - WAN Show Jan 24, 2020
https://www.youtube.com/watch?v=cxjhTVR_dJw
Linus Tech Tips
2020-01-25T02:29:50+00:00
The Best VR Headset... got BETTER!?
https://www.youtube.com/watch?v=AGScX_8plYw
Linus Tech Tips
2020-01-23T19:52:00+00:00
I've been thinking of retiring.
https://www.youtube.com/watch?v=hAsZCTL__lo
Linus Tech Tips
2020-01-23T06:35:25+00:00
It’s time to upgrade your GPU - RX 5600 XT
https://www.youtube.com/watch?v=rKn-vWDMkwQ
Linus Tech Tips
2020-01-22T19:59:36+00:00
WE FINALLY DID IT!! - Water Cooling the 8K Camera!
https://www.youtube.com/watch?v=imJ9QgOJHzY
Linus Tech Tips
2020-01-21T19:59:47+00:00
We Water Cooled an SSD!!
https://www.youtube.com/watch?v=lQmI5A27Iv8
Linus Tech Tips
2020-01-20T20:17:22+00:00
Should you buy a $50 CPU??
https://www.youtube.com/watch?v=JISJ_YTI9s0
Linus Tech Tips
2020-01-19T20:19:02+00:00
Apple’s Pro Display XDR – A PC Guy’s Perspective
https://www.youtube.com/watch?v=X089oYPc5Pg
Linus Tech Tips
2020-01-18T19:59:29+00:00
The NSA is Giving Out It's Hacks for Free! - WAN Show Jan 17, 2020
https://www.youtube.com/watch?v=af6FBA-n7eA
Linus Tech Tips
2020-01-18T03:00:04+00:00
However, remember to use headers in your requests, and be careful not to hit the YouTube backend too many times at once, because your IP will be suspended for 12 hours. Good luck!
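To tie this back to the original question (noticing a new upload as early as possible), here is a rough sketch that sends a header with the request and reports only the entries it has not seen before. It uses only the standard library instead of BeautifulSoup. The helper names, the User-Agent string, and the sample feed are all made up for illustration, and the `yt:video:` form of the `<id>` value is an assumption about the feed.

```python
import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace


def fetch_feed(url, user_agent="feed-poller-example/0.1"):
    """Download the feed, sending an explicit User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def parse_entries(feed_xml):
    """Return a list of dicts (id, title, link, published) from Atom XML."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.iter(ATOM + "entry"):
        link = entry.find(ATOM + "link")
        published = entry.findtext(ATOM + "published")
        entries.append({
            "id": entry.findtext(ATOM + "id"),
            "title": entry.findtext(ATOM + "title"),
            "link": link.get("href") if link is not None else None,
            # Timestamps carry their UTC offset, so they parse as aware datetimes.
            "published": datetime.fromisoformat(published) if published else None,
        })
    return entries


def new_entries(entries, seen_ids):
    """Return entries whose id is not in seen_ids, and mark them as seen."""
    fresh = [e for e in entries if e["id"] not in seen_ids]
    seen_ids.update(e["id"] for e in fresh)
    return fresh


# Offline demo on a hand-written one-entry feed (not a real response).
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>yt:video:rei5vMQmD4Q</id>
    <title>FINALLY Wireless Headphones that Sound GREAT</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=rei5vMQmD4Q"/>
    <published>2020-01-30T20:04:37+00:00</published>
  </entry>
</feed>"""

seen = set()
for video in new_entries(parse_entries(SAMPLE), seen):
    print(video["published"].isoformat(), video["title"], video["link"])
print(len(new_entries(parse_entries(SAMPLE), seen)), "new on the second pass")
```

In a real poller you would call `fetch_feed` on the feed URL on a timer and pass the result to `parse_entries`. You would also want to persist `seen` between runs so a restart does not report old videos as new.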
What counts as "too many"? Is once a minute OK?
Once a minute is fine. I would say 12 to 16 requests per second is where you would start to feel the pain. That may seem like plenty when you are just starting out, but once you are doing some heavy data collection, you will have to get creative with rotating proxies. Enjoy web scraping!
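One simple way to stay at a fixed request rate is a small throttle that enforces a minimum gap between calls. This is a sketch only: the class name `MinIntervalThrottle` and the injectable `clock`/`sleep` arguments are my own choices, not part of any library.

```python
import time


class MinIntervalThrottle:
    """Make consecutive wait() calls return at least min_interval seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous wait() returned

    def wait(self):
        """Sleep if the previous call was too recent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay


# Demo with a short interval so it finishes quickly;
# MinIntervalThrottle(60.0) would correspond to once a minute.
throttle = MinIntervalThrottle(0.05)
for _ in range(3):
    waited = throttle.wait()
    print(f"slept {waited:.2f}s before this request")
```

Call `wait()` right before each request. Passing in the clock and sleep functions is only there so the behaviour can be checked without actually waiting.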