How do I scrape videos from a YouTube search in Python?


python selenium


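I want to search for a specific keyword and then scrape all of the video URLs.

I know the code I'm pasting doesn't do that, but I want to show what I've tried so far.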

chrome_path = r"C:\Users\Admin\Documents\chromedriver\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://www.youtube.com/results?sp=CAISAggBUBQ%253D&q=minecraft")

links = driver.find_elements_by_partial_link_text('/watch')
for link in links:
    links = (links.get_attribute("href"))

How can I scrape the links and save them to a file?

Here is your code (the first script below); it prints each video's title and URL. Easy :)

Actually, you shouldn't be fetching results from youtube.com/results. You should check a site's robots.txt before scraping it. To learn more about robots.txt, read the wiki page on it.

This is YouTube's robots.txt file: https://www.youtube.com/robots.txt
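A robots.txt check can be done with the standard library before fetching anything. The sketch below uses urllib.robotparser on a small sample rule set written for illustration (it is not a copy of YouTube's actual file), and the helper name is_allowed is hypothetical. In real use you would point the parser at the site's own robots.txt with set_url() and read() instead of parsing a string.

```python
from urllib.robotparser import RobotFileParser

# Sample rules for illustration only; this is NOT YouTube's real robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /results
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    candidates = (
        "https://www.youtube.com/results?search_query=minecraft",
        "https://www.youtube.com/watch?v=VIDEO_ID",  # placeholder video ID
    )
    for url in candidates:
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, url) else "disallowed"
        print(url, "->", verdict)
```

Under these sample rules the search results path is disallowed, while a path that no rule mentions is allowed by default.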

You do have another option, though: you can use the YouTube search API.
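As a rough illustration of the API route, the sketch below builds a request URL for the YouTube Data API v3 search endpoint and pulls watch URLs out of a decoded response. It makes no network call. The helper names build_search_url and video_urls are hypothetical, "YOUR_API_KEY" is a placeholder for a key you would create yourself, and the sample response is hand-written to mirror the documented items[].id.videoId shape rather than taken from a real reply.

```python
import json
import urllib.parse

# YouTube Data API v3 search endpoint.
SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(query, api_key, max_results=25):
    """Build a search request URL for videos matching the query."""
    params = {
        "part": "snippet",
        "q": query,
        "type": "video",            # restrict results to videos
        "maxResults": max_results,
        "key": api_key,
    }
    return SEARCH_ENDPOINT + "?" + urllib.parse.urlencode(params)


def video_urls(response_json):
    """Extract watch URLs from a decoded search response."""
    urls = []
    for item in response_json.get("items", []):
        video_id = item.get("id", {}).get("videoId")
        if video_id:  # channel and playlist results carry no videoId
            urls.append("https://www.youtube.com/watch?v=" + video_id)
    return urls


if __name__ == "__main__":
    # Placeholder key: replace it with your own before sending the request.
    print(build_search_url("minecraft", "YOUR_API_KEY", max_results=5))

    # Hand-written sample response, for illustration only.
    sample = json.loads('{"items": [{"id": {"kind": "youtube#video", "videoId": "abc123"}}]}')
    print(video_urls(sample))
```

To use it for real, fetch the built URL (for example with urllib.request.urlopen), decode the body with json.loads, and pass the result to video_urls.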

The second script below uses urllib to fetch the first page of YouTube results and BeautifulSoup to parse the page, then prints every video link (if you are on Python 3.*, install BeautifulSoup4).

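Look at the element with ID item-section-897216; all the links are in there.

Consider updating your question: which steps do you want to automate? On this URL I can't find any partial link text or link text set to /watch. I only found "Watch later", but I don't think that's what you want.

Thanks.

In the long run you will probably be better off using their excellent API. It is faster than using Selenium and the like, and there are plenty of posts on how to do it. Take a look at this example.

Given that Google does not respect robots.txt, spoofs the user agents of its crawlers, and so on, I'd say it is ethical not to respect their robots.txt.

Hi, I've found that this no longer works. I'm not a coder. Do you know how I can fix or update it?

@MateoCriado If the error you are seeing is incomplete results, then I think this no longer works because the search results page now loads more results at the bottom automatically as the user scrolls. I'll debug it later tonight.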

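from bs4 import BeautifulSoup
import urllib.parse
import urllib.request


def search_videos(search):
    # URL-encode the query so spaces and special characters are safe in the URL.
    query = urllib.parse.quote(search)
    response = urllib.request.urlopen('https://www.youtube.com/results?search_query=' + query)

    # Name the parser explicitly so BeautifulSoup does not have to guess one.
    soup = BeautifulSoup(response.read(), 'html.parser')
    # Class name from the original answer; it only matches if YouTube still serves this markup.
    divs = soup.find_all("div", {"class": "yt-lockup-content"})

    # Open the output file once and append one URL per line.
    with open(search.replace(" ", "_") + '.txt', 'a') as writer:
        for div in divs:
            link = div.find('a', href=True)
            if link is None:
                continue  # skip result blocks that contain no link
            url = "https://www.youtube.com" + link['href']
            print(link.text, "\n" + url, '\n')
            writer.write(url + '\n')


print("What are you looking for?")
search_string = input()
search_videos(search_string)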

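import urllib.parse
import urllib.request
from bs4 import BeautifulSoup

text_to_search = 'python tutorials'
# URL-encode the search text before adding it to the query string.
query = urllib.parse.quote(text_to_search)
url = "https://www.youtube.com/results?search_query=" + query
response = urllib.request.urlopen(url)
html = response.read()
soup = BeautifulSoup(html, 'html.parser')
# Class name from the original answer; it only matches if YouTube still serves this markup.
for vid in soup.find_all(attrs={'class': 'yt-uix-tile-link'}):
    # Skip ad links, which point at doubleclick.net instead of a /watch path.
    if not vid['href'].startswith("https://googleads.g.doubleclick.net/"):
        print('https://www.youtube.com' + vid['href'])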
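
For the "save to a file" part of the question, it can help to separate link collection from fetching. The sketch below is one possible approach and is not taken from the answers above: it swaps BeautifulSoup for the standard library's html.parser so it has no dependencies, keeps only hrefs that start with /watch, drops duplicates, and writes one URL per line. The names WatchLinkParser, extract_watch_urls and save_urls are hypothetical, the sample HTML is made up, and the code only sees links that appear as ordinary anchor tags in the HTML it is given.

```python
import os
import tempfile
from html.parser import HTMLParser


class WatchLinkParser(HTMLParser):
    """Collect href values that start with /watch from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.startswith("/watch"):
            self.hrefs.append(href)


def extract_watch_urls(html):
    """Return absolute watch URLs found in html, without duplicates."""
    parser = WatchLinkParser()
    parser.feed(html)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return ["https://www.youtube.com" + h for h in dict.fromkeys(parser.hrefs)]


def save_urls(urls, path):
    """Write one URL per line to path, replacing any existing file."""
    with open(path, "w", encoding="utf-8") as f:
        for url in urls:
            f.write(url + "\n")


if __name__ == "__main__":
    # Made-up HTML standing in for a downloaded results page.
    sample_html = (
        '<a href="/watch?v=a1">First</a>'
        '<a href="/watch?v=a1">First again</a>'
        '<a href="/channel/xyz">A channel</a>'
        '<a href="/watch?v=b2">Second</a>'
    )
    urls = extract_watch_urls(sample_html)
    out_path = os.path.join(tempfile.mkdtemp(), "links.txt")
    save_urls(urls, out_path)
    print("Saved", len(urls), "links to", out_path)
```

The same two functions could be fed the HTML returned by urllib.request.urlopen, or the page_source of a Selenium driver after the page has finished loading.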