Python YouTube web scraper not working


python web-scraping youtube


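So I built a small script that returns the URL of any video you search for on YouTube. But when I opened it again later, scraping YouTube no longer worked properly. When I print soup, it returns something completely different from what Inspect Element shows on YouTube. Can anyone help me with this? Here is my code: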

import requests
from lxml import html
import webbrowser
from bs4 import BeautifulSoup
import time
import tkinter
from pytube import YouTube

headers= {"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36"}

def video_finder():
    word = input("Enter video title: ")
    # Always define `new`; with the original if/else it stayed undefined for one-word titles
    new = word.replace(' ', '+')
    print(new)

    vid = requests.get('https://www.youtube.com/results?search_query={}'.format(new))
    soup = BeautifulSoup(vid.text, features='lxml')
    all_vids = soup.find_all('div', id_='contents')
    print(all_vids)
    video1st = all_vids[0]
    a_Tag = video1st.find('a', class_="yt-uix-tile-link yt-ui-ellipsis yt-ui-ellipsis-2 yt-uix-sessionlink spf-link", href=True)
    Video_name = a_Tag.text
    Video_id = a_Tag['href']
    video_link = 'https://www.youtube.com' + Video_id
    print(Video_name)
    print(video_link)

It's not the best, but... thank you.
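The question builds the query string by hand with word.replace(' ', '+'), which leaves characters such as & or # unescaped. Below is a minimal standard-library sketch of the same step using urllib.parse.urlencode; the helper name build_search_url is hypothetical. Passing params= to requests.get, as the answer does, performs comparable encoding for you.

```python
from urllib.parse import urlencode


def build_search_url(title: str) -> str:
    """Build a YouTube search URL for `title` (hypothetical helper)."""
    # urlencode quotes each value with quote_plus: spaces become '+',
    # and reserved characters such as '&' or '#' are percent-encoded.
    query = urlencode({"search_query": title})
    return "https://www.youtube.com/results?" + query


print(build_search_url("sailor moon"))
# https://www.youtube.com/results?search_query=sailor+moon
print(build_search_url("tom & jerry"))
# https://www.youtube.com/results?search_query=tom+%26+jerry
```

One-word titles go through the same code path, so there is no separate branch to forget.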

To get the correct results from the YouTube page, set the User-Agent HTTP header to Googlebot and use html.parser in BeautifulSoup. For example:

import requests
from bs4 import BeautifulSoup


# Identify as Googlebot; per this answer, YouTube then returns HTML containing the result links
headers = {"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"}

def video_finder():
    word = input("Enter video title: ")

    # Let requests URL-encode the query (spaces, '&', etc.)
    params = {
        'search_query': word
    }

    vid = requests.get('https://www.youtube.com/results', params=params, headers=headers)
    soup = BeautifulSoup(vid.content, features='html.parser')
    # First result link; `h and` skips <a> tags without an href (h is None for those)
    a_Tag = soup.find('a', class_="yt-uix-tile-link yt-ui-ellipsis yt-ui-ellipsis-2 yt-uix-sessionlink spf-link", href=lambda h: h and h.startswith('/watch?'))
    if a_Tag is None:
        print("No video link found; the page markup may have changed.")
        return
    Video_name = a_Tag.text
    Video_id = a_Tag['href']
    video_link = 'https://www.youtube.com' + Video_id
    print(Video_name)
    print(video_link)

video_finder()

Prints:

Enter video title: sailor moon
Sailor Moon Opening (English) *HD*
https://www.youtube.com/watch?v=5txHGxJRwtQ
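To check the link-extraction step offline, without a live request, here is a minimal sketch that swaps BeautifulSoup for the standard library's html.parser.HTMLParser. It returns the title and absolute URL of the first <a> whose href starts with /watch?, ignoring class names entirely. The sample HTML is made up and is not YouTube's actual markup; FirstWatchLink and first_video are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class FirstWatchLink(HTMLParser):
    """Records the href and text of the first <a> whose href starts with /watch?."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None
        self.text_parts: List[str] = []
        self._inside = False  # True while inside the matched <a> element

    def handle_starttag(self, tag, attrs):
        if self.href is not None or tag != "a":
            return  # already found a match, or not a link
        href = dict(attrs).get("href")
        # Links without an href give None; guard before calling startswith.
        if href and href.startswith("/watch?"):
            self.href = href
            self._inside = True

    def handle_data(self, data):
        if self._inside:
            self.text_parts.append(data)

    def handle_endtag(self, tag):
        if self._inside and tag == "a":
            self._inside = False


def first_video(html_text: str) -> Optional[Tuple[str, str]]:
    """Return (title, absolute URL) of the first /watch? link, or None."""
    parser = FirstWatchLink()
    parser.feed(html_text)
    parser.close()
    if parser.href is None:
        return None  # no matching link, e.g. the page layout changed
    title = "".join(parser.text_parts).strip()
    return title, "https://www.youtube.com" + parser.href


# Made-up sample markup for illustration only.
sample = (
    '<div><a href="/channel/x">Channel</a>'
    '<a class="yt-uix-tile-link" href="/watch?v=5txHGxJRwtQ">'
    " Sailor Moon Opening (English) *HD* </a></div>"
)
print(first_video(sample))
```

Returning None instead of raising makes it easy to notice when the fetched HTML no longer contains such links, which is the situation the question describes.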

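… is the best tool. You're importing "pytube" but not actually using it; you should look at its manual.

@avloss, yes, because it's used in the later part of the code; I only posted the part I was having trouble with. Thanks a lot, this solution works perfectly. I applied it to my own code and only had to change the headers. Could you explain why?

@RutvikKarupothula Some servers return different HTML depending on this header, for example to protect themselves from bots.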
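To illustrate the last comment, here is a self-contained sketch: a toy local server (not YouTube) that returns different HTML depending on the User-Agent header, and a client that fetches the same URL twice with the two User-Agent strings used in this thread. It uses only the standard library (http.server, urllib.request); UAHandler, fetch and demo are hypothetical names, and the response bodies are invented.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Opener with no proxies, so requests to 127.0.0.1 go straight to the toy server.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class UAHandler(BaseHTTPRequestHandler):
    """Toy server: chooses the response body based on the User-Agent header."""

    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        if "Googlebot" in ua:
            body = b"<html>simplified page for crawlers</html>"
        else:
            body = b"<html>full browser page</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(url: str, user_agent: str) -> str:
    """GET `url` with the given User-Agent and return the decoded body."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with _opener.open(req, timeout=5) as resp:
        return resp.read().decode("utf-8")


def demo() -> tuple:
    server = HTTPServer(("127.0.0.1", 0), UAHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/results" % server.server_address[1]
        browser = fetch(url, "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3)")
        bot = fetch(url, "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
        return browser, bot
    finally:
        server.shutdown()
        server.server_close()


print(demo())
```

Same URL, different header, different HTML: that is why changing only the User-Agent can change what BeautifulSoup sees.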