Python: scraping video titles from a playlist

python web-scraping

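I wrote a scraper that collects video titles from a YouTube music playlist, because videos sometimes get deleted. I'm new to Python. I wrote the code by following an article:

I tested the code on many websites (changing the link, tags, and classes) and everything worked, but for some reason it returns nothing on YouTube.

How can I get the video titles from a playlist?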

import requests
from bs4 import BeautifulSoup

url = 'https://www.youtube.com/playlist?list=PLuDh46ey2oy-qmIqPH0o1ZUZ9BFuqvtBn'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
quotes = soup.find_all('a', class_='yt-simple-endpoint style-scope ytd-playlist-video-renderer')

for quote in quotes:
    print(quote.text)

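As you may have read on Stack Overflow, and as you mentioned, YouTube uses JavaScript, so you can try the selenium package. It automates a browser, which lets you extract data from the rendered page; see its documentation for more details.

The code is as follows: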

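from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Replace with the path to your chromedriver executable
path = "path/to/chromedriver"
# Selenium 4 API: the driver path is passed through a Service object
driver = webdriver.Chrome(service=Service(path))

url = 'https://www.youtube.com/playlist?list=PLuDh46ey2oy-qmIqPH0o1ZUZ9BFuqvtBn'
driver.get(url)  # get() returns None, so there is nothing to assign

# The title link of each playlist entry has the id "video-title"
main_a = driver.find_elements(By.ID, "video-title")
lst = []

for a in main_a:
    lst.append(a.get_attribute("aria-label"))
print(lst)

driver.quit()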

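The main reason your code produces no results is that soup = BeautifulSoup(response.text, 'lxml') does not contain the tags you are searching for.

You can check this with print(soup.prettify()).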
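To make that check more concrete, here is a minimal sketch that counts how many tags in an HTML string carry a given class. It uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages. The helper name count_tags_with_class and both sample strings are made up for illustration; the second string only imitates the kind of script-only shell that a JavaScript-rendered page tends to return.

```python
from html.parser import HTMLParser


class _ClassCounter(HTMLParser):
    """Counts start tags of one name whose class attribute contains a given class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # A valueless class attribute is reported as None, hence the "or".
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def count_tags_with_class(html, tag, css_class):
    """Return how many <tag> elements in html have css_class among their classes."""
    parser = _ClassCounter(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical samples: a static page with real links, and a script-only shell.
static_page = (
    '<div><a class="quote big" href="/1">first</a>'
    '<a class="quote" href="/2">second</a></div>'
)
js_shell = "<html><body><script>var ytInitialData = {};</script></body></html>"

print(count_tags_with_class(static_page, "a", "quote"))
print(count_tags_with_class(js_shell, "a", "yt-simple-endpoint"))
```

Running the same count on response.text from the question would show whether the class you are searching for is present in the downloaded HTML at all.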

I suggest using pytube to extract the playlist titles:

import re
from pytube import YouTube
from pytube import Playlist


playlist = Playlist("https://www.youtube.com/playlist?list=PLuDh46ey2oy-qmIqPH0o1ZUZ9BFuqvtBn")
playlist._video_regex = re.compile(r"\"url\":\"(/watch\?v=[\w-]*)")
print('Number of videos in playlist: %s' % len(playlist.video_urls))
for url in playlist.video_urls:
    yt = YouTube(url)
    print(yt.title)
# Output:
# Busta Rhymes - Touch It (TikTok Remix) Lyrics | touch it clean busta rhymes remix tik tok
# Shaggy - Boombastic (Official Music Video)
# Nobody - Mitski (slowed + reverb)
# Mitski - Nobody (Official Video)
# ...truncated

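This answer is an enhancement of Bhavya Parikh's answer, which uses Selenium. The code below adds the scrolling that I mentioned in my comment on Bhavya's answer.

There are several ways to scroll down a page; this answer shows one of them. The code also uses headless mode, so the Chrome browser window is not displayed.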

from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys


chrome_options = Options()
chrome_options.add_argument("--disable-infobars")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument('--headless')

# window size as an argument is required in headless mode
chrome_options.add_argument('--window-size=1920,1080')

# Hide the "Chrome is being controlled by automated test software" banner
chrome_options.add_experimental_option("useAutomationExtension", False)
chrome_options.add_experimental_option("excludeSwitches", ['enable-automation'])

# Selenium 4 API: the driver path is passed through a Service object
service = Service('/usr/local/bin/chromedriver')
driver = webdriver.Chrome(service=service, options=chrome_options)

url = 'https://www.youtube.com/playlist?list=PLuDh46ey2oy-qmIqPH0o1ZUZ9BFuqvtBn'
driver.get(url)  # get() returns None, so there is nothing to assign
driver.implicitly_wait(15)

# finds the body tag
elem = driver.find_element(By.TAG_NAME, "body")

# you can also use the html tag
# elem = driver.find_element(By.TAG_NAME, "html")

# press Page Down repeatedly so that more playlist entries get loaded
no_of_pagedowns = 100
while no_of_pagedowns:
    elem.send_keys(Keys.PAGE_DOWN)
    sleep(0.2)
    no_of_pagedowns -= 1

title_tags = driver.find_elements(By.ID, "video-title")
video_titles = []
for title_tag in title_tags:
    video_titles.append(title_tag.get_attribute("aria-label"))

# do something with the list of titles.

# quit() ends the whole browser session, not just the current window
driver.quit()
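
A fixed number of page-downs may not reach the end of a long playlist, and it wastes time on a short one. One possible alternative, sketched below, is to keep scrolling until the number of loaded items stops growing. This is only an illustration under stated assumptions: scroll_until_stable, its parameters, and the FakePage stand-in are hypothetical names, and the fake page simply pretends that each scroll loads 100 more items so the loop can be run without a browser.

```python
from time import sleep


def scroll_until_stable(get_count, scroll_once, pause=0.5, patience=3, max_rounds=500):
    """Scroll until the item count stops growing and return the final count.

    get_count   -- callable returning how many items are currently loaded
    scroll_once -- callable that scrolls the page by one step
    pause       -- seconds to wait after each scroll so new items can load
    patience    -- consecutive rounds without growth before giving up
    max_rounds  -- hard upper bound on the number of scroll steps
    """
    last = get_count()
    stalled = 0
    for _ in range(max_rounds):
        scroll_once()
        sleep(pause)
        current = get_count()
        if current > last:
            last = current
            stalled = 0
        else:
            stalled += 1
            if stalled >= patience:
                break
    return last


class FakePage:
    """Hypothetical stand-in for a browser page: 100 more items per scroll."""

    def __init__(self, total):
        self.total = total
        self.loaded = min(100, total)

    def scroll(self):
        self.loaded = min(self.loaded + 100, self.total)


page = FakePage(total=250)
print(scroll_until_stable(lambda: page.loaded, page.scroll, pause=0))
```

With the Selenium driver from this answer, get_count could return the number of elements found with the id video-title, and scroll_once could send Keys.PAGE_DOWN to the body element as the loop above does. The pause and patience values would need tuning to how quickly the page loads new entries.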

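Thanks, but for some reason it only returns the first 100 titles.

OK, let me check. Thanks for the suggestion.

@BhavyaParikh The reason you only get 100 items is that you need to scroll down to the end of the playlist.

You're awesome, @MaxMasendych. Thank you!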