How to download audio from a video link in Python

I have a list of links that I am iterating over, shown below:

https://www.loc.gov/item/2015669100/
https://www.loc.gov/item/2015669101/
https://www.loc.gov/item/2015669102/
https://www.loc.gov/item/2015669103/
https://www.loc.gov/item/2015669104/
https://www.loc.gov/item/2015669105/
https://www.loc.gov/item/2015669106/
https://www.loc.gov/item/2015669107/
https://www.loc.gov/item/2015669108/
https://www.loc.gov/item/2015669109/
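If you open these links, you can see that each page has a video and a downloadable XML file. My task is to download the audio from the video, along with the XML file, from each page.

My question is: how do I get the audio out of these videos?

Here is my code so far: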

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

base_html = "https://www.loc.gov/collections/civil-rights-history-project/?sp={}"

# Walk result pages 1 through 7 of the collection.
for i in range(1, 8):
    html = base_html.format(i)
    req = Request(html, headers={'User-Agent': 'Mozilla/5.0'})
    soup = BeautifulSoup(urlopen(req).read(), 'html.parser')

    # Each result entry sits in a div with class "item-description".
    pages = soup.find_all('div', attrs={'class': 'item-description'})
    for div in pages:
        crawl_p = div.find('a')['href']  # href of the first link in the entry
        # some logic here


Looking at the site, the video and audio appear to be streamed from an m3u8 URL as transport stream segments (.ts files).
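To make the relationship between an m3u8 playlist and its .ts segments concrete, here is a minimal sketch that pulls the segment URLs out of playlist text. The playlist contents, the example.com URL, and the function name segment_urls are made up for illustration. A real playlist from the site may instead be a master playlist that points at further playlists, which this sketch does not handle.

```python
from urllib.parse import urljoin

# Hypothetical playlist text, invented for illustration; a real one would be
# fetched from the m3u8 URL.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
segment0.ts
#EXTINF:10.0,
segment1.ts
#EXT-X-ENDLIST
"""


def segment_urls(playlist_text, playlist_url):
    """Return absolute URLs for every URI line in a media playlist."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments, not segment URIs.
        if line and not line.startswith("#"):
            # Relative URIs are resolved against the playlist's own URL.
            urls.append(urljoin(playlist_url, line))
    return urls


print(segment_urls(SAMPLE_PLAYLIST, "https://example.com/stream/default.m3u8"))
```

ffmpeg performs this resolution and the segment downloads itself, so the sketch is only meant to show what the playlist contains.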

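The source tag contains the URL of the m3u8 stream. Identify the tag by one of its attributes (here, the type attribute).

Looking the tag up this way yields the m3u8 URL: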
https://tile.loc.gov/streaming-services/iiif/service:afc:afc2010039:afc2010039_crhp0001:afc2010039_crhp0001_mv04/full/full/0/full/default.m3u8
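The attribute-based lookup can also be sketched with only the standard library, swapping BeautifulSoup for html.parser.HTMLParser. The sample HTML, the example.com URLs, and the names SourceFinder and find_m3u8_urls are assumptions for illustration; the real page markup may differ.

```python
from html.parser import HTMLParser


class SourceFinder(HTMLParser):
    """Collect the src of <source> tags whose type marks an m3u8 playlist."""

    def __init__(self, wanted_type="application/x-mpegURL"):
        super().__init__()
        self.wanted_type = wanted_type
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Match on the type attribute, then read the src attribute.
        if tag == "source" and attr_map.get("type") == self.wanted_type:
            src = attr_map.get("src")
            if src:
                self.urls.append(src)


# Hypothetical markup, invented for illustration.
SAMPLE_HTML = """
<video controls>
  <source src="https://example.com/clip.mp4" type="video/mp4">
  <source src="https://example.com/stream/default.m3u8" type="application/x-mpegURL">
</video>
"""


def find_m3u8_urls(html):
    """Return the src of every matching <source> tag in the HTML text."""
    finder = SourceFinder()
    finder.feed(html)
    return finder.urls


print(find_m3u8_urls(SAMPLE_HTML))
```

Matching on the type attribute rather than on position means the lookup still works if the page lists other source tags first.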

ffmpeg can download a stream (video or audio) from an m3u8 file, and it can also be run from Python.

subprocess.call(['ffmpeg','-i',m3u8_url,'-vn','-map','a','output.ts' ])
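Don't forget to wrap the command in [] so that it is passed as a list. Each quoted string is one argument, corresponding to one space-separated word of the command. The full ffmpeg command is
ffmpeg -i m3u8_url -vn -map a output.ts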
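As a quick check of how the list form corresponds to the command line, the sketch below splits the shell-style string with shlex.split and compares it to the list. The example.com URL is a placeholder, not the real stream address.

```python
import shlex

m3u8_url = "https://example.com/stream/default.m3u8"  # placeholder URL

# The list form passed to subprocess.call: one element per argument.
as_list = ["ffmpeg", "-i", m3u8_url, "-vn", "-map", "a", "output.ts"]

# The same command as it would be typed in a shell.
as_string = f"ffmpeg -i {m3u8_url} -vn -map a output.ts"

# shlex.split breaks the string into arguments the way a POSIX shell would.
print(shlex.split(as_string) == as_list)
```

Passing the list directly avoids any quoting problems if the URL or the output filename contains spaces or shell metacharacters.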

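Here is the full code. Just make sure ffmpeg is on your PATH, otherwise subprocess will raise an error. The files are large, so downloading the audio may take some time.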

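import subprocess
from bs4 import BeautifulSoup
import requests as r

sess = r.session()
site_url = "https://www.loc.gov/item/2015669100/"
request = sess.get(site_url)
#print(request.content)

# The 'html5lib' parser is a separate package and must be installed.
soup = BeautifulSoup(request.content, 'html5lib')
# Find the <source> tag that points at the m3u8 playlist and read its src.
m3u8_url = soup.find('source', attrs={'type': 'application/x-mpegURL'})['src']
print(str(m3u8_url))

# -vn disables video output, so only audio is written to output.ts.
subprocess.call(['ffmpeg', '-i', m3u8_url, '-vn', '-map', 'a', 'output.ts'])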
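Since the question iterates over many item links, one way to extend the code above is to name each output file after its item and to check for ffmpeg before calling it. This is only a sketch: the helper names item_id, build_ffmpeg_command, and download_audio are hypothetical, and the example.com stream URL is a placeholder for the value scraped from each page.

```python
import shutil
import subprocess


def item_id(item_url):
    """Take the last path component of an item URL, e.g. '2015669100'."""
    return item_url.rstrip("/").rsplit("/", 1)[-1]


def build_ffmpeg_command(m3u8_url, item_url):
    """Build the audio-only ffmpeg command, naming the output after the item."""
    return ["ffmpeg", "-i", m3u8_url, "-vn", "-map", "a",
            f"{item_id(item_url)}.ts"]


def download_audio(m3u8_url, item_url):
    """Run ffmpeg if it is on PATH; return its exit code, or None if missing."""
    if shutil.which("ffmpeg") is None:
        print("ffmpeg was not found on PATH; skipping", item_url)
        return None
    return subprocess.call(build_ffmpeg_command(m3u8_url, item_url))


# Only print the command here; download_audio would start a real download.
print(build_ffmpeg_command("https://example.com/stream/default.m3u8",
                           "https://www.loc.gov/item/2015669100/"))
```

Naming outputs by item keeps one run from overwriting another's output.ts, and the shutil.which check turns a missing ffmpeg into a readable message instead of an exception from subprocess.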

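Welcome to Stack Overflow! Please take a minute to read ... Where is your research effort? Have you tried searching Google for a solution? If so, what did you try to implement, and where did it go wrong?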