Python: Why can't I get the track titles from a URL?
I'm trying to write a Python script that uses BeautifulSoup to get the track titles from an archive.org page (the URL is in the script below). I'd like to be able to output:
391106 - Bruce-Partington Plans
400311 - The Retired Colourman
But I can't find the tags. Here is my script:
#!/usr/bin/env python
import getopt, sys
# screen scraping stuff
import urllib2
import re
from bs4 import BeautifulSoup
def usage ( msg ):
    print """
usage: get_titles_sherlockholmes_basil.py
%s
""" % ( msg )
#end usage
def output_html ( url ):
    soup = BeautifulSoup(urllib2.urlopen( url ).read())
    #title = soup.find_all("div", class_="ttl")
    #titles = soup.find_all(class_="ttl")
    #titles = soup.find_all('<div class="ttl">')
    #titles = soup.select("div.ttl")
    #titles = soup.find_all("div", attrs={"class": "ttl"})
    #titles = soup.find_all("div", class_="jwrow")
    #titles = soup.find_all("div", id="jw6_list")
    titles = soup.find_all(id="jw6_list")
    for title in titles:
        print "%s <br>\n" % title
# end output_html
url = 'http://archive.org/details/HQSherlockRathboneTCS'
output_html ( url )
print "<br>-------------------<br>"
sys.exit()
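I know I'm doing something wrong. Thanks for any help.

The problem is that the playlist is built in the browser by JavaScript, so the div elements you are searching for are not in the HTML that urllib2 downloads. The actual track list is in a JavaScript array inside a script tag: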
<script type="text/javascript">
Play('jw6',
[{"title":"1. 391106 - Bruce-Partington Plans","image":"/download/HQSherlockRathboneTCS/391106.png","duration":1764,"sources":[{"file":"/download/HQSherlockRathboneTCS/391106.mp3","type":"mp3","height":"0","width":"0"}],"tracks":[{"file":"https://archive.org/stream/HQSherlockRathboneTCS/391106.png&vtt=vtt.vtt","kind":"thumbnails"}]},
{"title":"2. 400311 - The Retired Colourman","image":"/download/HQSherlockRathboneTCS/400311.png","duration":1755,"sources":[{"file":"/download/HQSherlockRathboneTCS/400311.mp3","type":"mp3","height":"0","width":"0"}],"tracks":[{"file":"https://archive.org/stream/HQSherlockRathboneTCS/400311.png&vtt=vtt.vtt","kind":"thumbnails"}]},
...
{"title":"32. 460204 - The Cross of Damascus","image":"/download/HQSherlockRathboneTCS/460204.png","duration":"1720.07","sources":[{"file":"/download/HQSherlockRathboneTCS/460204.mp3","type":"mp3","height":"0","width":"0"}],"tracks":[{"file":"https://archive.org/stream/HQSherlockRathboneTCS/460204.png&vtt=vtt.vtt","kind":"thumbnails"}]}],
{"start":0,"embed":null,"so":false,"autoplay":false,"width":0,"height":0,"audio":true,"responsive":true,"expand4wideVideos":false,"flash":false,"startPlaylistIdx":0,"identifier":"HQSherlockRathboneTCS","collection":"oldtimeradio","waveformer":"jw-holder","hide_list":false});
</script>
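Since the array passed to Play is valid JSON, one way to read it is to locate the Play('jw6', call in the page source and hand the text that follows to the json module. The following is only a sketch under some assumptions: it is written for Python 3 rather than the Python 2 of the question, the function name extract_titles and the shortened sample string are made up for illustration, and it assumes the page still embeds the playlist in this form.

```python
import json
import re


def extract_titles(html):
    """Return the "title" of each entry in the array passed to Play('jw6', [...])."""
    # Find the start of the call; match.end() then points at the opening "[".
    match = re.search(r"Play\('jw6',\s*", html)
    if match is None:
        return []
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here, the options object and ");").
    playlist, _end = json.JSONDecoder().raw_decode(html, match.end())
    return [entry["title"] for entry in playlist]


# Shortened stand-in for the real page source (illustrative only).
sample = """<script type="text/javascript">
Play('jw6',
[{"title":"1. 391106 - Bruce-Partington Plans","duration":1764},
{"title":"2. 400311 - The Retired Colourman","duration":1755}],
{"start":0});
</script>"""

for title in extract_titles(sample):
    print(title)
```

To run it against the live page, the source could be fetched with urllib.request.urlopen(url).read().decode("utf-8") and passed to extract_titles. If the text after the call is not valid JSON, raw_decode raises json.JSONDecodeError, so a caller may want to catch that.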
Extracting the titles from that array prints:
1. 391106 - Bruce-Partington Plans
2. 400311 - The Retired Colourman
3. 440515 - Adventure Of The Missing Bloodstain
4. 450326 - The Book of Tobit
5. 450402 - The Amateur Mendicant Society
...
30. 460121 - Telltale Pigeon Feathers
31. 460128 - Sweeney Todd, Demon Barber
32. 460204 - The Cross of Damascus