
Python: skipping blank entries in BeautifulSoup


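I'm currently trying to use BeautifulSoup to scrape data from 1001TrackLists (a website that lists the tracks played in DJ mixes).

If a track in a mix has no ID, 1001TrackLists leaves it as "ID - ID" in the data table. That row shows up as a blank entry in the scraped markup and breaks my for loop.

How can I get Python to skip the "blank" IDs in the tracklist and carry on scraping the data that follows them?

My code so far is: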


headers = {'User-Agent': 'Chrome/51.0.2704.103'}
page_link  = 'https://www.1001tracklists.com/tracklist/7mzt0y9/boddika-joy-orbison-rinse-fm-hessle-audio-cover-show-2014-01-16.html'
page_response = requests.get(page_link, headers=headers)
soup = bs(page_response.content, "html.parser")

tracknumbers = []
tracknames = []
artistnames = []
mixnames = []
dates = []


tracknames_scrape = soup.find_all("div", class_="tlToogleData", div=True)
artistnames_scrape = soup.find_all("meta", itemprop="byArtist")

for (i, track) in enumerate(tracknames_scrape):
    tracknumbers.append(i+1)
    trackname = track.meta['content']
    tracknames.append(trackname)
    print(str(i+1) + str(". ") + trackname)

At the moment I can return every track up to the first blank entry, at which point I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-de6ecd3caa59> in <module>
      1 for (i, track) in enumerate(tracknames_scrape):
      2     tracknumbers.append(i+1)
----> 3     trackname = track.meta['content']

TypeError: 'NoneType' object is not subscriptable

If I use a URL whose tracklist has no blank track IDs, the script works perfectly.

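Use the CSS selector below to get the track names. It matches the itemprop="name" elements directly, so rows without one never enter the loop.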

import requests
from bs4 import BeautifulSoup as bs
headers = {'User-Agent': 'Chrome/51.0.2704.103'}
page_link  = 'https://www.1001tracklists.com/tracklist/7mzt0y9/boddika-joy-orbison-rinse-fm-hessle-audio-cover-show-2014-01-16.html'
page_response = requests.get(page_link, headers=headers)
soup = bs(page_response.content, "html.parser")

tracknumbers = []
tracknames = []
artistnames = []
mixnames = []
dates = []


tracknames_scrape =soup.select('div[itemprop="tracks"]>[itemprop="name"]')
#artistnames_scrape = soup.find_all("meta", itemprop="byArtist")

for (i, track) in enumerate(tracknames_scrape):
    tracknumbers.append(i+1)
    trackname = track['content']
    tracknames.append(trackname)
    print(str(i+1) + str(". ") + trackname)
Output

1. Soft Machine - Snodland
2. Craig Leon - The Customs Of The Age Disturbed
3. Seven Davis Jr. - Thanks
4. Gadi Mizrahi - I'll Set Your House
5. Baby Ford & The iFach Collective - Word For Word
6. Panzer Knacker - Rollin' On The Side Of Psycho
7. 69 - Poi Beats
8. Midi Rain - Shine (DJ Pierre Chicago House Mix)
9. Sunpeople - Check Your Buddha (Sven Väth Remix)
10. Eduardo De La Calle - Madhusudhana
11. Aardvarck - The Antdance
12. Boddika & Joy Orbison - In Here
13. Mike Parker - Lustrations Eight (Contours)
14. Peter Van Hoesen - Axis Mundi
15. Sleeparchive - Bleep 01
16. Conforce - When It Appeared
17. Brommage Dub - Fettwise
18. Matrixxman - Protocol
19. JuJu & Jordash - Powwow
20. Gesloten Cirkel - Yamagic
21. Mike Dehnert - Mischkaa
22. Jerome Sydenham & Joe Claussell - Rhythm
23. Ratchett Traxxx - Nut On U
24. Kenny Dope & Terry Hunter pres. Mass Destruction - No Hook
25. Radio Slave - Don't Stop No Sleep
26. Truncate - Focus
27. Maurizio - Domina (Maurizio Mix Edit)
28. Shed - Atmo - Action
29. AFX - Boxing Day
30. Boddika & Joy Orbison - More Maim


I left out my imports, but yes, apart from that everything here runs as is.

Can you post the stack trace showing where the code fails?

Just wrap trackname = track.meta['content'] in a try/except, with continue in the except clause. Alternatively, check the track before you access it, e.g. if track.meta is not None.

Add a test before you process the information. Something like if track.meta:.

Can you post an example of the desired output?
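
The guard suggested in the comments could look roughly like the sketch below. It runs against a small inline HTML sample rather than the live page, and that markup is an assumption made for illustration: the real 1001TrackLists page is not reproduced here, and the extract_tracknames helper is a hypothetical name. The point is only that track.meta is None for a row without a <meta> tag, so checking it before subscripting avoids the TypeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (assumption):
# the second row is an "ID - ID" entry and has no <meta> tag.
SAMPLE_HTML = """
<div class="tlToogleData"><meta itemprop="name" content="Soft Machine - Snodland"></div>
<div class="tlToogleData"><span>ID - ID</span></div>
<div class="tlToogleData"><meta itemprop="name" content="Craig Leon - The Customs Of The Age Disturbed"></div>
"""


def extract_tracknames(html):
    """Return the track names, skipping rows that have no usable <meta> tag."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for row in soup.find_all("div", class_="tlToogleData"):
        meta = row.meta  # None when the row contains no <meta> element
        if meta is None or not meta.get("content"):
            continue  # blank entry: skip it and keep going
        names.append(meta["content"])
    return names


tracknames = extract_tracknames(SAMPLE_HTML)
for number, name in enumerate(tracknames, start=1):
    print(f"{number}. {name}")
```

Because the numbering is applied after filtering, the printed numbers stay consecutive. If the position within the original tracklist matters, enumerate the rows before skipping instead.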