Dataframe, BeautifulSoup: how to deliberately add None when an element is not found
Tags: dataframe, web-scraping, beautifulsoup

How do I deliberately add [None] when an element is not found? I have an element that is sometimes present and sometimes absent.
Current output in the df:
name                        tag
ZX Torsion                  Releasing Soon
Campus                      Restock
Campus                      Restock
Consortium Runner Mid 4D    Sold out
Ozweego                     Sold out
Ozweego                     Sold out
Yeezy Boost 350 V2 Infant   Sold out
Yeezy Boost 350 V2 Kids     Sold out
Yeezy Boost 350 V2          Sold out
Yung-1                      Sold out
Yung 1                      Sold out
A.R. Trainer                Sold out
A.R. Trainer                Sold out
Desired output:
name                        tag
ZX Torsion                  Releasing Soon
Campus                      Restock
Campus                      Restock
Consortium Runner Mid 4D    null
Ozweego                     null
Ozweego                     null
Yeezy Boost 350 V2 Infant   Sold out
Yeezy Boost 350 V2 Kids     Sold out
Yeezy Boost 350 V2          Sold out
Yung-1                      null
Yung 1                      null
A.R. Trainer                null
A.R. Trainer                null
....and so on
Working code:
import requests
from bs4 import BeautifulSoup as bs
from selenium import webdriver  # needed for webdriver.Chrome below
import pandas as pd

urls = [
    'https://www.nakedcph.com/sneakers-by-adidas/s/37'
]
baseURL = 'https://www.nakedcph.com'
final = []

with requests.Session() as s:
    for url in urls:
        driver = webdriver.Chrome('/Users/Documents/python/Selenium/bin/chromedriver')
        driver.get(url)
        soup = bs(driver.page_source, 'lxml')
        items = soup.findAll("div", {"class" : lambda L: L and L.startswith('col-6 col-md-3 mb-5')})
        # One name per product card
        name = [item.find('span',{'class':'product-name d-block'}).text.strip() for item in items]
        # Ribbons are collected page-wide, so this list is shorter than name
        tag = [item.find('svg').next_sibling.strip() for item in soup.findAll('div',{'class':'card-ribbon'})]
        results = list(zip(name,tag))
        df = pd.DataFrame(results)
        driver.quit()
df
You can use try/except. I have never worked it into a list comprehension; I may go back and try that:
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs  # needed for bs(...) below
from selenium import webdriver

urls = [
    'https://www.nakedcph.com/sneakers-by-adidas/s/37'
]
baseURL = 'https://www.nakedcph.com'
final = []

with requests.Session() as s:
    for url in urls:
        driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
        driver.get(url)
        soup = bs(driver.page_source, 'lxml')
        items = soup.findAll("div", {"class" : lambda L: L and L.startswith('col-6 col-md-3 mb-5')})
        name = []
        tag = []
        # Look up the name and the ribbon inside the same card so they stay aligned
        for each in items:
            name.append(each.find('span',{'class':'product-name d-block'}).text.strip())
            try:
                tag.append(each.find('svg').next_sibling.strip())
            except (AttributeError, TypeError):
                # No <svg> in this card, or no text after it: record a missing tag
                tag.append(None)
        results = list(zip(name,tag))
        df = pd.DataFrame(results)
        driver.quit()
Output:
print (df)
    0                           1
0   ZX Torsion                  Releasing Soon
1   Campus                      Restock
2   Campus                      Restock
3   Consortium Runner Mid 4D    None
4   Ozweego                     None
5   Ozweego                     None
6   Yeezy Boost 350 V2 Infant   Sold out
7   Yeezy Boost 350 V2 Kids     Sold out
8   Yeezy Boost 350 V2          Sold out
9   Yung-1                      None
10  Yung 1                      None
11  A.R. Trainer                None
12  A.R. Trainer                None
13  Adilette Pride              None
14  Supercourt                  None
15  Supercourt RX               None
16  ZX 4000 4D                  None
17  Yeezy Boost 700 V2          Sold out
18  Yeezy Boost 350 V2 Infant   Sold out
19  Yeezy Boost 350 V2 Kids     Sold out
20  Yeezy Boost 350 V2          Sold out
21  Yeezy Boost 700 V2          Sold out
22  Yeezy Boost 700 V2 Kids     Sold out
23  Yeezy Boost 700 V2 Infant   Sold out
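
If you do want the list-comprehension form, one option is to move the "missing element" check into a small helper that returns None. The sketch below is not tested against the live site: the HTML string, the `item` class and the `ribbon_text` helper are made up for illustration, and it uses `html.parser` so it runs without Selenium or lxml. The idea is the same as the loop above: take the name and the tag from the same card, so zip never pairs a name with another product's ribbon.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the product grid; the real page differs.
HTML = """
<div class="item"><span class="product-name d-block"> ZX Torsion </span>
  <div class="card-ribbon"><svg></svg>Releasing Soon</div></div>
<div class="item"><span class="product-name d-block">Ozweego</span></div>
<div class="item"><span class="product-name d-block">Yung-1</span>
  <div class="card-ribbon"><svg></svg> Sold out </div></div>
"""


def ribbon_text(item):
    """Return the text right after the item's <svg>, or None if it is missing."""
    svg = item.find('svg')
    if svg is None:
        return None
    sibling = svg.next_sibling
    # next_sibling may be None or another Tag; only a text node counts as a label.
    if not isinstance(sibling, str):
        return None
    return sibling.strip() or None


soup = BeautifulSoup(HTML, 'html.parser')
items = soup.find_all('div', class_='item')

# Name and tag come from the same item, so the rows cannot drift apart.
rows = [
    (item.find('span', class_='product-name').text.strip(), ribbon_text(item))
    for item in items
]
print(rows)
```

The `rows` list can be passed to `pd.DataFrame(rows, columns=['name', 'tag'])` in the same way as `results` above.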
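Comments:
@chitown98 Is there a better way to tag this type of question than dataframe? I am just trying to improve my questions for future use.
You could add it as a tag. I think the title and description of the question are fine. You also provided the output you are getting and the output you are trying to achieve, along with complete code to reproduce it. Like I said, I think the question is good.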