Python: print class names to a dictionary based on a condition - BS4
I'm trying to scrape data from a website, store it in a dictionary, and write the results to a CSV table in a structured format. So far my code looks like this and works almost the way I want:
import requests
from bs4 import BeautifulSoup
import csv
URL = "https://database.globalreporting.org/reports/49283/"
r = requests.get(URL, verify=False)
soup = BeautifulSoup(r.content, 'html5lib')
# print(soup.prettify())
table = soup.findAll('li', attrs={'class': 'list-group-item'})
print(table)
quotes = []
for row in table:
    quote = {}
    quote['Label'] = " ".join(row.getText().split())
    quotes.append(quote)
    for line in row.select('span[class]'):
        if line['class'][0] == 'glyphicon glyphicon-ok text-success':
            quote['Tickmark'] = "Yes"
            quotes.append(quote)
        if line['class'][0] == 'glyphicon glyphicon-remove text-light':
            quote['Cross'] = "No"
            quotes.append(quote)
for quote in quotes:
    print(quote)
filename = 'CSR_Info.csv'
with open(filename, 'w') as f:
    w = csv.DictWriter(f, ['Label','Tickmark','Cross'])
    w.writeheader()
    for quote in quotes:
        w.writerow(quote)
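The problem is that my two if statements never produce a value.
The output looks like this (the columns between the commas stay empty, although I expect Yes/No):
The part of the HTML I am scraping looks like this:
So what I need to check in the if statements is not the text of the element, but the class name itself.
Does anyone know how to do this?
In the end, my result should be: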
Integrated:,Yes,
or, in the "No" case:
Integrated:,,No
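The empty column in each of those target rows comes from csv.DictWriter itself: any field listed in fieldnames but missing from the row dict is written as restval, which defaults to ''. A minimal sketch with two hand-written rows (the rows are hypothetical, chosen to match the desired result) shows that a dict needs only one of 'Tickmark' or 'Cross':

```python
import csv
import io

# Hypothetical rows in the shape the question asks for:
# each dict carries either 'Tickmark' or 'Cross', never both.
rows = [
    {'Label': 'Integrated:', 'Tickmark': 'Yes'},
    {'Label': 'Integrated:', 'Cross': 'No'},
]

buf = io.StringIO()
# Missing keys are filled with restval (default ''), which leaves the
# empty column between the commas. lineterminator is set to '\n' only to
# keep the printed result predictable (the default is '\r\n').
writer = csv.DictWriter(buf, ['Label', 'Tickmark', 'Cross'], lineterminator='\n')
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

So the goal reduces to setting exactly one of those two keys per row.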
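If you print line['class'], you will see that the class names come back as a list, so line['class'][0] is just glyphicon, not glyphicon glyphicon-remove text-light. That is why you are not getting any values.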
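To see this list behaviour without requesting the live site, here is a minimal sketch that parses a hypothetical snippet built from the class names mentioned in this thread (the real page markup is assumed, not copied). It uses the built-in 'html.parser' backend instead of html5lib only to avoid an extra dependency:

```python
from bs4 import BeautifulSoup

# Hypothetical markup using the class names quoted in this thread;
# the live page's exact structure is an assumption.
html = '''
<ul>
  <li class="list-group-item">Integrated: <span class="glyphicon glyphicon-remove text-light"></span></li>
  <li class="list-group-item">SDGs: <span class="glyphicon glyphicon-ok text-success"></span></li>
</ul>
'''
soup = BeautifulSoup(html, 'html.parser')

results = []
for span in soup.select('span[class]'):
    classes = span['class']  # 'class' is multi-valued, so this is a list
    results.append({
        'first': classes[0],                 # only 'glyphicon', never the full string
        'joined': ' '.join(classes),         # rebuilds the original attribute string
        'is_ok': 'glyphicon-ok' in classes,  # membership test on one class name
    })

for r in results:
    print(r)
```

Both spans report 'glyphicon' as their first class, so a comparison against the whole space-separated string can never succeed; joining the list or testing membership of the distinguishing class both work.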
To fix this, I added an if condition that checks whether the list has length 3, and then verified each class name with and conditions:
import requests
from bs4 import BeautifulSoup
import csv
URL = "https://database.globalreporting.org/reports/49283/"
r = requests.get(URL, verify=False)
soup = BeautifulSoup(r.content, 'html5lib')
# print(soup.prettify())
table = soup.findAll('li', attrs={'class': 'list-group-item'})
#print(table)
quotes = []
for row in table:
    quote = {}
    quote['Label'] = " ".join(row.getText().split())
    quotes.append(quote)
    for line in row.select('span[class]'):
        # line['class'] is a list such as ['glyphicon', 'glyphicon-ok', 'text-success']
        if len(line['class']) == 3:
            if line['class'][0] == 'glyphicon' and line['class'][1] == 'glyphicon-ok' and line['class'][2] == 'text-success':
                quote['Tickmark'] = "Yes"
                quotes.append(quote)
            if line['class'][0] == 'glyphicon' and line['class'][1] == 'glyphicon-remove' and line['class'][2] == 'text-light':
                quote['Cross'] = "No"
                quotes.append(quote)
for quote in quotes:
    print(quote)
filename = 'CSR_Info.csv'
# newline='' is what the csv module expects; without it Windows gets blank lines between rows
with open(filename, 'w', newline='') as f:
    w = csv.DictWriter(f, ['Label','Tickmark','Cross'])
    w.writeheader()
    for quote in quotes:
        w.writerow(quote)
Output:
{'Label': 'Publication year: 2017'}
{'Label': 'Report type: GRI - G4'}
{'Label': 'Adherence Level: In accordance - Comprehensive'}
{'Label': 'Sector supplement: Not Applicable'}
{'Label': 'Integrated:', 'Cross': 'No'}
{'Label': 'Integrated:', 'Cross': 'No'}
{'Label': 'GRI Service: Materiality Disclosures Service'}
{'Label': 'Reporting period: ? - ?'}
{'Label': 'Reporting cycle: ?'}
{'Label': 'Language: ?'}
{'Label': 'Number of pages: ?'}
{'Label': 'SDGs:', 'Tickmark': 'Yes'}
{'Label': 'SDGs:', 'Tickmark': 'Yes'}
{'Label': 'CDP:', 'Cross': 'No'}
{'Label': 'CDP:', 'Cross': 'No'}
{'Label': 'IFC:', 'Cross': 'No'}
{'Label': 'IFC:', 'Cross': 'No'}
{'Label': 'OECD Guidelines:', 'Tickmark': 'Yes'}
{'Label': 'OECD Guidelines:', 'Tickmark': 'Yes'}
{'Label': 'UNGC:', 'Tickmark': 'Yes'}
{'Label': 'UNGC:', 'Tickmark': 'Yes'}
{'Label': 'ISO 26000:', 'Cross': 'No'}
{'Label': 'ISO 26000:', 'Cross': 'No'}
{'Label': 'AA1000:', 'Cross': 'No'}
{'Label': 'AA1000:', 'Cross': 'No'}
{'Label': 'Stakeholder Panel/Expert Opinion:', 'Cross': 'No'}
{'Label': 'Stakeholder Panel/Expert Opinion:', 'Cross': 'No'}
{'Label': 'External assurance:', 'Tickmark': 'Yes'}
{'Label': 'External assurance:', 'Tickmark': 'Yes'}
{'Label': 'Type of Assurance Provider: Accountant'}
{'Label': 'Assurance Provider: Pricewaterhouse Coopers'}
{'Label': 'Assurance Scope: Entire sustainability report'}
{'Label': 'Level of Assurance: Limited/moderate'}
{'Label': 'Assurance Standard AA1000AS:', 'Cross': 'No'}
{'Label': 'Assurance Standard AA1000AS:', 'Cross': 'No'}
{'Label': 'Assurance Standard ISAE3000:', 'Tickmark': 'Yes'}
{'Label': 'Assurance Standard ISAE3000:', 'Tickmark': 'Yes'}
{'Label': 'Assurance Standard: national (general):', 'Cross': 'No'}
{'Label': 'Assurance Standard: national (general):', 'Cross': 'No'}
{'Label': 'Assurance Standard: national (sustainability):', 'Cross': 'No'}
{'Label': 'Assurance Standard: national (sustainability):', 'Cross': 'No'}
{'Label': 'Publication year: 2017'}
{'Label': 'Report type: GRI - G4'}
{'Label': 'Adherence Level: In accordance - Comprehensive'}
{'Label': 'Sector supplement: Not Applicable'}
{'Label': 'Integrated:', 'Cross': 'No'}
{'Label': 'Integrated:', 'Cross': 'No'}
{'Label': 'GRI Service: Materiality Disclosures Service'}
{'Label': 'Reporting period: ? - ?'}
{'Label': 'Reporting cycle: ?'}
{'Label': 'Language: ?'}
{'Label': 'Number of pages: ?'}
{'Label': 'SDGs:', 'Tickmark': 'Yes'}
{'Label': 'SDGs:', 'Tickmark': 'Yes'}
{'Label': 'CDP:', 'Cross': 'No'}
{'Label': 'CDP:', 'Cross': 'No'}
{'Label': 'IFC:', 'Cross': 'No'}
{'Label': 'IFC:', 'Cross': 'No'}
{'Label': 'OECD Guidelines:', 'Tickmark': 'Yes'}
{'Label': 'OECD Guidelines:', 'Tickmark': 'Yes'}
{'Label': 'UNGC:', 'Tickmark': 'Yes'}
{'Label': 'UNGC:', 'Tickmark': 'Yes'}
{'Label': 'ISO 26000:', 'Cross': 'No'}
{'Label': 'ISO 26000:', 'Cross': 'No'}
{'Label': 'AA1000:', 'Cross': 'No'}
{'Label': 'AA1000:', 'Cross': 'No'}
{'Label': 'Stakeholder Panel/Expert Opinion:', 'Cross': 'No'}
{'Label': 'Stakeholder Panel/Expert Opinion:', 'Cross': 'No'}
{'Label': 'External assurance:', 'Tickmark': 'Yes'}
{'Label': 'External assurance:', 'Tickmark': 'Yes'}
{'Label': 'Type of Assurance Provider: Accountant'}
{'Label': 'Assurance Provider: Pricewaterhouse Coopers'}
{'Label': 'Assurance Scope: Entire sustainability report'}
{'Label': 'Level of Assurance: Limited/moderate'}
{'Label': 'Assurance Standard AA1000AS:', 'Cross': 'No'}
{'Label': 'Assurance Standard AA1000AS:', 'Cross': 'No'}
{'Label': 'Assurance Standard ISAE3000:', 'Tickmark': 'Yes'}
{'Label': 'Assurance Standard ISAE3000:', 'Tickmark': 'Yes'}
{'Label': 'Assurance Standard: national (general):', 'Cross': 'No'}
{'Label': 'Assurance Standard: national (general):', 'Cross': 'No'}
{'Label': 'Assurance Standard: national (sustainability):', 'Cross': 'No'}
{'Label': 'Assurance Standard: national (sustainability):', 'Cross': 'No'}
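Each row with an icon appears twice in the output above because quotes.append(quote) runs once for the label and again for every matching span, and each append stores a reference to the same dict, so both entries show the same values. A minimal sketch of an alternative (not the answerer's code) builds one dict per list item, appends it exactly once, and matches the icons with CSS class selectors, which ignore class order and extra classes. The markup is a hypothetical stand-in for the live page, and the elif assumes an item never carries both icons:

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the live page; in the real script
# `soup` would be built from requests.get(URL).content as in the question.
html = '''
<ul>
  <li class="list-group-item">Publication year: 2017</li>
  <li class="list-group-item">Integrated: <span class="glyphicon glyphicon-remove text-light"></span></li>
  <li class="list-group-item">SDGs: <span class="glyphicon glyphicon-ok text-success"></span></li>
</ul>
'''


def parse_rows(soup):
    """Return one dict per list item, without duplicates."""
    rows = []
    for item in soup.select('li.list-group-item'):
        row = {'Label': ' '.join(item.get_text().split())}
        # 'span.glyphicon-ok' matches if that class appears anywhere in the
        # span's class list, regardless of order or additional classes.
        if item.select_one('span.glyphicon-ok') is not None:
            row['Tickmark'] = 'Yes'
        elif item.select_one('span.glyphicon-remove') is not None:
            row['Cross'] = 'No'
        rows.append(row)  # appended exactly once per <li>
    return rows


soup = BeautifulSoup(html, 'html.parser')
rows = parse_rows(soup)

buf = io.StringIO()
writer = csv.DictWriter(buf, ['Label', 'Tickmark', 'Cross'], lineterminator='\n')
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

To write the real file instead, swap the StringIO for open('CSR_Info.csv', 'w', newline='') and drop the lineterminator argument if the default '\r\n' is acceptable.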