Python: printing class names to a dictionary based on a condition (BS4)


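I am trying to scrape data from a website, store it in a dictionary, and write the results to a CSV file in a structured format. So far my code looks like this, and it almost works the way I want: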

import requests
from bs4 import BeautifulSoup
import csv

URL = "https://database.globalreporting.org/reports/49283/"
r = requests.get(URL, verify=False)

soup = BeautifulSoup(r.content, 'html5lib')
# print(soup.prettify())
table = soup.findAll('li', attrs={'class': 'list-group-item'})
print(table)

quotes = []

for row in table:
    quote = {}
    quote['Label'] = " ".join(row.getText().split())
    quotes.append(quote)
    for line in row.select('span[class]'):
        if line['class'][0] == 'glyphicon glyphicon-ok text-success':
            quote['Tickmark'] = "Yes"
            quotes.append(quote)
        if line['class'][0] == 'glyphicon glyphicon-remove text-light':
            quote['Cross'] = "No"
            quotes.append(quote)

for quote in quotes:
    print(quote)

filename = 'CSR_Info.csv'
with open(filename, 'w') as f:
    w = csv.DictWriter(f, ['Label','Tickmark','Cross'])
    w.writeheader()
    for quote in quotes:
        w.writerow(quote)
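The problem is that my two if statements never produce a value.

The output looks like this (nothing between the commas, although I expected Yes/No):

The part of the HTML I am scraping looks like this:

So what I need is not the text inside the element but the class name itself, so that I can test it in the if statements.

Does anyone know how to do this?

In the end, my result should be: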

Integrated:,Yes, 
or, in the "No" case:

Integrated:,,No 

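If you print line['class'], you will see that the class names are returned as a list, so line['class'][0] is 'glyphicon' rather than 'glyphicon glyphicon-remove text-light'. That is why you were not getting any value.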
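To make the list behaviour concrete, here is a minimal sketch. The HTML fragment is an assumption modelled on the class names in the question (the real page markup may differ), and it uses the built-in 'html.parser' so it runs without html5lib.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; only the class names are taken from the question.
html = (
    '<li class="list-group-item">SDGs: '
    '<span class="glyphicon glyphicon-ok text-success"></span></li>'
)
soup = BeautifulSoup(html, 'html.parser')
span = soup.select_one('span[class]')

classes = span['class']        # "class" is multi-valued, so this is a list
first = classes[0]             # only the first token, not the whole attribute
joined = " ".join(classes)     # rebuild the full attribute string if needed

print(classes)
print(first)
print(joined == 'glyphicon glyphicon-ok text-success')
```

Comparing classes[0] against the full three-word string can therefore never succeed; compare individual tokens, or join the list first.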

To fix this, I added an if condition that checks that the list has length 3, and then verifies each class name with "and" conditions:

import requests
from bs4 import BeautifulSoup
import csv

URL = "https://database.globalreporting.org/reports/49283/"
r = requests.get(URL, verify=False)

soup = BeautifulSoup(r.content, 'html5lib')
# print(soup.prettify())
table = soup.findAll('li', attrs={'class': 'list-group-item'})
#print(table)

quotes = []

for row in table:
    quote = {}
    quote['Label'] = " ".join(row.getText().split())
    quotes.append(quote)
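    for line in row.select('span[class]'):
        # line['class'] is a list of class tokens, so compare each token separately
        if len(line['class']) == 3:
            if line['class'][0] == 'glyphicon' and line['class'][1] == 'glyphicon-ok' and line['class'][2] == 'text-success':
                quote['Tickmark'] = "Yes"
                quotes.append(quote)  # same dict appended again, so this row repeats in the output
            if line['class'][0] == 'glyphicon' and line['class'][1] == 'glyphicon-remove' and line['class'][2] == 'text-light':
                quote['Cross'] = "No"
                quotes.append(quote)  # same dict appended again, so this row repeats in the output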

for quote in quotes:
    print(quote)

filename = 'CSR_Info.csv'
with open(filename, 'w', newline='') as f:  # newline='' avoids blank rows on Windows
    w = csv.DictWriter(f, ['Label','Tickmark','Cross'])
    w.writeheader()
    for quote in quotes:
        w.writerow(quote)
Output

{'Label': 'Publication year: 2017'}
{'Label': 'Report type: GRI - G4'}
{'Label': 'Adherence Level: In accordance - Comprehensive'}
{'Label': 'Sector supplement: Not Applicable'}
{'Label': 'Integrated:', 'Cross': 'No'}
{'Label': 'Integrated:', 'Cross': 'No'}
{'Label': 'GRI Service: Materiality Disclosures Service'}
{'Label': 'Reporting period: ? - ?'}
{'Label': 'Reporting cycle: ?'}
{'Label': 'Language: ?'}
{'Label': 'Number of pages: ?'}
{'Label': 'SDGs:', 'Tickmark': 'Yes'}
{'Label': 'SDGs:', 'Tickmark': 'Yes'}
{'Label': 'CDP:', 'Cross': 'No'}
{'Label': 'CDP:', 'Cross': 'No'}
{'Label': 'IFC:', 'Cross': 'No'}
{'Label': 'IFC:', 'Cross': 'No'}
{'Label': 'OECD Guidelines:', 'Tickmark': 'Yes'}
{'Label': 'OECD Guidelines:', 'Tickmark': 'Yes'}
{'Label': 'UNGC:', 'Tickmark': 'Yes'}
{'Label': 'UNGC:', 'Tickmark': 'Yes'}
{'Label': 'ISO 26000:', 'Cross': 'No'}
{'Label': 'ISO 26000:', 'Cross': 'No'}
{'Label': 'AA1000:', 'Cross': 'No'}
{'Label': 'AA1000:', 'Cross': 'No'}
{'Label': 'Stakeholder Panel/Expert Opinion:', 'Cross': 'No'}
{'Label': 'Stakeholder Panel/Expert Opinion:', 'Cross': 'No'}
{'Label': 'External assurance:', 'Tickmark': 'Yes'}
{'Label': 'External assurance:', 'Tickmark': 'Yes'}
{'Label': 'Type of Assurance Provider: Accountant'}
{'Label': 'Assurance Provider: Pricewaterhouse Coopers'}
{'Label': 'Assurance Scope: Entire sustainability report'}
{'Label': 'Level of Assurance: Limited/moderate'}
{'Label': 'Assurance Standard AA1000AS:', 'Cross': 'No'}
{'Label': 'Assurance Standard AA1000AS:', 'Cross': 'No'}
{'Label': 'Assurance Standard ISAE3000:', 'Tickmark': 'Yes'}
{'Label': 'Assurance Standard ISAE3000:', 'Tickmark': 'Yes'}
{'Label': 'Assurance Standard: national (general):', 'Cross': 'No'}
{'Label': 'Assurance Standard: national (general):', 'Cross': 'No'}
{'Label': 'Assurance Standard: national (sustainability):', 'Cross': 'No'}
{'Label': 'Assurance Standard: national (sustainability):', 'Cross': 'No'}
{'Label': 'Publication year: 2017'}
{'Label': 'Report type: GRI - G4'}
{'Label': 'Adherence Level: In accordance - Comprehensive'}
{'Label': 'Sector supplement: Not Applicable'}
{'Label': 'Integrated:', 'Cross': 'No'}
{'Label': 'Integrated:', 'Cross': 'No'}
{'Label': 'GRI Service: Materiality Disclosures Service'}
{'Label': 'Reporting period: ? - ?'}
{'Label': 'Reporting cycle: ?'}
{'Label': 'Language: ?'}
{'Label': 'Number of pages: ?'}
{'Label': 'SDGs:', 'Tickmark': 'Yes'}
{'Label': 'SDGs:', 'Tickmark': 'Yes'}
{'Label': 'CDP:', 'Cross': 'No'}
{'Label': 'CDP:', 'Cross': 'No'}
{'Label': 'IFC:', 'Cross': 'No'}
{'Label': 'IFC:', 'Cross': 'No'}
{'Label': 'OECD Guidelines:', 'Tickmark': 'Yes'}
{'Label': 'OECD Guidelines:', 'Tickmark': 'Yes'}
{'Label': 'UNGC:', 'Tickmark': 'Yes'}
{'Label': 'UNGC:', 'Tickmark': 'Yes'}
{'Label': 'ISO 26000:', 'Cross': 'No'}
{'Label': 'ISO 26000:', 'Cross': 'No'}
{'Label': 'AA1000:', 'Cross': 'No'}
{'Label': 'AA1000:', 'Cross': 'No'}
{'Label': 'Stakeholder Panel/Expert Opinion:', 'Cross': 'No'}
{'Label': 'Stakeholder Panel/Expert Opinion:', 'Cross': 'No'}
{'Label': 'External assurance:', 'Tickmark': 'Yes'}
{'Label': 'External assurance:', 'Tickmark': 'Yes'}
{'Label': 'Type of Assurance Provider: Accountant'}
{'Label': 'Assurance Provider: Pricewaterhouse Coopers'}
{'Label': 'Assurance Scope: Entire sustainability report'}
{'Label': 'Level of Assurance: Limited/moderate'}
{'Label': 'Assurance Standard AA1000AS:', 'Cross': 'No'}
{'Label': 'Assurance Standard AA1000AS:', 'Cross': 'No'}
{'Label': 'Assurance Standard ISAE3000:', 'Tickmark': 'Yes'}
{'Label': 'Assurance Standard ISAE3000:', 'Tickmark': 'Yes'}
{'Label': 'Assurance Standard: national (general):', 'Cross': 'No'}
{'Label': 'Assurance Standard: national (general):', 'Cross': 'No'}
{'Label': 'Assurance Standard: national (sustainability):', 'Cross': 'No'}
{'Label': 'Assurance Standard: national (sustainability):', 'Cross': 'No'}
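In the output above, every row that carries an icon appears twice, because the same quote dict is appended once in the outer loop and again inside the inner loop. The following is a minimal sketch of one way to avoid that, not the original answer's code: it appends each row once and tests class tokens with set membership. The HTML fragment and the helper name build_rows are assumptions for illustration, and the CSV is written to an in-memory buffer so the example is self-contained.

```python
import csv
import io
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the class names in the question.
html = """
<ul>
  <li class="list-group-item">Publication year: 2017</li>
  <li class="list-group-item">Integrated: <span class="glyphicon glyphicon-remove text-light"></span></li>
  <li class="list-group-item">SDGs: <span class="glyphicon glyphicon-ok text-success"></span></li>
</ul>
"""

def build_rows(markup):
    """Return one dict per <li>, with Tickmark/Cross set from the icon classes."""
    soup = BeautifulSoup(markup, 'html.parser')
    rows = []
    for item in soup.find_all('li', class_='list-group-item'):
        row = {'Label': " ".join(item.get_text().split())}
        for span in item.select('span[class]'):
            classes = set(span['class'])
            # Subset tests ignore token order and any extra classes.
            if {'glyphicon-ok', 'text-success'} <= classes:
                row['Tickmark'] = 'Yes'
            elif {'glyphicon-remove', 'text-light'} <= classes:
                row['Cross'] = 'No'
        rows.append(row)  # appended exactly once per <li>
    return rows

rows = build_rows(html)

# DictWriter fills missing keys with an empty string by default.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, ['Label', 'Tickmark', 'Cross'])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

With this fragment the "Integrated:" row comes out as Integrated:,,No and the "SDGs:" row as SDGs:,Yes, which is the shape asked for in the question. To write a real file instead, swap the buffer for open(filename, 'w', newline='').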