Parsing an HTML table tag into a dictionary with BeautifulSoup


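I have the following task: to find a specific table, identified by its tag, in an HTML page.

The code is searching for all tables (i.e. soup.find_all('table')). The page has 4 matches, so you need to target the specific table. This can be done using an index. In addition, to extract the result into a dictionary, you also need to scrape the tr and td elements.

The following should scrape the population change from 2000 to 2010 and convert it into a dictionary. (Note: I may have misunderstood what the output should be, but there should be enough here to go on.)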

Output:

{'city': '+2.1%', 'state': '+2.1%', 'country': '+9.7%'}
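
The index-based approach described above can be sketched roughly as follows. This is a minimal sketch, not the answerer's original code: the inline `SAMPLE_HTML`, the table index `1`, and the helper name `population_change_by_index` are assumptions made so that the example is self-contained. On the real page the correct index would have to be found by inspecting the four matching tables.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: two tables, the second one holds the data.
SAMPLE_HTML = """
<table><tr><th>Other table</th></tr><tr><td>ignored</td></tr></table>
<table>
  <tr><th>City compared to State &amp; U.S.</th></tr>
  <tr><td>Population change, 2000 to 2010</td><td>+2.1%</td><td>+2.1%</td><td>+9.7%</td></tr>
</table>
"""

def population_change_by_index(html, table_index, row_title):
    """Pick a table by position, then return the matching row as a dictionary."""
    soup = BeautifulSoup(html, 'html.parser')
    # find_all returns every <table>; the index selects the one we want.
    table = soup.find_all('table')[table_index]
    for tr in table.find_all('tr'):
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        # The first cell is the row title, the next three are the values.
        if cells and cells[0] == row_title:
            return dict(zip(('city', 'state', 'country'), cells[1:4]))
    return None  # no row with that title in the selected table

result = population_change_by_index(SAMPLE_HTML, 1, 'Population change, 2000 to 2010')
print(result)
```

Selecting by index is short, but it silently picks the wrong table if the page layout changes, which is the weakness the alternative approach addresses.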

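Alternative approach

When scraping a website, the HTML can change at any time. For example, the positions of the tables could be swapped. It is therefore advisable to identify the data positively, by its content rather than by its position. One way to do this is to use the table header name and the table row name. For example: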

from urllib.request import urlopen
from bs4 import BeautifulSoup

response = urlopen('file:///C:/Users/User/Documents/Visual%20Studio%202017/DjangoWebProject1/DjangoWebProject1/app/New-York%20(1).html')
table_header = 'City compared to State & U.S.'
table_row_name = 'Population change, 2000 to 2010'

def find_table_by_header(table, header_text):
    # Return the table if its first <th> matches header_text, otherwise None.
    th = table.find('th')
    return table if th is not None and th.text.strip() == header_text else None

def find_tablerow_by_title(table, table_row_name):
    # Return the first <tr> whose first <td> matches table_row_name, otherwise None.
    for tr in table.find_all('tr'):
        td = tr.find('td')
        if td is not None and td.text.strip() == table_row_name:
            return tr
    return None

html = response.read().decode('utf-8')
soup = BeautifulSoup(html, 'html.parser')

# Narrow the search to the collapsible wiki tables, then keep the one whose header matches.
tables = soup.find_all('table', attrs={'class': 'wikitable collapsible collapsed'})
table_city_and_state = next((tbl for tbl in tables if find_table_by_header(tbl, table_header)), None)
if table_city_and_state is None:
    raise ValueError('No table found with header: ' + table_header)
# Look the row up once in the selected table (the original repeated this lookup for every table).
tr_population_change = find_tablerow_by_title(table_city_and_state, table_row_name)
tds_population_change = tr_population_change.find_all('td')

your_dictionary = {
    "city": tds_population_change[1].text.strip(),
    "state": tds_population_change[2].text.strip(),
    "country": tds_population_change[3].text.strip()
}

print(your_dictionary)

Output:

{'city': '+2.1%', 'state': '+2.1%', 'country': '+9.7%'}
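
The same header-and-row-name idea can be extended to collect every row of the matched table instead of a single one. The sketch below is only an illustration under stated assumptions: `SAMPLE_HTML` is a made-up inline page (the "Example row" values are placeholders, not real data), and `table_to_dict` is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; the first table is a decoy with the same row title.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Unrelated table</th></tr>
  <tr><td>Population change, 2000 to 2010</td><td>n/a</td><td>n/a</td><td>n/a</td></tr>
</table>
<table class="wikitable">
  <tr><th>City compared to State &amp; U.S.</th></tr>
  <tr><td>Population change, 2000 to 2010</td><td>+2.1%</td><td>+2.1%</td><td>+9.7%</td></tr>
  <tr><td>Example row</td><td>a</td><td>b</td><td>c</td></tr>
</table>
"""

def table_to_dict(html, header_text, keys=('city', 'state', 'country')):
    """Find the table whose first <th> equals header_text and map each row title to a dict."""
    soup = BeautifulSoup(html, 'html.parser')
    for table in soup.find_all('table'):
        th = table.find('th')
        if th is None or th.get_text(strip=True) != header_text:
            continue  # not the table we are looking for
        rows = {}
        for tr in table.find_all('tr'):
            cells = [td.get_text(strip=True) for td in tr.find_all('td')]
            # Keep only rows that have a title cell plus one value per key.
            if len(cells) > len(keys):
                rows[cells[0]] = dict(zip(keys, cells[1:]))
        return rows
    return {}  # no table with that header

data = table_to_dict(SAMPLE_HTML, 'City compared to State & U.S.')
print(data['Population change, 2000 to 2010'])
```

Because the table is chosen by its header text, the decoy table is skipped even though it contains a row with the same title.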

Can you update your question so that the code reproduces the problem? What does New-York%20(1).html contain?

Yes, it is the web page.