Python: Scrape the names of all officials from tables referenced in a list of URLs
I am trying to get Python to give me the names of state senators and representatives from Ballotpedia. However, the code I wrote only gives me the headings for the URLs I request, and I do not get any names. Here is my current Python code:
import requests
from bs4 import BeautifulSoup
import pandas as pd
list = ['https://ballotpedia.org/Alabama_State_Senate', 'https://ballotpedia.org/Alabama_House_of_Representatives']
temp_dict = {}
for page in list:
    r = requests.get(page)
    soup = BeautifulSoup(r.content, 'html.parser')
    temp_dict[page.split('/')[-1]] = [item.text for item in
        soup.select("table.bptable gray sortable tablesorter jquery-tablesorter a")]
df = pd.DataFrame.from_dict(temp_dict,
    orient='index').transpose()
I think my error is here:
temp_dict[page.split('/')[-1]] = [item.text for item in
    soup.select("table.bptable gray sortable tablesorter jquery-tablesorter a")]
Thank you.
The table has the same index on both pages. Simply use pandas read_html to get the tables and all the results:
import pandas as pd
urls = ['https://ballotpedia.org/Alabama_State_Senate', 'https://ballotpedia.org/Alabama_House_of_Representatives']
appended_data = []
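for page in urls:
    # read_html returns a list of every table on the page; take the one at index 3
    df = pd.read_html(page)[3]
    appended_data.append(df)
# Stack the per-page tables into a single DataFrame
appended_data = pd.concat(appended_data)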
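As for the selector in the question: in a CSS selector a space means "descendant of", so "table.bptable gray sortable ..." looks for a <gray> element inside the table, then a <sortable> element inside that, and so on, which matches nothing. Several classes on one element are chained with dots instead. A minimal sketch of building such a selector follows; the helper name class_selector and the choice of classes are illustrative assumptions, not something taken from the Ballotpedia markup.

```python
def class_selector(tag, class_attr, descendant=""):
    """Build a CSS selector for `tag` elements carrying every class in `class_attr`.

    `class_attr` is the raw, space-separated value of an HTML class attribute.
    (Hypothetical helper, written only for this illustration.)
    """
    classes = class_attr.split()
    selector = tag + "".join("." + name for name in classes)
    # A space in a CSS selector is the descendant combinator, so this
    # appends "any <descendant> nested inside the matched element".
    return f"{selector} {descendant}" if descendant else selector


# Assumption: only these classes are present in the HTML that requests downloads.
selector = class_selector("table", "bptable gray sortable", "a")
print(selector)  # table.bptable.gray.sortable a
```

The resulting string could be passed to soup.select(selector) in the question's loop in place of the original selector. It is worth printing the table tag from the downloaded page first: classes such as tablesorter and jquery-tablesorter are typically added by JavaScript in the browser, so they may be missing from the HTML that requests receives, in which case a selector that requires them would still match nothing.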
Can you share the relevant part of the HTML?
Sure. I am trying to target this: