Reading an HTML table in Python
pd.read_html reads only the first 5 rows of the (0th) table. How can I use pd.read_html to read the entire table?
I tried the following code:
import pandas as pd
import requests
from urllib.error import HTTPError
try:
    url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
    html_data2 = requests.get(url)
    df = pd.read_html(html_data2.text)[0]
    data = df.head()
    print(data)
except HTTPError as http_error:
    print("HTTP error: ", http_error)
You are assigning df.head() to data, and head() returns only the first 5 rows of the DataFrame. Instead, you can do:
url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
html_data2 = requests.get(url)
df = pd.read_html(html_data2.text)[0]
data = df  # not df.head()
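To see the difference without touching the network, here is a small sketch on a synthetic DataFrame. The 563-row frame is made up to mirror the size of the table in the question; it is not the real data.

```python
import pandas as pd

# Synthetic stand-in for the scraped table (hypothetical data, 563 rows).
df = pd.DataFrame({"Version": range(1, 564)})

preview = df.head()      # first 5 rows only (the default is n=5)
first_ten = df.head(10)  # head() also accepts an explicit row count
full = df                # the whole table, nothing cut off

print(len(preview), len(first_ten), len(full))
```

So head() is only a preview helper; the DataFrame returned by pd.read_html already holds every row.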
Also, pandas can read HTML directly from a URL, so you can write:
data = pd.read_html(r"https://clinicaltrials.gov/ct2/history/NCT02954874")[0]
Then place this line inside your try/except block.
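One caveat about the try/except in the question: requests.get does not raise urllib.error.HTTPError, so that handler never fires. Below is a minimal sketch of one way to handle this, assuming you want requests' own exception class together with raise_for_status(). The helper names read_first_table and load_or_report are made up for illustration, and StringIO is used because newer pandas versions warn when read_html is given a literal HTML string.

```python
from io import StringIO

import pandas as pd
import requests


def read_first_table(url):
    """Fetch url and return its first HTML table (hypothetical helper)."""
    response = requests.get(url, timeout=30)
    # requests does not raise on 4xx/5xx responses by itself; this call does.
    response.raise_for_status()
    # Wrap the markup in StringIO so pandas treats it as a file-like object.
    return pd.read_html(StringIO(response.text))[0]


def load_or_report(url):
    """Return the full table, or None after reporting an HTTP error."""
    try:
        return read_first_table(url)
    except requests.exceptions.HTTPError as http_error:
        print("HTTP error:", http_error)
        return None
```

Calling load_or_report("https://clinicaltrials.gov/ct2/history/NCT02954874") would then return the whole DataFrame, or None if the server answers with an error status. Note that pd.read_html also needs an HTML parser such as lxml installed.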
Output with data = df.head():
url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
html_data2 = requests.get(url)
df = pd.read_html(html_data2.text)[0]
data = df.head()
print(data)
   Version    A    B     Submitted Date                               Changes
0        1  NaN  NaN   November 3, 2016  Nothing (earliest Version on record)
1        2  NaN  NaN  November 24, 2016   Contacts/Locations and Study Status
2        3  NaN  NaN  November 28, 2016   Recruitment Status and Study Status
3        4  NaN  NaN  December 15, 2016   Contacts/Locations and Study Status
4        5  NaN  NaN  December 19, 2016   Contacts/Locations and Study Status
Versus, with data = df:
url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
html_data2 = requests.get(url)
df = pd.read_html(html_data2.text)[0]
data = df
print(data)
     Version    A    B     Submitted Date                               Changes
  0        1  NaN  NaN   November 3, 2016  Nothing (earliest Version on record)
  1        2  NaN  NaN  November 24, 2016   Contacts/Locations and Study Status
  2        3  NaN  NaN  November 28, 2016   Recruitment Status and Study Status
  3        4  NaN  NaN  December 15, 2016   Contacts/Locations and Study Status
  4        5  NaN  NaN  December 19, 2016   Contacts/Locations and Study Status
 ..      ...   ..   ..                ...                                   ...
558      559  NaN  NaN  December 19, 2019   Contacts/Locations and Study Status
559      560  NaN  NaN  December 20, 2019   Contacts/Locations and Study Status
560      561  NaN  NaN  December 23, 2019   Contacts/Locations and Study Status
561      562  NaN  NaN  December 25, 2019   Contacts/Locations and Study Status
562      563  NaN  NaN  December 27, 2019   Contacts/Locations and Study Status

[563 rows x 5 columns]
Comments:
Change data = df.head() to data = df, or just use data = pd.read_html(html_data2.text)[0], and drop the redundant line.
@anky_91: Thanks, that works. Please post it as an answer and I will accept it.
@anky_91: Please post this as an answer. A snippet showing the two outputs and the difference between them would also be useful...
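In the full output above, the row of dots is only pandas abbreviating the display; all 563 rows are in the DataFrame. If you want every row printed, here is a sketch of two standard options, again on a synthetic frame that stands in for the real table (hypothetical data):

```python
import pandas as pd

# Synthetic stand-in for the scraped table (hypothetical data, 563 rows).
df = pd.DataFrame({"Version": range(1, 564)})

# Option 1: render the whole frame to a string, with no row truncation.
text = df.to_string()
print(len(text.splitlines()))  # one header line plus one line per row

# Option 2: lift the row limit temporarily, only inside this block.
with pd.option_context("display.max_rows", None):
    print(df)
```

The option_context form leaves the global display settings untouched once the block exits.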