Reading an HTML table with Python

python pandas

pd.read_html only reads the first 5 rows of the (0th) table. How can I read the whole table with pd.read_html?

I tried the following code:

import pandas as pd
import requests
from urllib.error import HTTPError

try:
    url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
    html_data2 = requests.get(url)
    df = pd.read_html(html_data2.text)[0]
    data = df.head()
    print(data)
except HTTPError as http_error:
    print("HTTP error: ", http_error)

You are assigning df.head() to data, and head() returns only the first 5 rows of the DataFrame. Instead, you can do:

url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
html_data2 = requests.get(url)
df = pd.read_html(html_data2.text)[0]
data = df #not df.head()
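
To make the difference concrete without hitting the network, here is a minimal sketch on a synthetic DataFrame. The frame and the names preview, larger and full are made up for illustration; only the row count (563) is borrowed from the table in the question.

```python
import pandas as pd

# Synthetic stand-in for the scraped table (assumption: 563 rows, one column).
df = pd.DataFrame({"Version": range(1, 564)})

preview = df.head()    # head() with no argument keeps only the first 5 rows
larger = df.head(20)   # head(n) keeps the first n rows
full = df              # the whole table; nothing is dropped

print(len(preview), len(larger), len(full))  # 5 20 563
```

So head() is only a preview helper; the parsed table itself was complete all along.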

Also, pandas can fetch and read the HTML directly from a URL, so you could write:

data = pd.read_html(r"https://clinicaltrials.gov/ct2/history/NCT02954874")[0]

Then put it inside your try and except statements.
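
A note on the try/except itself: requests.get does not raise on a 4xx/5xx response unless raise_for_status() is called, and what it raises then is requests.exceptions.HTTPError, not urllib.error.HTTPError. The sketch below shows one way to handle that when fetching with requests. read_first_table is a hypothetical helper name, the 30-second timeout is an arbitrary choice, and wrapping the text in StringIO is there because recent pandas versions warn when given a literal HTML string. When the URL is passed straight to pd.read_html, pandas does the download itself (as far as I know via urllib), so in that variant urllib.error.HTTPError would be the one to catch.

```python
from io import StringIO

import pandas as pd
import requests


def read_first_table(url):
    """Hypothetical helper: return the first HTML table at `url`, or None on an HTTP error."""
    try:
        response = requests.get(url, timeout=30)
        # Without this call a 404 or 500 page would be parsed like any other page.
        response.raise_for_status()
    except requests.exceptions.HTTPError as http_error:
        print("HTTP error:", http_error)
        return None
    # read_html returns a list of DataFrames, one per <table>; take the first.
    return pd.read_html(StringIO(response.text))[0]
```

Usage would be data = read_first_table(url), followed by a check for None before printing.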

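Change data = df.head() to data = df, or just use data = pd.read_html(html_data2.text)[0] and drop the redundant line.

@anky_91: Thanks, that works. Please post it as an answer and I will accept it.

@anky_91: Please post this as an answer. A snippet showing the two outputs and the difference between them would also be useful...

Output:

url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
html_data2 = requests.get(url)
df = pd.read_html(html_data2.text)[0]
data = df.head()
print(data)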

   Version   A   B     Submitted Date                               Changes
0        1 NaN NaN   November 3, 2016  Nothing (earliest Version on record)
1        2 NaN NaN  November 24, 2016   Contacts/Locations and Study Status
2        3 NaN NaN  November 28, 2016   Recruitment Status and Study Status
3        4 NaN NaN  December 15, 2016   Contacts/Locations and Study Status
4        5 NaN NaN  December 19, 2016   Contacts/Locations and Study Status

url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
html_data2 = requests.get(url)
df = pd.read_html(html_data2.text)[0]
data = df
print(data)

     Version   A   B     Submitted Date                               Changes
0          1 NaN NaN   November 3, 2016  Nothing (earliest Version on record)
1          2 NaN NaN  November 24, 2016   Contacts/Locations and Study Status
2          3 NaN NaN  November 28, 2016   Recruitment Status and Study Status
3          4 NaN NaN  December 15, 2016   Contacts/Locations and Study Status
4          5 NaN NaN  December 19, 2016   Contacts/Locations and Study Status
..       ...  ..  ..                ...                                   ...
558      559 NaN NaN  December 19, 2019   Contacts/Locations and Study Status
559      560 NaN NaN  December 20, 2019   Contacts/Locations and Study Status
560      561 NaN NaN  December 23, 2019   Contacts/Locations and Study Status
561      562 NaN NaN  December 25, 2019   Contacts/Locations and Study Status
562      563 NaN NaN  December 27, 2019   Contacts/Locations and Study Status

[563 rows x 5 columns]
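
The ".." row in the second output is only pandas abbreviating the printout; all 563 rows are in the DataFrame. A small sketch of how to check that and how to render every row, again on a synthetic frame (the data is an assumption, only the row count matches the output above):

```python
import pandas as pd

# Synthetic stand-in with the same number of rows as the printed table.
df = pd.DataFrame({"Version": range(1, 564)})

print(df.shape)              # (rows, columns) of the full frame: (563, 1)

truncated = repr(df)         # what print(df) shows; long frames are abbreviated
full_text = df.to_string()   # renders every row as text

# The full rendering has far more lines than the abbreviated one.
print(len(truncated.splitlines()), len(full_text.splitlines()))
```

If the goal is to see everything on screen, print(df.to_string()) or the display.max_rows option should do it; for further processing, the DataFrame can be used as-is.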