
Python: read an HTML table


pd.read_html only reads the first 5 rows of the first (index 0) table. How can I read the entire table with pd.read_html?

I tried the following code:

import pandas as pd
import requests
from urllib.error import HTTPError

try:
    url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
    html_data2 = requests.get(url)
    df = pd.read_html(html_data2.text)[0]
    data = df.head()
    print(data)
except HTTPError as http_error:
    print("HTTP error: ", http_error)

You assigned data = df.head(), which returns only the first 5 rows of the DataFrame. Instead, you can do:

url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
html_data2 = requests.get(url)
df = pd.read_html(html_data2.text)[0]
data = df #not df.head()
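To make the difference concrete, here is a minimal sketch on a small hypothetical DataFrame (built in memory rather than scraped, so the column names and values are made up). head() returns 5 rows unless you pass n; the DataFrame itself keeps every row.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: 12 rows, 2 columns.
df = pd.DataFrame({
    "Version": range(1, 13),
    "Changes": [f"Change {i}" for i in range(1, 13)],
})

first_five = df.head()      # head() defaults to n=5
first_three = df.head(3)    # pass n explicitly for a different count
everything = df             # the full DataFrame, all 12 rows

print(len(first_five), len(first_three), len(everything))  # 5 3 12
```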
Also, pandas can read HTML directly from a URL, so you can do:

data = pd.read_html(r"https://clinicaltrials.gov/ct2/history/NCT02954874")[0]
Then put that line inside your try/except block.
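A minimal sketch showing that pd.read_html itself returns every row of a table, assuming lxml (or BeautifulSoup4 with html5lib) is installed for it to use as a parser. The HTML below is hypothetical sample markup, not the clinicaltrials.gov page, and it is wrapped in io.StringIO because recent pandas versions (2.1+) deprecate passing a literal HTML string.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample table with 7 data rows (made-up data).
rows = "".join(
    f"<tr><td>{i}</td><td>Change {i}</td></tr>" for i in range(1, 8)
)
html = (
    "<table>"
    "<thead><tr><th>Version</th><th>Changes</th></tr></thead>"
    f"<tbody>{rows}</tbody>"
    "</table>"
)

# read_html returns a list with one DataFrame per <table> it finds,
# which is why the original code indexes the result with [0].
tables = pd.read_html(StringIO(html))
df = tables[0]

print(len(tables))  # 1
print(len(df))      # 7: all rows are read, not just five
```

The row count only drops to 5 if you call head() on the result afterwards.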

Output:

url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
html_data2 = requests.get(url)
df = pd.read_html(html_data2.text)[0]
data = df.head()
print(data)

   Version   A   B     Submitted Date                               Changes
0        1 NaN NaN   November 3, 2016  Nothing (earliest Version on record)
1        2 NaN NaN  November 24, 2016   Contacts/Locations and Study Status
2        3 NaN NaN  November 28, 2016   Recruitment Status and Study Status
3        4 NaN NaN  December 15, 2016   Contacts/Locations and Study Status
4        5 NaN NaN  December 19, 2016   Contacts/Locations and Study Status

vs.

url = "https://clinicaltrials.gov/ct2/history/NCT02954874"
html_data2 = requests.get(url)
df = pd.read_html(html_data2.text)[0]
data = df
print(data)

     Version   A   B     Submitted Date                               Changes
0          1 NaN NaN   November 3, 2016  Nothing (earliest Version on record)
1          2 NaN NaN  November 24, 2016   Contacts/Locations and Study Status
2          3 NaN NaN  November 28, 2016   Recruitment Status and Study Status
3          4 NaN NaN  December 15, 2016   Contacts/Locations and Study Status
4          5 NaN NaN  December 19, 2016   Contacts/Locations and Study Status
..       ...  ..  ..                ...                                   ...
558      559 NaN NaN  December 19, 2019   Contacts/Locations and Study Status
559      560 NaN NaN  December 20, 2019   Contacts/Locations and Study Status
560      561 NaN NaN  December 23, 2019   Contacts/Locations and Study Status
561      562 NaN NaN  December 25, 2019   Contacts/Locations and Study Status
562      563 NaN NaN  December 27, 2019   Contacts/Locations and Study Status

[563 rows x 5 columns]

Change data=df.head() to data=df, or just use data=pd.read_html(html_data2.text)[0] and get rid of the extra line.

@anky_91: Thanks, this works. Please post it as an answer and I will accept it.

@anky_91: Please post this as an answer. A snippet showing both outputs and the difference between them would also be useful.
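The ".." row in the second printout is pandas' display truncation, not missing data: when a DataFrame has more rows than the display.max_rows option (60 by default), printing it shows only the first and last few rows plus a "[N rows x M columns]" footer. A minimal sketch on a hypothetical 563-row frame, using pd.option_context to lift the limit only temporarily:

```python
import pandas as pd

# Hypothetical frame with 563 rows, the same length as the table printed above.
df = pd.DataFrame({"Version": range(1, 564)})

# Default rendering: more rows than display.max_rows, so the middle is elided.
truncated = repr(df)
print(truncated)

# Lift the row limit only inside this block to render every row.
with pd.option_context("display.max_rows", None):
    full = repr(df)

print(len(df))  # 563: the DataFrame always holds every row
```

Using option_context instead of pd.set_option keeps the change local, so later printouts fall back to the default limit.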