Python pandas read_html raises a TypeError
I use bs4 to parse an HTML page and extract a table (a sample is given below). I am trying to load it into pandas, but when I call
pddataframe=pd.read_html(LOTable,skiprows=2,flavor=['bs4'])
I get the error listed below, even though I can print the table prettified by bs4.
Any suggestions on how I can solve this without having to fetch and read every td one by one?
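Sample table
Learning Outcomes
On successful completion of this module the learner will be able to:
LO1
Demonstrate an awareness of the important role of Financial Accounting information as an input into the decision making process.
LO2
Display an understanding of the fundamental accounting concepts, principles and conventions that underpin the preparation of Financial statements.
LO3
Understand the various formats in which information in relation to transactions or events is recorded and classified.
LO4
Apply a knowledge of accounting concepts,conventions and techniques such as double entry to the posting of recorded information to the T accounts in the Nominal Ledger.
LO5
Prepare and present the financial statements of a Sole Trader in prescribed format from a Trial Balance accompanies by notes with additional information.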
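Error
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-...> in <module>()
     10         # read table into pandas
     11         if first:
---> 12             pddataframe=pd.read_html(LOTable,skiprows=2,flavor=['bs4'])
     13             first = False
     14             dataframe
C:\Program Files\Anaconda3\envs\learningoutcouts\lib\site-packages\pandas\io\html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding)
    872     _validate_header_arg(header)
    873     return _parse(flavor, io, match, header, index_col, skiprows,
--> 874                   parse_dates, tupleize_cols, thousands, attrs, encoding)
C:\Program Files\Anaconda3\envs\learningoutcouts\lib\site-packages\pandas\io\html.py in _parse(flavor, io, match, header, index_col, skiprows, parse_dates, tupleize_cols, thousands, attrs, encoding)
    734             break
    735     else:
--> 736         raise_with_traceback(retained)
    737
    738     ret = []
C:\Program Files\Anaconda3\envs\learningoutcouts\lib\site-packages\pandas\compat\__init__.py in raise_with_traceback(exc, traceback)
    331         if traceback == Ellipsis:
    332             _, _, traceback = sys.exc_info()
--> 333         raise exc.with_traceback(traceback)
    334 else:
    335     # this version of raise is a syntax error in Python 3
TypeError: 'NoneType' object is not callable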
pandas can work this out on its own:
HTML = '''\
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<th colspan="2">
Learning Outcomes
</th>
... omitting most of what you had here
Prepare and present the financial statements of a Sole Trader in prescribed format from a Trial Balance accompanies by notes with additional information.
</td>
</tr>
</table>'''
from io import StringIO
import pandas as pd
df = pd.read_html(StringIO(HTML))
print(df)  # read_html returns a list of DataFrames
This exact code works for me:
htm = """<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<th colspan="2">
Learning Outcomes
</th>
</tr>
<tr>
<td class="info" colspan="2">
On successful completion of this module the learner will be able to:
</td>
</tr>
<tr>
<td style="width:10%;">
LO1
</td>
<td>
Demonstrate an awareness of the important role of Financial Accounting information as an input into the decision making process.
</td>
</tr>
<tr>
<td style="width:10%;">
LO2
</td>
<td>
Display an understanding of the fundamental accounting concepts, principles and conventions that underpin the preparation of Financial statements.
</td>
</tr>
<tr>
<td style="width:10%;">
LO3
</td>
<td>
Understand the various formats in which information in relation to transactions or events is recorded and classified.
</td>
</tr>
<tr>
<td style="width:10%;">
LO4
</td>
<td>
Apply a knowledge of accounting concepts,conventions and techniques such as double entry to the posting of recorded information to the T accounts in the Nominal Ledger.
</td>
</tr>
<tr>
<td style="width:10%;">
LO5
</td>
<td>
Prepare and present the financial statements of a Sole Trader in prescribed format from a Trial Balance accompanies by notes with additional information.
</td>
</tr>
</table>
"""
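import pandas as pd  # import added; the original snippet assumed it

# Recent pandas releases deprecate literal HTML strings here;
# wrapping htm in io.StringIO(htm) avoids the warning.
pd.read_html(htm, skiprows=2, flavor='bs4')[0]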
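Thanks for all the hints in the suggested answers and comments. My newbie mistake was that, after extracting the table with bs4, I kept it in a variable (a bs4 object rather than a string).
I was running
pd.read_html(LOTable,skiprows=2,flavor='bs4')
when I needed to run pd.read_html(LOTable.prettify(),skiprows=2,flavor='bs4')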
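The fix described above can be sketched as follows. This is only an illustration under stated assumptions: the miniature page, the class name used in the lookup, and the variable names other than LOTable are hypothetical, since the thread never shows how LOTable was extracted. The point is that the bs4 object is turned into markup text (str() here; prettify() as in the answer should serve equally well) before pandas sees it.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical miniature page; the real page is not shown in the thread.
page = """<html><body>
<table class="borders">
<tr><td>LO1</td><td>First outcome</td></tr>
<tr><td>LO2</td><td>Second outcome</td></tr>
</table>
</body></html>"""

soup = BeautifulSoup(page, "html.parser")
LOTable = soup.find("table", class_="borders")  # a bs4 Tag, not a string

# Convert the Tag to markup text before handing it to pandas.
table_html = str(LOTable)

# StringIO wraps the text in a file-like object, which read_html accepts.
# No flavor is given, so pandas picks whichever parser it has available.
tables = pd.read_html(StringIO(table_html))
outcomes = tables[0]
print(outcomes)
```

read_html returns a list with one DataFrame per table it finds, hence the [0].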
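flavor : str or None, container of strings
Tried it without flavor='bs4': pd.read_html(LOTable,skiprows=2) gives the same error. Before the read attempt I can call print(LOTable.prettify()) and the table HTML is printed, then the error occurs.
@AKS I don't understand your comment. Could you explain a bit more, please?
Is the problem with your sample or with the real data?
Thanks for letting us know. After reading the recent pandas documentation I suspected that there was no need to pass flavor='bs4', because pandas defaults to lxml, IIRC. That is why I suggested letting it do the parsing itself, or at least it handles the parsing without the user needing to.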
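The comments report the same error with and without flavor='bs4', which is consistent with the failure coming from how the input object is read rather than from the parser choice. A plausible explanation, offered as an assumption and not confirmed in the thread: a bs4 Tag treats unknown attribute names as child-tag lookups and returns None when nothing matches, so a "does this object have a read method?" check succeeds and the subsequent call fails. The sketch below imitates that with stand-ins; FakeTag and read_input are hypothetical names, not bs4 or pandas code.

```python
class FakeTag:
    """Hypothetical stand-in for a bs4 Tag (not the real class)."""

    def __getattr__(self, name):
        if name.startswith("__"):
            raise AttributeError(name)
        # Imitates a child-tag lookup that finds nothing.
        return None


def read_input(obj):
    """Simplified imitation of a file-like check (not pandas' actual code)."""
    if hasattr(obj, "read"):
        return obj.read()
    return obj


tag = FakeTag()
print(hasattr(tag, "read"))  # the attribute "exists", but its value is None

try:
    read_input(tag)
except TypeError as exc:
    message = str(exc)
    print(message)

# A plain string has no read attribute, so it passes straight through.
print(read_input("<table></table>"))
```

Under this reading, passing a string (or a real file-like object) sidesteps the problem because the lookup never lands on a None attribute.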
Output of print(df) for the StringIO example above:
[                                                   0  \
0                                  Learning Outcomes
1  On successful completion of this module the le...
2                                                LO1
3                                                LO2
4                                                LO3
5                                                LO4
6                                                LO5

                                                   1
0                                                NaN
1                                                NaN
2  Demonstrate an awareness of the important role...
3  Display an understanding of the fundamental ac...
4  Understand the various formats in which inform...
5  Apply a knowledge of accounting concepts,conve...
6  Prepare and present the financial statements o...  ]