Pandas does not read tables from HTML files in a folder

python python-3.x pandas

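I am trying to use pandas to read the tables in each individual HTML file in a folder, to find out how many tables each file contains.

This works when I specify a single file, but when I try to run it over the folder, it says there are no tables.

Here is the code for a single file: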

import pandas as pd


file = r'C:\Users\Ahmed_Abdelmuniem\Desktop\XXX.html'
table = pd.read_html(file)

print ('tables found:', len(table))

Here is the output:

C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\python.exe C:/Users/Ahmed_Abdelmuniem/PycharmProjects/PandaHTML/main.py
tables found: 72

Process finished with exit code 0

Here is the code for every file in the folder:

import pandas as pd
import shutil
import os

source_dir = r'C:\Users\Ahmed_Abdelmuniem\Desktop\TMorning'
target_dir = r'C:\Users\Ahmed_Abdelmuniem\Desktop\TAfternoon'

file_names = os.listdir(source_dir)

for file_name in file_names:
    table = pd.read_html(file_name)
    print ('tables found:', len(table))

Here is the error log:

C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\python.exe "C:/Users/Ahmed_Abdelmuniem/PycharmProjects/File mover V2.0/main.py"
Traceback (most recent call last):
  File "C:\Users\Ahmed_Abdelmuniem\PycharmProjects\File mover V2.0\main.py", line 12, in <module>
    table = pd.read_html(file_name)
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\util\_decorators.py", line 299, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\html.py", line 1085, in read_html
    return _parse(
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\html.py", line 913, in _parse
    raise retained
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\html.py", line 893, in _parse
    tables = p.parse_tables()
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\html.py", line 213, in parse_tables
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\html.py", line 543, in _parse_tables
    raise ValueError("No tables found")
ValueError: No tables found

Process finished with exit code 1

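os.listdir returns a list containing the names of the entries in the directory, including subdirectories and any other files. If you only want to keep the HTML files, it is better to use glob.glob, which also returns paths that include source_dir when the pattern does: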

import glob

file_names = glob.glob(os.path.join(source_dir, '*.html'))

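Edit: if you want to use os.listdir, you must build the actual path of each file, because os.listdir returns bare names rather than full paths: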

for file_name in file_names:
    table = pd.read_html(os.path.join(source_dir, file_name))
    print ('tables found:', len(table))
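
To make the difference concrete, here is a minimal sketch using only the standard library. It builds a temporary folder with made-up file names (a.html, b.html, notes.txt are hypothetical, as is the helper list_html_paths) and compares what os.listdir and glob.glob return. The pd.read_html call is left out on purpose so the snippet runs without pandas; each path in html_paths is what you would pass to it.

```python
import glob
import os
import tempfile


def list_html_paths(folder):
    """Return the paths of the .html files directly inside folder, sorted."""
    return sorted(glob.glob(os.path.join(folder, "*.html")))


# Throwaway folder with hypothetical file names, so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as source_dir:
    for name in ("a.html", "b.html", "notes.txt"):
        with open(os.path.join(source_dir, name), "w", encoding="utf-8") as fh:
            fh.write("<table><tr><td>1</td></tr></table>")

    # os.listdir gives bare names, with no directory part and no filtering.
    bare_names = sorted(os.listdir(source_dir))

    # Joining each name with the folder yields paths that can be opened.
    joined = [os.path.join(source_dir, name) for name in bare_names]
    joined_ok = [os.path.isfile(path) for path in joined]

    # glob.glob keeps the directory part and filters by extension in one step.
    html_paths = list_html_paths(source_dir)
    html_names = [os.path.basename(path) for path in html_paths]

    print("os.listdir:", bare_names)
    print("joined paths exist:", joined_ok)
    print("glob.glob:", html_paths)
```

Note that a bare name such as a.html is only found if the script happens to run from inside that folder, which is why the loop in the question failed even though the files themselves were fine.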

Thank you very much, it worked. But why didn't it work when the folder contained only HTML files?