Python tabula: FileNotFoundError: [Errno 2] (but the file path is correct)

Question:

import tabula as tb
import pandas as pd

other = "https://github.com/chezou/tabula-py/raw/master/tests/resources/data.pdf"
dfs = tb.read_pdf(other, stream=True) #this works

file="D:\Favorites\1. Programming\Projects\cell penetrating peptide supplemental.pdf"
tables = tb.read_pdf(file, pages = "all", multiple_tables = True)
tables

Output:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-29-c598474e8fa3> in <module>
      6 
      7 file="D:\Favorites\1. Programming\Projects\cell penetrating peptide supplemental.pdf"
----> 8 tables = tb.read_pdf(file, pages = "all", multiple_tables = True)
      9 tables

~\anaconda3\lib\site-packages\tabula\io.py in read_pdf(input_path, output_format, encoding, java_options, pandas_options, multiple_tables, user_agent, **kwargs)
    312 
    313     if not os.path.exists(path):
--> 314         raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), path)
    315 
    316     if os.path.getsize(path) == 0:

FileNotFoundError: [Errno 2] No such file or directory: 'D:\\Favorites\x01. Programming\\Projects\\cell penetrating peptide supplemental.pdf'
True
True
False
Why is the error raised only when print(os.path.exists(file)) is False?

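I tried a file from the internet and it worked perfectly. The file I am trying to read has no URL. I cannot view it in the browser; my only option is to download it. Otherwise, I would try passing its URL to the function.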

Update: I tried the suggested solution

import tabula as tb
import pandas as pd


tables = tb.read_pdf(r"D:\Favorites\1. Programming\Projects\cell penetrating peptide supplemental.pdf", pages = "all", multiple_tables = True)
tables
and got this:

Got stderr: Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font PKLNYU+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 4 (33) in font PKLNYU+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 3 (34) in font PKLNYU+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 1 (35) in font PKLNYU+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 2 (36) in font PKLNYU+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font FLAXFE+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 2 (33) in font FLAXFE+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 1 (34) in font FLAXFE+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font BPOUDD+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 4 (33) in font BPOUDD+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 3 (34) in font BPOUDD+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 1 (35) in font BPOUDD+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 2 (36) in font BPOUDD+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font DCUQIG+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 1 (33) in font DCUQIG+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font DREOWG+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 1 (33) in font DREOWG+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font EWGNLJ+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 2 (33) in font EWGNLJ+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 1 (34) in font EWGNLJ+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font PUHGFM+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 2 (33) in font PUHGFM+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 1 (34) in font PUHGFM+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font UHIZXI+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 4 (33) in font UHIZXI+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 3 (34) in font UHIZXI+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 1 (35) in font UHIZXI+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 2 (36) in font UHIZXI+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font UCENHU+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 1 (33) in font UCENHU+CambriaMath

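tabula-py has a localize_file function that read_pdf calls, and localize_file calls os.path.expanduser to expand the path. For example, on Unix-like systems ~ is an alias for the user's home directory, so on Mac OS X os.path.expanduser performs the following expansion:

>>> os.path.expanduser("~/Documents")
'/Users/username/Documents'
That expansion is not what breaks your path, though. In a regular Python string literal, \ starts an escape sequence, and Python resolves it when it parses the literal, before os.path.expanduser or os.fspath ever sees the string. So if you run

>>> os.path.expanduser("\125")
'U'
>>> os.fspath("\125")
'U'
both functions receive the single character U, because \125 is the octal escape for U. In your case, the \1 in the path became \x01, so Windows cannot find such a directory. To keep the path unchanged, write it as a raw string, i.e. put an r in front of it, as follows: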

>>> os.path.expanduser(r"\125")
'\\125'
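As a rough illustration of the point above, the following sketch compares a regular literal with a raw one and shows two other spellings that avoid the escape problem. The shortened path fragment and the variable names (plain, raw, escaped, forward, same) are made up for this example and are not part of the original question; PureWindowsPath is used only so the comparison behaves the same on any operating system.

```python
import os
from pathlib import PureWindowsPath

# Regular literal: "\1" is parsed as an octal escape, i.e. the control
# character chr(1), before any function receives the string.
plain = "Favorites\1. Programming"

# Raw literal: the backslash and the digit 1 are kept as two characters.
raw = r"Favorites\1. Programming"

# repr() makes the hidden control character visible when debugging a path.
print(repr(plain))
print(repr(raw))

# Neither call alters a string that does not start with "~".
assert os.fspath(raw) == raw
assert os.path.expanduser(raw) == raw

# Two other spellings of a Windows path that avoid the escape problem:
# doubled backslashes, or forward slashes.
escaped = "D:\\Favorites\\1. Programming"
forward = "D:/Favorites/1. Programming"
same = PureWindowsPath(escaped) == PureWindowsPath(forward)
print(same)
```

Printing repr(file) before calling read_pdf is a quick way to check whether a path literal was changed by an escape sequence.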
Wow, that's really cool. I really appreciate your answer; it cleared up the mystery of this escape character problem for me. Thank you for breaking it down like that. Do you know why this new error appears? The post has been updated. @whilrun

@ellie-lumen You really don't need to worry about these, because they are common warnings when extracting text from PDF documents. The ToUnicode CMap is provided by the PDF to map the PDF's CID fonts to Unicode code points, so that text can be extracted as Unicode. However, some malformed PDFs contain a malformed CMap, or the font itself is malformed, so some characters cannot be converted, or the PDF document contains some invalid characters. Fixing it yourself may be possible, but it is usually not worth the time.