Python 3.x: chardet and unknown file encoding in Python 3

I'm using chardet to identify my file's encoding, but the following error occurs:

fh= open("file", mode="r")
sc= chardet.detect(fh)

Traceback (most recent call last):
  File "/home/alireza/test.py", line 19, in <module>
    sc= chardet.detect(fh)
  File "/usr/lib/python3/dist-packages/chardet/__init__.py", line 24, in detect
    u.feed(aBuf)
  File "/usr/lib/python3/dist-packages/chardet/universaldetector.py", line 65, in feed
    aLen = len(aBuf)
TypeError: object of type '_io.TextIOWrapper' has no len()

And I can't open and read the file without knowing its encoding:

fh= open("file", mode="r").read()
sc= chardet.detect(fh)

Traceback (most recent call last):
  File "/home/alireza/workspacee/makecdown/test.py", line 21, in <module>
    fh= open("910.srt", mode="r").read()
  File "/usr/lib/python3.2/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 34: invalid continuation byte

How can I use chardet without opening the file? Or is there a way to find the file's encoding before or after opening it?

Try opening the file like this:

fh= open("file", mode="rb")
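Both tracebacks in the question come from text mode: in the first, chardet.detect received a file object rather than bytes; in the second, Python tried to decode the file as UTF-8 before chardet ever saw it. A minimal sketch of the binary-mode approach follows. The helper name detect_file_encoding and the sample_size parameter are hypothetical, chosen only for illustration; chardet.detect is the call already used in the question.

```python
def detect_file_encoding(path, sample_size=64 * 1024):
    """Guess the encoding of the file at `path` from its raw bytes.

    Hypothetical helper: the name and `sample_size` are illustrative only.
    """
    # chardet is a third-party package; it is imported here so that this
    # sketch can be loaded even where chardet is not installed.
    import chardet

    # "rb" returns bytes, so no decoding (and no UnicodeDecodeError) happens.
    with open(path, "rb") as fh:
        raw = fh.read(sample_size)

    # detect() expects bytes and returns a dict that includes the keys
    # "encoding" and "confidence".
    return chardet.detect(raw)
```

The result is only a guess, so it is worth checking the reported confidence before relying on the encoding.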

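Command-line tool: if that doesn't work, try chardet's command-line tool. Its description reads:

chardet comes with a command-line script which reports on the encodings of one or more files: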

% chardetect.py somefile someotherfile
somefile: windows-1252 with confidence 0.5
someotherfile: ascii with confidence 1.0
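For a similar per-file report from inside a script, chardet's UniversalDetector class (the one visible in the question's traceback) can be fed a file in chunks. This is only a sketch, not the command-line script's actual implementation; the function name detect_files and the 4096-byte chunk size are assumptions made for illustration.

```python
def detect_files(paths):
    """Return {path: detection result} for each file, reading in chunks.

    Hypothetical helper: the name and chunk size are illustrative only.
    """
    # Third-party import kept local so the sketch loads without chardet.
    from chardet.universaldetector import UniversalDetector

    detector = UniversalDetector()
    results = {}
    for path in paths:
        detector.reset()  # clear state left over from the previous file
        with open(path, "rb") as fh:
            # Read fixed-size chunks until EOF (an empty bytes object).
            for chunk in iter(lambda: fh.read(4096), b""):
                detector.feed(chunk)
                if detector.done:  # detector is already confident
                    break
        detector.close()  # finalize the guess for this file
        results[path] = detector.result
    return results
```

Stopping as soon as detector.done is set avoids reading a large file to the end when the answer is already clear.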

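This is not a direct answer, but there is a description of how chardet works in Python 3 that you can study. Afterwards you may be able to work out how to detect another specific encoding.

chardet's code was originally derived from another project, where you can also find more information. Alternatively, look for a more advanced Python package related to Seamonkey.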

I used "rb" mode and it works, but chardet detects the wrong encoding: MacCyrillic (confidence: 0.30). Decoding with that and re-encoding as UTF-8 gives invalid output, whereas the real encoding is windows-1256, and converting from that to UTF-8 works. Is there any other way to find a file's encoding and convert it to UTF-8?

What does the file you're trying to convert contain? chardet guesses the encoding based on the language, so if the file doesn't hold meaningful text in the right encoding, chardet may fail.

I use py3, and the command-line tool gives the same output as calling chardet from code (see the first comment). chardet doesn't work on windows-1256 or other Arabic-encoded text in my language (Persian). Thanks for your support. Is there any other way to find a file's encoding and convert it to UTF-8?
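When the likely encodings are known in advance, as with the windows-1256 subtitles described here, one alternative to statistical detection is to try a short list of candidates in order. The sketch below is not part of chardet and uses only the standard library; the function names and the default candidate list are assumptions. Strict UTF-8 goes first because it rejects most non-UTF-8 input, while a single-byte codec such as windows-1256 accepts almost any byte sequence and so only makes sense as the last fallback.

```python
def decode_with_candidates(raw, candidates=("utf-8", "windows-1256")):
    """Return (text, encoding_name) for the first candidate that decodes raw.

    Hypothetical helper: the candidate list is an assumption about the data.
    """
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the bytes, try the next
    raise ValueError("none of the candidate encodings could decode the data")


def convert_to_utf8(src_path, dst_path, candidates=("utf-8", "windows-1256")):
    """Rewrite src_path as UTF-8 at dst_path; return the encoding that was used."""
    with open(src_path, "rb") as src:
        raw = src.read()
    text, used = decode_with_candidates(raw, candidates)
    # newline="" writes the text as-is, preserving the original line endings.
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
    return used
```

A successful decode does not prove the guess is right, so it is still worth inspecting the converted text by eye.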