Python/pandas: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcd in position 133: invalid continuation byte

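I'm trying to build a method that imports several kinds of CSV or Excel files and standardizes them. Everything went fine until a certain CSV came along and gave me the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcd in position 133: invalid continuation byte
I'm building a chain of try/excepts to cover the variations in the data, but I couldn't figure out how to guard against this one.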

    if csv_or_excel_path[-3:]=='csv':
        try: table=pd.read_csv(csv_or_excel_path)
        except:
            try: table=pd.read_csv(csv_or_excel_path,sep=';')
            except:
                try:table=pd.read_csv(csv_or_excel_path,sep='\t')
                except:
                    try: table=pd.read_csv(csv_or_excel_path,encoding='utf-8')
                    except:
                        try: table=pd.read_csv(csv_or_excel_path,encoding='utf-8',sep=';')
                        except: table=pd.read_csv(csv_or_excel_path,encoding='utf-8',sep='\t')
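By the way, the file's separator is ";".

So:

a) I understand that tracking down the problem would be easier if I could tell which character is at "position 133", but I don't know how to find that out. Any suggestions?

b) Does anyone have a suggestion for what to include in the try/except sequence to get past this problem?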

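Thanks to @woblers and @FHTMitchell for the support. The problem was that the CSV came in an unexpected encoding: ISO-8859-1.

I fixed it by adding a few lines to the try/except sequence. Here is the full version: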

    if csv_or_excel_path[-3:]=='csv':
        try: table=pd.read_csv(csv_or_excel_path)
        except:
            try: table=pd.read_csv(csv_or_excel_path,sep=';')
            except:
                try:table=pd.read_csv(csv_or_excel_path,sep='\t')
                except:
                    try: table=pd.read_csv(csv_or_excel_path,encoding='utf-8')
                    except:
                        try: table=pd.read_csv(csv_or_excel_path,encoding='utf-8',sep=';')
                        except:
                            try: table=pd.read_csv(csv_or_excel_path,encoding='utf-8',sep='\t')
                            except:
                                try: table=pd.read_csv(csv_or_excel_path,encoding="ISO-8859-1",sep=";")
                                except: table=pd.read_csv(csv_or_excel_path,encoding="ISO-8859-1",sep="\t")

For the record, this is probably better than multiple try/excepts:

import os

import pandas as pd


def read_csv(filepath):
    if os.path.splitext(filepath)[1] != '.csv':
        return  # or whatever
    seps = [',', ';', '\t']                    # ',' is the default
    encodings = [None, 'utf-8', 'ISO-8859-1']  # None is the default
    # Try every separator/encoding pair and return the first that parses.
    for sep in seps:
        for encoding in encodings:
            try:
                return pd.read_csv(filepath, encoding=encoding, sep=sep)
            except Exception:  # should really be more specific
                pass
    raise ValueError("{!r} has no encoding in {} or separator in {}"
                     .format(filepath, encodings, seps))
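
One caveat with looping over separators: a wrong separator often parses without raising and simply yields a single column, so the first "successful" combination is not necessarily the right one. A possible alternative, sketched here with the standard library only, is to work out the encoding and the separator before handing the file to pandas. The function names `detect_encoding` and `detect_separator` and the sample data are hypothetical, not part of the answer above.

```python
import csv


def detect_encoding(raw, candidates=("utf-8", "ISO-8859-1")):
    """Return the first candidate encoding that decodes raw without error.

    ISO-8859-1 maps every possible byte to a character, so it never fails
    and has to come last in the list.
    """
    for encoding in candidates:
        try:
            raw.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of {} can decode the data".format(candidates))


def detect_separator(text, candidates=",;\t"):
    """Guess the delimiter from a text sample using csv.Sniffer."""
    return csv.Sniffer().sniff(text, delimiters=candidates).delimiter


# Made-up sample: a ';'-separated file saved as ISO-8859-1.
raw = "nome;valor\nÍndice;1\ninformação;2\n".encode("ISO-8859-1")
encoding = detect_encoding(raw)
separator = detect_separator(raw.decode(encoding))
print(encoding, repr(separator))
# The result could then be passed on, e.g.
# pd.read_csv(path, encoding=encoding, sep=separator)
```

For a real file, `raw` would come from reading the first few kilobytes in binary mode (`open(path, "rb")`). `csv.Sniffer` raises `csv.Error` when it cannot decide, so that case still needs handling.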

Another possibility is to do it like this:

with open(path_to_file, encoding="utf8", errors="ignore") as f:
    table = pd.read_csv(f, sep=";")

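errors="ignore" drops the problematic byte sequences during the read() calls (the default, errors="strict", raises instead). You can also have such byte sequences replaced with a fill value by using errors="replace". In general, this should reduce the need for a lot of painful error handling and nested try/excepts.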
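
To make the trade-off concrete, here is a small sketch comparing the error handlers on a made-up ISO-8859-1 string (the sample text and the helper name `decode_variants` are assumptions for illustration). Note that "ignore" silently loses the accented characters, whereas decoding with the correct encoding keeps them.

```python
def decode_variants(raw):
    """Decode the same bytes three ways to compare the outcomes."""
    return {
        # Invalid bytes are dropped without any warning.
        "ignore": raw.decode("utf-8", errors="ignore"),
        # Invalid bytes become U+FFFD, the Unicode replacement character.
        "replace": raw.decode("utf-8", errors="replace"),
        # Decoding with the encoding the file was actually written in.
        "latin-1": raw.decode("ISO-8859-1"),
    }


raw = "informação;Índice".encode("ISO-8859-1")
for name, text in decode_variants(raw).items():
    print(name, repr(text))
```

If the accented characters matter for the standardized output, reading with the right encoding is the safer choice; "ignore" is mostly useful when the damaged characters are irrelevant.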

Can you show us a sample of the CSV?
@FHTMitchell Actually I'm trying to copy/paste it, but when I try, it blanks out the text box.
I think it means the 133rd character in the file. It's probably one of the characters in "informação".
You need to convert the file to UTF first before loading it.
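
Regarding question (a), one way to see what sits at the reported position is to decode the raw bytes yourself and inspect the exception, which carries the offset of the offending byte. This is a minimal sketch; the helper name `describe_decode_error` and the sample bytes are hypothetical. The position is a byte offset counted from 0, and the context is shown decoded as ISO-8859-1 only because that encoding can display any byte.

```python
def describe_decode_error(raw, encoding="utf-8", width=10):
    """Return (offset, bad byte, surrounding text), or None if raw decodes."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as err:
        start = max(0, err.start - width)
        context = raw[start:err.start + width]
        bad_byte = raw[err.start:err.start + 1]
        return err.start, bad_byte, context.decode("ISO-8859-1")
    return None


# Made-up data with a 0xcd byte, which is the letter Í in ISO-8859-1.
sample = b"a;b\n1;\xcdNDICE\n"
print(describe_decode_error(sample))
```

For the real file, `raw` would be the result of `open(path, "rb").read()`. Since 0xcd is "Í" in ISO-8859-1, the byte in the question is more likely an accented capital I than one of the characters in "informação".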