Python: reading a CSV into pandas; chardet and the error_bad_lines option do not work in my case

python csv pandas unicode

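Before posting this I checked similar questions and tried wrapping the read in try/except. There, the try block does nothing useful and the except block only prints the offending line, which does not solve my problem. So at the moment I have: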

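import csv  # needed for csv.QUOTE_NONE below

import chardet
import pandas as pd

# Read the file in binary mode so chardet can guess the encoding
with open("full_data.csv", 'rb') as f:
    result = chardet.detect(f.read())  # or readline if the file is large

# Note: error_bad_lines was deprecated in pandas 1.3 and removed in 2.0,
# where the equivalent is on_bad_lines="skip"
df1 = pd.read_csv("full_data.csv", sep=';',
                   encoding=result['encoding'], error_bad_lines=False, low_memory=False, quoting=csv.QUOTE_NONE)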

But I still get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 9: invalid start byte

Is there an option like errors='replace' when opening the CSV? Or any other solution?

Using the engine option solved my problem:

df1 = pd.read_csv("full_data.csv", sep=";", engine="python")
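
On the errors='replace' part of the question: the following is only a minimal sketch, not something taken from the answer above. It uses the standard library to show what lenient decoding does to a byte like the 0xba in the error message. The helper name `read_rows_lenient` and the sample bytes are made up for illustration.

```python
import csv
import io


def read_rows_lenient(raw: bytes, encoding: str = "utf-8", sep: str = ";"):
    """Decode CSV bytes, replacing undecodable bytes with U+FFFD, then parse rows.

    Hypothetical helper for illustration only.
    """
    # errors="replace" swaps every undecodable byte for the replacement
    # character instead of raising UnicodeDecodeError
    text = raw.decode(encoding, errors="replace")
    return list(csv.reader(io.StringIO(text), delimiter=sep))


# Made-up sample: 0xba on its own is not a valid UTF-8 start byte
sample = b"id;name\n1;caf\xba\n2;ok\n"
rows = read_rows_lenient(sample)
print(rows)
```

The same idea should carry over to pandas: `pd.read_csv` accepts an open text file, so a handle created with `open("full_data.csv", encoding="utf-8", errors="replace")` can be passed in place of the path. pandas 1.3 and later also accept `encoding_errors="replace"` directly in `read_csv`.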

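First you read the file in binary mode to determine the encoding, and then you feed it to the DataFrame as plain text. Can you post a few lines of your CSV, or at least describe its contents?

I think it contains some Russian, Latin and Chinese characters, and I don't mind replacing them with "?" or anything else.

But first of all, why do you use binary read mode to determine the encoding?

When I read it as plain text I also get: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9e in position 3785643: character maps to <undefined>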
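
The comments above suggest that the detected encoding may simply be wrong for part of the file. One possible workaround, sketched here under assumptions, is to try a short list of candidate encodings in order and fall back to latin-1, which maps every byte to a character and so never fails. The function name `decode_with_fallback` and the candidate list are hypothetical and would need adjusting to the real data.

```python
def decode_with_fallback(raw: bytes, candidates=("utf-8", "cp1252")):
    """Return (text, encoding_used), trying each candidate encoding in order.

    Hypothetical helper; the candidate list is an assumption, not a recommendation.
    """
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the bytes, try the next one
    # latin-1 assigns a character to every byte value, so this cannot raise,
    # although the resulting text may not be what the author intended
    return raw.decode("latin-1"), "latin-1"


text, used = decode_with_fallback(b"col\xba")
print(used)
```

The encoding that succeeds can then be passed as `encoding=` to `pd.read_csv`. A successful decode only means that no byte was rejected; it does not prove that the guess is correct.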