Python numpy genfromtxt/pandas read_csv; ignoring commas within quotation marks


Consider a file, a.dat, with the following contents:

address 1, address 2, address 3, num1, num2, num3
address 1, address 2, address 3, 1.0, 2.0, 3
address 1, address 2, "address 3, address4", 1.0, 2.0, 3
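I am trying to import this with genfromtxt. However, the function sees an additional column in line 3. I get a similar error from read_csv, which expects 6 fields in line 3 but sees 7.

I tried to find an input argument to compensate for this. I don't mind whether I end up with a numpy ndarray or a pandas DataFrame.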

Is there an argument I can set in genfromtxt and/or read_csv that lets me ignore commas inside quotation marks?

I noticed that read_csv has a quotechar='"' argument, defined as follows:

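quotechar : string (length 1). The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.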

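To me this reads as though read_csv should handle my case by default, but it does not.

I can see that I could preprocess the file to strip out the commas. I would like to avoid that if possible, but if it is the only way, I would appreciate advice.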

Python's built-in csv module can handle this kind of data:

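import csv
import numpy

with open("a.dat", newline="") as f:
    # skipinitialspace=True drops the space after each comma, so the opening quote is recognized
    reader = csv.reader(f, skipinitialspace=True)
    header = next(reader)
    # Python 3: zip() is lazy, so wrap it in list(); 'U20' holds text of up to 20 characters
    dtype = numpy.dtype(list(zip(header, ['U20', 'U20', 'U20', 'f8', 'f8', 'f8'])))
    data = numpy.fromiter(map(tuple, reader), dtype=dtype)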
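
To try the csv-module approach without a file on disk, here is a minimal sketch that feeds the sample rows from the question through io.StringIO and then reads columns back by header name. The SAMPLE constant and the 'U20'/'f8' column types are assumptions made for illustration; numpy.array is used in place of fromiter only to keep the sketch simple.

```python
import csv
import io

import numpy as np

# Sample rows copied from a.dat in the question (held in memory for illustration)
SAMPLE = (
    'address 1, address 2, address 3, num1, num2, num3\n'
    'address 1, address 2, address 3, 1.0, 2.0, 3\n'
    'address 1, address 2, "address 3, address4", 1.0, 2.0, 3\n'
)

reader = csv.reader(io.StringIO(SAMPLE), skipinitialspace=True)
header = next(reader)

# Assumed column types: three text columns, then three float columns
dtype = np.dtype(list(zip(header, ['U20', 'U20', 'U20', 'f8', 'f8', 'f8'])))
data = np.array([tuple(row) for row in reader], dtype=dtype)

# Fields of the structured array are addressed by the header names
print(data['address 3'])
print(data['num1'] + data['num2'])
```

The quoted field survives as a single value, comma included, and the numeric columns can be used in arithmetic directly.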
Just found it:

The key parameter I was missing is skipinitialspace=True, which "deals with the spaces after the comma-delimiter".

This works :-)

Why doesn't quotechar work on its own? Anyone?
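
One likely explanation, sketched here with the standard-library csv module rather than pandas itself: a quote character only opens a quoted field when it is the first character of that field. In a.dat every comma is followed by a space, so the field starts with the space and the quote that follows is treated as ordinary text. The parse helper and the SAMPLE constant below are hypothetical names used only for this illustration.

```python
import csv
import io

# Sample rows copied from a.dat in the question
SAMPLE = (
    'address 1, address 2, address 3, num1, num2, num3\n'
    'address 1, address 2, address 3, 1.0, 2.0, 3\n'
    'address 1, address 2, "address 3, address4", 1.0, 2.0, 3\n'
)


def parse(text, **kwargs):
    """Parse CSV text into a list of rows with the stdlib csv module."""
    return list(csv.reader(io.StringIO(text), **kwargs))


# Default dialect: the leading space makes the quote a literal character,
# so the comma inside the quotes still splits the field.
default_rows = parse(SAMPLE)

# skipinitialspace=True discards the space, so the quote opens a quoted field.
fixed_rows = parse(SAMPLE, skipinitialspace=True)

print(len(default_rows[2]), default_rows[2][2:4])
print(len(fixed_rows[2]), fixed_rows[2][2])
```

With the default settings the third row splits into 7 fields, matching the "Expected 6 fields in line 3, saw 7" error from read_csv; with skipinitialspace=True it splits into 6.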
pandas read_csv sort of works - but it gives me an unaligned data structure:

pd.read_csv('a.dat')

pandas.parser.CParserError: Error tokenizing data. C error: Expected 6 fields in line 3, saw 7
Adding skipinitialspace=True gives an aligned result:

a = pd.read_csv('a.dat', quotechar='"', skipinitialspace=True)

   address 1  address 2            address 3  num1  num2  num3
0  address 1  address 2            address 3     1     2     3
1  address 1  address 2  address 3, address4     1     2     3
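
For a reproducible comparison of the failing and working read_csv calls, here is a minimal sketch that reads the same rows from an in-memory string. The SAMPLE constant and the default_ok flag are names introduced for illustration. Recent pandas versions raise pandas.errors.ParserError where the traceback above shows pandas.parser.CParserError, so the sketch assumes a reasonably current pandas.

```python
import io

import pandas as pd

# Sample rows copied from a.dat in the question
SAMPLE = (
    'address 1, address 2, address 3, num1, num2, num3\n'
    'address 1, address 2, address 3, 1.0, 2.0, 3\n'
    'address 1, address 2, "address 3, address4", 1.0, 2.0, 3\n'
)

# Default settings: the space before the opening quote hides it from the parser,
# so the third line appears to have one field too many.
try:
    pd.read_csv(io.StringIO(SAMPLE))
    default_ok = True
except pd.errors.ParserError as exc:
    default_ok = False
    print("default parse failed:", exc)

# skipinitialspace=True strips the space after each comma before quoting is checked.
# quotechar='"' is already the default, so it does not need to be passed explicitly.
df = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True)
print(df.shape)
print(df.loc[1, 'address 3'])
```

Because skipinitialspace also applies to the header row, the column names come out without leading spaces and can be used as written in the file.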