Reading a CSV with uncommon delimiters

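I have a CSV file that uses þ as the quote character and the paragraph symbol as the field separator in place of a comma.

Subclassing csv.Dialect doesn't work, and Pandas does not interpret the þ-quoted values as strings.

Any ideas?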

# This works when the delimiters are more standard (; ")
# But really trying to make it work with the ASCII chars commented out below

import csv

f = open('./data/Test_Quote_SemiColon.dat')

class my_dialect(csv.Dialect):
    lineterminator = '\n'
    delimiter = ';'  # ASCII: 020
    quotechar = '"'  # ASCII: 254

reader = csv.reader(f, dialect=my_dialect, quoting=1)

for line in reader:
    print line
Here is the quote-and-semicolon data:

BEGID;ENDID;Name;To;From;CC;BCC
ABC_001;ABC_004;Smith, John;Doe, John;Roe, Jane;;
ABC_005;ABC_007;Smith, John;Doe, John;;;
ABC_008;ABC_012;Doe, John;Doe, John;Smith, John
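For comparison, here is a minimal sketch of how the two commented-out characters might be passed straight to csv.reader. It assumes Python 3 (the code above is Python 2), assumes that "ASCII: 020" means decimal 20 (the DC4 control character, `\x14`), and uses a small hypothetical sample built from the field values above rather than the real file. The helper name `parse_rows` is made up for illustration.

```python
import csv
import io

DELIMITER = "\x14"    # assumption: "ASCII: 020" read as decimal 20 (DC4)
QUOTECHAR = "\u00fe"  # þ, code point 254

# Hypothetical sample built from the field values in the question.
sample = (
    "þBEGIDþ\x14þENDIDþ\x14þNameþ\n"
    "þABC_001þ\x14þABC_004þ\x14þSmith, Johnþ\n"
)

def parse_rows(text):
    """Parse already-decoded text into a list of rows."""
    stream = io.StringIO(text, newline="")
    reader = csv.reader(stream, delimiter=DELIMITER, quotechar=QUOTECHAR)
    return list(reader)

rows = parse_rows(sample)
for row in rows:
    print(row)
```

In Python 3 the csv module works on decoded text, so both characters are ordinary one-character strings. For a real file, the same arguments should apply once the file is opened with `newline=""` and whatever encoding it was actually written in.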

I found that both the literal þ and chr(254) parse this. Does this look right?

>>> import csv
>>> import StringIO
>>> txt = '''þBEGIDþþENDIDþþNameþþToþþFromþþCCþþBCCþ þABC_001þþaBC_004þþSmith, JohnþþDoe, JohnþRoe, Janeþþþþþ þABC_005þþaBC_007þþSmith, JohnþþDoe, Johnþþþþþþ þABC_008þþaBC_012þþDoe, JohnþþDoe, JohnþSmith, Johnþþþþþ'''
>>> reader = csv.reader(StringIO.StringIO(txt), delimiter=',', quotechar=chr(254))
>>> for line in reader: 
...     for entry in line:
...         print unicode(entry, 'utf8')
... 
þBEGIDþþENDIDþþNameþþToþþFromþþCCþþBCCþ þABC_001þþaBC_004þþSmith
 JohnþþDoe
 JohnþRoe
 Janeþþþþþ þABC_005þþaBC_007þþSmith
 JohnþþDoe
 Johnþþþþþþ þABC_008þþaBC_012þþDoe
 JohnþþDoe
 JohnþSmith
 Johnþþþþþ
Echoing txt gives:

>>> txt
'\xc3\xbeBEGID\xc3\xbe\xc3\xbeENDID\xc3\xbe\xc3\xbeName\xc3\xbe\xc3\xbeTo\xc3\xbe\xc3\xbeFrom\xc3\xbe\xc3\xbeCC\xc3\xbe\xc3\xbeBCC\xc3\xbe \xc3\xbeABC_001\xc3\xbe\xc3\xbeaBC_004\xc3\xbe\xc3\xbeSmith, John\xc3\xbe\xc3\xbeDoe, John\xc3\xbeRoe, Jane\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe \xc3\xbeABC_005\xc3\xbe\xc3\xbeaBC_007\xc3\xbe\xc3\xbeSmith, John\xc3\xbe\xc3\xbeDoe, John\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe \xc3\xbeABC_008\xc3\xbe\xc3\xbeaBC_012\xc3\xbe\xc3\xbeDoe, John\xc3\xbe\xc3\xbeDoe, John\xc3\xbeSmith, John\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe'
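The echo suggests why chr(254) does not strip the quotes in that session: each þ is stored as the two bytes `\xc3\xbe`, which is its UTF-8 encoding, while chr(254) in Python 2 is the single byte `\xfe` (þ in Latin-1). A one-byte quotechar cannot match a two-byte sequence, and `delimiter=','` only splits on the commas inside the names. A small check of the byte values, sketched in Python 3 (an assumption; the session above is Python 2):

```python
THORN = "\u00fe"  # þ, code point 254

utf8_bytes = THORN.encode("utf-8")      # two bytes, as seen in the echo
latin1_bytes = THORN.encode("latin-1")  # one byte, what chr(254) is in Python 2

# A fragment of the echoed byte string, decoded before parsing.
raw = b"\xc3\xbeBEGID\xc3\xbe"
decoded = raw.decode("utf-8")

print(utf8_bytes, latin1_bytes, decoded)
```

Decoding the bytes to text first, and only then handing them to the parser, lets þ be treated as a single character.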

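Can you give a small example showing part of your CSV file, or something similar to it that reproduces the problem, along with the code you use to read it with pandas?

What encoding does the CSV use? Have you tried changing the encoding?

Do you know the ASCII codes of these symbols, so you can use sep='something' and quote='something'?

FYI, using iPython Notebook 2.2 and Python 2.7.6, I get an error on StringIO. What is the import?

Close, but not quite. I think it needs a lineterminator value. Without the single quotes, it should look like this:

['BEGID','ENDID','Name','To','From','CC','BCC']
['ABC_001','ABC_004','Smith,John','Roe,Jane',]
['ABC_005','ABC_007','Smith,John','Doe,John',',
['ABC_008','ABC_012','Doe,John','Smith,John','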
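Following up on the sep/quote suggestion, here is a rough sketch of what that could look like in pandas. Note that the read_csv parameter is named `quotechar`, not `quote`. The sketch assumes Python 3 with pandas installed, assumes the separator is `\x14` (ASCII 20), and uses a hypothetical three-column sample rather than the real file. `engine="python"` is chosen here because the C parser does not support a quote character above code point 127.

```python
import io

import pandas as pd

# Hypothetical sample; the real file has more columns.
sample = (
    "þBEGIDþ\x14þENDIDþ\x14þNameþ\n"
    "þABC_001þ\x14þABC_004þ\x14þSmith, Johnþ\n"
    "þABC_008þ\x14þABC_012þ\x14þDoe, Johnþ\n"
)

df = pd.read_csv(
    io.StringIO(sample),
    sep="\x14",          # assumption: ASCII 20 (DC4) separates the fields
    quotechar="\u00fe",  # þ wraps every value
    engine="python",     # the C engine does not handle a non-ASCII quotechar
)

print(df)
```

When reading from disk, passing the file's actual encoding via `encoding=` should matter as much as the two characters themselves.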