Reading a CSV with uncommon delimiters
I have a CSV file that uses þ as the quote character and the paragraph symbol as the delimiter in place of commas. Subclassing csv.Dialect does not work, and pandas does not interpret the þ-quoted values as strings. Any ideas?
# This works when the delimiters are more standard (; ")
# But really trying to make it work with the ASCII chars commented out below
import csv

f = open('./data/Test_Quote_SemiColon.dat')

class my_dialect(csv.Dialect):
    lineterminator = '\n'
    delimiter = ';'   # ASCII: 020
    quotechar = '"'   # ASCII: 254

# quoting=1 is csv.QUOTE_ALL
reader = csv.reader(f, dialect=my_dialect, quoting=1)
for line in reader:
    print line
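For what it's worth, here is a minimal sketch of the same reader with the two characters from the comments plugged in. It assumes Python 3 (the thread itself is on Python 2.7) and that the delimiter really is chr(20) and the quote character chr(254), as the comments say. The sample text is made up, and `parse_rows` is just an illustrative helper name.

```python
import csv
import io

DELIMITER = chr(20)   # assumption: the "ASCII: 020" from the question's comments
QUOTECHAR = chr(254)  # 'þ', the "ASCII: 254" from the question's comments


def parse_rows(text):
    """Parse þ-quoted, chr(20)-delimited text into a list of rows."""
    # newline='' leaves line endings untouched so the csv module handles them.
    buffer = io.StringIO(text, newline='')
    reader = csv.reader(buffer, delimiter=DELIMITER, quotechar=QUOTECHAR)
    return list(reader)


# Hypothetical sample, not the real file.
sample = (
    "þBEGIDþ\x14þENDIDþ\x14þNameþ\n"
    "þABC_001þ\x14þABC_004þ\x14þSmith, Johnþ\n"
)

rows = parse_rows(sample)
for row in rows:
    print(row)
```

In Python 3 both characters are ordinary one-character strings, so no Dialect subclass is needed for this case.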
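Here is the quote-and-semicolon data:
BEGID;ENDID;Name;To;From;CC;BCC
ABC_001;ABC_004;Smith, John;Doe, John;Roe, Jane;;
ABC_005;ABC_007;Smith, John;Doe, John;;;
ABC_008;ABC_012;Doe, John;Doe, John;Smith, John

I found that both the literal þ and chr(254) can parse this. Does this look right?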
>>> import csv
>>> import StringIO
>>> txt = '''þBEGIDþþENDIDþþNameþþToþþFromþþCCþþBCCþ þABC_001þþaBC_004þþSmith, JohnþþDoe, JohnþRoe, Janeþþþþþ þABC_005þþaBC_007þþSmith, JohnþþDoe, Johnþþþþþþ þABC_008þþaBC_012þþDoe, JohnþþDoe, JohnþSmith, Johnþþþþþ'''
>>> reader = csv.reader(StringIO.StringIO(txt), delimiter=',', quotechar=chr(254))
>>> for line in reader:
...     for entry in line:
...         print unicode(entry, 'utf8')
...
þBEGIDþþENDIDþþNameþþToþþFromþþCCþþBCCþ þABC_001þþaBC_004þþSmith
JohnþþDoe
JohnþRoe
Janeþþþþþ þABC_005þþaBC_007þþSmith
JohnþþDoe
Johnþþþþþþ þABC_008þþaBC_012þþDoe
JohnþþDoe
JohnþSmith
Johnþþþþþ
Echoing txt gives:
>>> txt
'\xc3\xbeBEGID\xc3\xbe\xc3\xbeENDID\xc3\xbe\xc3\xbeName\xc3\xbe\xc3\xbeTo\xc3\xbe\xc3\xbeFrom\xc3\xbe\xc3\xbeCC\xc3\xbe\xc3\xbeBCC\xc3\xbe \xc3\xbeABC_001\xc3\xbe\xc3\xbeaBC_004\xc3\xbe\xc3\xbeSmith, John\xc3\xbe\xc3\xbeDoe, John\xc3\xbeRoe, Jane\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe \xc3\xbeABC_005\xc3\xbe\xc3\xbeaBC_007\xc3\xbe\xc3\xbeSmith, John\xc3\xbe\xc3\xbeDoe, John\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe \xc3\xbeABC_008\xc3\xbe\xc3\xbeaBC_012\xc3\xbe\xc3\xbeDoe, John\xc3\xbe\xc3\xbeDoe, John\xc3\xbeSmith, John\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe'
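One thing the repr above suggests: each þ is stored as the two bytes \xc3\xbe (UTF-8), whereas chr(254) in Python 2 is the single byte \xfe, so a one-byte quotechar would never match that text. Below is a hedged Python 3 sketch of decoding first and parsing afterwards. The \x14 delimiter and the sample bytes are assumptions for illustration, not the thread's exact data.

```python
import csv
import io

# Hypothetical raw bytes: UTF-8 encoded þ quotes, chr(20) between fields.
raw = b'\xc3\xbeBEGID\xc3\xbe\x14\xc3\xbeSmith, John\xc3\xbe\n'

# The same character is two bytes in UTF-8 but one byte in Latin-1.
thorn_utf8 = 'þ'.encode('utf-8')      # b'\xc3\xbe'
thorn_latin1 = 'þ'.encode('latin-1')  # b'\xfe'

# Decode with the encoding the bytes are actually in, then parse text.
text = raw.decode('utf-8')
rows = list(csv.reader(io.StringIO(text), delimiter='\x14', quotechar='þ'))
print(rows)
```

If the file turned out to be Latin-1 rather than UTF-8, the decode call would need 'latin-1' instead; checking a few raw bytes of the file first is probably the quickest way to tell.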
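Can you give a small example of your data, either part of the CSV file or something similar that reproduces the problem, along with the code you use to read the file with pandas?
What encoding is the CSV in? Have you tried changing the encoding?
Do you know the ASCII codes of these symbols, so you can use sep='something' and quote='something'?
FYI, using IPython Notebook 2.2 with Python 2.7.6, I get an error with StringIO.
What is the important part?
Close, but not quite. I think it needs a lineterminator value. Without the single quotes, it should look like this:
['BEGID','ENDID','Name','To','From','CC','BCC']
['ABC_001','ABC_004','Smith,John','Roe,Jane',]
['ABC_005','ABC_007','Smith,John','Doe,John',',
['ABC_008','ABC_012','Doe,John','Smith,John','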
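On the lineterminator point: as far as I know, the csv reader ignores the dialect's lineterminator when reading and only recognises '\r' or '\n' as end-of-line, so setting it will not split the records. csv.reader accepts any iterable of strings, though, so one option is to split the records yourself. This is a rough Python 3 sketch. It assumes records are separated by a single space between a closing and an opening þ (as the repr of txt suggests) and fields by chr(20). `split_records` is a hypothetical helper and the sample is made up.

```python
import csv
import re

QUOTE = 'þ'
DELIM = chr(20)  # assumption: the field delimiter from the question's comments


def split_records(text):
    """Split on a space that sits between a closing and an opening quote."""
    pattern = '(?<={q}) (?={q})'.format(q=re.escape(QUOTE))
    return re.split(pattern, text)


# Hypothetical sample: three records separated by single spaces.
sample = 'þBEGIDþ\x14þNameþ þABC_001þ\x14þSmith, Johnþ þABC_005þ\x14þþ'

records = split_records(sample)
parsed = list(csv.reader(records, delimiter=DELIM, quotechar=QUOTE))
for row in parsed:
    print(row)
```

A quoted field whose entire content is a single space would be split incorrectly by this pattern, so it is only reasonable when such fields cannot occur.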