
Python IndexError since I've changed the way the file is read

Tags: python, csv, utf-16, index-error

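I'm trying to read and reformat a very large (2GB+) .out file that is structured like a CSV. I previously used the standard open() without any such problem, but switched to codecs.open() because open() had trouble with some characters.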

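It now throws

Traceback (most recent call last):
  line 21, in <module>
    if (r[5] == ''):
IndexError: list index out of range

on the first row, even though there is definitely an element at r[5]. (It ran for 0.301s.)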

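import sys
import csv
import datetime
import codecs

# Raise the csv field size limit as far as the platform allows.
maxInt = sys.maxsize
decrement = True

while decrement:
    decrement = False
    try:
        csv.field_size_limit(maxInt)
    except OverflowError:
        maxInt = int(maxInt/10)
        decrement = True

with codecs.open("file.out", 'rU', 'utf-16-be') as source:
    rdr = csv.reader(source)
    with open("out.csv", "w", newline='') as result:
        wtr = csv.writer(result)
        wtr.writerow(("Column 1", "Column 2", "Column 3", "etc"))
        for r in rdr:
            # Skip rows with an empty date field.
            if (r[5] == ''):
                continue
            wtr.writerow((datetime.datetime.strptime(r[5], '%m/%d/%Y').strftime('%Y-%m-%d'), r[3], r[7], r[9] + r[10] + ' ' + r[12]))

Using utf-8 throws UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc9 in position 12: invalid continuation byte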

Using latin-1 or ISO-8859-1 throws UnicodeEncodeError: 'charmap' codec can't encode characters in position 57-58: character maps to <undefined>, although it runs for longer.
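Note that a UnicodeEncodeError is raised when text is turned back into bytes, so with latin-1 the read side is probably fine and the failure sits on the output side (writing out.csv or printing). A minimal sketch, assuming the platform default encoding of the output file is the culprit: give the output an explicit encoding. The sample row, the temporary paths and the choice of utf-8 for the output are hypothetical, chosen only for illustration.

```python
import csv
import os
import tempfile

# Hypothetical sample row containing byte 0xC9 (an accented E in latin-1).
raw = b'"A00017","K","G","1999","4530","01/12/1999","","","","CAF\xc9"\r\n'

tmpdir = tempfile.mkdtemp()
src_path = os.path.join(tmpdir, "file.out")
dst_path = os.path.join(tmpdir, "out.csv")
with open(src_path, "wb") as f:
    f.write(raw)

# latin-1 maps every possible byte to a character, so decoding cannot fail.
# The explicit encoding on the output avoids relying on the platform default.
with open(src_path, "r", encoding="latin-1", newline="") as source, \
        open(dst_path, "w", encoding="utf-8", newline="") as result:
    wtr = csv.writer(result)
    for r in csv.reader(source):
        wtr.writerow((r[5], r[3], r[9]))

with open(dst_path, "r", encoding="utf-8", newline="") as f:
    written = f.read()

# ascii() keeps the print itself from depending on the console encoding.
print(ascii(written))
```

Whether latin-1 is the right encoding for the real file is a separate question; it only guarantees that decoding does not raise.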

The input file looks like this:

"A00017","K","G","1999","4530","01/12/1999","","","","PEOPLE TO ELECT MANGINELLI","","","","258 MAGNIOLIA DRIVE","SELDEN","NY","11784","","","404.57","","","","","","","2","","NAA","07/22/1999 08:43:59"
"A00037","K","G","1999","999999","01/12/1999","","","","CITIZENS TO ELECT TEDISCO TO ASSEMBLY","","","","","","","","","","0","","","","","","","2","","",""
"A00037","K","N","1999","1693","01/15/1999","","","","OUTSTANDING LOAN","","","","2176 GUILDERLAND AVE","SCHENECTADY","NY","12306","","","10474.8","10474.8","","","OTHER","","PREVIOUS LOAN FROM JAMES TEDISCO","","P","JM","07/15/1999 15:08:17"
"A00037","J","N","2000","1694","01/13/2000","","","","OUTSTANDING LOAN","","","","2176 GUILDERLAND","SCHENECTADY","NY","12306","","","10474.8","10474.8","","","OTHER","","LOANS FROM PREVIOUS CAMPAIGNS FROM J","","P","JM","01/14/1900 16:35:09"
"A00037","K","X","2000","999999","","","","","","","","","","","","","","","","","","","","","","","","","07/20/2000 00:00:00"
"A00037","J","X","2001","999999","","","","","","","","","","","","","","","","","","","","","","","","","01/17/2001 00:00:00"
"A00037","K","X","2002","999999","","","","","","","","","","","","","","","","","","","","","","","","","07/19/2002 00:00:00"
"A00037","J","X","2003","999999","","","","","","","","","","","","","","","","","","","","","","","","","01/21/2003 00:00:00"
"A00037","K","X","2003","999999","","","","","","","","","","","","","","","","","","","","","","","","","07/16/2003 00:00:00"
"A00037","J","X","2004","999999","","","","","","","","","","","","","","","","","","","","","","","","","01/22/2004 00:00:00"
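For what it's worth, an IndexError on the very first row is what you would see if the file is not actually UTF-16. The sketch below rests on that assumption, which the question does not confirm: it takes the first six fields of the first sample row as plain ASCII bytes, decodes them as utf-16-be the way the script does, and counts the fields. Each pair of bytes fuses into one unrelated character, no commas survive, and the row ends up with a single field.

```python
import csv

# First six fields of the first sample row, as single-byte (ASCII) data.
raw = b'"A00017","K","G","1999","4530","01/12/1999"\n'

# Decoding as UTF-16-BE succeeds, but every two bytes become one character.
wrong = raw.decode("utf-16-be")
rows_wrong = list(csv.reader([wrong]))

# Decoding with a single-byte encoding keeps the commas and quotes intact.
rows_right = list(csv.reader([raw.decode("ascii")]))

print(len(rows_wrong[0]), len(rows_right[0]))

try:
    rows_wrong[0][5]
except IndexError as exc:
    error = str(exc)
    print("IndexError:", error)
```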
I got this far thanks to:


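In the "file.out" being read, find out which character separates the cell values in each row. It will be something like "\t" (tab) or "," (comma); pass it as the delimiter argument.

Try printing r and look at the character between the column names or values in the row.

rdr = csv.reader(source, delimiter=<separator>)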
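One way to do that inspection, as a rough sketch: print the repr() of a raw line so invisible separators show up as escapes, and optionally let csv.Sniffer guess the delimiter. The two-row sample is made up from the first fields of the data in the question, and the candidate delimiter list is an assumption.

```python
import csv
import io

# Hypothetical sample standing in for the first lines of file.out.
sample = (
    '"A00017","K","G","1999","4530","01/12/1999"\n'
    '"A00037","K","G","1999","999999","01/12/1999"\n'
)

# repr() shows tabs and other invisible separators as escapes such as '\t'.
first_line = sample.splitlines()[0]
print(repr(first_line))

# csv.Sniffer guesses a dialect from a text sample; the candidates are limited
# to a few common separators here.
dialect = csv.Sniffer().sniff(sample, delimiters=",\t;|")
print(repr(dialect.delimiter))

# Pass the detected separator explicitly, as suggested above.
rdr = csv.reader(io.StringIO(sample), delimiter=dialect.delimiter)
rows = list(rdr)
print(len(rows[0]))
```

For a 2GB+ file, sniffing only the first few kilobytes should be enough; the guess is a heuristic and can fail on unusual data.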

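Can you print r in the loop? Without seeing a file that reproduces the problem, it's hard for us to help you.

Have you tried printing r and checking whether it is an array?

I tried printing r to the console and got UnicodeEncodeError: 'charmap' codec can't encode characters in position 2-97: character maps to <undefined>

Use "utf-8" in codecs.open("file.out", "rU", "utf-8").

Could you try the "latin-1" or "ISO-8859-1" encoding instead of "utf-8"?

Could you try the following code: rdr = csv.reader((line.replace('\0', '') for line in source), delimiter=','). Could you share the row in the data that causes the error?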
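The NUL-stripping suggestion from the last comment can be tried in isolation. A small sketch, assuming (hypothetically) that the text really is UTF-16 that got decoded with a single-byte encoding, which is what leaves a NUL after every character; if the decoded text contains no NULs, the replace() is a no-op and will not help.

```python
import csv

# Hypothetical input: UTF-16-LE text mis-decoded as latin-1, so every
# character is followed by a NUL ('\0').
text = '"A","B","C"\n'
lines = [text.encode("utf-16-le").decode("latin-1")]
print(repr(lines[0]))

# Generator that strips NULs line by line, as in the comment above.
cleaned = (line.replace("\0", "") for line in lines)
rows = list(csv.reader(cleaned, delimiter=","))
print(rows)
```

Because the generator is lazy, the same pattern works on a large file object without loading it into memory.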