How to make Python treat only commas with no space before or after them as delimiters
I have a csv file that I am trying to read into Python, manipulate, and then write to another csv file.
My current problem is that although the file is comma delimited, not all of the commas are meant to be delimiters. Only commas that have no space before or after them should count as delimiters (only "," and not ", " or " ,").
Here is what my code looks like:
import csv
#open file for reading
with open(mypath, 'r', encoding = 'utf_8') as csvfile:
    myfile = list(csv.reader(csvfile, dialect = 'excel', delimiter = ','))
#specifying columns to be deleted
BadCols = [29,28,27,25,21,20,19,18,16,15,14,13,12,11,8,7,4,3]
#Loop through column indices to be deleted
for col in BadCols:
    #Loop through each row to delete columns
    for i, row in enumerate(myfile):
        #Delete Column, which is basically a list item at that row
        myfile[i].pop(col)
#Open file for writing
with open(mypath2, "w", encoding = 'utf_8', newline='') as csvfile:
    csv_file = csv.writer(csvfile, dialect = 'excel', delimiter = ',')
    for i, row in enumerate(myfile):
        for j, col in enumerate(row):
            csvfile.write('%s, ' %col)
        csvfile.write('\n')
csvfile.close
My data looks like this:
Date,Name,City
May 30, 2016,Ryan,Boston
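Here is what I would like to see when I open the file in Excel:
Date            Name    City
May 30, 2016    Ryan    Boston
Here is what I actually see in Excel:
Date      [Blank column name]    Name    City
May 30    2016                   Ryan    Boston
So the date is being read as two elements instead of one.
Any help would be greatly appreciated.

A regex is probably your best bet: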
import re
patt = re.compile(r"\b,\b")
with open("in.csv") as f:
    for row in map(patt.split, f):
        print(row)
That would give you:
['Date', 'Name', 'City\n']
['May 30, 2016', 'Ryan', 'Boston']
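You will have to deal with the trailing whitespace (the newline left on the last field, which str.rstrip can remove), but that should not be a big problem. Obviously, if you have something like "foo,bar" as a name you will also run into trouble; if that is not the case, the re approach will work fine.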
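As a rough illustration of the full round trip, the sketch below reuses the `\b,\b` pattern from the answer, strips the trailing newline, and writes the rows back with `csv.writer`. The helper name `split_rows` and the in-memory sample are assumptions made for the example, not part of the original answer; a real script would iterate over the opened input file and write to the output file instead.

```python
import csv
import io
import re

# Pattern from the answer above: a comma with a word character on both sides.
patt = re.compile(r"\b,\b")

def split_rows(lines):
    """Split each line on 'tight' commas after dropping the trailing newline."""
    return [patt.split(line.rstrip("\r\n")) for line in lines]

# Sample data from the question (hypothetical stand-in for the real file).
sample = ["Date,Name,City\n", "May 30, 2016,Ryan,Boston\n"]
rows = split_rows(sample)

# csv.writer quotes any field that contains the delimiter, so the date
# is written as one quoted field rather than two separate columns.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
result = out.getvalue()
print(result)
```

Because the date field is quoted on output, a CSV-aware reader such as Excel should treat it as a single cell.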
Another option might be to replace ", " or " ," with a single space:
import csv
import re
patt = re.compile(r"\s(,)|(,)\s")
with open("in.csv") as f:
    for line in csv.reader(map(lambda s: patt.sub(" ", s), f)):
        print(line)
So for:
Date,Name,City
May 30, 2016,Ryan,Boston
May 31 ,2016,foo,Narnia
You would get:
['Date', 'Name', 'City']
['May 30 2016', 'Ryan', 'Boston']
['May 31 2016', 'foo', 'Narnia']
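Note that the substitution above drops the comma from the date. If the comma should be kept, one possible variation (not from the original answer) is to split on commas that have no whitespace on either side, using lookarounds. The names `tight_comma` and `split_line` are made up for this sketch, and it assumes the rule stated in the question holds for every row.

```python
import re

# Assumption: a delimiter is a comma with no whitespace directly
# before it and no whitespace directly after it.
tight_comma = re.compile(r"(?<!\s),(?!\s)")

def split_line(line):
    """Split one line on tight commas, leaving spaced commas inside the field."""
    return tight_comma.split(line.rstrip("\r\n"))

fields_a = split_line("May 30, 2016,Ryan,Boston\n")
fields_b = split_line("May 31 ,2016,foo,Narnia\n")
fields_c = split_line("x,,z\n")  # an empty middle field is still split out

print(fields_a)
print(fields_b)
print(fields_c)
```

Unlike `\b,\b`, this pattern does not require word characters next to the comma, so empty fields are still separated.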
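A CSV whose field separator is also used inside unquoted content - shiver. As a fast hack, I suggest first replacing all the "good" separators with an out-of-band character (say a pipe, |) that does not occur anywhere else in the file, then splitting on that character, or letting the csv module parse it with a special dialect or auto-detection, and you are done. But maybe it is too late at night here ;-) Alternatively, if the two rightmost commas are always the "good" ones, parse from the right with a simple line.rsplit(',',2) or similar.
+1 for @padraic cunningham's answer. What you have is not a proper CSV file. Fix the file...
For anyone facing the same problem, you can also try the Pandas library, especially if the solution Padraic suggested does not work for you. It is easy to use.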
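The two hacks suggested in the comments (an out-of-band placeholder and splitting from the right) could look roughly like the sketch below. The function names, the lookaround pattern used to pick out the "good" commas, and the sample rows are all assumptions for illustration; the commenter did not specify how the good separators would be identified.

```python
import csv
import re

# Assumption: a "good" separator is a comma with no whitespace on either side.
good_comma = re.compile(r"(?<!\s),(?!\s)")

def parse_with_pipe(lines, placeholder="|"):
    """Swap good commas for a placeholder, then let csv.reader split on it."""
    cleaned = []
    for line in lines:
        if placeholder in line:
            # The hack only works if the placeholder never occurs in the data.
            raise ValueError("placeholder already occurs in the data")
        cleaned.append(good_comma.sub(placeholder, line.rstrip("\r\n")))
    return list(csv.reader(cleaned, delimiter=placeholder))

def split_from_right(line, good_commas=2):
    """Split on the last `good_commas` commas only, counting from the right."""
    return line.rstrip("\r\n").rsplit(",", good_commas)

sample = ["Date,Name,City\n", "May 30, 2016,Ryan,Boston\n"]
pipe_rows = parse_with_pipe(sample)
right_fields = split_from_right(sample[1])

print(pipe_rows)
print(right_fields)
```

The right-to-left split only makes sense when every row has the same number of good commas after the free-text field, which is why it takes the count as a parameter.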