Ambiguity in parsing a CSV file with Python
I am trying to parse a CSV file with the following contents:
# country,title1,title2,type
GB,Fast Friends,Burn Notice, S:4, E:2,episode,
SE,The Spiderwick Chronicles,"SPIDERWICK CHRONICLES, THE",movie,
The expected output is:
['SE', 'The Spiderwick Chronicles', '"SPIDERWICK CHRONICLES, THE"', 'movie']
['GB', 'Fast Friends', 'Burn Notice, S:4, E:2', 'episode']
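The problem is that the commas inside the title fields are not escaped. I tried the csv reader as well as string and regular-expression parsing, but could not get an unambiguous match.
Is it possible to parse this file accurately when half of the fields contain commas without any quoting? Or does a new CSV need to be created?
If you assume that all of the extra commas occur in title2, you can use a trick. Otherwise, the data is ambiguous.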
strings = [
    'SE,The Spiderwick Chronicles,"SPIDERWICK CHRONICLES, THE",movie,',
    'GB,Fast Friends,Burn Notice, S:4, E:2,episode,',
]
for string in strings:
    xs = string.split(',')         # the trailing comma leaves an empty last element
    country = xs[0]
    title1 = xs[1]
    title2 = ','.join(xs[2:-2])    # rejoin with ',' so the commas in title2 survive
    mtype = xs[-2]
    print([country, title1, title2, mtype])
Output:
['SE', 'The Spiderwick Chronicles', '"SPIDERWICK CHRONICLES, THE"', 'movie']
['GB', 'Fast Friends', 'Burn Notice, S:4, E:2', 'episode']
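The same trick can also be written so that title2 is never split in the first place. The following is only a sketch under the same assumption (extra commas occur only in title2, and every record ends with a trailing comma); `parse_line` is a hypothetical helper name, not something from the question.

```python
def parse_line(line):
    """Split one record into [country, title1, title2, type].

    Assumption: only title2 may contain bare commas, and each record
    ends with a single trailing comma.
    """
    body = line.rstrip("\n")
    if body.endswith(","):
        body = body[:-1]                         # drop the trailing comma
    country, title1, rest = body.split(",", 2)   # cut at the first two commas
    title2, mtype = rest.rsplit(",", 1)          # cut at the last comma
    return [country, title1, title2, mtype]


rows = [
    'SE,The Spiderwick Chronicles,"SPIDERWICK CHRONICLES, THE",movie,',
    'GB,Fast Friends,Burn Notice, S:4, E:2,episode,',
]
parsed = [parse_line(r) for r in rows]
for fields in parsed:
    print(fields)
```

Because only the outermost commas are used as separators, title2 comes back verbatim, including its own commas and any quotes. A record with fewer than four fields would raise a ValueError on unpacking, which may be acceptable as a crude validity check.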
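You can use a regular expression (import re) and match (\"*\",)|(.*)
This way you look for either a [quoted string] or [any string].
If the fields contain commas, I would save the Excel sheet as a text file with the fields separated by tabs.
What is your expected output?
@AvinashRaj please see the update.
Possible duplicate.
How do you know that Burn Notice, S:4, E:2 should be shown as a single field? What is the expected output for GB,Fast,Friends,four,Burn Notice, S:4, E:2,episode,? How would I tell which part belongs to title1 and which part belongs to title2?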
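The regular-expression idea and the ambiguity raised in the last comment can be illustrated together. The pattern below is a hypothetical one, not the pattern quoted in the answer. It assumes that country, title1 and type never contain commas, and lets title2 greedily absorb everything in between.

```python
import re

# Hypothetical pattern: three comma-free fields around a greedy title2,
# followed by the trailing comma that ends every record.
RECORD = re.compile(r'^([^,]*),([^,]*),(.*),([^,]*),$')


def parse_with_regex(line):
    """Return [country, title1, title2, type], or None if the line does not fit."""
    match = RECORD.match(line)
    return list(match.groups()) if match else None


clean = parse_with_regex('GB,Fast Friends,Burn Notice, S:4, E:2,episode,')
ambiguous = parse_with_regex('GB,Fast,Friends,four,Burn Notice, S:4, E:2,episode,')
print(clean)
print(ambiguous)
```

For the second line the pattern still matches, but it assigns 'Fast' to title1 and pushes 'Friends,four,...' into title2. Nothing in the data says whether that is right, which is exactly the ambiguity the comment points out: without quoting or a different delimiter, any parser has to guess.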