Python: parsing a CSV string that may contain sets
I have a CSV string in which some items may be enclosed in {} and contain commas. I want to collect the string values into a list.
What is the most Pythonic way to collect the values into a list?
Example 1: 'a,b,c', expected output ['a','b','c']
Example 2: '{aa,ab},b,c', expected output ['{aa,ab}','b','c']
Example 3: '{aa,ab},{bb,b},c', expected output ['{aa,ab}','{bb,b}','c']
I tried s.split(','), which works for example 1 but breaks on examples 2 and 3, because it also splits on the commas inside the braces.
I believe this question () is very similar to mine, but I could not work out the right regular expression syntax to use.
The solution is in fact very similar:
import re
PATTERN = re.compile(r'''\s*((?:[^,{]|\{[^{]*\})+)\s*''')
data = '{aa,ab}, {bb,b}, c'
print(PATTERN.split(data)[1::2])
This gives:
['{aa,ab}', '{bb,b}', 'c']
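A more readable way (at least to me) is to spell out what you are looking for: either something between braces {} or something made up only of alphanumeric characters: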
import re
examples = [
    'a,b,c',
    '{aa,ab}, b, c',
    '{aa,ab}, {bb,b}, c'
]
for example in examples:
    print(re.findall(r'(\{.+?\}|\w+)', example))
It prints:
['a', 'b', 'c']
['{aa,ab}', 'b', 'c']
['{aa,ab}', '{bb,b}', 'c']
Note that you do not have to use a regular expression; plain Python is enough:
s = '{aa,ab}, {bb,b}, c'
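# Indices of the commas that sit outside any {...} group.
commas = [i for i, c in enumerate(s)
          if c == ',' and s[:i].count('{') == s[:i].count('}')]
# Fields are separated by ', ' (comma plus space), hence the offset of 2.
print([s[a + 2:b] for a, b in zip([-2] + commas, commas + [None])])
# ['{aa,ab}', '{bb,b}', 'c']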
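The same idea (only split on commas that are outside braces) can also be written as a single pass that tracks the brace depth, instead of recounting the braces for every comma. This is only a sketch: the name split_top_level is made up for illustration, and it assumes the braces are balanced and that surrounding whitespace should be stripped from each field.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside {...}.

    Hypothetical helper; assumes balanced braces.
    """
    parts = []
    depth = 0   # how many '{' are currently open
    start = 0   # index where the current field begins
    for i, ch in enumerate(text):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == sep and depth == 0:
            # A top-level separator ends the current field.
            parts.append(text[start:i].strip())
            start = i + 1
    # Whatever follows the last separator is the final field.
    parts.append(text[start:].strip())
    return parts


print(split_top_level("a,b,c"))               # ['a', 'b', 'c']
print(split_top_level("{aa,ab}, b, c"))       # ['{aa,ab}', 'b', 'c']
print(split_top_level("{aa,ab}, {bb,b}, c"))  # ['{aa,ab}', '{bb,b}', 'c']
```

Unlike the slicing version, this does not depend on the separator being exactly a comma followed by one space.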
A simpler pure-Python approach, which uses "" in place of {} as the enclosing characters:
def parseCSV(string):
    results = []
    current = ''
    quoted = False   # inside a "..." field
    quoting = False  # just saw a '"' inside a quoted field
    for currentletter in string:
        if currentletter == '"':
            if quoted:
                if quoting:
                    # '""' inside a quoted field is an escaped quote
                    current = current + currentletter
                    quoting = False
                else:
                    quoting = True
            else:
                quoted = True
                quoting = False
        else:
            shouldCheck = False
            if quoted:
                if quoting:
                    # the previous '"' closed the quoted field
                    quoted = False
                    quoting = False
                    shouldCheck = True
                else:
                    current = current + currentletter
            else:
                shouldCheck = True
            if shouldCheck:
                if currentletter == ',':
                    results.append(current)
                    current = ''
                else:
                    current = current + currentletter
    results.append(current)
    return results
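If the items really are enclosed in double quotes rather than braces, the standard library's csv module already handles that case, including "" as an escaped quote. A minimal sketch, assuming a single non-empty line of input; the wrapper name parse_quoted_line is hypothetical:

```python
import csv


def parse_quoted_line(line):
    """Parse one CSV line whose fields may be wrapped in double quotes.

    Hypothetical wrapper around csv.reader; assumes `line` is not empty.
    """
    # csv.reader expects an iterable of lines, so wrap the string in a list.
    # skipinitialspace=True ignores the space that follows each comma.
    return next(csv.reader([line], skipinitialspace=True))


print(parse_quoted_line('"aa,ab", "bb,b", c'))  # ['aa,ab', 'bb,b', 'c']
print(parse_quoted_line('"say ""hi""",b'))      # ['say "hi"', 'b']
```

Note that the quotes themselves are removed from the returned values, so this is not a drop-in replacement for the brace-preserving solutions above.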