Python regex for multiple delimiters, including double quotes


Is there Python code, using a regular expression, that can do something like this?

Input:

> https://test.com, 2017-08-14, "This is the title with , and "anything" in it", "This is the paragraph also with , and "anything" in it"

Ideal output:

['https://test.com', '2017-08-14', 'This is the title with , and "anything" in it', 'This is the paragraph also with , and "anything" in it']

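There are several ways you can split the string.

The plain built-in split method takes a delimiter as its argument and does exactly what it says on the tin: it splits the string at every occurrence of that delimiter and returns the pieces as a list.

In your case the delimiter you want is "," but only the commas that are not inside quotes. In the general case, for something similar, you could do: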

foo = 'https://test.com, 2017-08-14, "This is the title with , and "anything" in it", "This is the paragraph also with , and "anything" in it"'


print(foo.split(','))
# Caveat: this only works if no field contains a ',' itself, because every comma becomes a split point, which is not what you want here.
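In this particular case you could also split on, for example, ', ' (a comma followed by a space), but that fails as well: your input has an element containing `title with , and "anything"`, and that element would be split incorrectly.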
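To make the failure concrete, here is a small check of both naive splits on the sample input. It only uses `str.split`; the variable names `by_comma` and `by_comma_space` are just for illustration.

```python
foo = ('https://test.com, 2017-08-14, '
       '"This is the title with , and "anything" in it", '
       '"This is the paragraph also with , and "anything" in it"')

by_comma = foo.split(',')
by_comma_space = foo.split(', ')

# Four fields are wanted, but the commas inside the quoted text
# also act as split points, so both lists come out too long.
print(len(by_comma), len(by_comma_space))
```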

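In a case like this we can use the shlex module and its split method. That split method uses whitespace as the delimiter and treats quoted text as a single token.

Doing this:

import shlex
print(shlex.split(foo))

gets us closer to what we want, but not quite there: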

>>> ['https://test.com,', '2017-08-14,', 'This is the title with , and anything in it,', 'This is the paragraph also with , and anything in it']
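As you can see, the elements have annoying trailing commas, which we don't want.

Unfortunately, we can't simply do

print([part[:-1] for part in shlex.split(foo)])

because the last element has no trailing comma, so this would cut off the final 't' in 'it'. We can, however, use the built-in string rstrip method to remove any commas at the end of each element:

print([part.rstrip(',') for part in shlex.split(foo)])

which gives the output: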

>>> ['https://test.com', '2017-08-14', 'This is the title with , and anything in it', 'This is the paragraph also with , and anything in it']
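This is very close to what we want, but not exactly the same! (The " around "anything" are missing: shlex gobbled them up!)

Still, we are very close, and I'll leave that last little bit as your homework, which you should have tried to solve first, as others have already pointed out.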
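For anyone who wants to compare their attempt against something, here is one possible regex sketch that keeps the inner quotes. It is not part of the shlex approach above, and it rests on a heuristic I am assuming: a quoted field ends at the first " that is directly followed by a comma or by the end of the string. An inner quote that happens to be followed by a comma would break it. The names `FIELD` and `split_fields` are made up for this example.

```python
import re

foo = ('https://test.com, 2017-08-14, '
       '"This is the title with , and "anything" in it", '
       '"This is the paragraph also with , and "anything" in it"')

# Alternative 1: a quoted field, closed by the first '"' that is
#                followed (after optional spaces) by ',' or end of string.
# Alternative 2: a bare field, i.e. a run with no comma and no quote.
FIELD = re.compile(r'"(.*?)"(?=\s*(?:,|$))|([^,"\s][^,"]*)')

def split_fields(line):
    # findall returns (quoted, bare) tuples; exactly one side is non-empty
    # for the sample input.
    return [quoted if quoted else bare.strip()
            for quoted, bare in FIELD.findall(line)]

print(split_fields(foo))
```

On the sample input this should reproduce the ideal output from the question, but treat it as a sketch for this particular shape of data rather than a general parser.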


Hint: take a look at the csv module as well.
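As a rough illustration of that hint: the csv module copes with commas and quotes inside a field when the inner quotes are doubled (""), which is the usual CSV convention. Note that the line below is an assumed, re-escaped variant of the data, not the input from the question, whose bare inner quotes are not valid CSV quoting.

```python
import csv

# Assumed input: same kind of data, but with inner quotes doubled
# ("") the way CSV writers normally escape them.
line = ('https://test.com, 2017-08-14, '
        '"This is the title with , and ""anything"" in it"')

# skipinitialspace=True ignores the space after each delimiter comma,
# so the opening quote is still recognised as the start of a quoted field.
row = next(csv.reader([line], skipinitialspace=True))
print(row)
```

If you control how the data is produced, writing it with `csv.writer` in the first place avoids the ambiguity entirely.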

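Welcome to Stack Overflow. This is not a code-writing or regex-writing service. Once you have made an effort to solve the problem yourself and have run into difficulties, we will be happy to help. When you do, you can explain the problem you are having, include the relevant code, and ask a specific question about that code, and we can try to help. Good luck.

You're welcome.

This is not right. OP does not want the inner commas to be split. See the ideal output.

@lincr Oopsies, you're right. It would split the string at any comma it finds. I figured this was a really trivial homework question, so I skimmed the input too quickly =p. I've updated the code to get a bit closer, but left the homework part for him to try and finish. Thanks.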