Python regex: characters between certain characters
Edit: I should add that the strings in the test are supposed to contain every possible character (i.e. .*+$§€/ etc.), so I thought a regexp would help most.
I am using a regex to find all characters between certain characters ([" and "]). My example looks like this:
test = """["this is a text and its supposed to contain every possible char."],
["another one after a newline."],
["and another one even with
newlines
in it."]"""
The expected output should look like this:
['this is a text and its supposed to contain every possible char.', 'another one after a newline.', 'and another one even with newlines in it.']
My code (including the regex) is as follows:
import re
my_list = re.findall(r'(?<=\[").*(?="\])*[^ ,\n]', test)
print (my_list)
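So there are two problems:
1) It does not remove the "] at the end of the text, as I wanted it to with (?="\])
2) It does not capture the third text in brackets, I guess because of the newlines. But so far, when I tried .*\n I could not capture those either, because it returned an empty string.
I would really appreciate any help or hints on this problem. Thanks in advance.
By the way, I am using Python 3.6 in Anaconda Spyder with the latest regex (2018).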
Edit 2: one modification to the test:
test = """[
"this is a text and its supposed to contain every possible char."
],
[
"another one after a newline."
],
[
"and another one even with
newlines
in it."
]"""
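Once again I am having a hard time removing the newlines from it. I thought the whitespace could be handled with \s, so a regexp like this would solve it: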
my_list = re.findall(r'(?<=\[\S\s\")[\w\W]*(?=\"\S\s\])', test)
print (my_list)
You can try this, mate:
(?<=\[\")[\w\s.]+(?=\"\])
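One thing to watch: the character class [\w\s.] only admits word characters, whitespace and dots, so it would stop at characters such as * + $ § € / that the question wants to allow. Here is a minimal sketch of a more permissive variant, assuming a lazy `.*?` with `re.DOTALL` is acceptable and that no text contains the closing `"]` sequence itself. The name `extract_texts` is made up for illustration; the optional `\s*` is there so the "Edit 2" layout (bracket and quote on separate lines) should match as well.

```python
import re

# Lazily capture everything between [" and "]. re.DOTALL lets "." cross
# newlines; \s* tolerates whitespace between the bracket and the quote.
PATTERN = re.compile(r'\[\s*"(.*?)"\s*\]', re.DOTALL)


def extract_texts(text):
    """Return the quoted texts with each run of whitespace collapsed to one space."""
    return [" ".join(match.split()) for match in PATTERN.findall(text)]


test = """["this is a text and its supposed to contain every possible char."],
["another one after a newline."],
["and another one even with
newlines
in it."]"""

print(extract_texts(test))
```

The whitespace clean-up is done after matching with `" ".join(match.split())`, because `re.findall` can only return what is in the string; it cannot drop the newlines from inside a match.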
If you would also accept a non-regex solution, you can try:
result = []
for l in eval(' '.join(test.split())):
    result.extend(l)
print(result)
# ['this is a text and its supposed to contain every possible char.', 'another one after a newline.', 'and another one even with newlines in it.']
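Since eval will execute whatever is in the string, a more cautious variant of the same idea might use ast.literal_eval, which only accepts Python literals. This is a sketch under the assumption that every text is a valid Python string literal once the newlines are collapsed (an unescaped double quote inside a text would break it); `parse_lists` is a hypothetical helper name.

```python
import ast


def parse_lists(text):
    """Parse '["a"], ["b"]' style input without executing arbitrary code."""
    # Collapse newlines and other whitespace runs so each string literal
    # sits on one line.
    flat = " ".join(text.split())
    # Wrap in brackets so the result is always a list of lists, even when
    # the input holds only a single ["..."] item.
    nested = ast.literal_eval("[" + flat + "]")
    return [item for sub in nested for item in sub]


test = """["this is a text and its supposed to contain every possible char."],
["another one after a newline."],
["and another one even with
newlines
in it."]"""

print(parse_lists(test))
```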
I would suggest this:
test = """["this is a text and its supposed to contain every possible char."],
["another one after a newline."],
["and another one even with
newlines
in it."]"""
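# Turn newlines into spaces so words on neighbouring lines stay separated,
# then split the items on commas and strip the surrounding [" and "].
for i in test.replace('\n', ' ').split(','):
    print(i.lstrip(r' ["').rstrip(r'"]'))
This results in the following being printed to the screen: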
this is a text and its supposed to contain every possible char.
another one after a newline.
and another one even with newlines in it.
If you want a list of these exact strings, we can modify it to:
newList = []
for i in test.replace('\n', ' ').split(','):
    newList.append(i.lstrip(r' ["').rstrip(r'"]'))
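Keep in mind that split(',') also cuts at commas inside a text, and lstrip/rstrip remove any leading or trailing [, ] and " characters, not just the delimiters. If that matters for your data, a sketch along these lines might be more robust. It assumes the items are always separated by `"],` followed by `["`; `split_items` is a name invented here.

```python
import re


def split_items(text):
    """Split on the '"], ["' separator instead of on every comma."""
    flat = " ".join(text.split())  # collapse newlines and repeated spaces
    # Drop exactly one leading [" and one trailing "] if present.
    if flat.startswith('["'):
        flat = flat[2:]
    if flat.endswith('"]'):
        flat = flat[:-2]
    return re.split(r'"\]\s*,\s*\["', flat)


test = """["this is a text and its supposed to contain every possible char."],
["another one after a newline."],
["and another one even with
newlines
in it."]"""

print(split_items(test))
```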
"The expected output should look like this": so in addition to matching, you also want to remove the newlines from the output? It looks like you need a .sub or something similar.
Thanks for your answer. Yes, I basically want to include all characters. I thought "." would include every character except newline, but I guess it simply stops when it reaches a newline. By the way, is there a way to get rid of these \n characters in the output, and of the extra spaces in between (... with \n newlines ...)? I know how to replace them with a for loop, but if it can be done within the regexp I would like to know.
@MikeTwain Yes, you can. Check my updated answer. If it helps, please mark it as the correct answer :p
Thanks. The output still contains these \n characters, though. Is there no way to remove them as well? By the way, one last change, if I may: I edited it into the question as a modification to the test.
I want to match everything between [" and "]. By the way, I just edited the question.
@MikeTwain Yes mate, you can: result.replace(/(?:\n+\s{2,})/, "") With this regex you will get the desired output.
Interesting approach, I had not considered it so far. It seems to work well for my example, though. Thank you!
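A note on the result.replace(/.../, "") suggestion: that is JavaScript regex-literal syntax, and Python's str.replace does not take a regex. The closest Python equivalent would be re.sub. A minimal sketch, assuming each newline (plus any indentation around it) should become a single space; the pattern is slightly different from the one in the comment and `collapse_newlines` is a hypothetical helper name:

```python
import re


def collapse_newlines(s):
    """Replace each newline, with any whitespace around it, by one space."""
    return re.sub(r'\s*\n\s*', ' ', s)


matched = "and another one even with\n    newlines\n    in it."
print(collapse_newlines(matched))
```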