Python: only one match is returned when using (.|\s)*
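String:

Person1(has(1, 1) has(2, 2)
has(3, 3)
had(4, 4) had(5, 5))
Person2(has(6, 6) had(7, 7))

I want to select everything inside has() for Person1, i.e. ['1,1','2,2','3,3'].

I tried has\((\d,\d)\)(.|\s)*Person2 with the global flag, but it returned only 1,1.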
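A likely reason the attempt yields a single match: (.|\s)* is greedy, so it consumes everything up to the last Person2, including the remaining has(...) entries, and the matches found by a global search never overlap. The sketch below illustrates this. The pattern is the one from the question with \s* added after the comma (an assumption, so that it matches the spaced sample values). The lookahead variant is one possible alternative, not the poster's code.

```python
import re

s = '''Person1(has(1, 1) has(2, 2)
has(3, 3)
had(4, 4) had(5, 5))
Person2(has(6, 6) had(7, 7))'''

# Pattern from the question; \s* after the comma is an added assumption
# so that it matches the sample values, which contain a space.
greedy = re.compile(r'has\((\d,\s*\d)\)(.|\s)*Person2')

# (.|\s)* swallows the rest of Person1, so only one match is possible.
greedy_items = [m.group(1) for m in greedy.finditer(s)]
print(greedy_items)  # ['1, 1']

# Alternative sketch: move the "rest of the text" into a lookahead, which
# checks that Person2 appears later without consuming any characters.
lookahead = re.compile(r'has\(([^()]+)\)(?=[\s\S]*Person2)')
lookahead_items = lookahead.findall(s)
print(lookahead_items)  # ['1, 1', '2, 2', '3, 3']
```

Because the lookahead consumes nothing, the search resumes right after each has(...) and can find the next one.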
A solution using the re.findall() function:

import re

s = '''
Person1(has(1, 1) has(2, 2)
has(3, 3)
had(4, 4) had(5, 5))
Person2(has(6, 6) had(7, 7))'''
has_items = re.findall(r'(?<!Person2\()has\(([^()]+)\)', s)
print(has_items)

(?<!Person2\() - a negative lookbehind assertion; it ensures that the has substring is not immediately preceded by Person2(
has\(([^()]+)\) - the first capturing group, ([^()]+), holds the contents of each has item

Output:

['1, 1', '2, 2', '3, 3']
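One caveat worth illustrating: the lookbehind only rejects a has that sits immediately after Person2(, so any later has inside Person2 would still be picked up. A minimal sketch, where the extra has(8, 8) is a hypothetical addition to the sample data:

```python
import re

# Same sample as above, with a hypothetical second has(...) added to Person2.
s = '''
Person1(has(1, 1) has(2, 2)
has(3, 3)
had(4, 4) had(5, 5))
Person2(has(6, 6) had(7, 7) has(8, 8))'''

has_items = re.findall(r'(?<!Person2\()has\(([^()]+)\)', s)

# has(6, 6) is skipped because it directly follows "Person2(", but
# has(8, 8) is preceded by a space, so the lookbehind lets it through.
print(has_items)  # ['1, 1', '2, 2', '3, 3', '8, 8']
```

Isolating the block of a single person first, and only then searching for has(...) inside it, avoids this.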
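To get the items for a particular person, use the following unified approach with an extended example:

import re

def grepPersonItems(s, person):
    # Grab the whole block for one person, from "PersonN(" up to the closing "))".
    person_items = []
    person_group = re.search(r'(' + person + r'\(.*?\)\))', s, re.DOTALL)
    if person_group:
        # Collect the contents of every has(...) inside that block.
        person_items = re.findall(r'has\(([^()]+)\)', person_group.group())
    return person_items

s = '''
Person1(has(1, 1) has(2, 2)
has(3, 3)
had(4, 4) had(5, 5))
Person2(has(6, 6) had(7, 7), has(8,8)) Person3(has(2, 6) had(7, 7), has(9, 9))'''

person1_items = grepPersonItems(s, 'Person1')
person2_items = grepPersonItems(s, 'Person2')
person3_items = grepPersonItems(s, 'Person3')

print('Person1:', person1_items)
print('Person2:', person2_items)
print('Person3:', person3_items)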
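Output:

Person1: ['1, 1', '2, 2', '3, 3']
Person2: ['6, 6', '8,8']
Person3: ['2, 6', '9, 9']

Why not parse it fully, so that you can then pick whatever you need? You will need two patterns: one to grab each person and its contents, and another to grab the individual parts inside. You can also add further parsing to split out the individual elements and convert them to native Python types. The function below parses everything so that its contents can be accessed afterwards: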
import collections
import re

persons = re.compile(r"(Person\d+)\(((?:.*?\(.*?\)\s*)+)\)")
contents = re.compile(r"(\w+)\((.*?)\)")

def parse_input(data, parse_inner=True, map_inner=str):
    result = {}  # store for our parsed data
    for match in persons.finditer(data):  # loop through our `Persons`
        person = match.group(1)  # grab the first group to get our Person
        elements = collections.defaultdict(list)  # store for the parsed inner elements
        for element in contents.finditer(match.group(2)):  # loop through the has/had/etc.
            element_name = element.group(1)  # the first group holds the name
            element_data = element.group(2)  # this is the inner content of each has/had/etc.
            if parse_inner:  # if we want to parse the inner elements...
                element_data = [map_inner(x.strip()) for x in element_data.split(",")]
            elements[element_name].append(element_data)  # add our inner results
        result[person] = elements  # add persons to our result
    return result  # person -> element name -> list of values
The most basic example:

test = """Person1(has(1, 1) has(2, 2)
has(3, 3)
had(4, 4) had(5, 5))
Person2(has(6, 6) had(7, 7))"""

parsed = parse_input(test, False)  # basic string grab
print(parsed["Person1"]["has"])  # ['1, 1', '2, 2', '3, 3']
print(parsed["Person2"]["has"])  # ['6, 6']
print(parsed["Person2"]["had"])  # ['7, 7']

But you can do more: you can add more persons and have the values converted to actual Python structures.
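A minimal sketch of converting the parsed values to native Python types: parse_input already accepts parse_inner and map_inner, so passing map_inner=int should turn each inner value into an integer. The block repeats a condensed copy of the function so that it runs on its own, and the Person3 entry is hypothetical sample data added for illustration.

```python
import collections
import re

persons = re.compile(r"(Person\d+)\(((?:.*?\(.*?\)\s*)+)\)")
contents = re.compile(r"(\w+)\((.*?)\)")

def parse_input(data, parse_inner=True, map_inner=str):
    result = {}
    for match in persons.finditer(data):
        elements = collections.defaultdict(list)
        for element in contents.finditer(match.group(2)):
            element_data = element.group(2)
            if parse_inner:
                # Split "1, 1" on commas and convert each piece.
                element_data = [map_inner(x.strip()) for x in element_data.split(",")]
            elements[element.group(1)].append(element_data)
        result[match.group(1)] = elements
    return result

# Person3 is hypothetical sample data added for illustration.
test = """Person1(has(1, 1) has(2, 2)
has(3, 3)
had(4, 4) had(5, 5))
Person2(has(6, 6) had(7, 7))
Person3(had(6, 6) has(8, 8))"""

parsed = parse_input(test, True, int)  # split each item and convert to int
print(parsed["Person1"]["has"])  # [[1, 1], [2, 2], [3, 3]]
print(parsed["Person2"]["had"])  # [[7, 7]]
print(parsed["Person3"]["has"])  # [[8, 8]]
```

Any one-argument callable can be passed as map_inner, for example float, provided every inner value can be converted by it.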
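I think you can try this approach; it is dynamic and simple, and it works for any number of persons. It splits the string, parses each part, and stores the list of has values for each person in a dictionary.

Sample source: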
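import re

regex = r"has\(\s*(\d+)\s*,\s*(\d+)\s*\)"
result = {}  # renamed from `dict` to avoid shadowing the built-in

test_str = ("Person1(has(1, 1) has(2, 2)\n"
            " has(3, 3) \n"
            " had(4, 4) had(5, 5))\n"
            "Person2(had(6, 6) has(7, 7))\n"
            "Person3(had(6, 6) has(8, 8))")

# Splitting on a capturing group keeps the person names in the result list.
res = re.split(r"(Person\d+)", test_str)
current_key = ""
for rs in res:
    if "Person" in rs:
        current_key = rs
    elif current_key != "":
        # Collect every has(x, y) pair in this person's block as "x,y".
        ar = []
        for match in re.finditer(regex, rs):
            ar.append(match.group(1) + "," + match.group(2))
        result[current_key] = ar

print(result)  # {'Person1': ['1,1', '2,2', '3,3'], 'Person2': ['7,7'], 'Person3': ['8,8']}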
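Can I select has only for Person1 or Person2? If Person2 has more than one has, the has entries after the first one are also selected. Thanks.

@Harrison, elaborate on your question: do you want to search all has items for Person1 and Person2, or for any possible person?

My original question was to get all has for Person1, but it would be great if you could also provide another regex for Person2. I simplified Person2 in the question; it can also have multiple has entries and span multiple lines. If you add another has to Person2, the (?<!Person2\() pattern selects it too.

@Harrison, enjoy my unified approach with the extended example.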