How do I search for a string in a line and extract the data between two characters in Python?

python python-3.x

File contents:

module traffic(
    green_main, yellow_main, red_main, green_first, yellow_first, 
    red_first, clk, rst, waiting_main, waiting_first
);

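I need to search for the string "module" and extract the content between the opening "(" and the closing ");".

This is the code I tried, but I cannot get a result: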

fp = open(file_name)
contents = fp.read()
unique_word_a = '('
unique_word_b = ');'
s = contents

for line in contents:
    if 'module' in line:
        your_string=s[s.find(unique_word_a)+len(unique_word_a):s.find(unique_word_b)].strip()
        print(your_string)

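If you want to extract the content between "(" and ")", you can do it as shown below, but first decide how you want to handle the file contents.

If your content spans more than one line: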

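import math

def find_all(your_string, search_string, max_index=math.inf, offset=0):
    # Yield the index of every occurrence of search_string, starting at
    # offset and stopping before max_index.
    index = your_string.find(search_string, offset)

    while index != -1 and index < max_index:
        yield index
        index = your_string.find(search_string, index + 1)

# contents is the whole file as one string (contents = fp.read()).
s = contents.replace('\n', '')

for offset in find_all(s, 'module'):
    # The next 'module' keyword, if any, bounds the current search.
    # str.find() takes the start position positionally, not as a keyword.
    max_index = s.find('module', offset + len('module'))
    if max_index == -1:
        max_index = math.inf
    print([s[start + 1: stop] for start, stop in zip(find_all(s, '(', max_index, offset), find_all(s, ')', max_index, offset))])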

The problem with your code is here:

for line in contents:
    if 'module' in line:

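Here, contents is a single string holding the entire file, not a list of strings (lines) and not a file handle that can be looped over line by line. The loop variable line is therefore not a line at all but a single character of that string, and a single character can never contain the substring "module".

Since you never actually use line inside the loop, you can simply remove the loop and the condition, and your code will work. (If you changed the code to really loop over lines and searched within each of those lines, it would not work either, because ( and ) are not on the same line.)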
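A minimal sketch of the fix described above, assuming the whole file has already been read into one string. The helper name extract_between and the sample text are illustrative and are not part of the original question.

```python
def extract_between(text, keyword, start_delim, end_delim):
    """Return the text between start_delim and end_delim that follows keyword, or None."""
    key_pos = text.find(keyword)
    if key_pos == -1:
        return None
    start = text.find(start_delim, key_pos + len(keyword))
    if start == -1:
        return None
    start += len(start_delim)
    end = text.find(end_delim, start)
    if end == -1:
        return None
    return text[start:end].strip()


# Stand-in for contents = fp.read()
sample = """module traffic(
    green_main, yellow_main, red_main, green_first, yellow_first,
    red_first, clk, rst, waiting_main, waiting_first
);
"""

# Iterating over a string yields single characters, never whole lines.
chars = [ch for ch in "ab\ncd"]

ports = extract_between(sample, "module", "(", ");")
print(ports)
```

Searching the whole string at once avoids the problem of "(" and ");" sitting on different lines.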

Alternatively, you can use a regular expression:
>>> import re
>>> content = """module traffic(green_main, yellow_main, red_main, green_first, yellow_first, 
...                red_first, clk, rst, waiting_main, waiting_first);"""
>>> re.search(r"module \w+\((.*?)\);", content, re.DOTALL).group(1)
'green_main, yellow_main, red_main, green_first, yellow_first, \n               red_first, clk, rst, waiting_main, waiting_first'

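Here, module \w+\((.*?)\); means:

- the word module
- followed by a space and some word-type \w characters
- a literal opening (
- a capturing group (...) containing anything ., including line breaks (re.DOTALL), matched non-greedily with *?
- a literal closing )
- and a ;

and group(1) returns whatever was found between the (unescaped) pair of (...).
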
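To illustrate why the re.DOTALL flag and the non-greedy *? matter, here is a small sketch. The two-module sample text is made up for the demonstration.

```python
import re

# Made-up sample with two module headers, each spanning several lines.
text = "module a(\n  x, y\n);\nmodule b(\n  p, q\n);\n"

# Without re.DOTALL, "." does not match "\n", so a multi-line port list is missed.
no_dotall = re.search(r"module \w+\((.*?)\);", text)

# With re.DOTALL and a non-greedy group, the match stops at the first ");".
lazy = re.search(r"module \w+\((.*?)\);", text, re.DOTALL).group(1)

# With a greedy group, the match runs on to the last ");" in the text.
greedy = re.search(r"module \w+\((.*)\);", text, re.DOTALL).group(1)

print(no_dotall)
print(repr(lazy))
print(repr(greedy))
```

The greedy variant swallows the second module header as well, which is usually not what you want when a file contains several modules.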
If you want the result as a list:
>>> list(map(str.strip, _.split(",")))
['green_main', 'yellow_main', 'red_main', 'green_first', 'yellow_first', 'red_first', 'clk', 'rst', 'waiting_main', 'waiting_first']

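If I try your solution, I get the following error message: AttributeError: 'NoneType' object has no attribute 'group'

@RekhaG It works for me. But maybe the line is not always exactly the same? There may be more than one space after module, or a space before the (. In any case, you do not really need a regular expression if you just fix the other problem.

This is basically what the OP is already doing, but there is a different problem before that.

What if ( and ) are on different lines from module? I think the content should be processed as a whole.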
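The AttributeError reported in the comments occurs when re.search() finds no match and returns None. Below is a hedged sketch of a more tolerant variant. The pattern MODULE_RE and the helper module_ports are hypothetical names, and the sketch assumes a plain module header without a parameter list.

```python
import re

# Hypothetical, more tolerant pattern: \s+ and \s* allow variable whitespace
# after "module", before "(" and before ";".
MODULE_RE = re.compile(r"\bmodule\s+(\w+)\s*\((.*?)\)\s*;", re.DOTALL)


def module_ports(text):
    """Return (module_name, [port, ...]) for the first module header, or None."""
    match = MODULE_RE.search(text)
    if match is None:  # no header found; avoids AttributeError on .group()
        return None
    name, body = match.groups()
    ports = [p.strip() for p in body.split(",") if p.strip()]
    return name, ports


print(module_ports("module   traffic (\n  clk,\n  rst\n) ;"))
print(module_ports("no header here"))
```

Checking for None before calling group() turns the crash into an explicit "not found" result that the caller can handle.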