Pattern replacement in Python

python regex python-2.7

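I'm looking for alternative ways to clean up a tabular file that contains information between parentheses. This will be the first step in a pipeline, and I need to remove every value enclosed in parentheses.

What I have: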

> Otu00467  Bacteria(100);Gracilibacteria(99);unclassified(99);unclassified(99);unclassified(99);unclassified(99);
> Otu00469  Bacteria(100);Proteobacteria(96);unclassified(96);unclassified(96);unclassified(96);unclassified(96);
> Otu00470  Bacteria(100);Proteobacteria(100);Alphaproteobacteria(100);Rhodospirillales(100);Rhodospirillaceae(100);Azospirillum(54);

What I want:

 Otu00467   Bacteria;Gracilibacteria;unclassified;unclassified;unclassified;unclassified;
 Otu00469   Bacteria;Proteobacteria;unclassified;unclassified;unclassified;unclassified;
 Otu00470   Bacteria;Proteobacteria;Alphaproteobacteria;Rhodospirillales;Rhodospirillaceae;Azospirillum;

My first approach was to split the second column and then join everything back together. It works, but it's too ugly.

Thanks.

Use re.sub:

import re
new_string = re.sub(r'\(.*?\)', '', your_string)

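This removes all text enclosed in parentheses, along with the parentheses themselves. No flags are needed: re.M (the multiline flag) only changes how ^ and $ match, and this pattern uses neither.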
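As a quick check, here is a minimal sketch that applies the pattern to a shortened sample line from the question. The variable names are only for illustration. The `?` makes `.*` non-greedy, so each match stops at the nearest closing parenthesis; the greedy form is shown for comparison.

```python
import re

# A shortened sample line from the question (illustrative variable name).
line = "Otu00467  Bacteria(100);Gracilibacteria(99);unclassified(99);"

# Non-greedy: each match ends at the nearest ')'.
cleaned = re.sub(r'\(.*?\)', '', line)
print(cleaned)  # Otu00467  Bacteria;Gracilibacteria;unclassified;

# Greedy, for comparison: a single match runs from the first '(' to the last ')'.
greedy = re.sub(r'\(.*\)', '', line)
print(greedy)  # Otu00467  Bacteria;
```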

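I would try a regexp. For each string, something like:

import re
# Capture each name that is followed by a parenthesized number and ';'
pattern = re.compile(r'(\w+)\(\d+\);')
';'.join(re.findall(pattern, string))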
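A sketch of how this might be applied to a whole line, assuming the ID and the taxonomy column are separated by whitespace and the line has no leading `>`. The helper name `strip_confidence` and the tab used in the output are choices made for this example. Note that `';'.join(...)` does not restore the trailing `;`, so it is appended explicitly.

```python
import re

pattern = re.compile(r'(\w+)\(\d+\);')

def strip_confidence(line):
    """Return the line with the '(NN)' scores removed from the taxonomy column."""
    # Split into the ID and the taxonomy string on the first run of whitespace.
    otu_id, taxonomy = line.split(None, 1)
    names = pattern.findall(taxonomy)
    # join() only puts ';' between names, so re-add the final one.
    return otu_id + '\t' + ';'.join(names) + ';'

print(strip_confidence("Otu00470  Bacteria(100);Proteobacteria(100);Azospirillum(54);"))
```

Compared with a plain `re.sub`, this takes more code and silently drops any field that does not match `name(number);`, which may or may not be what the pipeline needs.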

#Use re module to use regex
import re

#Open file and read data in data variable (the with block closes the file)
with open('file.txt') as f:
    data = f.read()

#Apply search and replace on data variable (raw string keeps the backslashes literal)
data = re.sub(r'\(\d+\)', '', data)

#Print data to output.txt file
with open('output.txt', 'w') as out:
    out.write(data)
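If the table is large, the same substitution could be applied line by line rather than reading the whole file into memory. This is a minimal sketch of that variation; the function name `clean_file` and its parameters are placeholders invented for this example.

```python
import re

# Compile once and reuse the pattern for every line.
SCORE = re.compile(r'\(\d+\)')

def clean_file(src_path, dst_path):
    """Copy src_path to dst_path with every '(digits)' group removed."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            dst.write(SCORE.sub('', line))
```

It would be called as `clean_file('file.txt', 'output.txt')`, using the same file names as the snippet above.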

This regex removes the parenthesized groups of digits, and also any '>' characters, since you appear to want those eliminated as well.

import re

data = '''\
> Otu00467  Bacteria(100);Gracilibacteria(99);unclassified(99);>unclassified(99);unclassified(99);unclassified(99);
> Otu00469  Bacteria(100);Proteobacteria(96);unclassified(96);unclassified(96);unclassified(96);unclassified(96);
> Otu00470  Bacteria(100);Proteobacteria(100);Alphaproteobacteria(100);Rhodospirillales(100);Rhodospirillaceae(100);Azospirillum(54);
'''

data = re.sub(r'>|\(\d+\)', '', data)
print(data)

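Output

 Otu00467  Bacteria;Gracilibacteria;unclassified;unclassified;unclassified;unclassified;
 Otu00469  Bacteria;Proteobacteria;unclassified;unclassified;unclassified;unclassified;
 Otu00470  Bacteria;Proteobacteria;Alphaproteobacteria;Rhodospirillales;Rhodospirillaceae;Azospirillum;

This code works in both Python 2 and Python 3.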
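If a '>' could also occur inside a field and should be preserved there, a stricter variant would remove it only at the start of each line. The following is a sketch of that idea and not part of the original answer. It is one case where re.M does matter, because `^` has to match at the beginning of every line rather than only at the start of the string.

```python
import re

# Shortened sample data, for illustration only.
data = (
    "> Otu00467  Bacteria(100);Gracilibacteria(99);\n"
    "> Otu00469  Bacteria(100);Proteobacteria(96);\n"
)

# With re.M, '^>' matches a '>' at the beginning of any line.
cleaned = re.sub(r'^>|\(\d+\)', '', data, flags=re.M)
print(cleaned)
```

The `flags` keyword argument of `re.sub` is available from Python 2.7 onward.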


Oh, thanks. Edited.

The re.M flag only changes the behaviour of the ^ and $ anchors, so it has absolutely no effect on this regex. OTOH, it is always prudent to use the r'' syntax when specifying a regex, to avoid unexpected problems with backslashes.

@Rawing Good point. I was confusing it with re.S, a.k.a. re.DOTALL. Oops.

In your particular case it should make no difference, since there are no nested parentheses, but this will of course work too.
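To illustrate the point about nested parentheses, here is a small sketch with a made-up input that does contain nesting (the question's sample data does not). It compares the two patterns used in the answers.

```python
import re

# Hypothetical input with nested parentheses; not taken from the question's data.
s = "Bacteria(100(x));Proteobacteria(96);"

# Non-greedy '.*?' stops at the first ')', leaving a stray ')' behind.
lazy = re.sub(r'\(.*?\)', '', s)
print(lazy)  # Bacteria);Proteobacteria;

# '\(\d+\)' only matches purely numeric groups, so the nested one is left untouched.
digits_only = re.sub(r'\(\d+\)', '', s)
print(digits_only)  # Bacteria(100(x));Proteobacteria;
```

On input without nesting, such as the table in the question, both patterns produce the same result.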