Python regex to remove \n
I have a problem. What I am trying to do is sort data and start new lines at certain points. Currently, my code looks like this:
from __future__ import print_function
import re
NDoc = raw_input("Enter name of new document ")+".txt"
log = open(NDoc, 'w')
file = raw_input("Enter a file to be sorted ")
extfile = file+".txt"
xfile = open(file+".txt")
for line in xfile:
    l=line.strip()
    l=re.sub("\n","",l)
    n=re.sub("(\B)(?=((MTH|HST|ENG)[|]))","\n",line)
    if len(n) > 0:
        nl=n.split("\n")
        for item in nl:
            log.write(item+"\n")
            #print(item)
print ("The data from",extfile,"has been sorted into",NDoc)
Everything works, except that a new line appears in my data after the third term (ENG). For example, if my data file looks like this:
MTH|lettersandnumbersHST|lettersandnumbersENG|lettersandnumbers
MTH|lettersandnumbersHST|lettersandnumbersENG|lettersandnumbers
MTH|lettersandnumbersHST|
I want it to look like this:
MTH|lettersandnumbers
HST|lettersandnumbers
ENG|lettersandnumbers
MTH|lettersandnumbers
HST|lettersandnumbers
ENG|lettersandnumbers
MTH|lettersandnumbers
HST|
But instead it gives me this:
MTH|lettersandnumbers
HST|lettersandnumbers
ENG|lettersandnumbers

MTH|lettersandnumbers
HST|lettersandnumbers
ENG|lettersandnumbers

MTH|lettersandnumbers
HST|
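Now, I thought that running l=re.sub("\n","",l) before adding the new \n would replace every \n with nothing, so why is an extra line still being added, and only after ENG?
Thanks in advance for any insight you can provide.

Your lines assign to the wrong name: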
l=line.strip()
l=re.sub("\n","",l)
should be
line=line.strip()
line=re.sub("\n","",line)
or simply
line=line.strip().replace('\n', '')
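With that rename applied, the loop from the question might look like the minimal sketch below. It assumes Python 3, and the helper name split_records and the in-memory sample list are made up for illustration; they stand in for the raw_input prompts and files used in the question.

```python
import re

# Same lookahead idea as in the question: break before each subject code.
BOUNDARY = re.compile(r"\B(?=(?:MTH|HST|ENG)\|)")

def split_records(lines):
    """Return one record per subject code (hypothetical helper)."""
    records = []
    for line in lines:
        line = line.strip()      # reassign to `line`, not to a new name
        if not line:
            continue             # skip blank input lines
        records.extend(BOUNDARY.sub("\n", line).split("\n"))
    return records

# Sample data standing in for the lines of the input file.
sample = [
    "MTH|lettersandnumbersHST|lettersandnumbersENG|lettersandnumbers\n",
    "MTH|lettersandnumbersHST|\n",
]
for record in split_records(sample):
    print(record)
```

Because the stripped value is the one passed to the substitution, the trailing newline never reaches split, so no empty record is produced after ENG.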
Your source data has spaces after "ENG". Just remove those spaces and you will be fine:
l=re.sub(' ', '', l)
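The stray character may well be the newline that ends each line read from the file rather than a literal space. As a small illustration (the variable names are made up for this demo), splitting an unstripped line leaves an empty string at the end, which the write loop in the question then turns into a blank line:

```python
# A line as it comes out of a file iterator, trailing newline included.
raw = "ENG|lettersandnumbers\n"

unstripped_parts = raw.split("\n")
stripped_parts = raw.strip().split("\n")

print(unstripped_parts)  # the trailing '' is what becomes the blank line
print(stripped_parts)
```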
You can use findall to match either of the following patterns:
import re

s = """MTH|lettersandnumbersHST|lettersandnumbersENG|lettersandnumbers
MTH|lettersandnumbersHST|lettersandnumbersENG|lettersandnumbers
MTH|lettersandnumbersHST|"""
# Either CODE|value or a bare CODE| with nothing after the pipe.
r = re.compile(r"([A-Z]+\|[0-9a-z]+|[A-Z]+\|)")
for line in s.splitlines(True):
    print("\n".join(r.findall(line)))
Output:
MTH|lettersandnumbers
HST|lettersandnumbers
ENG|lettersandnumbers
MTH|lettersandnumbers
HST|lettersandnumbers
ENG|lettersandnumbers
MTH|lettersandnumbers
HST|
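I don't think you are using the right tool. You probably want a single re.sub call with the pattern ([^\n])(MTH|HST|ENG) and the replacement \1\n\2.
Short explanation: this captures any of the options MTH, HST or ENG that is not preceded by \n ([^\n] means "anything except \n"), together with the preceding character, and inserts a \n between them. The result is what you expect.
For example: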
>>> st = """MTH|lettersandnumbersHST|lettersandnumbersENG|lettersandnumbers
... MTH|lettersandnumbersHST|lettersandnumbersENG|lettersandnumbers
... MTH|lettersandnumbersHST|"""
>>> print(re.sub("([^\n])(MTH|HST|ENG)", r"\1\n\2", st))
MTH|lettersandnumbers
HST|lettersandnumbers
ENG|lettersandnumbers
MTH|lettersandnumbers
HST|lettersandnumbers
ENG|lettersandnumbers
MTH|lettersandnumbers
HST|
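If the data lives in a file, the same substitution can be applied to the whole contents in one pass. This is only a sketch, assuming Python 3; the function name rewrite and the io.StringIO objects standing in for real files are hypothetical.

```python
import io
import re

def rewrite(src, dst):
    """Copy src to dst, starting a new line before each subject code."""
    text = src.read()
    # Only split where the code is not already at the start of a line.
    dst.write(re.sub(r"([^\n])(MTH|HST|ENG)", r"\1\n\2", text))

# In-memory stand-ins for the input and output files.
src = io.StringIO("MTH|aHST|bENG|c\nMTH|dHST|\n")
dst = io.StringIO()
rewrite(src, dst)
print(dst.getvalue(), end="")
```

Note that this pattern does not require the | after the code, so a value that itself contains MTH, HST or ENG in capitals would also be split.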
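l = l.replace("\n", "")
I noticed that you assign to l and then never use it again. Maybe it should be line?
Yes, that was it. Thanks for catching my silly mistake.
Did you test this? Because I don't think it works.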