Python: removing empty records from a file starting with specific characters

python python-3.x string


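I have a file containing the DBLP dataset, which consists of bibliographic data in computer science. I want to delete some records with missing information. For example, I want to delete records that are missing a venue. In this dataset, the venue follows the "#c" marker.

In this code, I split the document by manuscript title ("#*"). Now I am trying to delete the records that have no venue name.

Input data: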

#*Toward Connectionist Parsing.

#@Steven L. Small,Garrison W. Cottrell,Lokendra Shastri

#t1982

#c

#index14997


#*A Framework for Reinforcement Learning on Real Robots.

#@William D. Smart,Leslie Pack Kaelbling

#t1998

#cAAAI/IAAI

#index14998

#*Efficient Goal-Directed Exploration.

#@Yury V. Smirnov,Sven Koenig,Manuela M. Veloso,Reid G. Simmons

#t1996

#cAAAI/IAAI, Vol. 1

#index14999

My code:

inFile = open('lorem.txt','r')
Data = inFile.read()
data = Data.split("#*")
ouFile = open('testdata.txt','w')
for idx, word in enumerate(data):
    print("i = ", idx)
    if not('#!' in data[idx]):
        del data[idx]
        idx = idx - 1
    else:
        ouFile.write("#*" + data[idx])
ouFile.close()
inFile.close()

Expected output:

#*A Framework for Reinforcement Learning on Real Robots.

#@William D. Smart,Leslie Pack Kaelbling

#t1998

#cAAAI/IAAI

#index14998

#*Efficient Goal-Directed Exploration.

#@Yury V. Smirnov,Sven Koenig,Manuela M. Veloso,Reid G. Simmons

#t1996

#cAAAI/IAAI, Vol. 1

#index14999

Actual output: an empty output file.

str.find gives you the index of a substring, or -1 if the substring is not present:

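DOCUMENT_SEP = '#*'

with open('lorem.txt') as in_file:
    documents = in_file.read().split(DOCUMENT_SEP)

with open('testdata.txt', 'w') as out_file:
    for document in documents:
        i = document.find('#c')
        if i < 0:  # no "#c" at all
            continue
        # "#c" exists, but no venue information follows it
        if not document[i+2:i+3].strip():
            continue
        out_file.write(DOCUMENT_SEP)
        out_file.write(document)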

Instead of closing the files manually, I used a with statement.
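To see how the find-and-slice check behaves, here is a minimal sketch that wraps it in a helper and runs it on two shortened sample records. The name has_venue is hypothetical and only used for illustration; the records are abbreviated versions of the input data from the question.

```python
def has_venue(document):
    """Return True if the record contains "#c" followed by venue text."""
    i = document.find("#c")
    if i < 0:
        # str.find returns -1 when the substring is missing
        return False
    # Look at the single character right after "#c"; a newline or
    # nothing at all means the venue field is empty.
    return bool(document[i + 2:i + 3].strip())


empty = "Toward Connectionist Parsing.\n#t1982\n#c\n#index14997\n"
filled = "Efficient Goal-Directed Exploration.\n#t1996\n#cAAAI/IAAI, Vol. 1\n#index14999\n"

print(has_venue(empty))   # False
print(has_venue(filled))  # True
print(has_venue(""))      # False
```

Note that find returns the first occurrence of "#c", so this check assumes "#c" does not appear earlier in the record, for example inside a title.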


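There is no need to use an index; deleting items in the middle of a loop complicates the index calculation.
Using a regular expression like #c[A-Z]… would make the code simpler.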
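A regular-expression version could look like the sketch below. It is not the answerer's exact pattern: it assumes that a venue is present whenever a line starts with "#c" and is followed directly by a non-whitespace character, which is looser than [A-Z]. The names VENUE_RE and filter_records are made up for this example.

```python
import re

# Assumption: a record has a venue when some line starts with "#c"
# immediately followed by a non-whitespace character.
VENUE_RE = re.compile(r"^#c\S", re.MULTILINE)


def filter_records(text, sep="#*"):
    """Return the text with every record lacking a venue removed."""
    records = text.split(sep)
    # The empty string before the first separator never matches,
    # so it is dropped together with the venue-less records.
    return "".join(sep + record for record in records if VENUE_RE.search(record))


sample = (
    "#*Toward Connectionist Parsing.\n#t1982\n#c\n#index14997\n\n"
    "#*A Framework for Reinforcement Learning on Real Robots.\n"
    "#t1998\n#cAAAI/IAAI\n#index14998\n"
)

print(filter_records(sample))
```

Anchoring the pattern at the start of a line with re.MULTILINE avoids matching a "#c" that happens to appear inside a title or author list.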

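The reason your code does not work is that there is no "#!" in any of your entries, so the condition is true for every record and nothing is ever written.
If you want to exclude entries with an empty #c field, you can try the following: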
# Read the whole file and split it into records at each title marker.
with open('lorem.txt', 'r') as inFile:
    data = inFile.read().split("#*")

with open('testdata.txt', 'w') as ouFile:
    for idx, word in enumerate(data):
        print("i = ", idx)
        # Keep non-empty records whose "#c" line is not empty.
        if '#c\n' not in word and len(word) > 0:
            ouFile.write("#*" + word)

In general, do not delete elements from a list while you are looping over it. It can cause a lot of unexpected drama.
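A small illustration of that kind of drama, using a made-up list rather than the DBLP data: deleting during iteration shifts the remaining items left, so the element right after a deleted one is skipped. Reassigning idx inside the loop does not help, because enumerate overwrites it on the next iteration.

```python
items = ["a", "", "", "b"]

# Try to delete every empty string while iterating over the same list.
for idx, item in enumerate(items):
    if not item:
        del items[idx]

print(items)  # ['a', '', 'b'] - one empty string was skipped

# Building a new list avoids the problem entirely.
kept = [item for item in ["a", "", "", "b"] if item]
print(kept)  # ['a', 'b']
```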