Python: removing lines that contain a specific string

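I am trying to read a text file line by line and delete the lines that contain specific strings (in this case 'bad' and 'naughty'). The code I wrote looks like this: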

infile = file('./oldfile.txt')

newopen = open('./newfile.txt', 'w')
for line in infile :

    if 'bad' in line:
        line = line.replace('.' , '')
    if 'naughty' in line:
        line = line.replace('.', '')
    else:
        newopen.write(line)

newopen.close()

I wrote it like this, but it doesn't work.

One important thing: if the content of the text file is like this:

good baby
bad boy
good boy
normal boy

I don't want the output to have empty lines. So not like this:

good baby

good boy
normal boy

But like this:

good baby
good boy
normal boy

What should I change in the code above?

You can simply leave the line out of the new file instead of doing a replace:

for line in infile:
    if 'bad' not in line and 'naughty' not in line:
        newopen.write(line)

The else is only attached to the last if. You want elif:
if 'bad' in line:
    pass
elif 'naughty' in line:
    pass
else:
    newopen.write(line)

Also note that I removed the line replacement, since you don't write those lines anyway.
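
To make the difference concrete, here is a minimal sketch that runs both control-flow shapes over an in-memory list instead of a file. The function names and the extra 'naughty girl' sample line are illustrative assumptions, not part of the original question.

```python
def filter_original(lines):
    """Mirror the question's if / if / else chain (the else pairs with the second if only)."""
    kept = []
    for line in lines:
        if 'bad' in line:
            line = line.replace('.', '')
        if 'naughty' in line:
            line = line.replace('.', '')
        else:
            kept.append(line)  # also reached for lines that contain 'bad'
    return kept


def filter_elif(lines):
    """One if / elif / else chain: the else runs only when neither test matched."""
    kept = []
    for line in lines:
        if 'bad' in line:
            pass
        elif 'naughty' in line:
            pass
        else:
            kept.append(line)
    return kept


sample = ['good baby\n', 'bad boy\n', 'naughty girl\n', 'normal boy\n']
print(filter_original(sample))  # 'bad boy' slips through
print(filter_elif(sample))      # only the two clean lines remain
```

With the original chain, a line containing 'bad' still reaches the else branch because that branch belongs to the 'naughty' test alone.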

You could make your code simpler and more readable like this:
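to_skip = ("bad", "naughty")

# Note: this matches whole words only, not substrings.
with open("testin", "r") as handle, open("testout", "w") as out_handle:
    for line in handle:
        # split() with no argument also drops the trailing newline,
        # so a word at the end of a line still matches.
        if set(line.split()).intersection(to_skip):
            continue
        out_handle.write(line)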
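
The set-based test above and a plain substring test do not behave the same, which matters when choosing between the answers here. The following sketch contrasts the two; the helper names and the sample line are assumptions made for illustration.

```python
to_skip = {"bad", "naughty"}


def has_substring(line):
    """True if any unwanted string occurs anywhere in the line."""
    return any(word in line for word in to_skip)


def has_whole_word(line):
    """True only if an unwanted string appears as a whitespace-separated word."""
    return bool(to_skip.intersection(line.split()))


line = "a badge for the good boy\n"
print(has_substring(line))   # 'badge' contains 'bad'
print(has_whole_word(line))  # no word equals 'bad' or 'naughty'
```

Whole-word matching avoids false hits such as 'badge', but it also misses words with punctuation attached, such as 'bad,'.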

bad_words = ['bad', 'naughty']

with open('oldfile.txt') as oldfile, open('newfile.txt', 'w') as newfile:
    for line in oldfile:
        if not any(bad_word in line for bad_word in bad_words):
            newfile.write(line)

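This uses a context manager (the with statement) and any.

Today I needed to accomplish a similar task, so I wrote a gist based on some research I did.
I hope someone will find it useful: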
import os

os.system('cls' if os.name == 'nt' else 'clear')

oldfile = input('{*} Enter the file (with extension) you would like to strip domains from: ')
newfile = input('{*} Enter the name of the file (with extension) you would like me to save: ')

emailDomains = ['windstream.net', 'mail.com', 'google.com', 'web.de', 'email', 'yandex.ru', 'ymail', 'mail.eu', 'mail.bg', 'comcast.net', 'yahoo', 'Yahoo', 'gmail', 'Gmail', 'GMAIL', 'hotmail', 'comcast', 'bellsouth.net', 'verizon.net', 'att.net', 'roadrunner.com', 'charter.net', 'mail.ru', '@live', 'icloud', '@aol', 'facebook', 'outlook', 'myspace', 'rocketmail']

print("\n[*] This script will remove records that contain the following strings: \n\n", emailDomains)

input("\n[!] Press Enter to start...\n")

linecounter = 0

with open(oldfile) as oFile, open(newfile, 'w') as nFile:
    for line in oFile:
        # Keep only the records that contain none of the listed domains.
        if not any(domain in line for domain in emailDomains):
            nFile.write(line)
            linecounter = linecounter + 1
            print('[*] - {%s} Writing verified record to %s ---{ %s' % (linecounter, newfile, line))

print('[*] === COMPLETE === [*]')
print('[*] %s was saved' % newfile)
print('[*] There are %s records in your saved file.' % linecounter)

Using the Python textops package:
from textops import *

'oldfile.txt' | cat() | grepv('bad') | tofile('newfile.txt')

I have used this to remove unwanted words from text files:
bad_words = ['abc', 'def', 'ghi', 'jkl']

with open('List of words.txt') as badfile, open('Clean list of words.txt', 'w') as cleanfile:
    for line in badfile:
        clean = True
        for word in bad_words:
            if word in line:
                clean = False
        if clean:
            cleanfile.write(line)

Or to do the same for all files in a directory:
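import os

bad_words = ['abc', 'def', 'ghi', 'jkl']

for root, dirs, files in os.walk(".", topdown=True):
    for name in files:
        if name.endswith('.txt'):
            # os.walk yields bare file names, so join them with root;
            # otherwise files in subdirectories cannot be opened.
            source = os.path.join(root, name)
            target = os.path.join(root, 'clean ' + name)
            with open(source) as infile, open(target, 'w') as cleanfile:
                for line in infile:
                    clean = True
                    for word in bad_words:
                        if word in line:
                            clean = False
                    if clean:
                        cleanfile.write(line)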

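I'm sure there must be a more elegant way to do it, but this did what I wanted.

bad_words = ['doc:', 'strickland:']

with open('linetest.txt') as oldfile, open('linetestnew.txt', 'w') as newfile:
    for line in oldfile:
        # line.strip() is empty for blank lines, so they are skipped as well.
        if line.strip() and not any(bad_word in line for bad_word in bad_words):
            newfile.write(line)

\n is the escape sequence for a newline. Every line read from a file ends with it, so testing for '\n' as a substring would drop every line; blank lines are therefore skipped with a separate line.strip() check.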
Regex is a little quicker than the accepted answer that I had been using (on a 23 MB test file), but there isn't a lot in it.
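import re

bad_words = ['bad', 'naughty']

# (?:...) is a non-capturing group; re.escape guards against special characters in the words.
regex = f"^.*(?:{'|'.join(map(re.escape, bad_words))}).*\n?"
subst = ""

with open('oldfile.txt') as oldfile:
    lines = oldfile.read()

# flags must be passed by keyword: the fourth positional argument of re.sub is count.
result = re.sub(regex, subst, lines, flags=re.MULTILINE)

with open('newfile.txt', 'w') as newfile:
    newfile.write(result)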
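
One detail of re.sub is easy to get wrong in this approach: its fourth positional parameter is count, not flags, so a flag passed positionally is silently treated as a replacement limit. The sketch below uses an in-memory string (the sample text is an assumption) to show the effect.

```python
import re
import warnings

text = "good baby\nbad boy\ngood boy\nvery bad boy\n"
pattern = r"^.*bad.*\n"

with warnings.catch_warnings():
    # Recent Python versions warn when count/flags are passed positionally.
    warnings.simplefilter("ignore", DeprecationWarning)
    # re.MULTILINE lands in the count parameter, so no flag is set and
    # ^ only matches at the very start of the string.
    wrong = re.sub(pattern, "", text, re.MULTILINE)

# Passing the flag by keyword lets ^ match at the start of every line.
right = re.sub(pattern, "", text, flags=re.MULTILINE)

print(repr(wrong))
print(repr(right))
```

In the first call nothing is removed, because the only position where ^ can match is the start of 'good baby'.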

Try this, it works well:
import re

text = "this is bad!"
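# MULTILINE makes ^ match at the start of every line; \n? also removes the line's own newline.
text = re.sub(r"^.*bad.*\n?", "", text, flags=re.MULTILINE)
text = re.sub(r"^.*naughty.*\n?", "", text, flags=re.MULTILINE)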
print(text)


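I think you want "or" instead of "and".

I want lines that contain only one of bad or naughty to be deleted as well. Which one is right?

@H.Choi It is either not ('bad' in line or 'naughty' in line) or 'bad' not in line and 'naughty' not in line, so the and here should be correct.

Why are you replacing the dots in the line with empty strings?

@Wooble Perhaps the OP expected this to behave like a regex, in which case replacing every occurrence of anything (.) in the line with nothing ('') would leave nothing. If anything like that is the case, that is bad.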
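
The two conditions discussed in the comments are equivalent by De Morgan's law, while swapping and for or changes the result. A small sketch, with hypothetical function names and sample lines, makes this checkable:

```python
def keep_with_and(line):
    """Keep the line only if it contains neither word."""
    return 'bad' not in line and 'naughty' not in line


def keep_with_not_or(line):
    """Same condition, written as the negation of an or."""
    return not ('bad' in line or 'naughty' in line)


def keep_with_or(line):
    """The 'or' variant: it rejects a line only when BOTH words are present."""
    return 'bad' not in line or 'naughty' not in line


samples = ['good boy', 'bad boy', 'naughty boy', 'bad naughty boy']
for s in samples:
    print(s, keep_with_and(s), keep_with_not_or(s), keep_with_or(s))
```

The first two functions agree on every sample, whereas the or variant would still keep 'bad boy' and 'naughty boy'.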