Remove phrases from a file based on another file (Python)


How can I do this in Python?

badphrases.txt contains

Go away
Don't do that
Stop it

allphrases.txt contains

I don't know why you do that. Go away.
I was wondering what you were doing.
You seem nice

I want allphrases.txt to be cleared of every line that contains a phrase from badphrases.txt.

This is trivial in bash:

cat badfiles.txt | while read b
do
cat allphrases.txt | grep -v "$b" > tmp
cat tmp > allphrases.txt
done

Oh, you think I haven't looked or tried. I searched for more than an hour.

Here is my code:

# Files  
ttv = "/tmp/tv.dat"  
tmp = "/tmp/tempfile"  
bad = "/tmp/badshows"

The bad file already exists.
… code here creates ttv

# Function grep_v  
def grep_v(f,str):  
     file = open(f, "r")   
     for line in file:  
          if line in str:  
               return True  
     return False  

t = open(tmp, 'w')  
tfile = open(ttv, "r")   
for line in tfile:  
     if not grep_v(bad,line):  
          t.write(line)  
tfile.close  
t.close  
os.rename(tmp, ttv)

First, google how to read a file in Python.

You will probably get something like this.

Use it to read both files into lists:

with open('badphrases.txt') as f:
    content = f.readlines()
badphrases = [x.strip() for x in content] 

with open('allphrases.txt') as f:
    content = f.readlines()
allphrases = [x.strip() for x in content]

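Now the contents of both files are in lists.

Iterate over allphrases and check whether any phrase from badphrases occurs in each line.

At this point, you might consider googling:

how to iterate over a python list
how to check if a string exists in another string python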
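
As a quick illustration of what those two searches turn up: a for loop walks a list, and the in operator applied to two strings is a substring test. This is only a small sketch; the sample strings come from the question and the variable names are made up for the example.

```python
# Sample data from the question; the variable names are illustrative only.
lines = ["I don't know why you do that. Go away.", "You seem nice"]

# `in` between two strings checks whether the left one is a substring of the right one.
matches = [line for line in lines if "Go away" in line]

# A plain for loop visits each element of the list in order.
for line in lines:
    print(line, "->", "Go away" in line)
```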

Take the code from those places and build a brute-force algorithm like this:

for line in allphrases:
    flag = True
    for badphrase in badphrases:
        if badphrase in line:
            flag = False
            break
    if flag:
        print(line)

If you understand this code, you will notice that print needs to be replaced with output to a file.

Now google how to print to a file in Python.
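
One possible shape for that last step, as a minimal sketch rather than the definitive answer: the function name filter_phrases and its three path parameters are assumptions made for this example, and any() is used as a shorter equivalent of the flag loop above. Blank lines in the bad-phrase file are skipped, because an empty string is a substring of every line and would remove everything.

```python
def filter_phrases(all_path, bad_path, out_path):
    """Copy lines from all_path to out_path, skipping lines that contain a bad phrase.

    Hypothetical helper for illustration; returns the number of lines kept.
    """
    with open(bad_path) as f:
        # Ignore blank lines: "" is a substring of every string.
        bad_phrases = [line.strip() for line in f if line.strip()]

    kept = 0
    with open(all_path) as src, open(out_path, "w") as dst:
        for line in src:
            # any() stops at the first bad phrase found, like flag + break.
            if not any(bad in line for bad in bad_phrases):
                dst.write(line)
                kept += 1
    return kept
```

Calling filter_phrases('allphrases.txt', 'badphrases.txt', 'cleaned.txt') with the files from the question would keep the two lines that do not contain "Go away"; the output file name here is just a placeholder.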

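Then think about how to improve the algorithm. All the best.

Update:

@COLDSPEED suggests that you can simply google "how to replace lines in a file in python".

You may get something like this.

That works as well.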
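
For the in-place variant, one common pattern is to write the surviving lines to a temporary file and then swap it over the original. The sketch below assumes that approach; the function name remove_bad_lines_in_place is hypothetical, and it is not necessarily what the search result showed.

```python
import os
import tempfile


def remove_bad_lines_in_place(path, bad_phrases):
    """Rewrite path without the lines that contain any of bad_phrases.

    Hypothetical helper for illustration.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp, open(path) as src:
            for line in src:
                if not any(bad in line for bad in bad_phrases):
                    tmp.write(line)
        # Both files are closed (and flushed) here, so the swap sees all the data.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the swap.
        os.remove(tmp_path)
        raise
```

Note that mkstemp creates the file readable and writable only by the owner, so the rewritten file may end up with different permissions than the original.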

The solution isn't too bad:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import feedparser, os, re

# Files
h = os.environ['HOME']
ttv = h + "/WEB/Shows/tv.dat"
old = h + "/WEB/Shows/old.dat"
moo = h + "/WEB/Shows/moo.dat"
tmp = h + "/WEB/Shows/tempfile"
bad = h + "/WEB/Shows/badshows"

# Function not_present
def not_present(f, text):
     # Return True when text does not occur in any line of file f
     with open(f, "r") as file:
          for line in file:
               if text in line:
                    return False
     return True

# Sources (shortened)
sources = ['http://predb.me/?cats=tv&rss=1']

# Grab all the feeds and put them into ttv and old
k = open(old, 'a')
f = open(ttv, 'a')
for url in sources:
     d = feedparser.parse(url)
     for post in d.entries:
          if not_present(old,post.link):
               f.write(post.title + "|" +  post.link + "\n")
               k.write(post.title + "|" +  post.link + "\n")
# close is a method: without the parentheses nothing is flushed or closed
f.close()
k.close()

# Remove shows without [Ss][0-9] and put them in moo
m = open(moo, 'a')
t = open(tmp, 'w')
file = open(ttv, "r") 
for line in file:
     if re.search(r's[0-9]', line, re.I) is None:
          m.write(line)
#          print("moo", line)
     else:
          t.write(line)
#          print("tmp", line)
file.close()
t.close()
m.close()
os.rename(tmp, ttv)

# Remove badshows
t = open(tmp, 'w')
with open(bad) as f:
    content = f.readlines()
bap = [x.strip() for x in content] 

with open(ttv) as f:
    content = f.readlines()
shows = [x.strip() for x in content]

for line in shows:
    flag = True
    for b in bap:
        if b in line:
            flag = False
            break
    if flag:
         t.write(line + "\n")
t.close()
os.rename(tmp, ttv)

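Might as well tell the user to google "how to replace lines in a file in python".

There are probably 100 ways to do this. Clearly he/she is trying to learn Python, so I am giving some basic hints.

Sometimes it is much simpler to do things in another language. Whatever the solution in Python is, it is certainly far more complicated than it needs to be. I don't understand why Python is popular.

@W.Hunk What if badfiles.txt were actually served through an API? Your problem is very simple, so a shell script solves it better and faster. Python can be used to write very complex web applications; can you do that in shell? (Maybe, but not as safely and easily as in Python.) Problems are not solved by Python/C/C++/Shell/Perl/Java as such. If you know the basics of each of these languages, you can easily choose which one to use for a given situation, and you will have a good reason for the choice. Hope this helps.

See, Python isn't that bad after all. Also, you can improve the not_present function. Currently it reads the file on every call. Read the file once and store it in a list, then check against that list whenever the function is called.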
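
The improvement suggested in the last comment could look roughly like this. It is a sketch under assumptions: load_lines is a hypothetical helper, and not_present is changed to take the cached list instead of a file path, which differs from the signature in the script above.

```python
def load_lines(path):
    """Read the file once and return its lines without trailing newlines.

    Hypothetical helper; a missing file is treated as empty.
    """
    try:
        with open(path) as f:
            return [line.rstrip("\n") for line in f]
    except FileNotFoundError:
        return []


def not_present(lines, text):
    """Return True if text is not a substring of any cached line."""
    return not any(text in line for line in lines)
```

In the feed loop this would mean calling seen = load_lines(old) once before the loop, testing not_present(seen, post.link), and appending each newly written entry to seen so that duplicates within the same run are caught as well.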