Remove phrases from a file based on another file (Python)


How can I do this in Python?

badphrases.txt contains

Go away
Don't do that
Stop it
I don't know why you do that. Go away.
I was wondering what you were doing.
You seem nice
allphrases.txt contains

Go away
Don't do that
Stop it
I don't know why you do that. Go away.
I was wondering what you were doing.
You seem nice
I want allphrases.txt to be cleaned of the lines in badphrases.txt.

This is trivial in bash:

cat badfiles.txt | while read b
do
cat allphrases.txt | grep -v "$b" > tmp
cat tmp > allphrases.txt
done
Oh, and in case you think I haven't looked or tried: I searched for over an hour.

Here is my code:

# Files  
ttv = "/tmp/tv.dat"  
tmp = "/tmp/tempfile"  
bad = "/tmp/badshows"  
The bad file already exists
… code here creates ttv

# Function grep_v  
def grep_v(f,str):  
     file = open(f, "r")   
     for line in file:  
          if line in str:  
               return True  
     return False  

t = open(tmp, 'w')  
tfile = open(ttv, "r")   
for line in tfile:  
     if not grep_v(bad,line):  
          t.write(line)  
tfile.close  
t.close  
os.rename(tmp, ttv)  

First, google how to read a file in Python:

You will probably get something like this:

Use it to read both files into lists:

with open('badphrases.txt') as f:
    content = f.readlines()
badphrases = [x.strip() for x in content] 

with open('allphrases.txt') as f:
    content = f.readlines()
allphrases = [x.strip() for x in content] 
Now the contents of both files are in lists.
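The two reads can be folded into one helper. The following is a minimal sketch, assuming a hypothetical helper name `read_phrases` that does not appear in the original answer. Unlike the snippet above, it also skips blank lines: an empty string is a substring of every string, so a single blank line in badphrases.txt would otherwise make every line of allphrases.txt look bad.

```python
def read_phrases(path):
    """Return the stripped, non-empty lines of the file at path.

    read_phrases is a hypothetical helper name, not part of the original answer.
    """
    with open(path, encoding="utf-8") as f:
        # strip() removes the trailing newline and surrounding whitespace;
        # lines that end up empty are dropped
        return [line.strip() for line in f if line.strip()]
```

It would be called as `badphrases = read_phrases('badphrases.txt')` and `allphrases = read_phrases('allphrases.txt')`.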

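Iterate over allphrases and check whether any phrase from badphrases occurs in each line.

At this point, you might google:

  • How to iterate over a Python list
  • How to check whether a string exists in another string in Python
Take the code from those places and build a brute-force algorithm like this: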

for line in allphrases:
    flag = True
    for badphrase in badphrases:
        if badphrase in line:
            flag = False
            break
    if flag:
        print(line)
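If you understand this code, you will notice that you need to replace the print with output to a file:

  • Now google how to print to a file in Python
Then think about how to improve the algorithm. All the best.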
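One possible improvement, sketched here under assumptions: the flag-and-break loop can be written with `any()`, and the result can be written to a temporary file that then replaces the original, much like the `tmp` file in the bash version. The function name `remove_bad_lines` and its return value are hypothetical choices made for this illustration.

```python
import os
import tempfile


def remove_bad_lines(all_path, bad_path):
    """Rewrite all_path without the lines that contain a bad phrase.

    Returns the number of lines removed (a hypothetical convenience).
    """
    with open(bad_path, encoding="utf-8") as f:
        # Skip blank lines: "" is a substring of every line
        bad = [p.strip() for p in f if p.strip()]

    with open(all_path, encoding="utf-8") as f:
        lines = f.readlines()

    # Keep a line only if no bad phrase is a substring of it
    kept = [ln for ln in lines if not any(b in ln for b in bad)]

    # Write to a temp file in the same directory, then swap it into place
    dir_name = os.path.dirname(os.path.abspath(all_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        out.writelines(kept)
    os.replace(tmp_path, all_path)
    return len(lines) - len(kept)
```

Calling `remove_bad_lines('allphrases.txt', 'badphrases.txt')` is the assumed usage. The check is still a substring test for every phrase against every line, so it is the same brute-force idea in a more compact form.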

Update:

@COLDSPEED suggested that you could simply google how to replace lines in a file in Python:

You will probably get something like this:

That works too.
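The search result referred to here is not preserved on this page. One commonly used way to replace lines in a file is the standard library's `fileinput` module with `inplace=True`, so the sketch below assumes that is the kind of approach meant. The function name `filter_in_place` is hypothetical.

```python
import fileinput


def filter_in_place(path, bad):
    """Drop every line of path that contains a phrase from bad, editing in place."""
    # With inplace=True, fileinput redirects print() into the file being read
    with fileinput.input(files=[path], inplace=True) as f:
        for line in f:
            if not any(b in line for b in bad):
                # line already ends with its newline, so suppress print's own
                print(line, end="")
```

Assumed usage: `filter_in_place('allphrases.txt', badphrases)`, where `badphrases` is the list built earlier.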

The solution turned out not to be too bad:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import feedparser, os, re

# Files
h = os.environ['HOME']
ttv = h + "/WEB/Shows/tv.dat"
old = h + "/WEB/Shows/old.dat"
moo = h + "/WEB/Shows/moo.dat"
tmp = h + "/WEB/Shows/tempfile"
bad = h + "/WEB/Shows/badshows"

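# Function not_present: True if text appears in no line of file f
def not_present(f, text):
     # "with" closes the file even on an early return
     with open(f, "r") as file:
          for line in file:
               if text in line:
                    return False
     return True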

# Sources (shortened)
sources = ['http://predb.me/?cats=tv&rss=1']

# Grab all the feeds and put them into ttv and old
k = open(old, 'a')
f = open(ttv, 'a')
for h in sources:
     d = feedparser.parse(h)
     for post in d.entries:
          if not_present(old,post.link):
               f.write(post.title + "|" +  post.link + "\n")
               k.write(post.title + "|" +  post.link + "\n")
f.close()
k.close()

# Remove shows without [Ss][0-9] and put them in moo
m = open(moo, 'a')
t = open(tmp, 'w')
file = open(ttv, "r") 
for line in file:
     if re.search(r's[0-9]', line, re.I) is None:
          m.write(line)
#          print("moo", line)
     else:
          t.write(line)
#          print("tmp", line)
t.close()
m.close()
file.close()
os.rename(tmp, ttv)

# Remove badshows
t = open(tmp, 'w')
with open(bad) as f:
    content = f.readlines()
bap = [x.strip() for x in content] 

with open(ttv) as f:
    content = f.readlines()
all = [x.strip() for x in content] 

for line in all:
    flag = True
    for b in bap:
        if b in line:
            flag = False
            break
    if flag:
         t.write(line + "\n")
t.close()
os.rename(tmp, ttv)

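Might as well have the user google "how to replace lines in a file in Python".

There are probably 100 ways to do this. Clearly he/she is trying to learn Python, so I gave some basic hints.

Sometimes it is much simpler to do things in another language. Whatever the solution in Python is, it is sure to be far more complicated than it needs to be. I don't understand why Python is so popular.

@W.Hunk What if badfiles.txt were actually served through an API? Your problem is very simple, so a shell script solves it better and faster. Python can be used to write very complex web applications; can you do that in shell? (Maybe, but not as safely and easily as in Python.) Application problems are not solved by python/C/C++/Shell/Perl/Java as such. If you know the basics of each of these languages, you can easily choose which one to use for a given case, and you will have a good reason for the choice. Hope this helps.

See, Python isn't so bad after all. Also, you can improve the not_present function. Currently it reads the file on every call. Read the file once and store it in a list; when the method is called, check against that list.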
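The suggestion in the last comment could be sketched as follows. This is a minimal illustration, not the poster's code: `load_lines` is a hypothetical helper, and `not_present` is assumed to take the cached list instead of a file path.

```python
def load_lines(path):
    """Read the file once and return its lines as a list (hypothetical helper)."""
    with open(path, "r") as f:
        return f.readlines()


def not_present(lines, text):
    """Return True if text is a substring of none of the cached lines."""
    return not any(text in line for line in lines)
```

In the feed loop this would be used as `old_lines = load_lines(old)` once before the loop, then `not_present(old_lines, post.link)` for each entry. Appending each newly written line to `old_lines` as well would keep the cache in step with the file during the run.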