Python - Checking for words that are not in any file in a folder


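I am writing a script that checks whether certain words occur in the files under a path.

The problem I am facing is that I cannot seem to get one overall result; I only get a separate result for each file.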

Example:
path = "/opt/webserver/logs/"

file1.txt
file2.txt
file3.txt
....
...
..
file10000.txt

Here is the code:

#checkWordinFiles.py
import os

words = [ "Apple", "Oranges", "Starfruit" ]
path = "/opt/webserver/logs"
files = os.listdir(path)
for infile in files:
        for word in words:
                if word not in infile:
                        print word

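The problem is that no word is present in every file. This script prints a word once for each file that lacks it, but I only want to print the words that are not in any file at all.

I want the script to print the words that are not found in any file under the path.

Something like running "grep Apple*" each time.

Any ideas?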

Here is how you could go about it:

for word in words:
    word_found = False
    for infile in files:
        # Note: infile is a file name here, so this tests the name, not the contents.
        if word in infile:
            word_found = True
            break
    if not word_found:
        print("%s not in any file" % word)

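Edit: I did not pay attention to the file-reading logic. As @Karl mentioned, you should read every file in the path and then search for the words in the file contents. You can use os.walk() to get a list of all files under the path, including the files in subdirectories.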
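A minimal sketch of that idea, assuming Python 3 and a plain substring match (like grep) rather than whole-word matching. The helper names iter_files and missing_words are made up for this illustration, and errors="replace" is an assumption so that stray bytes in a log file do not abort the scan.

```python
import os

def iter_files(root):
    """Yield the full path of every file under root, including subdirectories."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # os.walk yields bare names; join them with the directory they live in.
            yield os.path.join(dirpath, name)

def missing_words(words, root):
    """Return the set of words that occur in no file under root (substring match)."""
    remaining = set(words)
    for filepath in iter_files(root):
        # Assumption: replace undecodable bytes instead of raising an error.
        with open(filepath, errors="replace") as f:
            text = f.read()
        # Keep only the words this file did not contain.
        remaining = set(w for w in remaining if w not in text)
        if not remaining:
            break  # every word has been seen somewhere, no need to read more files
    return remaining
```

With the list from the question, missing_words(words, "/opt/webserver/logs") would return only the words that no file contains, so each missing word is reported once instead of once per file.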

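The conceptual problem is that os.listdir produces a list of the file names in the directory, so you are searching for the words in the names of the files, not in their contents. To fix this, you need to use the file names to open and read the files.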
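To make the difference concrete, here is a small sketch that runs both searches side by side. The function names are hypothetical, and it assumes the files are text that open() can decode with the default encoding.

```python
import os

def word_in_file(word, filepath):
    """True if word occurs anywhere in the file's text (substring match)."""
    with open(filepath) as f:
        return word in f.read()

def name_hits_and_content_hits(word, path):
    """Compare the two searches: by file name and by file contents."""
    names = os.listdir(path)  # bare names such as "file1.txt", without the directory
    # What the question's code effectively does: look inside the name string.
    name_hits = [n for n in names if word in n]
    # What was intended: join the name with the directory, open the file, look inside.
    content_hits = [n for n in names
                    if os.path.isfile(os.path.join(path, n))
                    and word_in_file(word, os.path.join(path, n))]
    return name_hits, content_hits
```

A file called Apple.txt would show up in the first list regardless of what it contains, while file1.txt would show up in the second list only if its text mentions the word.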

The show-off way:

import os
from functools import reduce  # reduce is not a builtin in Python 3

def contents(filename):
    # file() exists only in Python 2; open() works in both.
    with open(filename) as f: return f.read()

words = set(["Apple", "Oranges", "Starfruit"])
path = "/opt/webserver/logs"
# os.listdir returns bare names, so join each one with the directory.
filenames = [os.path.join(path, name) for name in os.listdir(path)]
print(words.difference(
    reduce(lambda x, y: x.union(y), (
        # Note that the following assumes we really want to treat the file
        # as a sequence of words, and not do general substring searching.
        # For example, it will miss "apple" if the file contains "pineapples".
        set(contents(filename).split()).intersection(words)
        for filename in filenames
        # In fact, the .intersection call there is redundant, but might improve
        # performance and will probably save memory at least.
    ), set())  # the initial value keeps reduce from failing on an empty directory
))

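Suppose you want to search /path/to/file for the word "foo".

Loop over the lines of the file and test whether "foo" occurs in each line. Modify that to suit your needs: you can use os.listdir() to get the file names and proceed accordingly.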

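So I need to open each file, look for the word, and mark it as found if it is there. If it is not found, print the word.

There is a more elegant way: for each file, create a set of all the words from words that are found in the file. Take the union of those sets, and then the difference between the original word set and that union. I will write it out...

Thanks, I am familiar with lambda, but what does reduce() do?

Basically it spreads the operation between the elements of the sequence (here, my sequence comes from a generator expression). So reduce(lambda x, y: x * y, ...) (do not write it like that; the operator module has a helper that represents multiplication) gives you the product of the elements. Also, prefer the sum function where it applies; reduce is generally seen as showing off too. :) Sometimes you need an initial element as the third argument. As always, see the built-in documentation (help(reduce)) for details.

To whoever downvoted: please let me know what I did wrong or what you would like to see. I would love to fix it.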
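The points about reduce from the comments can be checked with a few lines. This is only an illustrative sketch: the numbers and the per-file sets are made-up sample data, not taken from the question.

```python
from functools import reduce  # builtin in Python 2, lives in functools in Python 3
import operator

numbers = [2, 3, 4]

# reduce puts the operation "between" the elements: (2 * 3) * 4
product = reduce(operator.mul, numbers)

# For addition, the builtin sum says the same thing more plainly.
total = sum(numbers)

# The third argument is the starting value; it is also the result for an empty sequence.
empty_product = reduce(operator.mul, [], 1)

# The set logic from the answer: union the per-file sets, then take the difference.
words = {"Apple", "Oranges", "Starfruit"}
per_file_sets = [{"Apple"}, set(), {"Oranges", "Apple"}]  # made-up sample data
found = reduce(lambda x, y: x.union(y), per_file_sets, set())
same_without_reduce = set().union(*per_file_sets)
missing = words - found
```

Here set().union(*per_file_sets) produces the same union without reduce, which may read more plainly if the show-off style is not wanted.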

The line-by-line search for "foo" in /path/to/file:

for line in open("/path/to/file"):
    if "foo" in line:
        print("hurray. you found it")
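The same line-by-line loop can be extended to several words and several files. The sketch below is one possible way to do that, with a hypothetical function name; it assumes substring matching and that no word is split across a line break.

```python
def words_missing_from_files(words, filepaths):
    """Return the words that appear on no line of any of the given files."""
    remaining = set(words)
    for filepath in filepaths:
        with open(filepath) as f:
            for line in f:  # reads one line at a time instead of the whole file
                # Drop every word that this line contains.
                remaining -= set(w for w in remaining if w in line)
                if not remaining:
                    return remaining  # everything was found, stop early
    return remaining
```

Because it never holds more than one line in memory, this form may suit large log files better than reading each file in full.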