Python: using a dictionary file, search for strings in another text file and print the whole line

I want to check whether any word from a dictionary file appears in a second txt file. I have a problem with the following code:

print 'Searching for known strings...\n'
with open('something.txt') as f:
    haystack = f.read()
with open('d:\\Users\\something\\Desktop\\something\\dictionary\\entirelist.txt') as f:
    for needle in (line.strip() for line in f):
        if needle in haystack:
            print line

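I did not write the with open statements myself; I took them from another source. I want to print the line, so I wrote "line" instead of "needle". Here is the problem: it says that line is not defined.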

My end goal is to see whether any word in the dictionary appears in "something.txt" and, if so, to print the line in which the word was found.

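It looks like you used a generator expression: (line.strip() for line in f). I don't think you can access its inner variable "line" from outside the generator's scope (that is, outside the parentheses).

Try the following: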

for line in f:
    if line.strip() in haystack:
        print line
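To illustrate the scoping point, here is a minimal Python 3 sketch (the thread's own code is Python 2). The function name leaked_name_error and the sample data are made up for this illustration; it only shows that the loop variable of a generator expression is not visible to the surrounding code.

```python
def leaked_name_error():
    """Return the error message raised when touching the generator's variable."""
    sample_lines = ["Falco\n", "Mozart\n"]  # invented sample data

    # The loop variable "line" lives only inside the generator expression.
    needles = (line.strip() for line in sample_lines)
    stripped = list(needles)  # consuming the generator works as usual
    assert stripped == ["Falco", "Mozart"]

    try:
        line  # not defined in this scope: the generator did not leak it
    except NameError as exc:
        return str(exc)
    return None


print(leaked_name_error())
```

This is the same error the question reports: the name exists while the generator runs, but never in the scope of the print statement.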

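The specific exception you are asking about happens because line does not exist outside the generator expression. If you want to access it, you need to keep it in the same scope as the print statement, like this: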

for line in f:
    needle = line.strip()
    if needle in haystack:
        print line

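But this is not going to be particularly useful. What it prints is just the word in needle plus the trailing newline. If you want to print the line (or lines?) of haystack that contain needle, you have to search for that line, not just ask whether needle appears anywhere in the whole haystack.

To do what you are actually asking for, you need to loop over the lines of haystack and check each one for needle. Like this: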

with open('something.txt') as f:
    haystacks = list(f)

with open('d:\\Users\\something\\Desktop\\something\\dictionary\\entirelist.txt') as f:
    for line in f:
        needle = line.strip()
        for haystack in haystacks:
            if needle in haystack:
                print haystack
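The same nested loop can be sketched in Python 3 as a small function that works on in-memory lists instead of files. The name matching_lines and the sample data are hypothetical, and skipping blank dictionary lines is an added assumption: an empty needle would otherwise match every line, because the empty string is contained in every string.

```python
def matching_lines(needles, haystack_lines):
    """Yield (needle, line) for every haystack line that contains a needle."""
    for raw in needles:
        needle = raw.strip()
        if not needle:
            # Assumption: ignore blank dictionary lines, since "" is in every string.
            continue
        for line in haystack_lines:
            if needle in line:
                # Drop the trailing newline so print() does not double it.
                yield needle, line.rstrip("\n")


# Invented data standing in for entirelist.txt and something.txt.
dictionary = ["Falco\n", "\n", "Mozart\n"]
text = ["Rock Me Amadeus by Falco\n", "nothing here\n", "Mozart wrote operas\n"]

for needle, line in matching_lines(dictionary, text):
    print(needle, "->", line)
```

With real files, the two lists could be replaced by open file objects, since iterating over a file also yields one line at a time.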

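However, there is a neat trick you may want to consider: if you can write a regular expression that matches any complete line containing needle, then you only need to print all the matches. Like this: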

import re

with open('something.txt') as f:
    haystack = f.read()
with open('d:\\Users\\something\\Desktop\\something\\dictionary\\entirelist.txt') as f:
    for line in f:
        needle = line.strip()
        # Match any whole line that contains the (escaped) needle.
        pattern = '^.*{}.*$'.format(re.escape(needle))
        for match in re.finditer(pattern, haystack, re.MULTILINE):
            print match.group(0)

Here is an example of how the regular expression works:

^.*Falco.*$
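A short Python 3 sketch of that pattern applied to a made-up haystack may help; the sample text is invented, and "Falco" is simply the needle from the example pattern.

```python
import re

# Invented sample text with three lines.
haystack = "Rock Me Amadeus by Falco\nnothing here\nFalco is on the radio\n"

pattern = "^.*{}.*$".format(re.escape("Falco"))

# With re.MULTILINE, ^ and $ match at the start and end of every line,
# and "." never matches a newline, so each match is exactly one full line.
matches = [m.group(0) for m in re.finditer(pattern, haystack, re.MULTILINE)]
print(matches)
```

Calling group(0) on each match object returns the whole matched line as a string.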

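Of course, if you want to search case-insensitively, or match only complete words, and so on, you will need to make some minor changes; see the official documentation or a third-party tutorial for more.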
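As a hedged Python 3 sketch of those two changes: re.IGNORECASE can be combined with re.MULTILINE, and \b word boundaries can be put around the needle. The function name lines_with_word and the sample text are made up, and this assumes the regex notion of a word (letters, digits, underscore) is close enough to what is wanted.

```python
import re


def lines_with_word(word, haystack):
    """Return lines of haystack containing word as a whole word, ignoring case."""
    # \b matches at a boundary between a word character and a non-word character.
    pattern = r"^.*\b{}\b.*$".format(re.escape(word))
    flags = re.MULTILINE | re.IGNORECASE
    return [m.group(0) for m in re.finditer(pattern, haystack, flags)]


# Invented sample text.
haystack = "This is FALCO's third album\nThe Falcon and the Snowman\nfalco\n"
print(lines_with_word("Falco", haystack))
```

Note that \b only behaves this way when the needle itself starts and ends with a word character; a needle such as "C++" would need a different boundary test.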

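Can you give us an example (stripped down to 3 lines) of what something.txt and entirelist.txt look like, and what output you need? Because line.strip() is just a string (one line of the dictionary, with the newline removed), for needle in line.strip(): would iterate over each character in that line. So that can't be right.

Your second one works fine; the third one printed something other than the lines themselves. I am also interested in searching case-insensitively and for complete words, so I will check out your link, try it on my own and see what happens :) Thanks for the help, and for providing alternatives :)

@Maxim: Right, sorry, finditer returns MatchObjects, not just the matched strings. That is very useful in general, but not when you just want to see what matched; I have edited the answer. To make the regular expression case-insensitive, just add another flag (re.MULTILINE | re.IGNORECASE). To match only complete words, if you are lucky and \b uses the same definition of a word that you want, it is very easy; otherwise it gets a bit more painful. Either way, definitely play with it in Debuggex or another regex tool, which is much easier than the usual edit-source-and-debug cycle.

Thanks again! How do you match complete words with the same definition? Also, I would like to check whether a line's string has already been printed and, if so, not print it again. Is that possible? I added needle=' '+needle and it seems to work for "half-complete words". (Otherwise, if I write needle=' '+needle+' ', it does not count the first word (not a problem) or the last word (which is a problem).)

@Maxim: Please read up on what \b does. Assuming the regex definition of a word is close enough, \bFalco\b rather than just Falco will match the Falco in "Falco" or in "This is Falco's third album", but not in "The Falcon and the Snowman".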
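The follow-up question about not printing a line twice could be handled by remembering the lines already emitted in a set. This is only a Python 3 sketch under stated assumptions: the name unique_matching_lines and the sample data are hypothetical, plain substring matching is used instead of a regular expression, and two lines count as "the same" when their text is identical.

```python
def unique_matching_lines(needles, haystack_lines):
    """Return each haystack line that contains any needle, once, in file order."""
    # Strip the dictionary lines and drop blank ones (assumption: blanks are noise).
    words = [n.strip() for n in needles if n.strip()]
    seen = set()
    result = []
    for line in haystack_lines:
        text = line.rstrip("\n")
        # Keep a line only if it matches some needle and was not emitted before.
        if text not in seen and any(word in text for word in words):
            seen.add(text)
            result.append(text)
    return result


# Invented data: the first line matches two needles and also appears twice.
dictionary = ["Falco\n", "album\n"]
text = ["Falco's third album\n", "nothing here\n", "Falco's third album\n"]
print(unique_matching_lines(dictionary, text))
```

Looping over the haystack lines on the outside, rather than over the needles, keeps the output in file order and means a line that contains several dictionary words is still reported only once.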