Python: making a comprehensive dictionary from a text file?




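What is the simplest way to make a dictionary from a .txt file? Each word in the text file is separated by a space. Every word in the file should be a key in the dictionary, and its value should be all the words that follow it at some point in the file, including repeats.

So if the text file is: I like cats and dogs. Dogs like cats. I like dogs more

the dictionary would be:

d = {'I': ['like', 'like'], 'like': ['cats', 'cats', 'dogs'], 'cats': ['and', '.']...

...until all the words are keys.

Edit: Sorry I didn't show my code so far. I'm an extreme beginner and barely know what I'm doing, and it looks terrible. But here is some of it: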

def textDictionary(fileName):
    p = open(fileName)
    f = p.read()
    w = f.split()
    newDictionary = {}
    for i in range(len(w) - 1):  # stop one early so w[i+1] stays in range
        newDictionary[w[i]] = w[i+1]
    return newDictionary

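Now, this obviously isn't supposed to do everything I want, but it should at least return:

{'I': 'like', 'like': 'cats', 'cats': 'and', ...}

...and so on.

However, it gives me something completely different.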
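The plain-dict version returns something different because `newDictionary[w[i]] = w[i+1]` replaces the stored value every time a word reappears, so each key ends up holding only its last follower. A minimal sketch contrasting that with a plain `dict` approach that keeps every follower via `dict.setdefault`; the function names `last_follower_only` and `follower_dict` are hypothetical, and both take a string rather than a file name to stay self-contained:

```python
def last_follower_only(text):
    """Plain assignment: each key keeps only the LAST word that followed it."""
    words = text.split()
    result = {}
    for current, following in zip(words, words[1:]):
        result[current] = following  # overwrites any earlier value
    return result


def follower_dict(text):
    """Map each word to a list of ALL words that follow it (plain dict)."""
    words = text.split()
    followers = {}
    for current, following in zip(words, words[1:]):
        # setdefault returns the existing list, or inserts [] on first use
        followers.setdefault(current, []).append(following)
    return followers


text = "I like cats and dogs. Dogs like cats. I like dogs more"
print(last_follower_only(text)["like"])
print(follower_dict(text)["like"])
```

With the example sentence from the question, `last_follower_only` maps `'like'` to `'dogs'`, while `follower_dict` gives `['cats', 'cats.', 'dogs']`; the `'.'` stays attached because `split()` only splits on whitespace.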

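This looks like a job for a defaultdict to me. First, you need to decide how to split the words. For simplicity I'll just split on whitespace, but since you have punctuation, this is probably a job for regular expressions: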

from collections import defaultdict
d = defaultdict(list)

with open('textfile') as fin:
    data = fin.read()
    words = data.split()

for i, w in enumerate(words):  # enumerate yields (index, word) pairs
    try:
        d[w].append(words[i+1])
    except IndexError:
        pass  # last word has no words which follow it...
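Since the answer suggests regular expressions for the punctuation, here is one possible sketch. It assumes a word is a run of `\w` characters and that every other non-space character (such as `.`) is a token of its own; `follower_dict_regex` is a hypothetical name, and the input is a string rather than a file:

```python
import re
from collections import defaultdict


def follower_dict_regex(text):
    # Assumption: words are runs of word characters, and each punctuation
    # character (e.g. '.') becomes a separate token.
    words = re.findall(r"\w+|[^\w\s]", text)
    d = defaultdict(list)
    for i, w in enumerate(words[:-1]):  # the last token has no follower
        d[w].append(words[i + 1])
    return dict(d)


print(follower_dict_regex("I like cats and dogs. Dogs like cats. I like dogs more"))
```

On the question's sentence this produces `'I': ['like', 'like']`, `'like': ['cats', 'cats', 'dogs']` and `'cats': ['and', '.']`, as requested, and `'.'` itself becomes a key whose values are the sentence-opening words `['Dogs', 'I']`.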

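The best way is to iterate over the words as two parallel sequences, one offset by one position. To do this, use zip on the original list and on list[1:].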
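To see what the offset pairing produces, here is a tiny example on a short word list (the input string is only for illustration):

```python
words = "I like cats and dogs".split()
# Pair each word with the word after it: words[1:] is the list shifted by one.
pairs = list(zip(words, words[1:]))
print(pairs)  # [('I', 'like'), ('like', 'cats'), ('cats', 'and'), ('and', 'dogs')]
```

Because `zip` stops at the shorter input, the last word never appears as a key, so no `IndexError` handling is needed.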

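Each pair this iteration yields is a key and a value for the dict, or rather, in this case, a defaultdict. A defaultdict created with list automatically initializes each new key with an empty list, so you can append as needed without setting an initial value.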

from collections import defaultdict

def textDictionary(fileName):
    with open(fileName) as p:  # with to open and automatically close
        f = p.read()
        w = f.split()

    newDictionary = defaultdict(list)
    # defaultdict initialized with list makes each element a list automatically,
    # this is great for `append`ing

    for key, value in zip(w, w[1:]):
        newDictionary[key].append(value)  # easy append!

    return dict(newDictionary)  # dict() changes defaultdict to normal

File:

I like cats and dogs like cats

Output:

{'I': ['like'], 'and': ['dogs'], 'cats': ['and'], 'like': ['cats', 'cats'], 'dogs': ['like']}

I notice that in this case 'like' is followed by 'cats' twice. If you only want one, initialize the defaultdict with set instead of list, and use .add instead of .append.
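A minimal sketch of that set-based variant, following the same `zip` pattern as the function above; `unique_follower_dict` is a hypothetical name, and the input is a string rather than a file:

```python
from collections import defaultdict


def unique_follower_dict(text):
    words = text.split()
    d = defaultdict(set)  # each new key starts with an empty set
    for key, value in zip(words, words[1:]):
        d[key].add(value)  # a set silently ignores duplicates
    return dict(d)


print(unique_follower_dict("I like cats and dogs like cats"))
```

Here `'like'` maps to `{'cats'}` instead of `['cats', 'cats']`. Note that sets do not preserve the order in which followers were first seen.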

After reading the lines from the file, you can do the following:

from collections import defaultdict

line = 'I like cats and dogs. Dogs like cats. I like dogs more.'
line = line.replace('.', ' .') #To make sure 'dogs.' or 'cats.' do not become the keys of the dictionary.
op = defaultdict(list)
words = line.split()
for i, word in enumerate(words):
    if word != '.': #To make sure '.' is not a key in the dictionary
        try:
            op[word].append(words[i+1])
        except IndexError:
            pass  # the final word has no follower

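The only thing that needs special handling is the full stop; the comments explain how the code deals with it. The code above produces: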

{'and': ['dogs'], 'like': ['cats', 'cats', 'dogs'], 'I': ['like', 'like'], 'dogs': ['.', 'more'], 'cats': ['and', '.'], 'Dogs': ['like'], 'more': ['.']}
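The same result can also be produced without the `try`/`except`, by pairing each word with its successor using `zip` as the earlier answer does. A sketch under that assumption, with `follower_dict_split_periods` as a hypothetical name:

```python
from collections import defaultdict


def follower_dict_split_periods(line):
    # Put a space before each '.', so 'dogs.' becomes 'dogs' and '.'.
    words = line.replace('.', ' .').split()
    op = defaultdict(list)
    for word, following in zip(words, words[1:]):
        if word != '.':  # '.' itself is not used as a key
            op[word].append(following)
    return dict(op)


print(follower_dict_split_periods('I like cats and dogs. Dogs like cats. I like dogs more.'))
```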

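While many of us are happy to help answer your question, we are more likely to understand the problem and give a useful answer if you show us what you have already tried. Here is some information on how to provide code.

Well, this works, but it produces separate keys for words that differ only in punctuation (such as "cats" and "cats."), which I don't want. I do, however, want the values to be distinguished by punctuation, i.e. 'like': ['cats', 'cats.']. Also, can I make a key that marks the beginning/end of a sentence? For example, there would be a key '&' whose values are all the words that start a sentence somewhere in the file. This '&' key would also appear in the values of other keys, e.g. '&' would be one of the entries in the list for 'cats'. Is there any way I can make these changes?

Glad to hear it! As for your new question, you should probably go ahead and ask it as a new question using the Ask Question button. The Stack Overflow model is geared toward single questions with specific answers.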
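For reference, the follow-up request in the comment above (keys that ignore trailing punctuation, values that keep it, and a `&` marker for sentence boundaries) could be sketched as follows. Treating `.`, `!` and `?` as sentence endings, and stripping `.,!?;:` from keys, are assumptions, as is the hypothetical function name `follower_dict_with_marker`:

```python
from collections import defaultdict

SENTENCE_END = (".", "!", "?")  # assumption: these characters end a sentence
STRIP_CHARS = ".,!?;:"          # assumption: punctuation ignored in keys
MARKER = "&"                    # sentence-boundary marker from the comment


def follower_dict_with_marker(text):
    # Build a token stream with '&' at the start and after each sentence end.
    stream = [MARKER]
    for token in text.split():
        stream.append(token)
        if token.endswith(SENTENCE_END):
            stream.append(MARKER)

    result = defaultdict(list)
    for current, following in zip(stream, stream[1:]):
        # Keys drop trailing punctuation ('cats.' -> 'cats'); values keep it.
        # A token made only of punctuation is kept as-is rather than emptied.
        if current == MARKER:
            key = current
        else:
            key = current.rstrip(STRIP_CHARS) or current
        result[key].append(following)
    return dict(result)


print(follower_dict_with_marker("I like cats and dogs. Dogs like cats. I like dogs more"))
```

For the question's sentence this gives `'&': ['I', 'Dogs', 'I']`, `'like': ['cats', 'cats.', 'dogs']` and `'cats': ['and', '&']`. Case is not normalized, so `'Dogs'` and `'dogs'` remain separate keys.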