Python: return a dictionary of the words in a string, keyed by word length

python string dictionary

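I need to build a function that takes a string as input and returns a dictionary.
The keys are numbers, and each value is a list of the unique words whose number of letters equals the key.
For example, if the function is called like this: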

n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become")

The function should return:

{2: ['is'], 3: ['and', 'see', 'the', 'way', 'you'], 4: ['them', 'they', 'what'], 5: ['treat'], 6: ['become', 'people']}

The code I wrote is as follows:

def n_letter_dictionary(my_string):
    my_string=my_string.lower().split()
    sample_dictionary={}
    for word in my_string:
        words=len(word)
        sample_dictionary[words]=word
    print(sample_dictionary)
    return sample_dictionary

The function returns a dictionary like this:

{2: 'is', 3: 'you', 4: 'they', 5: 'treat', 6: 'become'}

The dictionary doesn't contain all the words with a given number of letters; it only keeps the last such word from the string.
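A minimal illustration of why only the last word survives: assigning to a key that already exists replaces the old value instead of accumulating it. The dictionary d here is just a throwaway name for the demo.

```python
# Assigning to the same key repeatedly keeps only the most recent value.
d = {}
for word in ["way", "you", "see"]:
    d[len(word)] = word  # every word has length 3, so each assignment overwrites the previous one

print(d)  # {3: 'see'}
```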

Using

sample_dictionary[words]=word

overwrites whatever was stored under that key so far. You need a list that you can append to:

my_string="a aa bb ccc a bb".lower().split()
sample_dictionary={}
for word in my_string:
    words=len(word)
    if words not in sample_dictionary:
        sample_dictionary[words] = []
    sample_dictionary[words].append(word)
print(sample_dictionary)

Instead of sample_dictionary[words]=word, you need:

if words in sample_dictionary.keys():
    sample_dictionary[words].append(word)
else:
    sample_dictionary[words]=[word]

So if this key already has a value, I append to it; otherwise I create a new list.
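The same "append if the key exists, otherwise create a new list" logic can be collapsed into one line with dict.setdefault, which returns the value stored under a key, inserting the given default first if the key is missing. This is a swapped-in alternative rather than the approach from the answer above, and like that answer it keeps duplicate words.

```python
def n_letter_dictionary(my_string):
    sample_dictionary = {}
    for word in my_string.lower().split():
        # setdefault inserts [] only when this length is not a key yet,
        # then returns the list stored under that key so we can append to it
        sample_dictionary.setdefault(len(word), []).append(word)
    return sample_dictionary

print(n_letter_dictionary("a aa bb ccc a bb"))
# {1: ['a', 'a'], 2: ['aa', 'bb', 'bb'], 3: ['ccc']}
```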

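Since you only want to store unique values in the list, it actually makes more sense to use a set. Your code is almost correct; you just need to make sure that if words is not yet a key in the dictionary, you create a set, and if words is already a key, you add to that set. Like this: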
def n_letter_dictionary(my_string):
    my_string=my_string.lower().split()
    sample_dictionary={}
    for word in my_string:
        words=len(word)
        if words in sample_dictionary:
            sample_dictionary[words].add(word)
        else:
            sample_dictionary[words] = {word}
    print(sample_dictionary)
    return sample_dictionary

n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become")

Output (printed by Python 2, which shows sets as set([...])):
{2: set(['is']), 3: set(['and', 'the', 'see', 'you', 'way']), 
 4: set(['them', 'what', 'they']), 5: set(['treat']), 6: set(['become', 'people'])}
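In Python 3 the same sets print like {'is'}, and the order of words inside a set is arbitrary in either version. If you want exactly the list-valued result shown in the question, one option is to convert each set to a sorted list at the end. This assumes alphabetical order is what is wanted, since the expected lists in the question happen to be alphabetical. A sketch:

```python
def n_letter_dictionary(my_string):
    sample_dictionary = {}
    for word in my_string.lower().split():
        words = len(word)
        if words in sample_dictionary:
            sample_dictionary[words].add(word)
        else:
            sample_dictionary[words] = {word}
    # sets are unordered: turn each one into an alphabetically sorted list,
    # and emit the keys in ascending order
    return {k: sorted(v) for k, v in sorted(sample_dictionary.items())}

result = n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become")
print(result)
# {2: ['is'], 3: ['and', 'see', 'the', 'way', 'you'], 4: ['them', 'they', 'what'], 5: ['treat'], 6: ['become', 'people']}
```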

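You can use defaultdict from the collections library. It lets you set a default type for the dictionary's values, in this case a list, and then append to it based on the word length: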
from collections import defaultdict

def n_letter_dictionary(my_string):
    my_dict = defaultdict(list)
    for word in my_string.split():
        my_dict[len(word)].append(word)

    return my_dict
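A quick usage sketch of the defaultdict version above. As written it neither lowercases nor deduplicates, so "The" and "the" both end up under key 3 and repeated words appear repeatedly. The function returns a defaultdict, so looking up a missing length silently inserts an empty list; wrapping the result in dict() gives a plain dictionary if that behaviour is unwanted.

```python
from collections import defaultdict

def n_letter_dictionary(my_string):
    my_dict = defaultdict(list)
    for word in my_string.split():
        my_dict[len(word)].append(word)
    return my_dict

result = n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become")
print(dict(result)[2])  # ['is', 'is']
print(result[7])        # [] -- looking up a missing key on a defaultdict inserts an empty list
```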

You can still do this without defaultdict, but it's a bit longer:
def n_letter_dictionary(my_string):
    my_dict = {}
    for word in my_string.split():
        word_length = len(word)
        if word_length in my_dict:
            my_dict[word_length].append(word)
        else:
            my_dict[word_length] = [word]

    return my_dict

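To make sure the value lists contain no duplicates without using set(), check whether the word is already in the list before appending. Note, however, that if your value lists are large and your input data is fairly unique, you will run into performance problems: checking whether a value is already in a list scans the list element by element and only exits early when it finds the value.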
from collections import defaultdict

def n_letter_dictionary(my_string):
    my_dict = defaultdict(list)
    for word in my_string.split():
        if word not in my_dict[len(word)]:
            my_dict[len(word)].append(word)

    return my_dict

# without defaultdicts
def n_letter_dictionary(my_string):
    my_dict = {}                                  # Init an empty dict
    for word in my_string.split():                # Split the string and iterate over it
        word_length = len(word)                   # Get the length, also the key
        if word_length in my_dict:                # Check if the length is in the dict
            if word not in my_dict[word_length]:  # If the length exists as a key, but the word doesn't exist in the value list
                my_dict[word_length].append(word) # Add the word
        else:
            my_dict[word_length] = [word]         # The length/key doesn't exist, so you can safely add it without checking for its existence

    return my_dict

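So this approach is acceptable if your input has many duplicates and the lists to scan are short. If, for example, you have a randomly generated word list made only of permutations of letters, the value lists grow large and scanning them becomes very expensive.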
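If the linear scan becomes a bottleneck but set() is still off the table, one swapped-in alternative (not from the answer above) is to use a dict as an ordered collection of unique words: dict keys are hashed, so the membership test does not walk a list, and since Python 3.7 dicts preserve insertion order, so words stay in first-seen order. A sketch:

```python
def n_letter_dictionary(my_string):
    # length -> {word: None}; each inner dict acts as an ordered collection of unique words
    buckets = {}
    for word in my_string.lower().split():
        # hashed insert instead of a list scan; re-assigning an existing key keeps its position
        buckets.setdefault(len(word), {})[word] = None
    # turn each inner dict back into a list of its keys, in first-seen order
    return {length: list(words) for length, words in buckets.items()}

result = n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become")
print(result)
```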
itertools.groupby is the perfect tool for this:
from itertools import groupby
def n_letter_dictionary(string):
    result = {}
    for key, group in groupby(sorted(string.split(), key = lambda x: len(x)), lambda x: len(x)):
        result[key] = list(group)
    return result

print(n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become"))
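Why the answer above sorts before grouping: itertools.groupby only merges consecutive items with equal keys, so without sorting by length first the same length shows up as several separate groups, and building a dict from them keeps only the last group per length. A small demonstration with a made-up word list:

```python
from itertools import groupby

words = "aa b cc d".split()

# Unsorted: the lengths run 2, 1, 2, 1, so groupby starts a new group at every change
unsorted_groups = [(k, list(g)) for k, g in groupby(words, key=len)]
print(unsorted_groups)  # [(2, ['aa']), (1, ['b']), (2, ['cc']), (1, ['d'])]

# Sorted by length first: equal lengths are adjacent, so each length forms a single group
sorted_groups = [(k, list(g)) for k, g in groupby(sorted(words, key=len), key=len)]
print(sorted_groups)  # [(1, ['b', 'd']), (2, ['aa', 'cc'])]

# Building a dict from the unsorted groups silently keeps only the last group per length
print(dict(unsorted_groups))  # {2: ['cc'], 1: ['d']}
```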
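The problem with your code is that you only put the latest word into the dictionary. Instead, you have to add the word to a collection of words of the same length. In your example that collection is a list, but a set seems more appropriate, assuming order doesn't matter: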
def n_letter_dictionary(my_string):
    my_string=my_string.lower().split()
    sample_dictionary={}
    for word in my_string:
        if len(word) not in sample_dictionary:
            sample_dictionary[len(word)] = set()
        sample_dictionary[len(word)].add(word)
    return sample_dictionary

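You can make this a bit shorter, or use itertools.groupby; for that, however, you have to sort by length first:
import itertools
def n_letter_dictionary(my_string):
    words_sorted = sorted(my_string.lower().split(), key=len)  # groupby only merges adjacent items
    return {k: set(g) for k, g in itertools.groupby(words_sorted, key=len)}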

Example (the result is the same for all three implementations):
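A sketch of such a comparison, using the loop-with-set version and the groupby version from this answer on the question's sentence. The function names with_loop and with_groupby are hypothetical, chosen only so both variants can coexist in one snippet.

```python
import itertools

def with_loop(my_string):  # hypothetical name: the loop-with-set version
    sample_dictionary = {}
    for word in my_string.lower().split():
        if len(word) not in sample_dictionary:
            sample_dictionary[len(word)] = set()
        sample_dictionary[len(word)].add(word)
    return sample_dictionary

def with_groupby(my_string):  # hypothetical name: the sort-then-groupby version
    words_sorted = sorted(my_string.lower().split(), key=len)
    return {k: set(g) for k, g in itertools.groupby(words_sorted, key=len)}

sentence = ("The way you see people is the way you treat them"
            " and the Way you treat them is what they become")
print(with_loop(sentence) == with_groupby(sentence))  # True
print(with_groupby(sentence)[4])  # the set {'them', 'they', 'what'}, in arbitrary order
```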
The shortest solution I came up with uses defaultdict:
from collections import defaultdict

sentence = ("The way you see people is the way you treat them"
            " and the Way you treat them is what they become")

Now the algorithm is:
wordsOfLength = defaultdict(list)
for word in sentence.split():
    wordsOfLength[len(word)].append(word)

Now wordsOfLength holds the desired dictionary.
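Why defaultdict keeps this so short: with a plain dict, appending to a key that isn't there yet raises KeyError, whereas defaultdict(list) creates the empty list on first access. A minimal demonstration; the names plain and auto are only for illustration.

```python
from collections import defaultdict

plain = {}
try:
    plain[3].append("the")  # key 3 does not exist yet, so the lookup fails
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 3

auto = defaultdict(list)
auto[3].append("the")  # missing key 3 is created as [] and then appended to
print(dict(auto))      # {3: ['the']}
```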
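Oh, this is better; our other solutions would raise a KeyError…
How do I sort the list ['the', 'way', 'you', 'see', 'the', 'way', 'you', 'and', 'the', 'way', 'you']?
Just do some_list.sort() if you want it in alphabetical order.
You don't actually need .keys().
Hi, thank you very much for your help. Even so, I still get duplicate values for keys that already exist in the dictionary. Do you know a way to prevent duplicate words without using set()?
Why not use set()? There is certainly a way: replace else: with elif word not in sample_dictionary[words]: -- then it will check this condition.
Indeed, let me quickly correct that.
Also, key=lambda x: len(x) is the same as just key=len ;-)
Yes, noticed that, thanks!
Sorting just to please groupby is unnecessary. Reconsider this aspect.
Exactly right; of course removing duplicates makes more sense!
Reconsider the name of the variable words. It is really a word length, so word_length or similar would fit better.
Thank you very much. I still get duplicate values for keys that already exist in the dictionary. Is there a way to remove duplicate words without using set()?
I added a section on how to make sure there are no duplicate words without using set().
I'm trying the first approach (without defaultdict) by adding "if word not in my_dict" after "for word in my_string.split():", but I still get the same output for duplicate words. Could you help me with the approach without defaultdict?
I added an example without defaultdict that gives unique results in the lists without using set().
If you have if word not in my_dict, it will always be True, because word is in the values, and your statement only checks the keys of my_dict.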
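To illustrate that last point: the in operator on a dict tests keys only, so the duplicate check has to look inside the list stored under the word's length, as the version without defaultdict above does. A minimal sketch with a hand-built my_dict:

```python
my_dict = {3: ["the", "way"]}
word = "the"

print(word not in my_dict)             # True: 'the' is not a key (the keys are lengths)
print(word not in my_dict[len(word)])  # False: 'the' is already in the list for length 3

# Correct duplicate check without set(): look inside the list for this word's length
if len(word) in my_dict:
    if word not in my_dict[len(word)]:
        my_dict[len(word)].append(word)
else:
    my_dict[len(word)] = [word]
print(my_dict)  # {3: ['the', 'way']}
```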