Return words that occur multiple times in a string in Python 3?


python


I'm trying to write a function from scratch, f(x, n), that returns a sorted list of the words that occur at least n times in the string x.

For example:

f("the apple the banana the apple", 2)
>>> ['apple', 'the']

because the and apple are the only words that occur two or more times. Another example:

f("the kid jumped off the roof", 1)
>>> ['jumped', 'kid', 'off', 'roof', 'the']

So far I have been trying this, without success:
def f(x, n):
   words = list(x.split())
   a= ""
   for word in words:
     if len(word) >= n:
       a += word
         return(list(word))

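The problem with the function you provided is that you are actually checking the length of each word (by doing if len(word)…) rather than how often it occurs in the string.
You can simply use collections.Counter and a list comprehension, like this:
from collections import Counter

def f(string, n):
    # Map each word to the number of times it occurs
    counts = Counter(string.split())
    # sorted() returns the matching words alphabetically, as the question asks
    return sorted([word for word, count in counts.items() if count >= n])

print(f("the apple the banana the apple", 2))

Output:
['apple', 'the']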


collections.Counter is your friend. Try this:
from collections import Counter

def f(x, n):
   words = x.split()
   c = Counter(words)
   # sorted() puts the result in the alphabetical order the question asks for
   return sorted(word for word, v in c.items() if v >= n)

Then:
>>> print(f("the kid jumped off the roof", 1))
['jumped', 'kid', 'off', 'roof', 'the']


You can do this with string methods and built-in functions:
>>> def f(x, n):
...     words = x.split()  # count whole words; x.count(s) would also match substrings
...     return sorted(set(w for w in words if words.count(w) >= n))
... 
>>> s1 = "the apple the banana the apple"
>>> s2 = "the kid jumped off the roof"
>>> f(s1, 2)
['apple', 'the']
>>> f(s2, 1)
['jumped', 'kid', 'off', 'roof', 'the']


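This iterates over the items in the list produced by the split and adds each one to a dictionary with a count of 1 if it is not in the dictionary yet. If the item is already there, its value is increased by 1. The word acts as the key and the count as the value.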
def f(x, n):
    words = x.split()
    d = {}
    for word in words:
        if word in d:
            d[word] += 1
        else:
            d[word] = 1     
    # Return (rather than print) the words, sorted as the question asks
    return sorted(word for word, count in d.items() if count >= n)

print(f("the apple the banana the apple", 2))

Output:
['apple', 'the']


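This is an efficient solution.

Since you mentioned "from scratch", I will write this without importing any modules.

Logic:

1. Iterate over the list of words (only once) {O(n) complexity} and use a dictionary to record the number of occurrences. A dictionary is ideal because it cannot contain duplicate keys.

2. Iterate over the dictionary once {O(n) complexity} and check whether the value is at least n -> if it is, append the word to the list that will be returned (if it is not already in the list).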
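
The two steps described above could be sketched roughly as follows. This is only an illustration of that logic, not the answerer's own code: the function name words_occurring_at_least is made up, and the final sorted() call is an assumption added because the question asks for a sorted list.

```python
def words_occurring_at_least(text, n):
    """Return the sorted words that occur at least n times in text."""
    # Step 1: one pass over the words, counting occurrences in a dict.
    counts = {}
    for word in text.split():
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1

    # Step 2: one pass over the dict, keeping words seen at least n times.
    # Dict keys are unique, so no word can be appended twice.
    result = []
    for word, count in counts.items():
        if count >= n:
            result.append(word)

    # Assumed extra step: sort, because the question wants a sorted list.
    return sorted(result)


print(words_occurring_at_least("the apple the banana the apple", 2))  # ['apple', 'the']
print(words_occurring_at_least("the kid jumped off the roof", 1))  # ['jumped', 'kid', 'off', 'roof', 'the']
```

No imports are needed, which matches the "from scratch" constraint; the sort adds an O(k log k) step for the k words that are returned.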
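I do like the answers given here, but I wanted to test whether the Counter() and list() variants produce different results, so I implemented both functions so that they return a sorted array containing each word together with its count, to make the results easier to compare: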
from collections import Counter
# this is the Counter version returned sorted
def f(x,n): return sorted(["%s:%s" % (w,c) for w,c in Counter(x.split()).most_common() if c >= n])
# this is the list version returned sorted
def g(x, n): return sorted(list(set("%s:%s" % (s, x.count(s)) for s in x.split() if x.count(s) >= n)))

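Now I fed both functions a 500-word text. I was surprised to find that there was in fact a difference.
Both versions share one flaw: they do not take punctuation into account. So if the text contains Apple, it is not treated as the same word as Apple, or Apple. and so on. You can easily replace or remove all punctuation before counting the words.
Also, Apple is not the same as apple, which may be intended, but if it is not, you also have to .lower() the string.
But the biggest difference is in the counting itself. The list() version actually fails here, because it also counts a word when it appears inside another word. For one word, function f() counted 8 occurrences, which is correct, but function g() showed a count of 38. Evidently x.count returns not only whole words but also substring matches. Too bad, because that makes the list() version fail.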
Try it with a tricky sentence; the results are:
>>> print(f("This test so nice, is like ice! Test... likely;", 1))
['Test...:1', 'This:1', 'ice!:1', 'is:1', 'like:1', 'likely;:1', 'nice,:1', 'so:1', 'test:1']

>>> print(g("This test so nice, is like ice! Test... likely;", 1))
['Test...:1', 'This:1', 'ice!:1', 'is:2', 'like:2', 'likely;:1', 'nice,:1', 'so:1', 'test:1']

Here you can see the behaviour: the list() version counts is and like twice, because they are also contained in This and likely.
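
The difference between the two counting approaches can be reproduced in isolation. A minimal sketch, reusing the test sentence from above; the variable names are made up for illustration:

```python
sentence = "This test so nice, is like ice! Test... likely;"

# str.count counts substring matches anywhere in the string,
# so "is" is found both inside "This" and as the word "is".
substring_hits = sentence.count("is")

# list.count on the split words only counts exact, whole-word matches.
whole_word_hits = sentence.split().count("is")

print(substring_hits, whole_word_hits)  # 2 1
```

Counting on the list returned by split() rather than on the original string is enough to avoid the submatch problem.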
So the winner is:
from collections import Counter
# this is the Counter version but result returned sorted
def f(x,n): return sorted([w for w,c in Counter(x.split()).most_common() if c >= n])

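Now, this still does not take case and punctuation into account. If you want the result I would expect, you can add the string module:
from collections import Counter
import string
# return correct result lowercase without punctuation and sorted
# (Python 3: str.translate needs a table from str.maketrans; translate(None, ...) only works in Python 2)
def f(x,n): return sorted([w for w,c in Counter(x.translate(str.maketrans("", "", string.punctuation)).lower().split()).most_common() if c >= n])

.translate(str.maketrans("", "", string.punctuation)).lower() does all the magic here, and the result is:
>>> print(f("This test so nice, is like ice! Test... likely;", 1))
['ice', 'is', 'like', 'likely', 'nice', 'so', 'test', 'this']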

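Guys, I love a one-liner :) But when a Python beginner asks a question here, we should not focus too much on our own preferences. We should focus on code that gives good insight into Python and into why things behave the way they do, so the chosen answer should be readable for beginners.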
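
For readers who find the one-liner dense, the same steps could be spelled out line by line. This is a sketch under assumptions rather than the answerer's code: the name frequent_words is made up for illustration, and punctuation is stripped with the Python 3 form str.maketrans("", "", string.punctuation), which only covers ASCII punctuation.

```python
from collections import Counter
import string


def frequent_words(text, n):
    """Return the sorted, lowercased words that occur at least n times."""
    # Build a translation table that deletes every ASCII punctuation character.
    remove_punctuation = str.maketrans("", "", string.punctuation)
    cleaned = text.translate(remove_punctuation).lower()

    # Count whole words only, so "is" inside "this" is not counted.
    counts = Counter(cleaned.split())

    # Keep the words that reach the threshold, in alphabetical order.
    return sorted(word for word, count in counts.items() if count >= n)


print(frequent_words("This test so nice, is like ice! Test... likely;", 2))  # ['test']
```

With the threshold set to 2, only test qualifies, because test and Test... become the same word once case and punctuation are normalised.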