Python: find duplicates in a string and return each duplicate only once



I've seen a lot of examples on here, but I haven't found one that fits my scenario.

I'm trying to take a string such as:

string = "Hi my Name is Bill, Bill likes coding, coding is fun"

and return each duplicated word only once.

So the output would look like this (ignoring punctuation):

How can I do this in Python 3?

Split the string into words, then print only the words that occur more than once (count > 1):

Output:

is
Bill
coding
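The code for this answer did not survive, only its output. A minimal sketch that reproduces that output, assuming that stripping punctuation from the ends of each whitespace-separated word is enough (it is for this sentence) and that each repeated word should be printed once, in order of first appearance. The names text, words and duplicates are illustrative:

```python
import string

text = "Hi my Name is Bill, Bill likes coding, coding is fun"

# Split on whitespace, then strip leading/trailing punctuation from each word
# so that "Bill," and "Bill" are treated as the same word.
words = [word.strip(string.punctuation) for word in text.split()]

# Keep each word that occurs more than once, once, in order of first appearance.
duplicates = []
for word in words:
    if words.count(word) > 1 and word not in duplicates:
        duplicates.append(word)

for word in duplicates:
    print(word)
```

words.count() rescans the whole list for every word, which is fine for a sentence but quadratic on long texts; counting once with collections.Counter avoids that.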

Split the string into words. There are different ways to do this depending on your requirements; here is one:

import re
words = re.findall(r'\w+', string)  # raw string, so \w reaches the regex engine intact

Count how often each word occurs:

import collections
word_counts = collections.Counter(words)

Get all the words that occur more than once:

result = [word for word in word_counts if word_counts[word] > 1]
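Put together, the three steps fit in one small function. A sketch, in which the function name repeated_words and the ignore_case flag are hypothetical additions; the flag covers the case where "Bill" and "bill" should count as the same word:

```python
import collections
import re


def repeated_words(text, ignore_case=False):
    """Hypothetical helper: words that occur more than once, in first-appearance order."""
    # \w+ matches runs of letters, digits and underscores, so punctuation is dropped.
    words = re.findall(r"\w+", text)
    if ignore_case:
        # Assumption: "Bill" and "bill" should be counted as one word.
        words = [word.casefold() for word in words]
    word_counts = collections.Counter(words)
    return [word for word, count in word_counts.items() if count > 1]


text = "Hi my Name is Bill, Bill likes coding, coding is fun"
print(repeated_words(text))                               # ['is', 'Bill', 'coding']
print(repeated_words("Is it? It is.", ignore_case=True))  # ['is', 'it']
```

Because Counter is a dict subclass, on Python 3.7+ it remembers insertion order, so the result lists words in the order they first appear. Iterating over items() also saves the second word_counts lookup per word.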

Use re to replace the punctuation:

import string
import re


text = "Hi my Name is Bill, Bill likes coding, coding is fun"

regex = re.compile('[%s]' % re.escape(string.punctuation))
out = regex.sub(' ', text)
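The same replacement can also be done without a regular expression. A sketch using str.translate (a swapped-in technique, not what this answer used); str.maketrans requires both strings to have the same length, hence the run of spaces:

```python
import string

text = "Hi my Name is Bill, Bill likes coding, coding is fun"

# Map every ASCII punctuation character to a space.
table = str.maketrans(string.punctuation, " " * len(string.punctuation))
out = text.translate(table)

print(out.split())
```

Either way each comma becomes a space, so split() afterwards treats "Bill," and "Bill" as the same word.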

Use Counter to do the counting:

from collections import Counter

out = out.split()

counter = Counter(out)

ans = [word for word, count in counter.items() if count > 1]

print(ans)

If I understand you correctly, you want to filter out the duplicates? If so, you could do this:

string = "Hi my Name is Bill, Bill likes coding, coding is fun"
string = string.replace(',' , '')
string = list(set(string.split()))
string = '\n'.join(string)
print(string)
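A set does not preserve order, so the words printed by the code above come out in an arbitrary order that can change between runs. If first-occurrence order matters, a sketch using dict.fromkeys instead of set, relying on dicts keeping insertion order (Python 3.7+):

```python
text = "Hi my Name is Bill, Bill likes coding, coding is fun"
words = text.replace(",", "").split()

# dict keys are unique and keep insertion order, so this drops repeats
# while keeping the first occurrence of each word.
unique_words = list(dict.fromkeys(words))
print("\n".join(unique_words))
```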

You could use a regular expression to pick out the words and ignore the punctuation. Try this:

import re
import collections
sentence = "Hi my Name is Bill, Bill likes coding, coding is fun"
# Replace every non-word character with a space, then split into words
wordList = re.sub(r"[^\w]", " ", sentence).split()
print([item for item, count in collections.Counter(wordList).items() if count > 1])  # Python 3 print()

Sets should be able to find the duplicates.
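Sets can find duplicates if you keep two of them: one for words already seen and one for words seen again. A minimal sketch (the names seen and duplicates are illustrative); it never needs counts, only whether a word has appeared before:

```python
import re

text = "Hi my Name is Bill, Bill likes coding, coding is fun"

seen = set()        # words encountered at least once
duplicates = set()  # words encountered at least twice
for word in re.findall(r"\w+", text):
    if word in seen:
        duplicates.add(word)
    else:
        seen.add(word)

print(sorted(duplicates))  # ['Bill', 'coding', 'is']
```

The result is a set, so it has no meaningful order; sorted() is only there to make the printout stable.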

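What about "is"?

You could use sets for this. What does your code look like so far?

@HåkenLid Using sets could be confusing, because they remove duplicates and with them the count information you need to get the desired result. Unless I'm overlooking another way of using sets, could you elaborate?

This is a nice solution; I had restricted myself to str.split and tried to replace the punctuation first. Thanks for sharing.

Or result = [word for word, count in word_counts.items() if count > 1]? Not much more concise, but it no longer has to redo the word_counts lookup. Otherwise, nice answer. I was hoping one of these solutions would use a Counter, or I'd have posted one :)

@dwanderson: The main reason I avoided items() is that I didn't know whether this was Python 2 or Python 3. In Python 2 I'd prefer iteritems(), although it doesn't really matter. Writing it the way I did means not having to think about it.

Ah, I was going to edit my comment to mention iteritems() for Python 2, but I got lazy. Fair enough; again, I'm sure the difference is negligible unless it's a ridiculously large dictionary, and even then I'm not sure how it would play out (calling .items() in Python 2 would be very wasteful). I accept your reasoning (for whatever that's worth, i.e. 0).

I just ran it in my terminal and it worked. What error is it giving you?

It runs fine, it just doesn't eliminate the duplicates or print them... (which is the requirement)

Oh, sorry, I haven't had my morning coffee yet, so...

Here is my answer: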
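import re

def result(x):  # x is the input string
    repeated = []
    # Extract words with a regex so "Bill," and "Bill" count as the same word
    # (plain x.split() keeps the trailing comma, so only "is" would be found).
    listed = re.findall(r"\w+", x)
    for each in listed:
        number = listed.count(each)
        if number > 1:
            repeated.append(each)

    return set(repeated)  # a set cannot hold repeated values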
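result() calls listed.count() inside the loop, so it rescans the whole list once per word, which is quadratic in the number of words. A sketch of a single-pass alternative that counts with a plain dict (no Counter import) and returns a list in first-appearance order; the name repeated_in_order is hypothetical:

```python
import re


def repeated_in_order(text):
    """Hypothetical helper: words seen more than once, in first-appearance order."""
    counts = {}
    for word in re.findall(r"\w+", text):
        # dict.get returns 0 for a word that has not been seen yet.
        counts[word] = counts.get(word, 0) + 1
    return [word for word, n in counts.items() if n > 1]


print(repeated_in_order("Hi my Name is Bill, Bill likes coding, coding is fun"))
# ['is', 'Bill', 'coding']
```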
