Python: How do I find repeated words in a text file? (Python, Loops, For Loop, Duplicates) - Fatal编程技术网

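I have a file on my computer that contains a very long poem, and I want to check whether any word is repeated within each line (so the text has to be split on punctuation).

This is what I have so far. I don't want to use a module or Counter; I would prefer to use loops. Any ideas?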

file_str = input("Enter poem: ")
my_file = open(file_str, "r")
words = file_str.split(',' or ';')
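With loops :/

def Counter(text):
    # Hand-written stand-in for collections.Counter: maps each word to its count
    d = {}
    for word in text.split():
        d[word] = d.get(word, 0) + 1
    return d

Then just split on punctuation:

import re

# my_corpus is assumed to hold the text of the poem
matches = re.split("[!.?]", my_corpus)
for match in matches:
    print(Counter(match))

You can use sets to track seen items and duplicates:
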
>>> words = 'the fox jumped over the lazy dog and over the bear'.split()
>>> seen = set()
>>> dups = set()
>>> for word in words:
        if word in seen:
            if word not in dups:
                print(word)
                dups.add(word)
        else:
            seen.add(word)


the
over

For a file like this:

A hearth came to us from your hearth
foreign hairs with hearth are same are hairs

This checks the whole poem:

lst = []
with open("coz.txt") as f:
    for line in f:
        for word in line.split():  # split on whitespace
            if word not in lst:
                lst.append(word)
            else:
                print(word)

Output:

>>> 
hearth
hearth
are
hairs
>>> 

As you can see, hearth is printed twice here, because it occurs three times in the whole poem.

Check line by line
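
The line-by-line variant is not shown on this page. A minimal sketch of it, assuming punctuation should separate words as the question asks. The names `PUNCTUATION`, `split_words` and `duplicates_per_line` are made up for this illustration, and only loops and built-in types are used (no `re`, no `Counter`). Note that in the question's snippet `',' or ';'` evaluates to just `','`, and `split` is called on the file name rather than on the file contents, which is why this sketch replaces punctuation character by character instead.

```python
# Hypothetical helper names; only loops and built-in types are used.
PUNCTUATION = ",;.!?:"

def split_words(line):
    """Split a line into words, treating punctuation like whitespace."""
    cleaned = ""
    for ch in line:
        cleaned += " " if ch in PUNCTUATION else ch
    return cleaned.split()

def duplicates_per_line(lines):
    """Return one list per line holding the words repeated within that line."""
    result = []
    for line in lines:
        seen = set()   # words met so far in this line only
        dups = []      # repeated words, in the order they first repeat
        for word in split_words(line):
            if word in seen:
                if word not in dups:
                    dups.append(word)
            else:
                seen.add(word)
        result.append(dups)
    return result

poem = [
    "A hearth came to us from your hearth",
    "foreign hairs with hearth are same are hairs",
]
for number, dups in enumerate(duplicates_per_line(poem), start=1):
    print(number, dups)
```

Because `seen` is reset for every line, the `hearth` in the second line is not reported there. To run this on a file, an open file object can be passed directly, for example `duplicates_per_line(f)` inside `with open("coz.txt") as f:`.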
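Solved!!! I can explain with a working program.

Contents of the file sam.txt:

Hello this is abdul hello
the data are Hello so you can move to the hello

The program (its output is listed at the end of this page):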

file_content = []
resultant_list = []
repeated_element_list = []
with open(file="sam.txt", mode="r") as file_obj:
  file_content = file_obj.readlines()

print("\n debug the file content ", file_content)

for line in file_content:
  # Drop the trailing newline, split the line on spaces, and collect the words
  resultant_list.extend(line.strip('\n').split(" "))

print("\n debug resultant_list", resultant_list)

# Main loop: compare each word with every word that comes after it
for ii in range(0, len(resultant_list)):
  word = resultant_list[ii]
  # Skip words that were already reported as repeated
  if word in repeated_element_list:
    continue
  # Scan the rest of the list for a second occurrence
  for jj in range(ii + 1, len(resultant_list)):
    if word == resultant_list[jj]:
      repeated_element_list.append(word)
      break    # one match is enough; avoids listing the word several times

print("The repeated strings are {}\n and total counts {}".format(repeated_element_list, len(repeated_element_list)))

Thanks

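Why don't you use Counter? Counter is the right solution... When coding, please don't decide every time that you "don't want to use" the actual solution. You are trying to solve a problem, not throw away a solution.

Do you only want to check line by line? Or the whole poem?

How about collections.Counter?

Lol, he explicitly said he doesn't want it... so here is a custom implementation :P ... but this is the right answer.

For example, imho sets are not the clearest solution. A line can start with "Word" and end with "word". For Python and for a set these are different, but for us they are the same. So your set can contain both "Word" and "word", which are the same to us, so this is not the clearest solution.

@JoranBeasley They are duplicates for us, not for Python. In real life, would you actually say that "Jack" and "jack" are not the same word?

Yes, in the programming space they are distinct words... Note that the problem statement does not mention case at all; when it doesn't, you assume a case-sensitive match is wanted... If the problem called for case-insensitive matching, it would state that explicitly... so case-sensitive really is the default interpretation.

Let's not get lost in interpretations of what the OP might consider a distinct word. Please focus on the substance of the question, which is how to identify duplicates in a list using only loops and native objects. How the OP wants to define an equivalence class for words is an arbitrary choice that is up to him or her (and unrelated to the central question asked).

Using a linear list search is almost never good advice when you can use an O(1) hash table lookup with a dict or set instead.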
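
The case question raised in the comments comes down to which key is used when comparing words. A sketch, assuming lowercase comparison is the equivalence that is wanted. `find_duplicates` and its `key` parameter are hypothetical names for this example, and a set provides the constant-time lookups mentioned in the last comment.

```python
def find_duplicates(words, key=None):
    """Return repeated words in the order they first repeat.

    key maps each word to the value used for comparison; None means
    exact (case-sensitive) matching.
    """
    seen = set()       # comparison keys met so far
    reported = set()   # comparison keys already listed as duplicates
    dups = []
    for word in words:
        k = key(word) if key is not None else word
        if k in seen and k not in reported:
            reported.add(k)
            dups.append(word)
        seen.add(k)
    return dups

words = "The fox saw the dog".split()
print(find_duplicates(words))                 # exact matching
print(find_duplicates(words, key=str.lower))  # case-insensitive matching
```

With exact matching, "The" and "the" count as different words and nothing is reported. With `key=str.lower` the second one is reported as a repeat.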
with open(r"specify the path of the file") as f:
    words = f.read().split()
    # A word is a duplicate if it occurs more than once in the file
    if set(w for w in words if words.count(w) > 1):
        print("Duplicates found")
    else:
        print("None")
Output of the sam.txt program:

 debug the file content  ['Hello this is abdul hello\n', 'the data are Hello so you can move to the hello']

 debug resultant_list ['Hello', 'this', 'is', 'abdul', 'hello', 'the', 'data', 'are', 'Hello', 'so', 'you', 'can', 'move', 'to', 'the', 'hello']

The repeated strings are ['Hello', 'hello', 'the']
 and total counts 3
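
The sam.txt program compares every word with every later word. A sketch of a single-pass alternative that keeps a dict of counts, in the spirit of the hand-written Counter earlier on this page. `repeated_words` is a name chosen for this example, and the input lines are copied from the debug output above.

```python
def repeated_words(lines):
    """Return the words that occur more than once, in order of first appearance."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    # dicts keep insertion order (Python 3.7+), so this follows first appearance
    return [word for word, n in counts.items() if n > 1]

lines = ["Hello this is abdul hello\n",
         "the data are Hello so you can move to the hello"]
print(repeated_words(lines))
```

For this input the sketch reports the same three words as the program's output above, since matching is case-sensitive in both.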