Comparing two text files line by line with Python


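I have two text files to compare. The first file contains unique items, and the second contains the same items repeated many times. I want to find out how many times each line is repeated in the second file. This is what I wrote: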

import os
import sys

f1 = open('file1.txt')  # this has the 27 unique lines, 
f1data = f1.readlines()

f2 = open('file2.txt')  # this has lines repeated various times, with a total of 11162 lines
f2data = f2.readlines()

sys.stdout = open("linecount.txt", "w")


for line1 in f1data:
    linecount = 0
    for line2 in f2data:
        if line1 in line2:
            linecount += 1

    # one output row per unique line: the line itself and its count
    print(line1.strip(), linecount)

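The problem is that when I add up the resulting line counts, I get 11586 instead of 11162. What causes the count to increase?

Is there another way to get line frequency output with Python?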

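For Unicode and string types, x in y is true if and only if x is a substring of y. So any unique line that is contained in a longer line of the second file is counted for that longer line as well.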
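A small illustration of how this inflates the total. The sample lines below are made up for the example, not taken from the asker's files, and count_matches is a hypothetical helper: a short line that is also the tail of a longer line matches both under in, but only itself under ==.

```python
# Hypothetical sample data; readlines() keeps the trailing newline on each line.
unique_lines = ["theft\n", "auto theft\n"]
repeated_lines = ["theft\n", "auto theft\n", "auto theft\n"]

def count_matches(needle, haystack, exact):
    """Count lines in haystack that match needle, exactly or as a substring."""
    if exact:
        return sum(1 for line in haystack if needle == line)
    return sum(1 for line in haystack if needle in line)

substring_total = sum(count_matches(u, repeated_lines, exact=False) for u in unique_lines)
exact_total = sum(count_matches(u, repeated_lines, exact=True) for u in unique_lines)

print(substring_total)  # larger than len(repeated_lines)
print(exact_total)      # equal to len(repeated_lines)
```

With the substring test, "theft" is counted once for itself and once more for every "auto theft" line, so the sum of the counts exceeds the number of lines in the second file.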

Instead of

    if line1 in line2:

I think you meant to write

    if line1 == line2:

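Or you could replace the whole

for line2 in f2data:
    if line1 in line2:
        linecount+=1

block with a single call to list.count, which counts exact matches only:

linecount = f2data.count(line1)

A plain membership test (if line1 in f2data) would only tell you whether the line occurs at all, not how many times.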
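As for another way to get the line frequencies: one option is collections.Counter from the standard library, which tallies exact matches in a single pass over the second file. This is only a sketch, not the asker's code. The helper name line_frequencies is made up for illustration, and stripping the trailing newline is an assumption about what should count as the same line.

```python
from collections import Counter

def line_frequencies(unique_path, repeated_path):
    """Return (line, count) pairs for each line of unique_path, counted in repeated_path."""
    with open(repeated_path) as repeated:
        # One pass over the large file; rstrip("\n") so a missing final newline does not matter.
        counts = Counter(line.rstrip("\n") for line in repeated)
    with open(unique_path) as unique:
        # A Counter returns 0 for keys it has never seen.
        return [(line.rstrip("\n"), counts[line.rstrip("\n")]) for line in unique]
```

Calling line_frequencies('file1.txt', 'file2.txt') would give one pair per unique line. Because every line of the second file is tallied exactly once, the counts cannot add up to more than the number of lines in that file.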

Even with the code modified slightly, it still does not work for me. I got somewhat better results with this code:

import os
import sys

f1 = open('hmd4.csv')
f2 = open('svm_words.txt')

linecount = 0

for word1 in f1.read().split("."):
    for word2 in f2.read().split("\n"):
        if word1 in word2:
            linecount+=1
            print (linecount)

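I still cannot understand why it compares only one value. How can each word in one text file be compared with every word in the other, so that we can count the identical strings?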
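One likely reason only one value is compared: f2.read() is called inside the outer loop, and a file object can be read to the end only once. On every pass after the first it returns an empty string, so there is nothing left to compare against. A minimal sketch of a fix, assuming the goal is to count exact matches, reads each file once up front. The function name count_shared is hypothetical; the separators are the ones used in the code above, and the switch from in to == is deliberate so that substrings are not counted.

```python
def count_shared(path1, path2):
    """Count pairs (word1, word2) with word1 == word2 across the two files."""
    with open(path1) as f1:
        words1 = [w.strip() for w in f1.read().split(".")]
    with open(path2) as f2:
        # Read once, outside the loop, so every word1 sees the full list.
        words2 = [w.strip() for w in f2.read().split("\n")]

    matches = 0
    for word1 in words1:
        for word2 in words2:
            if word1 and word1 == word2:  # skip empty pieces, require an exact match
                matches += 1
    return matches
```

For example, count_shared('hmd4.csv', 'svm_words.txt') would return the total number of matching pairs instead of printing a running count.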