Python: How to extract common lines from text files based on their associated values?


I have 3 text files, as shown below:

List1.txt:

032_M5,5
035_M9,5
036_M4,3
038_M2,6
041_M1,6

List2.txt:

032_M5,6
035_M9,6
036_M4,5
038_M2,5
041_M1,6

List3.txt:

032_M5,6
035_M9,6
036_M4,4
038_M2,5
041_M1,6

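The first part of each line (the string) is the same in all 3 text files, but the second part (the number) changes.

I want to get three output files from this: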

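Output1.txt --> all lines where the numbers corresponding to a string are all different. For example:

036_M4 3,5,4

Output2.txt --> all lines where the numbers corresponding to a string are all the same. For example:

041_M1,6

Output3.txt --> all lines where at least two of the numbers corresponding to a string are the same (this also includes the results of Output2.txt). For example:

032_M5,6
035_M9,6
038_M2,5
041_M1,6

Then I need the number of lines in Output3.txt that contain number 1, number 2, number 3, number 4, number 5, and number 6.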

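This is what I tried. It gives me the wrong output:

from collections import defaultdict

data = defaultdict(list)
for fileName in ["List1.txt","List2.txt", "List3.txt"]:
    with open(fileName,'r') as file1:
        for line in file1:
            col1,value = line.split(",")
            data[col1].append(int(value))

with open("Output3.txt","w") as output:
    for (col1),values in data.items():
        if len(values) < 3: continue
        result = max(x for x in values)
        output.write(f"{col1}, {result}\n")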

max gives the largest number in the list, not the most common one. For that, use statistics.mode:

from collections import defaultdict
from statistics import mode

# Collect the numbers for each key across the three files.
data = defaultdict(list)
for fileName in ["List1.txt", "List2.txt", "List3.txt"]:
    with open(fileName, 'r') as file1:
        for line in file1:
            if not line.strip(): continue  # skip blank lines
            col1, value = line.split(",")
            data[col1].append(int(value))

# Output1: all three numbers are different.
with open("Output1.txt", "w") as output:
    for col1, values in data.items():
        if len(values) < 3: continue
        if values[0] != values[1] != values[2] and values[0] != values[2]:
            output.write(f"{col1}, {values[0]}, {values[1]}, {values[2]}\n")

# Output2: all three numbers are the same.
with open("Output2.txt", "w") as output:
    for col1, values in data.items():
        if len(values) < 3: continue
        if values[0] == values[1] == values[2]:
            output.write(f"{col1}, {values[0]}\n")

# Output3: at least two numbers are the same, so there are at most
# two distinct values (this also covers the Output2 rows).
with open("Output3.txt", "w") as output:
    for col1, values in data.items():
        if len(values) < 3: continue
        if len(set(values)) <= 2:
            output.write(f"{col1}, {mode(values)}\n")
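To make the difference between max and statistics.mode concrete, here is a minimal sketch using one triple of numbers from the question's sample data (6, 5, 5). The variable names are illustrative only.

```python
from statistics import mode

# One key's numbers across List1, List2 and List3 in the question's sample.
values = [6, 5, 5]

largest = max(values)        # the largest value, regardless of how often it occurs
most_common = mode(values)   # the value that occurs most often

print(largest, most_common)
```

For this triple the two results differ, which is why the attempt in the question wrote the wrong number for some keys.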

Here is an approach that does not import any Python modules; it relies entirely on native built-in Python functions:

with open("List1.txt", "r") as list1, open("List2.txt", "r") as list2, open("List3.txt", "r") as list3:
  # Form the association between keys and numbers.
  # This assumes all three files list the same keys in the same order.
  data1 = list1.readlines()
  totalKeys = [elem.split(',')[0] for elem in data1]
  numbers1 = [elem.split(',')[1].strip() for elem in data1]
  numbers2 = [elem.split(',')[1].strip() for elem in list2.readlines()]
  numbers3 = [elem.split(',')[1].strip() for elem in list3.readlines()]
  totalValues = list(zip(numbers1,numbers2,numbers3))
  totalDict = dict(zip(totalKeys,totalValues))

  #Outputs
  output1 = []
  output2 = []
  output3 = []
  for key in totalDict.keys():
    #Output1: all three numbers differ
    if len(set(totalDict[key])) == 3:
      output1.append([key, totalDict[key]])
    #Output2: all three numbers are the same
    if len(set(totalDict[key])) == 1:
      output2.append([key, totalDict[key][0]])
    #Output3: at least two numbers are the same; keep the most frequent one
    if len(set(totalDict[key])) <= 2:
      output3.append([key, max(totalDict[key], key=lambda elem: totalDict[key].count(elem))])

  #Output1
  print('Output1:')
  for elem in output1:
    print(elem[0] + ' ' + ", ".join(elem[1]))
  print()

  #Output2 (elem[1] is a single string, so print it directly rather than
  #joining it, which would split a multi-digit number into characters)
  print('Output2:')
  for elem in output2:
    print(elem[0] + ' ' + elem[1])
  print()

  #Output3
  print('Output3:')
  for elem in output3:
    print(elem[0] + ' ' + elem[1])

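Actually, max() accepts a key argument that specifies the function used to find the maximum. For example, max(x, key=lambda x: len(x)) returns, from a list of lists, the list with the most elements.

@complezabot Yes, you can find the mode with max(x, key=lambda y: x.count(y)).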
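The two uses of the key argument mentioned in these comments can be sketched as follows. The sample data here is hypothetical and not taken from the thread.

```python
# Hypothetical sample data for illustration.
lists = [[1], [1, 2, 3], [4, 5]]
values = [6, 5, 5]

# key=len ranks each inner list by its length, so the longest list is returned.
longest = max(lists, key=len)

# key=values.count ranks each number by how often it occurs,
# so the most frequent number (the mode) is returned.
most_frequent = max(values, key=values.count)

print(longest, most_frequent)
```

When several elements share the highest key, max returns the first of them, so for a list with no repeated value this simply yields its first element. Calling count for every element is also quadratic in the list length, which is harmless for three values per key.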


You can look at collections, especially Counter, and its most_common(1)[0].
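A rough sketch of how collections.Counter could cover both the mode and the final request in the question (how many Output3.txt lines carry each number from 1 to 6). To stay self-contained it assumes the file contents are already in a dict named data; that dict, the key spelling 038_M2, and the names output3, tally and line_counts are assumptions for illustration.

```python
from collections import Counter

# In-memory stand-in for the three files: key -> numbers from List1, List2, List3.
# Key names follow the question's sample as read here (an assumption).
data = {
    "032_M5": [5, 6, 6],
    "035_M9": [5, 6, 6],
    "036_M4": [3, 5, 4],
    "038_M2": [6, 5, 5],
    "041_M1": [6, 6, 6],
}

# most_common(1)[0] is a (value, count) pair for the most frequent value.
output3 = {}
for key, values in data.items():
    number, count = Counter(values).most_common(1)[0]
    if count >= 2:  # at least two of the three files agree
        output3[key] = number

# Tally how many Output3 rows carry each number from 1 to 6.
tally = Counter(output3.values())
line_counts = {n: tally[n] for n in range(1, 7)}

print(output3)
print(line_counts)
```

A Counter returns 0 for a missing key, so numbers that never appear in Output3 still get an entry in line_counts.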