Python 3.x: How to read multiple files in a loop in Python and count the matching words
I have two text files and two lists (FIRST_LIST, SCND_LIST). For each file, I want to count how many of its words match FIRST_LIST and how many match SCND_LIST.
FIRST_LIST =
"accessorizes","accessorizing","accessorized","accessorize"
SCND_LIST =
"accessorize","accessorized","accessorizes","accessorizing"
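Text file 1 contains:
This is a very good question, and you have received good answers which describe interesting topics accessorized accessorize.
Text file 2 contains:
is more applied,using accessorize accessorized,accessorizes,accessorizing
Output: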
File1 first list count=2
File1 second list count=0
File2 first list count=0
File2 second list count=4
I tried to implement this with the code below, but I can't get the expected output. Any help would be appreciated.
import os
import glob
import re

files = []
for filename in glob.glob("*.txt"):
    files.append(filename)

# remove punctuation
def remove_punctuation(line):
    return re.sub(r'[^\w\s]', '', line)

two_files = []
for filename in files:
    for line in open(filename):
        # two_files.append(remove_punctuation(line))
        print(remove_punctuation(line), end='')
        two_files.append(remove_punctuation(line))

FIRST_LIST = "accessorizes", "accessorizing", "accessorized", "accessorize"
SCND_LIST = "accessorize", "accessorized", "accessorizes", "accessorizing"

c = []
for match in FIRST_LIST:
    if any(match in value for value in two_files):
        # c = match + 1
        print(match)
        c.append(match)
print(c)
len(c)

d = []
for match in SCND_LIST:
    if any(match in value for value in two_files):
        # c = match + 1
        print(match)
        d.append(match)
print(d)
len(d)
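Using collections.Counter and some list comprehensions is one of many different ways to solve your problem.
I assume that your example output is wrong, since some words are part of both lists and both files but are not counted. In addition, I added a second line to each example string to demonstrate how to handle multi-line strings, which are probably typical content for the given files.
The io.StringIO objects simulate your files, but working with real files from your file system behaves exactly the same, since both provide a file-like object or file-like interface: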
import io
from collections import Counter

list_a = ["accessorizes", "accessorizing", "accessorized", "accessorize"]
list_b = ["accessorize", "accessorized", "accessorizes", "accessorizing"]

# a second line was added to each string to demonstrate multi-line input
file_contents_a = 'This is a very good question, and you have received good answers which describe interesting topics accessorized accessorize.\nThis is the second line in file a'
file_contents_b = 'is more applied,using accessorize accessorized,accessorizes,accessorizing\nThis is the second line in file b'

# using io.StringIO to simulate a file input (--> file-like object)
# you should use `with open(filename) as ...` for real file input
file_like_a = io.StringIO(file_contents_a)
file_like_b = io.StringIO(file_contents_b)

# read file contents and split lines into a list of strings
lines_of_file_a = file_like_a.read().splitlines()
lines_of_file_b = file_like_b.read().splitlines()

# iterate through all lines of each file (for file a here)
for line_number, line in enumerate(lines_of_file_a):
    # turn '.' and ',' into spaces, then split the line into words
    words = line.replace('.', ' ').replace(',', ' ').split(' ')
    c = Counter(words)
    in_list_a = sum([v for k, v in c.items() if k in list_a])
    in_list_b = sum([v for k, v in c.items() if k in list_b])
    print("Line {}".format(line_number))
    print("- in list a {}".format(in_list_a))
    print("- in list b {}".format(in_list_b))

# iterate through all lines of each file (for file b here)
for line_number, line in enumerate(lines_of_file_b):
    words = line.replace('.', ' ').replace(',', ' ').split(' ')
    c = Counter(words)
    in_list_a = sum([v for k, v in c.items() if k in list_a])
    in_list_b = sum([v for k, v in c.items() if k in list_b])
    print("Line {}".format(line_number))
    print("- in list a {}".format(in_list_a))
    print("- in list b {}".format(in_list_b))

# actually, your two lists are the same
lists_are_equal = sorted(list_a) == sorted(list_b)
print(lists_are_equal)
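The code above reports counts per line, while the question asks for one total per file read from real *.txt files. The following is only a minimal sketch of how that could look, not a definitive implementation. The helper names `count_matches` and `report` are made up for illustration, and the sketch assumes UTF-8 encoded files, case-insensitive matching and whole-word matches only (so "accessorize" inside "accessorized" is not counted twice).

```python
import re
from collections import Counter
from pathlib import Path


def count_matches(text, words):
    """Count the tokens in `text` that are exactly one of `words` (hypothetical helper)."""
    wanted = {word.lower() for word in words}
    # \w+ splits on whitespace and punctuation, so "accessorized," still matches
    tokens = Counter(re.findall(r"\w+", text.lower()))
    return sum(n for token, n in tokens.items() if token in wanted)


def report(folder, first_list, second_list):
    """Print one total per list for every *.txt file in `folder` (hypothetical helper)."""
    for path in sorted(Path(folder).glob("*.txt")):
        # assumption: the files are UTF-8 encoded
        text = path.read_text(encoding="utf-8")
        print("{} first list count={}".format(path.name, count_matches(text, first_list)))
        print("{} second list count={}".format(path.name, count_matches(text, second_list)))
```

Calling `report(".", list_a, list_b)` would print two lines per text file in the current directory. Because both lists contain the same four words, the two totals for a file will always be equal.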
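"but I can't get the expected output": what is your actual result at the moment? Your example output is wrong, because accessorized and accessorize are part of both lists and will therefore be counted by any consistent code.
Thanks albert for your help, the solution works for me. I am getting my data as multiple *.txt files, and I would like to know how to assign it to a variable.
If my answer solved your problem, please mark it as accepted so that your question can be closed. What do you want to assign to a variable? If this is a more complex problem, please post a new question.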
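The follow-up does not say what exactly should end up in a variable. If the goal is simply to keep the contents of every *.txt file around, one possible reading can be sketched as below. The function name `load_texts` is hypothetical, and the sketch assumes UTF-8 encoded files that are small enough to fit in memory.

```python
from pathlib import Path


def load_texts(folder="."):
    """Return {file name: file contents} for every *.txt file in `folder`."""
    return {
        path.name: path.read_text(encoding="utf-8")  # assumption: UTF-8 files
        for path in sorted(Path(folder).glob("*.txt"))
    }
```

With `texts = load_texts()`, the contents of a single file are then available as `texts["some_file.txt"]`, where the file name is whatever exists in the folder.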