
Python: How do I write the text file's name before each word frequency?


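How can I write the text file name alongside each word frequency, so that the file name is shown first, followed by the word's frequency in that file? For example: {'like': ['file1', 2, 'file2', 4]}. Both files contain the word "like", and I want file1 and file2 written before their respective frequencies. The solution should be generic for any number of files.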

Here is my code:

file_list = [open(file, 'r') for file in files]  # files: list of paths to read
num_files = len(file_list)
wordFreq = {}  # word -> list of counts, one slot per file
for i, f in enumerate(file_list):
    for line in f:
        for word in line.lower().split():
            if word not in wordFreq:
                wordFreq[word] = [0 for _ in range(num_files)]
            wordFreq[word][i] += 1

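I know my code isn't pretty and isn't exactly what you asked for, but it is a solution. I prefer using a nested dictionary rather than a flat list like ['file1', 2, 'file2', 4].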

Let's define two files as an example:

file1.txt:

this is an example

file2.txt:

this is an example
but multi line example

Here is the solution:

from collections import Counter

filenames = ["file1.txt", "file2.txt"]

# First, find word frequencies in files
file_dict = {}
for filename in filenames:
    with open(filename) as f:
        text = f.read()
    words = text.split()

    # Counter tallies every word in a single pass
    file_dict[filename] = dict(Counter(words))

print("file_dict: ", file_dict)

# Then, regroup the counts by word: word -> {filename: count}
word_dict = {}
for filename, words in file_dict.items():
    for word, count in words.items():
        # Each (filename, word) pair occurs once, so a plain assignment is enough
        word_dict.setdefault(word, {})[filename] = count


print("word_dict: ", word_dict)
Output:

file_dict:  {'file1.txt': {'this': 1, 'is': 1, 'an': 1, 'example': 1}, 'file2.txt': {'this': 1, 'is': 1, 'an': 1, 'example': 2, 'but': 1, 'multi': 1, 'line': 1}}
word_dict:  {'this': {'file1.txt': 1, 'file2.txt': 1}, 'is': {'file1.txt': 1, 'file2.txt': 1}, 'an': {'file1.txt': 1, 'file2.txt': 1}, 'example': {'file1.txt': 1, 'file2.txt': 2}, 'but': {'file2.txt': 1}, 'multi': {'file2.txt': 1}, 'line': {'file2.txt': 1}}
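
If the flat list layout from the question is really needed, the nested dictionary can be converted in one extra step. This is a minimal sketch, assuming a `word_dict` shaped like the one printed above; the helper name `flatten` and the sample values are made up for illustration:

```python
# Sample data in the same shape as word_dict above (illustrative values)
word_dict = {
    "this": {"file1.txt": 1, "file2.txt": 1},
    "example": {"file1.txt": 1, "file2.txt": 2},
    "but": {"file2.txt": 1},
}


def flatten(per_file_counts):
    """Turn {'file1.txt': 1, 'file2.txt': 2} into ['file1.txt', 1, 'file2.txt', 2]."""
    flat = []
    for filename, count in per_file_counts.items():
        flat.extend([filename, count])  # file name first, then its count
    return flat


# Hypothetical helper applied to every word
flat_dict = {word: flatten(counts) for word, counts in word_dict.items()}
print(flat_dict)
```

The nested form is usually easier to query (`word_dict['example']['file2.txt']`), so the flat form is probably best produced only at output time.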

This is a good use case for collections.Counter; I suggest making one counter per file:

from collections import Counter

def make_counter(filename):
    cnt = Counter()

    with open(filename) as f:
        for line in f:                # reading line by line keeps memory use low for big files
            cnt.update(line.split())  # split the line on whitespace and update the word counts

    print(filename, cnt)
    return cnt

This function can be applied to each file to build a dict that holds all the counters:

filename_list = ['f1.txt', 'f2.txt', 'f3.txt']
counter_dict = {                      # this will hold a counter for each file
    fn: make_counter(fn)
    for fn in filename_list}

You can now use a set to collect all the distinct words that appear in the files:

all_words = set(                      # this will hold all different words that appear
    word                              # in any of the files
    for cnt in counter_dict.values()
    for word in cnt.keys())
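
The same set can probably be built more compactly with `set.union`, since iterating over a `Counter` yields its keys. A small sketch, using an in-memory stand-in for `counter_dict` (the file names and texts here are assumptions, not the real files):

```python
from collections import Counter

# Hypothetical stand-in for counter_dict, built from strings instead of files
counter_dict = {
    "f1.txt": Counter("this is an example".split()),
    "f2.txt": Counter("but multi line example".split()),
}

# union() accepts any number of iterables; each Counter iterates over its words
all_words = set().union(*counter_dict.values())
print(sorted(all_words))
```

Starting from an empty `set()` means the expression also works when `counter_dict` is empty.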

These lines print each word along with its count in each file:

for word in sorted(all_words):
    print(word)
    for fn in filename_list:
        print('  {}: {}'.format(fn, counter_dict[fn][word]))

Obviously you can adapt the printing to your specific needs, but this approach should give you the flexibility you need.
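
The loop above relies on a `Counter` returning 0 for a word it has never seen, without raising `KeyError` and without storing the key. One possible adaptation is to list only the files that actually contain each word. A sketch under those assumptions; `report_lines` and the sample texts are hypothetical:

```python
from collections import Counter

# Hypothetical in-memory counters standing in for real files
counter_dict = {
    "f1.txt": Counter("this is an example".split()),
    "f2.txt": Counter("this is an example but multi line example".split()),
}

# A missing word counts as 0 and is not inserted into the Counter
missing = counter_dict["f1.txt"]["multi"]


def report_lines(counter_dict):
    """Build one line per word, naming only the files where it occurs."""
    all_words = set().union(*counter_dict.values())
    lines = []
    for word in sorted(all_words):
        parts = [
            "{}: {}".format(fn, cnt[word])
            for fn, cnt in counter_dict.items()
            if cnt[word] > 0  # skip files that do not contain the word
        ]
        lines.append("{} -> {}".format(word, ", ".join(parts)))
    return lines


for line in report_lines(counter_dict):
    print(line)
```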


If you would rather have a single dict with every word as a key and its per-file counts as the value, you can try the following:

all_words = {}

for fn, cnt in counter_dict.items():
    for word, n in cnt.items():
        all_words.setdefault(word, {}).setdefault(fn, 0)
        all_words[word][fn] += n      # add this file's count for the word
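
An alternative way to build the same word-to-file mapping is `collections.defaultdict`, which avoids the chained `setdefault` calls. This is only a sketch; the sample counters below are assumptions standing in for the real `counter_dict`:

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for counter_dict
counter_dict = {
    "f1.txt": Counter("this is an example".split()),
    "f2.txt": Counter("this is an example but multi line example".split()),
}

# Each new word automatically gets an empty inner dict
all_words = defaultdict(dict)
for fn, cnt in counter_dict.items():
    for word, n in cnt.items():
        all_words[word][fn] = n  # each (word, file) pair is seen exactly once

all_words = dict(all_words)  # convert back so missing words raise KeyError again
print(all_words["example"])
```

Converting back to a plain `dict` at the end is a design choice: it stops later lookups of unknown words from silently creating empty entries.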