Can't get unique word/phrase counter to work - Python

python shell

I'm having a hard time getting anything written to the output file (word_count.txt).

I want the script to go through all 500 phrases in my phrases.txt document and output a list of all the words and the number of times each one occurs.

    from re import findall,sub
    from os import listdir
    from collections import Counter

    # path to folder containing all the files
    str_dir_folder = '../data'

    # name and location of output file
    str_output_file = '../data/word_count.txt'

    # the list where all the words will be placed
    list_file_data = '../data/phrases.txt'

    # loop through all the files in the directory
    for str_each_file in listdir(str_dir_folder):
        if str_each_file.endswith('data'):

    # open file and read
    with open(str_dir_folder+str_each_file,'r') as file_r_data:
        str_file_data = file_r_data.read()

    # add data to list
    list_file_data.append(str_file_data)

    # clean all the data so that we don't have all the nasty bits in it
    str_full_data = ' '.join(list_file_data)
    str_clean1 = sub('\t','',str_full_data)
    str_clean_data = sub('\n',' ',str_clean1)

    # find all the words and put them into a list
    list_all_words = findall('w+',str_clean_data)

    # dictionary with all the times a word has been used
    dict_word_count = Counter(list_all_words)

    # put data in a list, ready for output file
    list_output_data = []
    for str_each_item in dict_word_count:
        str_word = str_each_item
        int_freq = dict_word_count[str_each_item]

        str_out_line = '"%s",%d' % (str_word,int_freq)

        # populates output list
        list_output_data.append(str_out_line)

    # create output file, write data, close it
    file_w_output = open(str_output_file,'w')
    file_w_output.write('\n'.join(list_output_data))
    file_w_output.close()

Any help would be great (especially if I could get the output list to actually show 'single' words).

Many thanks.

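It would be really useful to have more information, such as what you've tried and what kind of error messages you're getting. As kaveh commented above, this code has some major indentation issues. Once I fixed those, there were several other logic errors to deal with. I made a few assumptions: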

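- list_file_data is assigned '../data/phrases.txt', yet there is also a loop over every file in the directory. Since you don't mention having multiple files anywhere else, I removed that logic and read the file named in list_file_data directly (with a little error handling added). If you do want to walk a directory, I'd suggest using os.walk().
- You named the file 'phrases.txt', but then check whether the filename ends in 'data'. I removed this logic as well.
- You put the data into a list, when findall works fine on a single string and skips over the special characters you were removing by hand. You can test it here:
- Make sure to change 'w+' to '\w+' (see the link above).
- There's no need to build a separate list outside the output loop: your dict_word_count is a Counter object, and like any dict it has an iteritems() method (items() in Python 3) that walks over each key and value. I also renamed the variable to counter_word_count to be a little more precise.
- Instead of building the CSV by hand, I import csv and use its writerow method (with the quoting option).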
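To see why the backslash in the pattern matters, here is a quick check of 'w+' against r'\w+' on a made-up sample string that contains a tab and a newline. It also shows findall working on one plain string, with no list and no manual clean-up:

```python
import re

# made-up sample text containing a tab and a newline
sample = "the quick\tbrown fox\nwent west"

# 'w+' without a backslash matches runs of the literal letter "w"
literal_w = re.findall('w+', sample)

# r'\w+' matches runs of word characters (letters, digits, underscore),
# so the tab and newline are skipped without any manual clean-up
words = re.findall(r'\w+', sample)

print(literal_w)  # ['w', 'w', 'w']
print(words)      # ['the', 'quick', 'brown', 'fox', 'went', 'west']
```

The r prefix keeps Python from treating the backslash as a string escape, so the regex engine receives \w exactly as written.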

Here's the code; I hope it helps:

import csv
import os

from collections import Counter
from re import findall


# name and location of output file
str_output_file = '../data/word_count.txt'
# path to the input file (a plain string, not a list)
list_file_data = '../data/phrases.txt'

if not os.path.exists(list_file_data):
    raise OSError('File {} does not exist.'.format(list_file_data))

with open(list_file_data, 'r') as file_r_data:
    str_file_data = file_r_data.read()
    # find all the words and put them into a list
    list_all_words = findall(r'\w+',str_file_data)
    # dictionary with all the times a word has been used
    counter_word_count = Counter(list_all_words)

    with open(str_output_file, 'w', newline='') as output_file:
        fieldnames = ['word', 'freq']
        writer = csv.writer(output_file, quoting=csv.QUOTE_ALL)
        writer.writerow(fieldnames)

        for key, value in counter_word_count.items():
            output_row = [key, value]
            writer.writerow(output_row)
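The first assumption above points to os.walk() for the case where there really are several input files. Below is a minimal sketch of that variant. It assumes the phrase files end in '.txt' and may sit anywhere under one root folder. The function names count_words_in_tree and write_counts and the skip_path parameter are illustrative only and do not come from the thread:

```python
import csv
import os
import re
from collections import Counter


def count_words_in_tree(root_dir, suffix='.txt', skip_path=None):
    """Recursively tally word-character runs in every file under root_dir ending in suffix."""
    skip = os.path.abspath(skip_path) if skip_path else None
    counts = Counter()
    # os.walk yields (directory, subdirectories, files) for root_dir and everything below it
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # skip non-matching files and the output file, so reruns don't count it
            if not filename.endswith(suffix) or os.path.abspath(path) == skip:
                continue
            with open(path, 'r') as handle:
                counts.update(re.findall(r'\w+', handle.read()))
    return counts


def write_counts(counts, output_path):
    """Write a fully quoted word,freq CSV with the most frequent words first."""
    with open(output_path, 'w', newline='') as output_file:
        writer = csv.writer(output_file, quoting=csv.QUOTE_ALL)
        writer.writerow(['word', 'freq'])
        for word, freq in counts.most_common():
            writer.writerow([word, freq])
```

With the paths from the question, usage would be write_counts(count_words_in_tree('../data', skip_path='../data/word_count.txt'), '../data/word_count.txt'). The skip_path argument matters here because the output file also ends in .txt and lives in the same folder.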

Something like this:

from collections import Counter
from glob import glob

def extract_words_from_line(s):
    # make this as complicated as you want for extracting words from a line
    return s.strip().split()

tally = sum(
    (Counter(extract_words_from_line(line)) 
        for infile in glob('../data/*.data')
            for line in open(infile)), 
     Counter())

for k in sorted(tally, key=tally.get, reverse=True):
    print(k, tally[k])
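One design note on the sum() approach: Counter's + operator returns a new Counter, so sum() builds a fresh copy of the running tally for every line. Updating a single Counter in place produces the same counts with less copying, and a with block makes sure each file gets closed. This sketch keeps the answer's '../data/*.data' pattern and whitespace splitting. The name tally_files is illustrative:

```python
from collections import Counter
from glob import glob


def extract_words_from_line(s):
    # same whitespace split as the answer above; split() already ignores surrounding whitespace
    return s.split()


def tally_files(pattern):
    tally = Counter()
    for infile in glob(pattern):
        # the with block closes each file once it has been read
        with open(infile) as handle:
            for line in handle:
                # update() adds counts into the existing Counter in place
                tally.update(extract_words_from_line(line))
    return tally


for word, freq in tally_files('../data/*.data').most_common():
    print(word, freq)
```

most_common() with no argument returns every (word, count) pair, ordered from most to least frequent. This is the same ordering the sorted(..., key=tally.get, reverse=True) loop produces.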

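There are indentation problems in the code you pasted. Indent the with statement and the lines after it so that they sit inside the loop.

Hey Simon, it looks like you may be new here. If you feel an answer solved your problem, please mark it as 'accepted' by clicking the green check mark. This helps keep the focus on older questions on SO that still have no answers.

Thanks @robertrodkey, all done. Have a good weekend.

Thank you, Robert, that helped a lot. The script works perfectly now.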
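The indentation fix from the first comment can be sketched against the question's own variable names. This is a sketch under assumptions, not the asker's final script. It wraps the code in a hypothetical count_folder() function and keeps only files ending in '.txt' (the question tested for 'data'). It also skips the output file and joins paths with os.path.join, because the original str_dir_folder+str_each_file puts no separator between the folder and the file name. Finally, it uses r'\w+' as the answers suggest:

```python
import os
import re
from collections import Counter


def count_folder(str_dir_folder, str_output_file):
    # a real list this time, collecting the text of every matching file
    list_file_data = []
    str_output_name = os.path.basename(str_output_file)

    for str_each_file in os.listdir(str_dir_folder):
        # assumption: the phrase files end in .txt; skip the output file itself
        if str_each_file.endswith('.txt') and str_each_file != str_output_name:
            # indented under the if, so this runs once per matching file
            str_path = os.path.join(str_dir_folder, str_each_file)
            with open(str_path, 'r') as file_r_data:
                list_file_data.append(file_r_data.read())

    # r'\w+' matches runs of word characters, so tabs and newlines need no manual clean-up
    dict_word_count = Counter(re.findall(r'\w+', ' '.join(list_file_data)))

    # same "word",count line format the question builds by hand
    list_output_data = ['"%s",%d' % (str_word, int_freq)
                        for str_word, int_freq in dict_word_count.items()]
    with open(str_output_file, 'w') as file_w_output:
        file_w_output.write('\n'.join(list_output_data))
    return dict_word_count
```

Calling count_folder('../data', '../data/word_count.txt') writes the output file and returns the Counter, so the counts can also be inspected directly.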