Counting a list of specific words in a text file in Python


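I want to count the occurrences of specific words (conjunctions) in a txt file: also, although, and, as, because, before, but, for, if, nor, of, or, since, that, though, until, when, whenever, whereas, which, while, yet. I also want to count punctuation marks.

This is what I have so far: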

def count(fname, words_list):
    if fname:
        try:
            file = open(str(fname), 'r')
            full_text = file.readlines()
            file.close()
            count_result = dict()
            for word in words_list:
                for text in full_text:
                    if word in count_result:
                        count_result[word] = count_result[word] + text.count(word)
                    else:
                        count_result[word] = text.count(word)
            return count_result
        except:
            print('Something really bad just happened!')

print(count('sample2.txt', ["also", "although", "and", "as", "because", "before", "but", "for", "if", "nor", "of",
"or", "since", "that", "though", "until", "when", "whenever", "whereas",
"which", "while", "yet", ",", ";", "-", "'"]))

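But it also counts substrings: for example, the "as" inside "was" is counted as "as". How can I fix this, or is there another way to achieve it? Thanks.

The expected output is something like: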

{'also': 0, 'although': 0, 'and': 27, 'as': 2, 'before': 2, 'but': 4, 'for': 2, 'if': 2, 'nor': 0, 'of': 13, 'or': 2, …, 'that': 10, 'though': 2, 'until': 0, 'when': 3, …, 'yet': 0, ',': 41, ';': 3, '-': 1, "'": 17, 'words per sentence': 25.4286, …

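Python 2.7 and 3.1 introduced a special dictionary subclass, Counter (in the collections module), for what you are trying to achieve.

Since you have not posted any sample output, here is an approach you can use. Maintain a list and append each word you need as you encounter it. For example, when you come across the word "also", append it to the list:

>>> l = []
>>> l.append("also")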
>>> l
['also']

Similarly, when you come across the word "although", the list becomes:

>>> l.append("although")
>>> l
['also', 'although']

If you come across "also" again, append it to the list once more. The list becomes:

>>> l.append("also")
>>> l
['also', 'although', 'also']

Now use Counter to count the occurrences of the list elements:

>>> from collections import Counter
>>> l = ['also', 'although', 'also']
>>> result = Counter(l)
>>> l
['also', 'although', 'also']
>>> result
Counter({'also': 2, 'although': 1})
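The same idea can be applied to a whole text rather than a hand-built list. The following is only a minimal sketch: the function name `count_targets` and the sample sentence are made up for illustration, and splitting on whitespace and stripping `string.punctuation` from the token edges is an assumption about what counts as a word.

```python
from collections import Counter
import string


def count_targets(text, targets):
    """Count whole-word occurrences of each target word (hypothetical helper)."""
    wanted = set(targets)
    # Lowercase, split on whitespace, strip punctuation from the token edges.
    tokens = (tok.strip(string.punctuation) for tok in text.lower().split())
    counts = Counter(tok for tok in tokens if tok in wanted)
    # A Counter returns 0 for missing keys, so absent targets are reported as 0.
    return {word: counts[word] for word in targets}


sample = "It was as cold as ice, and also as quiet."
print(count_targets(sample, ["also", "although", "and", "as"]))
# {'also': 1, 'although': 0, 'and': 1, 'as': 3}
```

Because whole tokens are compared, the "as" inside "was" is not counted here.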

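str.count(sub) counts every occurrence of the substring sub. When you want it to count "as", it also finds "as" inside the word "was" and increments the count for that match too.

You can use a regex here to require "as" to match as a whole word rather than as a substring of another word. The \b token matches a word boundary (the start or end of a word).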
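As a rough illustration of the difference between substring counting and whole-word matching, here is a small sketch. The helper name `count_whole_word` and the sample sentence are hypothetical, and matching case-insensitively is an assumption rather than something the question specifies.

```python
import re


def count_whole_word(text, word):
    """Count occurrences of word as a whole word (hypothetical helper)."""
    # \b on both sides keeps matches inside longer words from counting.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


text = "It was as easy as that, as I said."
print(text.count("as"))              # 5: also matches inside "was" and "easy"
print(count_whole_word(text, "as"))  # 3: only the standalone "as"
```

Note that \b relies on word characters, so this pattern suits the words in the list; for punctuation marks such as "," or ";" a plain str.count is probably still the simpler choice.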

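I would do something simple: go through the words in the file and check whether each one is in the list of words to count. If it is, add 1 to that word's entry in the counter dictionary.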

    # get all the words in the file
    # (fname and words_list are the arguments of count() in the question)
    with open(fname) as file:
        word_list_in_text = file.read().split()

    count_result = {}
    for word in word_list_in_text:
        # check if each word in the file is in your target list
        if word in words_list:
            if word not in count_result:
                count_result[word] = 1
            else:
                count_result[word] += 1
    print(count_result)

If you want to do this in a single function:

def word_count(fname, word_list, punctuation):
    # start every target word and punctuation mark at zero
    count_w = dict()
    for w in word_list:
        count_w[w] = 0

    count_p = dict()
    for p in punctuation:
        count_p[p] = 0

    with open(fname) as input_text:
        text = input_text.read()
        words = text.lower().split()
        for word in words:
            # drop punctuation attached to the edges of the word
            _word = word.strip('.,:-)()')
            if _word in count_w:
                count_w[_word] += 1

        # punctuation is counted character by character
        for c in text:
            if c in punctuation:
                count_p[c] += 1

    # merge both results into a single dictionary
    count_w.update(count_p)
    return count_w




print(word_count('c_prog.txt', ["also", "although", "and", "as", "because", "before", "but", "for", "if", "nor", "of", "or", "since", "that",
                                "though", "until", "when", "whenever", "whereas", "which", "while", "yet"], [",", ";", "-", "'"]))

Sample input and expected output?

I uploaded 3 samples there for testing. The expected output is the dictionary shown in the question ({'also': 0, 'although': 0, 'and': 27, 'as': 2, …}).

How can I combine these two functions so that I get the expected return value?