Counting a list of specific words in a text file in Python

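I want to count the occurrences of specific words (conjunctions: also, although, and, as, because, before, but, for, if, nor, of, or, since, that, though, until, when, whenever, whereas, which, while, yet) and also of punctuation marks in a txt file.

This is what I have done: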

def count(fname, words_list):
    if fname:
        try:
            file = open(str(fname), 'r')
            full_text = file.readlines()
            file.close()
            count_result = dict()
            for word in words_list:
                for text in full_text:
                    if word in count_result:
                        count_result[word] = count_result[word] + text.count(word)
                    else:
                        count_result[word] = text.count(word)
            return count_result
        except:
            print('Something really bad just happened!')

print(count('sample2.txt', ["also", "although", "and", "as", "because", "before", "but", "for", "if", "nor", "of",
                            "or", "since", "that", "though", "until", "when", "whenever", "whereas",
                            "which", "while", "yet", ",", ";", "-", "'"]))
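But what it does is count the "as" inside "was" as an occurrence of "as". How can I fix it, or is there another way to achieve this? Thanks.

The expected output is something like: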


{'also': 0, 'although': 0, 'and': 27, 'as': 2, 'before': 2, 'but': 4, 'for': 2, 'if': 2, 'nor': 0, 'of': 13, 'or': 2, 'that': 10, 'though': 2, 'until': 0, 'when': 3, 'while': 0, 'yet': 0, ',': 41, ';': 3, '-': 1, "'": 17, 'words/sentence': 25.4286, ...

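In Python 2.7 and 3.1 there is a special Counter dict (collections.Counter) for what you are trying to achieve.

Since you have not posted any sample output, I will give you an approach you can use. Maintain a list and append the words you need to it. For example, when you come across the word "also", append it to the list: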

>>> l = []
>>> l.append("also")
>>> l
['also']
Similarly, when you come across the word "although", the list becomes:

>>> l.append("although")
>>> l
['also', 'although']
If you come across "also" again, append it to the list again.

The list becomes:

>>> l.append("also")
>>> l
['also', 'although', 'also']
Now use Counter to count the occurrences of the list elements:

>>> from collections import Counter
>>> l = ['also', 'although', 'also']
>>> result = Counter(l)
>>> l
['also', 'although', 'also']
>>> result
Counter({'also': 2, 'although': 1})
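
The same idea can be applied to a whole text instead of building the list by hand. The following is only a minimal sketch: it assumes words can be extracted with a simple regex and lowercased, and the name count_targets and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

def count_targets(text, targets):
    """Count whole-word occurrences of each target word in text."""
    target_set = set(targets)
    # Tokenize into lowercase words; apostrophes are kept inside words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in target_set)
    # Counter returns 0 for missing keys, so absent targets are reported as 0.
    return {t: counts[t] for t in targets}

sample = "It was as cold as ice, and also windy."
print(count_targets(sample, ["also", "although", "and", "as"]))
```

Because the text is split into words first, "was" is never counted as "as".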

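str.count(sub) counts occurrences of the substring sub. When you want it to count "as", it also finds "as" inside the word "was" and increments the count for that match too.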


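You can use a regex here to specify that you want "as" as a complete word, not as a substring of another word. The \b marker matches a word boundary (the start or the end of a word).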
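
A short sketch of the \b approach, using only the standard re module. The helper name count_whole_words and the sample line are hypothetical, and case-insensitive matching is an assumption rather than something the question asks for.

```python
import re

def count_whole_words(text, words):
    """Count each word only where it appears as a complete word."""
    result = {}
    for word in words:
        # \b anchors the match at word boundaries; re.escape guards special characters.
        pattern = r"\b" + re.escape(word) + r"\b"
        result[word] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return result

line = "It was as good as new."
print(line.count("as"))                 # substring count, also matches inside "was"
print(count_whole_words(line, ["as"]))  # whole-word count
```

Note that \b only makes sense next to word characters, so punctuation such as "," or ";" is better counted with plain str.count.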

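I would do something simple: go through the words in the file and check whether each one is in the list of words to count. If it is, add 1 to that word's entry in the counter dictionary.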

    # get all the words in the file (fname and words_list as in the question)
    with open(fname) as file:
        word_list_in_text = file.read().split()

    count_result = {}
    for word in word_list_in_text:
        # check if each word in the file is in your target list
        if word in words_list:
            if word not in count_result:
                count_result[word] = 1
            else:
                count_result[word] += 1
    print(count_result)
If you want to do this in a single function:

def word_count(fname, word_list, punctaction):
    count_w = dict()
    for w in word_list:
        count_w[w] = 0

    count_p = dict()
    for p in punctaction:
        count_p[p] = 0

    with open(fname) as input_text:
        text = input_text.read()
        words = text.lower().split()
        for word in words:
            _word = word.strip('.,:-)()')
            if _word in count_w:
                count_w[_word] +=1

        for c in text:
            if c in punctaction:
                count_p[c] +=1

    count_w.update(count_p)
    return count_w




print(word_count('c_prog.txt', ["also", "although", "and", "as", "because", "before", "but", "for", "if", "nor", "of", "or", "since", "that",
                                "though", "until", "when", "whenever", "whereas", "which", "while", "yet"], [",", ";", "-", "'"]))

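Sample input and expected output?

I uploaded 3 samples there for testing. The expected output is as follows: {'also': 0, 'although': 0, 'and': 27, 'as': 2, 'before': 2, 'but': 4, 'for': 2, 'if': 2, 'nor': 0, 'of': 13, 'or': 2, 'that': 10, 'though': 2, 'until': 0, 'when': 3, 'while': 0, 'yet': 0, ',': 41, ';': 3, '-': 1, "'": 17, 'words/sentence': 25.4286, ...

How can I combine these two functions so that I get the expected return value?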