Python code improvement suggestions: counting word occurrences while ignoring punctuation and common words


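The point of this program is to count how often each word occurs while ignoring punctuation, articles, and conjunctions. The desired output is a list of the 15 most-used words and a list of the 15 least-used words, without showing their counts. I'm a beginner, and any help would be greatly appreciated. Thanks.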

# This program reads a text file, performs a content analysis
# and prints both a top 15 and a bottom 15 report

name = input('Enter name of file: ')

# Clean Function
def clean(s):
    punctuations = ["!","@","#","$"]
    art_con = ['the','a','an','some','and','but','or','nor','for']
    for each in punctuations:
        s = s.replace(each,"")
    words = s.split()
    resultwords = [word for word in words if word.lower() not in art_con]
    result= ''.join(resultwords)
    return result

# Analyze Function
def analyze(name):
    print('Reading',name,'for analysis...')
    print('===========================')
    print('Creating content analysis dictionary...')
    r = open(name, 'r')
    s = r.read()
    result = clean(s)
    count = dict((x,result.count(x)) for x in set(result))
    print('Analysis complete!')
    print('===================')
    return count

count = analyze(name)

# turn dictionary into a list of tuples to sort
def function(count):
    list1 = []
    for key in count:
        t = (count[key],key)
        list1.append((t))
    list1.sort()
    result = [list1[i] for i in range(len(list1))]
    t15 = result[0:15]
    b15 = result[-15:0]
    print("The top 15 words are ",t15)
    print("The bottom 15 words are ",b15)

#Main Function
def main():
    count = analyze(name)
    function(count)
main() 

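Overall, the code looks fine. The clean() function could be faster and more concise if you: 1) lowercase the input string at the start; 2) use a regular expression to extract the words while ignoring punctuation; 3) use a set difference operation to eliminate the common words.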

Here is a rough starting point to get you going:

import re
def clean(s):
    words = re.findall(r"[a-z\'\-]+", s.lower())
    return set(words) - {'the','a','an','some','and','but','or','nor','for'}
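
One caveat with the set difference: a set keeps each word only once, so the occurrence counts cannot be recovered from it afterwards. The following is a minimal sketch, not a definitive implementation, that keeps the regex idea but filters with a list comprehension and counts with collections.Counter. The names STOP_WORDS and top_and_bottom are hypothetical, the stop-word list is copied from the question's art_con, and the sample sentence is made up for illustration. It also sidesteps two problems in the question's code: ''.join(resultwords) glues the words together so that result.count(x) ends up counting characters, and the slice result[-15:0] is always empty.

```python
import re
from collections import Counter

# Assumed stop-word list, copied from art_con in the question.
STOP_WORDS = {'the', 'a', 'an', 'some', 'and', 'but', 'or', 'nor', 'for'}


def clean(text):
    """Return the words of text in order, lowercased, minus stop words."""
    # Same pattern as the answer: letters, apostrophes and hyphens.
    words = re.findall(r"[a-z'\-]+", text.lower())
    # Filter with a comprehension instead of a set difference so that
    # repeated words survive and can still be counted.
    return [word for word in words if word not in STOP_WORDS]


def top_and_bottom(text, n=15):
    """Return (n most frequent words, n least frequent words), without counts."""
    # most_common() with no argument lists every word, most frequent first.
    ranked = [word for word, _ in Counter(clean(text)).most_common()]
    # ranked[-n:] is the tail of the list; ranked[-n:0] would be empty.
    return ranked[:n], ranked[-n:]


if __name__ == '__main__':
    sample = "The cat and the dog! The cat sat, but the bird flew. A cat purred."
    top, bottom = top_and_bottom(sample, n=2)
    print("Top words:", top)
    print("Bottom words:", bottom)
```

To analyze a file, read it inside a with block (for example, with open(name) as f: text = f.read()) and pass the text to top_and_bottom. When the text has fewer than 2 * n distinct words, the two lists overlap.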

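Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation. Minimal, Complete, and Verifiable Example (MCVE) applies here. We cannot effectively help you until you post your MCVE code and accurately describe the problem.

OK... so what is your question?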