Python UnicodeDecodeError: 'ascii' codec can't decode byte 0x92?

Tags: python, character-encoding, python-unicode

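So I'm trying to read data from a .txt file, then find the 30 most common words and print them out. However, whenever I read the txt file, I get this error:

"UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 338: ordinal not in range(128)".
Here is my code: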

filename = 'wh_2015_national_security_strategy_obama.txt'
#extracts the year named in the file (characters 3 to 6 of the filename)
year = filename[3:7]
ecount = 30
#opens the file and reads it
file = open(filename,'r').read()   #THIS IS WHERE THE ERROR IS
#counts the characters, then counts the lines, removes the non-word characters, lowercases the text and splits it into a list of words.
numchar = len(file)
numlines = file.count('\n')
file = file.replace(",","").replace("'s","").replace("-","").replace(")","")
words = file.lower().split()
dictionary = {}
#this is a set of all the words not to count among the most commonly used.
dontcount = {"the", "of", "in", "to", "a", "and", "that", "we", "our", "is", "for", "at", "on", "as", "by", "be", "are", "will","this", "with", "or",
             "an", "-", "not", "than", "you", "your", "but","it","a","and", "i", "if","they","these","has","been","about","its","his","no",
             "because","when","would","was", "have", "their","all","should","from","most", "were","such","he", "very","which","may","because","--------",
             "had", "only", "no", "one", "--------", "any", "had", "other", "those", "us", "while",
             "..........", "*", "$", "so", "now","what", "who", "my","can", "who","do","could", "over", "-",
             "...............","................", "during","make","************",
             "......................................................................", "get", "how", "after",
             "..................................................", "...........................", "much", "some",
             "through","though","therefore","since","many", "then", "there", "–", "both", "them", "well", "me", "even", "also", "however"}
for w in words:
    if w not in dontcount:
        if w in dictionary:
            dictionary[w] +=1
        else:
            dictionary[w] = 1
num_words = sum(dictionary[w] for w in dictionary)
#This sorts the dictionary and makes it so that the most popular is at the top.
x = [(dictionary[w],w) for w in dictionary]
x.sort()
x.reverse()
#This prints out the number of characters, lines, and words (not including stop words).
print(str(filename))
print('The file has ',numchar,' number of characters.')
print('The file has ',numlines,' number of lines.')
print('The file has ',num_words,' number of words.')
#This provides the structure for how the most common words should be printed out
i = 1
for count, word in x[:ecount]:
    print("{0}, {1}, {2}".format(i,count,word))
    i+=1

I'm still learning Python, so please bear that in mind when reading this answer.

file = open(filename,'r').read()   #THIS IS WHERE THE ERROR IS

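From what I've learned so far, your read is combined with the creation of the open() object. The open() function creates the file handle, and the read() function reads the file into a string. As far as I understand, each function returns success/fail or, in the case of open(), a reference to the file object. I'm not sure whether the two can be combined successfully.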

From what I've learned so far, this needs to be done in two steps, i.e.

file = open(filename, 'r')  # creates the object
myString = file.read()  # reads the entire object into a string

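The open() function creates the file object, so it may return an object number or success/fail.

Use the read(), read(n), readline(), or readlines() functions on the object:

.read() reads the entire file into a single string
.read(n) reads the next n characters (bytes in binary mode) into a string
.readline() reads the next line into a string
.readlines() reads the entire file into a list of strings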
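As a small illustration of the four read methods listed above, the sketch below uses an in-memory io.StringIO text stream. That is an assumption made so the example needs no file on disk; a file object returned by open() in text mode should behave the same way for these calls. SAMPLE and demo_read_methods are hypothetical names.

```python
import io

SAMPLE = "first line\nsecond line\nthird line\n"  # made-up sample text


def demo_read_methods(text=SAMPLE):
    """Return what each read method yields on a fresh text stream."""
    results = {}
    results["read"] = io.StringIO(text).read()            # whole stream as one string
    results["read_5"] = io.StringIO(text).read(5)         # next 5 characters
    results["readline"] = io.StringIO(text).readline()    # next line, newline included
    results["readlines"] = io.StringIO(text).readlines()  # list of all lines
    return results


if __name__ == "__main__":
    for name, value in demo_read_methods().items():
        print(name, repr(value))
```

None of these methods touches the encoding: the bytes have already been decoded by the time any of them returns text.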


You could separate them and see whether you get the same result? Just a newbie's thought :)

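In Python 3, when a file is opened in text mode (the default), Python uses your environment settings to choose an appropriate encoding.

If it can't work one out (or your environment specifically defines ASCII), it falls back to ASCII. That is what has happened in your case.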
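To check which encoding your own environment would pick, a minimal sketch follows. It assumes the documented behaviour that open() in text mode falls back to locale.getpreferredencoding(False) when no encoding argument is given; default_text_encoding is a hypothetical helper name.

```python
import locale


def default_text_encoding():
    """Return the encoding open() is expected to use when none is passed."""
    # The argument False asks for the current locale setting without changing it.
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print("default encoding for open():", default_text_encoding())
```

If this prints something like ANSI_X3.4-1968 or US-ASCII, the environment is configured for ASCII, which would explain the traceback in the question.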

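If the ASCII decoder finds anything that is not ASCII, it raises an error. In your case, it raised an error on byte 0x92. That byte is not valid ASCII, nor is it valid UTF-8 on its own. It does, however, make sense in the windows-1252 encoding, where it is ’ (a smart quote, "RIGHT SINGLE QUOTATION MARK"). It may also make sense in other 8-bit code pages, but you will have to know that or work it out yourself.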
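The claim about byte 0x92 can be checked directly with bytes.decode(). This is only a sketch: RAW and try_decode are hypothetical names, and the three encodings are the ones discussed above.

```python
RAW = b"\x92"  # the byte reported in the traceback


def try_decode(raw, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    for enc in ("ascii", "utf-8", "windows-1252"):
        print(enc, repr(try_decode(RAW, enc)))
```

Only windows-1252 yields a character here (U+2019); the other two reject the byte.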

To make your code read a windows-1252 encoded file, change the open() call to:

file = open(filename, 'r', encoding='windows-1252').read()
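If you are not sure of the file's encoding, one possible approach is to try a short list of candidates in order. The sketch below is an assumption-laden illustration, not part of the original answer: read_text and its default candidate list are hypothetical, and a successful decode does not prove the guess was right, because most byte sequences decode under some 8-bit code page.

```python
def read_text(path, encodings=("utf-8", "windows-1252")):
    """Read a whole text file, trying each candidate encoding in order."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for enc in encodings:
        try:
            # The with statement closes the file even if decoding fails.
            with open(path, "r", encoding=enc) as handle:
                return handle.read()
        except UnicodeDecodeError as exc:
            last_error = exc  # remember the failure and try the next candidate
    raise last_error
```

In the question's script this could replace the failing line with file = read_text(filename). UTF-8 is tried first because it rejects most text that was written in another encoding.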

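Possible duplicate.

See the post I linked to and open(), in particular its encoding parameter. For Python 2, the "new" version of open is available in a separate module. PS: the byte is most likely a non-standard (Microsoft) right single quotation mark, which is often misused as a "curly" apostrophe.

This is not a duplicate of the questions mentioned above: all of those questions and answers relate to Python 2. None of them helps the OP with the very simple problem of Python 3's TextIOWrapper raising an exception, which has to be fixed by choosing the right encoding.

Assigning the file-like object to a local variable before reading from it changes neither the contents of the file nor how they are converted from bytes to strings, which is what causes this mishap. See the encoding and errors parameters of open(), and the various read-related methods of the ("text file") object it returns.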
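The errors parameter mentioned in the last comment controls what happens when a byte cannot be decoded. A minimal sketch, assuming you would rather lose the offending characters than stop with an exception (read_lenient is a hypothetical helper name):

```python
def read_lenient(path, encoding="ascii", errors="replace"):
    """Read a text file, substituting or dropping bytes that cannot be decoded."""
    # errors="replace" turns each bad byte into U+FFFD; errors="ignore" drops it.
    with open(path, "r", encoding=encoding, errors=errors) as handle:
        return handle.read()
```

This hides the problem rather than solving it: the apostrophes in the file are lost. Passing the correct encoding, as in the answer above, is usually the better fix.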