Python requests downloads the wrong sound file from Google Translate


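I am using the script below to download the sound for the Chinese word 老師, but when I run it, the file I get differs from the one served at that URL. I think this is an encoding problem, but since I have specified UTF-8, I am not sure what is going on.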

#!/usr/bin/python
# -*- coding: utf-8 -*-

import requests

url = "http://translate.google.com/translate_tts?tl=zh-CN&q=老師"

r = requests.get(url)

with open('test.mp3', 'wb') as test:
    test.write(r.content)

Update:

Following @abarnert's suggestion, I have checked that the file is UTF-8 with a BOM, and I have tested the code with 'idna':

#!/usr/bin/python3
# -*- coding: utf-8 -*-

import requests

url_1 = "http://translate.google.com/translate_tts?tl=zh-CN&q=老師"
url_2 = "http://translate.google.com/translate_tts?tl=zh-CN&q=\u8001\u5e2b"

r_1 = requests.get(url_1)
r_1_b = requests.get(url_1.encode('idna'))
r_2 = requests.get(url_2)
r_2_b = requests.get(url_2.encode('idna'))

# This downloads nonsense:
with open('r_1.mp3', 'wb') as test:
    test.write(r_1.content)

# This throws the error specified at bottom:
with open('r_1_b.mp3', 'wb') as test:
    test.write(r_1_b.content)

# This parses the characters individually, producing
# a file consisting of "u, eight, zero..." in Mandarin
with open('r_2.mp3', 'wb') as test:
    test.write(r_2.content)

# This produces a sound file consisting of "u, eight, zero, zero..." in Mandarin
with open('r_2_b.mp3', 'wb') as test:
    test.write(r_2_b.content)

The error I get is:

Traceback (most recent call last):
  File "/home/MZ/Desktop/tts3.py", line 12, in <module>
    r_1_b = requests.get(url_1.encode('idna'))
  File "/usr/lib64/python2.7/encodings/idna.py", line 164, in encode
    result.append(ToASCII(label))
  File "/usr/lib64/python2.7/encodings/idna.py", line 76, in ToASCII
    label = nameprep(label)
  File "/usr/lib64/python2.7/encodings/idna.py", line 21, in nameprep
    newlabel.append(stringprep.map_table_b2(c))
  File "/usr/lib64/python2.7/stringprep.py", line 197, in map_table_b2
    b = unicodedata.normalize("NFKC", al)
TypeError: must be unicode, not str
[Finished in 15.3s with exit code 1]


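I have been able to reproduce your problem with Python 2 on both Linux and Windows (although the nonsense I get differs between the two platforms). But I cannot reproduce it in Python 3, and I don't think you actually did either.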

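The short version is: if you want to include non-ASCII characters, you always want to use Unicode string literals. On Python 2, that means a u prefix on the literal (on Python 3, the u prefix is meaningless but harmless). The safest approach, because a wrong encoding in your text editor or a wrong coding declaration cannot affect it, is to write the characters as \u escapes inside a u-prefixed literal. Both forms are shown at the end of this post.

If you don't do this, you are passing a bunch of UTF-8 bytes to requests without telling it that they are UTF-8.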
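
As a minimal sketch of the two spellings described above (the variable names are illustrative and not from the original post), the following checks that the u-prefixed literal and the escaped form denote the same text. It assumes the source file really is saved as UTF-8, which is exactly the assumption the escaped form avoids.

```python
# -*- coding: utf-8 -*-

# Plain u-prefixed literal: correct only if the file is really saved as UTF-8
# and the coding declaration above matches.
url_literal = u"http://translate.google.com/translate_tts?tl=zh-CN&q=老師"

# Escaped form: pure ASCII in the source file, so the editor's encoding
# cannot change what the string contains.
url_escaped = u"http://translate.google.com/translate_tts?tl=zh-CN&q=\u8001\u5e2b"

# Both literals should describe the same sequence of code points.
same = url_literal == url_escaped

# The query value is two characters, not six UTF-8 bytes.
query_chars = url_escaped.split(u"q=")[1]
```

If `same` comes out false, the file's actual encoding and its coding declaration disagree.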
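What I would expect it to do in that case is look at sys.getdefaultencoding(), which is probably 'ascii' at least on Mac and Linux, try to decode with that, and get an exception. On Windows, it might be 'cp1252' or 'big5' or whatever your system is set to, so it might send mojibake.
But that is not actually what it does. I am not sure what it is doing, but it correctly guesses UTF-8 on Mac, does something odd on Linux that results in three different tones of "eh-eh" (I think it is just interpreting the bytes as the equivalent code points, so 老 becomes U+00E8, U+0080, U+0081?), and does something different and equally odd on Windows that starts with the same first syllable but continues with different ones.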
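
The guess about Linux can be illustrated in isolation. This is only a sketch of the hypothesis (UTF-8 bytes read one by one as code points, which is what a Latin-1 decode does); it does not show what requests actually does internally, and the variable names are made up for the example.

```python
# UTF-8 encoding of 老 (U+8001) is three bytes.
utf8_bytes = u"\u8001".encode("utf-8")

# Reading each byte as the code point with the same number is
# equivalent to decoding the bytes as Latin-1.
misread = utf8_bytes.decode("latin-1")

# Format each resulting character as a U+XXXX code point.
code_points = ["U+%04X" % ord(ch) for ch in misread]
```

Under that hypothesis, one Chinese character turns into three unrelated Latin-1 characters, which would match hearing three short sounds instead of one syllable.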
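For url_2, it is a bit simpler: in a 2.x non-Unicode string literal, \u8001 is not treated as an escape sequence. It is just the six characters backslash, u, 8, 0, 0, 1. requests dutifully sends those to Google, which translates those characters and sends you back a recording of someone reading them out.
But if you add the u prefix, both of them work.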
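
The six-character behaviour is easy to check. In the sketch below, a Python 3 raw string stands in for a Python 2 byte-string literal, since both leave \u unprocessed; that substitution is an assumption made so the example runs on a current interpreter.

```python
# Stand-in for a Python 2 literal without the u prefix:
# the backslash sequence is kept as written.
raw = r"\u8001"

# With Unicode escape processing, the same text is one character (老).
cooked = u"\u8001"

# Break the unprocessed form into its individual characters.
raw_chars = list(raw)
```

Here `raw_chars` holds the backslash, the letter u, and the four digits separately, which is what Google reads aloud in r_2.mp3.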
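In Python 3, both of them work with or without the u prefix. (Interestingly, in 3.x it even works with a b prefix... but apparently only because it always assumes that bytes are UTF-8 in 3.x; if I give it Big5 bytes, it treats them as UTF-8, even though my sys.getdefaultencoding is correct.)
Also, manually encoding the query string works too, but it is not necessary and makes no difference.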
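
For reference, manual encoding of the query string could look like the sketch below. It uses Python 3's standard urllib.parse rather than requests, and the names base, text, manual_url and built_url are illustrative only.

```python
from urllib.parse import quote, urlencode

base = "http://translate.google.com/translate_tts"
text = u"\u8001\u5e2b"  # 老師

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character.
encoded_text = quote(text)

# Build the URL by hand from the encoded value...
manual_url = base + "?tl=zh-CN&q=" + encoded_text

# ...or let urlencode() assemble and encode the whole query string.
built_url = base + "?" + urlencode({"tl": "zh-CN", "q": text})
```

Both URLs are pure ASCII, so nothing downstream has to guess an encoding.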
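
Comments:

Where did you specify UTF-8? Not in your code, your URL, your source file encoding, or anywhere else I can see. Also, is this Python 2 or Python 3?

Sorry, I left out the header. I have tried it in both 2 and 3.

OK, first, are you sure your text editor agrees that the file is UTF-8? Can you try \u8001\u5e2b instead of 老師? Second, I am sure requests knows how to do IDNA encoding, but just to rule that out, could you try requests.get(url.encode('idna'))?

Please see my update to the question. Thanks.
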
The corrected URL literals from the answer (the plain u-prefixed form, then the escaped form):

url = u"http://translate.google.com/translate_tts?tl=zh-CN&q=老師"

url_2 = u"http://translate.google.com/translate_tts?tl=zh-CN&q=\u8001\u5e2b"