Character encoding problems in a Python/MySQL-based pipeline

python mysql python-3.x unicode character-encoding

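I am having trouble getting encoding/decoding to work correctly in an application I developed with the following components:

Python 3.6

BeautifulSoup

A scraped web page that uses UTF-8

MySQL

json

Lambda

When the data reaches the front end (Alexa), it sometimes contains Unicode escape sequences (for example, \u00e2\u0080\u0099). Any help would be appreciated.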
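
For reference, the escape sequence above is what one would expect if the UTF-8 bytes of a right single quotation mark (U+2019) were decoded as Latin-1 somewhere in the pipeline and then serialized by json.dumps with its default settings. The following is a minimal sketch of that assumption, not a confirmed diagnosis; the sample title is hypothetical.

```python
import json

# Hypothetical scraped title containing U+2019 (right single quotation mark).
original = "Ocean\u2019s 8"

# UTF-8 encodes U+2019 as the three bytes E2 80 99.
utf8_bytes = original.encode("utf-8")

# Decoding those bytes as Latin-1 turns each byte into its own character.
mojibake = utf8_bytes.decode("latin-1")

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True).
print(json.dumps(mojibake))  # "Ocean\u00e2\u0080\u0099s 8"

# The damage is reversible as long as no bytes were lost along the way.
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```

If this is what happens, the fix belongs at the step that decodes the bytes with the wrong character set, rather than at the JSON layer.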

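Here are code snippets from across the pipeline:

The original web page is: (checked via document.characterSet in Chrome Developer Tools)

I am using the following Python/BeautifulSoup code: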

from bs4 import BeautifulSoup
import pymysql

# Excerpt: page_response, page_test, records and bo_entry are defined elsewhere.
if page_response.status_code == 200:
    # page_response.content is raw bytes; BeautifulSoup detects the encoding itself.
    page_content = BeautifulSoup(page_response.content, "html.parser")
    if str(page_content.find(attrs={'id': 'main'})).find(page_test) != -1:
        for table_row in page_content.select("div#page_filling_chart center table tr"):
            cells = table_row.find_all('td')
            if cells:
                records += 1
                bo_entry.title = cells[2].text.strip()

The data is written to the database with the following code:

connection = pymysql.connect(
    host=rds_host,
    user=name,
    password=password,
    db=db_name
)
try:
    with connection.cursor() as cursor:
        # UPSERT: https://chartio.com/resources/tutorials/how-to-insert-if-row-does-not-exist-upsert-in-mysql/
        sql = (
            "REPLACE INTO weekend_box_office(weekend_date, market, title_id, title, gross, total_gross, rank_order, previous_rank, distributor, distributor_id, change_pct, theaters, per_theater, week_in_release, gross_num, total_gross_num) "
            "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s);"
        )
        data = (
            bo_entry.weekend, bo_entry.market, bo_entry.title_id, bo_entry.title, bo_entry.gross, bo_entry.total_gross,
            bo_entry.rank, bo_entry.previous_rank, bo_entry.distributor, bo_entry.distributor_id, bo_entry.change_pct, bo_entry.theaters,
            bo_entry.per_theater, bo_entry.weeks_in_release, bo_entry.gross_num, bo_entry.total_gross_num
        )
        # print(sql)

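The current database collation and character set settings are:

The collation of the MySQL table that stores the data is as follows:

I use the following Python 3.6 code to get the data from the database: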

connection = pymysql.connect(
    host=rds_host,
    user=name,
    password=password,
    db=db_name
)

with connection.cursor() as cursor:
    sql = (
        "select weekend_date, title_id, title, gross, gross_num, total_gross, total_gross_num, CONCAT(cast(ROUND(gross_num / total_gross_num * 100,1) as CHAR),'%') as weekend_pct, week_in_release "
        "from weekend_box_office "
        f"where weekend_date = '{weekend_text}' "
        f"order by gross_num desc limit {limit_row_no}; "
    )
    try:
        cursor.execute(sql)
        result = cursor.fetchall()
        for row in result:
            title = row[2]

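This is what it looks like when I set a breakpoint and inspect it in Spyder's Variable Explorer.

When I return it, it looks like this:

using this code:

response_text += (
    f"Leading the way is {title}, bringing in ${SpeechUtils.speaked_human_format(gross_num)}. "
)
return response_text

When I return it from Lambda using the Python json library, it looks like this:

return {
    "statusCode": 200,
    "body": json.dumps(speak_top5(BoxOffice.get_previous_friday())),
    "headers": {
        "Content-Type": "application/json",
        "Access-Control-Allow-Origin": "*"
    },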

Try again after changing the MySQL connection character set to charset='utf8':

connection = pymysql.connect(
    host=rds_host,
    user=name,
    password=password,
    db=db_name,
    charset='utf8'
    )
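
As a sketch of the suggestion above, the connection settings can be collected in one place so that the write path and the read path use the same character set. The helper name `build_connection_kwargs` and the placeholder credentials are hypothetical. It defaults to `utf8mb4` rather than `utf8` on the assumption that the tables are (or can be converted to) utf8mb4; in MySQL, `utf8` is a three-byte encoding that cannot store characters outside the Basic Multilingual Plane, such as emoji.

```python
def build_connection_kwargs(host, user, password, db_name, charset="utf8mb4"):
    """Return keyword arguments for pymysql.connect with an explicit charset."""
    return {
        "host": host,
        "user": user,
        "password": password,
        "database": db_name,  # same meaning as the db= argument used in the question
        "charset": charset,
    }


# Placeholder values for illustration only.
kwargs = build_connection_kwargs("localhost", "user", "secret", "boxoffice")
print(kwargs["charset"])  # utf8mb4

# With PyMySQL installed and a reachable server, the dict is used like this:
#   import pymysql
#   connection = pymysql.connect(**kwargs)
```

Using the same keyword arguments for both the scraper and the Lambda reader avoids one side writing with one character set and the other reading with a different one.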

Note: I did more research, read this article, and found that adding the parameter ensure_ascii=False, as in json.dumps(x, ensure_ascii=False), helped. Accented characters are still a problem, but the Unicode escape sequences are gone.
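
To illustrate the note above: with its default `ensure_ascii=True`, `json.dumps` writes every non-ASCII character as a `\uXXXX` escape, which is still valid JSON and decodes to the same string. Passing `ensure_ascii=False` only changes the serialized form. The sketch below uses hypothetical titles; if accented characters still come out wrong after this change, a plausible (unconfirmed) cause is that the text was already mis-decoded earlier in the pipeline.

```python
import json

# Hypothetical titles containing a curly apostrophe and an accented letter.
titles = ["Ocean\u2019s 8", "Am\u00e9lie"]

escaped = json.dumps(titles)                      # default: ensure_ascii=True
literal = json.dumps(titles, ensure_ascii=False)  # keep characters as-is

print(escaped)  # ["Ocean\u2019s 8", "Am\u00e9lie"]
print(literal)  # ["Ocean’s 8", "Amélie"]

# Both forms decode back to exactly the same Python strings.
assert json.loads(escaped) == json.loads(literal) == titles
```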