Upgrading Python from 2.7 to 3.7: unicode error

python python-3.x python-2.7 unicode

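I am updating my Python code from 2.7 to 3.7. Basically, I am trying to run a pipeline on Google Dataflow that reads data from a BigQuery view, transforms it, and writes it back to BigQuery in another table.

However, after the update I get an error wherever unicode is used: NameError: name 'unicode' is not defined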

            bq_source = beam.io.BigQuerySource(query=query, use_standard_sql=True)
            records = (pipeline
                      | 'Read %s From BQ' % v.get('name') >> beam.io.Read(bq_source)
                      | 'BQ Create KV %s' % count >> beam.Map(lambda row: (row['value'].encode("utf-8"),
                                                                            {unicode(key).encode("utf-8"): unicode(
                                                                                value).encode("utf-8")
                                                                            for key, value in row.items()}))
                      | 'BQ Group By Key %s' % count >> beam.GroupByKey()
                      | 'BQ Calculate %s  Score' % v.get('name') >> beam.ParDo(ProcessDataDoFn(),
                                                                                    filter_id=v.get('filter_id'),
                                                                                    date=date)
                      )

If I run the same code as above under Python 2.7, it works fine.
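As a small illustration of why the name lookup fails, the sketch below checks for the builtin directly. It assumes a Python 3 interpreter; the variable names has_unicode and message are only for this example.

```python
import builtins

# Python 3 has no builtin named unicode; str is already the text type.
has_unicode = hasattr(builtins, "unicode")

try:
    unicode("hello")  # deliberately undefined on Python 3
    message = None
except NameError as exc:
    message = str(exc)

print(has_unicode)
print(message)
```

On Python 2.7 the same lines would find the builtin and raise nothing, which matches the behaviour described above.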

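After a while, I tried updating the code on the basis that unicode is read as str in Python 3+. If I replace unicode with str, the data from BigQuery is not read, which later results in a KeyError: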

            bq_source = beam.io.BigQuerySource(query=query, use_standard_sql=True)
            records = (pipeline
                      | 'Read %s From BQ' % v.get('name') >> beam.io.Read(bq_source)
                      | 'BQ Create KV %s' % count >> beam.Map(lambda row: (row['value'].encode("utf-8"),
                                                                            {str(key).encode("utf-8"): str(
                                                                                value).encode("utf-8")
                                                                            for key, value in row.items()}))
                      | 'BQ Group By Key %s' % count >> beam.GroupByKey()
                      | 'BQ Calculate %s Score' % v.get('name') >> beam.ParDo(ProcessDataDoFn(),
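One plausible reason for the later KeyError, sketched here without Beam: on Python 3, str(...).encode("utf-8") returns bytes, so every key in the resulting dict is a bytes object and a lookup by ordinary string misses. The row contents below are hypothetical, and the sketch assumes the downstream code looks fields up by str key.

```python
# Hypothetical row, standing in for what the BigQuery read yields.
row = {"value": "abc", "filter_id": 7}

# Same mapping as the str(...).encode(...) version of the pipeline step.
encoded = {str(k).encode("utf-8"): str(v).encode("utf-8") for k, v in row.items()}

# On Python 3 a str key never equals a bytes key.
found_by_str = "filter_id" in encoded
found_by_bytes = b"filter_id" in encoded

try:
    encoded["filter_id"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(found_by_str, found_by_bytes, lookup_failed)
```

Under Python 2.7 the encoded keys were plain str, so the same lookup succeeded there.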

Edit 1:

I updated the code to not encode at all, and it now works:

            bq_source = beam.io.BigQuerySource(query=query, use_standard_sql=True)
            records = (pipeline
                      | 'Read %s From BQ' % v.get('name') >> beam.io.Read(bq_source)
                      | 'BQ Create KV %s' % count >> beam.Map(lambda row: (row['value'],
                                                                            {key: value
                                                                             for key, value in row.items()}))
                      | 'BQ Group By Key %s' % count >> beam.GroupByKey()
                      | 'BQ Calculate %s  Score' % v.get('name') >> beam.ParDo(ProcessDataDoFn(),
                                                                               filter_id=v.get('filter_id'),
                                                                               date=date)
                      )
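The key/value step can also be written as a named function, which makes it easy to check without running a pipeline. This is only a sketch: to_key_value is a hypothetical name, and the sample row is made up.

```python
def to_key_value(row):
    """Return (row['value'], shallow copy of the row) with no encoding applied."""
    return row["value"], dict(row)


# Hypothetical row for a quick local check.
row = {"value": "abc", "filter_id": 7}
key, fields = to_key_value(row)
print(key, fields)
```

A function like this could be passed to beam.Map in place of the lambda, so the text keys and values reach GroupByKey unchanged.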

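For the short snippet at the bottom of this page (s, u, b), the printed types depend on the interpreter.

In Python 3.8:

    <class 'str'> <class 'str'> <class 'bytes'>

In Python 2.7:

    (<type 'str'>, <type 'unicode'>, <type 'str'>)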

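The purpose of this conversion was evidently to turn unicode into str, which is no longer a relevant concern in Python 3. In Python 3, encoding instead turns the values into bytes, which is incompatible with the rest of the code. Simply don't encode: use str(key), or just key if you already know it is text.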
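If some values might already be bytes, calling str() on them in Python 3 gives the repr (for example "b'abc'") rather than the text. A minimal sketch of a helper that avoids this is shown below; to_text is a hypothetical name and UTF-8 is an assumed encoding.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text: decode bytes, pass other types through str()."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


samples = [to_text(b"abc"), to_text("abc"), to_text(7)]
print(samples)
```

For rows whose keys and values are already text, as in the question, no helper is needed and the values can be used directly.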

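Comments:

What is the type of key before you call str on it?

It's an int.

@snakecharmerb Have you tried not encoding? In Python 3 that forces the value to bytes, whereas in Python 2 it just made it a str. I would post a more detailed answer, but I don't have Google BigQuery to test with.

Yes, it works now.

Post your update as your own answer so that you can mark the question as resolved.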

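s = 'hello'            # native str on both versions
u = u'hello'           # unicode on Python 2, str on Python 3
b = u.encode('utf-8')  # str on Python 2, bytes on Python 3
print(type(s), type(u), type(b))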