Python: Google BigQuery schema conflict (pyarrow error) with NUMERIC data type using load_table_from_dataframe


When I upload numeric data (int64 or float64) from a Pandas dataframe to a column of the "NUMERIC" Google BigQuery data type, I get the following error:

pyarrow.lib.ArrowInvalid: Got bytestring of length 8 (expected 16)

I tried changing the dtype of the 'tt' field in the Pandas dataframe, but it made no difference:

df_data_f['tt'] = df_data_f['tt'].astype('float64')

and using the schema:

 job_config.schema = [
                    ...             
                    bigquery.SchemaField('tt', 'NUMERIC')
                    ...]

From what I read, the type mapping is:

NUMERIC = pyarrow.decimal128(38, 9)

So the "NUMERIC" Google BigQuery data type uses more bytes (16) than "float64" or "int64" (8), which is why pyarrow cannot match the data types.
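The 8 versus 16 in the error message can be checked with a little arithmetic. This is only a rough sketch using the standard library, not pyarrow itself: it assumes a decimal value is stored as a signed integer wide enough to hold 38 decimal digits, which is what the decimal128(38, 9) mapping suggests.

```python
import struct

# Width of one float64 / int64 value, as stored in the dataframe column.
float64_bytes = struct.calcsize("<d")
int64_bytes = struct.calcsize("<q")

# NUMERIC has precision 38, so the largest unscaled value is 10**38 - 1.
max_unscaled = 10**38 - 1
bits_needed = max_unscaled.bit_length() + 1  # +1 for the sign bit
decimal128_bytes = (bits_needed + 7) // 8    # round up to whole bytes

print(float64_bytes, int64_bytes, decimal128_bytes)  # 8 8 16
```

In other words, an 8-byte value is being handed over where a 16-byte decimal is expected, which matches the "length 8 (expected 16)" wording.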

I have:

Python 3.6.4

pandas 1.0.3

pyarrow 0.17.0

google-cloud-bigquery 1.24.0

I'm not sure if this is the best solution, but I solved the problem by changing the data type:

import decimal
...
df_data_f['tt'] = df_data_f['tt'].astype(str).map(decimal.Decimal)
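A slightly more defensive variant of the same idea might look like the sketch below. The helper name to_numeric_decimal is hypothetical (it is not part of pandas or google-cloud-bigquery), and two things are assumptions on my part: that rounding to 9 fractional digits is acceptable for your data, and that None is a suitable stand-in for missing values when loading.

```python
import decimal
import math

# BigQuery NUMERIC keeps 9 digits after the decimal point (scale 9).
NUMERIC_QUANTUM = decimal.Decimal("0.000000001")
# Precision 38 so that quantize() is not limited by the default 28 digits.
NUMERIC_CONTEXT = decimal.Context(prec=38)


def to_numeric_decimal(value):
    """Hypothetical helper: turn an int/float into a Decimal with scale 9."""
    # Assumption: missing values (None / NaN) are passed on as None.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    # Going through str() avoids the binary expansion of a float:
    # Decimal(0.1) is 0.1000000000000000055..., Decimal("0.1") is exactly 0.1.
    return decimal.Decimal(str(value)).quantize(
        NUMERIC_QUANTUM,
        rounding=decimal.ROUND_HALF_EVEN,
        context=NUMERIC_CONTEXT,
    )
```

With the dataframe from the question it would presumably be applied as df_data_f['tt'] = df_data_f['tt'].map(to_numeric_decimal). I have not tested how the loader treats the None entries, so check that on a small sample first.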

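I guess it depends on your use case. Do you have a specific reason for using the NUMERIC type? If not, and your data is already 8 bytes, the simpler solution would be to set the data type in BigQuery directly to FLOAT64 (or INT64 if the values are integers)!

You're right, that would be easier, but the table schema was defined a few months ago and the table holds a lot of data by now. So the problem is appending new data without changing the target table's schema.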
