Python awswrangler returns a DataFrame with different data types


python pandas amazon-web-services amazon-s3


I am using awswrangler to convert a simple DataFrame to Parquet, push it to an S3 bucket, and then read it back. The code is as follows:

import boto3
import awswrangler as wr
import pandas as pd

test_bucket = 'test-bucket'
test_data = 'test_data.parquet'
s3 = boto3.client('s3')

# Build a small DataFrame of plain Python integers (NumPy int64 columns)
df1 = pd.DataFrame(
    [[1990, 1], [2000, 2], [1985, 6]], columns=["Feature1", "Feature2"]
)

# Write it to S3 as Parquet, then list the objects in the bucket
wr.s3.to_parquet(df=df1, path=f"s3://{test_bucket}/{test_data}")
raw_data_s3_objects = s3.list_objects(Bucket=test_bucket)

# Read each listed object back into a DataFrame
for path in raw_data_s3_objects["Contents"]:
    file_name = path["Key"]
    raw_dataset = wr.s3.read_parquet(path=f"s3://{test_bucket}/{file_name}")

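When I print the dtypes of the original DataFrame (df1) and of the DataFrame read back from S3 (raw_dataset), I get int64 and Int64 respectively.

This causes the two DataFrames to compare as unequal. Is this a bug, or am I missing something?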

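First, there is a difference between NumPy and pandas types: int64 (lowercase) denotes the NumPy type (np.int64), while Int64 (capitalized) denotes the pandas nullable type (pd.Int64Dtype).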
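To make the distinction concrete, here is a minimal sketch that uses only pandas and NumPy (no S3 involved). The frame names `numpy_backed` and `nullable` are illustrative and do not come from the question.

```python
import numpy as np
import pandas as pd

# Same values stored under two different integer dtypes
numpy_backed = pd.DataFrame({"Feature1": [1990, 2000, 1985]}, dtype="int64")
nullable = numpy_backed.astype("Int64")  # pandas nullable extension dtype

print(numpy_backed.dtypes["Feature1"])  # NumPy-backed dtype
print(nullable.dtypes["Feature1"])      # pandas extension dtype

# DataFrame.equals requires matching dtypes, not just matching values
same_as_is = numpy_backed.equals(nullable)

# After casting back to the NumPy dtype, the frames compare equal
same_after_cast = numpy_backed.equals(nullable.astype("int64"))

print(same_as_is, same_after_cast)
```

The first comparison fails and the second succeeds, which matches the symptom described in the question. The nullable dtype exists so that an integer column can hold missing values (pd.NA) without being converted to float.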

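This behavior has been reported, and there is a reason for it, but it was "fixed" in release 2.6.0. You can now get the dtypes you expect by passing map_types=False. The map_types parameter defaults to True, which performs the conversion you do not want.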

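The dtype check from the question, followed by its output:

print(df1.dtypes)
print(raw_dataset.dtypes)

Feature1    int64
Feature2    int64
dtype: object
Feature1    Int64
Feature2    Int64
dtype: object

Reading the file with map_types=False keeps the NumPy dtypes:

raw_dataset = wr.s3.read_parquet(path=f"s3://{test_bucket}/{file_name}", map_types=False)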
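If the installed awswrangler version does not accept map_types, one possible workaround is to cast the nullable integer columns back after reading. The sketch below is an assumption-laden illustration, not part of awswrangler: `to_numpy_int_dtypes` is a hypothetical helper, and `read_back` is a locally built stand-in for raw_dataset so that the example runs without S3.

```python
import pandas as pd


def to_numpy_int_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast nullable integer columns to NumPy dtypes.

    Columns that contain missing values are left untouched, because a
    NumPy integer column cannot represent pd.NA.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        is_nullable_int = isinstance(
            col.dtype, pd.api.extensions.ExtensionDtype
        ) and pd.api.types.is_integer_dtype(col.dtype)
        if is_nullable_int and not col.isna().any():
            # e.g. Int64 -> int64
            out[name] = col.astype(col.dtype.numpy_dtype)
    return out


# Stand-in for the frame returned by wr.s3.read_parquet (assumption)
read_back = pd.DataFrame(
    {"Feature1": [1990, 2000, 1985], "Feature2": [1, 2, 6]}, dtype="Int64"
)
original = pd.DataFrame(
    [[1990, 1], [2000, 2], [1985, 6]], columns=["Feature1", "Feature2"]
)

restored = to_numpy_int_dtypes(read_back)
print(restored.dtypes)
print(original.equals(restored))
```

Skipping columns with missing values is a deliberate choice here: casting them would raise an error, so they keep their nullable dtype.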