Python: converting a pandas DataFrame to bytes

python numpy pandas dataframe

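I need to convert the data stored in a pandas.DataFrame into a byte string, where each column can have its own data type (integer or floating point). Here is a simple set of data: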

import numpy as np
import pandas as pd

df = pd.DataFrame([10, 15, 20], dtype='u1', columns=['a'])
df['b'] = np.array([np.iinfo('u8').max, 230498234019, 32094812309], dtype='u8')
df['c'] = np.array([1.324e10, 3.14159, 234.1341], dtype='f8')

and df looks like this:

    a            b                  c
0   10  18446744073709551615    1.324000e+10
1   15  230498234019            3.141590e+00
2   20  32094812309             2.341341e+02

The DataFrame knows the type of each column (df.dtypes), so I would like to do something like this:

data_to_pack = [tuple(record) for _, record in df.iterrows()]
data_array = np.array(data_to_pack, dtype=list(zip(df.columns, df.dtypes)))
data_bytes = data_array.tobytes()

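This normally works fine, but not in this case, because of the maximum value stored in df['b'][0]. The second line above, which converts the list of tuples to an np.array with the given set of types, raises the following error: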

OverflowError: Python int too large to convert to C long

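I believe the error originates in the first line, which extracts each record as a Series with a single data type (float64 here, the common type of the three columns). The float64 representation chosen for the maximum uint64 value cannot be converted directly back to uint64.
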
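The upcast described above can be checked in isolation. The following is a minimal sketch, assuming the same three-column frame as in the question; the variable names (`first_row`, `as_float`, and so on) are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": np.array([10, 15, 20], dtype="u1"),
    "b": np.array([np.iinfo("u8").max, 230498234019, 32094812309], dtype="u8"),
    "c": np.array([1.324e10, 3.14159, 234.1341], dtype="f8"),
})

# iterrows() yields every row as a Series with one common dtype,
# so the uint8/uint64/float64 columns are all upcast together.
_, first_row = next(df.iterrows())
row_dtype = first_row.dtype

# float64 has a 53-bit significand, so 2**64 - 1 is not representable
# and rounds up to the nearest representable value, 2**64.
uint64_max = int(np.iinfo("u8").max)
as_float = float(uint64_max)
rounded_up = int(as_float) == 2**64

print(row_dtype, rounded_up)
```

Because the rounded value is one past the largest uint64, converting it back to an unsigned 64-bit integer overflows, which matches the error shown in the question.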

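1) Since the DataFrame already knows the type of each column, is there a way to avoid building a row of tuples to feed the typed numpy.array constructor? Or is there a better way than the one outlined above to preserve the type information in such a conversion?

2) Is there a way to go directly from the DataFrame to a byte string representing the data, using the type information of each column?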

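You can use df.to_records() to convert the DataFrame to a NumPy record array, then call .tobytes() (the current name for the deprecated .tostring()) to convert it to a byte string: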

rec = df.to_records(index=False)

print(repr(rec))
# rec.array([(10, 18446744073709551615, 13240000000.0), (15, 230498234019, 3.14159),
#  (20, 32094812309, 234.1341)], 
#           dtype=[('a', '|u1'), ('b', '<u8'), ('c', '<f8')])

s = rec.tobytes()  # .tostring() is a deprecated alias of .tobytes()
rec2 = np.frombuffer(s, dtype=rec.dtype)  # replaces the deprecated binary-mode np.fromstring

print(np.all(rec2 == rec))
# True
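For the first question, a structured array can also be filled one column at a time, which avoids row tuples and therefore the float64 upcast. This is a sketch under the assumption that every column has a plain NumPy dtype; `packed_dtype`, `packed`, `payload`, and `restored` are hypothetical names, and the receiver is assumed to know the same dtype when decoding.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": np.array([10, 15, 20], dtype="u1"),
    "b": np.array([np.iinfo("u8").max, 230498234019, 32094812309], dtype="u8"),
    "c": np.array([1.324e10, 3.14159, 234.1341], dtype="f8"),
})

# Build a packed structured dtype straight from the column dtypes.
packed_dtype = np.dtype([(name, dt) for name, dt in df.dtypes.items()])

# Fill the array field by field; each column keeps its own dtype.
packed = np.empty(len(df), dtype=packed_dtype)
for name in df.columns:
    packed[name] = df[name].to_numpy()

payload = packed.tobytes()  # 1 + 8 + 8 bytes per row

# Decode: frombuffer returns a read-only view, so copy before reuse.
decoded = np.frombuffer(payload, dtype=packed_dtype).copy()
restored = pd.DataFrame(decoded)

print(len(payload), restored.dtypes.tolist())
```

The byte string carries no type information of its own, so the dtype has to be stored or agreed on separately if the data is decoded elsewhere.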
