Python: build a DataFrame with columns from a tuple of arrays


python pandas numpy dataframe


I'm struggling with the basic task of building a DataFrame of value counts from the tuple produced by np.unique(arr, return_counts=True), for example:

import numpy as np
import pandas as pd

np.random.seed(123)  
birds=np.random.choice(['African Swallow','Dead Parrot','Exploding Penguin'], size=int(5e4))
someTuple=np.unique(birds, return_counts = True)
someTuple
#(array(['African Swallow', 'Dead Parrot', 'Exploding Penguin'], 
#       dtype='<U17'), array([16510, 16570, 16920], dtype=int64))

I also tried pd.DataFrame.from_records(someTuple), which returns the same thing.
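For reference, a minimal sketch of why that happens, using a small hypothetical birds array instead of the 50,000-element one above: from_records reads each array in the tuple as one record (row), so the two arrays become two rows of three columns. Zipping the arrays first turns them into (name, count) records, which gives the wanted shape.

```python
import numpy as np
import pandas as pd

# Hypothetical small sample so the counts are easy to check by hand
birds = np.array(['Dead Parrot', 'African Swallow', 'Dead Parrot',
                  'Exploding Penguin', 'Dead Parrot', 'African Swallow'])
someTuple = np.unique(birds, return_counts=True)

# Each array in the tuple is treated as one row: 2 rows x 3 columns
wide = pd.DataFrame.from_records(someTuple)
print(wide.shape)

# zip(*someTuple) yields one (name, count) tuple per bird type
tall = pd.DataFrame.from_records(list(zip(*someTuple)),
                                 columns=['birdType', 'birdCount'])
print(tall)
```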

But what I'm looking for is:

#              birdType      birdCount
# 0     African Swallow          16510  
# 1         Dead Parrot          16570  
# 2   Exploding Penguin          16920

What is the correct syntax?

You can use a Counter:

from collections import Counter

c = Counter(birds)

>>> pd.Series(c)
African Swallow      16510
Dead Parrot          16570
Exploding Penguin    16920
dtype: int64
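To get the two named columns from the question straight out of the Counter, one option (a sketch; the column names birdType and birdCount come from the question, the sample array is hypothetical) is to build the frame from its (key, count) items. The items are sorted so the rows come out alphabetically like np.unique, since a Counter keeps first-seen order.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Hypothetical small sample in place of the 50,000-element array
birds = np.array(['Dead Parrot', 'African Swallow', 'Dead Parrot',
                  'Exploding Penguin', 'Dead Parrot', 'African Swallow'])

c = Counter(birds)

# c.items() gives (bird, count) pairs; sorting orders them by bird name
df = pd.DataFrame(sorted(c.items()), columns=['birdType', 'birdCount'])
print(df)
```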

You can also use value_counts on a Series:

>>> pd.Series(birds).value_counts()
Exploding Penguin    16920
Dead Parrot          16570
African Swallow      16510
dtype: int64
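Note that value_counts sorts by frequency, most common first. A sketch (hypothetical sample, column names from the question) of turning its result into the requested two-column layout in alphabetical order, and checking it against np.unique:

```python
import numpy as np
import pandas as pd

# Hypothetical small sample in place of the 50,000-element array
birds = np.array(['Dead Parrot', 'African Swallow', 'Dead Parrot',
                  'Exploding Penguin', 'Dead Parrot', 'African Swallow'])

counts = pd.Series(birds).value_counts()  # most frequent bird first

# Sort by bird name, then move the index into a named column
df = counts.sort_index().rename_axis('birdType').reset_index(name='birdCount')
print(df)

# Same names and counts as np.unique(..., return_counts=True)
names, n = np.unique(birds, return_counts=True)
print(df['birdType'].tolist() == names.tolist()
      and df['birdCount'].tolist() == n.tolist())
```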

With the tuple, you can do the following:

In [4]: pd.DataFrame(list(zip(*someTuple)), columns = ['Bird', 'BirdCount'])
Out[4]: 
                Bird  BirdCount
0    African Swallow      16510
1        Dead Parrot      16570
2  Exploding Penguin      16920
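Here zip(*someTuple) is the same as zip(someTuple[0], someTuple[1]): it pairs the i-th name with the i-th count, giving one tuple per row. Because each value keeps its own type, pandas can infer an integer dtype for the count column. A small check with a hypothetical sample:

```python
import numpy as np
import pandas as pd

# Hypothetical small sample in place of the 50,000-element array
birds = np.array(['Dead Parrot', 'African Swallow', 'Dead Parrot',
                  'Exploding Penguin', 'Dead Parrot', 'African Swallow'])
someTuple = np.unique(birds, return_counts=True)

# One (name, count) tuple per unique bird
rows = list(zip(*someTuple))

df = pd.DataFrame(rows, columns=['Bird', 'BirdCount'])
print(df)
print(df.dtypes)  # BirdCount is an integer column
```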

Here's a NumPy-based solution -

Or with -
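As a hedged sketch of one NumPy route (not necessarily the answerer's exact code), the two arrays can be stacked as columns with np.column_stack, one of the functions benchmarked below. One caveat: a NumPy array has a single dtype, so stacking the string names with the integer counts promotes everything to a Unicode string dtype, and the counts come out as text unless converted back. The sample array is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical small sample in place of the 50,000-element array
birds = np.array(['Dead Parrot', 'African Swallow', 'Dead Parrot',
                  'Exploding Penguin', 'Dead Parrot', 'African Swallow'])
someTuple = np.unique(birds, return_counts=True)

# Place the two 1D arrays side by side: shape (n_unique, 2)
stacked = np.column_stack(someTuple)
print(stacked.dtype)  # a Unicode string dtype: the counts became text

df = pd.DataFrame(stacked, columns=['birdType', 'birdCount'])
# Convert the count column back to integers
df['birdCount'] = df['birdCount'].astype(int)
print(df)
```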

Benchmarking np.transpose, np.column_stack and np.vstack for stacking 1D arrays as columns to form a 2D array -

In [54]: tup1 = (np.random.rand(1000),np.random.rand(1000))

In [55]: %timeit np.transpose(tup1)
100000 loops, best of 3: 15.9 µs per loop

In [56]: %timeit np.column_stack(tup1)
100000 loops, best of 3: 11 µs per loop

In [57]: %timeit np.vstack(tup1).T
100000 loops, best of 3: 14.1 µs per loop
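The three calls timed above build the same 2D array, which is why they are interchangeable here. A quick check, using the same tup1 shape as in the timings:

```python
import numpy as np

tup1 = (np.random.rand(1000), np.random.rand(1000))

a = np.transpose(tup1)     # tuple read as a (2, 1000) array, then transposed
b = np.column_stack(tup1)  # each 1D array becomes a column
c = np.vstack(tup1).T      # stacked as rows (2, 1000), then transposed

print(a.shape, b.shape, c.shape)
print(np.array_equal(a, b) and np.array_equal(b, c))
```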

Create a dictionary:

pd.DataFrame(dict(birdType=someTuple[0], birdCount=someTuple[1]))
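Unlike the stacking approaches, the dict keeps each array as its own column, so the names stay strings and the counts stay integers; with keyword arguments the column order follows the argument order (dicts keep insertion order on Python 3.7+). A short check with a hypothetical sample:

```python
import numpy as np
import pandas as pd

# Hypothetical small sample in place of the 50,000-element array
birds = np.array(['Dead Parrot', 'African Swallow', 'Dead Parrot',
                  'Exploding Penguin', 'Dead Parrot', 'African Swallow'])
someTuple = np.unique(birds, return_counts=True)

# One column per array, named by the keyword arguments
df = pd.DataFrame(dict(birdType=someTuple[0], birdCount=someTuple[1]))
print(df)
print(df.dtypes)  # birdCount keeps the integer dtype from np.unique
```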

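Nice. I need to use the plain dict constructor with keyword arguments more often. It's really handy. Pining for the fjords! These are all very fast numpy solutions, exactly what I was looking for. Another equally fast answer was given by another user but then deleted: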

pd.DataFrame(np.transpose(someTuple), columns=['birdType', 'birdCount'])

@C8H10N4O2 Added some timings for those three methods; they all look equally fast. Your second method works with an added .T: pd.DataFrame.from_records(someTuple)

pd.DataFrame(np.vstack(someTuple).T,columns=['birdType','birdCount'])
