Python: creating bins and a data frame with calculations
Tags: python, pandas, bins

I want to create a new data frame in which column 'X' is grouped into 10 equal-width bins. Then, for each year, I need the sum per bin of 'R'*'X', where 'R' is 'h' (that is, the sum of X over the rows where R equals 'h').

Edit: an example of the desired end result:

bins    2012  2013  2014  total years  total number
0<1.5     15     8     5           28             7

import pandas as pd
import numpy as np
import random
import string
N = 100
J = [2012,2013,2014]
K = ['A','B','C','D','E','F','G','H']
L = ['h','d','a']
df = pd.DataFrame(
np.random.uniform(1,10,size=(N, 3)),
columns=list('XYZ')
)
df['ht'] = pd.Series(random.choice(K) for _ in range(N))
df['at'] = pd.Series(random.choice(K) for _ in range(N))
df['J'] = pd.Series(random.choice(J) for _ in range(N))
df['R'] = pd.Series(random.choice(L) for _ in range(N))
df1 = (df.X).groupby([df.ht, df.J]).agg(['sum', 'size']).unstack(fill_value=0)
print(df.head())
Updated code, keeping only the rows where R is 'h':
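# Bin X into 10 equal-width intervals, then aggregate per (bin, R, J).
# observed=False keeps empty bins, matching the original default behaviour.
df_agg = (
    df.groupby([pd.cut(df.X, 10), 'R', 'J'], observed=False)['X']
    .agg(['sum', 'size'])
    .unstack('J', fill_value=0)
    .reset_index('R')
)
# Keep only the rows where R is 'h'; copy so the new columns go on a fresh frame.
df_agg = df_agg.loc[df_agg['R'] == 'h'].copy()
df_agg['total_sum_years'] = df_agg['sum'].sum(axis=1)
df_agg['total_number_h'] = df_agg['size'].sum(axis=1)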
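I'll take a shot at this, though I'm still not sure exactly what you want. I first rewrote your DataFrame creation to make it cleaner.
Output of the code above: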
                 R        sum                        size \
J                        2012       2013       2014  2012  2013  2014
X
(1.203, 2.0842]  h   1.421185   2.660724   3.380401     1     2     2
(2.0842, 2.956]  h   4.984133   5.044891   0.000000     2     2     0
(2.956, 3.828]   h   0.000000   3.190256   6.644137     0     1     2
(3.828, 4.7]     h   4.086577   0.000000   0.000000     1     0     0
(4.7, 5.572]     h   0.000000   9.595351   0.000000     0     2     0
(5.572, 6.444]   h  11.659066   6.037559  12.452256     2     1     2
(6.444, 7.316]   h   6.535510   0.000000   6.820929     1     0     1
(7.316, 8.188]   h   0.000000   0.000000  23.259200     0     0     3
(8.188, 9.0605]  h   8.944386   8.352764  25.645607     1     1     3
(9.0605, 9.933]  h  18.863608  29.606962   9.222994     2     3     1

                 total_sum_years  total_number_h
J
X
(1.203, 2.0842]         7.462311               5
(2.0842, 2.956]        10.029024               4
(2.956, 3.828]          9.834393               3
(3.828, 4.7]            4.086577               1
(4.7, 5.572]            9.595351               2
(5.572, 6.444]         30.148881               5
(6.444, 7.316]         13.356440               2
(7.316, 8.188]         23.259200               3
(8.188, 9.0605]        42.942758               5
(9.0605, 9.933]        57.693565               6
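As a possible alternative to grouping by R and filtering afterwards, the rows with R equal to 'h' can be selected first and then summed per bin and year. The following is only a sketch under assumptions: the seed, the helper column name `bin`, and the variable names `h`, `sums` and `result` are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical seeded sample data with the same column names as the question.
rng = np.random.default_rng(0)
N = 100
df = pd.DataFrame({
    'X': rng.uniform(1, 10, N),
    'J': rng.choice([2012, 2013, 2014], N),
    'R': rng.choice(['h', 'd', 'a'], N),
})

# Bin on the full X range first, so the bin edges do not depend on the filter.
df['bin'] = pd.cut(df['X'], 10)

# Keep only the rows where R is 'h'.
h = df[df['R'] == 'h']

# Sum of X per (bin, year); observed=False keeps bins that have no 'h' rows.
sums = (
    h.groupby(['bin', 'J'], observed=False)['X']
    .sum()
    .unstack('J', fill_value=0)
)

# One row per bin: yearly sums, the total over all years, and the row count.
result = sums.copy()
result['total_sum_years'] = sums.sum(axis=1)
result['total_number_h'] = h.groupby('bin', observed=False).size()
print(result)
```

Filtering first gives single-level columns (one per year plus the two totals), which is closer to the layout sketched in the question than the two-level `sum`/`size` columns.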
I then grouped by X (binned with pd.cut) together with ht and J. The rewritten DataFrame creation:
N = 100
J = [2012,2013,2014]
K = ['A','B','C','D','E','F','G','H']
L = ['h','d','a']
df = pd.DataFrame(
{'X': np.random.uniform(1,10,N),
'Y': np.random.uniform(1,10,N),
'Z': np.random.uniform(1,10,N),
'ht':np.random.choice(K, N),
'at':np.random.choice(K, N),
'J':np.random.choice(J, N),
'R':np.random.choice(L, N)
})
The groupby line below produced the result shown at the end, which I think gets you close to what you want.
df.groupby([pd.cut(df.X, 10), 'ht', 'J'])['X'].sum().unstack('J', fill_value=0)
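To make the binning step more concrete, here is a small sketch of what `pd.cut` does when it is given an integer number of bins. The sample values and the names `x`, `binned`, `edges` and `counts` are made up for illustration only.

```python
import numpy as np
import pandas as pd

# Small hypothetical sample, chosen only to illustrate the binning.
x = pd.Series([1.0, 2.5, 4.0, 5.5, 7.0, 8.5, 10.0])

# With an integer number of bins, pd.cut splits the range of x into
# equal-width intervals; retbins=True also returns the bin edges.
binned, edges = pd.cut(x, 3, retbins=True)

# Number of values that fall in each interval, in interval order.
counts = binned.value_counts().sort_index()

print(edges)   # the lowest edge is nudged slightly below min(x)
print(counts)
```

The intervals are closed on the right by default, and the lowest edge is moved slightly below the minimum so that the smallest value is included. If bins with roughly equal numbers of rows are wanted instead of equal widths, `pd.qcut` may be the better fit.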
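Comments:
If you took the time to explain or show what you want more clearly, you would get an answer. As it stands, I don't know which data frame you want clustered by bins of X.
I have edited the question; I hope this helps.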
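plt.hist(df.X.values) will return the bins along with the count in each bin. Is that what you mean?
@Zanshin In your example you group by 'ht'. Did you mean to group by 'R'?
@Ted Group by bin, and sum column X into the year columns when R is 'h'.
@piRSquared Ha! Thanks.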
J                       2012      2013      2014
X               ht
(1.203, 2.0842] A   2.076946  1.360880  1.544429
                B   0.000000  0.000000  1.434798
                C   0.000000  1.313596  1.835972
                D   0.000000  1.212149  1.280920
                F   0.000000  0.000000  3.545768
                H   1.421185  1.299844  0.000000
(2.0842, 2.956] A   2.331453  0.000000  0.000000
                B   0.000000  2.489030  0.000000
                C   5.689538  2.555860  0.000000
                D   5.338711  0.000000  0.000000
                H   0.000000  0.000000  6.428545
(2.956, 3.828]  A   0.000000  6.692342  0.000000
                B   0.000000  0.000000  3.211878
                C   0.000000  3.673353  3.062432
                D   0.000000  0.000000  3.432259
                E   3.789112  0.000000  3.612064
                G   0.000000  3.190256  3.117251
(3.828, 4.7]    E   8.758016  0.000000  4.302206
                G   4.086577  0.000000  0.000000
(4.7, 5.572]    A   5.268921  0.000000  0.000000
                B   0.000000  4.845556  0.000000
                C   0.000000  4.990270  5.078201
                E   0.000000  4.749795  0.000000
                F   0.000000  0.000000  5.260480
                G   4.811551  0.000000  0.000000
                H   0.000000  0.000000  4.817087
.....
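The plt.hist suggestion in the comments can be tried without drawing a plot by using `np.histogram`, which also returns counts and bin edges. The sketch below assumes seeded sample data and hypothetical variable names; it checks that cutting X on the same edges with `pd.cut` gives the same per-bin counts.

```python
import numpy as np
import pandas as pd

# Hypothetical seeded stand-in for the X column of the question.
rng = np.random.default_rng(0)
x = pd.Series(rng.uniform(1, 10, 100), name='X')

# Counts per bin and the 11 edges of 10 equal-width bins.
counts, edges = np.histogram(x, bins=10)

# Reuse the same edges in pd.cut; include_lowest=True keeps min(x) in bin 1.
cut = pd.cut(x, bins=edges, include_lowest=True)
cut_counts = cut.value_counts().sort_index()

print(counts)
print(cut_counts)
```

The two approaches close their intervals on opposite sides (`np.histogram` on the left, `pd.cut` on the right by default), so counts could differ if a value sat exactly on an interior edge. With continuous random data that is very unlikely.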