Python: pivot a column in a DataFrame and create 4 new columns
I am working with a pandas DataFrame. I have data like this:
df
COUNTRY LINE PRODUCT SERVICE
Argelia 1 1.0 Mobile
Argelia 1 2.0 Mobile
Argelia 1 3.0 Mobile
Argelia 2 1.0 Mobile
Argelia 3 2.0 Mobile
Argelia 3 3.0 Mobile
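I want to group by LINE and pivot the PRODUCT column, but I need 4 product columns (PRODUCT_1, PRODUCT_2, PRODUCT_3 and PRODUCT_4) regardless of whether any row has PRODUCT value = 4.
I am trying to use get_dummies with this code: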
df = pd.concat([df, pd.get_dummies(dfs['PRODUCT'], prefix='product')], axis=1)
df.drop(['PRODUCT'], axis=1, inplace=True)
df = df.groupby(['COUNTRY', 'LINE', 'SERVICE']).agg({'product_1' : np.max, 'product_2': np.max, 'product_3':np.max, 'product_4':np.max}).reset_index()
But it only gives me 3 product columns, and I want 4, as in this DataFrame:
COUNTRY LINE SERVICE product_1 product_2 product_3 product_4
Argelia 1 Mobile 1 1 1 0
Argelia 2 Mobile 1 0 0 0
Argelia 3 Mobile 0 1 1 0
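Is this possible?
(I also need to change the PRODUCT values from the float 1.0 to the integer 1.)
Here is an alternative solution, which I hope is faster: use pivot_table with aggfunc='size' to count rows, clip to cap the values at 1, rename to convert the float column labels to integers, and reindex to add new columns for all possible products: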
cols = [f'product_{i}' for i in range(1, 5)]
df1 = (df.pivot_table(index=['COUNTRY', 'LINE', 'SERVICE'],
columns='PRODUCT',
fill_value=0,
aggfunc='size')
.clip(upper=1)
.rename(columns=int)
.add_prefix('product_')
.reindex(cols, axis=1, fill_value=0))
print (df1)
PRODUCT product_1 product_2 product_3 product_4
COUNTRY LINE SERVICE
Argelia 1 Mobile 1 1 1 0
2 Mobile 1 0 0 0
3 Mobile 0 1 1 0
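As a rough illustration of why the final reindex step matters, the sketch below rebuilds the sample data from the question and compares the columns before and after reindex. The variable names before, wide and flat are illustrative and not part of the original answer, and the closing rename_axis and reset_index calls are an optional extra, assumed here only to get the flat layout shown in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "COUNTRY": ["Argelia"] * 6,
    "LINE": [1, 1, 1, 2, 3, 3],
    "PRODUCT": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "SERVICE": ["Mobile"] * 6,
})

cols = [f"product_{i}" for i in range(1, 5)]

# Count rows per group and PRODUCT value, cap counts at 1,
# then turn the float labels 1.0, 2.0, 3.0 into product_1, product_2, product_3.
before = (df.pivot_table(index=["COUNTRY", "LINE", "SERVICE"],
                         columns="PRODUCT",
                         fill_value=0,
                         aggfunc="size")
            .clip(upper=1)
            .rename(columns=int)
            .add_prefix("product_"))
print(before.columns.tolist())  # only the products that occur in the data

# reindex adds any missing label from cols (here product_4), filled with 0.
wide = before.reindex(cols, axis=1, fill_value=0)

# Optional: drop the "PRODUCT" axis label and move the group keys back to columns.
flat = wide.rename_axis(columns=None).reset_index()
print(flat)
```

Without the reindex call the result only has columns for values that actually appear in PRODUCT, which is the same limitation the get_dummies attempt in the question ran into.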
In your solution, use pop to extract the PRODUCT column, convert it to integers, then aggregate by max and add reindex:
df = pd.concat([df, pd.get_dummies(df.pop('PRODUCT').astype(int),prefix='product')], axis=1)
cols = [f'product_{i}' for i in range(1, 5)]
df = df.groupby(['COUNTRY', 'LINE', 'SERVICE']).max().reindex(cols, axis=1, fill_value=0)
print (df)
product_1 product_2 product_3 product_4
COUNTRY LINE SERVICE
Argelia 1 Mobile 1 1 1 0
2 Mobile 1 0 0 0
3 Mobile 0 1 1 0
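A variation that is not in the original answer: instead of adding the missing column afterwards with reindex, the full set of product values can be declared up front with a categorical dtype, so that get_dummies emits one column per category even when a category never occurs. This is a minimal sketch under the assumption that the valid products are exactly 1 to 4. The names all_products, product, dummies and result are hypothetical. Passing dtype=int keeps the output as 0/1 on pandas versions where get_dummies returns booleans by default.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "COUNTRY": ["Argelia"] * 6,
    "LINE": [1, 1, 1, 2, 3, 3],
    "PRODUCT": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "SERVICE": ["Mobile"] * 6,
})

# Assumption: products 1..4 are the only valid values.
all_products = pd.CategoricalDtype(categories=[1, 2, 3, 4])

# Convert 1.0 -> 1, then attach the full category list.
product = df["PRODUCT"].astype(int).astype(all_products)

# One indicator column per category, including the unobserved product 4.
dummies = pd.get_dummies(product, prefix="product", dtype=int)

# Collapse to one row per (COUNTRY, LINE, SERVICE); max keeps a 1 if any row had it.
result = (
    pd.concat([df.drop(columns="PRODUCT"), dummies], axis=1)
      .groupby(["COUNTRY", "LINE", "SERVICE"], as_index=False)
      .max()
)
print(result)
```

The trade-off is that the list of valid products lives in the dtype rather than in a list of column names, which can be convenient when the same column is encoded in several places.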