Python: convert a groupby into a single row with new columns (Python, Python 3.x, Pandas, Pandas Groupby) - Fatal编程技术网

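I want to convert each group of a groupby into a single row, spreading the values of a second column of that group (here n_clicks) into new columns, or filling with -99 where there is not enough data.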

After grouping by session_id, using this input:

             user_id     session_id   timestamp  step  impressions   n_clicks
0       004A07DM0IDW  1d688ec168932  1541555799     7      2059240        5.0
1       004A07DM0IDW  1d688ec168932  1541555799     7      2033381        3.0
2       004A07DM0IDW  1d688ec168932  1541555799     7      1724779        4.0
3       004A07DM0IDW  1d688ec168932  1541555799     7       127131        2.0
4       004A07DM0IDW  1d688ec168932  1541555799     7       399441        1.0
5       004A07DM0IDW  1d688ec168932  1541555799     7       103357        3.0
6       004A07DM0IDW  1d688ec168932  1541555799     7       127132        3.0
7       004A07DM0IDW  1d688ec168932  1541555799     7      1167004        1.0
8       004A07DM0IDW  1d688ec168932  1541555799     7      4491766        4.0
9       004A07DM0IDW  1d688ec168932  1541555799     7      2249874        5.0
10      00Y1Z24X8084  26b6d294d66e7  1541651823     3      4476010        4.0
11      00Y1Z24X8084  26b6d294d66e7  1541651823     3      3843244        5.0
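To experiment with the answers, the sample input above can be rebuilt as a DataFrame. This is a minimal sketch that copies the twelve rows shown; the dtypes (integer timestamp and step, float n_clicks) are assumptions based on how the values are printed.

```python
import pandas as pd

# Sample input from the question: 10 rows in the first session, 2 in the second
df = pd.DataFrame({
    'user_id': ['004A07DM0IDW'] * 10 + ['00Y1Z24X8084'] * 2,
    'session_id': ['1d688ec168932'] * 10 + ['26b6d294d66e7'] * 2,
    'timestamp': [1541555799] * 10 + [1541651823] * 2,
    'step': [7] * 10 + [3] * 2,
    'impressions': [2059240, 2033381, 1724779, 127131, 399441,
                    103357, 127132, 1167004, 4491766, 2249874,
                    4476010, 3843244],
    'n_clicks': [5.0, 3.0, 4.0, 2.0, 1.0, 3.0, 3.0, 1.0, 4.0, 5.0,
                 4.0, 5.0],
})

# Rows per session: 10 and 2
print(df.groupby('session_id').size())
```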
I want to produce this:

             user_id     session_id   timestamp  step  count_0 count_1 count_2 count... count_24
0       004A07DM0IDW  1d688ec168932  1541555799     7      5.0     3.0    4.0    2.0         -99
1       00Y1Z24X8084  26b6d294d66e7  1541651823     3      4.0     5.0    -99    -99         -99
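We can see that user_id, session_id, timestamp and step are always the same for every row in a group; only impressions differ. For each row (up to 25 rows), the value in the n_clicks column maps to one count_x column; if there are not enough rows, the remaining count_x columns take the value -99.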
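The mapping rule can be sketched in plain Python for a single session. `to_counts` is a hypothetical helper, not part of pandas or the question, and it assumes that a session with more than 25 rows is simply truncated to its first 25 click values, which the question does not state explicitly.

```python
def to_counts(clicks, width=25, fill=-99):
    """Hypothetical helper: map up to `width` click values to
    count_0 .. count_{width-1}, padding missing positions with `fill`.
    Assumption: values beyond `width` are dropped."""
    values = list(clicks)[:width]               # keep at most `width` values
    values += [fill] * (width - len(values))    # pad the rest with -99
    return {f'count_{i}': v for i, v in enumerate(values)}


# Second session from the question: clicks 4.0 and 5.0
row = to_counts([4.0, 5.0])
print(row['count_0'], row['count_1'], row['count_2'])
```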

Since the first group has 10 rows, columns count_10 to count_24 will have the value -99. For the second group, columns count_2 to count_24 will have -99.
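The arithmetic above follows one rule: a group with n rows fills count_0 to count_(n-1), so the -99 columns run from count_n to count_24. A small check of that rule, using a hypothetical helper `padded_columns` that is not part of the question or the answer:

```python
WIDTH = 25  # number of count_x columns, from the question


def padded_columns(n_rows, width=WIDTH):
    """Hypothetical helper: names of the count_x columns that get -99
    for a group with n_rows rows."""
    return [f'count_{i}' for i in range(min(n_rows, width), width)]


first = padded_columns(10)   # first group: 10 rows
second = padded_columns(2)   # second group: 2 rows
print(first[0], first[-1], len(first))     # count_10 count_24 15
print(second[0], second[-1], len(second))  # count_2 count_24 23
```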

Use:

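cols = ['user_id','session_id','timestamp','step']
# number each row within its group: 0, 1, 2, ...
df['g'] = df.groupby(cols).cumcount()
# one row per group, one column per position; missing positions become -99
df = (df.set_index(cols + ['g'])['n_clicks']
        .unstack(fill_value=-99)
        # always produce positions 0..24, padding with -99
        .reindex(range(25), fill_value=-99, axis=1)
        .add_prefix('count_')
        .reset_index()
        .rename_axis(None, axis=1))
print(df)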
        user_id     session_id   timestamp  step  count_0  count_1  count_2  \
0  004A07DM0IDW  1d688ec168932  1541555799     7      5.0      3.0      4.0   
1  00Y1Z24X8084  26b6d294d66e7  1541651823     3      4.0      5.0    -99.0   

   count_3  count_4  count_5  ...  count_15  count_16  count_17  count_18  \
0      2.0      1.0      3.0  ...       -99       -99       -99       -99   
1    -99.0    -99.0    -99.0  ...       -99       -99       -99       -99   

   count_19  count_20  count_21  count_22  count_23  count_24  
0       -99       -99       -99       -99       -99       -99  
1       -99       -99       -99       -99       -99       -99  

[2 rows x 29 columns]
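An alternative that avoids the helper column, offered as a sketch rather than part of the original answer: collect each group's clicks into a list with `apply(list)`, pad or truncate each list in Python, and build the count_x columns with the DataFrame constructor. The toy frame below uses made-up values for brevity, and truncating lists longer than 25 is an assumption.

```python
import pandas as pd

WIDTH, FILL = 25, -99
cols = ['user_id', 'session_id', 'timestamp', 'step']

# Toy frame with made-up values: 3 rows in session s1, 1 row in session s2
df = pd.DataFrame({
    'user_id': ['A', 'A', 'A', 'B'],
    'session_id': ['s1', 's1', 's1', 's2'],
    'timestamp': [1, 1, 1, 2],
    'step': [7, 7, 7, 3],
    'n_clicks': [5.0, 3.0, 4.0, 4.0],
})

# One list of clicks per group, in original row order
lists = df.groupby(cols, sort=False)['n_clicks'].apply(list)

# Pad each list with FILL, then cut it to exactly WIDTH values
padded = [(v + [FILL] * WIDTH)[:WIDTH] for v in lists]

counts = pd.DataFrame(padded, index=lists.index,
                      columns=[f'count_{i}' for i in range(WIDTH)])
out = counts.reset_index()   # group keys back as ordinary columns
print(out.iloc[:, :7])
```

With `sort=False`, output rows follow the order in which each session first appears in the input.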
Explanation:

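  • Create a counter column with GroupBy.cumcount
  • Create a MultiIndex with DataFrame.set_index and reshape with Series.unstack
  • Add the missing columns with DataFrame.reindex over range(25)
  • Rename the columns with DataFrame.add_prefix
  • Final cleanup: DataFrame.reset_index with DataFrame.rename_axis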
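One behaviour of this pipeline worth knowing: `reindex(range(25), ...)` not only adds missing positions, it also drops any position 25 or higher, so a session longer than 25 rows is silently truncated to count_0 through count_24. The sketch below runs the same chain on a toy 30-row session with made-up values to show this.

```python
import pandas as pd

cols = ['user_id', 'session_id', 'timestamp', 'step']

# Toy session with 30 rows, more than the 25 count_x columns
df = pd.DataFrame({
    'user_id': ['U'] * 30,
    'session_id': ['S'] * 30,
    'timestamp': [0] * 30,
    'step': [1] * 30,
    'n_clicks': [float(i) for i in range(30)],
})

df['g'] = df.groupby(cols).cumcount()          # positions 0..29
wide = (df.set_index(cols + ['g'])['n_clicks']
          .unstack(fill_value=-99)             # columns 0..29
          .reindex(range(25), fill_value=-99, axis=1)  # keeps only 0..24
          .add_prefix('count_')
          .reset_index()
          .rename_axis(None, axis=1))

print(wide.shape)                   # (1, 29)
print('count_25' in wide.columns)   # False
```

If longer sessions should not lose data, the `reindex` range would need to cover the largest group size instead of a fixed 25.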