Pivoting time series data with multiple values in Python pandas


I have the URLs of several SNS posts, and every day I track each post's readCount, likeCount, commentCount and so on.

This is the dataframe I got with pandas:

     post_url       nickname    date_key  readCount  likeCount  commentCount
5       a_url          user1  2020-06-12     2874.0        194           NaN
4       a_url          user1  2020-06-13     4030.0        208          48.0
6       a_url          user1  2020-06-14        NaN        220          48.0
7       a_url          user1  2020-06-15        NaN        223          48.0
0       b_url          user2  2020-06-13    16882.0        295          88.0
2       b_url          user2  2020-06-14        NaN        296          88.0
3       b_url          user2  2020-06-15        NaN        299          88.0
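
For anyone who wants to reproduce this, here is a small sketch that rebuilds the same sample dataframe (values and index labels are copied from the table above):

import numpy as np
import pandas as pd

# Sample data copied from the table shown in the question
df = pd.DataFrame({
    'post_url': ['a_url'] * 4 + ['b_url'] * 3,
    'nickname': ['user1'] * 4 + ['user2'] * 3,
    'date_key': ['2020-06-12', '2020-06-13', '2020-06-14', '2020-06-15',
                 '2020-06-13', '2020-06-14', '2020-06-15'],
    'readCount': [2874.0, 4030.0, np.nan, np.nan, 16882.0, np.nan, np.nan],
    'likeCount': [194, 208, 220, 223, 295, 296, 299],
    'commentCount': [np.nan, 48.0, 48.0, 48.0, 88.0, 88.0, 88.0],
}, index=[5, 4, 6, 7, 0, 2, 3])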
The result I want to achieve is the following (please don't mind the actual values, they are made up):

Note that each post has a different subset of date_key values; my goal is to combine all the existing date_key values into columns.

I have searched around on this topic, but could not find exactly the same use case.

Could you show me a way to achieve this? Thank you.

A general solution: unpivot with DataFrame.melt, then aggregate with DataFrame.pivot_table using mean, in case there are duplicates per the post_url, nickname, type and date_key columns:

# unpivot the count columns into long format (type/value pairs),
# drop missing measurements, then spread date_key across the columns
df = (df.melt(['post_url','nickname','date_key'], var_name='type')
        .dropna(subset=['value'])
        .pivot_table(index=['post_url','nickname','type'],
                     columns='date_key',
                     values='value',
                     aggfunc='mean')          # mean handles possible duplicates
        .rename_axis(None, axis=1)            # drop the 'date_key' columns name
        .reset_index())
print (df)
  post_url nickname          type  2020-06-12  2020-06-13  2020-06-14  \
0    a_url    user1  commentCount         NaN        48.0        48.0   
1    a_url    user1     likeCount       194.0       208.0       220.0   
2    a_url    user1     readCount      2874.0      4030.0         NaN   
3    b_url    user2  commentCount         NaN        88.0        88.0   
4    b_url    user2     likeCount         NaN       295.0       296.0   
5    b_url    user2     readCount         NaN     16882.0         NaN   

   2020-06-15  
0        48.0  
1       223.0  
2         NaN  
3        88.0  
4       299.0  
5         NaN  

This is amazing, thank you so much. Just a quick question: what does 'value' in .dropna(subset=['value']) refer to?

@ShinhongPark - it is the new column created by melt, holding the values from the type columns (readCount, likeCount, commentCount).
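
To see what that value column is, a quick sketch of the intermediate frame melt produces (assuming df still holds the original long dataframe from the question, not the pivoted result above):

# the 'value' column created here is what dropna(subset=['value']) filters on
melted = df.melt(['post_url','nickname','date_key'], var_name='type')
print(melted.head())
#   post_url nickname    date_key       type    value
# 0    a_url    user1  2020-06-12  readCount   2874.0
# 1    a_url    user1  2020-06-13  readCount   4030.0
# ...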
Another idea, if no aggregation is necessary, with DataFrame.set_index, DataFrame.stack and DataFrame.unstack:

# move the keys into the index, stack the count columns into a new index level,
# then move the date_key level out to the columns (no aggregation involved)
df = (df.set_index(['post_url','nickname','date_key'])
        .stack()
        .unstack(2)
        .rename_axis(index=['post_url','nickname','type'], columns=None)
        .reset_index()
        )
print (df)
  post_url nickname          type  2020-06-12  2020-06-13  2020-06-14  \
0    a_url    user1     readCount      2874.0      4030.0         NaN   
1    a_url    user1     likeCount       194.0       208.0       220.0   
2    a_url    user1  commentCount         NaN        48.0        48.0   
3    b_url    user2     readCount         NaN     16882.0         NaN   
4    b_url    user2     likeCount         NaN       295.0       296.0   
5    b_url    user2  commentCount         NaN        88.0        88.0   

   2020-06-15  
0         NaN  
1       223.0  
2        48.0  
3         NaN  
4       299.0  
5        88.0
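
One practical note: pivot_table sorts the type rows alphabetically, while the set_index/stack/unstack variant keeps the original column order (readCount, likeCount, commentCount), as the two outputs above show. If a fixed metric order is wanted either way, one possible sketch (assuming the wide result is in a dataframe named out and the metric names are exactly these three):

import pandas as pd

# 'out' is a placeholder for the wide result produced by either approach above
order = ['readCount', 'likeCount', 'commentCount']
out['type'] = pd.Categorical(out['type'], categories=order, ordered=True)
out = out.sort_values(['post_url', 'nickname', 'type']).reset_index(drop=True)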