Python: pivoting time series data with multiple value columns
I have the URLs of several SNS posts, and every day I track each post's readCount, likeCount, commentCount, and so on.
This is the DataFrame I get using pandas:
post_url nickname date_key readCount likeCount commentCount
5 a_url user1 2020-06-12 2874.0 194 NaN
4 a_url user1 2020-06-13 4030.0 208 48.0
6 a_url user1 2020-06-14 NaN 220 48.0
7 a_url user1 2020-06-15 NaN 223 48.0
0 b_url user2 2020-06-13 16882.0 295 88.0
2 b_url user2 2020-06-14 NaN 296 88.0
3 b_url user2 2020-06-15 NaN 299 88.0
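The result I want to achieve is (please don't mind the actual values, they are improvised):
Note that each post has a different subset of date_key values, and my goal is to combine all existing date_key values into columns.
I have searched on this topic but could not find exactly the same use case.
Could you suggest a way to achieve this?
Thank you.

Use DataFrame.melt to unpivot, then DataFrame.pivot_table to aggregate by mean. This is the general solution, for the case where duplicates are possible per post_url, nickname, type and date_key: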
df = (df.melt(['post_url','nickname','date_key'], var_name='type')
        .dropna(subset=['value'])
        .pivot_table(index=['post_url','nickname','type'],
                     columns='date_key',
                     values='value',
                     aggfunc='mean')
        .rename_axis(None, axis=1)
        .reset_index())
print(df)
post_url nickname type 2020-06-12 2020-06-13 2020-06-14 \
0 a_url user1 commentCount NaN 48.0 48.0
1 a_url user1 likeCount 194.0 208.0 220.0
2 a_url user1 readCount 2874.0 4030.0 NaN
3 b_url user2 commentCount NaN 88.0 88.0
4 b_url user2 likeCount NaN 295.0 296.0
5 b_url user2 readCount NaN 16882.0 NaN
2020-06-15
0 48.0
1 223.0
2 NaN
3 88.0
4 299.0
5 NaN
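To make the chain above easier to follow, here is a minimal, self-contained sketch that rebuilds the sample data from the question and runs the same steps one at a time. The names long_df and wide_df are only illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "post_url": ["a_url"] * 4 + ["b_url"] * 3,
    "nickname": ["user1"] * 4 + ["user2"] * 3,
    "date_key": ["2020-06-12", "2020-06-13", "2020-06-14", "2020-06-15",
                 "2020-06-13", "2020-06-14", "2020-06-15"],
    "readCount": [2874.0, 4030.0, np.nan, np.nan, 16882.0, np.nan, np.nan],
    "likeCount": [194, 208, 220, 223, 295, 296, 299],
    "commentCount": [np.nan, 48.0, 48.0, 48.0, 88.0, 88.0, 88.0],
})

# Step 1: unpivot. The three count columns become rows; melt names the new
# columns 'type' (via var_name) and 'value' (the default value_name).
long_df = df.melt(["post_url", "nickname", "date_key"], var_name="type")

# Step 2: drop the rows whose measurement is missing.
long_df = long_df.dropna(subset=["value"])

# Step 3: pivot the dates into columns, averaging any duplicate entries.
wide_df = (long_df.pivot_table(index=["post_url", "nickname", "type"],
                               columns="date_key",
                               values="value",
                               aggfunc="mean")
                  .rename_axis(None, axis=1)
                  .reset_index())

print(long_df.head())
print(wide_df)
```

Printing long_df shows the intermediate long format, with one row per post, date and metric.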
This is amazing. Thank you very much. Just one quick question: what does 'value' refer to in .dropna(subset=['value'])?
@ShinhongPark - It is the new column created by melt; it holds the values taken from the readCount, likeCount and commentCount columns, while the type column holds their names.

Another idea, if aggregation is not needed, using DataFrame.set_index, DataFrame.stack and Series.unstack:
df = (df.set_index(['post_url','nickname','date_key'])
        .stack()
        .unstack(2)
        .rename_axis(index=['post_url','nickname','type'], columns=None)
        .reset_index()
     )
print(df)
post_url nickname type 2020-06-12 2020-06-13 2020-06-14 \
0 a_url user1 readCount 2874.0 4030.0 NaN
1 a_url user1 likeCount 194.0 208.0 220.0
2 a_url user1 commentCount NaN 48.0 48.0
3 b_url user2 readCount NaN 16882.0 NaN
4 b_url user2 likeCount NaN 295.0 296.0
5 b_url user2 commentCount NaN 88.0 88.0
2020-06-15
0 NaN
1 223.0
2 48.0
3 NaN
4 299.0
5 88.0
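The difference between the two approaches only shows up when the same post, date and metric occur more than once. The sketch below uses made-up data (the frame dup and its values are hypothetical, not from the question) and a single metric column to keep it short: pivot_table averages the duplicates, while unstack cannot reshape them and raises an error.

```python
import pandas as pd

# Hypothetical data: the same post was recorded twice on 2020-06-12.
dup = pd.DataFrame({
    "post_url": ["a_url", "a_url", "a_url"],
    "date_key": ["2020-06-12", "2020-06-12", "2020-06-13"],
    "likeCount": [194, 198, 208],
})

# pivot_table aggregates the two rows for 2020-06-12 into their mean.
averaged = dup.pivot_table(index="post_url", columns="date_key",
                           values="likeCount", aggfunc="mean")
print(averaged)

# unstack cannot place two values in one cell, so it raises ValueError.
try:
    dup.set_index(["post_url", "date_key"])["likeCount"].unstack("date_key")
    reshaped = True
except ValueError as exc:
    reshaped = False
    print("unstack failed:", exc)
```

So the set_index/stack/unstack variant is only suitable when each combination of post, date and metric is known to be unique.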