Python 数据帧组内整数系列的上采样
我的问题是如何为数据帧中的多个“分组”中的每一个向上采样一个int系列。(在我的案例中,针对每个“团队”和“LeadWeek”分组) 我看到了内置函数和许多对timeseries进行上采样的示例,但没有看到对整数进行上采样的示例。出于各种原因,我现在不想讨论,我想用整数而不是时间序列来实现这一点 在我的例子中,我有“团队”和“领导周”,我想对每个“团队”和“领导周”组合的“转换周”进行抽样,将其增加为[0,1,2,3,4] 我认为有一种方法可以通过Python 数据帧组内整数系列的上采样,python,pandas,resampling,Python,Pandas,Resampling,我的问题是如何为数据帧中的多个“分组”中的每一个向上采样一个int系列。(在我的案例中,针对每个“团队”和“LeadWeek”分组) 我看到了内置函数和许多对timeseries进行上采样的示例,但没有看到对整数进行上采样的示例。出于各种原因,我现在不想讨论,我想用整数而不是时间序列来实现这一点 在我的例子中,我有“团队”和“领导周”,我想对每个“团队”和“领导周”组合的“转换周”进行抽样,将其增加为[0,1,2,3,4] 我认为有一种方法可以通过多索引/groupby+重采样()实现这一点,但
多索引
/groupby
+重采样()
实现这一点,但经过几个小时的修补,我还没有足够的智慧来解决这个问题。向这里的智者寻求帮助
下面是示例数据框:
df = pd.DataFrame([
['Team A', pd.datetime(2017, 12, 1), 0, 2]
,['Team A', pd.datetime(2017, 12, 1), 2, 1]
,['Team A', pd.datetime(2017, 12, 1), 4, 1]
,['Team A', pd.datetime(2017, 12, 8), 3, 2]
,['Team B', pd.datetime(2017, 12, 1), 0, 1]
,['Team B', pd.datetime(2017, 12, 1), 2, 3]
,['Team B', pd.datetime(2017, 12, 8), 1, 3]
,['Team B', pd.datetime(2017, 12, 8), 3, 2]
]
, columns=['Team', 'LeadWeek', 'ConversionWeek', 'Conversions']
)
我的期望输出如下,每个团队/LeadWeek分组有5行“ConversionWeek”,编号为0到4:
Team LeadWeek ConversionWeek Conversions
0 Team A 2017-12-01 0 2.0
1 Team A 2017-12-01 1 0.0
2 Team A 2017-12-01 2 1.0
3 Team A 2017-12-01 3 0.0
4 Team A 2017-12-01 4 1.0
5 Team A 2017-12-08 0 0.0
6 Team A 2017-12-08 1 0.0
7 Team A 2017-12-08 2 0.0
8 Team A 2017-12-08 3 2.0
9 Team A 2017-12-08 4 0.0
10 Team B 2017-12-01 0 1.0
11 Team B 2017-12-01 1 0.0
12 Team B 2017-12-01 2 3.0
13 Team B 2017-12-01 3 0.0
14 Team B 2017-12-01 4 0.0
15 Team B 2017-12-08 0 0.0
16 Team B 2017-12-08 1 3.0
17 Team B 2017-12-08 2 0.0
18 Team B 2017-12-08 3 2.0
19 Team B 2017-12-08 4 0.0
我确实有一个解决方案,但它不是很像蟒蛇。这与我在SQL中的解决方法相同,即使用所有不同元素的笛卡尔积创建一个“脚手架”,然后将我的实际转换连接到它。在Python中,此方法使用itertools.product()
我的解决办法是:
import pandas as pd
import numpy as np
import itertools as it
df = pd.DataFrame([
['Team A', pd.datetime(2017, 12, 1), 0, 2]
,['Team A', pd.datetime(2017, 12, 1), 2, 1]
,['Team A', pd.datetime(2017, 12, 1), 4, 1]
,['Team A', pd.datetime(2017, 12, 8), 3, 2]
,['Team B', pd.datetime(2017, 12, 1), 0, 1]
,['Team B', pd.datetime(2017, 12, 1), 2, 3]
,['Team B', pd.datetime(2017, 12, 8), 1, 3]
,['Team B', pd.datetime(2017, 12, 8), 3, 2]
]
, columns=['Team', 'LeadWeek', 'ConversionWeek', 'Conversions']
)
ConversionWeek = np.linspace(0, 4, 5, dtype=int)
Team = df['Team'].unique()
LeadWeek = df['LeadWeek'].unique()
scaffold_raw = []
for i in it.product(Team, LeadWeek, ConversionWeek):
scaffold_raw.append(i)
scaffold = pd.DataFrame(scaffold_raw, columns=['Team', 'LeadWeek', 'ConversionWeek'])
new_frame = scaffold.merge(df, how='left')
new_frame = new_frame.sort_values(by=['Team', 'LeadWeek', 'ConversionWeek']).reset_index(drop=True)
new_frame['Conversions'].fillna(0, inplace=True)
感谢您对更优雅解决方案的帮助。通过传递
pd.MultiIndex
-
idx = pd.MultiIndex.from_product(
[df.Team.unique(), df.LeadWeek.unique(), np.arange(5)]
)
v = df.set_index(['Team', 'LeadWeek', 'ConversionWeek'])\
.reindex(idx)\
.fillna(0)\
.reset_index()
v.columns = df.columns
v
Team LeadWeek ConversionWeek Conversions
0 Team A 2017-12-01 0 2.0
1 Team A 2017-12-01 1 0.0
2 Team A 2017-12-01 2 1.0
3 Team A 2017-12-01 3 0.0
4 Team A 2017-12-01 4 1.0
5 Team A 2017-12-08 0 0.0
6 Team A 2017-12-08 1 0.0
7 Team A 2017-12-08 2 0.0
8 Team A 2017-12-08 3 2.0
9 Team A 2017-12-08 4 0.0
10 Team B 2017-12-01 0 1.0
11 Team B 2017-12-01 1 0.0
12 Team B 2017-12-01 2 3.0
13 Team B 2017-12-01 3 0.0
14 Team B 2017-12-01 4 0.0
15 Team B 2017-12-08 0 0.0
16 Team B 2017-12-08 1 3.0
17 Team B 2017-12-08 2 0.0
18 Team B 2017-12-08 3 2.0
19 Team B 2017-12-08 4 0.0
好极了非常感谢。