Python: merging rows + groupby/apply function
Tags: python, dataframe, group-by, apply
I have a data frame like the following (dataframe1):
I want to create a new dataframe in which, for a unique ID, a row whose End value equals the Start value of another row is merged with that row into a single row (the ID must be in the same group).
So I want something like this (dataframe2):
Right now that is too complicated for me, so I started with groupby and apply. Within each group I shifted the End column down by one row and counted how many times the Start value equals the shifted End value (I can also use this count later when analysing the dataset, so it is not wasted effort). So I wrote a function:
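def mygroup(df):
    # Count the rows whose Start equals the previous row's End.
    is_continued = 0
    # Shift End down by one row (this overwrites the group's End column).
    df['End'] = df['End'].shift(1)
    for index, row in df.iterrows():
        if row['Start'] == row['End']:
            is_continued = is_continued + 1
    return is_continued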
Then:
is_continued = dataframe.groupby(['ID']).apply(mygroup)
I expected it to give 4 for ID1 and 4 for ID2, but it does not:
ID
ID1 4
ID2 0
dtype: int64
So my questions are:
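I have very little experience with pandas, so (1) I can't answer your first question, and (2) as for your second question, this is probably not how an expert would do it. But I offer it as one possible solution: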
import pandas as pd

def combine_rows(df):
    # Columns of the merged result, built up row by row.
    ids = []
    group = []
    start = []
    end = []
    prev_row = None  # first row of the run currently being merged
    for index, row in df.iterrows():
        if prev_row is None:
            prev_row = row
            prev_end = prev_row.End
            continue
        if row.ID == prev_row.ID and row.Group == prev_row.Group and row.Start == prev_end:
            # Row continues the current run: extend its End.
            prev_end = row.End
        else:
            # Row starts a new run: store the finished one.
            ids.append(prev_row.ID)
            group.append(prev_row.Group)
            start.append(prev_row.Start)
            end.append(prev_end)
            prev_row = row
            prev_end = prev_row.End
    # Store the last run, if the frame was not empty.
    if prev_row is not None:
        ids.append(prev_row.ID)
        group.append(prev_row.Group)
        start.append(prev_row.Start)
        end.append(prev_end)
    return pd.DataFrame({"ID": ids, "Group": group, "Start": start, "End": end})

df = pd.DataFrame({
    "ID": ['ID1', 'ID1', 'ID1', 'ID1', 'ID1', 'ID2', 'ID2', 'ID2', 'ID2', 'ID2'],
    "Group": ['A', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'B'],
    "Start": [1, 2, 3, 4, 5, 6, 7, 8, 9, 11],
    "End": [2, 3, 4, 5, 6, 7, 8, 9, 10, 12]
})
print(df)
df = combine_rows(df)
print(df)
This prints:
    ID Group  Start  End
0  ID1     A      1    2
1  ID1     A      2    3
2  ID1     A      3    4
3  ID1     B      4    5
4  ID1     B      5    6
5  ID2     A      6    7
6  ID2     A      7    8
7  ID2     B      8    9
8  ID2     B      9   10
9  ID2     B     11   12
    ID Group  Start  End
0  ID1     A      1    4
1  ID1     B      4    6
2  ID2     A      6    8
3  ID2     B      8   10
4  ID2     B     11   12
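The same merge can also be sketched without an explicit Python loop. This is a swapped-in technique, not the approach used in the answer above: it labels each run of connected rows with a cumulative sum and aggregates per run. It assumes, as the loop does, that the rows of each (ID, Group) pair are contiguous and ordered by Start. The names `prev_end`, `new_run`, `run_id`, and `merged` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ['ID1', 'ID1', 'ID1', 'ID1', 'ID1', 'ID2', 'ID2', 'ID2', 'ID2', 'ID2'],
    "Group": ['A', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'B'],
    "Start": [1, 2, 3, 4, 5, 6, 7, 8, 9, 11],
    "End": [2, 3, 4, 5, 6, 7, 8, 9, 10, 12],
})

# Previous row's End within each (ID, Group) pair; NaN for the pair's first row.
prev_end = df.groupby(["ID", "Group"])["End"].shift()

# A new run starts wherever a row does not continue the previous row.
new_run = df["Start"].ne(prev_end)

# The running count of run starts gives every run its own label.
run_id = new_run.cumsum().rename("run")

# Collapse each run to one row: first ID, Group and Start, last End.
merged = (
    df.groupby(run_id)
      .agg(ID=("ID", "first"), Group=("Group", "first"),
           Start=("Start", "first"), End=("End", "last"))
      .reset_index(drop=True)
)
print(merged)
```

On the sample data this yields the same five rows as `combine_rows`. If the input is not already ordered, sorting by ID, Group and Start first would be needed for the run labels to be meaningful.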