Python: iterate over a DataFrame and combine values from different columns
I have two DataFrames.
Support data:
import pandas as pd

support_data = {
    'index_value': [
        100,
        250,
        500,
        30,
        10
    ]
}
support_df = pd.DataFrame(support_data)
   index_value
0          100
1          250
2          500
3           30
4           10
Main data:
data = {
    'link_index': [
        '0', '0',
        '0', '1',
        '2', '3',
        '3', '4',
        '4', '4'
    ],
    'value_1': [
        '1', '2',
        '3', '4',
        '5', '6',
        '7', '8',
        '9', '0'
    ],
    'value_2': [
        '11', '28',
        '33', '40',
        '50', '60',
        '70', '80',
        '90', '100'
    ]
}
df = pd.DataFrame(data)
  link_index value_1 value_2
0          0       1      11
1          0       2      28
2          0       3      33
3          1       4      40
4          2       5      50
5          3       6      60
6          3       7      70
7          4       8      80
8          4       9      90
9          4       0     100
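I need to slice the DataFrame into overlapping windows, zip value_1 and value_2 within each window, and append the value from the support DataFrame that link_index points to.
I have found a solution, but it is slow. Maybe a faster one exists.
My solution and result:
A function that zips the values and appends the value from the support DataFrame: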
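def write(group):
    value_1 = group.value_1.tolist()
    value_2 = group.value_2.tolist()
    # Interleave the two columns row by row: v1[0], v2[0], v1[1], v2[1], ...
    result = [b for a in zip(value_1, value_2) for b in a]
    # Use the link_index of the first row in the slice as a position in support_df
    index = group.link_index.astype(int).iloc[0]
    result.append(support_df.index_value.iloc[index])
    # Join everything into one comma-separated string
    result = ','.join(str(e) for e in result)
    return result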
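To make the zip step concrete, here is a minimal sketch of what the comprehension in write does for a single two-row slice. The literal lists and the hard-coded 100 stand in for the first window of df and its support value; they are only illustrative.

```python
# Values of the first window (rows 0 and 1 of df), kept as strings like in df
value_1 = ['1', '2']
value_2 = ['11', '28']

# zip pairs the columns row by row; the nested comprehension flattens the pairs
interleaved = [b for a in zip(value_1, value_2) for b in a]
print(interleaved)  # ['1', '11', '2', '28']

# Append the support value (100 for link_index 0) and join into one string
interleaved.append(100)
seq = ','.join(str(e) for e in interleaved)
print(seq)  # 1,11,2,28,100
```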
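Loop over the DataFrame in slices of length nrows with a step of nrows - overlap:
overlap = 1
nrows = 2
rows = []
for i in range(0, len(df) - overlap, nrows - overlap):
    rows.append(write(df.iloc[i : i + nrows]))
# DataFrame.append was removed in pandas 2.0, so build the frame once from the list
result = pd.DataFrame({'seq': rows})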
Result:
seq
0 1,11,2,28,100
1 2,28,3,33,100
2 3,33,4,40,100
3 4,40,5,50,250
4 5,50,6,60,500
5 6,60,7,70,30
6 7,70,8,80,30
7 8,80,9,90,10
8 9,90,0,100,10
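As a side note on the loop above, the range(0, len(df) - overlap, nrows - overlap) pattern can be checked without pandas. The helper below is hypothetical (window_bounds is not part of the original code); it only lists the (start, stop) positions the loop would pass to iloc, clipping the stop the same way iloc does.

```python
def window_bounds(n_rows_total, nrows, overlap):
    """Return the (start, stop) pairs that the slicing loop would visit."""
    step = nrows - overlap
    return [
        (i, min(i + nrows, n_rows_total))  # iloc silently clips past the end
        for i in range(0, n_rows_total - overlap, step)
    ]

# The case from the question: 10 rows, windows of 2, overlapping by 1
print(window_bounds(10, 2, 1))

# A different setting: windows of 3 overlapping by 1
print(window_bounds(10, 3, 1))
```

With nrows = 3 and overlap = 1 the last pair is (8, 10), so the final slice holds only two rows. Any faster rewrite has to decide whether to keep or drop such a short trailing window.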
I am looking for a faster solution.
You can try this (I have not compared the speed, but it does not involve any for loops):
Output:
0 [1, 11, 2, 28, 100]
1 [2, 28, 3, 33, 100]
2 [3, 33, 4, 40, 100]
3 [4, 40, 5, 50, 250]
4 [5, 50, 6, 60, 500]
5 [6, 60, 7, 70, 30]
6 [7, 70, 8, 80, 30]
7 [8, 80, 9, 90, 10]
8 [9, 90, 0, 100, 10]
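One loop-free way to get lists like the ones above is sketched below. It is not necessarily the code this answer used. It assumes NumPy 1.20 or later (for sliding_window_view), a window step of 1 (overlap == nrows - 1), and that the string columns can be cast to int. The names pairs, windows, flat and combined are made up for the illustration.

```python
import numpy as np
import pandas as pd

support_df = pd.DataFrame({'index_value': [100, 250, 500, 30, 10]})
df = pd.DataFrame({
    'link_index': ['0', '0', '0', '1', '2', '3', '3', '4', '4', '4'],
    'value_1': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '0'],
    'value_2': ['11', '28', '33', '40', '50', '60', '70', '80', '90', '100'],
})

nrows = 2  # window length; this sketch assumes the step between windows is 1

# One (value_1, value_2) pair per row: shape (len(df), 2)
pairs = df[['value_1', 'value_2']].astype(int).to_numpy()

# Every run of nrows consecutive rows: shape (n_windows, 2, nrows)
windows = np.lib.stride_tricks.sliding_window_view(pairs, nrows, axis=0)

# Reorder to (n_windows, nrows, 2) and flatten so each row reads v1, v2, v1, v2, ...
flat = windows.transpose(0, 2, 1).reshape(len(windows), -1)

# Positional lookup of the support value via the first row of each window
link = df['link_index'].astype(int).to_numpy()[:len(flat)]
support_vals = support_df['index_value'].to_numpy()[link]

combined = np.column_stack([flat, support_vals])

# Lists per window, plus the comma-separated form used in the question
as_lists = pd.Series(combined.tolist())
result = pd.DataFrame({'seq': [','.join(map(str, r)) for r in combined.tolist()]})
print(as_lists)
print(result)
```

The final join still iterates in Python, but only over the finished rows; the slicing and the lookup are done on whole arrays. Whether this is actually faster on real data would need to be measured.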