Pandas: append multiple rows into a single row
This is my DataFrame:
import pandas as pd

d = {'id': [1, 1, 2, 2, 3, 3, 3],
     'a_code': ['abc', 'abclm', 'pqr', 'pqren', 'lmn', 'lmnre', 'xyznt'],
     'a_type': ['CP', 'CO', 'CP', 'CO', 'CP', 'CP', 'CO'],
     'z_code': ['abclm', 'wedvg', 'pqren', 'unfdc', 'lmnre', 'wqrtn', 'hgbvcx'],
     'z_type': ['CO', 'CO', 'CO', 'CO', 'CP', 'CO', 'RT'],
     'stepNo': [1, 2, 1, 2, 1, 2, 3]}
df = pd.DataFrame(d)
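Each id has a sequence of consecutive path rows, ordered by stepNo. I want to print all the steps of an id on a single row so that I can visualize the path. stepNo varies between 2 and 24, so in some cases I could end up with 5x24 columns. Is it possible to do this?
Expected output: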
id  stepNo  a_code  a_type  z_code  z_type  stepNo  a_code  a_type  z_code  z_type  stepNo  a_code  a_type  z_code  z_type
1   1       abc     CP      abclm   CO      2       abclm   CO      wedvg   CO
2   1       pqr     CP      pqren   CO      2       pqren   CO      unfdc   CO
3   1       lmn     CP      lmnre   CP      2       lmnre   CP      wqrtn   CO      3       xyznt   CO      hgbvcx  RT
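Update:
@NYC Coder's solution fails on the example below. I would appreciate it if someone could help me figure this out: my DataFrame is very large, and all the other answers either time out or do not make clear how to get the required output.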
import pandas as pd

nan = ""
d = {'NAME': [1,1,2,2,3,3,3,3,3,4,4,4,4,4,4],
     'col1': ['P100','P100','P100','P100','MS','MS','MS','MS','MS','MS','MS','MS','MS','MS','MS'],
     'col2': ['CNMZ', 'CNMZ', 'COMX', 'COMX', '_NCTE', '_NCTE', '_NCTE', '_NCTE', '_NCTE',
              'T1MF', 'T1MF', 'T1MF', 'T1MF', 'T1MF', 'T1MF'],
     'stepNo': [1, 2, 1, 2, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6],
     'col4': ['xyz', 'abc', 'pqr', 'gvt', 'mno', 'tru', 'ercm', 'lotr', 'ddlj',
              'refv', 'ecv', 'ecv', 'ecv', 'ecv', 'ecv'],
     'col5': ['PHL', 'PHL', 'BHL', 'ALT', 'MRS', 'MRS', 'TUL', 'MRS', 'FAT',
              'PHL', 'PHL', 'JEN', 'FTW', 'AMB', 'KGP'],
     'col6': ['CP', 'CO', 'CP', 'CO', 'CP', 'CO', 'CO', 'CO', 'RT',
              'CO', 'CO', 'CO', 'CP', 'CO', 'CO'],
     'col7': ['PHL', 'PHL', 'ALT', 'ALT', 'MRS', 'TUL', 'MRS', 'FAT', 'FAH',
              'PHL', 'JEN', 'FTW', 'AMB', 'KGP', 'KGP'],
     'col8': ['CO', 'CO', 'CO', 'CO', 'CO', 'CO', 'CO', 'RT', 'CP',
              'CO', 'CO', 'CP', 'CO', 'CO', 'CO'],
     'col9': ['SID', 'M/M', 'SID', 'U/D', 'AL LO', 'AL LO', 'AL LO', 'AL LO', 'AL LO',
              'M/M', 'DCS', 'DCS', 'DCS', 'DCS', 'DCS'],
     'col10': ['SID', 'M/M', 'SID', 'U/D', 'AL LO', '3 M', '3 M', 'M/M', 'AL LO',
               'M/M', 'DCS', 'DCS', 'DCS', 'DCS', 'DCS'],
     'col11': [nan, 'ATM', nan, 'PACK', 'AL LP', 'DCS', 'DCS', 'DAM', 'DAM',
               'DCS', 'DCS', 'DCS', 'DCS', 'DCS', 'M/M'],
     'col12': [nan, 'SID', nan, 'PACK', 'CAL LO', 'DCS', 'DCS', 'M/M', 'CAL LO',
               'DCS', 'DCS', 'DCS', 'DCS', 'DCS', 'AL LO'],
     'col13': ['abc', '-02-1_', '-1', '-13_', nan, nan, nan, 'T1_VT1.', nan,
               '-06', nan, nan, nan, nan, '-03_02-03'],
     'col14': [nan, nan, nan, nan, '102/', '102/', '102/', nan, '101/',
               nan, '3405', '3102/', '3111/', '3102/', nan]}
df = pd.DataFrame(d)
I think you should use pivot_table, followed by sort_index:
# aggfunc=lambda x: x relies on each (id, stepNo) pair occurring exactly once
table = (pd.pivot_table(df, index=['id'],
                        values=['a_code', 'a_type', 'z_code', 'z_type'],
                        columns=['stepNo'], fill_value='', aggfunc=lambda x: x)
           .swaplevel(0, 1, axis=1)
           .sort_index(axis=1))
stepNo 1 2 ... 3
a_code a_type z_code z_type a_code ... z_type a_code a_type z_code z_type
id ...
1 abc CP abclm CO abclm ... CO
2 pqr CP pqren CO pqren ... CO
3 lmn CP lmnre CP lmnre ... CO xyznt CO hgbvcx RT
Or, without swapping the MultiIndex column levels:
table = (pd.pivot_table(df, index=['id'],
                        values=['a_code', 'a_type', 'z_code', 'z_type'],
                        columns=['stepNo'], fill_value='', aggfunc=lambda x: x)
           .sort_index(axis=1))
        a_code               a_type               z_code               z_type
stepNo  1      2      3      1      2      3      1      2      3      1      2      3
id
1       abc    abclm         CP     CO            abclm  wedvg         CO     CO
2       pqr    pqren         CP     CO            pqren  unfdc         CO     CO
3       lmn    lmnre  xyznt  CP     CP     CO     lmnre  wqrtn  hgbvcx CP     CO     RT
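If single-level column names are easier to work with than a MultiIndex, the pivoted table can be flattened afterwards. The following is only a sketch and not part of the answer above: it swaps in aggfunc='first' for the lambda (assuming each (id, stepNo) pair occurs once), and the naming scheme a_code_1, a_type_1, ... is my own choice.

```python
import pandas as pd

d = {'id': [1, 1, 2, 2, 3, 3, 3],
     'a_code': ['abc', 'abclm', 'pqr', 'pqren', 'lmn', 'lmnre', 'xyznt'],
     'a_type': ['CP', 'CO', 'CP', 'CO', 'CP', 'CP', 'CO'],
     'z_code': ['abclm', 'wedvg', 'pqren', 'unfdc', 'lmnre', 'wqrtn', 'hgbvcx'],
     'z_type': ['CO', 'CO', 'CO', 'CO', 'CP', 'CO', 'RT'],
     'stepNo': [1, 2, 1, 2, 1, 2, 3]}
df = pd.DataFrame(d)

# Assumption: one row per (id, stepNo), so 'first' simply picks that row's value.
table = (pd.pivot_table(df, index='id', columns='stepNo',
                        values=['a_code', 'a_type', 'z_code', 'z_type'],
                        aggfunc='first', fill_value='')
           .swaplevel(0, 1, axis=1)   # columns become (stepNo, field)
           .sort_index(axis=1))       # group the fields of each step together

# Collapse each (stepNo, field) pair into one flat name such as 'a_code_1'.
table.columns = [f'{field}_{step}' for step, field in table.columns]
flat = table.reset_index()
print(flat)
```

The result has one id column followed by the four fields of step 1, then step 2, and so on; ids with fewer steps get empty strings in the trailing columns.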
You can do it like this:
import pandas as pd

d = {'id': [1, 1, 2, 2, 3, 3, 3],
     'a_code': ['abc', 'abclm', 'pqr', 'pqren', 'lmn', 'lmnre', 'xyznt'],
     'a_type': ['CP', 'CO', 'CP', 'CO', 'CP', 'CP', 'CO'],
     'z_code': ['abclm', 'wedvg', 'pqren', 'unfdc', 'lmnre', 'wqrtn', 'hgbvcx'],
     'z_type': ['CO', 'CO', 'CO', 'CO', 'CP', 'CO', 'RT'],
     'stepNo': [1, 2, 1, 2, 1, 2, 3]}
df = pd.DataFrame(d)

# Collect one sub-frame per step number, then place them side by side.
dfs = []
for i in range(min(df['stepNo']), max(df['stepNo']) + 1):
    dfs.append(df[df['stepNo'] == i].reset_index())
dfx = pd.concat(dfs, axis=1)
dfx.drop(inplace=True, columns=['index'])
print(dfx)
id a_code a_type z_code z_type stepNo id a_code a_type z_code z_type stepNo id a_code a_type z_code z_type stepNo
0 1 abc CP abclm CO 1 1 abclm CO wedvg CO 2 3.0 xyznt CO hgbvcx RT 3.0
1 2 pqr CP pqren CO 1 2 pqren CO unfdc CO 2 NaN NaN NaN NaN NaN NaN
2 3 lmn CP lmnre CP 1 3 lmnre CP wqrtn CO 2 NaN NaN NaN NaN NaN NaN
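In the output above, the step-3 row of id 3 lands on the first row because the sub-frames are aligned by position rather than by id. A possible variation, sketched here under the assumption that each id has at most one row per stepNo, indexes every sub-frame by id before concatenating, which also leaves a single id column. The `_1`, `_2`, ... suffixes are my own naming.

```python
import pandas as pd

d = {'id': [1, 1, 2, 2, 3, 3, 3],
     'a_code': ['abc', 'abclm', 'pqr', 'pqren', 'lmn', 'lmnre', 'xyznt'],
     'a_type': ['CP', 'CO', 'CP', 'CO', 'CP', 'CP', 'CO'],
     'z_code': ['abclm', 'wedvg', 'pqren', 'unfdc', 'lmnre', 'wqrtn', 'hgbvcx'],
     'z_type': ['CO', 'CO', 'CO', 'CO', 'CP', 'CO', 'RT'],
     'stepNo': [1, 2, 1, 2, 1, 2, 3]}
df = pd.DataFrame(d)

steps = sorted(df['stepNo'].unique())

# One block per step, indexed by id so that concat aligns rows on id
# instead of on row position; the suffix keeps column names unique.
blocks = [df[df['stepNo'] == s].set_index('id').add_suffix(f'_{s}')
          for s in steps]

wide = pd.concat(blocks, axis=1).rename_axis('id').reset_index()
print(wide)
```

Ids that lack a given step get NaN in that step's columns, so stepNo_3 is shown as a float here.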
If you don't want to use the MultiIndex approach suggested by @Gio above, I think this should work; remember to rename the column headers at each step:
import pandas as pd
import numpy as np

d = {'id': [1, 1, 2, 2, 3, 3, 3],
     'a_code': ['abc', 'abclm', 'pqr', 'pqren', 'lmn', 'lmnre', 'xyznt'],
     'a_type': ['CP', 'CO', 'CP', 'CO', 'CP', 'CP', 'CO'],
     'z_code': ['abclm', 'wedvg', 'pqren', 'unfdc', 'lmnre', 'wqrtn', 'hgbvcx'],
     'z_type': ['CO', 'CO', 'CO', 'CO', 'CP', 'CO', 'RT'],
     'stepNo': [1, 2, 1, 2, 1, 2, 3]}
df = pd.DataFrame(d)

# one row per id, one column per step number (0 where an id has no such step)
hstackedDf = pd.pivot_table(
    df, index=['id'], values=['stepNo'],
    aggfunc=lambda x: (np.hstack(x.values.ravel()).astype(str)).tolist()
).stepNo.apply(pd.Series).fillna(0).reset_index()

# number of columns (id + steps), so the code adapts to any number of steps
noOfSteps = len(hstackedDf.columns)

# loop over steps
for i in range(1, noOfSteps):
    # rename to a unique step column name
    hstackedDf = hstackedDf.rename(columns={hstackedDf.columns[i]: 'stepNo_' + str(i)})
    # convert to integer
    hstackedDf['stepNo_' + str(i)] = hstackedDf['stepNo_' + str(i)].astype(int)
    # merge the rest of the data for the current step
    hstackedDf = hstackedDf.merge(df, how='left', left_on=['id', 'stepNo_' + str(i)], right_on=['id', 'stepNo'])
    # drop the stepNo column brought in by the merge
    hstackedDf = hstackedDf.drop(['stepNo'], axis=1)
    # rename the data columns to the specific step number
    hstackedDf = hstackedDf.rename(columns={'a_code': 'a_code_' + str(i), 'a_type': 'a_type_' + str(i), 'z_code': 'z_code_' + str(i), 'z_type': 'z_type_' + str(i)})

# create a list of per-step column lists
stepsColumns = list()
for i in range(1, noOfSteps):
    # endswith avoids matching e.g. '_10' when looking for '_1'
    indList = [c for c in hstackedDf if c.endswith('_' + str(i))]
    stepsColumns.append(indList)

# convert to one flat list
flat_list = list()
for sublist in stepsColumns:
    for item in sublist:
        flat_list.append(item)

# add the id column
flat_list.insert(0, 'id')

# reorder the output DataFrame
outputDF = hstackedDf[flat_list]
print(outputDF)
id stepNo_1 a_code_1 a_type_1 ... a_code_3 a_type_3 z_code_3 z_type_3
1 1 abc CP ... NaN NaN NaN NaN
2 1 pqr CP ... NaN NaN NaN NaN
3 1 lmn CP ... xyznt CO hgbvcx RT
This isn't the best answer, but here is a way to do it with less code, although I wish I could avoid the list format:
df2 = df.groupby(['id', 'stepNo']).agg(list)
df3 = df2.unstack(level=-1, fill_value='')
        a_code                     a_type                     z_code                     z_type
stepNo  1        2        3        1        2        3        1        2        3        1        2        3
id
1       [abc]    [abclm]           [CP]     [CO]              [abclm]  [wedvg]           [CO]     [CO]
2       [pqr]    [pqren]           [CP]     [CO]              [pqren]  [unfdc]           [CO]     [CO]
3       [lmn]    [lmnre]  [xyznt]  [CP]     [CP]     [CO]     [lmnre]  [wqrtn]  [hgbvcx] [CP]     [CO]     [RT]
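One way the list format might be avoided is to aggregate with first() instead of list. This is a sketch rather than the answerer's code, and it assumes each (id, stepNo) pair appears only once, so taking the first value loses nothing.

```python
import pandas as pd

d = {'id': [1, 1, 2, 2, 3, 3, 3],
     'a_code': ['abc', 'abclm', 'pqr', 'pqren', 'lmn', 'lmnre', 'xyznt'],
     'a_type': ['CP', 'CO', 'CP', 'CO', 'CP', 'CP', 'CO'],
     'z_code': ['abclm', 'wedvg', 'pqren', 'unfdc', 'lmnre', 'wqrtn', 'hgbvcx'],
     'z_type': ['CO', 'CO', 'CO', 'CO', 'CP', 'CO', 'RT'],
     'stepNo': [1, 2, 1, 2, 1, 2, 3]}
df = pd.DataFrame(d)

# first() returns the scalar itself, so cells hold 'abc' rather than ['abc'].
df2 = df.groupby(['id', 'stepNo']).first()

# Move stepNo from the row index into the columns; missing steps become ''.
df3 = df2.unstack(level=-1, fill_value='')
print(df3)
```

The column layout is the same (field, stepNo) MultiIndex as in the output above, only with plain strings in the cells.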
Can I keep only the first id column?
I have updated the question with a sample dataset on which your solution fails. Can you check?
I don't understand, what is failing? Even with the new data I see the correct output. Can you post your output on the new dataset?
Please check this question.
Thanks for the solution, but my DataFrame has more than 16 columns and up to 82 steps. It is a trace path, so I want the rows lined up one after another, which reduces the extra scrolling when I only have 5-6 steps.
This solution takes a very long time to run. I ran it for almost 1 hour and then had to shut down the kernel. This is the shape of my DataFrame: (2768734 rows × 16 columns).
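For a frame with many columns, a vectorized reshape may be worth trying before a Python loop or a per-group lambda. The helper below is a hypothetical sketch (the name `widen` and the `col_step` naming are mine), assuming every (key, step) pair is unique; it has not been benchmarked on the 2768734-row frame mentioned above. The sample is a truncated subset of the updated dataset from the question.

```python
import pandas as pd

def widen(df, key, step):
    """Return one row per `key`, with the columns of each step side by side."""
    value_cols = [c for c in df.columns if c not in (key, step)]
    # unstack raises ValueError if a (key, step) pair is duplicated
    wide = df.set_index([key, step])[value_cols].unstack(step, fill_value='')
    # Order columns step by step, keeping the original column order per step.
    steps = sorted(wide.columns.get_level_values(step).unique())
    ordered = pd.MultiIndex.from_product([steps, value_cols])
    wide = wide.swaplevel(0, 1, axis=1).reindex(columns=ordered)
    wide.columns = [f'{col}_{s}' for s, col in wide.columns]
    return wide.reset_index()

# Truncated subset of the updated dataset (first three NAME values, two columns)
sample = pd.DataFrame({
    'NAME':   [1, 1, 2, 2, 3, 3, 3],
    'stepNo': [1, 2, 1, 2, 1, 2, 3],
    'col4':   ['xyz', 'abc', 'pqr', 'gvt', 'mno', 'tru', 'ercm'],
    'col5':   ['PHL', 'PHL', 'BHL', 'ALT', 'MRS', 'MRS', 'TUL'],
})

w = widen(sample, 'NAME', 'stepNo')
print(w)
```

Because the steps are laid out left to right, a path with only 5-6 steps has its data at the start of the row and empty strings after it.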