Pandas: append multiple rows into a single row

Here is my DataFrame:

import pandas as pd

d = {'id': [1,1,2,2,3,3,3] ,
'a_code': ['abc', 'abclm', 'pqr', 'pqren', 'lmn', 'lmnre', 'xyznt'], 
'a_type':['CP','CO','CP','CO','CP','CP','CO'],
'z_code': ['abclm', 'wedvg', 'pqren', 'unfdc', 'lmnre','wqrtn','hgbvcx'],
'z_type': ['CO', 'CO', 'CO','CO','CP','CO','RT'],
'stepNo': [1,2,1,2,1,2,3]
}

df= pd.DataFrame(d)

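Each id has a sequence of path rows ordered by stepNo. I want to print all of an id's steps on a single row so that I can visualize the path. stepNo ranges from 2 to 24, so in some cases I could end up with 5x24 columns. Is this possible?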

Desired output:

id   stepNo  a_code  a_type   z_code   z_type     stepNo    a_code   a_type   z_code   z_type    stepNo    a_code   a_type   z_code   z_type

 1     1      abc     CP     abclm      CO         2       abclm     CO      wedvg     CO
 2     1      pqr     CP     pqren      CO         2       pqren     CO      unfdc     CO        
 3     1      lmn     CP     lmnre      CP         2       lmnre     CP      wqrtn     CO           3      xyznt     CO      hgbvcx     RT

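Update:

@NYC Coder's solution fails on the example below. I would appreciate it if someone could help me figure this out: my DataFrame is high-dimensional, so all the other answers either time out or do not make clear how to get the required output.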

nan = ""
d = {'NAME': [1,1,2,2,3,3,3,3,3,4,4,4,4,4,4],
 'col1': ['P100','P100','P100','P100','MS','MS','MS','MS','MS','MS','MS','MS','MS','MS','MS'],
 'col2': ['CNMZ',
  'CNMZ',
  'COMX',
  'COMX',
  '_NCTE',
  '_NCTE',
  '_NCTE',
  '_NCTE',
  '_NCTE',
  'T1MF',
  'T1MF',
  'T1MF',
  'T1MF',
  'T1MF',
  'T1MF'],
 'stepNo': [1, 2, 1, 2, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6],
 'col4': ['xyz',
  'abc',
  'pqr',
  'gvt',
  'mno',
  'tru',
  'ercm',
  'lotr',
  'ddlj',
  'refv',
  'ecv',
  'ecv',
  'ecv',
  'ecv',
  'ecv'],
 'col5': ['PHL',
  'PHL',
  'BHL',
  'ALT',
  'MRS',
  'MRS',
  'TUL',
  'MRS',
  'FAT',
  'PHL',
  'PHL',
  'JEN',
  'FTW',
  'AMB',
  'KGP'],
 'col6': ['CP',
  'CO',
  'CP',
  'CO',
  'CP',
  'CO',
  'CO',
  'CO',
  'RT',
  'CO',
  'CO',
  'CO',
  'CP',
  'CO',
  'CO'],
 'col7': ['PHL',
  'PHL',
  'ALT',
  'ALT',
  'MRS',
  'TUL',
  'MRS',
  'FAT',
  'FAH',
  'PHL',
  'JEN',
  'FTW',
  'AMB',
  'KGP',
  'KGP'],
 'col8': ['CO',
  'CO',
  'CO',
  'CO',
  'CO',
  'CO',
  'CO',
  'RT',
  'CP',
  'CO',
  'CO',
  'CP',
  'CO',
  'CO',
  'CO'],
 'col9': ['SID',
  'M/M',
  'SID',
  'U/D',
  'AL LO',
  'AL LO',
  'AL LO',
  'AL LO',
  'AL LO',
  'M/M',
  'DCS',
  'DCS',
  'DCS',
  'DCS',
  'DCS'],
 'col10': ['SID',
  'M/M',
  'SID',
  'U/D',
  'AL LO',
  '3 M',
  '3 M',
  'M/M',
  'AL LO',
  'M/M',
  'DCS',
  'DCS',
  'DCS',
  'DCS',
  'DCS'],
 'col11': [nan,
  'ATM',
  nan,
  'PACK',
  'AL LP',
  'DCS',
  'DCS',
  'DAM',
  'DAM',
  'DCS',
  'DCS',
  'DCS',
  'DCS',
  'DCS',
  'M/M'],
 'col12': [nan,
  'SID',
  nan,
  'PACK',
  'CAL LO',
  'DCS',
  'DCS',
  'M/M',
  'CAL LO',
  'DCS',
  'DCS',
  'DCS',
  'DCS',
  'DCS',
  'AL LO'],
 'col13': ['abc',
  '-02-1_',
  '-1',
  '-13_',
  nan,
  nan,
  nan,
  'T1_VT1.',
  nan,
  '-06',
  nan,
  nan,
  nan,
  nan,
  '-03_02-03'],
 'col14': [nan,
  nan,
  nan,
  nan,
  '102/',
  '102/',
  '102/',
  nan,
  '101/',
  nan,
  '3405',
  '3102/',
  '3111/',
  '3102/',
  nan]}
df = pd.DataFrame(d)

I think you should use pivot_table and then sort_index:

table=pd.pivot_table(df, index = ['id'],values = ['a_code','a_type','z_code','z_type'],
                    columns = ['stepNo'], fill_value = '', aggfunc = lambda x: x).swaplevel(0, 1, axis=1).sort_index(axis=1) 

stepNo      1                           2  ...             3                      
       a_code a_type z_code z_type a_code  ... z_type a_code a_type  z_code z_type
id                                         ...                                    
1         abc     CP  abclm     CO  abclm  ...     CO                             
2         pqr     CP  pqren     CO  pqren  ...     CO                             
3         lmn     CP  lmnre     CP  lmnre  ...     CO  xyznt     CO  hgbvcx     RT

Or, without swapping the MultiIndex column levels:

table=pd.pivot_table(df, index = ['id'],values = ['a_code','a_type','z_code','z_type'],
                    columns = ['stepNo'], fill_value = '', aggfunc = lambda x: x).sort_index(axis=1)

       a_code               a_type         z_code                z_type        
stepNo      1      2      3      1   2   3      1      2       3      1   2   3
id                                                                             
1         abc  abclm            CP  CO      abclm  wedvg             CO  CO    
2         pqr  pqren            CP  CO      pqren  unfdc             CO  CO    
3         lmn  lmnre  xyznt     CP  CP  CO  lmnre  wqrtn  hgbvcx     CP  CO  RT
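
If single-level column names are easier to read than a MultiIndex, the pivoted columns can be flattened afterwards. The following is only a sketch under the assumption that every (id, stepNo) pair occurs once, as in the sample data; it uses DataFrame.pivot instead of pivot_table, and the `name_step` column naming and the variable `wide` are illustrative choices, not part of the original answer.

```python
import pandas as pd

d = {'id': [1, 1, 2, 2, 3, 3, 3],
     'a_code': ['abc', 'abclm', 'pqr', 'pqren', 'lmn', 'lmnre', 'xyznt'],
     'a_type': ['CP', 'CO', 'CP', 'CO', 'CP', 'CP', 'CO'],
     'z_code': ['abclm', 'wedvg', 'pqren', 'unfdc', 'lmnre', 'wqrtn', 'hgbvcx'],
     'z_type': ['CO', 'CO', 'CO', 'CO', 'CP', 'CO', 'RT'],
     'stepNo': [1, 2, 1, 2, 1, 2, 3]}
df = pd.DataFrame(d)

# Reshape without aggregating; pivot raises if an (id, stepNo) pair repeats
wide = df.pivot(index='id', columns='stepNo')

# Move stepNo to the outer column level so each step's columns sit together
wide = wide.swaplevel(0, 1, axis=1).sort_index(axis=1)

# Flatten (stepNo, name) pairs into names such as 'a_code_1' (hypothetical naming)
wide.columns = [f"{name}_{step}" for step, name in wide.columns]

# Blank out steps that an id does not have and turn id back into a column
wide = wide.fillna('').reset_index()
print(wide)
```

Because pivot does no aggregation, it fails loudly on duplicate (id, stepNo) pairs rather than silently combining them.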

You can do it like this:

import pandas as pd

d = {'id': [1,1,2,2,3,3,3] ,
'a_code': ['abc', 'abclm', 'pqr', 'pqren', 'lmn', 'lmnre', 'xyznt'], 
'a_type':['CP','CO','CP','CO','CP','CP','CO'],
'z_code': ['abclm', 'wedvg', 'pqren', 'unfdc', 'lmnre','wqrtn','hgbvcx'],
'z_type': ['CO', 'CO', 'CO','CO','CP','CO','RT'],
'stepNo': [1,2,1,2,1,2,3]
}
df = pd.DataFrame(d)
dfs = []
for i in range(min(df['stepNo']), max(df['stepNo'])+1):
    dfs.append(df[df['stepNo']==i].reset_index())
dfx = pd.concat(dfs, axis=1)
dfx.drop(inplace=True, columns=['index'])
print(dfx)

   id a_code a_type z_code z_type  stepNo  id a_code a_type z_code z_type  stepNo   id a_code a_type  z_code z_type  stepNo
0   1    abc     CP  abclm     CO       1   1  abclm     CO  wedvg     CO       2  3.0  xyznt     CO  hgbvcx     RT     3.0
1   2    pqr     CP  pqren     CO       1   2  pqren     CO  unfdc     CO       2  NaN    NaN    NaN     NaN    NaN     NaN
2   3    lmn     CP  lmnre     CP       1   3  lmnre     CP  wqrtn     CO       2  NaN    NaN    NaN     NaN    NaN     NaN
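
In the output above the per-step frames are joined by row position, so the third step of id 3 lands on row 0 next to id 1. A possible variation, sketched here under the assumption that each id has at most one row per stepNo, is to index every slice by id before concatenating so that rows are aligned on id; the `keys=` labelling of each block is an illustrative choice.

```python
import pandas as pd

d = {'id': [1, 1, 2, 2, 3, 3, 3],
     'a_code': ['abc', 'abclm', 'pqr', 'pqren', 'lmn', 'lmnre', 'xyznt'],
     'a_type': ['CP', 'CO', 'CP', 'CO', 'CP', 'CP', 'CO'],
     'z_code': ['abclm', 'wedvg', 'pqren', 'unfdc', 'lmnre', 'wqrtn', 'hgbvcx'],
     'z_type': ['CO', 'CO', 'CO', 'CO', 'CP', 'CO', 'RT'],
     'stepNo': [1, 2, 1, 2, 1, 2, 3]}
df = pd.DataFrame(d)

steps = sorted(df['stepNo'].unique())

# Index each per-step slice by id so concat aligns rows on id, not on position
parts = [df[df['stepNo'] == s].set_index('id') for s in steps]

# keys= adds an outer column level holding the step number of each block
dfx = pd.concat(parts, axis=1, keys=steps)
print(dfx)
```

This also leaves a single id (the index) instead of one id column per step.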

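If you don't want to use the MultiIndex approach suggested by @Gio above, I think this should do it; keep in mind that the column headers have to be renamed for each step: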

import pandas as pd
import numpy as np

d = {'id': [1,1,2,2,3,3,3] ,
'a_code': ['abc', 'abclm', 'pqr', 'pqren', 'lmn', 'lmnre', 'xyznt'], 
'a_type':['CP','CO','CP','CO','CP','CP','CO'],
'z_code': ['abclm', 'wedvg', 'pqren', 'unfdc', 'lmnre','wqrtn','hgbvcx'],
'z_type': ['CO', 'CO', 'CO','CO','CP','CO','RT'],
'stepNo': [1,2,1,2,1,2,3]
}

df= pd.DataFrame(d)


hstackedDf =pd.pivot_table(df, index=['id'], aggfunc=lambda x: (np.hstack(x.values.ravel()).astype(str)).tolist(), values=['stepNo']).stepNo.apply(pd.Series).fillna(0).reset_index()
#get length of steps to use in the process for flexible number of steps
noOfSteps = len(hstackedDf.columns)

#loop over steps
for i in range(1,noOfSteps):
    #rename to unique step number
    hstackedDf = hstackedDf.rename(columns={ hstackedDf.columns[i]: 'stepNo_' + str(i)})
    #convert to integer
    hstackedDf['stepNo_' + str(i)] = hstackedDf['stepNo_' + str(i)].astype(int)
    #merge rest of data for the current step
    hstackedDf = hstackedDf.merge(df, how='left', left_on=['id', 'stepNo_' + str(i)], right_on=['id', 'stepNo'])
    #drop stepNo column
    hstackedDf = hstackedDf.drop(['stepNo'], axis=1)
    #rename data to specific step number
    hstackedDf = hstackedDf.rename(columns={ 'a_code': 'a_code_' + str(i), 'a_type': 'a_type_' + str(i), 'z_code': 'z_code_' + str(i), 'z_type': 'z_type_' + str(i)})

#create list of steps lists
stepsColumns = list()
for i in range(1,(noOfSteps)):
    #match on the suffix so that step 1 does not also pick up steps 10, 11, ...
    indList = [c for c in hstackedDf if c.endswith('_' + str(i))]
    stepsColumns.append(indList)
#convert to one flat list
flat_list = list()
for sublist in stepsColumns:
    for item in sublist:
        flat_list.append(item)
#add ID column
flat_list.insert(0, 'id')  
#reorder output dataframe   
outputDF = hstackedDf[flat_list]

print(outputDF)

id  stepNo_1 a_code_1 a_type_1  ... a_code_3 a_type_3  z_code_3 z_type_3
1         1      abc       CP  ...      NaN      NaN       NaN      NaN
2         1      pqr       CP  ...      NaN      NaN       NaN      NaN
3         1      lmn       CP  ...    xyznt       CO    hgbvcx       RT

This is not the best answer, but here is an example that gets it done with less code, although I wish I could avoid the list format:

df2 = df.groupby(['id', 'stepNo']).agg(list)
df3 = df2.unstack(level=-1, fill_value='')

                a_code     a_type      z_code     z_type
  stepNo    1   2   3   1   2   3   1   2   3   1   2   3
 id                                             
 1  [abc]   [abclm]         [CP]    [CO]    [abclm] [wedvg]     [CO]    [CO]    
 2  [pqr]   [pqren]         [CP]    [CO]    [pqren] [unfdc]     [CO]    [CO]    
 3  [lmn]   [lmnre] [xyznt] [CP]    [CP]    [CO]    [lmnre] [wqrtn] [hgbvcx]    [CP]    [CO]    [RT]
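
The lists in that output come from agg(list). One way to avoid them, sketched here under the assumption that each (id, stepNo) pair is unique (as it is in the sample data), is to skip the aggregation and unstack directly from an index built on those two columns.

```python
import pandas as pd

d = {'id': [1, 1, 2, 2, 3, 3, 3],
     'a_code': ['abc', 'abclm', 'pqr', 'pqren', 'lmn', 'lmnre', 'xyznt'],
     'a_type': ['CP', 'CO', 'CP', 'CO', 'CP', 'CP', 'CO'],
     'z_code': ['abclm', 'wedvg', 'pqren', 'unfdc', 'lmnre', 'wqrtn', 'hgbvcx'],
     'z_type': ['CO', 'CO', 'CO', 'CO', 'CP', 'CO', 'RT'],
     'stepNo': [1, 2, 1, 2, 1, 2, 3]}
df = pd.DataFrame(d)

# unstack needs a unique index, so check the assumption first
assert not df.duplicated(['id', 'stepNo']).any()

# No aggregation step, so the cells stay plain strings instead of lists
df3 = df.set_index(['id', 'stepNo']).unstack(level=-1, fill_value='')
print(df3)
```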

Can I keep only the first id column?

I have updated the question with a sample dataset on which your solution fails. Could you take a look?

I don't understand, what is failing? Even with the new data I see the correct output. Could you post your output on the new dataset?

Please take a look at this.

Thanks for the solution, but my DataFrame has more than 16 columns and up to 82 steps. It is a trace path, so I would like the rows laid out one after another; that reduces the extra scrolling in cases where I only have 5-6 steps.

This solution takes a very long time to run. I let it run for almost an hour and then had to kill the kernel. This is the shape of my DataFrame: 2768734 rows × 16 columns.