Python 3.x: adding columns based on a varying number of rows below

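I am working on a machine learning problem for a university project. As input I was given an Excel sheet. I need to access the information in the rows below certain rows (condition: df[c1] != 0) and use it to create new columns. However, the number of rows that follow each of these rows is not fixed.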

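I have tried various pandas approaches (for example, a while loop combined with iloc and iterrows), but nothing seems to work. Now I am wondering whether I need to write a function that creates a new df for each group of rows below a top element. I suspect there must be a better option. I am using Python 3.6 and pandas 0.25.0.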
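The "one group per top element" idea does not require an explicit loop or one DataFrame per group. A minimal sketch, assuming the sample columns `name`, `c1` and `c2` (the names `is_header`, `group` and `rows_below` are introduced here only for illustration): a cumulative sum over the `c1 != 0` mask gives every top row and the rows below it the same label, however many rows follow.

```python
import pandas as pd

# sample data in the shape described in the question
df = pd.DataFrame({
    "name": ["ab", "tz", "ka", "cd", "zz"],
    "c1": [1, 0, 0, 2, 0],
    "c2": ["info", "more info", "even more info", "info", "more info"],
})

# True on every "top" row (c1 != 0)
is_header = df["c1"] != 0

# running count of top rows: each top row and the rows below it share one label
group = is_header.cumsum()

# number of rows below each top row (this is the part that varies)
rows_below = (~is_header).groupby(group).sum()
```

With these labels in place, the rows of each block can be processed with `groupby` or reshaped without iterating row by row.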

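I am trying to get the following result.

Input: the table with the columns name, c1 and c2 shown further below. The output should look like this: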

Out[191]: 
  name  c1              c2              ka         tz         zz
0   ab   1            info  even more info  more info           
1   tz   0       more info                                      
2   ka   0  even more info                                      
3   cd   2            info                             more info
4   zz   0       more info                                      
You can do this as follows:

# make sure c1 is of type int (if it isn't already)
# if it is string, just change the comparison further below
df['c1']= df['c1'].astype('int32')

# create two temporary aux columns in the original dataframe
# the first contains 1 for each row where c1 is nonzero
df['nonzero']= (df['c1'] != 0).astype('int')
# the second contains a "group index" to give 
# all rows that belong together the same number
df['group']= df['nonzero'].cumsum()

# create a working copy from the original dataframe
df2= df[['c1', 'c2', 'group']].copy()
# add another column which contains the name of the
# column under which the text should appear
df2['col']= df['name'].where(df['nonzero']==0, 'c2')
# add a dummy column with all ones 
# (needed to merge the original dataframe 
# with the "transposed" dataframe later)
df2['nonzero']= 1

# now the main part
# use the prepared copy and index it on
# group, nonzero(1) and col
df3= df2[['group', 'nonzero', 'col', 'c2']].set_index(['group', 'nonzero', 'col'])
# unstack it, meaning col is "split off" to create a new column
# level (like pivoting), the rest remains in the index
df3= df3.unstack()

# now df3 has a multilevel column index
# to get rid of it and have regular column names
# just rename the columns and remove c2 which
# we get from the original dataframe
df3_names= ['{1}'.format(*tup) for tup in df3.columns]
df3.columns= df3_names
df3.drop(['c2'], axis='columns', inplace=True)

# df3 now contains the "transposed" infos from column c2
# which should appear in the row for which 'nonzero' contains 1
# to get this, use merge
result= df.merge(df3, left_on=['group', 'nonzero'], right_index=True, how='left')
# if you don't like the NaN values (for the rows with nonzero=0), use fillna
result.fillna('', inplace=True)
# remove the aux columns; the original c2 column from df is kept
# under its own name, so the result matches the desired output
result.drop(['group', 'nonzero'], axis='columns', inplace=True)
The result looks like this:

Out[191]: 
  name  c1              c2              ka         tz         zz
0   ab   1            info  even more info  more info           
1   tz   0       more info                                      
2   ka   0  even more info                                      
3   cd   2            info                             more info
4   zz   0       more info                                      
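The same result can also be reached with fewer intermediate columns. The following is only a sketch of an alternative, not the method used in the answer above: it replaces `set_index`/`unstack`/`merge` with `pivot` and an index-based `join`. It assumes that the first row is a top row (`c1 != 0`), that no `name` value repeats within one block (otherwise `pivot` raises a `ValueError`), and that no `name` value collides with an existing column name. The helper names `header_label`, `detail` and `wide` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["ab", "tz", "ka", "cd", "zz"],
    "c1": [1, 0, 0, 2, 0],
    "c2": ["info", "more info", "even more info", "info", "more info"],
})

is_header = df["c1"] != 0
group = is_header.cumsum()

# for every row, the index label of the top row that opens its block
header_label = df.index.to_series().groupby(group).transform("first")

# only the rows below a top row, tagged with the label of that top row
detail = df.loc[~is_header].assign(header=header_label[~is_header])

# one row per top row, one column per name found below it
wide = detail.pivot(index="header", columns="name", values="c2")

# wide is indexed by the top rows' labels, so a plain index join lines it up
result = df.join(wide).fillna("")
```

Because `wide` carries the index labels of the top rows, no auxiliary key columns have to be added to `df` and dropped again afterwards.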
For this input data:

Out[166]: 
  name  c1              c2
0   ab   1            info
1   tz   0       more info
2   ka   0  even more info
3   cd   2            info
4   zz   0       more info 

# created by the following code:
import io
import pandas as pd
raw="""  name c1         c2
0   ab  1       info
1   tz  0  more_info
2   ka  0  even_more_info
3   cd  2       info
4   zz  0  more_info"""

df= pd.read_csv(io.StringIO(raw), sep=r'\s+', index_col=0)
df['c2']=df['c2'].str.replace('_', ' ')

What is the row index of "more info"? Should the empty cells be filled with "" or with NaN?
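On the choice between "" and NaN, a small sketch may help; the two-row frame and the names `wide`, `text_cols` and `display` are made up for illustration. Keeping NaN leaves missing cells detectable with `isna()`, while filling only the text columns of a copy gives the blank-looking printout.

```python
import numpy as np
import pandas as pd

# hypothetical excerpt of a reshaped result with one missing cell
wide = pd.DataFrame({"name": ["ab", "tz"], "ka": ["even more info", np.nan]})

# with NaN, missing cells can still be found programmatically
missing_mask = wide["ka"].isna()

# for display, fill only the text columns of a copy with empty strings
text_cols = ["ka"]
display = wide.copy()
display[text_cols] = display[text_cols].fillna("")
```

Filling a copy keeps the NaN-based frame available for later computations.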