Python: Select columns with non-zero values that share an index in pandas, without loops

python pandas dataframe join

Does anyone know how to split a DataFrame into one whose columns do not overlap on any index, i.e. convert this DataFrame

    col1  col2  col3  col4
0     1     0     0     0
1     2     0     0     0
2     3     1     0     0
3     4     2     1     0
4     0     3     2     0
5     0     4     3     0
6     0     5     4     0
7     0     0     5     1
8     0     0     6     2
9     0     0     0     3

into the following:

   new_col1  new_col2  new_col3  new_col4  new_col5  new_col6
0         1         0         0         0         0         0
1         2         0         0         0         0         0
2         0         4         0         0         0         0
3         0         0         7         0         0         0
4         0         0         0         5         0         0
5         0         0         0         7         0         0
6         0         0         0         9         0         0
7         0         0         0         0         6         0
8         0         0         0         0         8         0
9         0         0         0         0         0         3

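in pandas, without any for loops. The idea is to merge the values of all columns that share an index (that is, are non-zero in the same row) into a new column, so that no two remaining columns share an index.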

Edit (rewritten): Here is one way to do it:

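import pandas as pd  # df is the DataFrame shown in the question

# Label each row with the names of its non-zero columns, then number
# the labels in order of first appearance (1, 2, 3, ...).
s = (df.groupby(df.ne(0)
                  .apply(lambda x: ','.join(df.columns[x].tolist()), axis=1))
       .cumcount().eq(0).cumsum())

# Sum each row, append the group number to the index, and unstack it
# into columns; positions with no value are filled with 0.
df_out = (df.sum(axis=1).to_frame().set_index(s, append=True)[0]
            .unstack(fill_value=0).add_prefix('new_col'))

df_out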

Output:

   new_col1  new_col2  new_col3  new_col4  new_col5  new_col6
0         1         0         0         0         0         0
1         2         0         0         0         0         0
2         0         4         0         0         0         0
3         0         0         7         0         0         0
4         0         0         0         5         0         0
5         0         0         0         7         0         0
6         0         0         0         9         0         0
7         0         0         0         0         6         0
8         0         0         0         0         8         0
9         0         0         0         0         0         3

Pseudo-logic:

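For each row, find the list of all columns with non-zero values. Group the rows by this list, and use cumcount and cumsum to create an incrementing value that goes up by one each time a new list appears. Add this incrementing value to the index with set_index(..., append=True), then unstack it to create the new columns.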
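The steps above can be sketched one at a time as follows. This is only an illustrative breakdown, not the answer's exact code: the intermediate names mask, pattern, group_id and row_sum are made up for readability, and the sample data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 0, 0, 0, 0, 0, 0],
    "col2": [0, 0, 1, 2, 3, 4, 5, 0, 0, 0],
    "col3": [0, 0, 0, 1, 2, 3, 4, 5, 6, 0],
    "col4": [0, 0, 0, 0, 0, 0, 0, 1, 2, 3],
})

# Step 1: label each row with the names of its non-zero columns.
mask = df.ne(0)
pattern = mask.apply(lambda row: ",".join(row.index[row.to_numpy()]), axis=1)

# Step 2: number the labels in order of first appearance.
# cumcount() is 0 on the first row of each label, so eq(0) marks those
# rows and cumsum() turns the marks into 1, 2, 3, ...
group_id = pattern.groupby(pattern).cumcount().eq(0).cumsum()

# Step 3: sum each row, attach group_id as a second index level,
# and unstack that level into columns (missing cells become 0).
row_sum = df.sum(axis=1)
row_sum.index = pd.MultiIndex.from_arrays([df.index, group_id])
df_out = row_sum.unstack(fill_value=0).add_prefix("new_col")

print(df_out)
```

Note that the numbering assumes rows with the same set of non-zero columns are contiguous, as they are in the sample data; a label that reappears later would not start a new column.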

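It is not consistent which columns get summed. For example: why does 7 end up in df.loc[3, 'new_col3'] while 5 ends up in df.loc[4, 'new_col4']?

Because 7 = 4 + 2 + 1 and 5 = 3 + 2: the sum of all elements in that row.

Yes, but why does one go into a new column and the other not?

Each column in the final matrix represents one non-overlapping region. That is why some columns contain more than one value.

These details are important for good communication. I am working hard to find the relationship. You, on the other hand, understand the relationship but are not helping us "get it". That is your responsibility as the asker.

This was very helpful! Thank you. I will study groupby more closely.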
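To make the point from the comments concrete, here is a small sketch that lists, for rows 3 and 4 of the question's data, which columns are non-zero and what the row sums to. The names nonzero_cols and summary are hypothetical helpers introduced only for this illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 0, 0, 0, 0, 0, 0],
    "col2": [0, 0, 1, 2, 3, 4, 5, 0, 0, 0],
    "col3": [0, 0, 0, 1, 2, 3, 4, 5, 6, 0],
    "col4": [0, 0, 0, 0, 0, 0, 0, 1, 2, 3],
})

# For every row: the set of non-zero columns (as a string) and the row sum.
nonzero_cols = df.ne(0).apply(
    lambda row: ",".join(row.index[row.to_numpy()]), axis=1
)
summary = pd.DataFrame({"nonzero_cols": nonzero_cols, "row_sum": df.sum(axis=1)})

# Rows 3 and 4 have different sets of non-zero columns, so they fall into
# different regions and therefore into different output columns.
print(summary.loc[[3, 4]])
```

Row 3 is non-zero in col1, col2 and col3, whereas rows 4 to 6 are all non-zero in col2 and col3 only, which is why those three rows end up together in one new column.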