Python: list of column names where value > X for a selected row


python python-3.x pandas


I have a DataFrame with 11 rows and 17604 columns. The number of rows may vary when I change the clustering.

    B42D2033/26 G02B27/2214 G02F1/133753    G02F1/133707    G02F1/1341  G02F1/1339  G02F1/133371    G02B6/005   C08G73/12   G02F1/1303  ... G06F17/30035    G06F21/629  B65B3/26    E04D13/00   G06F17/30952    G07C9/00912 F02C9/28    G06F17/28   G06F17/30964    G06F21/82
Cluster                                                                                 
C1  0.000000    1.000000    0.000000    0.000000    0.000000    1.000000    0.000000    0.000000    0.000000    0.000000    ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000
C10 0.000000    3.250000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000
C11 0.020619    1.149485    0.262887    0.829897    0.551546    1.030928    0.082474    1.175258    0.005155    0.216495    ... 0.005155    0.010309    0.005155    0.005155    0.005155    0.005155    0.005155    0.005155    0.005155    0.005155
C2  0.000000    1.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000
C3  0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000
C4  0.055556    13.500000   8.333333    24.555556   13.166667   26.666667   3.277778    4.222222    0.000000    2.388889    ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000
C5  0.000000    0.750000    0.000000    0.000000    0.000000    0.500000    0.000000    0.250000    0.000000    0.000000    ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000
C6  0.032258    3.451613    0.000000    0.000000    0.000000    0.387097    0.000000    0.064516    0.000000    0.000000    ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000
C7  0.000000    0.000000    0.250000    0.000000    0.000000    0.250000    0.000000    0.000000    0.000000    1.500000    ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000
C8  0.000000    0.076923    0.153846    0.346154    0.000000    0.884615    0.461538    0.192308    0.038462    0.076923    ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000
C9  0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000

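I want to generate a dictionary or a Series for each cluster based on the values in its columns. For example, all columns where the value = 0 might look like this in dictionary form:

{'C1': ['G02B27/2214', 'G02F1/1339']}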

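How can I generate a Series for each cluster row, containing the columns whose value equals "some value" or falls within some range of values?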
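To make the "some value or range of values" case concrete, here is a minimal sketch that walks the rows and keeps the column labels whose value lies in a closed interval. The small `df` and the helper name `columns_in_range` are illustrative assumptions, not part of the question; an exact match is simply the interval with equal bounds.

```python
import pandas as pd

# Hypothetical miniature of the frame in the question (3 clusters, 4 columns).
df = pd.DataFrame(
    {
        "B42D2033/26": [0.0, 0.0, 0.020619],
        "G02B27/2214": [1.0, 3.25, 1.149485],
        "G02F1/133753": [0.0, 0.0, 0.262887],
        "G02F1/1339": [1.0, 0.0, 1.030928],
    },
    index=pd.Index(["C1", "C10", "C11"], name="Cluster"),
)


def columns_in_range(frame, low, high):
    """Map each row label to the columns whose value lies in [low, high]."""
    return {
        label: row[row.between(low, high)].index.tolist()
        for label, row in frame.iterrows()
    }


in_range = columns_in_range(df, 1.0, 2.0)  # values between 1 and 2 inclusive
exact = columns_in_range(df, 0.0, 0.0)     # values exactly equal to 0
```

Unlike a groupby-based approach, this keeps clusters with no matching column in the result, mapped to an empty list. Iterating over rows is acceptable here because the frame has only a handful of rows, even though it has many columns.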

I did look at an existing answer, but that solution does not apply to all the columns in a row.

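Edit: I realized I can transpose df and do the following:

df_clusters.T[df_clusters.T['C1']>0]

This returns a df containing every row where 'C1' is greater than 0. I suppose I could drop the other cluster columns, but I don't think that is the best solution.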
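For a single cluster, the transpose can be avoided by selecting the row by label and filtering it directly. The sketch below assumes a small stand-in for `df_clusters` with made-up values; it also runs the transposed version from the edit so the two can be compared.

```python
import pandas as pd

# Hypothetical stand-in for df_clusters with two clusters and three columns.
df_clusters = pd.DataFrame(
    {
        "B42D2033/26": [0.0, 0.0],
        "G02B27/2214": [1.0, 3.25],
        "G02F1/1339": [1.0, 0.0],
    },
    index=pd.Index(["C1", "C10"], name="Cluster"),
)

# Select one row by label, then keep the column labels where the condition holds.
row = df_clusters.loc["C1"]
c1_columns = row[row > 0].index.tolist()

# Same labels via the transposed frame, as in the edit above.
via_transpose = df_clusters.T[df_clusters.T["C1"] > 0].index.tolist()
```

Both expressions give the same list of column names; the row-based form returns only the labels and carries no other cluster columns that would need dropping.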

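The idea is to get the positions of the values that match the condition with np.where, build a new DataFrame from the matching index and column labels, collect the column labels into a list per index value, and then convert to a dict:

import numpy as np
import pandas as pd

# i holds the row positions and c the column positions of the matches
i, c = np.where(df > 0)
d = pd.DataFrame({'a':df.index[i], 'b':df.columns[c]}).groupby('a')['b'].apply(list).to_dict()
print (d)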
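To show what np.where returns in this answer, the following sketch runs it on a tiny frame with made-up values and pairs the row positions with the column positions. The frame and the variable names `rows`, `cols` and `pairs` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical two-cluster frame; C3 has no value above zero.
df = pd.DataFrame(
    {
        "B42D2033/26": [0.0, 0.0],
        "G02B27/2214": [1.0, 0.0],
        "G02F1/1339": [1.0, 0.0],
    },
    index=pd.Index(["C1", "C3"], name="Cluster"),
)

# np.where on a 2-D boolean array returns two aligned arrays:
# row positions and column positions of every True cell.
rows, cols = np.where(df.to_numpy() > 0)

# Translate positions into labels; rows index df.index, cols index df.columns.
pairs = list(zip(df.index[rows], df.columns[cols]))

# Group the column labels by cluster label.
d = {}
for cluster, column in pairs:
    d.setdefault(cluster, []).append(column)
```

Note that a cluster with no matching cell (C3 here) does not appear in the resulting dict at all, because np.where only reports the positions that satisfy the condition.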

Another solution is to reshape with stack or melt, filter by boolean indexing or query, and finally build a list per cluster and convert to a dict:

s = df.stack()
d = s[s > 0].reset_index().groupby('Cluster')['level_1'].apply(list).to_dict()
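The melt and query variant mentioned above could look roughly like the sketch below. This is not the answer's original code; the sample frame and the explicit names passed as `var_name` and `value_name` are assumptions chosen so that the query string refers to a column that is known to exist.

```python
import pandas as pd

# Hypothetical miniature of the frame in the question.
df = pd.DataFrame(
    {
        "B42D2033/26": [0.0, 0.0, 0.020619],
        "G02B27/2214": [1.0, 3.25, 1.149485],
        "G02F1/1339": [1.0, 0.0, 1.030928],
    },
    index=pd.Index(["C1", "C10", "C11"], name="Cluster"),
)

# Reshape to long format: one row per (Cluster, column) pair.
# Naming the new columns explicitly avoids relying on melt's defaults.
long_df = df.reset_index().melt(
    id_vars="Cluster", var_name="column", value_name="value"
)

# Filter with query, then collect the column labels per cluster.
d = (
    long_df.query("value > 0")
    .groupby("Cluster")["column"]
    .apply(list)
    .to_dict()
)
```

With 17604 columns the long frame has one row per cell, so this costs more memory than the np.where approach; the gain is that the condition is a readable query string that is easy to change.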


Try:
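The first two solutions work well, but the last one raises a TypeError on the line .query('value > 0') (TypeError: '>' not supported between instances of 'str' and 'int').

@Britt - tested in pandas 0.24.2, but it is possible to set the new column names explicitly in the melt function; answer edited.

The updated code for solution 3 works now. Thanks @Jezrael

Your title says >0, your question says =0. Which case is it?

The exact case doesn't matter; it could be =0, >=1, etc.

I am using pandas version 0.24.2 and this line of code raises an error: IndexError: ('boolean index did not match indexed array along dimension 0; dimension is 17607 but corresponding boolean dimension is 17604', 'occurred at index C1')

Unfortunately, I don't have enough information about the DataFrame. Are all the values floats?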
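One way to answer the question about the value types is to list the non-numeric columns and coerce them before comparing. The sketch below is only a guess at one possible cause of the '>' TypeError (a column stored as strings); the frame `raw` is invented for illustration and says nothing about the asker's actual data.

```python
import pandas as pd

# Hypothetical frame where one column was read in as strings.
raw = pd.DataFrame(
    {"G02B27/2214": ["1.0", "3.25"], "G02F1/1339": [1.0, 0.0]},
    index=pd.Index(["C1", "C10"], name="Cluster"),
)

# Columns that are not numeric would break a comparison such as value > 0.
non_numeric = raw.select_dtypes(exclude="number").columns.tolist()

# Coerce every column to numbers; unparseable entries become NaN.
clean = raw.apply(pd.to_numeric, errors="coerce")

# After coercion the comparison is well defined for every cell.
mask = clean > 0
```

If `non_numeric` comes back empty, the values are already numeric and the error has a different cause.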