Python np.where on a pd.DataFrame to build a dictionary of non-zero labels

python numpy pandas network-programming

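I want to do a super fast nearest-neighbors thing. Right now I'm using networkx: I iterate over all G.nodes(), take S = set(G.neighbors(node)), and then call S.remove(node). That works well enough, but I want to get better at indexing and make better use of the data structures. If possible, I'd like to stop iterating altogether.

What I ultimately want is a dictionary object where each key is a root_node and each value is the set of that node's neighbors (excluding the root_node itself).

Here is what my graph and the DF_adj adjacency matrix look like:

When I run np.where(DF_adj == 1), the output is two arrays, like this: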
(array([ 0,  0,  0,  0,  0,  0,  1,  1,  1,  1,  1,  1,  1,  1,  1,  2,  2,
        3,  3,  3,  4,  4,  4,  5,  5,  5,  6,  6,  6,  7,  7,  7,  8,  8,
        8,  9,  9, 10, 10]), array([ 0,  1,  3,  4,  5,  7,  0,  1,  2,  3,  4,  6,  8,  9, 10,  1,  2,
        0,  1,  3,  0,  1,  4,  0,  5,  6,  1,  5,  6,  0,  7,  8,  1,  7,
        8,  1,  9,  1, 10]))

I looked at this, but it didn't quite help.

How can I use np.where on the whole pd.DataFrame to get this kind of output?
defaultdict(set,
            {'a': {'b', 'd', 'e', 'f', 'h'},
             'b': {'a', 'c', 'd', 'e', 'g', 'i', 'j', 'k'},
             'c': {'b'},
             'd': {'a', 'b'},
             'e': {'a', 'b'},
             'f': {'a', 'g'},
             'g': {'b', 'f'},
             'h': {'a', 'i'},
             'i': {'b', 'h'},
             'j': {'b'},
             'k': {'b'}})

You can do it with a dict comprehension. If df is:
   a  b  c  d  e  f  g  h  i  j  k
a  1  1  0  1  1  1  0  1  0  0  0
b  1  1  1  1  1  0  1  0  1  1  1
c  0  1  1  0  0  0  0  0  0  0  0
d  1  1  0  1  0  0  0  0  0  0  0
e  1  1  0  0  1  0  0  0  0  0  0
f  1  0  0  0  0  1  1  0  0  0  0
g  0  1  0  0  0  1  1  0  0  0  0
h  1  0  0  0  0  0  0  1  1  0  0
i  0  1  0  0  0  0  0  1  1  0  0
j  0  1  0  0  0  0  0  0  0  1  0
k  0  1  0  0  0  0  0  0  0  0  1

then {i: {j for j in df.index if df.loc[i, j] and i != j} for i in df.index} is:
{'j': {'b'},
 'e': {'a', 'b'},
 'g': {'b', 'f'},
 'k': {'b'},
 'a': {'b', 'd', 'e', 'f', 'h'},
 'c': {'b'},
 'i': {'b', 'h'},
 'f': {'a', 'g'},
 'b': {'a', 'c', 'd', 'e', 'g', 'i', 'j', 'k'},
 'd': {'a', 'b'},
 'h': {'a', 'i'}}
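
To make the comprehension easy to try, here is a minimal sketch on a smaller matrix. The 4-node frame `df_small` and its labels are made up for illustration and are not the asker's data; the sketch also assumes the index and columns carry the same labels and that the diagonal holds self-loops, as in the table above.

```python
import pandas as pd

# Hypothetical 4-node adjacency matrix with self-loops on the diagonal
labels = ["a", "b", "c", "d"]
df_small = pd.DataFrame(
    [[1, 1, 0, 1],
     [1, 1, 1, 0],
     [0, 1, 1, 0],
     [1, 0, 0, 1]],
    index=labels,
    columns=labels,
)

# For each row label i, collect every column label j whose cell is non-zero,
# skipping the node itself (i != j)
neighbors = {
    i: {j for j in df_small.columns if df_small.loc[i, j] and i != j}
    for i in df_small.index
}
print(neighbors)
```

Each `df_small.loc[i, j]` is a separate scalar lookup, so this does n * n lookups for n nodes, which is likely why the comprehension is the slower of the two approaches.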

Or, about 2x faster:
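import collections
import numpy as np

s = df.index  # row labels; assumed identical to the column labels
d = collections.defaultdict(set)
# np.where returns (row positions, column positions) of the cells equal to 1
for k, v in zip(*np.where(df.to_numpy() == 1)):
    if k != v:  # skip the diagonal so a node is not its own neighbor
        d[s[k]].add(s[v])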
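
For reference, the same idea can be written as a self-contained script that rebuilds the 11-node matrix shown earlier. This is only a sketch under some assumptions: the helper name `neighbors_from_adjacency` is made up here, the index and columns are assumed to carry the same labels in the same order, and the diagonal is filtered with a boolean mask before the loop instead of with an `if` inside it.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

# Rebuild the adjacency matrix from the answer, one string of 0/1 per row
labels = list("abcdefghijk")
rows = [
    "11011101000",
    "11111010111",
    "01100000000",
    "11010000000",
    "11001000000",
    "10000110000",
    "01000110000",
    "10000001100",
    "01000001100",
    "01000000010",
    "01000000001",
]
df = pd.DataFrame([[int(c) for c in row] for row in rows],
                  index=labels, columns=labels)


def neighbors_from_adjacency(adj):
    """Map each row label to the set of column labels whose cell equals 1."""
    row_labels = adj.index.to_numpy()
    col_labels = adj.columns.to_numpy()
    # Positions of all cells equal to 1, as parallel row/column arrays
    r, c = np.where(adj.to_numpy() == 1)
    # Drop diagonal positions (valid because index and columns match)
    keep = r != c
    result = defaultdict(set)
    for src, dst in zip(row_labels[r[keep]], col_labels[c[keep]]):
        result[src].add(dst)
    return result


d = neighbors_from_adjacency(df)
print(dict(d))
```

The loop over the non-zero cells remains, but it now runs once per edge rather than once per cell of the matrix.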

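You can't; np.where() doesn't return a dictionary.

I know, but I was thinking there might be a more efficient way than going through all the columns one by one.

1 loop, best of 3: 4.75 s per loop
1 loop, best of 3: 2.41 s per loop
Compared with my fastest one (nx.graph), yours is twice as fast. Thanks man, this really helps.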