Python: creating a weird edge matrix


I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame(columns = ['id', 'tag'])

df['id'] = (['1925782942580621034', '1925782942580621034',
   '1925782942580621034', '1925782942580621034',
   '1930659617975470678', '1930659617975470678',
   '1930659617975470678', '1930659617975470678',
   '1930659617975470678', '1930659617975470678',
   '1930659617975470678', '1930659617975470678',
   '1971229370376634911', '1971229370376634911',
   '1971229370376634911', '1971229370376634911',
   '1971229370376634911', '1971229370376634911',
   '1971229370376634911', '1971229370376634911',
   '1971229370376634911'])

df['tag'] = (['nintendo', 'cosmetic', 'pen', 'office supplies', 'holding',
   'person', 'hand', 'text', 'design', 'pen', 'office supplies',
   'cosmetic', 'tool', 'office supplies', 'weapon', 'indoor',
   'everyday carry', 'pen', 'knife', 'electronics', 'case'])

df
I would like to work on it to obtain a result like this:

df_wish = pd.DataFrame(columns = ['id_source', 'id_target', 'common_tags'])
where:

df_wish['id_source'] # the "id" currently being processed
df_wish['id_target'] # an "id" that has at least one "tag" in common with "id_source"
df_wish['common_tags'] # the number of "tag" values shared by "id_source" and "id_target"

Can you help me? Thank you very much.

If you do not have too many tags/ids, you can do a crosstab and broadcast:

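import numpy as np

# One row per id, one column per tag; each cell counts how often the id carries the tag
s = pd.crosstab(df['id'], df['tag'])
idx = s.index

s = s.values
# Compare every pair of rows tag by tag, then count the tags both ids carry
pd.DataFrame(np.sum(s[None,:] & s[:, None], axis=-1),
             index=idx, columns=idx)
Output: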

                       1925782942580621034    1930659617975470678    1971229370376634911
-------------------  ---------------------  ---------------------  ---------------------
1925782942580621034                      4                      3                      2
1930659617975470678                      3                      8                      2
1971229370376634911                      2                      2                      9
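One caveat about the `&` step: on integer arrays it is a bitwise AND, so the counts above are only right when every (id, tag) pair appears at most once (which holds for the sample data). The following is a minimal sketch, using hypothetical toy ids `a` and `b` that are not from the question, of how a repeated tag distorts the result and how converting to a boolean presence matrix first avoids it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: id "a" lists the tag "pen" twice.
df = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b"],
    "tag": ["pen", "pen", "case", "pen", "case"],
})

s = pd.crosstab(df["id"], df["tag"])
counts = s.values  # rows a, b; columns case, pen -> [[1, 2], [1, 1]]

# Bitwise AND on raw counts: 2 & 1 == 0, so the shared "pen" is lost for (a, b).
naive = np.sum(counts[None, :] & counts[:, None], axis=-1)

# Turn counts into presence flags first, then AND and count.
present = counts > 0
safe = np.sum(present[None, :] & present[:, None], axis=-1)

print(naive)  # [[3 1], [1 2]]
print(safe)   # [[2 2], [2 2]]
```

Note that both versions build an intermediate array of shape (ids, ids, tags), which is why this approach only suits small inputs.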

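How many tags/ids do you have? Also see the comments under my answer.

Something like 15k unique ids and about 100k unique tags. I have 32 GB of RAM and an i7 CPU. Thank you.

You can also just do
s = pd.crosstab(df['id'], df['tag']); s @ s.T

@jdehesa That is actually a nice solution. I just answered :) As for the full question, I suppose the complete requested output could be built with something like
df2 = (s @ s.T).unstack(); df2.name = 'common_tags'; df2.index.names = ['id_source', 'id_target']; out = df2.reset_index()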
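At the scale mentioned in the comments (about 15k ids and 100k tags), a dense crosstab would hold roughly 1.5 billion cells, which is about 12 GB if stored as 64-bit integers. As an alternative that is not from the answer above, here is a minimal sketch that joins the frame to itself on `tag` and counts the matches per id pair. It never builds the id-by-tag matrix and returns only pairs with at least one tag in common. Dropping self-pairs and duplicate rows is my assumption about what the requested output should contain.

```python
import pandas as pd

# Same data as in the question, written compactly.
tags_by_id = {
    "1925782942580621034": ["nintendo", "cosmetic", "pen", "office supplies"],
    "1930659617975470678": ["holding", "person", "hand", "text", "design",
                            "pen", "office supplies", "cosmetic"],
    "1971229370376634911": ["tool", "office supplies", "weapon", "indoor",
                            "everyday carry", "pen", "knife", "electronics",
                            "case"],
}
df = pd.DataFrame(
    [(i, t) for i, tags in tags_by_id.items() for t in tags],
    columns=["id", "tag"],
)

# Keep each (id, tag) pair once so a shared tag is counted a single time.
unique_rows = df.drop_duplicates()

# Self-join on tag: one row per (id_source, id_target, shared tag).
pairs = unique_rows.merge(unique_rows, on="tag",
                          suffixes=("_source", "_target"))

# Assumption: an id should not be listed as its own target.
pairs = pairs[pairs["id_source"] != pairs["id_target"]]

# Count the shared tags for every ordered pair of ids.
edges = (
    pairs.groupby(["id_source", "id_target"])
    .size()
    .reset_index(name="common_tags")
)
print(edges)
```

On the sample data this yields six rows, one per ordered pair, with counts 3, 2 and 2 that match the off-diagonal entries of the matrix above. The join can still grow large when a single tag is shared by very many ids, since a tag carried by k ids contributes k × k rows before grouping.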