Python: counting mutual followers in a relationship table with pandas


I have a pandas DataFrame that looks like this:

   from_user  to_user
0        123      456
1        894      135
2        179      890
3        456      123

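where each row contains two IDs and means that from_user follows to_user.

How can I use pandas to count the total number of mutual followers in the DataFrame?
In the example above, the answer should be 1 (users 123 and 456).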
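For anyone who wants to run the answers that follow, here is a minimal sketch that rebuilds the example table. The column names and values come from the question; nothing else is assumed.

```python
import pandas as pd

# Rebuild the example table; each row means from_user follows to_user.
df = pd.DataFrame(
    {
        "from_user": [123, 894, 179, 456],
        "to_user": [456, 135, 890, 123],
    }
)
print(df)
```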
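One approach is to use MultiIndex set operations:
In [11]: i1 = df.set_index(["from_user", "to_user"]).index

In [12]: i2 = df.set_index(["to_user", "from_user"]).index

In [13]: i1.intersection(i2).levels[0]
Out[13]: Int64Index([123, 456], dtype='int64')

Older pandas versions also accepted i1 & i2 here, but recent versions no longer treat & as a set intersection, so intersection is the safer spelling. Recent versions also print the result as Index rather than Int64Index.

To get the count, divide the length of this intersection by 2, since each mutual pair appears once per direction:
In [14]: len(i1.intersection(i2)) // 2
Out[14]: 1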
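The division by 2 quietly assumes that no row is repeated and that nobody follows themselves. As a rough sketch of how the same intersection idea could be wrapped up with those cases handled, here is a helper; the name count_mutual_pairs is hypothetical and not part of pandas.

```python
import pandas as pd


def count_mutual_pairs(df: pd.DataFrame) -> int:
    """Count unordered pairs (a, b) where a follows b and b follows a."""
    # Drop self-follows and repeated rows so each directed edge appears once.
    edges = df.loc[
        df["from_user"] != df["to_user"], ["from_user", "to_user"]
    ].drop_duplicates()
    forward = pd.MultiIndex.from_frame(edges)
    backward = pd.MultiIndex.from_frame(edges[["to_user", "from_user"]])
    # Each mutual pair shows up twice in the intersection, once per direction.
    return len(forward.intersection(backward)) // 2


example = pd.DataFrame(
    {"from_user": [123, 894, 179, 456], "to_user": [456, 135, 890, 123]}
)
print(count_mutual_pairs(example))
```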

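Another approach is to concatenate the two IDs of each row as strings, in sorted order, so that both directions of a pair produce the same key.
Then count how many times each key occurs:
# build an order-independent key per row: the two IDs sorted, then joined with a separator
# (sorting the characters of the joined string instead would conflate pairs such as 12/34 and 13/24)
df['concat'] = [
    '-'.join(map(str, sorted(pair))) for pair in zip(df.from_user, df.to_user)
]

# count the occurrences of each key and subtract 1
count = (df.groupby('concat').size() - 1).sum()

Out[64]: 1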
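The same order-independent key can also be built without strings. The sketch below is one possible variant, not the answerer's code: it uses numpy's element-wise minimum and maximum, and the helper name count_mutual_by_key and the lo/hi column names are made up for illustration.

```python
import numpy as np
import pandas as pd


def count_mutual_by_key(df: pd.DataFrame) -> int:
    """Count pairs that occur in both directions, keyed by (smaller ID, larger ID)."""
    keyed = pd.DataFrame(
        {
            "lo": np.minimum(df["from_user"], df["to_user"]),
            "hi": np.maximum(df["from_user"], df["to_user"]),
            "from_user": df["from_user"],
        }
    ).drop_duplicates()  # repeated directed edges count once
    # After de-duplication a key has two rows only if both directions exist.
    sizes = keyed.groupby(["lo", "hi"]).size()
    return int((sizes == 2).sum())


example = pd.DataFrame(
    {"from_user": [123, 894, 179, 456], "to_user": [456, 135, 890, 123]}
)
print(count_mutual_by_key(example))
```

Because the two IDs stay in separate columns, unrelated pairs such as 12/34 and 13/24 can never share a key.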

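Here is another, hackier approach:
(
    df.loc[df.to_user.isin(df.from_user)]  # keep rows whose to_user also appears somewhere as a from_user
      .assign(hacky=df.from_user * df.to_user)  # the product is the same for A->B and B->A
      .drop_duplicates(subset='hacky', keep='first')
      .drop(columns='hacky')
)

   from_user  to_user
0        123      456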

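The whole multiplication hack exists to make sure we don't return both 123-->456 and 456-->123, since both rows satisfy the condition we pass to loc.

This sounds like a network problem, and I'm not sure pandas is the best tool here.

You may be right. I'm using pandas as a learning exercise to solve some traditional SQL problems.

Wow. This is great and worked perfectly. I'll mark this as the answer in 2 minutes!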
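Since the comments mention SQL, the closest pandas analogue is probably a self-join. The following is only a sketch under that assumption, using DataFrame.merge against a copy of the table with its two columns swapped; the function name mutual_pairs is hypothetical.

```python
import pandas as pd


def mutual_pairs(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per pair of users who follow each other."""
    edges = df[["from_user", "to_user"]].drop_duplicates()
    # Swap the column names so every edge a -> b is listed as b -> a.
    reversed_edges = edges.rename(
        columns={"from_user": "to_user", "to_user": "from_user"}
    )
    # Inner join: an edge survives only if its reverse is also in the table.
    both_ways = edges.merge(reversed_edges, on=["from_user", "to_user"])
    # Keep one direction per pair; this also discards self-follows.
    return both_ways[both_ways["from_user"] < both_ways["to_user"]]


example = pd.DataFrame(
    {"from_user": [123, 894, 179, 456], "to_user": [456, 135, 890, 123]}
)
pairs = mutual_pairs(example)
print(len(pairs))
```

Unlike the product trick, this keeps the pair itself as the join key, so it does not depend on products of IDs being distinct.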