Python DataFrame: counting occurrences across two columns

python python-3.x pandas dataframe

I have two DataFrame columns:

data['IP']          data['domain']
10.20.30.40         example.org 
10.20.30.40         example.org
10.20.30.40         example.org
10.20.30.40         example.org
1.2.3.4             google.com
1.2.3.4             google.com
1.2.3.4             google.com
200.100.200.100     yahoo.com
200.100.200.100     yahoo.com
9.8.7.6             random.com

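I want to find an efficient way to count how many times each domain maps to the same IP address. Then, if the count is greater than two (2), I want to take those domains (unique values only) and move them to another DataFrame or column.

So the output might look like: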

[Occurences]    [To be processed]
4               example.org
4               google.com
4
4
3               
3
3

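I have tried different approaches, such as building a graph and computing the degree of each node, or using a pivot table to get the counts, but I am hoping for an efficient approach that lets me keep processing the domains after an `if occurrences > 2` check.

All of this should be done with Python pandas DataFrames.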

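The following groups by 'domain' and then calls value_counts on the 'IP' column; we then filter the result, reset the index, and give the columns more meaningful names: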

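In [58]:
# Count how often each (domain, IP) pair occurs
gp = df.groupby('domain')['IP'].value_counts()
# Keep pairs seen more than twice. Naming the index levels and the count
# column explicitly avoids relying on version-specific defaults such as 'level_1'.
df1 = gp[gp > 2].rename_axis(['domain', 'IP']).reset_index(name='Occurences')
df1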

Out[58]:
        domain           IP  Occurences
0  example.org  10.20.30.40           4
1   google.com      1.2.3.4           3
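
As a self-contained sketch, the sample data from the question can be rebuilt and the same count-and-filter step run end to end. The helper name `frequent_pairs` and its `threshold` parameter are assumptions made for illustration and are not part of the original answer. It groups on both columns with `size()` rather than `value_counts()`, which should give the same counts here. The last step shows one way to get the unique domains for further processing.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "IP": ["10.20.30.40"] * 4 + ["1.2.3.4"] * 3
          + ["200.100.200.100"] * 2 + ["9.8.7.6"],
    "domain": ["example.org"] * 4 + ["google.com"] * 3
              + ["yahoo.com"] * 2 + ["random.com"],
})


def frequent_pairs(frame, threshold=2):
    """Return (domain, IP) pairs that occur more than `threshold` times.

    Hypothetical helper, named here only for illustration.
    """
    counts = (
        frame.groupby(["domain", "IP"])
        .size()
        .reset_index(name="Occurences")
    )
    return counts[counts["Occurences"] > threshold].reset_index(drop=True)


df1 = frequent_pairs(df)
# Unique domains that passed the filter, ready for further processing
to_process = df1["domain"].unique().tolist()
print(df1)
print(to_process)
```

Keeping `to_process` as a plain list makes it easy to loop over the domains afterwards; it could just as well be stored in a new DataFrame column.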

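What if you want to keep more columns from the original DataFrame, but without counting their values?

gp = df.groupby('domain')['length'], ['ratio'], ['IP'].value_counts()

gives: AttributeError: 'list' object has no attribute 'value_counts'. Do you know the correct way to write this? I think

gp = df.groupby('domain').agg(pd.Series.value_counts)

should work.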
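
If the goal is to keep extra columns such as `length` and `ratio` alongside the count, one option (a sketch, not the commenter's suggestion) is to attach a per-row count with `groupby(...).transform("size")` and then filter. The `length` and `ratio` values below are made up for illustration, since the thread does not show them.

```python
import pandas as pd

# Hypothetical data: the 'length' and 'ratio' values are invented for this example
df = pd.DataFrame({
    "IP": ["10.20.30.40"] * 4 + ["1.2.3.4"] * 3 + ["9.8.7.6"],
    "domain": ["example.org"] * 4 + ["google.com"] * 3 + ["random.com"],
    "length": [11, 11, 11, 11, 10, 10, 10, 10],
    "ratio": [0.5, 0.5, 0.5, 0.5, 0.3, 0.3, 0.3, 0.7],
})

# Attach the (domain, IP) pair count to every row without collapsing the frame
df["Occurences"] = df.groupby(["domain", "IP"])["IP"].transform("size")

# Keep rows whose pair occurs more than twice, then one row per pair
result = (
    df[df["Occurences"] > 2]
    .drop_duplicates(subset=["domain", "IP"])
    .reset_index(drop=True)
)
print(result)
```

Because `transform` returns a value for every original row, the other columns are carried along untouched. `drop_duplicates` keeps the first row of each pair, which assumes the extra columns are constant within a pair or that any one row is representative.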