Python: count occurrences of row values from a given set


python pandas


I have a DataFrame similar to this:

     a   b   c   d   e
0   36  38  27  12  35
1   45  33   8  41  18
4   32  14   4  14   9
5   43   1  31  11   3
6   16   8   3  17  39
...

For each row, I want to count how many of its values appear in a given set.

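I came up with the following code (Python 3). It seems to work, but I'd like to make it more efficient, because my real DataFrame is much larger and more complex: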

import pandas as pd
import numpy as np

def column():
    return [np.random.randint(0,49) for _ in range(20)]

df = pd.DataFrame({'a': column(),'b': column(),'c': column(),'d': column(),'e': column()})

given_set = {3,8,11,18,22,24,35,36,42,47}

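def count_occurrences(row):
    # Count how many of the row's five values are members of given_set
    return sum(col in given_set for col in (row.a, row.b, row.c, row.d, row.e))

# Row-wise apply: calls the Python function once per row
df['count'] = df.apply(count_occurrences, axis=1)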

print(df)

Is there a way to get the same result with vectorized operations instead of a Python function?

Thanks in advance.

IIUC, you can use the following approach:

Data:

Solution:

In [44]: df['new'] = df.isin(given_set).sum(axis=1)

In [45]: df
Out[45]:
    a   b   c   d   e  new
0  36  38  27  12  35    2
1  45  33   8  41  18    2
4  32  14   4  14   9    0
5  43   1  31  11   3    2
6  16   8   3  17  39    2
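A self-contained version of the solution above, rebuilt on the five sample rows from the question (index 0, 1, 4, 5, 6). One addition worth flagging: it selects the value columns through a hypothetical `value_cols` list, so that re-running the line after `new` has been added does not also test the `new` column against the set.

```python
import pandas as pd

# Sample rows copied from the question (note the non-contiguous index)
df = pd.DataFrame(
    {
        "a": [36, 45, 32, 43, 16],
        "b": [38, 33, 14, 1, 8],
        "c": [27, 8, 4, 31, 3],
        "d": [12, 41, 14, 11, 17],
        "e": [35, 18, 9, 3, 39],
    },
    index=[0, 1, 4, 5, 6],
)

given_set = {3, 8, 11, 18, 22, 24, 35, 36, 42, 47}

# Hypothetical helper list: restrict the membership test to the original value columns
value_cols = ["a", "b", "c", "d", "e"]

# isin gives a boolean DataFrame; summing across columns counts the True cells per row
df["new"] = df[value_cols].isin(given_set).sum(axis=1)
print(df)
```

Because the column list is fixed, running the assignment a second time gives the same counts.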

Explanation:

In [49]: df.isin(given_set)
Out[49]:
       a      b      c      d      e
0   True  False  False  False   True
1  False  False   True  False   True
4  False  False  False  False  False
5  False  False  False   True   True
6  False   True   True  False  False

In [50]: df.isin(given_set).sum(axis=1)
Out[50]:
0    2
1    2
4    0
5    2
6    2
dtype: int64
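The same mask-then-sum idea can also be written one level lower with NumPy's `np.isin`. This is a sketch assuming the value columns are numeric. One caveat from the NumPy documentation: `np.isin` does not expand a Python `set` into its elements (the set becomes a single object), so the set is converted to a list first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": [36, 45, 32, 43, 16],
        "b": [38, 33, 14, 1, 8],
        "c": [27, 8, 4, 31, 3],
        "d": [12, 41, 14, 11, 17],
        "e": [35, 18, 9, 3, 39],
    },
    index=[0, 1, 4, 5, 6],
)
given_set = {3, 8, 11, 18, 22, 24, 35, 36, 42, 47}

# Boolean 2-D array: True where a cell's value is in the set (set passed as a list)
mask = np.isin(df.to_numpy(), list(given_set))

# Count matches per row and wrap the result back into a Series aligned on df's index
counts = pd.Series(mask.sum(axis=1), index=df.index)
print(counts)
```

In practice this should give the same numbers as `df.isin(given_set).sum(axis=1)`; the pandas version is usually the more readable choice.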

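Comments:

Or .any(axis=1) (with or without .astype(int)), if the OP only cares about presence and not the count.

@DSM The OP does care about the count; he/she says so in the question.

@NoticeMeSenpai: While my illiteracy sometimes gets me into trouble, that isn't the case here. :-) Sometimes people count because they think they need to, only to realize that counting is overkill for their purposes; it's useful to know that the full range of built-in reductions can be applied to a boolean DataFrame. Take it or leave it.

@DSM, that's a good reason, thanks! :-) I've updated the answer.

Update (thanks to the commenter above): if you want to check for presence rather than count occurrences, you can use .any(axis=1) instead of .sum(axis=1), either as a boolean Series or cast to integers: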


In [6]: df.isin(given_set).any(axis=1)
Out[6]:
0     True
1     True
4    False
5     True
6     True
dtype: bool

In [7]: df.isin(given_set).any(axis=1).astype(np.uint8)
Out[7]:
0    1
1    1
4    0
5    1
6    1
dtype: uint8
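To make the count-versus-presence distinction from the comments concrete, here is a sketch that computes both on the sample rows and cross-checks the vectorized count against the question's row-wise `count_occurrences` function (reproduced here). The variable names `counts` and `has_any` are illustrative, not from the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": [36, 45, 32, 43, 16],
        "b": [38, 33, 14, 1, 8],
        "c": [27, 8, 4, 31, 3],
        "d": [12, 41, 14, 11, 17],
        "e": [35, 18, 9, 3, 39],
    },
    index=[0, 1, 4, 5, 6],
)
given_set = {3, 8, 11, 18, 22, 24, 35, 36, 42, 47}

# Row-wise Python version from the question (slow, but a handy reference)
def count_occurrences(row):
    return sum(col in given_set for col in (row.a, row.b, row.c, row.d, row.e))

mask = df.isin(given_set)                # one boolean per cell
counts = mask.sum(axis=1)                # how many values per row are in the set
has_any = mask.any(axis=1).astype(int)   # 1 if at least one value is in the set, else 0

# The vectorized count should agree with the apply-based version
assert counts.tolist() == df.apply(count_occurrences, axis=1).tolist()
print(pd.DataFrame({"count": counts, "has_any": has_any}))
```

.any(axis=1) only answers "is there at least one match?", so it is the cheaper choice when the exact count is not needed.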