Python: checking a tuple against pandas DataFrame columns

python pandas

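I have a DataFrame df in which num_legs, num_wings and num_sample_seen are columns.

Now I have a tuple like ('num_wings', 'num_legs') and want to check whether all of its values are column names of df. If so, return true, otherwise return false.

('num_wings', 'num_legs') -> this returns true
('abc', 'num_legs') -> false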

You can simply check whether all elements of the tuple are contained in df.columns:

df = ...
def check(tup):
    return all((e in df.columns) for e in tup)
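To make the check concrete, here is a minimal, self-contained sketch. The DataFrame contents are made up for illustration (only the column names come from the question), and passing the frame as a second argument as well as the check_subset variant are assumptions of this sketch, not part of the original answer.

```python
import pandas as pd

# Hypothetical data; only the column names are taken from the question.
df = pd.DataFrame({
    "num_legs": [2, 4, 8, 0],
    "num_wings": [2, 0, 0, 0],
    "num_sample_seen": [10, 2, 1, 8],
})

def check(tup, frame):
    """Return True if every element of tup is a column label of frame."""
    return all(e in frame.columns for e in tup)

def check_subset(tup, frame):
    """The same test written as a set-inclusion check."""
    return set(tup).issubset(frame.columns)

print(check(("num_wings", "num_legs"), df))   # True
print(check(("abc", "num_legs"), df))         # False
print(check_subset(("abc", "num_legs"), df))  # False
```

Passing the frame explicitly avoids relying on a global df, which makes the helper easier to reuse and to test.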


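Performance comparison

@user3483203 proposed a very neat alternative solution using get_indexer, so I ran a timeit comparison of the two solutions.

import random
import string
import pandas as pd

def rnd_str(l):
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for i in range(l))

unique_strings = set(rnd_str(3) for i in range(20000))
# Convert to a list first: a set has no defined order.
cols = pd.Index(list(unique_strings))
tup = tuple(rnd_str(3) for i in range(5000))
%timeit all(cols.get_indexer(tup) > -1)
# 714 µs ± 12.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit all(e in cols for e in tup)
# 639 ns ± 0.988 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
###
tup = tuple(rnd_str(3) for i in range(10000))
%timeit all(cols.get_indexer(tup) > -1)
# 1.29 ms ± 29.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit all(e in cols for e in tup)
# 1.23 µs ± 20.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

It turns out that the solution proposed in this answer is much faster here. The main advantage of this approach is that all() exits early as soon as it finds a tuple element that is not in df.columns. In this benchmark the tuple consists of random strings, so a missing element is very likely to appear within the first few items, which is why the generator version finishes in about a microsecond regardless of the tuple length.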
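The early exit described above can be observed directly. The following is a small sketch, not part of the original answer: is_column and the checked list are hypothetical helpers that record which elements were actually tested before all() stopped.

```python
import pandas as pd

cols = pd.Index(["num_legs", "num_wings", "num_sample_seen"])

# Records every element that was actually tested for membership.
checked = []

def is_column(name):
    """Membership test that logs each call (hypothetical helper)."""
    checked.append(name)
    return name in cols

tup = ("abc", "num_legs", "num_wings")

# The generator is consumed lazily, so all() stops at the first False.
result = all(is_column(e) for e in tup)

print(result)   # False
print(checked)  # ['abc']
```

Only the first element is tested because it is already missing. If the generator were replaced with a list comprehension, every element would be tested before all() ran.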
You can use get_indexer here.

Performance

import numpy as np
import pandas as pd

cols = pd.Index(np.arange(10_000))
tup = tuple(np.arange(10_001))

%timeit all(cols.get_indexer(tup) > -1)
3.86 ms ± 87.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit all(e in cols for e in tup)
5.96 ms ± 69.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
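The get_indexer expression timed above can be wrapped in a helper, sketched below. Index.get_indexer returns the integer position of each requested label and -1 for labels that are not found, so the check passes only when no position is -1. The function name check_with_indexer and the empty example frame are assumptions made for illustration, and the sketch assumes the column labels are unique, which get_indexer expects.

```python
import pandas as pd

# Hypothetical empty frame; only the column names come from the question.
df = pd.DataFrame(columns=["num_legs", "num_wings", "num_sample_seen"])

def check_with_indexer(tup, frame):
    """Return True if every element of tup is a column label of frame."""
    # Position of each label in frame.columns, or -1 if it is missing.
    positions = frame.columns.get_indexer(list(tup))
    return bool((positions > -1).all())

print(df.columns.get_indexer(["num_wings", "abc"]).tolist())  # [1, -1]
print(check_with_indexer(("num_wings", "num_legs"), df))      # True
print(check_with_indexer(("abc", "num_legs"), df))            # False
```

Unlike the generator version, this looks up all labels in one vectorized call and cannot stop early. That fits the measurements above: it is faster when all or nearly all elements are present and slower when a missing element appears early.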

Why not iterate over each value in the tuple and check individually whether it exists in the DataFrame's columns?
def check_presence(tup):
    # Return False as soon as one element is not a column name.
    for x in tup:
        if x not in df.columns:
            return False
    # Only reached when every element was found among the columns.
    return True

check_presence(('num_wings', 'num_legs'))  # returns True
check_presence(('abc', 'num_legs'))  # returns False