Python: How to build nested conditional logic in a DataFrame?
I have a DataFrame in which each row contains a list of strings. I've written a function that performs a Bernoulli-type trial on each string; if the trial succeeds, the word is deleted (each word has a 0.5 probability of deletion here). See below:
import numpy as np
import pandas as pd

def bernoulli_trial(sublist, prob=0.5):
    # create mask of trial outcomes per each object in sublist
    mask = np.random.binomial(n=1, p=prob, size=len(sublist))
    # perform transformation on bernoulli successes
    transformed_sublist = [token for delete, token in zip(mask, sublist) if not delete]
    return transformed_sublist
This works as expected when I pass each row of the DataFrame, as follows:
df = pd.DataFrame(data={'store': [1, 2, 3],
                        'colours': [['red', 'blue', 'yellow', 'green', 'brown', 'pink'],
                                    ['black', 'white'],
                                    ['purple', 'orange', 'cyan', 'mauve']]})
df['colours'] = df['colours'].apply(bernoulli_trial)
Out:
0 [red, green]
1 [black]
2 [orange, cyan]
Name: colours, dtype: object
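The runs shown in this question give different results for the same data because `np.random.binomial` draws from NumPy's global random state. If reproducible output helps while debugging, one option (an assumption, not part of the original code) is to pass a seeded `numpy.random.Generator` into the function; `bernoulli_trial_rng` is a hypothetical name used only for this sketch:

```python
import numpy as np
import pandas as pd

# hypothetical variant of bernoulli_trial that takes an explicit Generator
def bernoulli_trial_rng(sublist, rng, prob=0.5):
    """Drop each token independently with probability `prob`, using the caller's Generator."""
    # one Bernoulli outcome per token: 1 = success = delete
    mask = rng.binomial(n=1, p=prob, size=len(sublist))
    return [token for delete, token in zip(mask, sublist) if not delete]

df = pd.DataFrame(data={'store': [1, 2, 3],
                        'colours': [['red', 'blue', 'yellow', 'green', 'brown', 'pink'],
                                    ['black', 'white'],
                                    ['purple', 'orange', 'cyan', 'mauve']]})

# Series.apply forwards extra keyword arguments (here `rng`) to the function;
# two Generators built from the same seed produce the same deletions
first = df['colours'].apply(bernoulli_trial_rng, rng=np.random.default_rng(42))
second = df['colours'].apply(bernoulli_trial_rng, rng=np.random.default_rng(42))
```

The same Generator object is shared across rows, so each row consumes the next draws from its stream; reseeding (or creating a new Generator with the same seed) replays the whole sequence.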
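However, instead of applying the function uniformly to every sublist and every string, I now want to apply two conditions: (a) whether a given sublist is passed to the function at all (yes/no), and (b) which strings within that sublist the trial is applied to (i.e., by specifying that I only want to test certain colours).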
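I think I have a working solution for part (a): wrap the Bernoulli function in another function that checks whether a given condition is met (e.g., does the sublist contain more than 2 objects?). This works (see below), but I'm not sure whether there is a more efficient (read: more Pythonic) way to do it.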
def sublist_condition_check(sublist):
    if len(sublist) > 2:
        sublist = bernoulli_trial(sublist)
    else:
        sublist = sublist
    return sublist
Note that any sublist that does not meet the condition should remain unchanged:
df['colours'].apply(sublist_condition_check)
Out:
0 [red, brown]
1 [black, white] # this sublist had only two elements so remains unchanged
2 [mauve]
Name: colours, dtype: object
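For part (a), the `else: sublist = sublist` branch does nothing, so the wrapper can collapse into a single conditional expression passed to `apply`. A minimal sketch, repeating the question's `bernoulli_trial` so the block runs on its own:

```python
import numpy as np
import pandas as pd

def bernoulli_trial(sublist, prob=0.5):
    # one Bernoulli outcome per token: 1 = success = delete
    mask = np.random.binomial(n=1, p=prob, size=len(sublist))
    return [token for delete, token in zip(mask, sublist) if not delete]

df = pd.DataFrame(data={'store': [1, 2, 3],
                        'colours': [['red', 'blue', 'yellow', 'green', 'brown', 'pink'],
                                    ['black', 'white'],
                                    ['purple', 'orange', 'cyan', 'mauve']]})

# conditional expression: run the trial only on sublists with more than two objects,
# otherwise return the sublist unchanged
result = df['colours'].apply(lambda s: bernoulli_trial(s) if len(s) > 2 else s)
```

Whether this is "more Pythonic" is partly taste: a named function such as `sublist_condition_check` is easier to test and reuse, while the lambda keeps the condition next to the call site. Neither is meaningfully faster, since both still call the function once per row.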
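However, I'm completely stuck on how to apply conditional logic to each word. Say I only want to apply the trial to a pre-specified list of colours ['red', 'mauve', 'black'], provided the sublist passes the sublist condition check. How would I do that?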
The pseudocode for what I'm hoping to achieve looks like this:
for sublist in df:
    if len(sublist) > 2:  # check if sublist contains more than two objects
        for colour in sublist:  # cycle through each colour within the sublist
            if colour in ['red', 'mauve', 'black']:
                colour = bernoulli_trial(colour)  # only run bernoulli if colour in list
            else:
                colour = colour  # if colour not in list, colour remains unchanged
    else:
        sublist = sublist  # if sublist <= 2, sublist remains unchanged
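I'm not sure about the etiquette of answering my own question, but I thought I'd share some details of the working solution I settled on, in case anyone runs into a similar situation.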
I extended the initial Bernoulli function with an if statement that checks whether each string meets the inclusion criteria:
import numpy as np

# internal function - bernoulli trial for each string in sublist
def bernoulli_trial(sublist, prob=0.50):
    # set token criteria for performing bernoulli trial
    token_criteria = ['red', 'black', 'purple']  # perform trial only on these strings
    # create mask of trial outcomes per each word in sublist
    mask = np.random.binomial(n=1, p=prob, size=len(sublist))
    # perform transformation (deletion) on bernoulli successes
    transformed_sublist = []
    for token, delete in zip(sublist, mask):
        if token not in token_criteria:
            # strings outside the criteria are always kept
            transformed_sublist.append(token)
        elif delete == 0:
            # retain only those criteria strings not marked for deletion
            transformed_sublist.append(token)
    return transformed_sublist
Combined with the sublist_condition_check function described in the question, this now works as expected.
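Both the question's wrapper and this answer call the function once per row through `apply`. An alternative that may scale better on large frames (not benchmarked here) is to explode the lists into one token per row, express both conditions as boolean masks, and regroup. A minimal sketch, assuming pandas 0.25 or later (for `Series.explode`) and that tokens are never NaN; `vectorised_trial`, `min_len` and `seed` are hypothetical names chosen for illustration:

```python
import numpy as np
import pandas as pd

def vectorised_trial(colours, targets, prob=0.5, min_len=3, seed=None):
    """Delete target tokens with probability `prob`, but only in sublists with >= min_len tokens."""
    rng = np.random.default_rng(seed)
    exploded = colours.explode()       # one row per token; the original index is repeated
    lengths = colours.apply(len)       # token count of each original sublist
    # (a) sublist-level condition, broadcast to every token of that sublist
    eligible = lengths.loc[exploded.index].to_numpy() >= min_len
    # (b) token-level condition
    is_target = exploded.isin(targets).to_numpy()
    # one Bernoulli outcome per token (True = delete)
    deleted = rng.binomial(n=1, p=prob, size=len(exploded)).astype(bool)
    keep = ~(eligible & is_target & deleted)
    # dropna() removes the NaN placeholder that explode() emits for empty lists
    kept = exploded[keep].dropna()
    regrouped = kept.groupby(level=0).agg(list)
    # rows whose tokens were all deleted are missing from regrouped, so default them to []
    return pd.Series([regrouped.get(i, []) for i in colours.index],
                     index=colours.index, name=colours.name)
```

For example, `vectorised_trial(df['colours'], ['red', 'mauve', 'black'])` returns a Series aligned with `df`, so it can be assigned straight back to `df['colours']`. The row-wise version is simpler to read; this one only pays off if profiling shows the per-row calls dominate.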