Python: compare each row against a dictionary of lists and append new variables to a DataFrame
I want to check each row of a string column in a pandas DataFrame and append new columns, where each new column is 1 if any element of the corresponding list in a dictionary of lists is found in the text column. For example:
# Data
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'text': ['This sentence may contain reference.',
                            'Orange, blue cow',
                            'Does the cow operate any heavy machinery?']},
                  columns=['id', 'text'])
# Rule dictionary
rule_dict = {'rule1': ['Does', 'the'],
             'rule2': ['Sentence', 'contain'],
             'rule3': ['any', 'reference', 'words']}
# List of variable names to be appended to df
rule_list = ['has_rule1','has_rule2','has_rule3']
# Current for loop
for Key in rule_dict:
    for i in rule_list:
        df[i] = df.text.apply(lambda x: (
            1 if any(ele in x for ele in rule_dict[Key]) == 1 and (len(str(x)) >= 3)
            else 0))
# Current output, looks to be returning a 1 if text is found in ANY of the lists
df = pd.DataFrame({'id': [1, 2, 3],
                   'text': ['This sentence may contain reference.',
                            'Orange, blue cow',
                            'Does the cow operate any heavy machinery?'],
                   'has_rule1': [1, 1, 1],
                   'has_rule2': [0, 0, 0],
                   'has_rule3': [1, 1, 1]},
                  columns=['id', 'text', 'has_rule1', 'has_rule2', 'has_rule3'])
# Anticipated output
df = pd.DataFrame({'id': [1, 2, 3],
                   'text': ['This sentence may contain reference.',
                            'Orange, blue cow',
                            'Does the cow operate any heavy machinery?'],
                   'has_rule1': [0, 0, 1],
                   'has_rule2': [1, 0, 0],
                   'has_rule3': [1, 0, 1]},
                  columns=['id', 'text', 'has_rule1', 'has_rule2', 'has_rule3'])
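Assuming you have fixed the dict comprehension issue mentioned in the comments, you should not use nested for loops. Instead, use a single for loop with zip, so that each rule is paired with exactly one output column: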
for (k, v), n in zip(rule_dict.items(), rule_list):
    # Group the alternatives so that \b applies to every word, not only the first and last
    pat = rf'\b(?:{"|".join(v)})\b'
    df[n] = df.text.str.contains(pat).astype(int)
Output:
    id  text                                       has_rule1  has_rule2  has_rule3
--  --  -----------------------------------------  ---------  ---------  ---------
 0   1  This sentence may contain reference.               0          1          1
 1   2  Orange, blue cow                                   0          0          0
 2   3  Does the cow operate any heavy machinery?          1          0          1
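To see why the nested loops in the question misbehave, the following sketch runs both loop shapes side by side on the question's data. The helper `has_any` and the frame names `nested` and `paired` are introduced here for illustration only; the check is the plain substring test from the question, not the regex from the answer, and the result for the nested loops assumes Python 3.7 or later, where the dict is iterated in insertion order.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'text': ['This sentence may contain reference.',
                            'Orange, blue cow',
                            'Does the cow operate any heavy machinery?']})
rule_dict = {'rule1': ['Does', 'the'],
             'rule2': ['Sentence', 'contain'],
             'rule3': ['any', 'reference', 'words']}
rule_list = ['has_rule1', 'has_rule2', 'has_rule3']


def has_any(text, words):
    """Return 1 if any word occurs as a substring of text, else 0 (illustrative helper)."""
    return int(any(w in text for w in words))


# Nested loops: for every key, ALL three columns are rewritten,
# so after the last key every column reflects only that key's words.
nested = df.copy()
for key in rule_dict:
    for col in rule_list:
        nested[col] = nested['text'].apply(lambda x: has_any(x, rule_dict[key]))

# Paired loop: each column is written exactly once, from its own rule.
paired = df.copy()
for (key, words), col in zip(rule_dict.items(), rule_list):
    paired[col] = paired['text'].apply(lambda x: has_any(x, words))

print(nested[rule_list])
print(paired[rule_list])
```

In the nested version all three columns end up identical, because the inner loop overwrites every column on each pass of the outer loop. The paired version reproduces the anticipated output from the question.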
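Note that when you iterate over the input rule dict, its ordering is not guaranteed (this applies to Python versions before 3.7; from 3.7 onward, dicts preserve insertion order). That is, you don't know whether the keys will come out as 'rule1', 'rule2', 'rule3' or 'rule2', 'rule3', 'rule1', and so on.

@Quanghaang I was not aware of that, thank you. I assumed it was index-based. How can I make sure it keeps the order I specified (i.e. 'rule1', 'rule2', 'rule3')?

One way is to make rule_list a dictionary as well, {'rule1': 'has_rule1', ...}, and access both by key. Another way is to use [...]. But that also depends on the names of the rules.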
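The first suggestion in the comments, keeping the column names in a dictionary keyed by rule name, could look roughly like the sketch below. The name `col_names` is an assumption made for this example, and `re.escape` is added so that rule words containing regex metacharacters are matched literally; neither appears in the original thread.

```python
import re

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'text': ['This sentence may contain reference.',
                            'Orange, blue cow',
                            'Does the cow operate any heavy machinery?']})
rule_dict = {'rule1': ['Does', 'the'],
             'rule2': ['Sentence', 'contain'],
             'rule3': ['any', 'reference', 'words']}

# Hypothetical mapping from rule name to output column name.
col_names = {'rule1': 'has_rule1',
             'rule2': 'has_rule2',
             'rule3': 'has_rule3'}

for rule, col in col_names.items():
    # Look the words up by key, so the pairing never depends on iteration order.
    words = rule_dict[rule]
    pat = r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'
    df[col] = df['text'].str.contains(pat).astype(int)

print(df)
```

Because each column is tied to its rule by key, the result stays the same even if the two dictionaries were built in different orders.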