Optional values in the column to be tested with .isin() (python)
Consider two dataframes:
import pandas as pd
df1 = pd.DataFrame(['apple and banana are sweet fruits','how fresh is the banana','cherry from japan'],columns=['fruits_names'])
df2 = pd.DataFrame([['apple','red'],['banana','yellow'],['cherry','black']],columns=['fruits','colors'])
Then with the code:
colors = []
for f in df1.fruits_names.str.split().apply(set):  # convert each sentence into a set of its words
    color = [df2[df2['fruits'].isin(f)]['colors']]  # colors of the matching fruits, wrapped in a list
    colors.append(color)
I can easily insert the colors into df1:
df1['color'] = colors
Output:
fruits_names color
0 apple and banana are sweet fruits [[red, yellow]]
1 how fresh is the banana [[yellow]]
2 cherry from japan [[black]]
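The [[...]] in this output comes from wrapping the filtered Series in a list. A minimal sketch (still limited to single-word fruit names) that stores flat lists instead, selecting with .loc and a boolean mask and converting with .tolist(); the names color_lists and mask are illustrative, not from the question:

```python
import pandas as pd

df1 = pd.DataFrame(['apple and banana are sweet fruits', 'how fresh is the banana', 'cherry from japan'],
                   columns=['fruits_names'])
df2 = pd.DataFrame([['apple', 'red'], ['banana', 'yellow'], ['cherry', 'black']],
                   columns=['fruits', 'colors'])

color_lists = []
for words in df1['fruits_names'].str.split().apply(set):
    # Boolean mask of df2 rows whose fruit is one of the sentence's words
    mask = df2['fruits'].isin(words)
    # .tolist() gives a plain list such as ['red', 'yellow'] instead of a Series
    color_lists.append(df2.loc[mask, 'colors'].tolist())

df1['color'] = color_lists
print(df1)
```

The colors in each list follow the row order of df2, not the order of the words in the sentence.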
The problem is when the column "fruits" has alternative values, such as:
df2 = pd.DataFrame([[['green apple|opal apple'],'red'],[['banana|cavendish banana'],'yellow'],['cherry','black']],columns=['fruits','colors'])
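How can I keep this code working?
The last thing I tried was creating a new column containing the split fruit values:
df2['Types'] = df2['fruits'].str.split('|')
and .apply(tuple) here:
But it doesn't match.
I think you need: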
print(df1)
fruits_names
0 green apple and banana are sweet fruits
1 how fresh is the banana
2 cherry and opal apple from japan
Use split and df.explode():
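The split-and-explode code itself is missing from this answer. A minimal sketch that should reproduce the output below, assuming df2 holds plain '|'-separated strings (not the one-element lists from the question's df2) and pandas 0.25 or later, where DataFrame.explode was added:

```python
import pandas as pd

df2 = pd.DataFrame([['green apple|opal apple', 'red'],
                    ['banana|cavendish banana', 'yellow'],
                    ['cherry', 'black']],
                   columns=['fruits', 'colors'])

# Turn each '|'-separated string into a list of alternative names
df2['fruits'] = df2['fruits'].str.split('|')
# Give every alternative its own row; the original index label is repeated
df2 = df2.explode('fruits')
print(df2)
```

If the column holds one-element lists as in the question, df2['fruits'].str[0] extracts the string before splitting.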
Output:
fruits colors
0 green apple red
0 opal apple red
1 banana yellow
1 cavendish banana yellow
2 cherry black
Convert it to a dict:
d = {i:j for i,j in zip(df2["fruits"].values, df2["colors"].values)}
Create the column based on the condition:
df1["colors"] = [[v for k,v in d.items() if k in x] for x in df1["fruits_names"]]
print(df1)
Final output:
fruits_names colors
0 green apple and banana are sweet fruits [red, yellow]
1 how fresh is the banana [yellow]
2 cherry and opal apple from japan [red, black]
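Putting this answer's steps together, a self-contained sketch on the same data; dict(zip(...)) is used here in place of the equivalent dict comprehension, and the intermediate name exploded is illustrative:

```python
import pandas as pd

df1 = pd.DataFrame(['green apple and banana are sweet fruits',
                    'how fresh is the banana',
                    'cherry and opal apple from japan'],
                   columns=['fruits_names'])
df2 = pd.DataFrame([['green apple|opal apple', 'red'],
                    ['banana|cavendish banana', 'yellow'],
                    ['cherry', 'black']],
                   columns=['fruits', 'colors'])

# One row per alternative fruit name
exploded = df2.assign(fruits=df2['fruits'].str.split('|')).explode('fruits')
# Map each fruit name to its color
d = dict(zip(exploded['fruits'], exploded['colors']))
# Keep the color of every fruit name that occurs as a substring of the sentence
df1['colors'] = [[color for fruit, color in d.items() if fruit in sentence]
                 for sentence in df1['fruits_names']]
print(df1)
```

Because the dict keys follow the row order of the exploded df2, the colors in each list follow that order too.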
Hi, try this. You can customize it further with comprehensions over the data structures.
import pandas as pd
import numpy as np
df1 = pd.DataFrame(['green apple and banana are sweet fruits','how fresh is the banana','cherry from japan'],columns=['fruits_names'])
df2 = pd.DataFrame([['green apple|opal apple','red'],['banana|cavendish banana','yellow'],['cherry','black']],columns=['fruits','colors'])
# np.where keeps the split list for every truthy (non-empty) 'fruits' string
df2['sep_colors'] = np.where(df2['fruits'], (df2['fruits'].str.split(pat='|')), df2['fruits'])
# Map each color to its list of alternative fruit names
dic = dict(zip(df2['colors'].tolist(),df2['sep_colors'].tolist()))
final = []
for row in range(len(df1.fruits_names)):
    list1 = []
    for key, value in dic.items():
        for item in value:
            # Substring test against the sentence in the first column
            if item in df1.iloc[row, 0]:
                list1.append(key)
    final.append(list1)
df1['colors'] = final
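Both answers use a plain substring test, so a sentence mentioning "cavendish banana" gets yellow twice (once for "banana", once for "cavendish banana"), and a short name such as "apple" would also match inside "pineapple". A possible alternative, not taken from either answer, is a single regular expression with word boundaries and the longest names first, applied with Series.str.findall. The sentence "how fresh is the cavendish banana" and the names name_to_color and pattern are made up for illustration:

```python
import re

import pandas as pd

df1 = pd.DataFrame(['green apple and banana are sweet fruits',
                    'how fresh is the cavendish banana',
                    'cherry and opal apple from japan'],
                   columns=['fruits_names'])
df2 = pd.DataFrame([['green apple|opal apple', 'red'],
                    ['banana|cavendish banana', 'yellow'],
                    ['cherry', 'black']],
                   columns=['fruits', 'colors'])

# Map every alternative name to its color
exploded = df2.assign(fruits=df2['fruits'].str.split('|')).explode('fruits')
name_to_color = dict(zip(exploded['fruits'], exploded['colors']))

# Longest names first so 'cavendish banana' is tried before 'banana';
# \b limits matches to whole words ('apple' would not hit 'pineapple')
names = sorted(name_to_color, key=len, reverse=True)
pattern = r'\b(?:' + '|'.join(map(re.escape, names)) + r')\b'

# findall returns the non-overlapping matches in the order they appear
matches = df1['fruits_names'].str.findall(pattern)
df1['colors'] = matches.apply(lambda found: [name_to_color[name] for name in found])
print(df1)
```

Here the colors come out in the order the fruits appear in the sentence, so the third row gives [black, red] rather than [red, black].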