Python: how to add a specific word to a new column when a column's value contains a word from a list
Suppose my dataset is:
name what
A apple[red]
B cucumber[green]
C dog
C orange
D banana
D monkey
E cat
F carrot
.
.
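I want to define several lists and, if the column contains a value from one of those lists, put the name of that list into a new column.
The lists: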
fruit = ['apple', 'banana', 'orange']
animal = ['dog', 'monkey', 'cat']
vegetable = ['cucumber', 'carrot']
The result I want:
name what class
A apple fruit
B cucumber vegetable
C dog animal
C orange fruit
D banana fruit
D monkey animal
E cat animal
F carrot vegetable
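The list values and the column values do not have to match exactly; the column values only need to contain them.
Thanks for reading.

Use a dictionary created from the lists, then swap its keys with the flattened values: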
fruit = ['apple', 'banana', 'orange']
animal = ['dog', 'monkey', 'cat']
vegetable = ['cucumber', 'carrot']
d = {'fruit':fruit, 'animal':animal,'vegetable':vegetable}
#http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
Loop alternative to the dict comprehension:
d1 = {}
for oldk, oldv in d.items():
    for k in oldv:
        d1[k] = oldk
Then:
df['class'] = df['what'].map(d1)
#if you need only the value before the first [
#df['class'] = df['what'].str.split('[').str[0].map(d1)
print (df)
name what class
0 A apple fruit
1 B cucumber vegetable
2 C dog animal
3 C orange fruit
4 D banana fruit
5 D monkey animal
6 E cat animal
7 F carrot vegetable
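For reference, the exact-match approach can be reproduced end to end as a minimal sketch. The DataFrame is rebuilt by hand from the sample in the question, so how it is constructed is an assumption; the split on `[` is the commented-out variant shown above.

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption about how df was created).
df = pd.DataFrame({
    "name": ["A", "B", "C", "C", "D", "D", "E", "F"],
    "what": ["apple[red]", "cucumber[green]", "dog", "orange",
             "banana", "monkey", "cat", "carrot"],
})

d = {
    "fruit": ["apple", "banana", "orange"],
    "animal": ["dog", "monkey", "cat"],
    "vegetable": ["cucumber", "carrot"],
}

# Invert the dictionary: each word becomes a key pointing at its category.
d1 = {word: category for category, words in d.items() for word in words}

# Keep only the text before the first "[" and look it up in d1.
# A single-character separator is treated as a literal string by str.split.
df["class"] = df["what"].str.split("[").str[0].map(d1)

print(df)
```

Values that are not keys of `d1` end up as NaN in the new column, which makes unmatched rows easy to spot.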
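EDIT: To match by substring, you can loop over the dictionary d, build a mask with str.contains, and set the new value where the mask is True:
d = {'fruit':fruit, 'animal':animal,'vegetable':vegetable}
for k, v in d.items():
    mask = df['what'].str.contains('|'.join(v))
    df.loc[mask, 'class'] = k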
print (df)
name what class
0 A apple[red] fruit
1 B cucumber[green] vegetable
2 C dog animal
3 C orange fruit
4 D banana fruit
5 D monkey animal
6 E cat animal
7 F carrot vegetable
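An alternative to looping over the categories is to extract the first matching word in a single pass and map it through the inverted dictionary. This is a sketch of a different technique, not the method used in the answer; the sample Series and the use of `re.escape` are assumptions added for illustration.

```python
import re
import pandas as pd

d = {
    "fruit": ["apple", "banana", "orange"],
    "animal": ["dog", "monkey", "cat"],
    "vegetable": ["cucumber", "carrot"],
}
# Inverted lookup: word -> category.
d1 = {word: category for category, words in d.items() for word in words}

# Hypothetical sample values, including one that matches nothing.
what = pd.Series(["apple[red]", "cucumber[green]", "dog", "sweet orange", "stone"])

# One capture group that matches any known word; re.escape guards against
# words that contain regex metacharacters.
pattern = "(" + "|".join(re.escape(word) for word in d1) + ")"

# expand=False returns a Series holding the first match (NaN if there is none).
matched = what.str.extract(pattern, expand=False)
classes = matched.map(d1)

print(classes.tolist())
```

When a value contains words from more than one list, this variant keeps the leftmost match, whereas the loop keeps whichever category is processed last.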
If the values may consist of multiple words, use word boundaries:
for k, v in d.items():
    pat = '|'.join(r"\b{}\b".format(x) for x in v)
    df.loc[ df['what'].str.contains(pat), 'class'] = k
print (df)
name what class
0 A apple[red] fruit
1 B cucumber[green] vegetable
2 C dog animal
3 C orange fruit
4 D banana fruit
5 D monkey animal
6 E cat animal
7 F carrot vegetable
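The output is the same for this sample, so the effect of `\b` is easier to see on values where a list word is embedded in a longer word. A small sketch with made-up values (not from the question):

```python
import pandas as pd

# Hypothetical values chosen to show the difference.
s = pd.Series(["cat", "black cat", "catfish", "cat[grey]"])

# Plain substring search: "cat" is also found inside "catfish".
plain = s.str.contains("cat")

# With word boundaries, "cat" must stand alone; "[" and spaces count as boundaries.
bounded = s.str.contains(r"\bcat\b")

print(plain.tolist())
print(bounded.tolist())
```

Only the `catfish` row differs between the two masks: it is matched by the plain pattern and rejected by the bounded one.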
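Comments:
What have you tried so far?
@Anwarvic df1 = df['column name'].str.contains("|".join(listname)); it cannot take multiple lists, and it cannot tell me which word I specified.
I typed the same answer, but I can't beat the AI that answers pandas questions.
@ybin - Sure, it is used to iterate over the dictionary d; oldk and oldv are the original key and the original value.
jezrael, I just edited the question: the what values and the list values do not match exactly, and there are other values such as apple[red] above. Can the condition be "contains" rather than "matches"? All of my real data consists of multiple words. Sorry for the trouble.
@jezrael - oh, I solved it with list = [f"(?i){re.escape(k)}" for k in list]
Thank you very much!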
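The fix mentioned in the last comment, escaping each word and matching case-insensitively, might look like the sketch below. Here `case=False` is used in place of the inline `(?i)` flag, and `na=False` is added so that missing values do not break the boolean mask; both choices and the sample data are assumptions, not the commenter's exact code.

```python
import re
import pandas as pd

# Hypothetical data with mixed case and a missing value.
df = pd.DataFrame({"what": ["Apple[red]", "DOG", "Carrot cake", None]})
d = {"fruit": ["apple"], "animal": ["dog"], "vegetable": ["carrot"]}

for category, words in d.items():
    # Escape each word so regex metacharacters are matched literally.
    pattern = "|".join(re.escape(word) for word in words)
    # case=False ignores letter case; na=False treats missing values as no match.
    mask = df["what"].str.contains(pattern, case=False, na=False)
    df.loc[mask, "class"] = category

print(df)
```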