Python: how to add a category to a new column when a column value contains a word from a list


Suppose my dataset looks like this:

name what
A    apple[red]
B    cucumber[green]
C    dog
C    orange
D    banana
D    monkey
E    cat
F    carrot
.
.
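I want to define several lists and, if the column contains a value from one of those lists, put the name of that list into a new column.

The lists: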

fruit = ['apple', 'banana', 'orange']
animal = ['dog', 'monkey', 'cat']
vegetable = ['cucumber', 'carrot']
The result I want:

name what     class
A    apple    fruit
B    cucumber vegetable
C    dog      animal
C    orange   fruit
D    banana   fruit
D    monkey   animal
E    cat      animal
F    carrot   vegetable
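The list values and the column values do not match exactly; the column value only has to contain the list value.

Thanks for reading.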

Create a dictionary from the lists, then swap keys and values so that each flattened list item maps to its list name:

fruit = ['apple', 'banana', 'orange']
animal = ['dog', 'monkey', 'cat']
vegetable = ['cucumber', 'carrot']

d = {'fruit':fruit, 'animal':animal,'vegetable':vegetable}
#http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
Loop alternative to the dictionary comprehension:

d1 = {}
for oldk, oldv in d.items():
    for k in oldv:
        d1[k] = oldk
Then:

df['class'] = df['what'].map(d1)
#if you need only the values before the first [
#df['class'] = df['what'].str.split('[').str[0].map(d1)
print (df)
  name      what      class
0    A     apple      fruit
1    B  cucumber  vegetable
2    C       dog     animal
3    C    orange      fruit
4    D    banana      fruit
5    D    monkey     animal
6    E       cat     animal
7    F    carrot  vegetable
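The steps above can be put together into one runnable sketch. It assumes the sample data from the question (including the bracketed suffixes) and uses the commented-out `split` variant so that `apple[red]` is looked up as `apple`. The `key` variable name is only illustrative.

```python
import pandas as pd

# Sample data from the question, including the bracketed suffixes.
df = pd.DataFrame({
    "name": ["A", "B", "C", "C", "D", "D", "E", "F"],
    "what": ["apple[red]", "cucumber[green]", "dog", "orange",
             "banana", "monkey", "cat", "carrot"],
})

d = {
    "fruit": ["apple", "banana", "orange"],
    "animal": ["dog", "monkey", "cat"],
    "vegetable": ["cucumber", "carrot"],
}

# Invert the dictionary: each word becomes a key pointing to its category.
d1 = {word: category for category, words in d.items() for word in words}

# Keep only the text before the first "[" and look it up in d1.
# Values that are not in d1 become NaN.
key = df["what"].str.split("[").str[0]
df["class"] = key.map(d1)
print(df)
```

Note that `map` needs an exact key match, so this only works when the part before `[` is exactly one of the listed words.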
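EDIT: To match by substring, you can loop over the dictionary d, build a boolean mask with str.contains for each list, and set the new values where the mask is True: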

d = {'fruit':fruit, 'animal':animal,'vegetable':vegetable}

for k, v in d.items():
    mask = df['what'].str.contains('|'.join(v))
    df.loc[mask, 'class'] = k
print (df)
  name             what      class
0    A       apple[red]      fruit
1    B  cucumber[green]  vegetable
2    C              dog     animal
3    C           orange      fruit
4    D           banana      fruit
5    D           monkey     animal
6    E              cat     animal
7    F           carrot  vegetable
If needed, you can add word boundaries so that only whole words match:

for k, v in d.items():
    pat = '|'.join(r"\b{}\b".format(x) for x in v)
    df.loc[ df['what'].str.contains(pat), 'class'] = k
print (df)
  name             what      class
0    A       apple[red]      fruit
1    B  cucumber[green]  vegetable
2    C              dog     animal
3    C           orange      fruit
4    D           banana      fruit
5    D           monkey     animal
6    E              cat     animal
7    F           carrot  vegetable
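As an alternative to looping over the categories, the substring match can also be done in a single pass. The following is a sketch and not part of the original answer: it assumes the sample data from the question, builds one alternation pattern from all words, extracts the first word found with `str.extract`, and maps it through the inverted dictionary `d1`.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "B", "C", "C", "D", "D", "E", "F"],
    "what": ["apple[red]", "cucumber[green]", "dog", "orange",
             "banana", "monkey", "cat", "carrot"],
})

d = {
    "fruit": ["apple", "banana", "orange"],
    "animal": ["dog", "monkey", "cat"],
    "vegetable": ["cucumber", "carrot"],
}
d1 = {word: category for category, words in d.items() for word in words}

# One capture group containing every word, escaped so that regex
# metacharacters are treated literally. Longer words go first so that
# a short word cannot shadow a longer one that starts with it.
words = sorted(d1, key=len, reverse=True)
pattern = "(" + "|".join(re.escape(w) for w in words) + ")"

# extract returns the first matching word per row (NaN if none matches),
# which is then translated to its category.
found = df["what"].str.extract(pattern, expand=False)
df["class"] = found.map(d1)
print(df)
```

If a value contains words from several lists, only the leftmost match is used, whereas in the loop version the category processed last wins.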

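What have you tried so far?

@Anwarvic df1=df['column anme'].str.contains("|".join(listname)). It cannot handle multiple lists, and it does not return the word I specified.

I typed the same answer, but I can't beat the AI that answers pandas questions.

@ybin - Sure, it is used to iterate over the dict d; oldk and oldv mean the original key and the original value.

jezrael, I just made an edit. The what values and the list values do not match exactly; there are other values, such as apple[red] above. Can the condition be "contains" instead of "matches"? My real dataset consists entirely of multi-word values. Sorry for the trouble.

@jezrael - oh, I solved it with list = [f"(?i){re.escape(k)}" for k in list]. Thank you very much!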
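A minimal sketch of what the last comment describes: escaping each word and matching case-insensitively. Here `case=False` is used in place of the inline `(?i)` flag, which is a substitution and not the commenter's exact code. The data and the `language` category are hypothetical and serve only to show the escaping.

```python
import re
import pandas as pd

# Hypothetical data: mixed case, multi-word values, regex metacharacters.
df = pd.DataFrame({"what": ["Apple[red]", "big DOG", "C++ book", "stone"]})
d = {"fruit": ["apple"], "animal": ["dog"], "language": ["c++"]}

# Create the target column up front so it exists even if nothing matches.
df["class"] = None

for category, words in d.items():
    # re.escape makes characters such as "+" or "[" match literally.
    pat = "|".join(re.escape(w) for w in words)
    # case=False ignores case; na=False treats missing values as no match.
    mask = df["what"].str.contains(pat, case=False, na=False)
    df.loc[mask, "class"] = category

print(df)
```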