Python: Create a new column based on a string

I have a DataFrame and want to create a new column based on the strings in column1_sport.

import pandas as pd

df = pd.read_csv('C:/Users/test/dataframe.csv', encoding='iso-8859-1')

The data contains:

column1_sport
baseball
basketball
tennis
boxing
golf
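I want to look for certain substrings ("ball" or "box") and create a new column based on whether column1_sport contains them. If a value contains neither substring, the new column should say "other". See below: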

column1_sport    column2_type
baseball         ball
basketball       ball
tennis           other 
boxing           box              
golf             other

You can use nested np.where calls:

import numpy as np

cond1 = df.column1_sport.str.contains('ball')
cond2 = df.column1_sport.str.contains('box')
df['column2_type'] = np.where(cond1, 'ball', np.where(cond2, 'box', 'other'))

    column1_sport   column2_type
0   baseball        ball
1   basketball      ball
2   tennis          other
3   boxing          box
4   golf            other
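
For reference, a self-contained sketch of the nested np.where approach. It makes two assumptions that are not in the original answer: the sample data is built inline instead of being read from the CSV, and a hypothetical missing value is appended to show why passing na=False to str.contains can be useful (depending on the pandas version, a missing value may otherwise yield a missing result rather than False).

```python
import numpy as np
import pandas as pd

# Sample data built inline (an assumption; the question reads it from a CSV).
# The trailing None is a hypothetical missing value added for illustration.
df = pd.DataFrame(
    {"column1_sport": ["baseball", "basketball", "tennis", "boxing", "golf", None]}
)

# na=False makes rows with missing values count as "no match".
cond1 = df["column1_sport"].str.contains("ball", na=False)
cond2 = df["column1_sport"].str.contains("box", na=False)

# The outer np.where handles "ball"; the inner one handles "box" and the fallback.
df["column2_type"] = np.where(cond1, "ball", np.where(cond2, "box", "other"))
print(df)
```

With na=False the row holding the missing value falls through to "other" like any other non-matching row.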
In case you have more complex conditions, you can write a function and apply it:

def func(a):
    if "ball" in a.lower():
        return "ball"
    elif "box" in a.lower():
        return "box"
    else:
        return "other"

df["column2_type"] = df.column1_sport.apply(func)
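
If the list of keywords grows, the if/elif chain can be replaced by a loop. The following is only a sketch: classify, keywords, and default are hypothetical names that do not appear in the original answer, and the sample data (including the capitalised "Basketball") is made up to show the case-insensitive match.

```python
import pandas as pd

# Inline sample data (an assumption); "Basketball" is capitalised on purpose.
df = pd.DataFrame(
    {"column1_sport": ["baseball", "Basketball", "tennis", "boxing", "golf"]}
)

def classify(value, keywords=("ball", "box"), default="other"):
    """Return the first keyword found in value (case-insensitive), else default."""
    text = str(value).lower()
    for keyword in keywords:
        if keyword in text:
            return keyword
    return default

df["column2_type"] = df["column1_sport"].apply(classify)
print(df["column2_type"].tolist())
```

Keywords are checked in the order given, so the first match wins, just as in the if/elif version.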

For multiple conditions, I suggest np.select. For example:

values = ['ball', 'box']
conditions = list(map(df['column1_sport'].str.contains, values))

df['column2_type'] = np.select(conditions, values, 'other')

print(df)

#   column1_sport column2_type
# 0      baseball         ball
# 1    basketball         ball
# 2        tennis        other
# 3        boxing          box
# 4          golf        other
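
One thing worth knowing about np.select is that when several conditions are true for the same row, the first one in the list wins. The sketch below illustrates this with a hypothetical value, "boxball", that matches both keywords; the inline data and the regex=False and na=False arguments are assumptions added here, not part of the answer above.

```python
import numpy as np
import pandas as pd

# "boxball" is a hypothetical value that contains both keywords.
df = pd.DataFrame({"column1_sport": ["baseball", "boxing", "boxball", "golf"]})

values = ["ball", "box"]
# regex=False treats each keyword as a literal substring;
# na=False treats missing values as "no match".
conditions = [
    df["column1_sport"].str.contains(v, regex=False, na=False) for v in values
]

# For each row, np.select takes the choice of the first condition that is True.
df["column2_type"] = np.select(conditions, values, default="other")
print(df["column2_type"].tolist())
```

Here "boxball" is labelled "ball" because "ball" comes first in values; reordering the list changes which label takes priority.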

@nia4life, use jpp's np.select if you have more conditions.