Python: flag rows based on other column values (python, string, pandas)


I have a data frame:

street_name        eircode
Malborough Road    BLT12
123 Fake Road      NaN
My Street          NaN

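I want to create another column called unique, based on the following conditions:

If the row has an eircode, return "yes" in the unique column.

If there is no eircode, check the first string in the street name:

If the first string is a number, return "yes" in the unique column.

If it is not, return "no" in the unique column.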

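I came up with the following solution:

Changed the data type of both columns, street_name and eircode, to string

Used a lambda function to get the first string

Defined a flagging function to apply to the data frame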

    # Change the data types
    df['eircode'] = df['eircode'].astype('str')
    df['street_name'] = df['street_name'].astype('str')

    # Get the first string from the street_name column
    df['first_str'] = df['street_name'].apply(lambda x: x.split()[0])

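The problem is that I have to change the data types and then create a separate column. Is there a more elegant or more concise way to achieve the same result?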
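You can combine the separate conditions with the | operator and then map the resulting boolean array to yes and no. The first condition simply checks whether eircode is non-null, and the second uses a regular expression to check whether street_name starts with a digit: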

    df['unique'] = ((~df.eircode.isnull()) | (df.street_name.str.match('^[0-9]'))).map({True: 'yes', False: 'no'})

    >>> df
           street_name eircode unique
    0  Malborough Road   BLT12    yes
    1    123 Fake Road     NaN    yes
    2        My Street     NaN     no
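For readers who want to run the answer above end to end, here is a minimal self-contained sketch. The variable names has_eircode and starts_with_digit are illustrative and not from the original answer. notna() is used as an equivalent of ~isnull(), and na=False is an added safeguard (an assumption, not part of the answer) so that a missing street_name counts as "does not start with a digit".

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "street_name": ["Malborough Road", "123 Fake Road", "My Street"],
    "eircode": ["BLT12", np.nan, np.nan],
})

# Condition 1: the row has an eircode
has_eircode = df["eircode"].notna()

# Condition 2: street_name starts with an ASCII digit.
# str.match anchors at the start of the string, so no leading ^ is needed;
# na=False treats a missing street_name as "no match" (added assumption).
starts_with_digit = df["street_name"].str.match(r"[0-9]", na=False)

# Either condition is enough for "yes"
df["unique"] = (has_eircode | starts_with_digit).map({True: "yes", False: "no"})
print(df)
```

Naming the two conditions separately keeps the one-liner readable and avoids both the type conversion and the helper column from the question.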

With pandas, it is better to use column-wise calculations; apply with a custom function amounts to an inefficient Python-level loop over the rows.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'street_name': ['Malborough Road', '123 Fake Road', 'My Street'],
                       'eircode': ['BLT12', None, None]})

    # cond1: no eircode; cond2: the first word of street_name is not all digits
    cond1 = df['eircode'].isnull()
    cond2 = ~df['street_name'].str.split(n=1).str[0].str.isdigit()

    # "no" only when both conditions hold, otherwise "yes"
    df['unique'] = np.where(cond1 & cond2, 'no', 'yes')

    print(df)

      eircode      street_name unique
    0   BLT12  Malborough Road    yes
    1    None    123 Fake Road    yes
    2    None        My Street     no
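To make the contrast between row-wise apply and column-wise operations concrete, the sketch below computes the flag both ways and checks that the results agree. The flag_row function is hypothetical: the question mentions a flagging function but does not show it, so this is only one plausible shape for it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "street_name": ["Malborough Road", "123 Fake Road", "My Street"],
    "eircode": ["BLT12", None, None],
})

def flag_row(row):
    """Hypothetical row-wise flagging function (not shown in the question)."""
    if pd.notna(row["eircode"]):
        return "yes"
    first_word = str(row["street_name"]).split()[0]
    return "yes" if first_word.isdigit() else "no"

# Row-wise: pandas calls flag_row once per row in Python
row_wise = df.apply(flag_row, axis=1)

# Column-wise: each step operates on a whole column at once
first_word_is_digit = df["street_name"].str.split(n=1).str[0].str.isdigit()
column_wise = pd.Series(
    np.where(df["eircode"].notna() | first_word_is_digit, "yes", "no"),
    index=df.index,
)

print(row_wise.tolist())
print(column_wise.tolist())
```

On three rows the difference is negligible; the column-wise form is the one that tends to scale better as the frame grows.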

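The code for cond2 is excellent: I was trying to do the same thing (select the first word after splitting) but did not know the right way. Great use of np.where, which I had never used before. One problem I ran into with this solution is that it returns an error when the extracted number is a float. I am accepting the answer that uses a regular expression, because it handles any number and does not throw an error even when the number is a float.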
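The comment does not show the data that failed, so the following is only a sketch under an assumption: that some street_name values are stored as floats, or begin with a decimal number, rather than being plain strings. The sample rows 12.5 and "45.5 Side Lane" are invented for illustration. Converting the column to strings before matching keeps the check well defined for every row.

```python
import numpy as np
import pandas as pd

# Illustrative data: the float 12.5 and "45.5 Side Lane" are assumed examples,
# not taken from the original question
df = pd.DataFrame({
    "street_name": ["Malborough Road", 12.5, "My Street", "45.5 Side Lane"],
    "eircode": ["BLT12", None, None, None],
})

# Convert every value to its string form so the .str accessor applies to all rows
street = df["street_name"].astype(str)

# A leading ASCII digit is enough, so "12.5" and "45.5 Side Lane" both match
starts_with_digit = street.str.match(r"[0-9]")

df["unique"] = np.where(df["eircode"].notna() | starts_with_digit, "yes", "no")
print(df)
```

Because the pattern only inspects the first character, it does not matter whether the leading number is an integer or a decimal.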