Python: How do I extract the word inside parentheses in one column into a new column?


python pandas dataframe


I have a DataFrame that looks like this:

      Player Name       Headline
1     LeBron James      LeBron James suggests 5-10 games before playoff
2     LeBron James      LeBron James (groin) probable for Thursday 
3     LeBron James      LeBron James overcomes Pelicans with 34/13/12
4     LeBron James      LeBron James (groin) plans to play on Tuesday   
5     LeBron James      LeBron James (rest) questionable Tuesday      
6     LeBron James      LeBron James (leg) will start on Saturday   
7     LeBron James      LeBron James (hip) is questionable 
8     Ryan Anderson     Anderson (flu) returns against Cavs on Sunday   
9     Ryan Anderson     Ryan Anderson out with respiratory infection   
10    Ryan Anderson     Anderson (rest) not playing

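I want to drop every row whose Headline column does not contain any (text). I also want two new columns, labelled Injury/Rest and Location, as shown below.

The new DataFrame output I want: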

      Player Name       Headline                           Injury/Rest  Location
2     LeBron James      LeBron James (groin) probable...   Injury       groin
4     LeBron James      LeBron James (groin) plans...      Injury       groin
5     LeBron James      LeBron James (rest) questionable.. Rest         rest
6     LeBron James      LeBron James (leg) will...         Injury       leg
7     LeBron James      LeBron James (sore hip) is...      Injury       sore hip
8     Ryan Anderson     Anderson (flu) returns...          Injury       flu
10    Ryan Anderson     Anderson (rest) not...             Rest         rest

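As you can see, the rows whose Headline column has no (text) have been removed. The rows that do have (text) are classified in the new Injury/Rest and Location columns, as described above.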

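I have already done

df1 = df[df['Headline'].str.contains("(rest)")]

to pull all of the (rest) rows out of the Headline column. There are more than 100,000 rows, so I don't know how to handle every injury inside the () and add the data to the two new columns.

How can I get the cleaned-up DataFrame as output?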

Here is what I would do:

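import numpy as np
# Capture the text between the first pair of parentheses (NaN when there is none)
df['Location'] = df['Headline'].str.extract(r'\((.*?)\)')[0]
# Keep only rows that had a parenthesized tag; .copy() avoids SettingWithCopyWarning
df = df[df['Location'].notnull()].copy()
df['Injury/Rest'] = np.where(df['Location'].eq('rest'), 'Rest', 'Injury')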

Output:

    Player Name    Headline                                       Location    Injury/Rest
--  -------------  ---------------------------------------------  ----------  -------------
 2  LeBron James   LeBron James (groin) probable for Thursday     groin       Injury
 4  LeBron James   LeBron James (groin) plans to play on Tuesday  groin       Injury
 5  LeBron James   LeBron James (rest) questionable Tuesday       rest        Rest
 6  LeBron James   LeBron James (leg) will start on Saturday      leg         Injury
 7  LeBron James   LeBron James (hip) is questionable             hip         Injury
 8  Ryan Anderson  Anderson (flu) returns against Cavs on Sunday  flu         Injury
10  Ryan Anderson  Anderson (rest) not playing                    rest        Rest
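
To try the approach above without the original file, here is a minimal self-contained sketch. The rows are copied from the question; the variable names sample, location and tagged are assumptions made for illustration, and the non-greedy pattern is one reasonable choice rather than the only one.

```python
import numpy as np
import pandas as pd

# A few rows copied from the question's example data
sample = pd.DataFrame(
    {
        "Player Name": ["LeBron James", "LeBron James", "Ryan Anderson", "Ryan Anderson"],
        "Headline": [
            "LeBron James suggests 5-10 games before playoff",
            "LeBron James (groin) probable for Thursday",
            "Ryan Anderson out with respiratory infection",
            "Anderson (rest) not playing",
        ],
    },
    index=[1, 2, 9, 10],
)

# expand=False returns a Series; headlines without parentheses become NaN
location = sample["Headline"].str.extract(r"\((.*?)\)", expand=False)

# Attach the extracted text, then drop the rows where nothing was found
tagged = sample.assign(Location=location).dropna(subset=["Location"])
tagged["Injury/Rest"] = np.where(tagged["Location"].eq("rest"), "Rest", "Injury")

print(tagged)
```

Because str.extract works on the whole column at once, the same three steps apply unchanged to a frame with 100,000 rows.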

You can accomplish this task like this:

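import pandas as pd


def get_injury_rest(value):
    # "Rest" for a "(rest)" tag, "Injury" for any other parenthesized tag,
    # None when the headline has no parentheses (those rows are dropped below).
    if "(rest)" in value.lower():
        return "Rest"
    elif "(" in value and ")" in value:
        return "Injury"
    return None


df = pd.read_csv("Players.csv")
df.loc[:, "Injury/Rest"] = [get_injury_rest(value) for value in df.loc[:, "Headline"]]
# Drop only the rows that received no label
df = df.dropna(subset=["Injury/Rest"])
df.loc[:, "Location"] = [value.split("(")[1].split(")")[0] for value in df.loc[:, "Headline"]]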
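
With more than 100,000 rows, the same labelling can also be done with vectorized string methods instead of a Python-level loop. The following is a minimal sketch, not a definitive implementation: the names headlines, has_tag, is_rest and result are assumptions, and the third headline is made up to show why the literal search uses regex=False (as a regular expression, "(rest)" is a capture group that matches any "rest" substring).

```python
import numpy as np
import pandas as pd

headlines = pd.DataFrame(
    {
        "Headline": [
            "LeBron James (Rest) questionable Tuesday",
            "Anderson (flu) returns against Cavs on Sunday",
            "Anderson rests ahead of Sunday",  # made-up row: contains "rest" but no parentheses
        ]
    }
)

# True where the headline has some text inside parentheses
has_tag = headlines["Headline"].str.contains(r"\(.+?\)")

# Literal, case-insensitive search for "(rest)"; regex=False keeps the
# parentheses from being interpreted as a regex group
is_rest = headlines["Headline"].str.contains("(rest)", case=False, regex=False)

headlines["Injury/Rest"] = np.where(is_rest, "Rest", "Injury")
result = headlines[has_tag]

print(result)
```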

# Keep only the rows that have text inside parentheses
res = (df.loc[df.Headline.str.contains(r"\(.+\)")]
       # Extract the text inside the parentheses
       .assign(Location=lambda x: x.Headline.str.extract(r…
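
The snippet above is cut off in the source, so the exact pattern its author passed to str.extract is not known. The following is a minimal sketch of how such a method chain could be finished, assuming a simple capture-group pattern and borrowing the np.where labelling from the earlier answer; the sample frame and both patterns are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Headline": [
            "LeBron James overcomes Pelicans with 34/13/12",
            "LeBron James (leg) will start on Saturday",
            "Anderson (rest) not playing",
        ]
    },
    index=[3, 6, 10],
)

res = (
    # Keep only the rows that have text inside parentheses
    df.loc[df["Headline"].str.contains(r"\(.+\)")]
    # Extract the text inside the parentheses (assumed pattern)
    .assign(Location=lambda x: x["Headline"].str.extract(r"\((.+?)\)", expand=False))
    # Label the row from the extracted text; the column name contains "/",
    # so it is passed through a dict instead of as a keyword
    .assign(**{"Injury/Rest": lambda x: np.where(x["Location"].eq("rest"), "Rest", "Injury")})
)

print(res)
```

Chaining loc and assign leaves the original df untouched and avoids assigning into a filtered slice.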