
Python: how do I extract the words inside parentheses in one column into new columns?


I have a dataframe that looks like this:

      Player Name       Headline
1     LeBron James      LeBron James suggests 5-10 games before playoff
2     LeBron James      LeBron James (groin) probable for Thursday 
3     LeBron James      LeBron James overcomes Pelicans with 34/13/12
4     LeBron James      LeBron James (groin) plans to play on Tuesday   
5     LeBron James      LeBron James (rest) questionable Tuesday      
6     LeBron James      LeBron James (leg) will start on Saturday   
7     LeBron James      LeBron James (hip) is questionable 
8     Ryan Anderson     Anderson (flu) returns against Cavs on Sunday   
9     Ryan Anderson     Ryan Anderson out with respiratory infection   
10    Ryan Anderson     Anderson (rest) not playing 
I want to drop every row whose Headline contains no (text). I also want two new columns, Injury/Rest and Location, as shown below.

My desired output dataframe:

      Player Name       Headline                           Injury/Rest  Location
2     LeBron James      LeBron James (groin) probable...   Injury       groin
4     LeBron James      LeBron James (groin) plans...      Injury       groin
5     LeBron James      LeBron James (rest) questionable.. Rest         rest
6     LeBron James      LeBron James (leg) will...         Injury       leg
7     LeBron James      LeBron James (hip) is...           Injury       hip
8     Ryan Anderson     Anderson (flu) returns...          Injury       flu
10    Ryan Anderson     Anderson (rest) not...             Rest         rest
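As you can see, the rows with no (text) in the Headline column have been removed, and the rows that do have (text) are classified into the new Injury/Rest and Location columns, as described above.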

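I have already used

df1 = df[df['Headline'].str.contains("(rest)", regex=False)]

to extract all of the (rest) rows from the Headline column. There are over 100,000 rows, though, so I don't know how to handle every injury inside the () and fill in the two new columns.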
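One pitfall worth knowing here: Series.str.contains treats its pattern as a regular expression by default, so the parentheses in "(rest)" form a capture group instead of matching literal "(" and ")". The sketch below uses one headline from the question plus one hypothetical headline to show that an unescaped "(rest)" also matches plain "rest", while regex=False or an escaped pattern matches only the literal "(rest)".

```python
import pandas as pd

headlines = pd.Series([
    "LeBron James (rest) questionable Tuesday",  # from the question
    "Player rests after back-to-back",           # hypothetical, no parentheses
])

# Regex mode (the default): "(rest)" is a group around "rest", so any "rest" matches.
# pandas may emit a UserWarning here about the pattern having match groups.
as_regex = headlines.str.contains("(rest)")

# Literal mode: only the exact substring "(rest)" matches.
as_literal = headlines.str.contains("(rest)", regex=False)

# Escaping the parentheses gives the same result while staying in regex mode.
escaped = headlines.str.contains(r"\(rest\)")

print(as_regex.tolist())    # [True, True]
print(as_literal.tolist())  # [True, False]
print(escaped.tolist())     # [True, False]
```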


How can I get this output and clean up the dataframe?

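Here is what I would do:

import numpy as np
import pandas as pd
df['Location'] = df['Headline'].str.extract(r'\(([^)]*)\)', expand=False)  # text inside the first (...), NaN if none
df = df[df['Location'].notnull()].copy()  # drop rows without (...); copy avoids SettingWithCopyWarning
df['Injury/Rest'] = np.where(df['Location'].eq('rest'), 'Rest', 'Injury')
Output: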

    Player Name    Headline                                       Location    Injury/Rest
--  -------------  ---------------------------------------------  ----------  -------------
 2  LeBron James   LeBron James (groin) probable for Thursday     groin       Injury
 4  LeBron James   LeBron James (groin) plans to play on Tuesday  groin       Injury
 5  LeBron James   LeBron James (rest) questionable Tuesday       rest        Rest
 6  LeBron James   LeBron James (leg) will start on Saturday      leg         Injury
 7  LeBron James   LeBron James (hip) is questionable             hip         Injury
 8  Ryan Anderson  Anderson (flu) returns against Cavs on Sunday  flu         Injury
10  Ryan Anderson  Anderson (rest) not playing                    rest        Rest
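To check this approach end to end, the sketch below rebuilds the sample rows from the question and applies the same three steps. It uses the character class [^)]* for the capture group: a greedy .* would run from the first "(" to the last ")" if a headline ever contained two parenthesised parts. The two-group headline at the end is hypothetical and only illustrates that difference.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question, keeping its 1-based index
df = pd.DataFrame(
    {
        "Player Name": ["LeBron James"] * 7 + ["Ryan Anderson"] * 3,
        "Headline": [
            "LeBron James suggests 5-10 games before playoff",
            "LeBron James (groin) probable for Thursday",
            "LeBron James overcomes Pelicans with 34/13/12",
            "LeBron James (groin) plans to play on Tuesday",
            "LeBron James (rest) questionable Tuesday",
            "LeBron James (leg) will start on Saturday",
            "LeBron James (hip) is questionable",
            "Anderson (flu) returns against Cavs on Sunday",
            "Ryan Anderson out with respiratory infection",
            "Anderson (rest) not playing",
        ],
    },
    index=range(1, 11),
)

# Text inside the first pair of parentheses; NaN when there is none
df["Location"] = df["Headline"].str.extract(r"\(([^)]*)\)", expand=False)
# Keep only rows that had parenthesised text
df = df[df["Location"].notna()].copy()
# "rest" becomes Rest, anything else is treated as an injury
df["Injury/Rest"] = np.where(df["Location"].eq("rest"), "Rest", "Injury")
print(df[["Location", "Injury/Rest"]])

# Greedy vs. bounded capture on a hypothetical headline with two (...) parts
two_groups = pd.Series(["X (knee) out, (rest) next game"])
greedy = two_groups.str.extract(r"\((.*)\)", expand=False)[0]
bounded = two_groups.str.extract(r"\(([^)]*)\)", expand=False)[0]
print(greedy)   # knee) out, (rest
print(bounded)  # knee
```

Because str.extract is vectorised, this scales to the 100,000+ rows mentioned in the question without a Python-level loop.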


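You can accomplish this task like this:

import pandas as pd


def get_injury_rest(value):
    # "Rest" for a (rest) headline, "Injury" for any other (...), None otherwise
    if "(rest)" in value.lower():
        return "Rest"
    elif "(" in value and ")" in value:
        return "Injury"
    return None


df = pd.read_csv("Players.csv")
df["Injury/Rest"] = [get_injury_rest(value) for value in df["Headline"]]
# Drop only the rows that got no label, not rows with NaN in other columns
df = df.dropna(subset=["Injury/Rest"]).copy()
df["Location"] = [value.split("(")[1].split(")")[0] for value in df["Headline"]]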
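A condition written as "(" and ")" in value does not test for both characters. Because in binds more tightly than and, Python reads it as "(" and (")" in value), and since the non-empty string "(" is truthy the whole expression reduces to ")" in value. The sketch below compares that form with an explicit two-part test; buggy_check, fixed_check and the sample headline are hypothetical names and data used only for illustration.

```python
def buggy_check(value):
    # Parsed as: "(" and (")" in value); "(" is truthy, so only ")" is tested
    return bool("(" and ")" in value)


def fixed_check(value):
    # Test each character separately
    return "(" in value and ")" in value


sample = "Score: 34/13/12)"  # hypothetical headline with a stray ")" and no "("
print(buggy_check(sample))  # True
print(fixed_check(sample))  # False
```

With the buggy form, a headline like this would be labelled Injury and then make value.split("(")[1] raise an IndexError when the Location column is built.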

# Keep only rows with text inside parentheses
res = (df.loc[df.Headline.str.contains(r"\(.+\)")]
       # Extract the text inside the parentheses
       .assign(Location=lambda x: x.Headline.str.extract(r"\(([^)]+)\)", expand=False)))
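For reference, here is a complete, self-contained sketch of this method-chain approach. It assumes the extraction uses a single capture group and that Injury/Rest is derived from Location the same way as in the other answers; the four sample rows are a subset of the question's data. A second assign is used so that its callable can read the Location column created by the first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Player Name": ["LeBron James", "LeBron James", "Ryan Anderson", "Ryan Anderson"],
    "Headline": [
        "LeBron James (groin) probable for Thursday",
        "LeBron James overcomes Pelicans with 34/13/12",
        "Ryan Anderson out with respiratory infection",
        "Anderson (rest) not playing",
    ],
})

res = (
    # Keep only rows with text inside parentheses
    df.loc[df["Headline"].str.contains(r"\(.+\)")]
    # Text inside the first pair of parentheses
    .assign(Location=lambda x: x["Headline"].str.extract(r"\(([^)]+)\)", expand=False))
    # "Injury/Rest" is not a valid keyword name, so pass it via a dict
    .assign(**{"Injury/Rest": lambda x: np.where(x["Location"].eq("rest"), "Rest", "Injury")})
)
print(res)
```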