Python: How do I extract the words in parentheses from one column into new columns?
I have a DataFrame that looks like this:
Player Name Headline
1 LeBron James LeBron James suggests 5-10 games before playoff
2 LeBron James LeBron James (groin) probable for Thursday
3 LeBron James LeBron James overcomes Pelicans with 34/13/12
4 LeBron James LeBron James (groin) plans to play on Tuesday
5 LeBron James LeBron James (rest) questionable Tuesday
6 LeBron James LeBron James (leg) will start on Saturday
7 LeBron James LeBron James (hip) is questionable
8 Ryan Anderson Anderson (flu) returns against Cavs on Sunday
9 Ryan Anderson Ryan Anderson out with respiratory infection
10 Ryan Anderson Anderson (rest) not playing
I want to drop every row whose Headline column contains no (text). I also want two new columns, labeled Injury/Rest and Location, as shown below.
The new DataFrame output I want:
Player Name Headline Injury/Rest Location
2 LeBron James LeBron James (groin) probable... Injury groin
4 LeBron James LeBron James (groin) plans... Injury groin
5 LeBron James LeBron James (rest) questionable.. Rest rest
6 LeBron James LeBron James (leg) will... Injury leg
7 LeBron James LeBron James (sore hip) is... Injury sore hip
8 Ryan Anderson Anderson (flu) returns... Injury flu
10 Ryan Anderson Anderson (rest) not... Rest rest
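As you can see, the rows with no (text) in the Headline column have been removed. The rows that do have (text) are classified into the new columns Injury/Rest and Location, as described above.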
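So far I have done df1 = df[df['Headline'].str.contains("(rest)")] to pull all the (rest) rows out of the Headline column. There are over 100,000 rows, so I don't know how to handle every injury inside the () and fill in the two new columns.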
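A caveat about that attempt: `Series.str.contains` treats its pattern as a regular expression by default, so in `"(rest)"` the parentheses form a capture group and the pattern matches the bare substring `rest` anywhere, including inside other words. Passing `regex=False` matches the parentheses literally. The two headlines below are made up purely to show the difference (pandas may also emit a UserWarning about match groups for the first call):

```python
import pandas as pd

# Hypothetical headlines chosen to show the difference
s = pd.Series([
    "Anderson (rest) not playing",
    "Player interested in extension",  # contains "rest" but no parentheses
])

# Default regex=True: "(rest)" is a capture group, so it matches plain "rest"
as_regex = s.str.contains("(rest)")
# regex=False: the parentheses are matched literally
literal = s.str.contains("(rest)", regex=False)

print(as_regex.tolist())  # [True, True]
print(literal.tolist())   # [True, False]
```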
How can I get this output and clean up the DataFrame?
This is what I would do:
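import numpy as np
import pandas as pd
# Text inside the first pair of parentheses; NaN when a headline has none
df['Location'] = df['Headline'].str.extract(r'\(([^)]+)\)')[0]
df = df[df['Location'].notna()].copy()
df['Injury/Rest'] = np.where(df['Location'].eq('rest'), 'Rest', 'Injury')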
Output:
Player Name Headline Location Injury/Rest
-- ------------- --------------------------------------------- ---------- -------------
2 LeBron James LeBron James (groin) probable for Thursday groin Injury
4 LeBron James LeBron James (groin) plans to play on Tuesday groin Injury
5 LeBron James LeBron James (rest) questionable Tuesday rest Rest
6 LeBron James LeBron James (leg) will start on Saturday leg Injury
7 LeBron James LeBron James (hip) is questionable hip Injury
8 Ryan Anderson Anderson (flu) returns against Cavs on Sunday flu Injury
10 Ryan Anderson Anderson (rest) not playing rest Rest
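To check this end to end, here is a self-contained run on the sample rows from the question. The pattern `[^)]+` stops at the first closing parenthesis and never matches an empty `()`. Making the "rest" comparison case-insensitive (and stripping spaces) is an assumption about how the real data is capitalised, not something the question states:

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (index 1-10)
df = pd.DataFrame(
    {
        "Player Name": ["LeBron James"] * 7 + ["Ryan Anderson"] * 3,
        "Headline": [
            "LeBron James suggests 5-10 games before playoff",
            "LeBron James (groin) probable for Thursday",
            "LeBron James overcomes Pelicans with 34/13/12",
            "LeBron James (groin) plans to play on Tuesday",
            "LeBron James (rest) questionable Tuesday",
            "LeBron James (leg) will start on Saturday",
            "LeBron James (hip) is questionable",
            "Anderson (flu) returns against Cavs on Sunday",
            "Ryan Anderson out with respiratory infection",
            "Anderson (rest) not playing",
        ],
    },
    index=range(1, 11),
)

# Text inside the first pair of parentheses; NaN when there is none
df["Location"] = df["Headline"].str.extract(r"\(([^)]+)\)", expand=False)
df = df[df["Location"].notna()].copy()
# Assumption: "(Rest)" or "( rest )" should also count as rest
df["Injury/Rest"] = np.where(
    df["Location"].str.strip().str.lower().eq("rest"), "Rest", "Injury"
)
print(df[["Location", "Injury/Rest"]])
```

Because `str.extract` and the comparisons are vectorised, this scales to the 100,000+ rows mentioned in the question without a Python-level loop.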
You can accomplish this task like this:
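import pandas as pd

def get_injury_rest(value):
    # "(rest)" anywhere in the headline (any case) counts as Rest
    if "(rest)" in value.lower():
        return "Rest"
    # Any other parenthesised text counts as an injury
    elif "(" in value and ")" in value:
        return "Injury"
    # No parentheses: return None so the row is dropped below
    return None

df = pd.read_csv("Players.csv")
df["Injury/Rest"] = [get_injury_rest(value) for value in df["Headline"]]
df = df.dropna(subset=["Injury/Rest"]).copy()
df["Location"] = [value.split("(")[1].split(")")[0] for value in df["Headline"]]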
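A note on the condition `"(" and ")" in value`: Python parses it as `"(" and (")" in value)`, and because the non-empty string `"("` is always truthy, the whole expression reduces to `")" in value`. Checking for both characters needs two separate `in` tests. The helper names `buggy` and `fixed` and the sample string below are hypothetical, for illustration only:

```python
def buggy(value):
    # Parsed as "(" and (")" in value); "(" is always truthy
    return bool("(" and ")" in value)

def fixed(value):
    # Both characters must be present
    return "(" in value and ")" in value

# Hypothetical headline with a closing but no opening parenthesis
sample = "Scores 34/13/12 :)"
print(buggy(sample))  # True
print(fixed(sample))  # False
```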
Alternatively, using method chaining:
# Keep only rows that have text inside parentheses
res = (df.loc[df.Headline.str.contains(r"\(.+\)")]
       # Extract the text inside the parentheses
       .assign(Location=lambda x: x.Headline.str.extract(r"\((.+?)\)", expand=False)))
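Whichever variant is used, the quantifier inside the parentheses matters once a headline contains more than one parenthesised part. A greedy `.+` captures everything from the first `(` to the last `)`, while the lazy `.+?` (or a negated class such as `[^)]+`) stops at the first `)`. `Series.str.findall` returns every match if all of them are wanted. The headline below is made up to show the difference:

```python
import pandas as pd

# Hypothetical headline with two parenthesised parts
s = pd.Series(["LeBron James (groin) out, Davis (knee) doubtful"])

# Greedy: runs from the first "(" to the last ")"
greedy = s.str.extract(r"\((.+)\)", expand=False)
# Lazy: stops at the first ")"
lazy = s.str.extract(r"\((.+?)\)", expand=False)
# All parenthesised parts, as a list per row
every = s.str.findall(r"\(([^)]+)\)")

print(greedy[0])  # groin) out, Davis (knee
print(lazy[0])    # groin
print(every[0])   # ['groin', 'knee']
```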