Pandas - extract everything up to a specific keyword
I am trying to extract everything from a DataFrame column up to the point where a specific word occurs. Specifically, I want to keep the whole content up to any of the following words: high, medium, low. Sample view of the text in the DataFrame:
text
Ticket creation dropped in last 24 hours medium range for cust_a
Calls dropped in last 3 months high range for cust_x
Expected output:
text, new_text
Ticket creation dropped in last 24 hours medium range for cust_a, Ticket creation dropped in last 24 hours
Calls dropped in last 3 months high range for cust_x, Calls dropped in last 3 months
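IIUC, you need str.replace with a regex.
The idea is to match any of the words in the list, then replace that word and everything after it.
We use .* to match anything up to the end of the string.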
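To see what such a pattern does outside pandas, here is a minimal sketch using Python's standard re module. It assumes the keywords should only match as whole words, so it adds \b word boundaries and escapes each keyword with re.escape; neither is in the original answer, and the helper name cut_at_keyword is hypothetical.

```python
import re

# Keywords from the question; cut_at_keyword below is a hypothetical helper.
KEYWORDS = ["high", "medium", "low"]

# re.escape guards against keywords that contain regex metacharacters.
# \b makes "low" match only as a whole word, not inside "below" or "slowly".
# \s* also removes the whitespace in front of the keyword.
PATTERN = re.compile(r"\s*\b(?:" + "|".join(map(re.escape, KEYWORDS)) + r")\b.*")


def cut_at_keyword(text: str) -> str:
    # Drop the first keyword and everything after it; text without a keyword is unchanged.
    return PATTERN.sub("", text, count=1)


print(cut_at_keyword("Ticket creation dropped in last 24 hours medium range for cust_a"))
print(cut_at_keyword("Calls dropped in last 3 months high range for cust_x"))
print(cut_at_keyword("Traffic slowly dropped below threshold"))
```

Without the \b boundaries, the third string would be cut at the "low" inside "slowly", leaving only "Traffic s".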
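import pandas as pd

df = pd.DataFrame({'text': ['Ticket creation dropped in last 24 hours medium range for cust_a',
                            'Calls dropped in last 3 months high range for cust_x']})
words = 'high, medium, low'
match_words = '|'.join(words.split(', '))
#'high|medium|low'
# \s* drops the space before the keyword, \b matches whole words only, .* removes the rest
df['new_text'] = df['text'].str.replace(fr"\s*\b({match_words})\b.*", '', regex=True)
print(df['new_text'])
0    Ticket creation dropped in last 24 hours
1              Calls dropped in last 3 months
Name: new_text, dtype: object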
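If rows that contain none of the keywords should be flagged rather than passed through unchanged, Series.str.extract is one alternative (it is not part of the original answer). A minimal sketch, assuming the same whole-word keywords; the third row is made up for illustration. str.replace leaves such a row as-is, while str.extract returns NaN for it.

```python
import pandas as pd

df = pd.DataFrame({
    "text": [
        "Ticket creation dropped in last 24 hours medium range for cust_a",
        "Calls dropped in last 3 months high range for cust_x",
        "No severity mentioned here",  # hypothetical row without any keyword
    ]
})

words = ["high", "medium", "low"]
alternation = "|".join(words)  # 'high|medium|low'

# Lazily capture everything before the first whole-word keyword.
# With one capture group and expand=False, str.extract returns a Series;
# rows without a match get NaN instead of their original text.
df["new_text"] = df["text"].str.extract(
    rf"^(.*?)\s*\b(?:{alternation})\b", expand=False
)
print(df["new_text"])
```

Which behaviour is preferable depends on whether a missing keyword should count as "keep the whole text" or as a data problem to inspect.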