Search for a specific string in a filtered dataframe, then create a new column based on the result (Python/Pandas)
I'm trying to filter my dataframe (hospital) to the rows where the "brain bleeding" column is True. Then I want to search the brain_info column for a specific word ("cancer") and create a new column that contains that word ("cancer"). I've done this before without the filtering step, but in this case I'm running into trouble.
#What I have, plus the final diagnosis column I want to add
| brain bleeding | brain info   | final diagnosis |
|----------------|--------------|-----------------|
| True           | BlahBlahBlah |                 |
| True           | Cancer       | Cancer          |
| False          | Cancer       |                 |
#Creating an empty column in my dataframe for the final diagnosis.
hospital["final_diagnosis"] = ""
#Filter cases where brain_bleeding is True
filt = (hospital["brain_bleeding"] == True)
#Among the filtered cases, check whether brain_info contains "cancer" and, if it does, write it to the corresponding "final_diagnosis" cell. Is this where my error is?
hospital.loc[filt, 'brain_info'].str.contains("cancer", case=False, na=False), "final diagnosis"] = "cancer"
Can anyone help me? Thanks!

Assuming your file is:
brain_bleeding brain_info
True BlahBlahBlah
True Cancer
False Cancer
You can try the following:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import pandas as pd
hospital = pd.read_csv('file.csv', sep='\t')
# add True to final_diagnosis column if brain is bleeding and brain info is cancer
hospital.loc[(hospital['brain_bleeding'] == True) &
(hospital['brain_info'] == 'Cancer'), 'final_diagnosis'] = True
hospital['final_diagnosis'] = hospital['final_diagnosis'].fillna('')  # replace NaN with empty strings (reassign instead of inplace on a column)
print(hospital)
Output:
brain_bleeding brain_info final_diagnosis
0 True BlahBlahBlah
1 True Cancer True
2 False Cancer
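The question's own attempt can also be repaired directly. In the failing line, the `.loc[...]` bracket closes right after `'brain_info'`, so the trailing `, "final diagnosis"]` is a syntax error, the `str.contains` result is never combined with `filt`, and the target name `"final diagnosis"` does not match the `"final_diagnosis"` column created earlier. Below is a minimal sketch, assuming the three example rows are built inline instead of read from `file.csv`. It combines both masks with `&` inside a single `.loc[rows, column]` and writes `"Cancer"` as in the desired table:

```python
import pandas as pd

# Toy data mirroring the question's example (assumption: built inline, not read from file.csv).
hospital = pd.DataFrame({
    "brain_bleeding": [True, True, False],
    "brain_info": ["BlahBlahBlah", "Cancer", "Cancer"],
})

# Start with an empty final_diagnosis column, as in the question.
hospital["final_diagnosis"] = ""

# Rows where the brain is bleeding (the column is already boolean).
filt = hospital["brain_bleeding"]
# Rows whose brain_info mentions "cancer" in any letter case; na=False treats missing text as no match.
has_cancer = hospital["brain_info"].str.contains("cancer", case=False, na=False)

# Both conditions go inside ONE .loc[row_mask, column] selection.
hospital.loc[filt & has_cancer, "final_diagnosis"] = "Cancer"
print(hospital)
```

Using `case=False` matches "Cancer", "cancer" and "CANCER", and `str.contains` also finds the word inside longer text. The answer's exact `== 'Cancer'` comparison does neither.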
Note: I added two conditions based on the final_diagnosis column in your example. It looks like you may only need one (feel free to remove one of the two if so).
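If you would rather build the column in one vectorized step without pre-filling it, `numpy.where` is a common alternative. This is a swapped-in technique, not the answer's own method. The sketch again assumes the example rows are built inline:

```python
import numpy as np
import pandas as pd

# Toy data mirroring the question's example (assumption: built inline, not read from file.csv).
hospital = pd.DataFrame({
    "brain_bleeding": [True, True, False],
    "brain_info": ["BlahBlahBlah", "Cancer", "Cancer"],
})

# True where the brain is bleeding AND brain_info mentions "cancer" (case-insensitive).
mask = hospital["brain_bleeding"] & hospital["brain_info"].str.contains(
    "cancer", case=False, na=False
)

# np.where picks "Cancer" where mask is True and "" elsewhere, producing the whole column at once.
hospital["final_diagnosis"] = np.where(mask, "Cancer", "")
print(hospital)
```

Because every row gets a value, there are no NaNs left over and no `fillna` step is needed.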