
Python: How do I filter values out of a DataFrame?

I have two DataFrames. I need to filter some values out of the main DataFrame and need some help doing it. Can you help me?

Explanation:

df_main:

kol_id|jnj_id|kol_full_nm|foc_area_id|thrc_cd|thrc_nm|dis_area|dis_area_nm|expert_score|pub_scor|rx_scor|refrl_scor|clincl_rsrchr_scor|is_kol
101152|7124166|Constance Ann Benson|1|VIR|VIR|HIV|HIV|45.17|68.5|0|1.69|88|Y
251489|7822721|Mariam S Aziz|1|VIR|VIR|HIV|HIV|44.33|39.5|33|34.26|76|Y
100856|7356682|William Rodney Short|1|VIR|VIR|HIV|HIV|49.49|44|57.5|50.39|48|Y
251460|7933108|Laura A Guay|1|VIR|VIR|HIV|HIV|34.8|63|0|0|48|N
df2:

I have to filter values out of df_main with the help of df2. df2 has 3 columns: filter, filter_value and columns. So I have to build a match statement like this:

if(kol_id == '101152' and thrc_nm == 'VIR' and jnj_id == '7124166')
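   then extract from df_main only the columns that are listed in df2['columns']
The problem is that the records in the filter and filter_value columns are not fixed: they change depending on api_name. So I need to write code that works for every API. Let me know if you need more information.

That is, the final result should be: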

df_result:

kol_id|jnj_id|kol_full_nm|thrc_cd
101152|7124166|Constance Ann Benson|VIR

Hope this works:

## For this case you'll have to add these 2 lines to avoid comparing str to int
## and to avoid nans in last row of df2
df_final = df_main.copy().astype(str)
df2 = df2[:3].astype(str)

for i, row in df2.iterrows():
    df_final = df_final[df_final[row['filter']]==row['filter_value']]
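For reference, here is a self-contained sketch of the loop above, extended with the column selection the question asks for. The contents of df2 are not shown in the question, so the frame below is an assumption reconstructed from its description (three filter rows plus a fourth row that only carries a column name). The names conditions, wanted and df_result are chosen here for illustration.

```python
import io
import pandas as pd

# Sample rows copied from the question's df_main (pipe-delimited).
MAIN_TXT = """kol_id|jnj_id|kol_full_nm|foc_area_id|thrc_cd|thrc_nm|dis_area|dis_area_nm|expert_score|pub_scor|rx_scor|refrl_scor|clincl_rsrchr_scor|is_kol
101152|7124166|Constance Ann Benson|1|VIR|VIR|HIV|HIV|45.17|68.5|0|1.69|88|Y
251489|7822721|Mariam S Aziz|1|VIR|VIR|HIV|HIV|44.33|39.5|33|34.26|76|Y
100856|7356682|William Rodney Short|1|VIR|VIR|HIV|HIV|49.49|44|57.5|50.39|48|Y
251460|7933108|Laura A Guay|1|VIR|VIR|HIV|HIV|34.8|63|0|0|48|N
"""
# dtype=str reads every column as text, so no str-vs-int comparisons occur.
df_main = pd.read_csv(io.StringIO(MAIN_TXT), sep="|", dtype=str)

# ASSUMPTION: df2 is not shown in the question; this layout is a guess
# based on its description (filter, filter_value, columns).
df2 = pd.DataFrame({
    "filter": ["kol_id", "thrc_nm", "jnj_id", None],
    "filter_value": ["101152", "VIR", "7124166", None],
    "columns": ["kol_id", "jnj_id", "kol_full_nm", "thrc_cd"],
})

# Keep only the rows of df2 that define a complete filter condition,
# instead of hard-coding the number of rows.
conditions = df2.dropna(subset=["filter", "filter_value"])

# Apply the conditions one after another (a logical AND).
df_final = df_main
for _, row in conditions.iterrows():
    df_final = df_final[df_final[row["filter"]] == str(row["filter_value"])]

# Keep only the columns requested in df2['columns'].
wanted = df2["columns"].dropna().tolist()
df_result = df_final[wanted].reset_index(drop=True)
print(df_result)
```

Dropping incomplete rows with dropna means the same code works when another API supplies a different number of filter rows.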

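First, I took the two columns filter and filter_value from the DataFrame and created a temporary DataFrame. Then I transposed the temporary DataFrame, reset the index, and dropped the header row.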

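# df is the filter DataFrame (df2 in the question).
# Collect the distinct filter names and output column names as strings.
filter_u = df['filter'].unique()
filter_u = [str(i) for i in filter_u]
filter_u = ' '.join(filter_u).split()
column_u = df['columns'].unique()
column_u = [str(i) for i in column_u]
column_u = ' '.join(column_u).split()
print(filter_u)
print(column_u)
# Turn the filter/filter_value pairs into a one-row frame:
# after transposing, the first row (filter names) becomes the header.
df_t1 = df[['filter', 'filter_value']]
df_t1 = df_t1.transpose().reset_index(drop=True)
df_t1 = df_t1.astype(str)
df_t1.columns = df_t1.iloc[0]
# Drop the row that was promoted to the header.
df_t1 = df_t1.reindex(df_t1.index.drop(0)).reset_index(drop=True)
df_t1.columns.name = None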
Output of the code above:

   kol_id thrc_nm     jnj_id
0  101152     VIR  7124166.0
Then I read the main file as a DataFrame and merged it with the DataFrame above, which finally gave me the result I wanted.

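import pandas as pd

# Read the pipe-delimited main file and compare everything as strings.
df_main = pd.read_csv("/medaff/Scripts/python/vinooth/kol_scores.txt", delimiter = '|')
df_main = df_main.astype(str)
print(df_main.head())

# An inner merge on the filter columns keeps only the matching rows.
df_3=pd.merge(df_main,df_t1,on=filter_u,how='inner')
# Index.intersection replaces the `&` set operation on an Index,
# which newer pandas versions no longer support.
df_3 = df_3[df_3.columns.intersection(column_u)]
print(df_3)
df_3.to_json('/medaff/Scripts/python/vinooth/output/out.json', orient='records')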
This way I got my final output:

   kol_id     jnj_id           kol_full_nm thrc_cd
0  101152  7124166.0  Constance Ann Benson     VIR
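The transpose-and-merge steps above amount to building a one-row frame of filter values and inner-joining it with df_main. A minimal sketch of the same idea, building that row from a dict instead of transposing, might look like this. The df2 contents are an assumption (the question does not show them), the sample rows are trimmed from the question's df_main, and criteria, df_keys and wanted are illustrative names.

```python
import io
import pandas as pd

# Rows taken from the question's df_main, trimmed to the columns used here.
MAIN_TXT = """kol_id|jnj_id|kol_full_nm|thrc_cd|thrc_nm
101152|7124166|Constance Ann Benson|VIR|VIR
251489|7822721|Mariam S Aziz|VIR|VIR
"""
# dtype=str keeps the IDs exactly as they are written in the file.
df_main = pd.read_csv(io.StringIO(MAIN_TXT), sep="|", dtype=str)

# ASSUMPTION: df2 reconstructed from the question's description.
df2 = pd.DataFrame({
    "filter": ["kol_id", "thrc_nm", "jnj_id", None],
    "filter_value": ["101152", "VIR", "7124166", None],
    "columns": ["kol_id", "jnj_id", "kol_full_nm", "thrc_cd"],
})

# Map each filter column to the value it must equal.
pairs = df2.dropna(subset=["filter", "filter_value"])
criteria = dict(zip(pairs["filter"], pairs["filter_value"].astype(str)))

# One-row frame whose header is the filter names.
df_keys = pd.DataFrame([criteria])

# An inner merge on all filter columns keeps only the matching rows.
df_3 = df_main.merge(df_keys, on=list(criteria), how="inner")

# Keep the requested output columns that actually exist.
wanted = [c for c in df2["columns"].dropna() if c in df_3.columns]
df_3 = df_3[wanted]
print(df_3.to_json(orient="records"))
```

Reading df_main with dtype=str keeps the IDs as the text that appears in the file, so they are not rendered as floats such as 7124166.0.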

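Error: 'tuple' object has no attribute 'filter'.
Oh yes, sorry: iterrows returns a tuple (i, row). I'll edit.
filter is a special word, so I'll pass it as a str.
I'm getting an empty df_final.
Is it still empty with row['filter'] instead of row.filter?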
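The two problems raised in the comments can be reproduced in isolation. The sketch below uses a small stand-in for df2 (an assumption, since the original frame is not shown) to illustrate that iterrows yields (index, row) tuples, and that row.filter resolves to the Series.filter method rather than to the value stored under the 'filter' label.

```python
import pandas as pd

# ASSUMPTION: a stand-in for df2 using the column names from the question.
df2 = pd.DataFrame({
    "filter": ["kol_id", "thrc_nm"],
    "filter_value": ["101152", "VIR"],
})

# iterrows() yields (index, row) tuples, so a single loop variable is a tuple.
first = next(df2.iterrows())
print(type(first).__name__)   # tuple

# Unpack the tuple to get at the row itself.
idx, row = first

# Series.filter is a method, so attribute access returns that method,
# not the value stored under the 'filter' label.
print(callable(row.filter))   # True

# Bracket access always returns the stored value.
print(row["filter"])          # kol_id

# Attribute access does work for labels that do not clash with Series members.
print(row.filter_value)       # 101152
```

Using row['filter'] and row['filter_value'] consistently avoids having to remember which labels clash with Series attributes.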