Python Pandas: masking rows/columns whose index labels are not shared between two dataframes

python pandas dataframe indexing

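Problem

I have two datasets describing, say, ocean temperature at specific depths and latitudes. The datasets come from two different models and therefore have different resolutions: model 1 has the finer latitude resolution, and the two models use different depth levels. I have converted both datasets into pandas dataframes, with depth as the row index and latitude as the column labels. I want to mask the rows (depths) and columns (latitudes) that the two dataframes do not share, because I will be taking the difference and do not want to interpolate the data. I have found out how to mask particular values within rows and columns, but I want to mask entire rows and columns.

I used np.intersect1d on the depth lists to find which depths are shared between the models, and I used a conditional statement to build a boolean list marking where each index value is unique to that dataframe. However, I don't know how to use it as a mask, or even whether I can. mask says "Array conditional must be same shape as self", but the array conditional is one-dimensional while the dataframe is two-dimensional. I'm not sure how to reference the dataframe's index to apply the mask. I feel I'm on the right track, but I can't be sure, as I'm still new to pandas. (I tried looking for similar questions, but none exactly matched my problem.)

Code (simplified working example)

Note: this was written in a Jupyter notebook environment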

import numpy as np
import pandas as pd

# Model 1 data
depthmod1 = [5, 10, 15, 20, 30, 50, 60, 80, 100]  #depth in meters
latmod1 = [50, 50.5, 51, 51.5, 52, 52.5, 53] #latitude in degrees north
tmpumod1 = np.random.randint(273,303,size=(len(depthmod1),len(latmod1))) #temperature
dfmod1 = pd.DataFrame(tmpumod1,index=depthmod1,columns=latmod1)
print(dfmod1)

     50.0  50.5  51.0  51.5  52.0  52.5  53.0
5     299   300   300   293   285   293   273
10    273   288   293   292   290   302   273
15    277   279   284   302   280   294   284
20    291   295   277   276   295   279   274
30    281   284   284   275   295   284   282
50    284   276   291   282   286   295   295
60    298   294   289   294   285   289   288
80    285   284   275   298   287   277   300
100   292   295   294   273   291   276   290

      50   51   52   53
5    297  282  275  292
10   298  286  292  282
15   286  285  288  273
25   292  288  279  299
35   301  295  300  288
50   277  301  281  277
60   276  293  295  297
100  275  279  292  287

Shared depths: [5 10 15 50 60 100] (6,)

Bool showing where mod1 index is not in mod2: [False]

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-...> in <module>
----> 1 dfmod1masked = dfmod1.mask(depthmask1,np.nan)
      2 print(dfmod1masked)
[...]
ValueError: Array conditional must be same shape as self

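Question

How can I mask rows by index so that only the rows/indices available in both dataframes, [5 10 15 50 60 100], remain? I will be doing a similar masking for the columns (latitudes), so hopefully a solution for the rows will also work for the columns. I also don't want to merge the dataframes. They should stay separate unless merging is required.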

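depthxsect is an np.array holding the index labels you want. So you can skip creating the boolean array depthmask and simply pass the np.array to the dataframe with .loc. If you instead want to keep all rows but have NaN values at the other indices, you should use .mask.

Once you have dfmod1 and depthxsect, you only need: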

dfmod1.loc[depthxsect]

Full reproducible code:

import pandas as pd
import numpy as np

# Model 1 data
depthmod1 = [5, 10, 15, 20, 30, 50, 60, 80, 100]  #depth in meters
latmod1 = [50, 50.5, 51, 51.5, 52, 52.5, 53] #latitude in degrees north
tmpumod1 = np.random.randint(273,303,size=(len(depthmod1),len(latmod1))) #temperature
dfmod1 = pd.DataFrame(tmpumod1,index=depthmod1,columns=latmod1)

depthmod2  = [5, 10, 15, 25, 35, 50, 60, 100]
latmod2  = [50, 51, 52, 53]
tmpumod2  = np.random.randint(273,303,size=(len(depthmod2), len(latmod2)))
dfmod2 = pd.DataFrame(tmpumod2,index=depthmod2,columns=latmod2)
depthxsect = np.intersect1d(depthmod1, depthmod2)
dfmod1.loc[depthxsect]
Out[2]: 
     50.0  50.5  51.0  51.5  52.0  52.5  53.0
5     284   291   280   287   297   286   277
10    294   279   302   283   284   298   291
15    278   296   286   298   279   275   286
50    284   281   297   290   302   299   280
60    290   301   302   298   283   286   287
100   285   283   297   287   289   282   283
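
The same labels can also be obtained directly from the two dataframes, without going back to the original depth lists. The following is a minimal sketch, assuming the same example data as above; the seed and the names shared_depths, rows1 and rows2 are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable

depthmod1 = [5, 10, 15, 20, 30, 50, 60, 80, 100]  # depth in meters
latmod1 = [50, 50.5, 51, 51.5, 52, 52.5, 53]      # latitude in degrees north
dfmod1 = pd.DataFrame(
    rng.integers(273, 303, size=(len(depthmod1), len(latmod1))),
    index=depthmod1, columns=latmod1,
)

depthmod2 = [5, 10, 15, 25, 35, 50, 60, 100]
latmod2 = [50, 51, 52, 53]
dfmod2 = pd.DataFrame(
    rng.integers(273, 303, size=(len(depthmod2), len(latmod2))),
    index=depthmod2, columns=latmod2,
)

# Index.intersection returns the labels present in both row indexes
shared_depths = dfmod1.index.intersection(dfmod2.index)

# Label-based selection keeps only the shared depths in each dataframe
rows1 = dfmod1.loc[shared_depths]
rows2 = dfmod2.loc[shared_depths]
```

Because every label in shared_depths exists in both indexes, neither .loc call can raise a KeyError for a missing label.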

I've also included the approach you attempted. You have to apply mask on each column; you were applying it to the entire dataframe:

import pandas as pd
import numpy as np
# Model 1 data
depthmod1 = [5, 10, 15, 20, 30, 50, 60, 80, 100]  #depth in meters
latmod1 = [50, 50.5, 51, 51.5, 52, 52.5, 53] #latitude in degrees north
tmpumod1 = np.random.randint(273,303,size=(len(depthmod1),len(latmod1))) #temperature
dfmod1 = pd.DataFrame(tmpumod1,index=depthmod1,columns=latmod1)
dfmod1
depthmod2  = [5, 10, 15, 25, 35, 50, 60, 100]
latmod2  = [50, 51, 52, 53]
tmpumod2  = np.random.randint(273,303,size=(len(depthmod2), len(latmod2)))
dfmod2 = pd.DataFrame(tmpumod2,index=depthmod2,columns=latmod2)
depthxsect = np.intersect1d(depthmod1, depthmod2)
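# True for the depths that are NOT shared with model 2
depthmask = ~dfmod1.index.isin(depthxsect)
# Each column is a Series, so a 1-D condition has the right shape
for col in dfmod1.columns:
    dfmod1[col] = dfmod1[col].mask(depthmask, np.nan)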
dfmod1
Out[3]: 
      50.0   50.5   51.0   51.5   52.0   52.5   53.0
5    289.0  274.0  297.0  274.0  277.0  278.0  277.0
10   282.0  280.0  277.0  302.0  297.0  289.0  278.0
15   300.0  282.0  297.0  297.0  300.0  279.0  291.0
20     NaN    NaN    NaN    NaN    NaN    NaN    NaN
30     NaN    NaN    NaN    NaN    NaN    NaN    NaN
50   285.0  297.0  292.0  301.0  296.0  289.0  291.0
60   295.0  299.0  278.0  295.0  299.0  293.0  277.0
80     NaN    NaN    NaN    NaN    NaN    NaN    NaN
100  292.0  293.0  289.0  291.0  289.0  276.0  286.0
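
As a possible alternative to the column loop, the one-dimensional row condition can be expanded to the dataframe's shape, which is what the "Array conditional must be same shape as self" error is asking for. This is only a sketch under the same example data; the seed and the names drop_rows, cond2d and masked are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable

depthmod1 = [5, 10, 15, 20, 30, 50, 60, 80, 100]
latmod1 = [50, 50.5, 51, 51.5, 52, 52.5, 53]
dfmod1 = pd.DataFrame(
    rng.integers(273, 303, size=(len(depthmod1), len(latmod1))),
    index=depthmod1, columns=latmod1,
)
depthmod2 = [5, 10, 15, 25, 35, 50, 60, 100]

depthxsect = np.intersect1d(depthmod1, depthmod2)

# 1-D boolean array: True where a depth is NOT shared
drop_rows = ~dfmod1.index.isin(depthxsect)

# Repeat the row condition across all columns so it matches dfmod1.shape
cond2d = np.repeat(drop_rows[:, None], dfmod1.shape[1], axis=1)

# Entries where the condition is True become NaN; dfmod1 itself is unchanged
masked = dfmod1.mask(cond2d)
```

Unlike the loop, this returns a new dataframe and leaves dfmod1 untouched, which may be preferable if the unmasked data is still needed.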

Thanks, I used the loc method and it works well. It also works for the columns, but in that case the syntax is dfmod1.loc[:, latxsect]
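
Putting the row and column selections together, the difference between the two models could be taken roughly as follows. This is a sketch, not the poster's actual code: latxsect is built the same way as depthxsect, the seed and the names sub1, sub2 and diff are illustrative, and casting dfmod2's integer column labels to float is an assumption made here so that both dataframes carry latitude labels of the same dtype.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable

depthmod1 = [5, 10, 15, 20, 30, 50, 60, 80, 100]
latmod1 = [50, 50.5, 51, 51.5, 52, 52.5, 53]
dfmod1 = pd.DataFrame(
    rng.integers(273, 303, size=(len(depthmod1), len(latmod1))),
    index=depthmod1, columns=latmod1,
)

depthmod2 = [5, 10, 15, 25, 35, 50, 60, 100]
latmod2 = [50, 51, 52, 53]
dfmod2 = pd.DataFrame(
    rng.integers(273, 303, size=(len(depthmod2), len(latmod2))),
    index=depthmod2, columns=latmod2,
)

# Assumption: use float latitude labels in both frames (dfmod1's already are)
dfmod2.columns = dfmod2.columns.astype(float)

depthxsect = np.intersect1d(depthmod1, depthmod2)  # shared depths
latxsect = np.intersect1d(latmod1, latmod2)        # shared latitudes

# Select shared rows and columns in one .loc call per dataframe
sub1 = dfmod1.loc[depthxsect, latxsect]
sub2 = dfmod2.loc[depthxsect, latxsect]

# Both frames now have identical labels, so subtraction aligns element-wise
diff = sub1 - sub2
```

The two dataframes stay separate throughout; only the difference is a new object.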
