Python: splitting length (metre) data by interval with pandas

I have a dataframe of length-interval data (from drill holes) that looks like this:

df
Out[46]: 
   from  to  min intensity
0     0  10   py        2
1     5  15  cpy       3.5
2    14  27  spy       0.7
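To reproduce the example, the frame above can be rebuilt roughly as follows. This is a minimal sketch: the values are copied from the printout, and the dtypes (integer depths, float intensity) are an assumption.

```python
import pandas as pd

# Values copied from the printout above; dtypes are assumed.
df = pd.DataFrame({
    "from": [0, 5, 14],               # interval start depth (m)
    "to": [10, 15, 27],               # interval end depth (m)
    "min": ["py", "cpy", "spy"],      # mineral code
    "intensity": [2, 3.5, 0.7],
})
print(df)
```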

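I need to pivot this data, but also break it on the least common length interval, so that the values of the "min" column become the column headers and the cells hold the "rank" (the intensity value). The output would look like this: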

df.somefunc(index=['from','to'], columns='min', values='intensity', fill_value=0)
Out[47]: 
   from  to   py  cpy  spy
0     0   5    2    0    0
1     5  10    2  3.5    0
2    10  14    0  3.5    0
3    14  15    0  3.5  0.7
4    15  27    0    0  0.7

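So basically, "from" and "to" describe non-overlapping intervals down a drill hole, where the intervals have been split by the least common denominator. As you can see, the "py" interval from the original table has been split: the first (0-5 m) interval has py:2, cpy:0 and the second (5-10 m) interval has py:2, cpy:3.5.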

The result of a basic pivot_table call is this:

pd.pivot_table(df, values='intensity', index=['from', 'to'], columns="min", aggfunc="first", fill_value=0)
Out[48]: 
min      cpy  py  spy
from to              
0    10    0   2    0
5    15  3.5   0    0
14   27    0   0  0.7

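It just uses the from and to columns combined as the index. The important point is that my output must not have overlapping from and to values (i.e. a subsequent "from" value cannot be less than the previous "to" value).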

Is there an elegant way to accomplish this with pandas? Thanks for the help!
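The non-overlap requirement stated in the question can be checked with a few lines. This is only a sketch: `has_overlaps` is a hypothetical helper name, and it assumes the frame has numeric "from" and "to" columns.

```python
import pandas as pd

def has_overlaps(frame: pd.DataFrame) -> bool:
    """Return True if any interval starts before the previous one ends."""
    ordered = frame.sort_values(["from", "to"])
    # Compare each "from" with the "to" of the row before it;
    # the first row compares against NaN, which is never True.
    prev_to = ordered["to"].shift()
    return bool((ordered["from"] < prev_to).any())

raw = pd.DataFrame({"from": [0, 5, 14], "to": [10, 15, 27]})
split = pd.DataFrame({"from": [0, 5, 10, 14, 15], "to": [5, 10, 14, 15, 27]})
print(has_overlaps(raw), has_overlaps(split))
```

On the example, the original intervals overlap while the desired split intervals do not.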

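I don't know of any native interval arithmetic in pandas, so you need to do it yourself. Here is one way to do it, if I understand the constraints correctly. This may be an O(n^3) problem, and it will create a huge table for large inputs.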

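import numpy as np
import pandas as pd

# make the new bounds: every distinct "from"/"to" depth, sorted
bounds = np.unique(np.hstack((df["from"], df["to"])))
df2 = pd.DataFrame({"from": bounds[:-1], "to": bounds[1:]})

# find inclusions: for each original row, flag the sub-intervals it covers
# (the +/-1 offsets assume integer depths)
isin = df.apply(
    lambda x: df2["from"].between(x["from"], x["to"] - 1)
    | df2["to"].between(x["from"] + 1, x["to"]),
    axis=1,
).T

# data: intensity where the sub-interval is covered, 0 elsewhere
data = np.where(isin, df.intensity, 0)

# result: sub-intervals as a (from, to) MultiIndex, minerals as columns
df3 = pd.DataFrame(data, pd.MultiIndex.from_arrays(df2.values.T), df["min"])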

This gives:

In [26]: df3
Out[26]: 
min     py  cpy  spy
0  5   2.0  0.0  0.0
5  10  2.0  3.5  0.0
10 14  0.0  3.5  0.0
14 15  0.0  3.5  0.7
15 27  0.0  0.0  0.7

Wow, that's actually a lot fewer lines of code than I expected. Thanks very much!!!
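For depths that are not whole numbers, the same idea can be sketched with plain NumPy broadcasting and strict containment tests instead of +/-1 offsets. This is an illustrative variant, not the answer's own code; it assumes each mineral appears in only one row (repeated minerals would produce duplicate columns), and the names `lo`, `hi`, `covered` and `result` are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "from": [0, 5, 14],
    "to": [10, 15, 27],
    "min": ["py", "cpy", "spy"],
    "intensity": [2, 3.5, 0.7],
})

# Every distinct depth becomes a breakpoint between sub-intervals.
bounds = np.unique(df[["from", "to"]].to_numpy())
lo, hi = bounds[:-1], bounds[1:]

# covered[i, j] is True when sub-interval i lies inside original interval j.
covered = (lo[:, None] >= df["from"].to_numpy()) & (hi[:, None] <= df["to"].to_numpy())

# Intensity where covered, 0 elsewhere; one column per mineral.
result = pd.DataFrame(
    np.where(covered, df["intensity"].to_numpy(), 0.0),
    index=pd.MultiIndex.from_arrays([lo, hi], names=["from", "to"]),
    columns=df["min"],
)
print(result)
```

The boolean mask has one row per sub-interval and one column per original interval, so memory still grows with the product of the two, as the answer warns.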