Python 使用pandas.IntervalIndex作为数据帧索引_Python_Pandas

Python 使用pandas.IntervalIndex作为数据帧索引

python pandas

Python 使用pandas.IntervalIndex作为数据帧索引,python,pandas,Python,Pandas,考虑以下数据集： cat > example_file.txt <<EOL bins|1,1|2,2|3,4|5,6|7,9|10,12|13,18|19,24|25,36|37,54|55,81 1,1||431|501|651|861|1081|1461|1711|2071|2371|2531 2,2|||261|401|631|871|1291|1611|2001|2291|2431 3,4|||121|271|551|1011|1191|1511|1901|2211|2

考虑以下数据集：

cat > example_file.txt <<EOL
bins|1,1|2,2|3,4|5,6|7,9|10,12|13,18|19,24|25,36|37,54|55,81
1,1||431|501|651|861|1081|1461|1711|2071|2371|2531
2,2|||261|401|631|871|1291|1611|2001|2291|2431
3,4|||121|271|551|1011|1191|1511|1901|2211|2351
5,6||||101|361|901|1251|1301|1691|2011|2221
7,9|||||181|461|841|1151|1511|1821|2061
10,12||||||161|591|931|1291|1621|1821
13,18|||||||351|691|1091|1401|1571
19,24||||||||301|861|1451|1201
25,36|||||||||371|851|961
37,54||||||||||371|621
55,81|||||||||||351
EOL

现在，这个表的行和列名实际上是区间界限。我可以让熊猫认出它们是这样的：

col_bin = pandas.IntervalIndex.from_tuples([tuple(list(map(int, x.split(',')))) for x in example.columns])
row_bin = pandas.IntervalIndex.from_tuples([tuple(list(map(int, x.split(',')))) for x in example.index])
example.columns = col_bin
example.index   = row_bin

现在，我想找到对应于值对的单元格值例如（11,13）

例如，对于（11，13），单元格值为591

这是因为

行索引值（11）属于区间（10,12）[这是第6行]，列索引值（13）属于区间（13,18）[这是第7列]，表中（行，列）（6,7）处的值是591。

您需要首先添加参数

closed='both'

，因为区间两边都是闭合的：

col_bin = pd.IntervalIndex.from_tuples([tuple(list(map(int, x.split(',')))) for x in example.columns], closed='both')
row_bin = pd.IntervalIndex.from_tuples([tuple(list(map(int, x.split(',')))) for x in example.index], closed='both')
example.columns = col_bin
example.index   = row_bin
print (example)
         [1, 1]  [2, 2]  [3, 4]  [5, 6]  [7, 9]  [10, 12]  [13, 18]  \
[1, 1]       NaN   431.0   501.0   651.0   861.0    1081.0    1461.0   
[2, 2]       NaN     NaN   261.0   401.0   631.0     871.0    1291.0   
[3, 4]       NaN     NaN   121.0   271.0   551.0    1011.0    1191.0   
[5, 6]       NaN     NaN     NaN   101.0   361.0     901.0    1251.0   
[7, 9]       NaN     NaN     NaN     NaN   181.0     461.0     841.0   
[10, 12]     NaN     NaN     NaN     NaN     NaN     161.0     591.0   
[13, 18]     NaN     NaN     NaN     NaN     NaN       NaN     351.0   
[19, 24]     NaN     NaN     NaN     NaN     NaN       NaN       NaN   
[25, 36]     NaN     NaN     NaN     NaN     NaN       NaN       NaN   
[37, 54]     NaN     NaN     NaN     NaN     NaN       NaN       NaN   
[55, 81]     NaN     NaN     NaN     NaN     NaN       NaN       NaN   

          [19, 24]  [25, 36]  [37, 54]  [55, 81]  
[1, 1]      1711.0    2071.0    2371.0      2531  
[2, 2]      1611.0    2001.0    2291.0      2431  
[3, 4]      1511.0    1901.0    2211.0      2351  
[5, 6]      1301.0    1691.0    2011.0      2221  
[7, 9]      1151.0    1511.0    1821.0      2061  
[10, 12]     931.0    1291.0    1621.0      1821  
[13, 18]     691.0    1091.0    1401.0      1571  
[19, 24]     301.0     861.0    1451.0      1201  
[25, 36]       NaN     371.0     851.0       961  
[37, 54]       NaN       NaN     371.0       621  
[55, 81]       NaN       NaN       NaN       351

然后通过获取位置并选择：

非常感谢，我能感觉到我不远，但我错过了最后一英里；）

col_bin = pd.IntervalIndex.from_tuples([tuple(list(map(int, x.split(',')))) for x in example.columns], closed='both')
row_bin = pd.IntervalIndex.from_tuples([tuple(list(map(int, x.split(',')))) for x in example.index], closed='both')
example.columns = col_bin
example.index   = row_bin
print (example)
         [1, 1]  [2, 2]  [3, 4]  [5, 6]  [7, 9]  [10, 12]  [13, 18]  \
[1, 1]       NaN   431.0   501.0   651.0   861.0    1081.0    1461.0   
[2, 2]       NaN     NaN   261.0   401.0   631.0     871.0    1291.0   
[3, 4]       NaN     NaN   121.0   271.0   551.0    1011.0    1191.0   
[5, 6]       NaN     NaN     NaN   101.0   361.0     901.0    1251.0   
[7, 9]       NaN     NaN     NaN     NaN   181.0     461.0     841.0   
[10, 12]     NaN     NaN     NaN     NaN     NaN     161.0     591.0   
[13, 18]     NaN     NaN     NaN     NaN     NaN       NaN     351.0   
[19, 24]     NaN     NaN     NaN     NaN     NaN       NaN       NaN   
[25, 36]     NaN     NaN     NaN     NaN     NaN       NaN       NaN   
[37, 54]     NaN     NaN     NaN     NaN     NaN       NaN       NaN   
[55, 81]     NaN     NaN     NaN     NaN     NaN       NaN       NaN   

          [19, 24]  [25, 36]  [37, 54]  [55, 81]  
[1, 1]      1711.0    2071.0    2371.0      2531  
[2, 2]      1611.0    2001.0    2291.0      2431  
[3, 4]      1511.0    1901.0    2211.0      2351  
[5, 6]      1301.0    1691.0    2011.0      2221  
[7, 9]      1151.0    1511.0    1821.0      2061  
[10, 12]     931.0    1291.0    1621.0      1821  
[13, 18]     691.0    1091.0    1401.0      1571  
[19, 24]     301.0     861.0    1451.0      1201  
[25, 36]       NaN     371.0     851.0       961  
[37, 54]       NaN       NaN     371.0       621  
[55, 81]       NaN       NaN       NaN       351

tup = (11, 13)
pos1 = example.index.get_loc(tup[0])
pos2 = example.columns.get_loc(tup[1])

print (example.iat[pos1, pos2])
591.0