
Python: cannot perform slice indexing


I am trying to work with a pandas MultiIndex DataFrame that looks like this:

                   end ref|alt
chrom start
chr1  3000714  3000715     T|G
      3001065  3001066     G|T
      3001110  3001111     G|C
      3001131  3001132     G|A
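For reference, a minimal sketch that reconstructs this example frame (values copied from the printout above):

import pandas as pd

# Rebuild the sample frame shown above: a two-level (chrom, start) index
# with 'end' and 'ref|alt' columns.
df = pd.DataFrame(
    {'end': [3000715, 3001066, 3001111, 3001132],
     'ref|alt': ['T|G', 'G|T', 'G|C', 'G|A']},
    index=pd.MultiIndex.from_tuples(
        [('chr1', 3000714), ('chr1', 3001065),
         ('chr1', 3001110), ('chr1', 3001131)],
        names=['chrom', 'start']))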
I want to be able to do this:

df.loc[('chr1', slice(3000714, 3001110))]
This fails with the following error:

cannot do slice indexing with these indexers [1204741]

df.index.levels[1].dtype returns dtype('int64'), so it should be able to handle integer slices, right?


Also, any notes on how to do this efficiently would be valuable: the DataFrame has 12 million rows and I need to run roughly 70 million of these slice queries against it.

I think you need to add , : at the end, which means you slice the rows but take all columns:

print (df.loc[('chr1', slice(3000714, 3001110)),:])
                   end ref|alt
chrom start                   
chr1  3000714  3000715     T|G
      3001065  3001066     G|T
      3001110  3001111     G|C
Another solution is to pass axis=0 to loc:
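print (df.loc(axis=0)[('chr1', slice(3000714, 3001110))])

(this is the same call timed in In [22] below; the output matches the first example)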

But if you only need the rows labeled 3000714 and 3001110:

print (df.loc[('chr1', [3000714, 3001110]),:])
                   end ref|alt
chrom start                   
chr1  3000714  3000715     T|G
      3001110  3001111     G|C
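Or, using pd.IndexSlice for the same selection: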

idx = pd.IndexSlice
print (df.loc[idx['chr1', [3000714, 3001110]],:])
                   end ref|alt
chrom start                   
chr1  3000714  3000715     T|G
      3001110  3001111     G|C
Timings:

In [21]: %timeit (df.loc[('chr1', slice(3000714, 3001110)),:])
1000 loops, best of 3: 757 µs per loop

In [22]: %timeit (df.loc(axis=0)[('chr1', slice(3000714, 3001110))])
1000 loops, best of 3: 743 µs per loop

In [23]: %timeit (df.loc[('chr1', [3000714, 3001110]),:])
1000 loops, best of 3: 824 µs per loop

In [24]: %timeit (df.loc[pd.IndexSlice['chr1', [3000714, 3001110]],:])
The slowest run took 5.35 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 826 µs per loop
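One general note, not from the original answer: label slicing on a MultiIndex is fastest when the index is lexsorted, so on a large frame it can be worth sorting once up front:

df = df.sort_index()  # one-time cost; keeps subsequent .loc slices on the fast path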

Awesome, that works great, thanks for the explanation. I also realized that in my case, since my first index level is much smaller than my second (23 items in the level[0] index versus 12.6 million in the level[1] index), I get much better speed by splitting the DataFrame into a dict keyed on the first index level. On my full DataFrame, the df.loc(axis=0)[('chr1', slice(3000714, 3001110))] approach takes 218 ms per loop, while building the dict and doing dfs['chr1'].loc[3000714:3001110] takes only 95.7 µs per loop (a sketch of this split follows below). Thanks again!

@jezrael, how do I select a DataFrame from one index to another, i.e. within a range? I have a function where users.index = np.arange(0, len(users)), but users.loc[start:end, :] returns an empty DataFrame, even though the users DataFrame has content.
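A minimal sketch of the dict-splitting approach described in the comment above; the groupby-based split is an assumed detail, only the dfs['chr1'].loc[...] lookup comes from the comment:

# Build one sub-DataFrame per chromosome once, dropping the first index
# level so each piece keeps a plain integer index.
dfs = {chrom: sub.reset_index(level='chrom', drop=True)
       for chrom, sub in df.groupby(level='chrom')}

# Each query is then a cheap slice on a single-level, sorted integer index.
print (dfs['chr1'].loc[3000714:3001110])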