Python pandas loc vs. iloc vs. at vs. iat?


python pandas performance indexing


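I recently started branching out from my safe place (R) into Python, and I am a bit confused by cell localization/selection in Pandas. I have read the documentation, but I am struggling to understand the practical implications of the various localization/selection options.

Is there a reason why I should ever use .loc or .iloc over at and iat, or vice versa? In what situations should I use which method?

Note: future readers should be aware that this question is old and was written before pandas v0.20, when there was a function called .ix. This method was later split into two, loc and iloc, to make an explicit distinction between positional and label-based indexing. Please be aware that ix was discontinued because of its inconsistent behavior and because it was hard to understand, and it no longer exists in current versions of pandas (>= 1.0).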

loc: works on index labels only
iloc: works on position
at: gets scalar values. It is a very fast loc
iat: gets scalar values. It is a very fast iloc

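Also, at and iat are meant to access a scalar, that is, a single element in the dataframe, while loc and iloc are meant to access several elements at the same time, potentially to perform vectorized operations.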
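To make the scalar versus multi-element distinction concrete, here is a minimal sketch. The frame, its labels, and its column names are made up for illustration and are not from the answer.

```python
import pandas as pd

# Hypothetical three-row frame used only for this illustration.
df = pd.DataFrame(
    {"age": [30, 2, 12], "color": ["blue", "green", "red"]},
    index=["Jane", "Nick", "Aaron"],
)

# Scalar access: exactly one row and one column.
by_label = df.at["Nick", "color"]   # label based
by_position = df.iat[1, 1]          # integer-position based

# Multi-element access: lists, slices, or boolean arrays are allowed.
rows_by_label = df.loc[["Jane", "Aaron"], "age"]   # Series with two values
rows_by_position = df.iloc[0:2, 0]                 # first two rows, first column

print(by_label, by_position)
print(rows_by_label.tolist(), rows_by_position.tolist())
```

Passing a list or slice to at or iat is an error, which is the practical reason to reach for loc and iloc whenever more than one cell is involved.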

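Updated for pandas 0.20, given that ix is deprecated. This demonstrates not only how to use loc, iloc, at, iat, and set_value, but also how to accomplish mixed positional/label-based indexing.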

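loc - label based
Allows you to pass 1-D arrays as indexers. Arrays can be either slices (subsets) of the index or columns, or boolean arrays equal in length to the index or columns.

Special note: when a scalar indexer is passed, loc can assign a new index or column value that did not exist before.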

# label based, but we can use position values
# to get the labels from the index object
df.loc[df.index[2], 'ColName'] = 3
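The two loc behaviors described above, boolean-array indexers and assignment to labels that do not exist yet, can be sketched as follows. The frame and the names "d" and "Other" are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical frame; 'ColName' mirrors the placeholder used in the answer.
df = pd.DataFrame({"ColName": [1, 2, 3]}, index=["a", "b", "c"])

# A boolean array with the same length as the index selects matching rows.
mask = (df["ColName"] > 1).to_numpy()
selected = df.loc[mask, "ColName"]

# Scalar indexers that do not exist yet: loc creates them on assignment.
df.loc["d", "ColName"] = 4     # adds a new row labelled "d"
df.loc["a", "Other"] = 10.0    # adds a new column "Other"; other rows get NaN

print(selected.tolist())
print(df)
```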

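iloc - position based
Similar to loc, except that it uses positions rather than index values. However, you cannot assign new columns or indices.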

# position based, but we can get the position
# from the columns object via the `get_loc` method
df.iloc[2, df.columns.get_loc('ColName')] = 3
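As a small sketch of the restriction just mentioned, the following tries a positional assignment one row past the end of a made-up frame. The frame is an assumption for illustration; the point is only that iloc refuses to enlarge the object.

```python
import pandas as pd

# Hypothetical frame with three rows.
df = pd.DataFrame({"ColName": [1, 2, 3]}, index=["a", "b", "c"])

# In-bounds positional assignment works.
df.iloc[2, df.columns.get_loc("ColName")] = 30

# Position 3 does not exist, and iloc will not create it.
try:
    df.iloc[3, 0] = 40
    enlarged = True
except IndexError:
    enlarged = False

print(enlarged, df.shape)
```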

at - label based
Works very similarly to loc for scalar indexers. Cannot operate on array indexers. Can assign new indices and columns.

Advantage over loc: it is faster.
Disadvantage: you cannot use arrays for indexers.

# label based, but we can use position values
# to get the labels from the index object
df.at[df.index[2], 'ColName'] = 3
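A minimal sketch of at reading, writing, and adding a new row label, under the assumption of a small made-up frame:

```python
import pandas as pd

# Hypothetical frame; labels and values are for illustration only.
df = pd.DataFrame({"ColName": [1, 2, 3]}, index=["a", "b", "c"])

value = df.at["b", "ColName"]   # read a single cell by label
df.at["b", "ColName"] = 20      # overwrite a single cell by label
df.at["d", "ColName"] = 4       # "d" is a new label, so a row is added

print(value)
print(df)
```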

iat - position based
Works similarly to iloc. Cannot work with array indexers. Cannot assign new indices and columns.

Advantage over iloc: it is faster.
Disadvantage: you cannot use arrays for indexers.

# position based, but we can get the position
# from the columns object via the `get_loc` method
df.iat[2, df.columns.get_loc('ColName')] = 3
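The sketch below shows iat with a column position obtained from get_loc, and what happens when a slice is passed instead of an integer. The frame and column names are assumptions for the example.

```python
import pandas as pd

# Hypothetical two-column frame with the default integer index.
df = pd.DataFrame({"ColName": [1, 2, 3], "Other": [4, 5, 6]})

col_pos = df.columns.get_loc("Other")   # column label -> integer position
df.iat[0, col_pos] = 40

# iat accepts integers only; a slice is rejected.
try:
    df.iat[:2, col_pos]
    slice_ok = True
except ValueError:
    slice_ok = False

print(df.iat[0, col_pos], slice_ok)
```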

set_value - label based
Works very similarly to loc for scalar indexers. Cannot operate on array indexers. Can assign new indices and columns.

Advantage: super fast, because there is very little overhead.
Disadvantage: there is very little overhead because pandas is not doing a bunch of safety checks. Use at your own risk. Also, this is not intended for public use.

Note that set_value was deprecated in pandas 0.21 and removed in 1.0; use at or iat instead.

# label based, but we can use position values
# to get the labels from the index object
df.set_value(df.index[2], 'ColName', 3)

set_value with takeable=True - position based
Works similarly to iloc. Cannot work with array indexers. Cannot assign new indices and columns.

Advantage: super fast, because there is very little overhead.
Disadvantage: there is very little overhead because pandas is not doing a bunch of safety checks. Use at your own risk. Also, this is not intended for public use.

# position based, but we can get the position
# from the columns object via the `get_loc` method
df.set_value(2, df.columns.get_loc('ColName'), 3, takeable=True)
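Because set_value is not available in pandas 1.0 and later, the two calls above will not run there. A sketch of the closest equivalents with at and iat, on a made-up frame, could look like this:

```python
import pandas as pd

# Hypothetical frame; 'ColName' mirrors the placeholder used in the answer.
df = pd.DataFrame({"ColName": [10, 20, 30]}, index=["a", "b", "c"])

# Label-based counterpart of df.set_value(df.index[2], 'ColName', 3)
df.at[df.index[2], "ColName"] = 3
after_label_write = df.at["c", "ColName"]

# Position-based counterpart of the takeable=True call
df.iat[2, df.columns.get_loc("ColName")] = 33
after_position_write = df.iat[2, 0]

print(after_label_write, after_position_write)
```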

There are two primary ways that pandas makes selections from a DataFrame:

By label

By integer location
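The difference between the two is easiest to see when the index labels are themselves integers that do not match the row order. The frame below is an assumption made up for this illustration.

```python
import pandas as pd

# Hypothetical frame whose integer labels are deliberately out of order.
df = pd.DataFrame({"food": ["Steak", "Lamb", "Mango"]}, index=[2, 0, 1])

by_label = df.loc[0, "food"]     # the row whose label is 0
by_location = df.iloc[0, 0]      # the first row, first column

print(by_label, by_location)
```

Here loc finds the row labelled 0 (the second row), while iloc returns the first row regardless of its label.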

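The documentation uses the term position to refer to integer location. I do not like this terminology, as I find it confusing. Integer location is more descriptive and is exactly what .iloc stands for. The key word here is INTEGER: you must use integers when selecting by integer location.

Before showing the summary, let's all make sure that .ix is deprecated and ambiguous and should never be used.

There are three primary indexers for pandas. We have the indexing operator itself (the brackets []), .loc, and .iloc. Let's summarize them:

[] - primarily selects subsets of columns, but can select rows as well. Cannot simultaneously select rows and columns.
.loc - selects subsets of rows and columns by label only
.iloc - selects subsets of rows and columns by integer location only

I almost never use .at or .iat, as they add no additional functionality and only a small performance increase. I would discourage their use unless you have a very time-sensitive application. Regardless, here is their summary:

.at selects a single scalar value in the DataFrame by label only
.iat selects a single scalar value in the DataFrame by integer location only

import pandas as pd

df = pd.DataFrame({'age': [30, 2, 12, 4, 32, 33, 69],
                   'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height': [165, 70, 120, 80, 180, 172, 150],
                   'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

# .loc selects by label: a single label, a list of labels, or a label slice
# (label slices include the stop label)
df.loc['Penelope']
age           4
color     white
food      Apple
height       80
score       3.3
state        AL
Name: Penelope, dtype: object

df.loc[['Cornelia', 'Jane', 'Dean']]
df.loc['Aaron':'Dean']

# .iloc selects by integer location: a single integer, a list, or a slice
# (integer slices exclude the stop position)
df.iloc[4]
age           32
color       gray
food      Cheese
height       180
score        1.8
state         AK
Name: Dean, dtype: object

df.iloc[[2, -2]]
df.iloc[:5:3]

# rows and columns can be selected together, separated by a comma
df.loc[['Jane', 'Dean'], 'height':]
df.iloc[[1,4], 2]
Nick      Lamb
Dean    Cheese
Name: food, dtype: object

# mixing labels and integer locations: convert one kind to the other first
col_names = df.columns[[2, 4]]
df.loc[['Nick', 'Cornelia'], col_names]

labels = ['Nick', 'Cornelia']
index_ints = [df.index.get_loc(label) for label in labels]
df.iloc[index_ints, [2, 4]]

# boolean selection; .iloc is given the underlying array rather than the Series
df.loc[df['age'] > 30, ['food', 'score']]
df.iloc[(df['age'] > 30).values, [2, 4]]

# a label slice with a step: every second column from 'color' to 'score'
df.loc[:, 'color':'score':2]

# the indexing operator [] selects columns, or slices rows
df['food']
Jane          Steak
Nick           Lamb
Aaron         Mango
Penelope      Apple
Dean         Cheese
Christina     Melon
Cornelia      Beans
Name: food, dtype: object

df[['food', 'score']]
df['Penelope':'Christina']   # slice rows by label
df[2:6:2]                    # slice rows by integer location

# [] cannot select rows and columns at the same time
df[3:5, 'color']
TypeError: unhashable type: 'slice'

# .at and .iat select a single scalar value
df.at['Christina', 'color']
'black'
df.iat[2, 5]
'FL'

import pandas as pd
import time as tm
import numpy as np
n=10
a=np.arange(0,n**2)
df=pd.DataFrame(a.reshape(n,n))

df
Out[25]:
    0   1   2   3   4   5   6   7   8   9
0   0   1   2   3   4   5   6   7   8   9
1  10  11  12  13  14  15  16  17  18  19
2  20  21  22  23  24  25  26  27  28  29
3  30  31  32  33  34  35  36  37  38  39
4  40  41  42  43  44  45  46  47  48  49
5  50  51  52  53  54  55  56  57  58  59
6  60  61  62  63  64  65  66  67  68  69
7  70  71  72  73  74  75  76  77  78  79
8  80  81  82  83  84  85  86  87  88  89
9  90  91  92  93  94  95  96  97  98  99

df.iloc[3,3]
Out[33]: 33

df.iat[3,3]
Out[34]: 33

df.iloc[:3,:3]
Out[35]:
    0   1   2
0   0   1   2
1  10  11  12
2  20  21  22

df.iat[:3,:3]
Traceback (most recent call last):
... omissis ...
ValueError: At based indexing on an integer index can only have integer indexers

# -*- coding: utf-8 -*-
"""
Created on Wed Feb  7 09:58:39 2018

@author: Fabio Pomi
"""

import pandas as pd
import time as tm
import numpy as np
n=1000
a=np.arange(0,n**2)
df=pd.DataFrame(a.reshape(n,n))
t1=tm.time()
for j in df.index:
    for i in df.columns:
        a=df.iloc[j,i]
t2=tm.time()
for j in df.index:
    for i in df.columns:
        a=df.iat[j,i]
t3=tm.time()
loc=t2-t1   # elapsed time of the iloc loop
at=t3-t2    # elapsed time of the iat loop
prc = loc/at *100
print('\nloc:%f at:%f prc:%f' %(loc,at,prc))

loc:10.485600 at:7.395423 prc:141.784987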
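For a quicker comparison than the nested loops above, a sketch using the standard-library timeit module is shown below. The frame size and the number of repetitions are arbitrary choices, not values from the answer, and the timings depend on the machine and the pandas version, so no expected numbers are given.

```python
import timeit

import numpy as np
import pandas as pd

n = 100  # arbitrary size for the illustration
df = pd.DataFrame(np.arange(n * n).reshape(n, n))

# The index and columns are both 0..n-1, so labels and positions coincide
# and all four accessors address the same cell.
accessors = {
    "loc": lambda: df.loc[3, 3],
    "iloc": lambda: df.iloc[3, 3],
    "at": lambda: df.at[3, 3],
    "iat": lambda: df.iat[3, 3],
}

repeats = 2000  # arbitrary number of calls per accessor
values = {name: fn() for name, fn in accessors.items()}
timings = {name: timeit.timeit(fn, number=repeats) for name, fn in accessors.items()}

for name, seconds in timings.items():
    print(f"{name:>4}: {seconds / repeats * 1e6:.2f} us per call")
```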