Accessing the index in pandas.Series.apply


Suppose I have a MultiIndex Series s:

>>> s
     values
a b
1 2  0.1 
3 6  0.3
4 4  0.7

I want to apply a function that uses the index of each row:

def f(x):
    # conditions or computations using the indexes
    if x.index[0] and ...:
        other = sum(x.index) + ...
    return something

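How can I do s.apply(f) for such a function? What is the recommended way to perform this kind of operation? I expect to get a new Series whose values come from applying this function to each row, with the same MultiIndex.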

I don't believe apply has access to the index; it treats each row as a numpy object, not a Series, as you can see:

In [27]: s.apply(lambda x: type(x))
Out[27]: 
a  b
1  2    <type 'numpy.float64'>
3  6    <type 'numpy.float64'>
4  4    <type 'numpy.float64'>

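Other approaches might use s.index.get_level_values, which in my opinion often gets a little ugly, or iterate row by row (for example with s.items()), which is likely to be slower, perhaps depending on exactly what f does.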
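The row-by-row alternative mentioned above can be sketched as follows. This is a minimal illustration, not the answerer's code: the rule inside f is hypothetical and only shows that both the index labels and the value are available at the same time.

```python
import pandas as pd

# The Series from the question, with index levels named a and b.
index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)


def f(idx, value):
    # Hypothetical rule, for illustration only: add the index labels
    # to the value when a < b, otherwise leave the value unchanged.
    a, b = idx
    return value + a + b if a < b else value


# items() yields (index_label, value) pairs; with a MultiIndex each label is a tuple.
result = pd.Series([f(idx, v) for idx, v in s.items()], index=s.index)
print(result)
```

Because the loop runs in Python, this is likely to be slower than a vectorized expression on large Series, so it is worth timing on real data.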

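Make it a frame; return scalars if you want (so the result is a Series).

Setup: s is a float64 Series with values 1, 2, 3 indexed by 'a', 'b', 'c'.

Printing function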

In [13]: def f(x):
   ....:     print(type(x), x)
   ....:     return x
   ....: 

In [14]: pd.DataFrame(s).apply(f)
<class 'pandas.core.series.Series'> a    1
b    2
c    3
Name: 0, dtype: float64
<class 'pandas.core.series.Series'> a    1
b    2
c    3
Name: 0, dtype: float64
Out[14]: 
   0
a  1
b  2
c  3
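Since the function receives the whole column as a Series in this setup, it can read x.index directly. The sketch below assumes the same three-element Series; the rule in f (negating the value under label 'a') is hypothetical and chosen only for illustration.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])


def f(col):
    # col is the entire column as a Series, so col.index is available here.
    # Hypothetical rule: negate the value stored under label "a".
    return col.where(col.index != "a", -col)


# pd.DataFrame(s) has a single column named 0; select it to get a Series back.
result = pd.DataFrame(s).apply(f)[0]
print(result)
```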

You may find it faster to use where rather than apply here:

In [11]: s = pd.Series([1., 2., 3.], index=['a' ,'b', 'c'])

In [12]: s.where(s.index != 'a', 5)
Out[12]: 
a    5
b    2
c    3
dtype: float64

In addition, you can use numpy-style logic/functions on any of the parts:

In [13]: (2 * s + 1).where((s.index == 'b') | (s.index == 'c'), -s)
Out[13]: 
a   -1
b    5
c    7
dtype: float64

In [14]: (2 * s + 1).where(s.index != 'a', -s)
Out[14]: 
a   -1
b    5
c    7
dtype: float64
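The same where pattern should carry over to the MultiIndex Series from the question by pulling each level out with s.index.get_level_values. A minimal sketch, assuming a hypothetical rule (keep the value when a == b, otherwise scale it by a + b):

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)

# Extract each index level as a plain numpy array.
a = s.index.get_level_values("a").to_numpy()
b = s.index.get_level_values("b").to_numpy()

# where() keeps values where the mask is True and takes the
# replacement elsewhere, all without a Python-level loop.
mask = a == b
result = s.where(mask, s * (a + b))
print(result)
```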

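I recommend testing for speed (since the efficiency relative to apply will depend on the function). That said, I find that applys are more readable...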

If you use DataFrame.apply() instead of Series.apply(), you can access the whole row as an argument inside the function.

import numpy as np
import pandas as pd

def f1(row):
    if row['I'] < 0.5:
        return 0
    else:
        return 1

def f2(row):
    if row['N1'] == 1:
        return 0
    else:
        return 1

df4 = pd.DataFrame(np.random.rand(6, 1), columns=list('I'))
df4['N1'] = df4.apply(f1, axis=1)
df4['N2'] = df4.apply(f2, axis=1)
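For simple threshold rules like f1 and f2, a vectorized form may be faster than a row-wise apply. The sketch below uses np.where as a swapped-in technique, not the answerer's method; the seeded generator is an assumption added only to make the run reproducible.

```python
import numpy as np
import pandas as pd

# Seeded generator: an assumption for reproducibility, not part of the original answer.
rng = np.random.default_rng(0)
df4 = pd.DataFrame(rng.random((6, 1)), columns=["I"])

# Same logic as f1: 0 where I < 0.5, otherwise 1.
df4["N1"] = np.where(df4["I"] < 0.5, 0, 1)

# Same logic as f2: 0 where N1 == 1, otherwise 1.
df4["N2"] = np.where(df4["N1"] == 1, 0, 1)
print(df4)
```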

Convert to a DataFrame and apply along the rows. You can access the index via x.name. x is also a Series, now with 1 value:

s.to_frame(0).apply(f, axis=1)[0]
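Applied to the MultiIndex Series from the question, x.name should be the tuple of index labels for the row. A minimal sketch, with a hypothetical computation in f; note that f returns the one-element Series x (scaled), which is why selecting column 0 at the end works.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)


def f(x):
    # x is a one-element Series holding the row's value under column 0;
    # x.name is the row's index label, a tuple (a, b) for a MultiIndex.
    a, b = x.name
    # Hypothetical computation: scale the value by the sum of the labels.
    return x * (a + b)


# f returns a Series per row, so apply yields a DataFrame with column 0.
result = s.to_frame(0).apply(f, axis=1)[0]
print(result)
```

If f returned a scalar instead, apply would already produce a Series and the trailing [0] would need to be dropped.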

Use reset_index() to convert the Series to a DataFrame and the index to columns, and then apply your function to the DataFrame.

The tricky part is knowing how reset_index() names the columns, so here are a couple of examples.

With a single-indexed Series

import pandas as pd

s = pd.Series({'idx1': 'val1', 'idx2': 'val2'})

def use_index_and_value(row):
    return 'I made this with index {} and value {}'.format(row['index'], row[0])

s2 = s.reset_index().apply(use_index_and_value, axis=1)

# The new Series has an auto-index;
# you'll want to replace that with the index from the original Series
s2.index = s.index
s2

Output:

idx1    I made this with index idx1 and value val1
idx2    I made this with index idx2 and value val2
dtype: object

With a MultiIndexed Series

Same concept here, but you'll need to access the index values as row['level_*'] because that is where Series.reset_index() places them.

s = pd.Series({
    ('idx(0,0)', 'idx(0,1)'): 'val1',
    ('idx(1,0)', 'idx(1,1)'): 'val2'
})

def use_index_and_value(row):
    return 'made with index: {},{} & value: {}'.format(
        row['level_0'],
        row['level_1'],
        row[0]
    )

s2 = s.reset_index().apply(use_index_and_value, axis=1)

# Replace auto index with the index from the original Series
s2.index = s.index
s2

Output:

idx(0,0)  idx(0,1)    made with index: idx(0,0),idx(0,1) & value: val1
idx(1,0)  idx(1,1)    made with index: idx(1,0),idx(1,1) & value: val2
dtype: object

If your Series or index has names, you will need to adjust accordingly.

It's also worth noting that vectorizing f and using &, | etc. may be faster.

Currently I use the reset index approach; I will wait a bit to see if someone proposes a cleaner solution.

+1 for getting rid of the MultiIndex. While these are occasionally useful, I find myself more and more turning my indexes into columns.

In my case (a DataFrame, axis=1), x.name returns the value of the index when I apply a function lambda x: x ...

This is totally stupid behavior, but what you say is completely correct. Still, your solution is not ideal; for most use cases Jeff's answer, DataFrame.apply(x), is far simpler and IMHO should be the accepted answer!

Hmm. Now I wonder whether there should be a Series.eval/query method... I will raise this issue on pandas.

@PhillipCloud, +1. I need to use the index heavily (adding/slicing, alignment and missing data), and more and more often I find that I am happier and life is easier if I turn my MultiIndex into columns. You can do far more with columns in a DataFrame than with a MultiIndexed Series. In fact they are essentially the same thing, except that querying a DataFrame is much faster than querying a Series; they really should be first-class citizens (rather than the other way around).

This doesn't answer the question "access the index in pandas.Series.apply".

So when apply is called on a DataFrame, its index is accessible through the name of each Series?

I think this is also true for a DateTimeIndex, but it is a bit odd to use something like x.name == Time(2015-06-27 20:08:32.097333+00:00).

This should be the answer; using x.name is the cleanest and most flexible way to solve this.

See this discussion; it seems x.name is what you are looking for.

@PabloJadzinsky, I think that discussion is about DataFrames, not Series.
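For the reset_index() approach, the column names change when the Series or its index levels are named. The sketch below assumes the Series from the question, with levels named a and b and the Series itself named values, so the columns should come out as 'a', 'b' and 'values' rather than 'level_0', 'level_1' and 0. The computation in f is hypothetical.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index, name="values")


def f(row):
    # With named levels and a named Series, reset_index() produces
    # the columns "a", "b" and "values".
    # Hypothetical computation: scale the value by the sum of the labels.
    return row["values"] * (row["a"] + row["b"])


s2 = s.reset_index().apply(f, axis=1)

# Replace the auto-generated index with the original MultiIndex.
s2.index = s.index
print(s2)
```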