Python pandas: dividing two rows by each other


Here are two rows from my DataFrame:

>>> test.loc[test.index.year == 2009]
                     0         1           2           3           4  \
date                                                                   
2009-01-01  252.855283  353.6261  556.295659  439.558188  432.936844   

                     5           6  employment  
date                                            
2009-01-01  439.437132  433.269903   64.116667 

>>> test.loc[test.index.year == 2007]
                     0           1           2           3           4  \
date                                                                     
2007-01-01  269.277757  380.608002  401.765546  491.893821  433.864499   

                     5           6  employment  
date                                            
2007-01-01  492.396073  489.260588     69.1375

When I try to divide one by the other, I get:

>>> test.loc[test.index.year == 2009].divide(test.loc[test.index.year == 2007])
             0   1   2   3   4   5   6  employment
date                                              
2007-01-01 NaN NaN NaN NaN NaN NaN NaN         NaN
2009-01-01 NaN NaN NaN NaN NaN NaN NaN         NaN

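This comes from pandas matching rows on the index when it divides. However, the options in axis= have not helped me. I can get the correct result by doing this: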

test.loc[test.index.year == 2009].values / test.loc[test.index.year == 2007].values
array([[ 0.93901288,  0.92910842,  1.38462759,  0.8936038 ,  0.99786188,
         0.89244646,  0.88556061,  0.92737902]])

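Is there no better way than this? I would like to keep the index 2007-01-01 attached to the record. Of course I could re-attach it to the values, but usually when I try this kind of thing, there is my way and then there is the right way. So: what else can I do?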

Maybe:

import numpy as np
import pandas as pd

test = pd.DataFrame(np.random.randn(2,5), index=[pd.Timestamp('2007-01-01'), pd.Timestamp('2008-01-01')])

>>> test.loc[test.index.year == 2007].divide(test.loc[test.index.year == 2008].values)
               0         1         2         3         4
2007-01-01  0.496822 -1.198635  0.222452  0.688838  0.256559

If you want to keep the 2007 index, I think you can do:

df.loc[df.index.year == 2007]/df.loc[df.index.year == 2009].values

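The reason that

df.loc[df.index.year==2007]/df.loc[df.index.year==2009]

or

df.loc[df.index.year==2007].divide(df.loc[df.index.year==2009])

does not work is that pandas tries to align the data by index. Here the 2007 row is divided by whatever the other frame holds under the index value 2007, and the same goes for 2009. Neither label exists on the other side, which is why you get 2 rows of NaN rather than just 1 row.

So we need to convert one of them to an np.array to make it work:

df.loc[df.index.year==2007]/df.loc[df.index.year==2009].values

The numerator's index is preserved because it is left untouched.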
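The alignment behaviour described above can be reproduced on a tiny frame. This is a minimal sketch, assuming a made-up two-row DataFrame (the column names a and b and all values are illustrative, not the asker's data):

```python
import pandas as pd

# Hypothetical two-row frame standing in for the data in the question.
df = pd.DataFrame(
    {"a": [10.0, 20.0], "b": [30.0, 60.0]},
    index=pd.to_datetime(["2007-01-01", "2009-01-01"]),
)

row_2007 = df.loc[df.index.year == 2007]
row_2009 = df.loc[df.index.year == 2009]

# The labels 2007-01-01 and 2009-01-01 never match, so alignment yields
# the union of both indexes with every cell set to NaN.
aligned = row_2009 / row_2007

# Stripping the labels from the denominator makes the division positional;
# the numerator's index (2009-01-01) is carried through unchanged.
positional = row_2009 / row_2007.values

print(aligned)
print(positional)
```

Only the denominator loses its labels, so the result is still a DataFrame and keeps the numerator's date.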

@EdChum, I don't think this is a bug. I think it is the expected behavior of boolean indexing. Consider this:

df.iloc[df.index.year>=2007]/df.loc[df.index.year == 2007]
             0   1   2   3   4   5   6  employment
date                                              
2007-01-01   1   1   1   1   1   1   1           1
2009-01-01 NaN NaN NaN NaN NaN NaN NaN         NaN

But you should be careful with this approach, because boolean indexing may give you multiple rows. See the following two examples:

In [128]:

print(df)
                     0           1           2           3           4  \
2007-12-31  252.855283  353.626100  556.295659  439.558188  432.936844   
2008-12-31  269.277757  380.608002  401.765546  491.893821  433.864499   
2009-12-31  269.277757  380.608002  401.765546  491.893821  433.864499   

                     5           6          7  
2007-12-31  439.437132  433.269903  64.116667  
2008-12-31  492.396073  489.260588  69.137500  
2009-12-31  492.396073  489.260588  69.137500  
In [130]:

print(df.iloc[df.index.year==2007]/df.loc[df.index.year >= 2007])
# Divide one row by 3 rows? Dimension mismatch? No, it will work just fine.
             0   1   2   3   4   5   6   7
2007-12-31   1   1   1   1   1   1   1   1
2008-12-31 NaN NaN NaN NaN NaN NaN NaN NaN
2009-12-31 NaN NaN NaN NaN NaN NaN NaN NaN
In [131]:

df.iloc[df.index.year==2007]/df.loc[df.index.year >= 2007].values
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
**************
ValueError: Shape of passed values is (8, 3), indices imply (8, 1)
#basically won't work due to dimension mismatch
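One way to guard against the multi-row case shown above is to check that the denominator mask selects exactly one row, and then divide by that row as a Series. This is only a sketch under assumptions: the helper divide_by_single_row is hypothetical (not a pandas function), and the frame and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical three-row frame; the values are illustrative only.
df = pd.DataFrame(
    {"a": [10.0, 20.0, 40.0], "b": [30.0, 60.0, 90.0]},
    index=pd.to_datetime(["2007-12-31", "2008-12-31", "2009-12-31"]),
)


def divide_by_single_row(frame, numerator_mask, denominator_mask):
    """Divide the rows picked by numerator_mask by the one row picked by denominator_mask."""
    denominator = frame.loc[denominator_mask]
    if len(denominator) != 1:
        raise ValueError(f"expected exactly 1 denominator row, got {len(denominator)}")
    # .iloc[0] turns the 1-row frame into a Series labeled by column name,
    # so DataFrame / Series aligns on columns and broadcasts down the rows.
    return frame.loc[numerator_mask] / denominator.iloc[0]


ratio = divide_by_single_row(df, df.index.year == 2009, df.index.year == 2007)
print(ratio)
```

Because the denominator is a Series, the result keeps the numerator's index, and a mask that matches several rows raises an error instead of silently producing NaN.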

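Doesn't just

test.loc[test.index.year==2009]/test.loc[test.index.year==2007]

work?

@EdChum: No, that gives me the same result as divide().

test.div(test.shift()) would have the same effect, but it may or may not be convenient, depending on exactly what you are trying to do.

This looks like a bug to me. Your index has dtype datetimeindex, and selecting with loc and then dividing produces NaN. If you instead do

df.iloc[3]/df.iloc[5]

it works and produces the correct result. So you could reset the index, find the row with that value and then divide, or try selecting the row by integer position.

OP has already stated that they tried that and it works.

@EdChum -- The difference is that Alexander uses values only on the second operand, whereas FooBar uses it on both. So Alexander's approach keeps the result as a Series/DataFrame. He ought to show the output; it would make this more obvious.

Fair enough, I didn't spot that. But I still think this is a bug, because it works if the index dtype is Int64 or if you use iloc.

But in that example you explicitly select multiple values. Here we select a single row on each side of the division, and the naive assumption is that this performs a simple division. I guess it is unexpected.

My guess is that it is because boolean indexing may return multiple rows, unlike explicit indexing such as df.iloc[3]/df.iloc[5], which returns only one row. If that is the designers' intent, I think it makes more sense to have two different behaviors. But anyway, that is their call... The OP's approach may still be dangerous; see the edit.

I have weighed everyone's contributions to this, and I think you are correct and it makes sense. It is a subtle thing that won't bite you when you have a unique int index and select rows with loc.

Can you take a look at my question? [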
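For reference, the test.div(test.shift()) suggestion from the comments divides each row by the row directly above it. A minimal sketch, assuming a made-up three-row frame (column names and values are illustrative):

```python
import pandas as pd

# Hypothetical frame with one row per year; values are illustrative only.
test = pd.DataFrame(
    {"a": [10.0, 20.0, 40.0], "b": [30.0, 60.0, 90.0]},
    index=pd.to_datetime(["2007-01-01", "2008-01-01", "2009-01-01"]),
)

# shift() moves the values down one row but keeps the index labels,
# so each row ends up divided by the row before it.
ratio_to_previous = test.div(test.shift())
print(ratio_to_previous)
```

The first row has nothing above it and comes out as NaN. If the two rows of interest are not adjacent, the shift distance would have to match the gap (for example test.shift(2)), which is part of why this may or may not be convenient.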