pandas groupby.apply: difference between 0.23.4 and 0.24.2 with deep copy

When I updated pandas from 0.23.4 to 0.24.2, I noticed some strange behavior. The following snippet demonstrates it.

CSV file (filename: my_data_new.csv)

Snippet:

import pandas as pd

print("PANDAS-VERSION:", pd.__version__)

def my_func(d):
    d_copy = d.copy(deep=True)
    return d_copy

data = pd.read_csv("~/my_data_new.csv", parse_dates=['date'], index_col=['date']).sort_index()
result = data.groupby('name').apply(my_func)
print(result)

Output in version 0.23.4:

PANDAS-VERSION: 0.23.4
                         name id  roll    sub_1    sub_2    sub_3
name date                                                        
AAA  2016-11-30 08:00:00  AAA  A     1  123.456  123.456  123.456
     2016-11-30 09:00:00  AAA  A     1  123.457  123.457  123.457
     2016-11-30 10:00:00  AAA  A     1  123.458  123.458  123.458
     2016-11-30 11:00:00  AAA  A     1  123.459  123.459  123.459
BBB  2016-11-30 12:00:00  BBB  B     2  123.451  123.456  123.456
     2016-11-30 13:00:00  BBB  B     2  123.452  123.457  123.457
     2016-11-30 14:00:00  BBB  B     2  123.453  123.458  123.458
     2016-11-30 15:00:00  BBB  B     2  123.454  123.459  123.459

Output in version 0.24.2:

PANDAS-VERSION: 0.24.2
                         name id  roll    sub_1    sub_2    sub_3
name date                                                        
AAA  2016-11-30 12:00:00  AAA  A     1  123.456  123.456  123.456
     2016-11-30 13:00:00  AAA  A     1  123.457  123.457  123.457
     2016-11-30 14:00:00  AAA  A     1  123.458  123.458  123.458
     2016-11-30 15:00:00  AAA  A     1  123.459  123.459  123.459
BBB  2016-11-30 12:00:00  BBB  B     2  123.451  123.456  123.456
     2016-11-30 13:00:00  BBB  B     2  123.452  123.457  123.457
     2016-11-30 14:00:00  BBB  B     2  123.453  123.458  123.458
     2016-11-30 15:00:00  BBB  B     2  123.454  123.459  123.459
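
My observation is as follows: in pandas 0.24.2, the index of the last group DataFrame ("BBB" in this case) is applied to all previous group DataFrames ("AAA" in this case), whereas in pandas 0.23.4 each group keeps its own index.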


Is this documented behavior? If so, please point me to the relevant release notes or the code change in the repo.
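
For anyone who wants to try this without the CSV file, here is a minimal sketch. The DataFrame is rebuilt from the printed output above, so its exact contents are an assumption (the original my_data_new.csv is not shown), and date_index_preserved is a hypothetical helper added only to check whether the timestamps survive groupby.apply.

```python
import pandas as pd

# Data reconstructed from the printed output in the question;
# the real my_data_new.csv is not shown, so this is an assumption.
dates = pd.to_datetime([f"2016-11-30 {h:02d}:00:00" for h in range(8, 16)])
data = pd.DataFrame(
    {
        "name": ["AAA"] * 4 + ["BBB"] * 4,
        "id": ["A"] * 4 + ["B"] * 4,
        "roll": [1] * 4 + [2] * 4,
        "sub_1": [123.456, 123.457, 123.458, 123.459,
                  123.451, 123.452, 123.453, 123.454],
        "sub_2": [123.456, 123.457, 123.458, 123.459] * 2,
        "sub_3": [123.456, 123.457, 123.458, 123.459] * 2,
    },
    index=pd.Index(dates, name="date"),
).sort_index()


def my_func(d):
    # Same function as in the question: return a deep copy of the group.
    return d.copy(deep=True)


def date_index_preserved(original, result):
    """Hypothetical helper: True if result carries exactly the original timestamps."""
    got = sorted(result.index.get_level_values("date"))
    return got == sorted(original.index)


result = data.groupby("name").apply(my_func)
print("PANDAS-VERSION:", pd.__version__)
print("date index preserved:", date_index_preserved(data, result))
```

If the behavior described above is present, the "AAA" rows would carry the "BBB" timestamps and the check would print False. Newer pandas versions may emit a deprecation warning about applying on the grouping column; that does not affect the check.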

This issue has already been reported. One more observation: it only happens when the dtype of the index is datetime64[ns]; it does not happen when the dtype of the index is object.

data = pd.read_csv("~/my_data_new.csv")
data['date'] = pd.to_datetime(data['date'])
data = data.set_index(['date']).sort_index()
result = data.groupby('name').apply(my_func)
print(result)

The result of the above is:

PANDAS-VERSION: 0.24.2
                         name id  roll    sub_1    sub_2    sub_3
name date                                                        
AAA  2016-11-30 12:00:00  AAA  A     1  123.456  123.456  123.456
     2016-11-30 13:00:00  AAA  A     1  123.457  123.457  123.457
     2016-11-30 14:00:00  AAA  A     1  123.458  123.458  123.458
     2016-11-30 15:00:00  AAA  A     1  123.459  123.459  123.459
BBB  2016-11-30 12:00:00  BBB  B     2  123.451  123.456  123.456
     2016-11-30 13:00:00  BBB  B     2  123.452  123.457  123.457
     2016-11-30 14:00:00  BBB  B     2  123.453  123.458  123.458
     2016-11-30 15:00:00  BBB  B     2  123.454  123.459  123.459

This does not happen if I do the following:

data = pd.read_csv("~/my_data_new.csv")
data = data.set_index(['date']).sort_index()
result = data.groupby('name').apply(my_func)
print(result)

The result of the above code is:

PANDAS-VERSION: 0.24.2
                         name id  roll    sub_1    sub_2    sub_3
name date                                                        
AAA  2016-11-30 08:00:00  AAA  A     1  123.456  123.456  123.456
     2016-11-30 09:00:00  AAA  A     1  123.457  123.457  123.457
     2016-11-30 10:00:00  AAA  A     1  123.458  123.458  123.458
     2016-11-30 11:00:00  AAA  A     1  123.459  123.459  123.459
BBB  2016-11-30 12:00:00  BBB  B     2  123.451  123.456  123.456
     2016-11-30 13:00:00  BBB  B     2  123.452  123.457  123.457
     2016-11-30 14:00:00  BBB  B     2  123.453  123.458  123.458
     2016-11-30 15:00:00  BBB  B     2  123.454  123.459  123.459
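
To confirm which of the two loading routes produces a datetime index, the dtypes can be compared directly. This is only a sketch: csv_text is a two-row stand-in for my_data_new.csv (the real file is not shown), and the exact dtype names printed depend on the pandas version.

```python
import io

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Stand-in for my_data_new.csv; the real file contents are an assumption.
csv_text = "date,name\n2016-11-30 08:00:00,AAA\n2016-11-30 12:00:00,BBB\n"

# Route 1: parse the dates while reading, as in the first snippet.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col=["date"])

# Route 2: leave the column as text and only set it as the index.
raw = pd.read_csv(io.StringIO(csv_text)).set_index(["date"])

print("parsed index dtype:", parsed.index.dtype)
print("raw index dtype:   ", raw.index.dtype)
print("parsed is datetime:", is_datetime64_any_dtype(parsed.index.dtype))
print("raw is datetime:   ", is_datetime64_any_dtype(raw.index.dtype))
```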

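You should really update to the current version (1.1.1). You are many versions behind, and this is not an issue in the current version.

Maybe this is related.

Thanks for the comment. But my question is about the index: why is the index of the last group DataFrame copied to the other group DataFrames?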