Python SettingWithCopyWarning: iloc vs. loc, cannot figure out why

python pandas view

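I have a basic understanding of SettingWithCopyWarning, but I cannot figure out why I get the warning in this particular case.

I am following code from another source.

When I run the code below (using .loc), I do not get SettingWithCopyWarning.

However, if I run the same code with .iloc instead, I do get the warning.

Can someone help me understand why?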

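from sklearn.model_selection import StratifiedShuffleSplit

split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_index, test_index in split.split(housing, housing["income_cat"]):
    strat_train_set = housing.loc[train_index]
    strat_test_set = housing.loc[test_index]

for set_ in (strat_train_set, strat_test_set):
    set_.drop("income_cat", axis=1, inplace=True)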

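The problem here is not the indexing; iloc and loc will work the same way for you here. The problem is in set_.drop("income_cat", axis=1, inplace=True). It looks like each set_ DataFrame (that is, strat_train_set and strat_test_set) holds a weak reference to the DataFrame it was sliced from: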

for set_ in (strat_train_set, strat_test_set):
    print(set_._is_copy)

This prints:

<weakref at 0x128b30598; to 'DataFrame' at 0x128b355c0>
<weakref at 0x128b30598; to 'DataFrame' at 0x128b355c0>

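This can trigger SettingWithCopyWarning, because the code modifies what pandas treats as a copy of a DataFrame, and pandas cannot tell whether those changes were meant to apply to the original DataFrame.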
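One common way to sidestep the situation described above is to take an explicit copy of each slice before modifying it in place. The following is only a minimal sketch: the small `housing` frame and the hard-coded `train_index` / `test_index` arrays are made-up stand-ins for the real data and for the output of the splitter.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real housing data.
housing = pd.DataFrame({
    "median_income": [1.5, 3.2, 4.8, 6.1, 2.7, 5.5],
    "income_cat": [1, 2, 3, 4, 2, 4],
})

# Hypothetical positional indices standing in for the splitter's output.
train_index = np.array([0, 2, 3, 5])
test_index = np.array([1, 4])

# .copy() makes each subset an independent DataFrame, so the in-place
# drop below clearly targets the subset and not the original frame.
strat_train_set = housing.iloc[train_index].copy()
strat_test_set = housing.iloc[test_index].copy()

for set_ in (strat_train_set, strat_test_set):
    set_.drop("income_cat", axis=1, inplace=True)

print(list(strat_train_set.columns))
print(list(housing.columns))
```

The original `housing` frame keeps its `income_cat` column, while both subsets lose it.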

I did some exploring, and as I understand it, this is what lies behind SettingWithCopyWarning: every time a DataFrame df is created from another frame df_orig, pandas applies some heuristics to determine whether the data may have been implicitly copied from df_orig, which a less experienced user may not be aware of. If so, the _is_copy field of df is set to a weak reference to df_orig. Later, when an in-place update of df is attempted, pandas decides whether to show SettingWithCopyWarning based on df._is_copy and some other fields of df. However, because some methods are shared across different scenarios, the heuristics are not perfect and some cases may be handled incorrectly.

In the post's code, housing.loc[train_index] and housing.iloc[train_index] both return an implicit copy of the DataFrame:

for df in (housing.loc[train_index], housing.iloc[train_index]):
    print(df._is_view, df._is_copy)

The check above produces the following result:

False None
False <weakref at 0x0000019BFDF37958; to 'DataFrame' at 0x0000019BFDF26550>

Only the .iloc result carries the weak reference, which is consistent with the warning appearing only with .iloc.

Approach (2) looks like this:

# Update the housing data frame in place before slicing
income_cat = housing["income_cat"]
housing.drop("income_cat", axis=1, inplace=True)
for train_index, test_index in split.split(housing, income_cat):
    strat_train_set = housing.loc[train_index]
    strat_test_set = housing.loc[test_index]

or, alternatively:

feature_cols = housing.columns.difference(["income_cat"])
for train_index, test_index in split.split(housing, housing["income_cat"]):
    # Filter columns at the same time as slicing the rows
    strat_train_set = housing.loc[train_index, feature_cols]
    strat_test_set = housing.loc[test_index, feature_cols]

Approach (3) looks like this:

for train_index, test_index in split.split(housing, housing["income_cat"]):
    ...
# Without inplace=True, drop() returns a new copy; assign it to keep the result
strat_train_set = strat_train_set.drop("income_cat", axis=1)
strat_test_set = strat_test_set.drop("income_cat", axis=1)

Besides changing the in-place setting of the update method, there is another way to produce an "explicit" copy. If you want to change one or more columns of df, use a method that creates a copy rather than assigning with df["col"] = ...

Have you tried resetting the index after using iloc? This warning shows up when you work on a subset, especially when you create, update, or compute new values in the same DataFrame after it has been cut down to a subset of the original data.
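As a rough illustration of the approaches that avoid an in-place update on a slice, the sketch below selects rows and feature columns in one step, and separately uses drop() without inplace=True. The toy `housing` frame, its `households` column, and the hard-coded `train_index` array are hypothetical stand-ins for the real data and for the splitter's output.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real housing data.
housing = pd.DataFrame({
    "median_income": [1.5, 3.2, 4.8, 6.1, 2.7, 5.5],
    "households": [120, 340, 210, 95, 400, 150],
    "income_cat": [1, 2, 3, 4, 2, 4],
})

# Hypothetical row labels standing in for the splitter's output.
train_index = np.array([0, 2, 3, 5])

# Select rows and feature columns in a single .loc call, so no column
# has to be removed from the slice afterwards.
feature_cols = housing.columns.difference(["income_cat"])
train_a = housing.loc[train_index, feature_cols]

# Or slice first, then let drop() return a new DataFrame (no inplace=True).
train_b = housing.loc[train_index].drop("income_cat", axis=1)

print(sorted(train_a.columns) == sorted(train_b.columns))
print("income_cat" in housing.columns)
```

Note that Index.difference returns the remaining column labels in sorted order, so `train_a` may list its columns in a different order than `train_b`; the data are otherwise the same.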