With pandas in Python, why can't I use one variable to change multiple DataFrames?


Setup code, identical for both versions:

import pandas as pd

train = pd.read_csv('train.csv')
holdout = pd.read_csv('test.csv')

def process_age(df):
    df["Age"] = df["Age"].fillna(-0.5)
    cut_points = [-1,0,5,12,18,35,60,100]
    label_names = ["Missing","Infant","Child","Teenager","Young Adult","Adult","Senior"]
    df["Age_categories"] = pd.cut(df["Age"],cut_points,labels=label_names)
    return df

def create_dummies(df,column_name):
    dummies = pd.get_dummies(df[column_name],prefix=column_name)
    df = pd.concat([df,dummies],axis=1)
    return df

train = process_age(train)
holdout = process_age(holdout)
for x in ["Age_categories", "Pclass", "Sex"]:
    train = create_dummies(train, x)
    holdout = create_dummies(holdout, x)
The "correct" code:

What I want to do:

from sklearn.preprocessing import minmax_scale
# The holdout set has a missing value in the Fare column which
# we'll fill with the mean.
holdout["Fare"] = holdout["Fare"].fillna(train["Fare"].mean())

columns = ["SibSp","Parch","Fare"]

for x in [train, holdout]:
    x['Embarked'] = x['Embarked'].fillna('S')
    x = create_dummies(x,'Embarked')
    for y in columns:
        x[y + '_scaled'] = minmax_scale(x[y])

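Running the code I want to use does not assign anything to the DataFrames I am trying to modify. I tried this approach in the past and it did not work then either, so I can only assume that you cannot use a variable in place of a DataFrame name.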

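First, keep in mind that if you want to use `for` to access and modify the elements of a list, you have to assign that list to a variable. Otherwise you will have no way to reach the modified elements afterwards. So you need something like this to start with: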

my_dataframes = [train, holdout]
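Next, a `for` loop does not give you a writable slot in the list. On each iteration Python binds the loop variable to the current item, and assigning to that variable only rebinds the name; the list still holds the original item. For example, if you run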

my_words = ['hello', 'my', 'friend']
for word in my_words:
    word=word.upper()
    print('Modification:', word)
print(my_words)
your output will be:

Modification: HELLO
Modification: MY
Modification: FRIEND
['hello', 'my', 'friend']
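This happens whatever the list items are, from strings to DataFrames. In your loop, `x['Embarked'] = ...` does change the original DataFrame, because it modifies the object in place. But `x = create_dummies(x, 'Embarked')` rebinds `x` to the new DataFrame returned by `pd.concat`, so the dummy columns and the scaled columns never reach `train` or `holdout`. To actually replace the items in the list, either assign to them by index or build a new list and append the modified items to it.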
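To make the difference between in-place modification and rebinding concrete, here is a minimal sketch. The frames `a` and `b` and their column names are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical toy frames, standing in for train and holdout.
a = pd.DataFrame({"v": [1, 2]})
b = pd.DataFrame({"v": [3, 4]})

for df in [a, b]:
    # In-place modification: df and a (or b) are the same object,
    # so this new column shows up on the original frame.
    df["doubled"] = df["v"] * 2

    # Rebinding: pd.concat returns a brand-new DataFrame. From here on
    # df names that new object; a and b still name the old ones.
    extra = df[["v"]].rename(columns={"v": "extra"})
    df = pd.concat([df, extra], axis=1)

    # This column is added to the new object only.
    df["lost"] = 0

print(list(a.columns))  # ['v', 'doubled']
print(list(b.columns))  # ['v', 'doubled']
```

The `doubled` column survives because it was written to the original objects, while `extra` and `lost` disappear with the rebound loop variable.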

Assigning to the items by index

my_dataframes = [train, holdout]
for i, df in enumerate(my_dataframes):
    df['Embarked'] = df['Embarked'].fillna('S')
    df = create_dummies(df,'Embarked')
    for y in columns:
        df[y + '_scaled'] = minmax_scale(df[y])
    my_dataframes[i] = df
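Running this code leaves the DataFrames you want in the original list. The names `train` and `holdout` still refer to the old objects, so unpack the list afterwards (`train, holdout = my_dataframes`) if you want to keep using those names.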
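The index-based loop can be checked on a small example. This is only a sketch: the two toy frames below are invented stand-ins for the real CSV files, and the scaling step is left out to keep it short.

```python
import pandas as pd

def create_dummies(df, column_name):
    # Same helper as in the question: returns a NEW DataFrame.
    dummies = pd.get_dummies(df[column_name], prefix=column_name)
    return pd.concat([df, dummies], axis=1)

# Hypothetical toy data in place of train.csv / test.csv.
train = pd.DataFrame({"Embarked": ["S", None, "C"]})
holdout = pd.DataFrame({"Embarked": ["Q", "S", None]})

my_dataframes = [train, holdout]
for i, df in enumerate(my_dataframes):
    df["Embarked"] = df["Embarked"].fillna("S")
    # Store the new frame back into the list slot.
    my_dataframes[i] = create_dummies(df, "Embarked")

# The list holds the new frames, but the name train does not yet.
print("Embarked_S" in train.columns)  # False

# Rebind the original names to the processed frames.
train, holdout = my_dataframes
print("Embarked_S" in train.columns)  # True
```

Unpacking at the end is what makes the rest of a script, which presumably keeps referring to `train` and `holdout`, see the new columns.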

Building a new list

my_dataframes = [train, holdout]
modified_dataframes = []

for df in my_dataframes:
    df['Embarked'] = df['Embarked'].fillna('S')
    df = create_dummies(df,'Embarked')
    for y in columns:
        df[y + '_scaled'] = minmax_scale(df[y])
    modified_dataframes.append(df)
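In this case the original DataFrames stay in `my_dataframes` and the new ones end up in `modified_dataframes`. Note that the `fillna` on `Embarked` still modifies the originals in place; only the dummy columns and the scaled columns are confined to the new DataFrames.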
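A third option, in the same spirit as the `train = process_age(train)` calls in the setup code, is to put the steps in a function and reassign each name explicitly. The sketch below is one possible shape, not the only one: `preprocess` is a hypothetical helper, the toy frames are invented, and the scaling is written with plain pandas arithmetic as a stand-in for `minmax_scale` so the example needs nothing beyond pandas.

```python
import pandas as pd

def create_dummies(df, column_name):
    dummies = pd.get_dummies(df[column_name], prefix=column_name)
    return pd.concat([df, dummies], axis=1)

def preprocess(df, columns):
    """Hypothetical helper: return a processed copy and leave df untouched."""
    df = df.copy()
    df["Embarked"] = df["Embarked"].fillna("S")
    df = create_dummies(df, "Embarked")
    for col in columns:
        col_min, col_max = df[col].min(), df[col].max()
        # Min-max scaling to [0, 1], a stand-in for sklearn's minmax_scale.
        # Assumes the column is not constant (col_max != col_min).
        df[col + "_scaled"] = (df[col] - col_min) / (col_max - col_min)
    return df

# Hypothetical toy data in place of the real CSV files.
train = pd.DataFrame({"Embarked": ["S", None], "Fare": [10.0, 30.0]})
holdout = pd.DataFrame({"Embarked": ["C", "S"], "Fare": [5.0, 25.0]})

train = preprocess(train, ["Fare"])
holdout = preprocess(holdout, ["Fare"])
print(train["Fare_scaled"].tolist())  # [0.0, 1.0]
```

Because the function returns the frame and the caller reassigns the name, there is no loop variable to lose track of, at the cost of one line per DataFrame.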

Hope this helps. Let us know if you have any other questions.
