Python: saving DataFrames in a loop after manipulating them
I have a loop that takes a series of existing DataFrames and manipulates their format and values. I need to know how to create new DataFrames containing the modified contents at the end of the loop. Example below:
import pandas as pd
# Create datasets
First = {'GDP':[200,175,150,100]}
Second = {'GDP':[550,200,235,50]}
# Create old_dataframes
old_df_1 = pd.DataFrame(First)
old_df_2 = pd.DataFrame(Second)
# Define references and dictionary
old_dfs = [old_df_1, old_df_2]
new_dfs = ['new_df_1','new_df_2']
dictionary = {}
# Begin Loop
for df, name in zip(old_dfs, new_dfs):
    # Multiply all GDP values by 1.5 in both dataframes
    df = df * 1.5
    # ISSUE HERE - Supposed to create new data frames 'new_df_1' & 'new_df_2' containing df*1.5 values: only appends to dictionary. Does not create new_df_1 & new_df_2
    dictionary[name] = df
# Check for the existence of 'new_df_1' & 'new_df_2' (they will not appear)
%who_ls DataFrame
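Question: I have marked the problem above. My code does not create 'new_df_1' and 'new_df_2'; it only adds them to the dictionary. I need to be able to create new_df_1 and new_df_2 as separate DataFrames.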
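A quick check suggests the loop in the question already produces two separate DataFrames; they are stored as dictionary values rather than as standalone variables, which is why `%who_ls DataFrame` does not list them. The sketch below reuses the question's data (only the variable name `new_names` is new) and prints the shape and values stored under each key:

```python
import pandas as pd

# Same data as in the question
old_dfs = [pd.DataFrame({'GDP': [200, 175, 150, 100]}),
           pd.DataFrame({'GDP': [550, 200, 235, 50]})]
new_names = ['new_df_1', 'new_df_2']  # illustrative name for the question's new_dfs list

dictionary = {}
for df, name in zip(old_dfs, new_names):
    # df * 1.5 returns a new DataFrame; the originals in old_dfs are untouched
    dictionary[name] = df * 1.5

# Each key maps to its own 4-row DataFrame; nothing is appended together
for name, df in dictionary.items():
    print(name, df.shape, df['GDP'].tolist())
```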
Answer: one option is to keep the modified DataFrames in a list, at the same positions as the originals:
import pandas as pd
from copy import deepcopy  # to copy old dataframes appropriately
First = {'GDP': [200, 175, 150, 100]}
Second = {'GDP': [550, 200, 235, 50]}
# create 2 lists: the first holds the old dataframes, the second the modified ones
old_dfs_list, new_dfs_list = [pd.DataFrame(First), pd.DataFrame(Second)], []
# process the old dfs one by one by iterating over old_dfs_list:
# copy and modify each, then append it to new_dfs_list at the same index as
# the old df, so old_dfs_list[1] is mapped to new_dfs_list[1]
for i in range(len(old_dfs_list)):
    # a deep copy prevents changing the old dfs by reference
    df_deep_copy = deepcopy(old_dfs_list[i])
    df_deep_copy['GDP'] *= 1.5
    new_dfs_list.append(df_deep_copy)
print(old_dfs_list[0])  # to check that old dfs are not changed
print(new_dfs_list[0])
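The explicit deep copy matters there because the column is modified in place with `*=`. Since `df * 1.5` already returns a new DataFrame, the same list-based idea can be sketched more compactly with a list comprehension, with the originals left unchanged (data taken from the question):

```python
import pandas as pd

old_dfs_list = [pd.DataFrame({'GDP': [200, 175, 150, 100]}),
                pd.DataFrame({'GDP': [550, 200, 235, 50]})]

# Arithmetic on a DataFrame returns a new object, so no copy is needed here;
# new_dfs_list[i] corresponds to old_dfs_list[i]
new_dfs_list = [df * 1.5 for df in old_dfs_list]

print(old_dfs_list[0])  # unchanged original
print(new_dfs_list[0])  # scaled version
```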
You can also use a dictionary instead of a list, so that you can use whatever names you like:
import pandas as pd
# Create the datasets and store them in a Python dictionary
datadicts_dict = {
    'first':  {'GDP': [200, 175, 150, 100]},
    'second': {'GDP': [550, 200, 235, 50]},
    'third':  {'GDP': [600, 400, 520, 100, 800]}
}
old_dfs_dict, new_dfs_dict = {}, {}  # two dicts to hold the original and modified dataframes
# process the datasets one by one by iterating over datadicts_dict:
# convert each to a df and save it in old_dfs_dict under the same key,
# then scale it and put the result in new_dfs_dict under that key,
# so the dataset with key 'first' is saved as old_dfs_dict['first']
# and its modified version as new_dfs_dict['first']
for dataset_name, data_dict in datadicts_dict.items():
    old_dfs_dict[dataset_name] = pd.DataFrame({'GDP': data_dict['GDP']})
    new_dfs_dict[dataset_name] = old_dfs_dict[dataset_name] * 1.5
print(old_dfs_dict['third'])  # to check that old dfs are not changed
print(new_dfs_dict['third'])
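The same dictionary approach can be sketched with two dict comprehensions: one builds the originals, the other derives the modified DataFrames under the same keys. Note that the DataFrames need not have the same length (here 'third' has five rows):

```python
import pandas as pd

datadicts_dict = {
    'first':  {'GDP': [200, 175, 150, 100]},
    'second': {'GDP': [550, 200, 235, 50]},
    'third':  {'GDP': [600, 400, 520, 100, 800]},
}

# Build the original DataFrames, keyed by dataset name
old_dfs_dict = {name: pd.DataFrame(data) for name, data in datadicts_dict.items()}
# Derive the modified DataFrames under the same keys; originals stay unchanged
new_dfs_dict = {name: df * 1.5 for name, df in old_dfs_dict.items()}

print(new_dfs_dict['third'])
```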
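By thinking through the answers above, I eventually stumbled on a working solution. My problem was extracting the stored data from the dictionary: it had not occurred to me that I could pull the data out of the dictionary outside the loop and then form the DataFrames.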
.
.
.
# Begin Loop
for df, name in zip(old_dfs, new_dfs):
    # Multiply all GDP values by 1.5 in both dataframes
    df = df * 1.5
    # Store each modified dataframe in the dictionary under its new name
    dictionary[name] = df
# Solution - extract from the dictionary and form the dataframes
new_df_1 = pd.DataFrame.from_dict(dictionary['new_df_1'])
new_df_2 = pd.DataFrame.from_dict(dictionary['new_df_2'])
# Check for the existence of 'new_df_1' & 'new_df_2'
%who_ls DataFrame
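Since the dictionary values are already DataFrames, `from_dict` is not strictly needed; plain assignment outside the loop gives the same result. A minimal sketch, with data and names taken from the question:

```python
import pandas as pd

old_dfs = [pd.DataFrame({'GDP': [200, 175, 150, 100]}),
           pd.DataFrame({'GDP': [550, 200, 235, 50]})]
new_dfs = ['new_df_1', 'new_df_2']

dictionary = {}
for df, name in zip(old_dfs, new_dfs):
    dictionary[name] = df * 1.5

# The values are already DataFrames, so just bind them to names
new_df_1 = dictionary['new_df_1']
new_df_2 = dictionary['new_df_2']

print(type(new_df_1).__name__)  # DataFrame
```

Plain assignment binds a second name to the same object, so an in-place change made through `new_df_1` is also visible through `dictionary['new_df_1']`.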
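Can you give a sample of your input and the expected output? That would make it clearer what you need in the dfs once the loop has run.
I just created an example for everyone to work from; I hope it is clear. If it runs correctly, the last command, %who_ls DataFrame, should return ['df', 'old_df_1', 'old_df_2', 'new_df_1', 'new_df_2'].
What is wrong with the loop code you posted?
The problem is that my code does not create 'new_df_1' and 'new_df_2' as DataFrames. The line dictionary[name] = df just adds them all to the dictionary, and I don't know how to create the dfs in the loop. If you run my code and check for the new DataFrames with %who_ls DataFrame, you will not find new_df_1 and new_df_2.
What is wrong with using a dictionary of DataFrames? That is actually the preferred approach: instead of flooding your global environment with many DataFrames, use one indexed container of DataFrames. No functionality is lost when DataFrames are stored in a dict, list, tuple, etc.
The problem with this approach is that it appends the DataFrames together. I need to create separate DataFrames 'new_df_1' and 'new_df_2'. Could you show me how to adjust or rewrite my command dictionary[name] = df? I only use this approach to reference the DataFrame names; I don't actually want to store the data in a dictionary.
If you need them, you have new_dfs_list[0], new_dfs_list[1] and so on. The DataFrames are not appended together; they are kept as separate items in a list of DataFrames, new_dfs_list. For example, if you initialized old_dfs_list with 4 DataFrames (old_dfs_list[0], old_dfs_list[1], old_dfs_list[2], old_dfs_list[3]), your new list will also contain the 4 modified DataFrames. Now suppose you have 100 old DataFrames: rather than creating 100 variables named new_df_1, new_df_2, ..., new_df_100, create a list new_dframes and index it as new_dframes[0], ..., new_dframes[99].
You don't need from_dict; you can simply assign: new_df_1 = dictionary['new_df_1']. But as commented above, a dictionary of DataFrames is the preferred way to store many similarly structured DataFrames, and no functionality is lost: dictionary['new_df_1'].head(), dictionary['new_df_1'].tail(), dictionary['new_df_1'].describe(), ...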
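To illustrate the point about keeping DataFrames in a dictionary, here is a small sketch: each value supports the full DataFrame API, and a comprehension can process every DataFrame without naming each one. The name `gdp_means` is hypothetical, chosen only for this example:

```python
import pandas as pd

dictionary = {
    'new_df_1': pd.DataFrame({'GDP': [200, 175, 150, 100]}) * 1.5,
    'new_df_2': pd.DataFrame({'GDP': [550, 200, 235, 50]}) * 1.5,
}

# The full DataFrame API is available on each dictionary value
print(dictionary['new_df_1'].head())
print(dictionary['new_df_1'].describe())

# Hypothetical example: compute one statistic for every stored DataFrame
gdp_means = {name: df['GDP'].mean() for name, df in dictionary.items()}
print(gdp_means)
```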