Python: How do I create DataFrames by iterating over a set?


python pandas loops


I have this DataFrame:

import pandas as pd

d = {'city': ['Barcelona', 'Madrid', 'Rome', 'Torino', 'London', 'Liverpool', 'Manchester', 'Paris'],
     'country': ['ES', 'ES', 'IT', 'IT', 'UK', 'UK', 'UK', 'FR'],
     'revenue': [1, 2, 3, 4, 5, 6, 7, 8],
     'amount': [8, 7, 6, 5, 4, 3, 2, 1]}
df = pd.DataFrame(d)

For each country, I would like to get the following:

españa = {'city': ['Barcelona', 'Madrid'],
          'revenue': [1, 2],
          'amount': [8, 7]}
ES = pd.DataFrame(españa)

So in the end I would have 4 DataFrames, named ES, IT, UK and FR.

So far I have tried:

a = set(df.loc[:]["country"])
for country in a:
    country = df.loc[(df["country"]== country),['date','sum']]

But this only gave me a single DataFrame with one value.

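country is the loop's iteration variable, and it gets overwritten inside the loop.

To produce 4 different DataFrames, try using a generator function: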

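def country_df_generator(data):
    # Yield one sub-DataFrame per distinct country code.
    for country in data['country'].unique():
        # 'date' and 'sum' are the columns from the asker's attempt;
        # with the sample data above, use ['city', 'revenue', 'amount'].
        yield data.loc[(data["country"] == country), ["date", "sum"]]

countries = country_df_generator(data)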
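As a rough sketch of how such a generator could be consumed with the sample data from the question: the name `iter_country_frames` and the choice to yield `(code, frame)` pairs are assumptions made for illustration, not part of the answer above. The column list uses the sample's `city`, `revenue` and `amount` instead of `date` and `sum`.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['Barcelona', 'Madrid', 'Rome', 'Torino',
             'London', 'Liverpool', 'Manchester', 'Paris'],
    'country': ['ES', 'ES', 'IT', 'IT', 'UK', 'UK', 'UK', 'FR'],
    'revenue': [1, 2, 3, 4, 5, 6, 7, 8],
    'amount': [8, 7, 6, 5, 4, 3, 2, 1],
})

def iter_country_frames(data):
    """Yield (country_code, sub-DataFrame) pairs, one per distinct country."""
    for code in data['country'].unique():
        yield code, data.loc[data['country'] == code, ['city', 'revenue', 'amount']]

# Nothing is computed until the generator is iterated; dict() drains it
# and keeps every frame under its country code.
frames = dict(iter_country_frames(df))
print(sorted(frames))
print(frames['ES'])
```

Yielding the code together with the frame keeps track of which DataFrame belongs to which country, which a bare list of frames would lose.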


You can use a dictionary comprehension with groupby:

res = {k: v.drop(columns='country') for k, v in df.groupby('country')}

print(res)

{'ES':    amount       city  revenue
       0       8  Barcelona        1
       1       7     Madrid        2,
 'FR':    amount   city  revenue
       7       1  Paris        8,
 'IT':    amount    city  revenue
       2       6    Rome        3
       3       5  Torino        4,
 'UK':    amount        city  revenue
       4       4      London        5
       5       3   Liverpool        6
       6       2  Manchester        7}
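Each value in such a dictionary is an ordinary DataFrame that can be looked up by its country code. The sketch below assumes the sample data from the question and a pandas version that supports `drop(columns=...)`. The `get_group` call is shown only as an alternative for when a single country is needed.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['Barcelona', 'Madrid', 'Rome', 'Torino',
             'London', 'Liverpool', 'Manchester', 'Paris'],
    'country': ['ES', 'ES', 'IT', 'IT', 'UK', 'UK', 'UK', 'FR'],
    'revenue': [1, 2, 3, 4, 5, 6, 7, 8],
    'amount': [8, 7, 6, 5, 4, 3, 2, 1],
})

# One DataFrame per country, with the now-redundant 'country' column dropped.
res = {code: group.drop(columns='country') for code, group in df.groupby('country')}

es = res['ES']          # look up a single country's frame by its code
print(es)
print(list(res))        # groupby sorts the group keys by default

# If only one country is needed, get_group avoids building the whole dict.
# Note that it keeps the 'country' column.
uk = df.groupby('country').get_group('UK')
print(uk)
```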


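Your loop does produce all four DataFrames, but you throw the first three away.

You use the variable country as the loop iterator and then destroy that value in the very next statement, country = .... Control then returns to the top of the loop, country is reset to the next two-letter abbreviation, and this conflict repeats for all four countries.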

If you want four DataFrames, you need to store each one in a separate place. For example:

a = set(df.loc[:]["country"])
df_dict = {}

for country in a:
    df_dict[country] = df.loc[(df["country"]== country),['date','sum']]

You now have a dictionary of four DataFrames, each one keyed by its country code.
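A minimal sketch of the same loop run against the sample data from the question. It selects `city`, `revenue` and `amount`, because the sample has no `date` or `sum` columns. The final unpacking into the names `ES`, `FR`, `IT` and `UK` is an optional extra for illustration, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['Barcelona', 'Madrid', 'Rome', 'Torino',
             'London', 'Liverpool', 'Manchester', 'Paris'],
    'country': ['ES', 'ES', 'IT', 'IT', 'UK', 'UK', 'UK', 'FR'],
    'revenue': [1, 2, 3, 4, 5, 6, 7, 8],
    'amount': [8, 7, 6, 5, 4, 3, 2, 1],
})

# Build one DataFrame per country code, stored under that code so that
# no iteration overwrites the result of a previous one.
df_dict = {}
for country in set(df['country']):
    df_dict[country] = df.loc[df['country'] == country, ['city', 'revenue', 'amount']]

# Optional: bind the four frames to individual names, as the question asks.
ES, FR, IT, UK = (df_dict[code] for code in ('ES', 'FR', 'IT', 'UK'))
print(UK)
```

Keeping the frames in a dictionary usually scales better than separate variables, since the set of country codes does not have to be known in advance.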

Does this help?


I tried your solution, but it doesn't work. I can run the code, but I don't get any DataFrames (or variables). If I print countries, I get …; the type of countries is generator.

Yes, it returns a generator object. If you iterate over the generator, it yields the objects you want.

countries = list(country_df_generator(data))

will give you a tangible list, since that is what you prefer.
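To illustrate the difference between the generator object and the materialized list, here is a small sketch. It assumes the sample data from the question and a variant of `country_df_generator` that selects the sample's own columns.

```python
import types
import pandas as pd

df = pd.DataFrame({
    'city': ['Barcelona', 'Madrid', 'Rome', 'Torino',
             'London', 'Liverpool', 'Manchester', 'Paris'],
    'country': ['ES', 'ES', 'IT', 'IT', 'UK', 'UK', 'UK', 'FR'],
    'revenue': [1, 2, 3, 4, 5, 6, 7, 8],
    'amount': [8, 7, 6, 5, 4, 3, 2, 1],
})

def country_df_generator(data):
    # Yield one sub-DataFrame per distinct country code.
    for country in data['country'].unique():
        yield data.loc[data['country'] == country, ['city', 'revenue', 'amount']]

countries = country_df_generator(df)
# Calling the function only creates a lazy generator; no frames exist yet.
print(isinstance(countries, types.GeneratorType))  # True

frames = list(countries)  # iterating materializes the four DataFrames
print(len(frames))        # 4

# A generator can be consumed only once; a second pass yields nothing.
print(len(list(countries)))  # 0
```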