Python: Create a new DataFrame from an aggregated DataFrame


python pandas dataframe


I have a DataFrame that aggregates people by location, like this:

location_id | score | number_of_males | number_of_females
     1      |  20   |        2        |         1
     2      |  45   |        1        |         2

I want to create a new DataFrame that un-aggregates it, so that I get something like this:

location_id | score | number_of_males | number_of_females
     1      |  20   |        1        |         0
     1      |  20   |        1        |         0
     1      |  20   |        0        |         1
     2      |  45   |        1        |         0
     2      |  45   |        0        |         1
     2      |  45   |        0        |         1

Or, even better:

location_id | score |       sex 
     1      |  20   |       male       
     1      |  20   |       male    
     1      |  20   |       female
     2      |  45   |       male
     2      |  45   |       female
     2      |  45   |       female
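The first layout is just a one-hot encoding of the sex column in the second, so once the long format exists, the 0/1 columns can be derived with pd.get_dummies. A minimal sketch, with the long-format frame typed in by hand (the variable names long_df and indicator_df are illustrative):

```python
import pandas as pd

# Long format: one row per person (the "even better" layout), typed in by hand
long_df = pd.DataFrame({
    'location_id': [1, 1, 1, 2, 2, 2],
    'score': [20, 20, 20, 45, 45, 45],
    'sex': ['male', 'male', 'female', 'male', 'female', 'female'],
})

# One-hot encode sex; get_dummies on a Series names the columns after the values
dummies = pd.get_dummies(long_df['sex']).astype(int)
dummies = dummies.rename(columns={'male': 'number_of_males',
                                  'female': 'number_of_females'})

# Reattach the id columns and put the columns in the order shown above
indicator_df = long_df[['location_id', 'score']].join(dummies)[
    ['location_id', 'score', 'number_of_males', 'number_of_females']]
print(indicator_df)
```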

I'd like to do something like:

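import pandas as pd

# DataFrame.from_csv has been removed from pandas; read_csv replaces it
aggregated_df = pd.read_csv(SOME_PATH)

rows = []  # collect plain dicts and build the DataFrame once at the end
# Iterating over a DataFrame directly yields column names; iterrows() yields rows
for _, row in aggregated_df.iterrows():
    for column, sex in [('number_of_males', 'male'), ('number_of_females', 'female')]:
        for _ in range(int(row[column])):
            rows.append({'location_id': row['location_id'],
                         'score': row['score'],
                         'sex': sex})

# DataFrame.append never modified the frame in place (and was removed in pandas 2.0)
unaggregated_df = pd.DataFrame(rows, columns=['location_id', 'score', 'sex'])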

I'm having trouble getting the dict to append, even though that seems to be supported.

Is there a more pandthonic (the pandas version of pythonic) way to do this?

This is as far as I can get in pure pandas:

print(df)
location_id  score  number_of_males  number_of_females
     1        20           2                 1
     2        45           1                 2

Stack the two count columns into a single column:

df.set_index(['location_id','score']).stack().reset_index()
Out[102]: 
   location_id  score            level_2  0
0            1     20    number_of_males  2
1            1     20  number_of_females  1
2            2     45    number_of_males  1
3            2     45  number_of_females  2

But then I still have to iterate with a Python loop to multiply the rows :(
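The remaining loop can probably be avoided with Index.repeat, which repeats each row label by a per-row count; selecting those labels with .loc duplicates the rows. This is a swapped-in technique, not part of the answer above, and the column names count and sex are chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'location_id': [1, 2],
    'score': [20, 45],
    'number_of_males': [2, 1],
    'number_of_females': [1, 2],
})

# Same stack as above: one row per (location, count column) pair
counts = df.set_index(['location_id', 'score']).stack().reset_index()
counts.columns = ['location_id', 'score', 'sex', 'count']
counts['sex'] = counts['sex'].map({'number_of_males': 'male',
                                   'number_of_females': 'female'})

# Repeat each row label `count` times, then select: no Python loop needed
long_df = (counts.loc[counts.index.repeat(counts['count'])]
                 .drop(columns='count')
                 .reset_index(drop=True))
print(long_df)
```

A row whose count is 0 is simply repeated zero times, so it drops out of the result.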

Here is one way to get the result using groupby:

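import pandas as pd

ids = ['location_id','score']

def foo(d):
    # d holds the rows of one (location_id, score) group; int * list is plain
    # list repetition and avoids numpy's "ufunc 'multiply'" error on strings
    return pd.Series(int(d['number_of_males'].sum()) * ['male'] +
                     int(d['number_of_females'].sum()) * ['female'])

# The melt step assumes every group expands to the same number of rows
wide = df.groupby(ids)[['number_of_males', 'number_of_females']].apply(foo)
pd.melt(wide.reset_index(), id_vars=ids).drop(columns='variable')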

#Out[13]:
#   location_id  score   value
#0            1     20    male
#1            2     45    male
#2            1     20    male
#3            2     45  female
#4            1     20  female
#5            2     45  female
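The melt step above only works because both toy locations expand to exactly three rows; when the totals differ, groupby.apply returns a Series with an extra index level instead of a wide frame. A sketch that sidesteps this with DataFrame.explode (a swapped-in technique, not the answer's; the third location is made-up data chosen to have a different total):

```python
import pandas as pd

# Toy data where locations expand to different numbers of rows (3, 3 and 1)
df = pd.DataFrame({
    'location_id': [1, 2, 3],
    'score': [20, 45, 30],
    'number_of_males': [2, 1, 0],
    'number_of_females': [1, 2, 1],
})

# One list of sex labels per aggregated row; .tolist() gives plain Python ints,
# so int * list is ordinary list repetition rather than a numpy ufunc call
sexes = pd.Series(
    [m * ['male'] + f * ['female']
     for m, f in zip(df['number_of_males'].tolist(),
                     df['number_of_females'].tolist())],
    index=df.index,
)

# explode() turns each list element into its own row; a location with zero
# people would give an empty list, i.e. a NaN row, which dropna() removes
long_df = (df[['location_id', 'score']]
           .assign(sex=sexes)
           .explode('sex')
           .dropna(subset=['sex'])
           .reset_index(drop=True))
print(long_df)
```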

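Sorry, I've been travelling and haven't had a chance to test this. I haven't forgotten :) Thanks for taking the time to reply.

No worries, nothing is on fire :) Test it and accept or reject the solution!

I ended up using a subset of this on my real dataset: groupby(ids).apply(foo).reset_index(), then dropping the junk column and renaming the last one. Yours works on the toy data but not on my real data. Thanks for pointing me in the right direction.

Leaving this note for posterity (for others, or future me): multiplying a numeric array by a list of strings now raises the error ufunc 'multiply' did not contain a loop with signature matching types dtype('S21') dtype('S21') dtype('S21'). To work around it, I rewrote the expand function as a triple for loop:

def expand_student(d):
    list_student_rows = []
    for group_label, student_label in zip(group_labels, student_labels):
        for value in d[group_label].values:
            for x in range(0, value):
                list_student_rows.extend([student_label])
    return pd.Series(list_student_rows)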
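The expand_student function from the last comment relies on group_labels and student_labels, whose definitions are not shown. A runnable sketch with assumed definitions for both lists; it also returns a one-column DataFrame rather than a Series (a change from the comment), so that groupby.apply stacks the pieces the same way whether or not the groups have equal sizes:

```python
import pandas as pd

# Assumed definitions: the comment uses these names without showing them
group_labels = ['number_of_males', 'number_of_females']
student_labels = ['male', 'female']

def expand_student(d):
    list_student_rows = []
    for group_label, student_label in zip(group_labels, student_labels):
        for value in d[group_label].values:
            for _ in range(int(value)):
                list_student_rows.append(student_label)
    # A one-column DataFrame (not a Series) is concatenated under the group
    # keys regardless of how many rows each group produces
    return pd.DataFrame({'sex': list_student_rows})

df = pd.DataFrame({
    'location_id': [1, 2, 3],
    'score': [20, 45, 30],
    'number_of_males': [2, 1, 0],
    'number_of_females': [1, 2, 1],
})

ids = ['location_id', 'score']
expanded = df.groupby(ids)[group_labels].apply(expand_student)
# Drop the per-group row counter level, then turn the group keys into columns
long_df = expanded.reset_index(level=-1, drop=True).reset_index()
print(long_df)
```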