Python pandas: pivot and create extra columns for duplicates
I have some data with duplicated index values and the columns I want. Example:
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3],
    "contact_type": ["email", "phone", "phone", "email", "mobile", "email", "phone", "mobile"],
    "contact": ["a@a.ca", "123", "456", "b@b.com", "78432", "c@c.ca", "12", "12"]
})
What I want is for each ID to become a single row. My ideal output is:
ID  email    phone  phone.1  mobile
1   a@a.ca   123    456      NaN
2   b@b.com  NaN    NaN      78432
3   c@c.ca   12     NaN      12
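Trying df.pivot('id', 'contact_type', 'contact') gives me the error "Index contains duplicate entries, cannot reshape". The problem seems to be that it does not like ID 1 having two phone contacts. Is there another way to get the data into this format?

I think you will have to assemble the final DataFrame piece by piece with pd.concat, because you do not know in advance how many different phone numbers a single ID can have. Assuming each ID has at most one email and at most one mobile number: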
In [130]:
df_mail = df.loc[df.contact_type == 'email', ['contact', 'id']].set_index('id')
In [131]:
df_mobile = df.loc[df.contact_type == 'mobile', ['contact', 'id']].set_index('id')
In [132]:
# .copy() so that adding a column below does not modify a view of df
df_phone = df.loc[df.contact_type == 'phone', ['contact', 'id']].copy()
In [133]:
# add a column holding 'Phone0', 'Phone1' and so on, numbering the phones within each id:
df_phone['field'] = 'Phone' + df_phone.groupby('id').cumcount().map(str)
In [134]:
# (id, field) pairs are now unique, so pivot no longer fails
df_phone = df_phone.pivot(index='id', columns='field', values='contact')
In [135]:
df_mail.columns = ['Email']
df_mobile.columns = ['Mobile']
In [136]:
print(pd.concat((df_mail, df_phone, df_mobile), axis=1))
      Email Phone0 Phone1 Mobile
id
1    a@a.ca    123    456    NaN
2   b@b.com    NaN    NaN  78432
3    c@c.ca     12    NaN     12
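
If an ID might also have more than one email or mobile number, the same numbering idea can be applied to every contact type at once instead of splitting the frame by type. The following is only a sketch under that assumption, not part of the answer above; the names n, labels and wide are made up for illustration. It numbers repeated (id, contact_type) pairs with cumcount and adds a ".1", ".2", ... suffix to the repeats, so the labels match the naming in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3],
    "contact_type": ["email", "phone", "phone", "email", "mobile", "email", "phone", "mobile"],
    "contact": ["a@a.ca", "123", "456", "b@b.com", "78432", "c@c.ca", "12", "12"],
})

# Position of each row within its (id, contact_type) group: 0 for the first, 1 for the next, ...
n = df.groupby(["id", "contact_type"]).cumcount()

# Keep the plain type name for the first occurrence, append ".1", ".2", ... for repeats
labels = df["contact_type"].where(n == 0, df["contact_type"] + "." + n.astype(str))

# (id, field) pairs are unique now, so pivot can reshape without the duplicate-entries error
wide = df.assign(field=labels).pivot(index="id", columns="field", values="contact")
print(wide)
```

The resulting columns come out in sorted order (email, mobile, phone, phone.1) rather than in the order shown in the question; selecting them explicitly, for example wide[["email", "phone", "phone.1", "mobile"]], would restore that order.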