Python: pivot multiple string columns that contain duplicate values without aggregating
I have the following dataframe and want to convert the column `Imprv_Attribute` into one column per key, with the values taken from `Imprv_Attr_Desc`. I also need the `Imprv_Attr_Units` information: each newly created column should get its own units column, e.g. the `Imprv_Attr_Units` for `Bathrooms` should go into a column named `Bathrooms_Imprv_Attr_Units`.
| | Parcel | Imprv_Attribute | Imprv_Attr_Desc | Imprv_Attr_Units |
| --- | ------------- | --------------- | ----------------- | ---------------- |
| 0 | 00002-000-000 | Bathrooms | 2.0-Baths | 1.0 |
| 1 | 00002-000-000 | Bedrooms | 2-2 BEDROOMS | 1.0 |
| 2 | 00002-000-000 | Exterior Wall | 13-PRE-FAB PANEL | 100.0 |
| 3 | 00002-000-000 | Floor Cov | 08-SHEET VINYL | 20.0 |
| 4 | 00002-000-000 | Floor Cov | 14-CARPET | 80.0 |
| 5 | 00011-000-000 | Bathrooms | 3.0-Baths | 1.0 |
| 6 | 00011-000-000 | Bedrooms | 3-3 BEDROOMS | 1.0 |
| 7 | 00011-000-000 | Exterior Wall | 15-CONCRETE BLOCK | 60.0 |
| 8 | 00011-000-000 | Exterior Wall | 20-FACE BRICK | 40.0 |
| 9 | 00011-000-000 | Floor Cov | 14-CARPET | 100.0 |
My final result should look like this:
| Parcel | Bathrooms | Bathrooms_Imprv_Attr_Units | Bedrooms | Bedrooms_Imprv_Attr_Units | Exterior Wall | Exterior Wall_Imprv_Attr_Units | Floor Cov | Floor Cov_Imprv_Attr_Units |
| ------------- | --------- | -------------------------- | ------------ | ------------------------- | ----------------- | ------------------------------ | -------------- | ------------------------- |
| 00002-000-000 | 2.0-Baths | 1.0 | 2-2 BEDROOMS | 1.0 | 13-PRE-FAB PANEL | 100.0 | 08-SHEET VINYL | 20.0 |
| 00002-000-000 | | | | | | | 14-CARPET | 80.0 |
| 00011-000-000 | 3.0-Baths | 1.0 | 3-3 BEDROOMS | 1.0 | 15-CONCRETE BLOCK | 60.0 | 14-CARPET | 100.0 |
| 00011-000-000 | | | | | 20-FACE BRICK | 40.0 | | |
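So far I have:
from io import StringIO
import pandas as pd
# Sample data, semicolon-separated
data = StringIO(
"""
Parcel;Imprv_Attribute;Imprv_Attr_Desc;Imprv_Attr_Units
00002-000-000;Bathrooms;2.0-Baths;1.0
00002-000-000;Bedrooms;2-2 BEDROOMS;1.0
00002-000-000;Exterior Wall;13-PRE-FAB PANEL;100.0
00002-000-000;Floor Cov;08-SHEET VINYL;20.0
00002-000-000;Floor Cov;14-CARPET;80.0
00011-000-000;Bathrooms;3.0-Baths;1.0
00011-000-000;Bedrooms;3-3 BEDROOMS;1.0
00011-000-000;Exterior Wall;15-CONCRETE BLOCK;60.0
00011-000-000;Exterior Wall;20-FACE BRICK;40.0
00011-000-000;Floor Cov;14-CARPET;100.0
"""
)
df = pd.read_csv(data, sep=";")
# One column per attribute; "first" keeps only the first value per parcel and attribute
df = df.pivot_table(values="Imprv_Attr_Desc", index="Parcel", columns="Imprv_Attribute", aggfunc="first")
print(df)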
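This results in the dataframe below. Because of the aggregation function `first`, I lose the second value of `Floor Cov` and of `Exterior Wall`: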
| Parcel | Bathrooms | Bedrooms | Exterior Wall | Floor Cov |
| ------------- | --------- | ------------ | ----------------- | -------------- |
| 00002-000-000 | 2.0-Baths | 2-2 BEDROOMS | 13-PRE-FAB PANEL | 08-SHEET VINYL |
| 00011-000-000 | 3.0-Baths | 3-3 BEDROOMS | 15-CONCRETE BLOCK | 14-CARPET |
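To see which rows `aggfunc="first"` collapses, it can help to count how often each (`Parcel`, `Imprv_Attribute`) pair occurs. The following is a minimal sketch on the sample data from the question; the names `counts` and `repeated` are only illustrative.

```python
from io import StringIO

import pandas as pd

csv = """Parcel;Imprv_Attribute;Imprv_Attr_Desc;Imprv_Attr_Units
00002-000-000;Bathrooms;2.0-Baths;1.0
00002-000-000;Bedrooms;2-2 BEDROOMS;1.0
00002-000-000;Exterior Wall;13-PRE-FAB PANEL;100.0
00002-000-000;Floor Cov;08-SHEET VINYL;20.0
00002-000-000;Floor Cov;14-CARPET;80.0
00011-000-000;Bathrooms;3.0-Baths;1.0
00011-000-000;Bedrooms;3-3 BEDROOMS;1.0
00011-000-000;Exterior Wall;15-CONCRETE BLOCK;60.0
00011-000-000;Exterior Wall;20-FACE BRICK;40.0
00011-000-000;Floor Cov;14-CARPET;100.0
"""
df = pd.read_csv(StringIO(csv), sep=";")

# Number of rows per (Parcel, Imprv_Attribute) pair
counts = df.groupby(["Parcel", "Imprv_Attribute"]).size()

# Pairs that occur more than once are the ones an aggregation has to collapse
repeated = counts[counts > 1]
print(repeated)
```

Any pair listed in `repeated` has more than one value competing for a single cell, so a plain pivot either fails or has to aggregate.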
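I also tried:
df = df.pivot_table(index=[df.index, "Parcel"], columns="Imprv_Attribute", values="Imprv_Attr_Desc")
print(df)
This results in `pandas.core.base.DataError: No numeric types to aggregate`. I also tried `groupby`, but that did not give me what I want either:
df_group = df.groupby("Parcel")
for key, item in df_group:
    # item holds the rows of one parcel; pivot them without an index column
    wide = item.pivot(columns="Imprv_Attribute", values="Imprv_Attr_Desc")
    print(wide, "\n\n")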
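This prints one frame per parcel in which every original row stays on its own line, with a single populated cell and NaN everywhere else. For the first parcel:
| | Bathrooms | Bedrooms | Exterior Wall | Floor Cov |
| --- | --------- | ------------ | ---------------- | -------------- |
| 0 | 2.0-Baths | NaN | NaN | NaN |
| 1 | NaN | 2-2 BEDROOMS | NaN | NaN |
| 2 | NaN | NaN | 13-PRE-FAB PANEL | NaN |
| 3 | NaN | NaN | NaN | 08-SHEET VINYL |
| 4 | NaN | NaN | NaN | 14-CARPET |
and for the second parcel:
| | Bathrooms | Bedrooms | Exterior Wall | Floor Cov |
| --- | --------- | ------------ | ----------------- | --------- |
| 12 | 3.0-Baths | NaN | NaN | NaN |
| 13 | NaN | 3-3 BEDROOMS | NaN | NaN |
| 14 | NaN | NaN | 15-CONCRETE BLOCK | NaN |
| 15 | NaN | NaN | 20-FACE BRICK | NaN |
| 16 | NaN | NaN | NaN | 14-CARPET |
My full data has further attributes beyond this sample (HC&V, HVAC, heating system, interior wall, number of residential units, roof type, roofing), and they show the same pattern.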
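The `DataError` mentioned above is consistent with `pivot_table` using `aggfunc="mean"` by default, which cannot average strings. The sketch below passes `aggfunc="first"` together with the row index. It assumes the sample data from the question, and `sparse` is just an illustrative name. The call runs, but because every row index is unique, the result is as sparse as the `groupby` attempt.

```python
from io import StringIO

import pandas as pd

csv = """Parcel;Imprv_Attribute;Imprv_Attr_Desc;Imprv_Attr_Units
00002-000-000;Bathrooms;2.0-Baths;1.0
00002-000-000;Bedrooms;2-2 BEDROOMS;1.0
00002-000-000;Exterior Wall;13-PRE-FAB PANEL;100.0
00002-000-000;Floor Cov;08-SHEET VINYL;20.0
00002-000-000;Floor Cov;14-CARPET;80.0
00011-000-000;Bathrooms;3.0-Baths;1.0
00011-000-000;Bedrooms;3-3 BEDROOMS;1.0
00011-000-000;Exterior Wall;15-CONCRETE BLOCK;60.0
00011-000-000;Exterior Wall;20-FACE BRICK;40.0
00011-000-000;Floor Cov;14-CARPET;100.0
"""
df = pd.read_csv(StringIO(csv), sep=";")

# Turn the row index into a regular column named "index" so it can be a pivot key
sparse = df.reset_index().pivot_table(
    index=["index", "Parcel"],
    columns="Imprv_Attribute",
    values="Imprv_Attr_Desc",
    aggfunc="first",  # strings cannot be averaged, so an explicit aggfunc is needed
)
print(sparse)
```

Each of the ten rows keeps exactly one non-NaN cell. This suggests the pivot needs a key that restarts for every parcel and attribute, not the global row index.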
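Based on that solution, one possibility is `pd.DataFrame.groupby` with `cumcount()` to number the repeated attributes within each parcel, followed by `pivot_table` on that counter:
# N is 0 for the first occurrence of an attribute within a parcel, 1 for the second, ...
df['N'] = df.groupby(['Parcel', 'Imprv_Attribute']).cumcount()
df1 = df.pivot_table(index=['Parcel', 'N'],
                     columns='Imprv_Attribute',
                     values=['Imprv_Attr_Desc', 'Imprv_Attr_Units'],
                     aggfunc='first')
# Flatten the (value, attribute) column pairs: keep the bare attribute name for descriptions, append the value name for units
df1.columns = [x[1] if x[0] == 'Imprv_Attr_Desc' else '_'.join(x[::-1]) for x in df1.columns]
df1 = df1.sort_index(axis=1).reset_index().drop(columns='N')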
Parcel Bathrooms Bathrooms_Imprv_Attr_Units Bedrooms Bedrooms_Imprv_Attr_Units Exterior Wall Exterior Wall_Imprv_Attr_Units Floor Cov Floor Cov_Imprv_Attr_Units
0 00002-000-000 2.0-Baths 1.0 2-2 BEDROOMS 1.0 13-PRE-FAB PANEL 100.0 08-SHEET VINYL 20.0
1 00002-000-000 NaN NaN NaN NaN NaN NaN 14-CARPET 80.0
2 00011-000-000 3.0-Baths 1.0 3-3 BEDROOMS 1.0 15-CONCRETE BLOCK 60.0 14-CARPET 100.0
3 00011-000-000 NaN NaN NaN NaN 20-FACE BRICK 40.0 NaN NaN
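As a variation on the counter idea above, the reshape can also be written with `set_index` and `unstack`, which involves no aggregation function at all. This is a minimal sketch on the sample data from the question, not the answer's own code; `wide` is an illustrative name.

```python
from io import StringIO

import pandas as pd

csv = """Parcel;Imprv_Attribute;Imprv_Attr_Desc;Imprv_Attr_Units
00002-000-000;Bathrooms;2.0-Baths;1.0
00002-000-000;Bedrooms;2-2 BEDROOMS;1.0
00002-000-000;Exterior Wall;13-PRE-FAB PANEL;100.0
00002-000-000;Floor Cov;08-SHEET VINYL;20.0
00002-000-000;Floor Cov;14-CARPET;80.0
00011-000-000;Bathrooms;3.0-Baths;1.0
00011-000-000;Bedrooms;3-3 BEDROOMS;1.0
00011-000-000;Exterior Wall;15-CONCRETE BLOCK;60.0
00011-000-000;Exterior Wall;20-FACE BRICK;40.0
00011-000-000;Floor Cov;14-CARPET;100.0
"""
df = pd.read_csv(StringIO(csv), sep=";")

# Occurrence counter: 0 for the first value of an attribute within a parcel, 1 for the second, ...
df["N"] = df.groupby(["Parcel", "Imprv_Attribute"]).cumcount()

# (Parcel, N, Imprv_Attribute) is unique by construction, so unstack needs no aggregation
wide = df.set_index(["Parcel", "N", "Imprv_Attribute"]).unstack("Imprv_Attribute")

# Columns are (value column, attribute) pairs; flatten them to the requested names
wide.columns = [
    attr if val == "Imprv_Attr_Desc" else f"{attr}_{val}"
    for val, attr in wide.columns
]
wide = wide.sort_index(axis=1).reset_index().drop(columns="N")
print(wide)
```

Unlike `pivot_table`, `unstack` raises an error on duplicate index entries, so a mistake in the counter shows up as an exception and not as silently dropped rows.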