Python: pivot multiple string columns containing duplicate values without aggregation


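I have the following DataFrame and want to turn the column `Imprv_Attribute` into one separate column per key, with the values taken from `Imprv_Attr_Desc`. I also need the `Imprv_Attr_Units` information: each newly created column should get its own units column, e.g. the `Imprv_Attr_Units` for `Bathrooms` should go into a column named `Bathrooms_Imprv_Attr_Units`.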

|     | Parcel        | Imprv_Attribute | Imprv_Attr_Desc   | Imprv_Attr_Units |
| --- | ------------- | --------------- | ----------------- | ---------------- |
| 0   | 00002-000-000 | Bathrooms       | 2.0-Baths         | 1.0              |
| 1   | 00002-000-000 | Bedrooms        | 2-2 BEDROOMS      | 1.0              |
| 2   | 00002-000-000 | Exterior Wall   | 13-PRE-FAB PANEL  | 100.0            |
| 3   | 00002-000-000 | Floor Cov       | 08-SHEET VINYL    | 20.0             |
| 4   | 00002-000-000 | Floor Cov       | 14-CARPET         | 80.0             |
| 5   | 00011-000-000 | Bathrooms       | 3.0-Baths         | 1.0              |
| 6   | 00011-000-000 | Bedrooms        | 3-3 BEDROOMS      | 1.0              |
| 7   | 00011-000-000 | Exterior Wall   | 15-CONCRETE BLOCK | 60.0             |
| 8   | 00011-000-000 | Exterior Wall   | 20-FACE BRICK     | 40.0             |
| 9   | 00011-000-000 | Floor Cov       | 14-CARPET         | 100.0            |

My final result should look like this:

| Parcel        | Bathrooms | Bathrooms_Imprv_Attr_Units | Bedrooms     | Bedrooms_Imprv_Attr_Units | Exterior Wall     | Exterior Wall_Imprv_Attr_Units | Floor Cov      | Floor Cov_Imprv_Attr_Units |
| ------------- | --------- | -------------------------- | ------------ | ------------------------- | ----------------- | ------------------------------ | -------------- | ------------------------- |
| 00002-000-000 | 2.0-Baths | 1.0                        | 2-2 BEDROOMS | 1.0                       | 13-PRE-FAB PANEL  | 100.0                          | 08-SHEET VINYL | 20.0                      |
| 00002-000-000 |           |                            |              |                           |                   |                                | 14-CARPET      | 80.0                      |
| 00011-000-000 | 3.0-Baths | 1.0                        | 3-3 BEDROOMS | 1.0                       | 15-CONCRETE BLOCK | 60.0                           | 14-CARPET      | 100.0                     |
| 00011-000-000 |           |                            |              |                           | 20-FACE BRICK     | 40.0                           |                |                           |

So far I have:

from io import StringIO

import pandas as pd

data = StringIO(
"""
Parcel;Imprv_Attribute;Imprv_Attr_Desc;Imprv_Attr_Units
00002-000-000;Bathrooms;2.0-Baths;1.0
00002-000-000;Bedrooms;2-2 BEDROOMS;1.0
00002-000-000;Exterior Wall;13-PRE-FAB PANEL;100.0
00002-000-000;Floor Cov;08-SHEET VINYL;20.0
00002-000-000;Floor Cov;14-CARPET;80.0
00011-000-000;Bathrooms;3.0-Baths;1.0
00011-000-000;Bedrooms;3-3 BEDROOMS;1.0
00011-000-000;Exterior Wall;15-CONCRETE BLOCK;60.0
00011-000-000;Exterior Wall;20-FACE BRICK;40.0
00011-000-000;Floor Cov;14-CARPET;100.0
"""
)
df = pd.read_csv(data, sep=";")
df = df.pivot_table(values="Imprv_Attr_Desc", index="Parcel", columns="Imprv_Attribute", aggfunc="first")
print(df)
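
This produces the following DataFrame. Because of the aggregation function `first`, I lose the additional `Floor Cov` and `Exterior Wall` entries: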

| Parcel        | Bathrooms | Bedrooms     | Exterior Wall     | Floor Cov      |
| ------------- | --------- | ------------ | ----------------- | -------------- |
| 00002-000-000 | 2.0-Baths | 2-2 BEDROOMS | 13-PRE-FAB PANEL  | 08-SHEET VINYL |
| 00011-000-000 | 3.0-Baths | 3-3 BEDROOMS | 15-CONCRETE BLOCK | 14-CARPET      |
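
To see exactly which rows `aggfunc="first"` collapses, it can help to list the (`Parcel`, `Imprv_Attribute`) pairs that occur more than once. The following is a minimal sketch on the sample data from the question; the variable names `keys`, `dupes` and `counts` are only illustrative.

```python
from io import StringIO

import pandas as pd

# Same sample data as in the question.
data = StringIO(
    """Parcel;Imprv_Attribute;Imprv_Attr_Desc;Imprv_Attr_Units
00002-000-000;Bathrooms;2.0-Baths;1.0
00002-000-000;Bedrooms;2-2 BEDROOMS;1.0
00002-000-000;Exterior Wall;13-PRE-FAB PANEL;100.0
00002-000-000;Floor Cov;08-SHEET VINYL;20.0
00002-000-000;Floor Cov;14-CARPET;80.0
00011-000-000;Bathrooms;3.0-Baths;1.0
00011-000-000;Bedrooms;3-3 BEDROOMS;1.0
00011-000-000;Exterior Wall;15-CONCRETE BLOCK;60.0
00011-000-000;Exterior Wall;20-FACE BRICK;40.0
00011-000-000;Floor Cov;14-CARPET;100.0
"""
)
df = pd.read_csv(data, sep=";")

# Rows whose (Parcel, Imprv_Attribute) pair occurs more than once.
keys = ["Parcel", "Imprv_Attribute"]
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

# Number of rows per pair; every pair with a count above 1 loses rows
# when the pivot keeps only the first value.
counts = df.groupby(keys).size()
print(counts[counts > 1])
```

Any pair listed here needs more than one output row per parcel if no value is to be dropped.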

I also tried:

df = df.pivot_table(index=[df.index, "Parcel"], columns="Imprv_Attribute", values="Imprv_Attr_Desc")
print(df)

This raises `pandas.core.base.DataError: No numeric types to aggregate`. I also tried `groupby`, but that did not give me what I want either:

df_group = df.groupby("Parcel")
for key, item in df_group:
    # pivot the rows of each parcel separately
    pivoted = item.pivot(columns="Imprv_Attribute", values="Imprv_Attr_Desc")
    print(pivoted, "\n\n")

Imprv_Attribute  Bathrooms      Bedrooms     Exterior Wall       Floor Cov  ...
0                2.0-Baths           NaN               NaN             NaN  ...
1                      NaN  2-2 BEDROOMS               NaN             NaN  ...
2                      NaN           NaN  13-PRE-FAB PANEL             NaN  ...
3                      NaN           NaN               NaN  08-SHEET VINYL  ...
4                      NaN           NaN               NaN       14-CARPET  ...
...

Imprv_Attribute  Bathrooms      Bedrooms      Exterior Wall  Floor Cov  ...
12               3.0-Baths           NaN                NaN        NaN  ...
13                     NaN  3-3 BEDROOMS                NaN        NaN  ...
14                     NaN           NaN  15-CONCRETE BLOCK        NaN  ...
15                     NaN           NaN      20-FACE BRICK        NaN  ...
16                     NaN           NaN                NaN  14-CARPET  ...
...
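
The `DataError` from the second attempt above most likely comes from `pivot_table` defaulting to `aggfunc="mean"`, which cannot average the strings in `Imprv_Attr_Desc`. As a rough illustration, and not as the layout requested in the question, a string-friendly aggregation function keeps every value but packs repeats into a single cell. The three-row frame below is a hypothetical subset of the sample data, and the `", "` separator is an arbitrary choice.

```python
import pandas as pd

# Hypothetical subset of the sample data with one repeated
# (Parcel, Imprv_Attribute) pair.
df = pd.DataFrame(
    {
        "Parcel": ["00002-000-000"] * 3,
        "Imprv_Attribute": ["Bathrooms", "Floor Cov", "Floor Cov"],
        "Imprv_Attr_Desc": ["2.0-Baths", "08-SHEET VINYL", "14-CARPET"],
    }
)

# Joining the strings avoids the numeric default aggregation, at the cost
# of putting repeated values into one cell instead of separate rows.
joined = df.pivot_table(
    index="Parcel",
    columns="Imprv_Attribute",
    values="Imprv_Attr_Desc",
    aggfunc=lambda s: ", ".join(s),
)
print(joined)
```

This still aggregates, so it only suits cases where one row per parcel is acceptable.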
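Adapting an existing solution, you can combine `pd.DataFrame.groupby` (with `cumcount`, to number repeated attributes within each parcel) and `pivot_table`: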
df['N'] = df.groupby(['Parcel', 'Imprv_Attribute']).cumcount()

df1 = df.pivot_table(index=['Parcel', 'N'], 
                     columns='Imprv_Attribute', 
                     values=['Imprv_Attr_Desc', 'Imprv_Attr_Units'],
                     aggfunc='first')

df1.columns = [x[1] if x[0] == 'Imprv_Attr_Desc' else '_'.join(x[::-1]) for x in df1.columns]
df1 = df1.sort_index(axis=1).reset_index().drop(columns='N')
          Parcel  Bathrooms  Bathrooms_Imprv_Attr_Units      Bedrooms  Bedrooms_Imprv_Attr_Units      Exterior Wall  Exterior Wall_Imprv_Attr_Units       Floor Cov  Floor Cov_Imprv_Attr_Units
0  00002-000-000  2.0-Baths                         1.0  2-2 BEDROOMS                        1.0   13-PRE-FAB PANEL                           100.0  08-SHEET VINYL                        20.0
1  00002-000-000        NaN                         NaN           NaN                        NaN                NaN                             NaN       14-CARPET                        80.0
2  00011-000-000  3.0-Baths                         1.0  3-3 BEDROOMS                        1.0  15-CONCRETE BLOCK                            60.0       14-CARPET                       100.0
3  00011-000-000        NaN                         NaN           NaN                        NaN      20-FACE BRICK                            40.0             NaN                         NaN
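
For convenience, the answer's steps can be put together into one self-contained script. This is only a sketch under the assumption that the data looks like the sample in the question; the helper name `pivot_keep_duplicates` and its parameters are hypothetical and not part of pandas. The counter `N` numbers the repeats of each attribute within a parcel, so every repeat gets its own row and `aggfunc="first"` never has more than one value to choose from.

```python
from io import StringIO

import pandas as pd

data = StringIO(
    """Parcel;Imprv_Attribute;Imprv_Attr_Desc;Imprv_Attr_Units
00002-000-000;Bathrooms;2.0-Baths;1.0
00002-000-000;Bedrooms;2-2 BEDROOMS;1.0
00002-000-000;Exterior Wall;13-PRE-FAB PANEL;100.0
00002-000-000;Floor Cov;08-SHEET VINYL;20.0
00002-000-000;Floor Cov;14-CARPET;80.0
00011-000-000;Bathrooms;3.0-Baths;1.0
00011-000-000;Bedrooms;3-3 BEDROOMS;1.0
00011-000-000;Exterior Wall;15-CONCRETE BLOCK;60.0
00011-000-000;Exterior Wall;20-FACE BRICK;40.0
00011-000-000;Floor Cov;14-CARPET;100.0
"""
)
df = pd.read_csv(data, sep=";")


def pivot_keep_duplicates(frame, index, columns, desc, units):
    """Hypothetical helper: pivot `columns` wide without losing repeats."""
    out = frame.copy()
    # 0 for the first occurrence of an attribute within a parcel, 1 for the second, ...
    out["N"] = out.groupby([index, columns]).cumcount()
    wide = out.pivot_table(
        index=[index, "N"], columns=columns, values=[desc, units], aggfunc="first"
    )
    # Flatten (value, attribute) pairs: description columns keep the bare
    # attribute name, unit columns become "<attribute>_<units column>".
    wide.columns = [
        attr if value == desc else f"{attr}_{value}" for value, attr in wide.columns
    ]
    return wide.sort_index(axis=1).reset_index().drop(columns="N")


# The repeat counter on its own, to show what drives the extra rows.
n = df.groupby(["Parcel", "Imprv_Attribute"]).cumcount()
print(n.tolist())

result = pivot_keep_duplicates(
    df, "Parcel", "Imprv_Attribute", "Imprv_Attr_Desc", "Imprv_Attr_Units"
)
print(result)
```

The helper assumes the input has no existing column called `N`; a different temporary name would be needed otherwise.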