Python: creating a new column from specific rows in a dataframe
I have a csv file in which each property is represented by one row, followed by a variable number of rows for the rooms in that property. For each property, I want to create a column that sums the floor area of all its rooms into a gross floor area. The unstructured nature of the data makes this hard to do in pandas. Here is a sample of my current table:
id  ba  store_desc      floor_area
0   1   Toy Shop        NaN
1   2   Retail Zone A   29.42
2   2   Retail Zone B   31.29
3   1   Grocery Store   NaN
4   2   Retail Zone A   68.00
5   2   Outside Garden  83.50
6   2   Office          7.30
Here is the table I am trying to create:
id  ba  store_desc     floor_area  gross_floor_area
0   1   Toy Shop       NaN         60.71
3   1   Grocery Store  NaN         158.8
Does anyone have any suggestions on how to get this result? I'm completely lost.
Sam

IIUC
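# Rows with a missing floor_area are the property rows; copy to avoid SettingWithCopyWarning
df1 = df[df['floor_area'].isnull()].copy()
# Each property row starts a new group, so the running count of NaNs labels the groups
df1['gross_floor_area'] = df.groupby(df['floor_area'].isnull().cumsum())['floor_area'].sum().values
df1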
Out[463]:
   id  ba    store_desc  floor_area  gross_floor_area
0   0   1       ToyShop         NaN             60.71
3   3   1  GroceryStore         NaN            158.80
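For anyone who wants to run this end to end, here is a minimal sketch that retypes the sample table by hand and applies the same cumsum grouping. The DataFrame construction and the names `is_property`, `group_id`, `totals` and `result` are my own additions for illustration; it also assumes the first row of the data is a property row, otherwise the number of groups would not match the number of property rows.

```python
import numpy as np
import pandas as pd

# Sample data retyped from the question
df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5, 6],
    "ba": [1, 2, 2, 1, 2, 2, 2],
    "store_desc": ["Toy Shop", "Retail Zone A", "Retail Zone B",
                   "Grocery Store", "Retail Zone A", "Outside Garden", "Office"],
    "floor_area": [np.nan, 29.42, 31.29, np.nan, 68.00, 83.50, 7.30],
})

# A property row has no floor_area; the running count of such rows
# labels every row with the property it belongs to.
is_property = df["floor_area"].isnull()
group_id = is_property.cumsum()  # hypothetical helper name

# Sum the room areas per property (sum skips the NaN on the property row)
totals = df.groupby(group_id)["floor_area"].sum()

# Keep only the property rows and attach the totals by position
result = df[is_property].copy()
result["gross_floor_area"] = totals.to_numpy().round(2)
print(result)
```

Printing `group_id` on its own is a quick way to check that every room row has been assigned to the property above it.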
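First create a temporary column called category, forward-fill it, group by that column to get the sums, then map the sums back onto the matching store_desc values.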
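# Put each property's store_desc on its own row, then forward-fill it onto the room rows below
df['category'] = df['store_desc'].where(df['floor_area'].isnull()).ffill()
# Sum the room areas per property and map the totals back onto the property rows
df['gross_floor_area'] = df['store_desc'].map(df.groupby('category')['floor_area'].sum())
df = df.drop('category', axis=1)
df[df['gross_floor_area'].notnull()]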
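One caveat with mapping on store_desc: it assumes property names are unique and never coincide with a room name. A possible variation, sketched below under the assumption that the `id` column is unique per row, is to forward-fill the property's `id` instead of its name. The column `property_id` and the variable names are hypothetical, and the sample data is retyped from the question.

```python
import numpy as np
import pandas as pd

# Sample data retyped from the question
df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5, 6],
    "ba": [1, 2, 2, 1, 2, 2, 2],
    "store_desc": ["Toy Shop", "Retail Zone A", "Retail Zone B",
                   "Grocery Store", "Retail Zone A", "Outside Garden", "Office"],
    "floor_area": [np.nan, 29.42, 31.29, np.nan, 68.00, 83.50, 7.30],
})

is_property = df["floor_area"].isnull()

# Keep the id only on property rows (NaN elsewhere), then forward-fill
# so every room row carries the id of the property above it.
own_id = df["id"].where(is_property)
df["property_id"] = own_id.ffill()  # hypothetical temporary column

# Total room area per property, keyed by the property's id
totals = df.groupby("property_id")["floor_area"].sum().round(2)

# Map totals onto property rows only; room rows have a NaN key and stay NaN
df["gross_floor_area"] = own_id.map(totals)

result = df.loc[is_property].drop(columns="property_id")
print(result)
```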