Python: transposing unstructured rows in pandas
I have a dataset like this:
category UK US Germany
sales 100000 48000 36000
budget 50000 20000 14000
n_employees 300 123 134
diversified 1 0 1
sustainability_score 22.8 38.9 34.5
e_commerce 37000 7000 11000
budget 25000 10000 10000
n_employees 18 22 7
traffic 150 mil 38 mil 12500
subsidy 33000 26000 23000
budget 14000 6000 6000
own_marketing 0 0 1
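In the dataset, the sales variable corresponds to the sales of the main office. e_commerce is the sales of e_commerce, and the budget that follows e_commerce is actually the budget of the company's e_commerce department. The same applies to subsidy: the subsidy variable corresponds to the sales of subsidy, and the budget variable that follows it is the budget of subsidy.
I would like to convert the dataset so that every variable name carries its country and department, for example UK_main_sales, UK_main_budget, and so on if we take UK as an example.
I tried to assign the variables to the different departments by tracking the budget variable, since it always comes right after the department's own row, but I was not successful.
The full list of variables for UK should look like this: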
UK_main_sales
UK_main_budget
UK_main_n_employees
UK_main_diversified
UK_main_sustainability_score
UK_e_commerce (we could also add sales but I think it is simpler without sales)
UK_e_commerce_budget
UK_e_commerce_n_employees
UK_e_commerce_traffic
UK_subsidy
UK_subsidy_budget
UK_subsidy_own_marketing
Any ideas?
I think you need:
# boolean mask marking the rows that start a new department block
mask = df['category'].isin(['subsidy', 'e_commerce'])
# where(mask): keep the department names, set every other row to NaN
# ffill(): carry each department name down to the rows below it
# fillna('main'): rows before the first department belong to 'main'
# mask(mask).fillna(''): department rows themselves get an empty prefix
# finally join prefix and original name, then strip the stray '_'
df['category'] = (df['category'].where(mask).ffill().fillna('main').mask(mask).fillna('')
                  + '_' + df['category']).str.strip('_')
# reshape into a single row with unstack
df = df.set_index('category').unstack().to_frame().T
# flatten the (country, category) MultiIndex into names like UK_main_sales
df.columns = df.columns.map('_'.join)
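For anyone who wants to try this without the original file, here is a minimal self-contained sketch of the same idea. It rebuilds the sample table from the question by hand and keeps all values as strings, because the traffic row mixes text such as "150 mil" with plain numbers. The names rows, department_rows, department, prefix, labels and wide are only illustrative, and the list of department rows is an assumption taken from the sample data.

```python
import pandas as pd

# Sample data copied from the question; values are kept as strings because
# the traffic row mixes text ("150 mil") with plain numbers.
rows = [
    ("sales", "100000", "48000", "36000"),
    ("budget", "50000", "20000", "14000"),
    ("n_employees", "300", "123", "134"),
    ("diversified", "1", "0", "1"),
    ("sustainability_score", "22.8", "38.9", "34.5"),
    ("e_commerce", "37000", "7000", "11000"),
    ("budget", "25000", "10000", "10000"),
    ("n_employees", "18", "22", "7"),
    ("traffic", "150 mil", "38 mil", "12500"),
    ("subsidy", "33000", "26000", "23000"),
    ("budget", "14000", "6000", "6000"),
    ("own_marketing", "0", "0", "1"),
]
df = pd.DataFrame(rows, columns=["category", "UK", "US", "Germany"])

# Assumption: these are the only rows that start a new department block.
department_rows = ["e_commerce", "subsidy"]
mask = df["category"].isin(department_rows)

# Department label per row: department rows keep their own name, the rows
# below inherit it via forward fill, and the leading rows become "main".
department = df["category"].where(mask).ffill().fillna("main")

# Department rows already carry the department name, so they get no prefix.
prefix = department.mask(mask, "")
labels = (prefix + "_" + df["category"]).str.strip("_")

# One row, with one column per (country, label) pair, then flattened names.
wide = df.assign(category=labels).set_index("category").unstack().to_frame().T
wide.columns = wide.columns.map("_".join)

uk_columns = [c for c in wide.columns if c.startswith("UK_")]
print(uk_columns)
```

With the sample data, the UK_ columns should be the twelve names listed in the question, and the US and Germany columns follow the same pattern. If more departments exist in the real data, they would need to be added to department_rows.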
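Thanks a lot. How can I add "main" to the first variables, and subsidy and e_commerce to the other variables?
Oops, give me a second.
Do you want to change only the names of the budget rows?
Oh no, all of them.
@edyvedy13 - I seem to be lost, sorry. What is the expected output for more columns (maybe the next 6 columns)?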