Python: merging the levels of the same variable spread across consecutive columns
Tags: python, pandas, csv, data-transform

I have a CSV data file with two header rows: the first row holds the questions and the second row holds the sub-headers, so each main header spans several levels (possible answers). The current CSV looks like the table below:

Header  | Which country do you live?       | Which country you previously visited?
Users   | Canada  USA  UK  Mexico  Norway  | India  Singapore  Pakistan
User 1  | Canada                           |        Singapore
User 2  |              UK                  | India
User 3  |                  Mexico          |                   Pakistan
User 4  |                          Norway  | India

I need to convert it into the following table, with one column per question:

Users   | Which country do you live? | Which country you previously visited?
User 1  | Canada                     | Singapore
User 2  | UK                         | India
User 3  | Norway                     | Pakistan
User 4  | Mexico                     | India

Can anyone help me?
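First back-fill the missing values within each question using bfill, then select the first column of each group; grouping by the first level of the column MultiIndex also removes the second level: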
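# df is assumed to have been read with both header rows as column levels (a two-level MultiIndex)
# For each question, back-fill across its answer columns and keep the first column
df = df.groupby(level=0, axis=1).apply(lambda x: x.bfill(axis=1).iloc[:, 0])
print(df)

   Header Which country do you live? Which country you previously visited?
0  User 1                     Canada                             Singapore
1  User 2                         UK                                 India
2  User 3                     Mexico                              Pakistan
3  User 4                     Norway                                 India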
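
For anyone who wants to try the idea without the original file, here is a minimal, self-contained sketch. The sample frame is rebuilt by hand from the table in the question, and the variable names (live, visited, merged) are only illustrative. Note that it swaps in a slightly different technique from the snippet above: recent pandas releases deprecate axis=1 in groupby, so this version transposes the frame, groups the rows by the first header level, takes the first non-missing value in each group, and transposes back. It assumes every user has at most one answer per question.

```python
import numpy as np
import pandas as pd

# Question texts copied from the table in the question
live = "Which country do you live?"
visited = "Which country you previously visited?"

# Two-level column header: (question, answer option)
columns = pd.MultiIndex.from_tuples(
    [
        ("Header", "Users"),
        (live, "Canada"), (live, "USA"), (live, "UK"),
        (live, "Mexico"), (live, "Norway"),
        (visited, "India"), (visited, "Singapore"), (visited, "Pakistan"),
    ]
)

nan = np.nan
# Hand-built sample rows; empty CSV cells are represented as NaN
rows = [
    ["User 1", "Canada", nan, nan, nan, nan, nan, "Singapore", nan],
    ["User 2", nan, nan, "UK", nan, nan, "India", nan, nan],
    ["User 3", nan, nan, nan, "Mexico", nan, nan, nan, "Pakistan"],
    ["User 4", nan, nan, nan, nan, "Norway", "India", nan, nan],
]
df = pd.DataFrame(rows, columns=columns)

# Transpose so the header levels become the row index, group by the
# first level (the question), keep the first non-missing value per
# user, then transpose back to one row per user.
merged = df.T.groupby(level=0).first().T
print(merged)
```

The result has a single header level with the columns Header, Which country do you live? and Which country you previously visited?, which matches the shape asked for in the question. If a user could have more than one answer under the same question, first() would silently keep only the earliest column, so a different aggregation would be needed in that case.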