Pandas 在Dataframe中拆分列标题并复制行值
在下面的示例df中,我试图找到一种基于“;”拆分列标题('1;2'、'4'、'5;6')的方法存在并复制这些拆分列中的行值。(我的实际df来自导入的csv文件,因此通常我有大约50-80个需要拆分的列标题) 下面是我下面的代码和输出Pandas 在Dataframe中拆分列标题并复制行值,pandas,split,co,Pandas,Split,Co,在下面的示例df中,我试图找到一种基于“;”拆分列标题('1;2'、'4'、'5;6')的方法存在并复制这些拆分列中的行值。(我的实际df来自导入的csv文件,因此通常我有大约50-80个需要拆分的列标题) 下面是我下面的代码和输出 import pandas as pd import numpy as np # data = np.array([['Market','Product Code','1;2','4','5;6'], ['Total Custo
import pandas as pd
import numpy as np
#
data = np.array([['Market','Product Code','1;2','4','5;6'],
['Total Customers',123,1,500,400],
['Total Customers',123,2,400,320],
['Major Customer 1',123,1,100,220],
['Major Customer 1',123,2,230,230],
['Major Customer 2',123,1,130,30],
['Major Customer 2',123,2,20,10],
['Total Customers',456,1,500,400],
['Total Customers',456,2,400,320],
['Major Customer 1',456,1,100,220],
['Major Customer 1',456,2,230,230],
['Major Customer 2',456,1,130,30],
['Major Customer 2',456,2,20,10]])
df =pd.DataFrame(data)
df.columns = df.iloc[0]
df = df.reindex(df.index.drop(0))
print (df)
0 Market Product Code 1;2 4 5;6
1 Total Customers 123 1 500 400
2 Total Customers 123 2 400 320
3 Major Customer 1 123 1 100 220
4 Major Customer 1 123 2 230 230
5 Major Customer 2 123 1 130 30
6 Major Customer 2 123 2 20 10
7 Total Customers 456 1 500 400
8 Total Customers 456 2 400 320
9 Major Customer 1 456 1 100 220
10 Major Customer 1 456 2 230 230
11 Major Customer 2 456 1 130 30
12 Major Customer 2 456 2 20 10
下面是我想要的输出
0 Market Product Code 1 2 4 5 6
1 Total Customers 123 1 1 500 400 400
2 Total Customers 123 2 2 400 320 320
3 Major Customer 1 123 1 1 100 220 220
4 Major Customer 1 123 2 2 230 230 230
5 Major Customer 2 123 1 1 130 30 30
6 Major Customer 2 123 2 2 20 10 10
7 Total Customers 456 1 1 500 400 400
8 Total Customers 456 2 2 400 320 320
9 Major Customer 1 456 1 1 100 220 220
10 Major Customer 1 456 2 2 230 230 230
11 Major Customer 2 456 1 1 130 30 30
12 Major Customer 2 456 2 2 20 10 10
理想情况下,我希望在“读取csv”级别执行此任务。有什么想法吗?用
重复
s=df.columns.str.split(';')
df=df.reindex(columns=df.columns.repeat(s.str.len()))
df.columns=sum(s.tolist(),[])
df
Out[247]:
Market Product Code 1 2 4 5 6
1 Total Customers 123 1 1 500 400 400
2 Total Customers 123 2 2 400 320 320
3 Major Customer 1 123 1 1 100 220 220
4 Major Customer 1 123 2 2 230 230 230
5 Major Customer 2 123 1 1 130 30 30
6 Major Customer 2 123 2 2 20 10 10
7 Total Customers 456 1 1 500 400 400
8 Total Customers 456 2 2 400 320 320
9 Major Customer 1 456 1 1 100 220 220
10 Major Customer 1 456 2 2 230 230 230
11 Major Customer 2 456 1 1 130 30 30
12 Major Customer 2 456 2 2 20 10 10
可以使用“;”拆分列然后重建df:
pd.DataFrame({c:df[t] for t in df.columns for c in t.split(';')})
Out[157]:
1 2 4 5 6 Market Product Code
1 1 1 500 400 400 Total Customers 123
2 2 2 400 320 320 Total Customers 123
3 1 1 100 220 220 Major Customer 1 123
4 2 2 230 230 230 Major Customer 1 123
5 1 1 130 30 30 Major Customer 2 123
6 2 2 20 10 10 Major Customer 2 123
7 1 1 500 400 400 Total Customers 456
8 2 2 400 320 320 Total Customers 456
9 1 1 100 220 220 Major Customer 1 456
10 2 2 230 230 230 Major Customer 1 456
11 1 1 130 30 30 Major Customer 2 456
12 2 2 20 10 10 Major Customer 2 456
或者,如果您想保留列顺序:
pd.concat([df[t].to_frame(c) for t in df.columns for c in t.split(';')],1)
Out[167]:
Market Product Code 1 2 4 5 6
1 Total Customers 123 1 1 500 400 400
2 Total Customers 123 2 2 400 320 320
3 Major Customer 1 123 1 1 100 220 220
4 Major Customer 1 123 2 2 230 230 230
5 Major Customer 2 123 1 1 130 30 30
6 Major Customer 2 123 2 2 20 10 10
7 Total Customers 456 1 1 500 400 400
8 Total Customers 456 2 2 400 320 320
9 Major Customer 1 456 1 1 100 220 220
10 Major Customer 1 456 2 2 230 230 230
11 Major Customer 2 456 1 1 130 30 30
12 Major Customer 2 456 2 2 20 10 10
完美地工作@Wen。谢谢@jwlon81 yw~:-)快乐编码