Merging row data using pandas in Python
I'm trying to write a small Python application that creates a CSV file containing data for a recipe system. Imagine the following structure of Excel data:
Manufacturer Product Data 1 Data 2 Data 3
Test 1 Product 1 1 2 3
Test 1 Product 2 4 5 6
Test 2 Product 1 1 2 3
Test 3 Product 1 1 2 3
Test 3 Product 1 4 5 6
Test 3 Product 1 7 8 9
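For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample table as a DataFrame. The variable name df is an assumption (it matches what the answers below use), and the values are copied from the table above rather than read from the real spreadsheet.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Manufacturer": ["Test 1", "Test 1", "Test 2", "Test 3", "Test 3", "Test 3"],
    "Product": ["Product 1", "Product 2", "Product 1",
                "Product 1", "Product 1", "Product 1"],
    "Data 1": [1, 4, 1, 1, 4, 7],
    "Data 2": [2, 5, 2, 2, 5, 8],
    "Data 3": [3, 6, 3, 3, 6, 9],
})

print(df)
```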
When merging, I would like the data to be displayed in the following format:
Test 1 Product 1 1 2 3 0 0 0 0 0 0
Test 1 Product 2 4 5 6 0 0 0 0 0 0
Test 2 Product 1 1 2 3 0 0 0 0 0 0
Test 3 Product 1 1 2 3 4 5 6 7 8 9
Any help would be greatly appreciated. So far I can read the data set with pandas and convert it to CSV.
Regards,
Lee
Use melt, groupby, pd.Series and unstack:
(df.melt(['Manufacturer','Product'])
.groupby(['Manufacturer','Product'])['value']
.apply(lambda x: pd.Series(x.tolist()))
.unstack(fill_value=0)
.reset_index())
Output:
Manufacturer Product 0 1 2 3 4 5 6 7 8
0 Test 1 Product 1 1 2 3 0 0 0 0 0 0
1 Test 1 Product 2 4 5 6 0 0 0 0 0 0
2 Test 2 Product 1 1 2 3 0 0 0 0 0 0
3 Test 3 Product 1 1 4 7 2 5 8 3 6 9
Starting from an intermediate frame d (its construction is not shown), which looks like this:
Data 1 Data 2 Data 3
0 1 2 0 1 2 0 1 2
Manufacturer Product
Test 1 Product 1 1 0 0 2 0 0 3 0 0
Product 2 4 0 0 5 0 0 6 0 0
Test 2 Product 1 1 0 0 2 0 0 3 0 0
Test 3 Product 1 1 4 7 2 5 8 3 6 9
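The code that builds d did not survive above. One way to get a frame with exactly that shape, offered as a sketch and not necessarily what the answerer wrote, is to number the rows inside each (Manufacturer, Product) group with cumcount, add that number as a third index level, and unstack it. The sample df is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Manufacturer": ["Test 1", "Test 1", "Test 2", "Test 3", "Test 3", "Test 3"],
    "Product": ["Product 1", "Product 2", "Product 1",
                "Product 1", "Product 1", "Product 1"],
    "Data 1": [1, 4, 1, 1, 4, 7],
    "Data 2": [2, 5, 2, 2, 5, 8],
    "Data 3": [3, 6, 3, 3, 6, 9],
})

keys = ["Manufacturer", "Product"]

# 0, 1, 2, ... for each repeated (Manufacturer, Product) pair.
row_in_group = df.groupby(keys).cumcount()

# Use the per-group row number as a third index level, then pivot it
# into the columns; missing combinations are filled with 0.
d = df.set_index(keys + [row_in_group]).unstack(fill_value=0)

print(d)
```

The columns of d are a two-level index (original column name, row number within the group), which is why a sort on the second level is needed afterwards.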
Follow up with:
d.sort_index(axis=1, level=1).pipe(lambda d: d.set_axis(range(d.shape[1]), axis=1).reset_index())
Manufacturer Product 0 1 2 3 4 5 6 7 8
0 Test 1 Product 1 1 2 3 0 0 0 0 0 0
1 Test 1 Product 2 4 5 6 0 0 0 0 0 0
2 Test 2 Product 1 1 2 3 0 0 0 0 0 0
3 Test 3 Product 1 1 2 3 4 5 6 7 8 9
Or use defaultdict and itertools.count:
from itertools import count
from collections import defaultdict
c = defaultdict(count)
pd.Series({
    (m, p, next(c[(m, p)])): v
    for _, m, p, *V in df.itertuples()
    for v in V
}).unstack(fill_value=0)
0 1 2 3 4 5 6 7 8
Test 1 Product 1 1 2 3 0 0 0 0 0 0
Product 2 4 5 6 0 0 0 0 0 0
Test 2 Product 1 1 2 3 0 0 0 0 0 0
Test 3 Product 1 1 2 3 4 5 6 7 8 9
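In case the defaultdict(count) trick looks opaque: each new key gets its own itertools.count iterator, so calling next on it hands out 0, 1, 2, ... independently per key. That running number becomes the column position in the answer above. A tiny standalone sketch (the names counters, keys and positions are only for illustration):

```python
from collections import defaultdict
from itertools import count

# Every previously unseen key lazily gets a fresh counter starting at 0.
counters = defaultdict(count)

keys = [
    ("Test 3", "Product 1"),
    ("Test 3", "Product 1"),
    ("Test 1", "Product 2"),
    ("Test 3", "Product 1"),
]

# Each lookup advances only the counter that belongs to that key.
positions = [next(counters[k]) for k in keys]

print(positions)  # [0, 1, 0, 2]
```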
Use groupby:
df.groupby(['Manufacturer','Product']).agg(tuple).sum(axis=1).apply(pd.Series).fillna(0)
Out[85]:
0 1 2 3 4 5 6 7 8
Manufacturer Product
Test1 Product1 1.0 2.0 3.0 0.0 0.0 0.0 0.0 0.0 0.0
Product2 4.0 5.0 6.0 0.0 0.0 0.0 0.0 0.0 0.0
Test2 Product1 1.0 2.0 3.0 0.0 0.0 0.0 0.0 0.0 0.0
Test3 Product1 1.0 4.0 7.0 2.0 5.0 8.0 3.0 6.0 9.0
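Note that this output is float and lists the Test 3 values column by column (1 4 7 2 5 8 ...), while the question asks for 1 2 3 4 5 6 7 8 9. Below is a hedged sketch of a plain loop over the groups that flattens each group row by row, converts back to integers, and writes CSV text. The names keys, value_cols, wide and buffer are made up for this example, and an in-memory buffer stands in for a real output file.

```python
import io

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Manufacturer": ["Test 1", "Test 1", "Test 2", "Test 3", "Test 3", "Test 3"],
    "Product": ["Product 1", "Product 2", "Product 1",
                "Product 1", "Product 1", "Product 1"],
    "Data 1": [1, 4, 1, 1, 4, 7],
    "Data 2": [2, 5, 2, 2, 5, 8],
    "Data 3": [3, 6, 3, 3, 6, 9],
})

keys = ["Manufacturer", "Product"]
value_cols = [c for c in df.columns if c not in keys]

index, records = [], []
for key, group in df.groupby(keys):
    index.append(key)
    # ravel() walks the block row by row, so the rows of a group
    # end up one after another in a single flat list.
    records.append(group[value_cols].to_numpy().ravel().tolist())

# Shorter lists are padded with NaN, so fill with 0 and restore integers.
wide = (
    pd.DataFrame(records, index=pd.MultiIndex.from_tuples(index, names=keys))
    .fillna(0)
    .astype(int)
    .reset_index()
)

# Write to an in-memory buffer; pass a file path instead to create a file.
buffer = io.StringIO()
wide.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

This assumes all data columns are numeric and that 0 is an acceptable filler for missing positions, as in the desired output in the question.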
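I'm still struggling to apply the amazing help you have all provided. If I sent a sample spreadsheet with example data, would anyone be willing to accept a private message to help me further?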