Python: fill and shift column values based on other column values
Tags: python, pandas
I have a dataset like this:
number Shipment Date service desc amount
182692345 2/12/19 DUTIES & TAXES
IMPORT EXPORT DUTIES 561.01
IMPORT EXPORT TAXES 600.47
1827975839 2/12/19 DUTIES & TAXES
IMPORT EXPORT DUTIES 160.19
3229475633 2/12/19 DUTIES & TAXES
IMPORT EXPORT TAXES 600.47
IMPORT EXPORT DUTIES 561.01
5733894261 29/04/2020 Express
DUTIES TAXES PAID 25
FUEL SURCHARGE 3.28
1826995520 2/12/19 DUTIES & TAXES
IMPORT EXPORT TAXES 600.47
IMPORT EXPORT DUTIES 561.01
2998455062 4/5/20 Express
FUEL SURCHARGE 0.72
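What I want is this: for the rows that have a number and a shipment date, check whether service is "Express". For those shipments, I want to pull the "FUEL SURCHARGE" row from the desc column up into the same row as number and shipment_date, together with its amount value.
Like this: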
number Shipment Date service desc amount
182692345 2/12/19 DUTIES & TAXES
IMPORT EXPORT DUTIES 561.01
IMPORT EXPORT TAXES 600.47
1827975839 2/12/19 DUTIES & TAXES
IMPORT EXPORT DUTIES 160.19
3229475633 2/12/19 DUTIES & TAXES
IMPORT EXPORT TAXES 600.47
IMPORT EXPORT DUTIES 561.01
5733894261 29/04/2020 Express FUEL SURCHARGE 3.28
DUTIES TAXES PAID 25
1826995520 2/12/19 DUTIES & TAXES
IMPORT EXPORT TAXES 600.47
IMPORT EXPORT DUTIES 561.01
2998455062 4/5/20 Express FUEL SURCHARGE 0.72
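In the end I only care about the rows where service is "Express", so ideally we would drop every row whose service is not Express and get the format above for the Express values only.
I think pandas ffill() and transform will be the main tools, so I am trying something like this: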
df1 = df.copy()
df1[['number', 'shipment_date']] = df1[['number', 'shipment_date']].ffill()
df1.desc = df1.desc.fillna('')
df1.amount = df1.amount.fillna('')
s = df1.groupby(['number', 'shipment_date']).amount.transform(lambda x: ' '.join(str(x)))
df.loc[df.shipment_date.notnull(), 'amount'] = s
df.loc[df.shipment_date.isnull(), 'amount'] = ''
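Forward-fill the empty cells (fillna(method='ffill'), or ffill() in newer pandas), filter the rows by service, and then move desc and amount up one row with shift(-1). Does this match the intent of the question?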
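# Forward-fill service so every detail row knows which service it belongs to
# (ffill() replaces fillna(method='ffill'), which is deprecated in recent pandas)
df['service'] = df['service'].ffill()
# Keep only the Express rows; copy() avoids SettingWithCopyWarning below
df = df[df['service'] == 'Express'].copy()
# Forward-fill the shipment header fields onto the detail rows
df[['number', 'Shipment Date']] = df[['number', 'Shipment Date']].ffill()
# Move each desc/amount pair up one row
df[['desc', 'amount']] = df[['desc', 'amount']].shift(-1)
df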
number Shipment Date service desc amount
8 5.733894e+09 29/04/2020 Express DUTIES TAXES PAID 25.00
9 5.733894e+09 29/04/2020 Express FUEL SURCHARGE 3.28
10 5.733894e+09 29/04/2020 Express NaN NaN
14 2.998455e+09 4/5/20 Express FUEL SURCHARGE 0.72
15 2.998455e+09 4/5/20 Express NaN NaN
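Logically, you have a classic master/detail dataset. Your detail rows have no foreign key (FK) pointing to their master record. Add the FK; then you can apply one filter condition on the master rows, another on the detail rows, and join the FK to the primary key (PK).
Forward-filling the number column (ffill/fillna) gives each detail record its FK.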
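The master/detail idea above can be sketched as follows. This is only an illustration, not the answerer's code: the frame is a cut-down copy of the question's data, the column name Shipment Date follows the output shown earlier, the nullable Int64 dtype for number is an assumption made so the keys stay integers, and the names is_master, fk, master and detail are made up for the example.

```python
import numpy as np
import pandas as pd

# Cut-down sample of the question's data (assumed layout: header row, then detail rows).
df = pd.DataFrame({
    "number": pd.array([5733894261, None, None, 1826995520, None, None,
                        2998455062, None], dtype="Int64"),
    "Shipment Date": ["29/04/2020", None, None, "2/12/19", None, None, "4/5/20", None],
    "service": ["Express", None, None, "DUTIES & TAXES", None, None, "Express", None],
    "desc": [None, "DUTIES TAXES PAID", "FUEL SURCHARGE", None,
             "IMPORT EXPORT TAXES", "IMPORT EXPORT DUTIES", None, "FUEL SURCHARGE"],
    "amount": [np.nan, 25.0, 3.28, np.nan, 600.47, 561.01, np.nan, 0.72],
})

# Master rows carry the shipment header; detail rows carry desc/amount.
is_master = df["number"].notna()

# Foreign key: each row inherits the number of the nearest master row above it.
fk = df["number"].ffill()

# Filter the master side on service, the detail side on desc.
master = df.loc[is_master & df["service"].eq("Express"),
                ["number", "Shipment Date", "service"]]
detail = (df.loc[~is_master & df["desc"].eq("FUEL SURCHARGE"), ["desc", "amount"]]
            .assign(number=fk))  # assign aligns fk on the row index

# Join FK to PK; a left join keeps Express shipments that have no fuel row.
result = master.merge(detail, on="number", how="left")
print(result)
```

With this sample, result holds one row per Express shipment, with the FUEL SURCHARGE description and amount next to the number and shipment date.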
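You can forward-fill the missing values in the service column, compare the result against Express, and finally shift only the matching rows, restricted to a list of columns.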
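One possible reading of those steps is sketched below. It is an assumption-laden illustration rather than the answerer's original code: the sample frame is hypothetical, the names mask and cols are invented here, and the sketch only gives the wanted result when the FUEL SURCHARGE row comes directly after its header row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample where each Express header is immediately followed by its fuel row.
df = pd.DataFrame({
    "number": [1826995520, np.nan, 2998455062, np.nan],
    "Shipment Date": ["2/12/19", None, "4/5/20", None],
    "service": ["DUTIES & TAXES", None, "Express", None],
    "desc": [None, "IMPORT EXPORT TAXES", None, "FUEL SURCHARGE"],
    "amount": [np.nan, 600.47, np.nan, 0.72],
})

# Forward-fill service, then flag every row that belongs to an Express shipment.
mask = df["service"].ffill().eq("Express")

# Shift only the flagged rows, and only the listed columns, up by one row.
cols = ["desc", "amount"]
df.loc[mask, cols] = df.loc[mask, cols].shift(-1)

# The shifted-out detail rows are now empty; keep the Express rows that carry a desc.
out = df[mask].dropna(subset=["desc"])
print(out)
```

If another detail row sits between the header and the fuel row, as with DUTIES TAXES PAID in the question's data, a plain shift(-1) pulls up that row instead.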
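What is not working in your current attempt?
I can't move the FUEL SURCHARGE value up to the same level as the rows where service is "Express" and the shipment date and number are present. The attempt leaves the FUEL SURCHARGE row in second place. I want it flipped, so that the FUEL SURCHARGE row is on the same row as the shipment date and number. See the output I showed above.
The code has been updated for "FUEL SURCHARGE"; the dates need adjusting.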
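For the case raised in these comments, where the fuel row is not the first detail row, a position-independent variant might look like the sketch below. It is not taken from any of the answers: the sample data is a cut-down copy of the question's, the names shipment, express, is_fuel, hdr and fuel are hypothetical, and it assumes at most one FUEL SURCHARGE row per shipment (extra ones are ignored).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "number": pd.array([5733894261, None, None, 1826995520, None, None,
                        2998455062, None], dtype="Int64"),
    "Shipment Date": ["29/04/2020", None, None, "2/12/19", None, None, "4/5/20", None],
    "service": ["Express", None, None, "DUTIES & TAXES", None, None, "Express", None],
    "desc": [None, "DUTIES TAXES PAID", "FUEL SURCHARGE", None,
             "IMPORT EXPORT TAXES", "IMPORT EXPORT DUTIES", None, "FUEL SURCHARGE"],
    "amount": [np.nan, 25.0, 3.28, np.nan, 600.47, 561.01, np.nan, 0.72],
})

# Running shipment counter: increases by one at every header row.
shipment = df["number"].notna().cumsum()

express = df["service"].ffill().eq("Express")          # rows of Express shipments
is_fuel = express & df["desc"].eq("FUEL SURCHARGE")    # their fuel detail rows
hdr = express & df["number"].notna()                   # their header rows

# Lookup table: shipment counter -> fuel desc/amount (first fuel row per shipment).
fuel = (df.loc[is_fuel, ["desc", "amount"]]
          .assign(shipment=shipment[is_fuel])
          .drop_duplicates("shipment")
          .set_index("shipment"))

out = df.copy()
# Write the fuel values onto the header row of the same shipment.
out.loc[hdr, "desc"] = shipment[hdr].map(fuel["desc"])
out.loc[hdr, "amount"] = shipment[hdr].map(fuel["amount"])

# Keep Express rows only and drop the fuel rows that were moved up.
out = out[express & ~is_fuel]
print(out)
```

On this sample the header of 5733894261 ends up with FUEL SURCHARGE 3.28, followed by the remaining DUTIES TAXES PAID row, which matches the layout requested in the question.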