Merging subgroup values with Python
I want to prepend each transaction group's Accounting Order number, as the first column, to every row of that group in the output file. Input file:
01 2019-03-01 Travel 1500 DCA CR
04 2019-03-01 Allowance 300 ATC DR
05 2019-03-02 Local Trip 100 TCO CR
Accounting Order 190291
22 2019-02-01 Charges 2500 DCA CR
98 2019-02-08 Allowance 900 ATC DR
36 2019-01-30 Local Trip 50 TCO CR
74 2019-02-09 Court fees 300 ATC DR
Accounting Order 195297
33 2019-03-01 Travel 1500 DCA CR
97 2019-03-01 Allowance 300 ATC DR
Accounting Order 180876
The output should be:
190291 01 2019-03-01 Travel 1500 DCA CR
190291 04 2019-03-01 Allowance 300 ATC DR
190291 05 2019-03-02 Local Trip 100 TCO CR
195297 22 2019-02-01 Charges 2500 DCA CR
195297 98 2019-02-08 Allowance 900 ATC DR
195297 36 2019-01-30 Local Trip 50 TCO CR
195297 74 2019-02-09 Court fees 300 ATC DR
180876 33 2019-03-01 Travel 1500 DCA CR
180876 97 2019-03-01 Allowance 300 ATC DR
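Is there a way to prepend the account number values like that? Thanks for any help or suggestions.
For example, one approach splits all lines into two lists, z[0] and z[1], depending on whether they contain "Accounting Order". It then reads the non-Accounting-Order lines in z[0] with read_fwf and adds the filled-in Accounting Order numbers taken from the order list z[1]. Output: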
Accounting Order 0 1 2 3
0 190291 1.0 2019-03-01 Travel 1500 DCA CR
1 190291 4.0 2019-03-01 Allowance 300 ATC DR
2 190291 5.0 2019-03-02 Local Trip 100 TCO CR
5 195297 22.0 2019-02-01 Charges 2500 DCA CR
6 195297 98.0 2019-02-08 Allowance 900 ATC DR
7 195297 36.0 2019-01-30 Local Trip 50 TCO CR
8 195297 74.0 2019-02-09 Court fees 300 ATC DR
11 180876 33.0 2019-03-01 Travel 1500 DCA CR
12 180876 97.0 2019-03-01 Allowance 300 ATC DR
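A minimal sketch of this split-and-fill idea, written as a hypothetical reconstruction rather than the answerer's own code. It assumes order lines start with `Accounting Order` followed only by the number, and it keeps each transaction line as raw text instead of running `read_fwf` on it, so it yields exactly the requested output format. `RAW` and `prefix_order_numbers` are illustrative names.

```python
import pandas as pd

# Sample input copied from the question (hypothetical stand-in for the real file).
RAW = """\
01 2019-03-01 Travel 1500 DCA CR
04 2019-03-01 Allowance 300 ATC DR
05 2019-03-02 Local Trip 100 TCO CR
Accounting Order 190291
22 2019-02-01 Charges 2500 DCA CR
98 2019-02-08 Allowance 900 ATC DR
36 2019-01-30 Local Trip 50 TCO CR
74 2019-02-09 Court fees 300 ATC DR
Accounting Order 195297
33 2019-03-01 Travel 1500 DCA CR
97 2019-03-01 Allowance 300 ATC DR
Accounting Order 180876
"""


def prefix_order_numbers(text: str) -> list[str]:
    # One entry per non-blank line of the file.
    lines = pd.Series([ln.strip() for ln in text.splitlines() if ln.strip()])
    # Order number on "Accounting Order NNN" lines, NaN on transaction lines.
    order = lines.str.extract(r"^Accounting Order\s+(\d+)$", expand=False)
    is_order = order.notna()
    # Each order line comes after its group, so fill the number upwards.
    order = order.bfill()
    # Keep only the transaction lines and put the order number in front.
    return (order[~is_order] + " " + lines[~is_order]).tolist()


for row in prefix_order_numbers(RAW):
    print(row)
```

Transaction lines that appear after the last `Accounting Order` line have nothing to backfill from, so they would come out as NaN; the sample data does not have that case.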
Using the backfill method, as requested:
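import pandas as pd
# read the file by character position: one (start, end) pair per field
cols = [(0, 2), (2, 13), (14, 24), (25, 29), (30, 34), (34, 37)]
names = ['id', 'date', 'desc', 'value', 'type1', 'type2']
df = pd.read_fwf('my_file_22.txt', header=None, colspecs=cols, names=names)
# on "Accounting Order" lines the order number falls across type1 and type2; join them
is_order = df['desc'] == 'Accounting'
df.loc[is_order, 'Accounting Order'] = df.loc[is_order, 'type1'] + df.loc[is_order, 'type2']
# rows to drop afterwards: the order lines themselves and blank lines (no id)
drop = is_order | df['id'].isna()
# each order line follows its group, so fill the number upwards; fill only this
# column (fillna(method='backfill') on the whole frame is deprecated in recent
# pandas and would also overwrite genuinely missing values in the data rows)
df['Accounting Order'] = df['Accounting Order'].bfill()
df = df[~drop]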
It produces the following output:
id date desc value type1 type2 Accounting Order
0 1.0 2019-03-01 Travel 1500 DCA CR 190291
1 4.0 2019-03-01 Allowance 300 ATC DR 190291
2 5.0 2019-03-02 Local Trip 100 TCO CR 190291
5 22.0 2019-02-01 Charges 2500 DCA CR 195297
6 98.0 2019-02-08 Allowance 900 ATC DR 195297
7 36.0 2019-01-30 Local Trip 50 TCO CR 195297
8 74.0 2019-02-09 Court fees 300 ATC DR 195297
11 33.0 2019-03-01 Travel 1500 DCA CR 180876
12 97.0 2019-03-01 Allowance 300 ATC DR 180876
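Backfill is the right direction here because each Accounting Order line comes after the transactions it belongs to. A tiny illustration on a hand-made Series (hypothetical values, not read from the file):

```python
import numpy as np
import pandas as pd

# One entry per line: NaN for transaction lines, the number on order lines.
order = pd.Series([np.nan, np.nan, "190291", np.nan, "195297"])

# bfill copies the next non-missing value upwards, so each transaction
# receives the number of the "Accounting Order" line below it.
print(order.bfill().tolist())
# ['190291', '190291', '190291', '195297', '195297']

# ffill goes the other way and would attach a transaction to the previous group.
print(order.ffill().tolist())
# [nan, nan, '190291', '190291', '195297']
```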
Notes:
1. Because the file is read by character position, this breaks if the column widths are not the same on every line.
2. The solution was written against the same input data shown in the question.
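Regarding note 1: if column widths vary between lines, one width-independent alternative (a swapped-in technique, not part of the answer above) is to parse each transaction line with a regular expression. This sketch assumes every transaction line has the shape `id date description value type1 type2`, where only the description may contain spaces; `PATTERN` and `parse_transactions` are hypothetical names.

```python
import pandas as pd

# Assumed layout: id, date, free-text description (may contain spaces),
# integer value, and two single-token codes.
PATTERN = (
    r"^(?P<id>\d+)\s+(?P<date>\S+)\s+(?P<desc>.+?)\s+"
    r"(?P<value>\d+)\s+(?P<type1>\S+)\s+(?P<type2>\S+)$"
)


def parse_transactions(lines: list[str]) -> pd.DataFrame:
    s = pd.Series([ln.strip() for ln in lines])
    # Named groups become column names; lines that do not match (such as
    # "Accounting Order ..." lines) give all-NaN rows, which are dropped.
    df = s.str.extract(PATTERN).dropna()
    df["date"] = pd.to_datetime(df["date"])
    df["value"] = df["value"].astype(int)
    return df


rows = parse_transactions([
    "05 2019-03-02 Local Trip 100 TCO CR",
    "Accounting Order 190291",
    "74 2019-02-09 Court fees 300 ATC DR",
])
print(rows)
```

Because `id` stays a string, leading zeros such as `05` are preserved instead of becoming `5.0`.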
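What is your original format? Are there 3 different files or DataFrames?
A single source file, OK, but your file has no delimiter? Is the accounting order in its own column?
Yes. I'll load it with pd.read_fwf. The accounting order is in column 5 or 6.
Is the data shown in the question an exact copy of your file? If possible, also include the code you use to read it, so that a solution can be run without errors or assumptions.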