Python 3.x pandas data cleaning
So, I'm reading tables from a PDF into a pandas dataframe, but I'm still fairly new to pandas and find the documentation pretty daunting. I'm sure there's a fairly simple way to do what I need, but I just don't know how.
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 NaN col0 col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 col11 NaN
1 NaN Location Date NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN measure1 1** 40** 30** 20** 20 0.02** 3** 10** 5** 100** 15** NaN
3 NaN measure2 100 400 300 200 200 2 300 100 50 1,000 150 NaN
4 NaN location1 1/15/1994 5900 28000 7600 25000 150 --- --- --- --- --- ---
5 NaN NaN 3/16/1994 4900 12000 4400 11000 60 --- --- --- --- --- ---
6 NaN NaN 1/4/1995 1 1 1 1 8 --- --- --- --- --- ---
7 NaN NaN 4/12/2004 8400 34000 4600 17000 <1000 --- --- --- --- --- ---
8 NaN NaN 7/28/2008 3200 15400 4430 17100 172 I --- --- --- --- --- ---
9 NaN NaN 5/19/2011 2000 11000 2500 9200 0.2 1 --- --- --- --- --- ---
10 NaN NaN 8/6/2013 2700 20000 5300 20000 2 6 --- --- --- --- --- ---
11 NaN NaN 11/13/2013 2600 14000 5400 20000 0.1 3 --- --- --- --- --- ---
12 NaN NaN 2/5/2014 3200 19000 6400 25000 18 0 --- --- --- --- --- ---
13 NaN NaN 5/7/2014 2000 15000 4100 16000 22 0 --- --- --- --- --- ---
14 NaN NaN 12/18/2014 2500 32000 5200 20000 8 8 --- --- --- --- --- ---
15 NaN NaN 6/4/2015 1700 15000 5200 21000 44 0 --- --- --- --- --- ---
16 NaN NaN 1/20/2017 1400 15,000 6,300 21,000 1 2 --- --- --- --- --- ---
17 NaN location2 1/15/1994 210 290 39 180 69 --- --- --- --- --- ---
18 NaN NaN 3/24/1994 1500 12000 4100 18000 400 0 --- --- --- --- --- ---
19 NaN NaN 1/4/1995 1 1 1 1 8 --- --- --- --- --- ---
20 NaN NaN 2/1/2000 <1000 8900 5200 58000 <10000 --- --- --- --- --- ---
21 NaN NaN 4/12/2004 <5.0 42 78 540 150 --- --- --- --- --- ---
22 NaN NaN 7/28/2008 23.3 27.9 28 409 9.34 --- --- --- --- --- ---
23 NaN NaN 5/19/2011 1.8 12 22 170 0.2 1 --- --- --- --- --- ---
24 NaN NaN 8/6/2013 4.3 23 71 590 0.1 3 --- --- --- --- --- ---
25 NaN NaN 1/19/2017 0.21 I 0.26 I 7.7 42 0.2 4 --- --- --- --- --- ---
26 NaN location3 3/21/1994 <1 <1 <1 <1 <8 --- --- --- --- --- ---
27 2/1/2000 <1 <1 <1 <2 <10 --- --- --- --- --- --- NaN NaN
For the first point, you can try the following:
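# Transpose so the misaligned last row becomes the last column
df = df.T
# Shift that column down one position (i.e. the row one column to the right)
df.iloc[:, -1] = df.iloc[:, -1].shift(1)
# Transpose back and drop the empty first column
df = df.T
df = df.drop(df.columns[0], axis=1)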
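Below is a self-contained sketch of the transpose-and-shift approach on a small, made-up frame. The data is hypothetical and modeled loosely on rows 26 and 27 of the printout: column 0 is an empty artifact, column 1 holds the location, column 2 the date. The sketch assumes the bad row is off by exactly one column, as the answer's `shift(1)` implies. In the printout above, though, row 27's date sits in column 0 while the other dates sit in column 2, so the real offset may be 2. In that case, pass that number to `shift()`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the PDF table: column 0 is an all-NaN artifact,
# column 1 the location, column 2 the date, column 3 a measurement.
# The last row was extracted one column too far to the left (assumption).
df = pd.DataFrame(
    [
        [np.nan, "location3", "3/21/1994", "<1"],
        [np.nan, "2/1/2000", "<1", np.nan],
    ],
    dtype=object,
)

# Transpose so the last row becomes the last column, shift it down by one
# (which is "right by one" in the original orientation), then transpose back.
df = df.T
df.iloc[:, -1] = df.iloc[:, -1].shift(1)
df = df.T

# Drop the empty first column.
df = df.drop(df.columns[0], axis=1)
print(df)
```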
For the last point:
df['1'] = df['1'].ffill()
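Here is a minimal sketch of the forward fill on hypothetical data shaped like the location column, where only the first row of each group carries the location name. It uses the integer key `1`, assuming the column labels are integers as the `0 ... 13` header in the printout suggests. The answer's `df['1']` only works if the label is the string `'1'`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame after the cleanup above: column 1 has the location name
# only on the first row of each group; the rows below it are NaN.
df = pd.DataFrame(
    {
        1: ["location1", np.nan, np.nan, "location2", np.nan],
        2: ["1/15/1994", "3/16/1994", "1/4/1995", "1/15/1994", "3/24/1994"],
    }
)

# Forward fill: each NaN takes the last non-missing value above it.
df[1] = df[1].ffill()
print(df[1].tolist())
```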
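Thanks a lot, ffill worked like magic. As for the other two points, I got them working by transposing the dataframe with df.T and using shift() on the columns. I didn't know slice notation worked on dataframes.
Just use axis=1 with shift(); there's no need to transpose.
@Nick You're welcome! Please consider upvoting and accepting the answer.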
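The next sketch follows the commenter's suggestion: shift along `axis=1` instead of transposing. It uses the same kind of made-up two-row frame. The data is hypothetical, and the one-column offset is again an assumption. Selecting with `iloc[-1:]` rather than `iloc[-1]` keeps a one-row DataFrame, so `shift(1, axis=1)` moves its values one column to the right. Calling `.to_numpy()` then writes them back by position.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column 0 is empty, and the last row is one column
# too far to the left (assumption for illustration).
df = pd.DataFrame(
    [
        [np.nan, "location3", "3/21/1994", "<1"],
        [np.nan, "2/1/2000", "<1", np.nan],
    ],
    dtype=object,
)

# Shift only the last row one column to the right, without transposing.
df.iloc[-1:] = df.iloc[-1:].shift(1, axis=1).to_numpy()

# Drop the empty first column.
df = df.drop(df.columns[0], axis=1)
print(df)
```

Skipping the transpose round trip also matters for mixed-dtype frames. Transposing one produces object columns, and transposing back does not restore the original dtypes.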