Python: converting a list of data into a DataFrame
I extracted the data portion of a txt file and stored it in a list. Each entry should hold year, data1, data2 and data3. The fields are still separated by \t\t or \t, as in the original txt file, because I appended the data lines directly. Now I want to put the list into a DataFrame for further processing. The DataFrame has three columns: year, data1 and data2.
['2018\t \t7,107\t4,394\t2,713', '2017\t \t16,478\t10,286\t6,192', '2016\t \t15,944\t9,971\t5,973', '2015\t \t15,071\t9,079\t5,992', '2014\t \t14,415\t8,596\t5,819', '2013\t \t14,259\t8,269\t5,990', '2012\t \t14,010\t8,143\t5,867', '2011\t \t14,149\t8,126\t6,023', '2010\t \t14,505\t7,943\t6,562', '2009\t \t14,632\t8,022\t6,610', '2008\t \t14,207\t7,989\t6,218', '2007\t \t14,400\t8,085\t6,315', '2006\t \t14,750\t8,017\t6,733', '2005\t \t14,497\t7,593\t6,904', '2004\t \t14,155\t7,150\t7,005', '2003\t \t13,285\t6,457\t6,828', '2002\t \t12,821\t6,190\t6,631', '2001\t \t12,702\t6,080\t6,622', '2000\t \t11,942\t5,985\t5,957', '1999\t \t10,872\t5,824\t5,048', '2018\t \t10,362\t5,793\t4,569', '2017\t \t9,546\t5,479\t4,067', '2016\t \t9,222\t5,418\t3,804', '2015\t \t8,859\t5,363\t3,496', '2014\t \t8,203\t5,099\t3,104', '2013\t \t7,766\t4,861\t2,905', '2012\t \t7,091\t4,520\t2,571', '2011\t \t6,953\t4,526\t2,427', '2010\t \t6,632\t4,509\t2,123', '2009\t \t5,929\t4,011\t1,918', '2008\t \t5,909\t4,080\t1,829']
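In the end I want a DataFrame with the column names year, data1, data2, data3.
Thanks.
Using the re module and a generator expression:
Assuming there is one row of data per year: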
In [60]: import re
    ...: import pandas as pd
In [61]: lst = ['2018\t \t7,107\t4,394\t2,713', '2017\t \t16,478\t10,286\t6,192',
    ...:        '2016\t \t15,944\t9,971\t5,973', '2015\t \t15,071\t9,079\t5,992',
    ...:        '2014\t \t14,415\t8,596\t5,819', '2013\t \t14,259\t8,269\t5,990',
    ...:        '2012\t \t14,010\t8,143\t5,867', '2011\t \t14,149\t8,126\t6,023',
    ...:        '2010\t \t14,505\t7,943\t6,562', '2009\t \t14,632\t8,022\t6,610',
    ...:        '2008\t \t14,207\t7,989\t6,218', '2007\t \t14,400\t8,085\t6,315',
    ...:        '2006\t \t14,750\t8,017\t6,733', '2005\t \t14,497\t7,593\t6,904',
    ...:        '2004\t \t14,155\t7,150\t7,005', '2003\t \t13,285\t6,457\t6,828',
    ...:        '2002\t \t12,821\t6,190\t6,631', '2001\t \t12,702\t6,080\t6,622',
    ...:        '2000\t \t11,942\t5,985\t5,957', '1999\t \t10,872\t5,824\t5,048',
    ...:        '1998\t \t10,362\t5,793\t4,569', '1997\t \t9,546\t5,479\t4,067',
    ...:        '1996\t \t9,222\t5,418\t3,804', '1995\t \t8,859\t5,363\t3,496',
    ...:        '1994\t \t8,203\t5,099\t3,104', '1993\t \t7,766\t4,861\t2,905',
    ...:        '1992\t \t7,091\t4,520\t2,571', '1991\t \t6,953\t4,526\t2,427',
    ...:        '1990\t \t6,632\t4,509\t2,123', '1989\t \t5,929\t4,011\t1,918',
    ...:        '1988\t \t5,909\t4,080\t1,829']
In [62]: pat = re.compile(r'[^\s]+')  # one match per run of non-whitespace characters
In [63]: parsed = (pat.findall(i) for i in lst)
In [64]: df = pd.DataFrame({i[0] : i[1:] for i in parsed})  # year -> [data1, data2, data3]
In [65]: df
Out[65]:
1988 1989 1990 1991 1992 1993 1994 1995 1996 ... 2010 2011 2012 2013 2014 2015 2016 2017 2018
0 5,909 5,929 6,632 6,953 7,091 7,766 8,203 8,859 9,222 ... 14,505 14,149 14,010 14,259 14,415 15,071 15,944 16,478 7,107
1 4,080 4,011 4,509 4,526 4,520 4,861 5,099 5,363 5,418 ... 7,943 8,126 8,143 8,269 8,596 9,079 9,971 10,286 4,394
2 1,829 1,918 2,123 2,427 2,571 2,905 3,104 3,496 3,804 ... 6,562 6,023 5,867 5,990 5,819 5,992 5,973 6,192 2,713
[3 rows x 31 columns]
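Note that the frame above uses the years as column labels, and that a dict can hold each year only once, so a repeated year would replace the earlier entry. A minimal sketch of an alternative, assuming the desired layout is one row per input line with the column names from the question: pass the parsed rows straight to the DataFrame constructor. The three-item sample list is a shortened excerpt of the question's data, chosen here only for illustration.

```python
import re
import pandas as pd

# Shortened sample taken from the question's list; note the repeated year 2018.
sample = ['2018\t \t7,107\t4,394\t2,713',
          '2017\t \t16,478\t10,286\t6,192',
          '2018\t \t10,362\t5,793\t4,569']

pat = re.compile(r'\S+')                  # same meaning as [^\s]+
rows = [pat.findall(line) for line in sample]

# One row per input line, so repeated years are kept rather than overwritten.
df = pd.DataFrame(rows, columns=['year', 'data1', 'data2', 'data3'])
print(df)
```

The values are still strings at this point; only the layout changes.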
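Another approach without regular expressions (though not as neat as the regex one): clean the data with a list comprehension, then put it into a dict and create the DataFrame from that: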
data = ['2018\t \t7,107\t4,394\t2,713', '2017\t \t16,478\t10,286\t6,192',
'2016\t \t15,944\t9,971\t5,973', '2015\t \t15,071\t9,079\t5,992',
'2014\t \t14,415\t8,596\t5,819', '2013\t \t14,259\t8,269\t5,990',
'2012\t \t14,010\t8,143\t5,867', '2011\t \t14,149\t8,126\t6,023',
'2010\t \t14,505\t7,943\t6,562', '2009\t \t14,632\t8,022\t6,610',
'2008\t \t14,207\t7,989\t6,218', '2007\t \t14,400\t8,085\t6,315',
'2006\t \t14,750\t8,017\t6,733', '2005\t \t14,497\t7,593\t6,904',
'2004\t \t14,155\t7,150\t7,005', '2003\t \t13,285\t6,457\t6,828',
'2002\t \t12,821\t6,190\t6,631', '2001\t \t12,702\t6,080\t6,622',
'2000\t \t11,942\t5,985\t5,957', '1999\t \t10,872\t5,824\t5,048',
'1998\t \t10,362\t5,793\t4,569', '1997\t \t9,546\t5,479\t4,067',
'1996\t \t9,222\t5,418\t3,804', '1995\t \t8,859\t5,363\t3,496',
'1994\t \t8,203\t5,099\t3,104', '1993\t \t7,766\t4,861\t2,905',
'1992\t \t7,091\t4,520\t2,571', '1991\t \t6,953\t4,526\t2,427',
'1990\t \t6,632\t4,509\t2,123', '1989\t \t5,929\t4,011\t1,918',
'1988\t \t5,909\t4,080\t1,829']
# partition and clean the data
cleaned = [ [x.strip() for x in year.split("\t") if x.strip()] for year in data ]
# make a dict
dataCleaned = {x:y for x,*y in cleaned}
print (dataCleaned)
import pandas as pd
df = pd.DataFrame(dataCleaned)
print(df)
Output:
# the dict
{'2018': ['7,107', '4,394', '2,713'], '2017': ['16,478', '10,286', '6,192'],
'2016': ['15,944', '9,971', '5,973'], '2015': ['15,071', '9,079', '5,992'],
'2014': ['14,415', '8,596', '5,819'], '2013': ['14,259', '8,269', '5,990'],
'2012': ['14,010', '8,143', '5,867'], '2011': ['14,149', '8,126', '6,023'],
'2010': ['14,505', '7,943', '6,562'], '2009': ['14,632', '8,022', '6,610'],
'2008': ['14,207', '7,989', '6,218'], '2007': ['14,400', '8,085', '6,315'],
'2006': ['14,750', '8,017', '6,733'], '2005': ['14,497', '7,593', '6,904'],
'2004': ['14,155', '7,150', '7,005'], '2003': ['13,285', '6,457', '6,828'],
'2002': ['12,821', '6,190', '6,631'], '2001': ['12,702', '6,080', '6,622'],
'2000': ['11,942', '5,985', '5,957'], '1999': ['10,872', '5,824', '5,048'],
'1998': ['10,362', '5,793', '4,569'], '1997': ['9,546', '5,479', '4,067'],
'1996': ['9,222', '5,418', '3,804'], '1995': ['8,859', '5,363', '3,496'],
'1994': ['8,203', '5,099', '3,104'], '1993': ['7,766', '4,861', '2,905'],
'1992': ['7,091', '4,520', '2,571'], '1991': ['6,953', '4,526', '2,427'],
'1990': ['6,632', '4,509', '2,123'], '1989': ['5,929', '4,011', '1,918'],
'1988': ['5,909', '4,080', '1,829']
}
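One caveat with the dict step: keys are unique, so if a year occurs more than once (as the comments below report for the real data), later rows replace earlier ones. A small illustration on a shortened sample with a repeated year; the variable names are made up for this sketch:

```python
# Shortened excerpt: year 2018 appears twice, as in the asker's real data.
sample = ['2018\t \t7,107\t4,394\t2,713',
          '2017\t \t16,478\t10,286\t6,192',
          '2018\t \t10,362\t5,793\t4,569']

cleaned = [[x.strip() for x in line.split('\t') if x.strip()] for line in sample]

# A dict can hold each key only once; the second 2018 row replaces the first.
by_year = {year: values for year, *values in cleaned}

print(len(cleaned))       # number of parsed rows
print(len(by_year))       # number of distinct years left in the dict
print(by_year['2018'])    # values from the last 2018 row
```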
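After the edit (the real data repeats years, so the rows go into the DataFrame directly with fixed column names instead of through a dict):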
import pandas as pd
data = ['2018\t \t7,107\t4,394\t2,713', '2017\t \t16,478\t10,286\t6,192',
'2016\t \t15,944\t9,971\t5,973', '2015\t \t15,071\t9,079\t5,992',
'2014\t \t14,415\t8,596\t5,819', '2013\t \t14,259\t8,269\t5,990',
'2012\t \t14,010\t8,143\t5,867', '2011\t \t14,149\t8,126\t6,023',
'2010\t \t14,505\t7,943\t6,562', '2009\t \t14,632\t8,022\t6,610',
'2008\t \t14,207\t7,989\t6,218', '2007\t \t14,400\t8,085\t6,315',
'2006\t \t14,750\t8,017\t6,733', '2005\t \t14,497\t7,593\t6,904',
'2004\t \t14,155\t7,150\t7,005', '2003\t \t13,285\t6,457\t6,828',
'2002\t \t12,821\t6,190\t6,631', '2001\t \t12,702\t6,080\t6,622',
'2000\t \t11,942\t5,985\t5,957', '1999\t \t10,872\t5,824\t5,048',
'2018\t \t10,362\t5,793\t4,569', '2017\t \t9,546\t5,479\t4,067',
'2016\t \t9,222\t5,418\t3,804', '2015\t \t8,859\t5,363\t3,496',
'2014\t \t8,203\t5,099\t3,104', '2013\t \t7,766\t4,861\t2,905',
'2012\t \t7,091\t4,520\t2,571', '2011\t \t6,953\t4,526\t2,427',
'2010\t \t6,632\t4,509\t2,123', '2009\t \t5,929\t4,011\t1,918',
'2008\t \t5,909\t4,080\t1,829']
# partition and clean the data
cleaned = [[x.strip() for x in year.split("\t") if x.strip()] for year in data]
df = pd.DataFrame(cleaned, columns=['year', 'data1', 'data2', 'data3'])
print(df)
Output after the edit:
year data1 data2 data3
0 2018 7,107 4,394 2,713
1 2017 16,478 10,286 6,192
2 2016 15,944 9,971 5,973
3 2015 15,071 9,079 5,992
4 2014 14,415 8,596 5,819
5 2013 14,259 8,269 5,990
6 2012 14,010 8,143 5,867
7 2011 14,149 8,126 6,023
8 2010 14,505 7,943 6,562
9 2009 14,632 8,022 6,610
10 2008 14,207 7,989 6,218
11 2007 14,400 8,085 6,315
12 2006 14,750 8,017 6,733
13 2005 14,497 7,593 6,904
14 2004 14,155 7,150 7,005
15 2003 13,285 6,457 6,828
16 2002 12,821 6,190 6,631
17 2001 12,702 6,080 6,622
18 2000 11,942 5,985 5,957
19 1999 10,872 5,824 5,048
20 2018 10,362 5,793 4,569
21 2017 9,546 5,479 4,067
22 2016 9,222 5,418 3,804
23 2015 8,859 5,363 3,496
24 2014 8,203 5,099 3,104
25 2013 7,766 4,861 2,905
26 2012 7,091 4,520 2,571
27 2011 6,953 4,526 2,427
28 2010 6,632 4,509 2,123
29 2009 5,929 4,011 1,918
30 2008 5,909 4,080 1,829
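The columns above still hold strings such as '7,107', so arithmetic on them would not work yet. A possible follow-up, assuming the commas are thousands separators and every value is a whole number (the two-row frame is a stand-in for the full one):

```python
import pandas as pd

# Two rows in the same shape as the frame printed above.
df = pd.DataFrame([['2018', '7,107', '4,394', '2,713'],
                   ['2017', '16,478', '10,286', '6,192']],
                  columns=['year', 'data1', 'data2', 'data3'])

# Drop the thousands separators, then convert every column to integers.
for col in df.columns:
    df[col] = df[col].str.replace(',', '', regex=False).astype(int)

print(df.dtypes)
print(df['data1'].sum())
```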
Edit: the list comprehension is roughly equivalent to the following:
alsoCleaned = []
for year in data:
    part = []                        # collect all parts of one string
    for x in year.split("\t"):       # split the one string
        partCleaned = x.strip()      # remove whitespace from x
        if partCleaned:              # only if there is content left
            part.append(partCleaned) # add to part
    alsoCleaned.append(part)         # all parts done, so add to the big list
print(alsoCleaned)
Comments:
DataFrame... as in pandas.DataFrame? Why write it to a file? Each item has 4 values ('2018\t\t7107\t4394\t2713'), so why should the data be year, data1, data2 (3 columns)?
I want to put it into a DataFrame in the end. I also tried writing it to a file and then reading it into a DataFrame, but that failed. There are four columns: year, data1, data2, data3.
The regex splits each item of ['2018\t\t7107\t4394\t2713', …] into ['2018', '7107', '4394', '2713']. Is your dict comprehension updated dynamically while it feeds the DataFrame? I have several groups of 1988-2018, but the code keeps only the last 1988-2018 group and loses the others. I understand how all the numbers are parsed. I do not understand the last line of the code, "df = pd.DataFrame({i[0] : i[1:] for i in parsed})". Maybe that is what makes the code go wrong.
@zilong, what do you mean by several groups of 1988-2018 (be specific about the data)? You should first read up on dict comprehensions before considering whether something makes the code go wrong.
It means panel data. When new data comes in, the value stored in the dictionary may keep being replaced, so in the end only the last time series remains. Yes, my fault, I should have put a short panel into the data sample in the question.
My real data is long and contains several groups of 1988-2018 data; the code ends up with only one 1988-2018 group and loses the rest.
@zilong why did you prepare and post data in your question that does not match your real data? Edited - see the other answer for fixed column names.
Sorry. My data is too long, so I had to copy only part of it here to shorten the question. However, I did not keep the sample representative in the first place. Thank you very much. Problem solved.
I still cannot understand this important line of code. I know strip() removes whitespace and split() splits on \t. What do "for year in data" and "for x in year" mean? Or what should I study to understand this (maybe some web page or resource)? cleaned = [[x.strip() for x in year.split("\t") if x.strip()] for year in data]
@zilong - this is called a list comprehension. You have to read it backwards. data is a list. "for year in data" means year takes each value of data in turn. "[x.strip() for x in year.split("\t") if x.strip()]" means: split year at "\t" (which gives another list), and "for x in year.split("\t")" means x takes each of the split parts in turn. x.strip() removes the whitespace, and "if x.strip()" keeps only those x that are not empty after stripping (this removes the '\t \t' whitespace from the result).
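Regarding the remark in the comments that reading the file directly into a DataFrame failed: a sketch of one way this might work with pandas.read_csv, assuming the file contains only data lines in the format shown in the question. The in-memory text below is a stand-in for the real file, whose name is not given in the thread.

```python
import io
import pandas as pd

# Stand-in for the text file: two lines in the format shown in the question.
text = "2018\t \t7,107\t4,394\t2,713\n2017\t \t16,478\t10,286\t6,192\n"

df = pd.read_csv(
    io.StringIO(text),      # replace with the path of the real file
    sep=r"\s+",             # any run of tabs or spaces separates fields
    header=None,            # the data lines carry no header row
    names=["year", "data1", "data2", "data3"],
    thousands=",",          # read '7,107' as the integer 7107
)
print(df)
```

With thousands="," the value columns arrive as integers, so no separate cleaning step should be needed.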