Python: convert a list of data into a DataFrame


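I extracted the data section of a txt file and stored it in a list. The data should be year, data1, data2, data3. In the original txt file the values are separated by \t\t or \t, because I appended the data lines directly. Now I want to put this into a DataFrame for processing. The dataframe has three columns: year, data1 and data2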

['2018\t  \t7,107\t4,394\t2,713', '2017\t \t16,478\t10,286\t6,192', '2016\t  \t15,944\t9,971\t5,973', '2015\t \t15,071\t9,079\t5,992', '2014\t  \t14,415\t8,596\t5,819', '2013\t \t14,259\t8,269\t5,990', '2012\t  \t14,010\t8,143\t5,867', '2011\t \t14,149\t8,126\t6,023', '2010\t  \t14,505\t7,943\t6,562', '2009\t \t14,632\t8,022\t6,610', '2008\t  \t14,207\t7,989\t6,218', '2007\t \t14,400\t8,085\t6,315', '2006\t  \t14,750\t8,017\t6,733', '2005\t \t14,497\t7,593\t6,904', '2004\t  \t14,155\t7,150\t7,005', '2003\t \t13,285\t6,457\t6,828', '2002\t  \t12,821\t6,190\t6,631', '2001\t \t12,702\t6,080\t6,622', '2000\t  \t11,942\t5,985\t5,957', '1999\t \t10,872\t5,824\t5,048', '2018\t   \t10,362\t5,793\t4,569', '2017\t \t9,546\t5,479\t4,067', '2016\t  \t9,222\t5,418\t3,804', '2015\t \t8,859\t5,363\t3,496', '2014\t  \t8,203\t5,099\t3,104', '2013\t \t7,766\t4,861\t2,905', '2012\t  \t7,091\t4,520\t2,571', '2011\t \t6,953\t4,526\t2,427', '2010\t  \t6,632\t4,509\t2,123', '2009\t \t5,929\t4,011\t1,918', '2008\t  \t5,909\t4,080\t1,829']
In the end I would like a dataframe with the column names year, data1, data2, data3.


Thanks.

With the re module and a generator expression:

Assuming we have data for each year:

In [60]: import re
    ...: import pandas as pd

In [61]: lst = ['2018\t  \t7,107\t4,394\t2,713', '2017\t \t16,478\t10,286\t6,192',
    ...:        '2016\t  \t15,944\t9,971\t5,973', '2015\t \t15,071\t9,079\t5,992',
    ...:        '2014\t  \t14,415\t8,596\t5,819', '2013\t \t14,259\t8,269\t5,990',
    ...:        '2012\t  \t14,010\t8,143\t5,867', '2011\t \t14,149\t8,126\t6,023',
    ...:        '2010\t  \t14,505\t7,943\t6,562', '2009\t \t14,632\t8,022\t6,610',
    ...:        '2008\t  \t14,207\t7,989\t6,218', '2007\t \t14,400\t8,085\t6,315',
    ...:        '2006\t  \t14,750\t8,017\t6,733', '2005\t \t14,497\t7,593\t6,904',
    ...:        '2004\t  \t14,155\t7,150\t7,005', '2003\t \t13,285\t6,457\t6,828',
    ...:        '2002\t  \t12,821\t6,190\t6,631', '2001\t \t12,702\t6,080\t6,622',
    ...:        '2000\t  \t11,942\t5,985\t5,957', '1999\t \t10,872\t5,824\t5,048',
    ...:        '1998\t   \t10,362\t5,793\t4,569', '1997\t \t9,546\t5,479\t4,067',
    ...:        '1996\t  \t9,222\t5,418\t3,804', '1995\t \t8,859\t5,363\t3,496',
    ...:        '1994\t  \t8,203\t5,099\t3,104', '1993\t \t7,766\t4,861\t2,905',
    ...:        '1992\t  \t7,091\t4,520\t2,571', '1991\t \t6,953\t4,526\t2,427',
    ...:        '1990\t  \t6,632\t4,509\t2,123', '1989\t \t5,929\t4,011\t1,918',
    ...:        '1988\t  \t5,909\t4,080\t1,829']

In [62]: pat = re.compile(r'[^\s]+')

In [63]: parsed = (pat.findall(i) for i in lst)

In [64]: df = pd.DataFrame({i[0] : i[1:] for i in parsed})

In [65]: df
Out[65]: 
    1988   1989   1990   1991   1992   1993   1994   1995   1996  ...      2010    2011    2012    2013    2014    2015    2016    2017   2018
0  5,909  5,929  6,632  6,953  7,091  7,766  8,203  8,859  9,222  ...    14,505  14,149  14,010  14,259  14,415  15,071  15,944  16,478  7,107
1  4,080  4,011  4,509  4,526  4,520  4,861  5,099  5,363  5,418  ...     7,943   8,126   8,143   8,269   8,596   9,079   9,971  10,286  4,394
2  1,829  1,918  2,123  2,427  2,571  2,905  3,104  3,496  3,804  ...     6,562   6,023   5,867   5,990   5,819   5,992   5,973   6,192  2,713

[3 rows x 31 columns]
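
One caveat with the dict comprehension above: a dict can hold each key only once, so if the same year occurs in more than one string (as it does in the list posted in the question), later entries overwrite earlier ones. The sketch below illustrates this with three strings taken from the question and shows a row-based variant that keeps every entry. The names sample, rows, by_year and df_rows are illustrative and not part of the original answer.

```python
import re

import pandas as pd

# Three strings taken from the question's list; note that 2018 occurs twice.
sample = [
    '2018\t  \t7,107\t4,394\t2,713',
    '2017\t \t16,478\t10,286\t6,192',
    '2018\t   \t10,362\t5,793\t4,569',
]

pat = re.compile(r'\S+')  # same meaning as [^\s]+
rows = [pat.findall(s) for s in sample]

# Keyed by year: the second 2018 entry replaces the first one.
by_year = {r[0]: r[1:] for r in rows}
print(len(by_year))  # fewer keys than input strings

# One row per input string: nothing is lost.
df_rows = pd.DataFrame(rows, columns=['year', 'data1', 'data2', 'data3'])
print(df_rows)
```

Building the frame from a list of rows rather than a dict avoids the overwrite, at the cost of having the years in an ordinary column instead of as column labels.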

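Another way without regular expressions (though not as neat as the regex version): clean the data with a list comprehension, then put it into a dict and create the DataFrame from that: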

data =  ['2018\t  \t7,107\t4,394\t2,713',              '2017\t \t16,478\t10,286\t6,192', 
         '2016\t  \t15,944\t9,971\t5,973',             '2015\t \t15,071\t9,079\t5,992', 
         '2014\t  \t14,415\t8,596\t5,819',             '2013\t \t14,259\t8,269\t5,990', 
         '2012\t  \t14,010\t8,143\t5,867',             '2011\t \t14,149\t8,126\t6,023', 
         '2010\t  \t14,505\t7,943\t6,562',             '2009\t \t14,632\t8,022\t6,610', 
         '2008\t  \t14,207\t7,989\t6,218',             '2007\t \t14,400\t8,085\t6,315', 
         '2006\t  \t14,750\t8,017\t6,733',             '2005\t \t14,497\t7,593\t6,904', 
         '2004\t  \t14,155\t7,150\t7,005',             '2003\t \t13,285\t6,457\t6,828', 
         '2002\t  \t12,821\t6,190\t6,631',             '2001\t \t12,702\t6,080\t6,622', 
         '2000\t  \t11,942\t5,985\t5,957',             '1999\t \t10,872\t5,824\t5,048', 
         '1998\t   \t10,362\t5,793\t4,569',            '1997\t \t9,546\t5,479\t4,067', 
         '1996\t  \t9,222\t5,418\t3,804',              '1995\t \t8,859\t5,363\t3,496', 
         '1994\t  \t8,203\t5,099\t3,104',              '1993\t \t7,766\t4,861\t2,905', 
         '1992\t  \t7,091\t4,520\t2,571',              '1991\t \t6,953\t4,526\t2,427', 
         '1990\t  \t6,632\t4,509\t2,123',              '1989\t \t5,929\t4,011\t1,918', 
         '1988\t  \t5,909\t4,080\t1,829']

# partition and clean the data
cleaned = [ [x.strip() for x in year.split("\t") if x.strip()] for year in data  ]
# make a dict
dataCleaned = {x:y for x,*y in cleaned}

print (dataCleaned)

import pandas as pd
df = pd.DataFrame(dataCleaned)

print(df)
Output:

# the dict 
{'2018': ['7,107', '4,394', '2,713'], '2017': ['16,478', '10,286', '6,192'], 
 '2016': ['15,944', '9,971', '5,973'], '2015': ['15,071', '9,079', '5,992'], 
 '2014': ['14,415', '8,596', '5,819'], '2013': ['14,259', '8,269', '5,990'], 
 '2012': ['14,010', '8,143', '5,867'], '2011': ['14,149', '8,126', '6,023'], 
 '2010': ['14,505', '7,943', '6,562'], '2009': ['14,632', '8,022', '6,610'], 
 '2008': ['14,207', '7,989', '6,218'], '2007': ['14,400', '8,085', '6,315'], 
 '2006': ['14,750', '8,017', '6,733'], '2005': ['14,497', '7,593', '6,904'], 
 '2004': ['14,155', '7,150', '7,005'], '2003': ['13,285', '6,457', '6,828'], 
 '2002': ['12,821', '6,190', '6,631'], '2001': ['12,702', '6,080', '6,622'], 
 '2000': ['11,942', '5,985', '5,957'], '1999': ['10,872', '5,824', '5,048'], 
 '1998': ['10,362', '5,793', '4,569'], '1997': ['9,546', '5,479', '4,067'], 
 '1996': ['9,222', '5,418', '3,804'], '1995': ['8,859', '5,363', '3,496'], 
 '1994': ['8,203', '5,099', '3,104'], '1993': ['7,766', '4,861', '2,905'], 
 '1992': ['7,091', '4,520', '2,571'], '1991': ['6,953', '4,526', '2,427'], 
 '1990': ['6,632', '4,509', '2,123'], '1989': ['5,929', '4,011', '1,918'], 
 '1988': ['5,909', '4,080', '1,829']
}
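
A DataFrame built directly from this dict has one column per year. If one row per year is wanted instead, a possible sketch is to use pandas' DataFrame.from_dict with orient='index'. This assumes every year occurs only once; the names small and df_years are illustrative.

```python
import pandas as pd

# A shortened version of the dict printed above.
small = {
    '2018': ['7,107', '4,394', '2,713'],
    '2017': ['16,478', '10,286', '6,192'],
}

# orient='index' turns each dict key into a row label instead of a column.
df_years = pd.DataFrame.from_dict(
    small, orient='index', columns=['data1', 'data2', 'data3']
)

# Move the year from the index into an ordinary column named 'year'.
df_years = df_years.rename_axis('year').reset_index()
print(df_years)
```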


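After the edit (the real data contains repeated years, so each string becomes its own row with fixed column names):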

import pandas as pd

data = ['2018\t  \t7,107\t4,394\t2,713', '2017\t \t16,478\t10,286\t6,192', 
        '2016\t  \t15,944\t9,971\t5,973', '2015\t \t15,071\t9,079\t5,992', 
        '2014\t  \t14,415\t8,596\t5,819', '2013\t \t14,259\t8,269\t5,990', 
        '2012\t  \t14,010\t8,143\t5,867', '2011\t \t14,149\t8,126\t6,023', 
        '2010\t  \t14,505\t7,943\t6,562', '2009\t \t14,632\t8,022\t6,610', 
        '2008\t  \t14,207\t7,989\t6,218', '2007\t \t14,400\t8,085\t6,315', 
        '2006\t  \t14,750\t8,017\t6,733', '2005\t \t14,497\t7,593\t6,904', 
        '2004\t  \t14,155\t7,150\t7,005', '2003\t \t13,285\t6,457\t6,828', 
        '2002\t  \t12,821\t6,190\t6,631', '2001\t \t12,702\t6,080\t6,622', 
        '2000\t  \t11,942\t5,985\t5,957', '1999\t \t10,872\t5,824\t5,048', 
        '2018\t   \t10,362\t5,793\t4,569', '2017\t \t9,546\t5,479\t4,067', 
        '2016\t  \t9,222\t5,418\t3,804', '2015\t \t8,859\t5,363\t3,496', 
        '2014\t  \t8,203\t5,099\t3,104', '2013\t \t7,766\t4,861\t2,905', 
        '2012\t  \t7,091\t4,520\t2,571', '2011\t \t6,953\t4,526\t2,427', 
        '2010\t  \t6,632\t4,509\t2,123', '2009\t \t5,929\t4,011\t1,918', 
        '2008\t  \t5,909\t4,080\t1,829']

# partition and clean the data
cleaned = [[x.strip() for x in year.split("\t") if x.strip()] for year in data]

# one row per input string, with fixed column names
df = pd.DataFrame(cleaned, columns=['year', 'data1', 'data2', 'data3'])

print(df)
Output after the edit:

    year   data1   data2  data3
0   2018   7,107   4,394  2,713
1   2017  16,478  10,286  6,192
2   2016  15,944   9,971  5,973
3   2015  15,071   9,079  5,992
4   2014  14,415   8,596  5,819
5   2013  14,259   8,269  5,990
6   2012  14,010   8,143  5,867
7   2011  14,149   8,126  6,023
8   2010  14,505   7,943  6,562
9   2009  14,632   8,022  6,610
10  2008  14,207   7,989  6,218
11  2007  14,400   8,085  6,315
12  2006  14,750   8,017  6,733
13  2005  14,497   7,593  6,904
14  2004  14,155   7,150  7,005
15  2003  13,285   6,457  6,828
16  2002  12,821   6,190  6,631
17  2001  12,702   6,080  6,622
18  2000  11,942   5,985  5,957
19  1999  10,872   5,824  5,048
20  2018  10,362   5,793  4,569
21  2017   9,546   5,479  4,067
22  2016   9,222   5,418  3,804
23  2015   8,859   5,363  3,496
24  2014   8,203   5,099  3,104
25  2013   7,766   4,861  2,905
26  2012   7,091   4,520  2,571
27  2011   6,953   4,526  2,427
28  2010   6,632   4,509  2,123
29  2009   5,929   4,011  1,918
30  2008   5,909   4,080  1,829 
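
The values in this frame are still strings such as '7,107', so arithmetic on them would not behave as expected. Below is a minimal sketch of one way to convert them, assuming every value is a whole number that uses ',' as the thousands separator. The two-row df is a shortened stand-in for the full frame.

```python
import pandas as pd

# Shortened stand-in for the frame shown above.
df = pd.DataFrame(
    [['2018', '7,107', '4,394', '2,713'],
     ['2017', '16,478', '10,286', '6,192']],
    columns=['year', 'data1', 'data2', 'data3'],
)

# Drop the thousands separators, then convert each column to integers.
for col in ['data1', 'data2', 'data3']:
    df[col] = df[col].str.replace(',', '', regex=False).astype(int)
df['year'] = df['year'].astype(int)

print(df.dtypes)
print(df['data1'].sum())
```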

Edit:

The list comprehension is roughly equivalent to:

alsoCleaned = []
for year in data:
    part = []    # collect all parts of one string
    for x in year.split("\t"):  # split the one string
        partCleaned = x.strip()   # remove whitespaces from x
        if partCleaned :          # only if now got content
            part.append(partCleaned) # add to part
    alsoCleaned.append(part)    # done all parts  so add to big list
    part = []

print(alsoCleaned)





Comments:

DataFrame... as in pandas.DataFrame? Why write it to a file? Each item holds 4 values ('2018\t\t7107\t4394\t2713'), so why should the data be year, data1, data2 (3 columns)?

I want to end up with it in a DataFrame. I also tried writing it to a file and then reading that into a DataFrame, but it failed. There are four columns: year, data1, data2, data3.

So the regex partitions each ['2018\t\t7107\t4394\t2713', …] into ['2018', '7107', '4394', '2713', …], and your dict comprehension is evaluated on the fly when it is passed to the DataFrame?

I have several groups covering 1988-2018, but in the end the code kept only the last 1988-2018 group and the other groups were lost. I understand the part that parses all the numbers. I do not understand the last line of the code, "df = pd.DataFrame({i[0] : i[1:] for i in parsed})". Maybe that is what makes the code go wrong.

@zilong, what do you mean by several groups of 1988-2018 data (be specific about the data)? You should first read up on dict comprehensions before deciding that something makes the code go wrong.

I mean panel data. When new data comes in, the dictionary's value for a year keeps being replaced, so only the last time series is kept in the end. Yes, my fault, I should have put a short panel into the data sample in the question.

My real data is long and contains several groups of 1988-2018 data; the code ends up with only one 1988-2018 group and loses the rest.

@zilong Why did you prepare and post data in your question that does not match your real data? Edited - see the other answer for fixed column names.

Sorry. My data was too long, so I had to copy only part of it here to keep the question short. However, I did not keep the sample representative in the first place. Thank you very much. Problem solved.

I still cannot understand this important line of code. I know strip() removes whitespace and split() splits on \t. What do "for year in data" and "for x in year" mean? Or what should I study to understand this (maybe some web page or resource)? cleaned = [[x.strip() for x in year.split("\t") if x.strip()] for year in data]

@zilong - This is called a list comprehension. You have to read it backwards. data is a list. "for year in data" means year takes each value of data in turn. [x.strip() for x in year.split("\t") if x.strip()] means: split year at "\t" (giving another list), and "for x in year.split("\t")" means x takes the value of each split part in turn. x.strip() removes the surrounding whitespace, and "if x.strip()" keeps only those x that are not empty after stripping (this drops the whitespace left over from '\t\t' from the result).
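
To make the explanation of the list comprehension concrete, the sketch below applies each step to a single string from the question and prints the intermediate results. The variable names parts, stripped and kept are illustrative.

```python
year = '2018\t  \t7,107\t4,394\t2,713'

# Step 1: split on tab characters; the spaces between two tabs survive as a part.
parts = year.split("\t")
print(parts)     # ['2018', '  ', '7,107', '4,394', '2,713']

# Step 2: strip whitespace from every part; whitespace-only parts become ''.
stripped = [x.strip() for x in parts]
print(stripped)  # ['2018', '', '7,107', '4,394', '2,713']

# Step 3: keep only the parts that are non-empty after stripping.
kept = [x.strip() for x in parts if x.strip()]
print(kept)      # ['2018', '7,107', '4,394', '2,713']
```

Wrapping step 3 in an outer "for year in data" loop gives one such cleaned list per input string, which is exactly what the nested comprehension produces.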