Parse DD MM YY HH MM SS columns from a TXT file using Python's pandas

python datetime pandas

Thanks in advance for your time. I have some space-delimited text files in this format:

    29 04 13 18 15 00    7.667
    29 04 13 18 30 00    7.000
    29 04 13 18 45 00    7.000
    29 04 13 19 00 00    7.333
    29 04 13 19 15 00    7.000

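The columns are DD MM YY HH MM SS followed by my result value. I am trying to read the txt file using Python's pandas. I did a fair amount of research before posting this question, so I hope I am not treading on well-trampled ground.

Through trial, error, and research, I came up with: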

    import pandas as pd
    from cStringIO import StringIO

    def parse_all_fields(day_col, month_col, year_col, hour_col, minute_col, second_col):
        day_col = _maybe_cast(day_col)
        month_col = _maybe_cast(month_col)
        year_col = _maybe_cast(year_col)
        hour_col = _maybe_cast(hour_col)
        minute_col = _maybe_cast(minute_col)
        second_col = _maybe_cast(second_col)
        return lib.try_parse_datetime_components(day_col, month_col, year_col, hour_col, minute_col, second_col)

    ## Read the .txt file
    data1 = pd.read_table('0132_3.TXT', sep='\s+', names=['Day','Month','Year','Hour','Min','Sec','Value'])
    data1[:10]

    Out[21]: 

    Day,Month,Year,Hour, Min, Sec, Value
    29 04 13 18 15 00    7.667
    29 04 13 18 30 00    7.000
    29 04 13 18 45 00    7.000
    29 04 13 19 00 00    7.333
    29 04 13 19 15 00    7.000

    data2 = pd.read_table(StringIO(data1), parse_dates={'datetime':['Day','Month','Year','Hour''Min','Sec']}, date_parser=parse_all_fields, dayfirst=True)

    TypeError                                 Traceback (most recent call last)
    in <module>()
    ----> 1 data2 = pd.read_table(StringIO(data1), parse_dates={'datetime':['Day','Month','Year','Hour''Min','Sec']}, date_parser=parse_all_fields, dayfirst=True)

    TypeError: expected read buffer, DataFrame found

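At this point I am stuck. First, the "expected read buffer" error confuses me. Do I need to preprocess the .txt file further to get the dates into a readable format? Note: read_table's built-in date parsing does not work on its own with this date format.

I am a beginner and trying to learn, so apologies if the code is wrong, basic, or confusing. I would be very grateful if anyone could help. Many thanks.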
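On the error itself: `StringIO` wraps a piece of text so that it can be read like a file, whereas `data1` is already a parsed DataFrame, so the call fails before pandas gets to do any parsing. The sketch below reproduces that under a few assumptions: it uses Python 3's `io.StringIO` in place of the Python 2 `cStringIO` from the question (the error message wording differs), and the one-row DataFrame and sample line are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical one-row frame standing in for data1 from the question.
data1 = pd.DataFrame({'Day': ['29'], 'Value': [7.667]})

# StringIO accepts text only, so passing a DataFrame raises TypeError
# before read_table/read_csv is ever called.
try:
    io.StringIO(data1)
    failed = False
except TypeError:
    failed = True

# Passing text works: the buffer behaves like an open file.
buffer = io.StringIO("29 04 13 18 15 00    7.667\n")
parsed = pd.read_csv(buffer, sep=r'\s+', header=None)
print(failed, parsed.shape)
```

In other words, either pass the file name (or raw text) to the reader together with the date options, or convert the columns of the DataFrame you already have.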

I think it would be easier to parse the dates while reading the csv:

In [1]: df = pd.read_csv('0132_3.TXT', header=None, sep='\s+\s', parse_dates=[[0]])

In [2]: df
Out[2]:
                    0      1
0 2013-04-29 00:00:00  7.667
1 2013-04-29 00:00:00  7.000
2 2013-04-29 00:00:00  7.000
3 2013-04-29 00:00:00  7.333
4 2013-04-29 00:00:00  7.000
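The separator `'\s+\s'` is what keeps the six date fields together in column 0: it only matches runs of two or more whitespace characters, so the single spaces inside the date are left alone and only the wide gap before the value splits the line. A minimal sketch with the standard `re` module, using one sample line from the question (pandas applies the regex through its own parser, so this is only an illustration of the pattern):

```python
import re

line = "29 04 13 18 15 00    7.667"

# r'\s+\s' needs at least two whitespace characters to match,
# so only the four-space gap before the value is a split point.
parts = re.split(r'\s+\s', line)
print(parts)
```

This relies on the value column being separated from the date by more than one space, as it is in the sample data.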

Because you are using an unusual date format, you also need to specify a date parser:

In [11]: def date_parser(ss):
             day, month, year, hour, min, sec = ss.split()
             return pd.Timestamp('20%s-%s-%s %s:%s:%s' % (year, month, day, hour, min, sec))

In [12]: df = pd.read_csv('0132_3.TXT', header=None, sep='\s+\s', parse_dates=[[0]], date_parser=date_parser)

In [13]: df
Out[13]:
                    0      1
0 2013-04-29 18:15:00  7.667
1 2013-04-29 18:30:00  7.000
2 2013-04-29 18:45:00  7.000
3 2013-04-29 19:00:00  7.333
4 2013-04-29 19:15:00  7.000
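If you would rather keep the seven separate columns from the question, a possible alternative is to read everything as strings and build the timestamp afterwards with `pd.to_datetime` and an explicit format string. This is only a sketch under stated assumptions: `io.StringIO` with the five sample rows stands in for `0132_3.TXT`, and the names `stamp` and `result` are made up for illustration.

```python
import io

import pandas as pd

# Sample rows copied from the question; StringIO stands in for the file on disk.
raw = """29 04 13 18 15 00    7.667
29 04 13 18 30 00    7.000
29 04 13 18 45 00    7.000
29 04 13 19 00 00    7.333
29 04 13 19 15 00    7.000
"""

cols = ['Day', 'Month', 'Year', 'Hour', 'Min', 'Sec', 'Value']

# dtype=str keeps leading zeros ("04", "00") so the fields can be re-joined as text.
data1 = pd.read_csv(io.StringIO(raw), sep=r'\s+', names=cols, dtype=str)

# Join the six date fields per row, then parse with an explicit DD MM YY HH MM SS format.
stamp = data1[cols[:6]].apply(' '.join, axis=1)
data1['datetime'] = pd.to_datetime(stamp, format='%d %m %y %H %M %S')
data1['Value'] = data1['Value'].astype(float)

result = data1[['datetime', 'Value']]
print(result)
```

Spelling out the format avoids any guessing about day-first versus month-first order, and `%y` reads the two-digit year 13 as 2013.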

Andy, thank you so much. I see what you did, and it works perfectly.
