Python: using a lambda function to convert dates to timestamps when reading a file

python csv pandas lambda timestamp

I am reading a CSV file that contains dates in the following format:

date
01/05/2014
01/05/2014
01/05/2014
01/05/2014
01/05/2014
01/05/2014
01/05/2014
01/05/2014
01/05/2014

I can't use the dates in this string format; I need to convert them to numeric timestamps.

So I wrote this code:

Train = pd.read_csv("train.tsv", sep='\t') 
Train['timestamp'] = pd.to_datetime(Train['date']).apply(lambda a: a.timestamp())

This gave me:

Train['timestamp'] = pd.to_datetime(Train['date']).apply(lambda a: a.timestamp())
AttributeError: 'Timestamp' object has no attribute 'timestamp'

How can I get the timestamp inside the lambda?

Edited code:

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

Train = pd.read_csv("data_scientist_assignment.tsv", sep='\t', parse_dates=['date'])
# print(Train.head())
# Train['timestamp'] = pd.to_datetime(Train['date']).apply(lambda a: a.timestamp())
# datetime64[ns] values cast to int64 are nanoseconds since the Unix epoch
Train['timestamp'] = Train.date.values.astype(np.int64)
x1 = ["timestamp", "hr_of_day"]
test = pd.read_csv("test.csv")
print(Train.columns)
print(test.columns)
model = LogisticRegression()
model.fit(Train[x1], Train["vals"])
print(model)
print(model.score(Train[x1], Train["vals"]))

You need to add the parse_dates parameter to read_csv, with the name of the column to convert to datetime:

import pandas as pd
import io

temp=u"""date
01/05/2014
01/05/2014
01/05/2014
01/05/2014
01/05/2014
01/05/2014
01/05/2014
01/05/2014
01/05/2014"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep='\t', parse_dates=['date'])

print (df)
        date
0 2014-01-05
1 2014-01-05
2 2014-01-05
3 2014-01-05
4 2014-01-05
5 2014-01-05
6 2014-01-05
7 2014-01-05
8 2014-01-05

print (df.dtypes)
date    datetime64[ns]
dtype: object
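The sample dates are ambiguous: 01/05/2014 could mean January 5 or May 1. The output above shows pandas reading them month first. As a minimal sketch, assuming the file really is month/day/year, the format can be spelled out with pd.to_datetime so the result does not depend on inference. The three-row sample is a shortened copy of the data above.

```python
import io

import pandas as pd

# Shortened copy of the sample data from the answer.
temp = u"""date
01/05/2014
01/05/2014
01/05/2014"""

# Read the column as plain strings first.
df = pd.read_csv(io.StringIO(temp), sep='\t')

# Spell out month/day/year (an assumption about the file) instead of relying on inference.
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

print(df['date'].iloc[0])
```

If the file were day first, the format string would be '%d/%m/%Y' instead.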

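Another solution is to pass the position of the date column instead of its name. In the sample it is the first column, so use index 0 in parse_dates (Python counts from 0).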
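A minimal sketch of the positional form, assuming the date column is the first one in the file. The two-row sample is a shortened copy of the data above.

```python
import io

import pandas as pd

# Shortened copy of the sample data from the answer.
temp = u"""date
01/05/2014
01/05/2014"""

# parse_dates=[0] selects the first column by position rather than by name.
df = pd.read_csv(io.StringIO(temp), sep='\t', parse_dates=[0])

print(df.dtypes)
```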

Then convert the column to a NumPy array with values and cast it to int:

#unix time in ns
df.date = df.date.values.astype(np.int64)
print (df)
                  date
0  1388880000000000000
1  1388880000000000000
2  1388880000000000000
3  1388880000000000000
4  1388880000000000000
5  1388880000000000000
6  1388880000000000000
7  1388880000000000000
8  1388880000000000000

#unix time in us
df.date = df.date.values.astype(np.int64) // 1000
print (df)
               date
0  1388880000000000
1  1388880000000000
2  1388880000000000
3  1388880000000000
4  1388880000000000
5  1388880000000000
6  1388880000000000
7  1388880000000000
8  1388880000000000

Another simple way is to use pd.to_datetime.

In addition, you can get the number of seconds like this:

In [232]: df['date'].astype(np.int64) // 10**9  # pd.np was removed in pandas 2.0; needs import numpy as np
Out[232]: 
0    1388880000
1    1388880000
2    1388880000
3    1388880000
4    1388880000
5    1388880000
6    1388880000
7    1388880000
8    1388880000
Name: date, dtype: int64

Thanks, but my next operation still fails with:

'Timestamp' object has no attribute 'timestamp'

Yes, I added a solution, please check it.

#unix time in ms
df.date = df.date.values.astype(np.int64) // 1000000
#df.date = pd.to_datetime(df.date, unit='ms')
print (df)
            date
0  1388880000000
1  1388880000000
2  1388880000000
3  1388880000000
4  1388880000000
5  1388880000000
6  1388880000000
7  1388880000000
8  1388880000000

#unix time in s
df.date = df.date.values.astype(np.int64) // 1000000000
print (df)
         date
0  1388880000
1  1388880000
2  1388880000
3  1388880000
4  1388880000
5  1388880000
6  1388880000
7  1388880000
8  1388880000
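The integer casts above rely on the column being stored as datetime64[ns], so the divisor has to match that resolution. As a hedged alternative, the sketch below subtracts the Unix epoch and floor-divides by a Timedelta, which does not depend on the stored resolution. The helper name to_epoch is hypothetical and not part of pandas.

```python
import pandas as pd


def to_epoch(dates, unit='s'):
    """Return whole `unit`s elapsed since 1970-01-01 (hypothetical helper)."""
    epoch = pd.Timestamp('1970-01-01')
    # Timedelta floor division yields integers in the requested unit.
    return (dates - epoch) // pd.Timedelta(1, unit=unit)


# Two rows of the sample data, parsed as month/day/year.
dates = pd.to_datetime(pd.Series(['01/05/2014', '01/05/2014']), format='%m/%d/%Y')

seconds = to_epoch(dates, 's')
millis = to_epoch(dates, 'ms')

print(seconds.tolist())
print(millis.tolist())
```

For the sample date this gives 1388880000 seconds and 1388880000000 milliseconds, matching the outputs above.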

In [209]: df['date']
Out[209]: 
0    01/05/2014
1    01/05/2014
2    01/05/2014
3    01/05/2014
4    01/05/2014
5    01/05/2014
6    01/05/2014
7    01/05/2014
8    01/05/2014
Name: date, dtype: object

In [210]: df['date'] = pd.to_datetime(df['date'])

In [211]: df['date']
Out[211]: 
0   2014-01-05
1   2014-01-05
2   2014-01-05
3   2014-01-05
4   2014-01-05
5   2014-01-05
6   2014-01-05
7   2014-01-05
8   2014-01-05
Name: date, dtype: datetime64[ns]
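For the lambda form the question asks about, here is a minimal sketch. The thread does not say why the AttributeError occurred. One plausible cause, which is an assumption, is an environment that predates datetime.timestamp() (added in Python 3.3); the asker's code uses the Python 2 print statement. The second lambda avoids that method by using calendar.timegm, which treats the naive value as UTC.

```python
import calendar

import pandas as pd

# Two rows of the sample data, parsed as month/day/year.
dates = pd.to_datetime(pd.Series(['01/05/2014', '01/05/2014']), format='%m/%d/%Y')

# Form from the question: needs a Timestamp that provides .timestamp().
as_float = dates.apply(lambda a: a.timestamp())

# Alternative without .timestamp(): build epoch seconds from the time tuple,
# interpreting the naive datetime as UTC.
as_int = dates.apply(lambda a: calendar.timegm(a.timetuple()))

print(as_float.tolist())
print(as_int.tolist())
```

Both lambdas run once per row, so on large columns the vectorized integer cast shown in the other answer is likely to be faster.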
