Vectorized way in Python to format a column of integers as fixed-length strings in pandas and Dask DataFrames
I have a DataFrame:
date time user_id
0 20160921 5947 13079492369730773513
1 20160921 5948 13079492369730773513
2 20160921 235949 13079492369730773513
3 20160921 235950 13079492369730773513
4 20160921 235951 13079492369730773513
I want to format the "time" column as:
date time user_id
0 20160921 005947 13079492369730773513
1 20160921 005948 13079492369730773513
2 20160921 235949 13079492369730773513
3 20160921 235950 13079492369730773513
4 20160921 235951 13079492369730773513
I know the list comprehension way:
df['time'] = ["%06d" % t for t in df['time'].tolist()]
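Is there a vectorized way to do the same thing? And how would I do it for a Dask DataFrame?

Yes, there is a vectorized way to do the same thing. Cast the column to string, then use the string methods on it: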
df.time.astype(str).str.zfill(6)
0 005947
1 005948
2 235949
3 235950
4 235951
Then assign it back:
df.time = df.time.astype(str).str.zfill(6)
This assumes the time values are at most 6 digits long; zfill only pads, so longer strings are left unchanged.
Unfortunately, this is much slower than the list comprehension (roughly 13x in the timings below):
df['time'] = ["%06d" % t for t in df['time'].tolist()]
In [5]: %timeit df.time.astype(str).str.zfill(6)
228 µs ± 4.99 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [6]: %timeit ["%06d" % t for t in df['time'].tolist()]
17.5 µs ± 208 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
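The timings above are on the microsecond scale, which suggests they were taken on a very small frame, so the ratio may not carry over to larger data. Below is a rough sketch for repeating the comparison on a bigger column. The column size, the random data, and the helper names `with_zfill` and `with_listcomp` are assumptions chosen for illustration; no particular outcome is implied.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 100,000 non-negative integers of at most 6 digits.
rng = np.random.default_rng(0)
times = pd.Series(rng.integers(0, 240000, size=100_000))


def with_zfill(s):
    """pandas string-method approach from the answer."""
    return s.astype(str).str.zfill(6)


def with_listcomp(s):
    """List comprehension approach from the question."""
    return pd.Series(["%06d" % t for t in s.tolist()], index=s.index)


for fn in (with_zfill, with_listcomp):
    # Average over a few calls; numbers will vary by machine and version.
    seconds = timeit.timeit(lambda: fn(times), number=3) / 3
    print(f"{fn.__name__}: {seconds * 1e3:.1f} ms per call")
```

Both helpers return the same strings for non-negative inputs, so the choice between them comes down to measured speed on your own data.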