Python: a vectorized way to format a column of integers as fixed-length strings in pandas and Dask DataFrames

I have a DataFrame:

       date    time               user_id
0  20160921    5947  13079492369730773513
1  20160921    5948  13079492369730773513
2  20160921  235949  13079492369730773513
3  20160921  235950  13079492369730773513
4  20160921  235951  13079492369730773513
I want to format the "time" column as:

       date    time               user_id
0  20160921  005947  13079492369730773513
1  20160921  005948  13079492369730773513
2  20160921  235949  13079492369730773513
3  20160921  235950  13079492369730773513
4  20160921  235951  13079492369730773513
I know the list-comprehension way:

df['time'] = ["%06d" % t for t in df['time'].tolist()]

Is there a vectorized way to do the same thing? And how would this be done for a Dask DataFrame?

Yes, there is a vectorized way to do the same thing. Cast the column to string, then use the string methods on it:

df.time.astype(str).str.zfill(6)
0    005947
1    005948
2    235949
3    235950
4    235951
Then assign it back:

df.time = df.time.astype(str).str.zfill(6)
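This assumes the time strings are at most 6 characters long. zfill only pads, so longer values are left unchanged rather than truncated.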
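
As a self-contained check of the steps above, the following sketch rebuilds the sample frame from the question and compares the two approaches. The variable names (`padded`, `via_format`, `too_long`) are illustrative, and storing `user_id` as `uint64` is an assumption made here because the values do not fit in `int64`.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question. user_id is stored as uint64
# (an assumption) because the values exceed the int64 range.
df = pd.DataFrame({
    "date": [20160921] * 5,
    "time": [5947, 5948, 235949, 235950, 235951],
    "user_id": np.array([13079492369730773513] * 5, dtype=np.uint64),
})

# Approach from the answer: cast to str, then left-pad with zeros.
padded = df["time"].astype(str).str.zfill(6)

# Approach from the question, kept for comparison.
via_format = ["%06d" % t for t in df["time"].tolist()]

# zfill only pads: a 7-digit value passes through unchanged.
too_long = pd.Series([1234567]).astype(str).str.zfill(6)

# Assign the padded strings back to the column.
df["time"] = padded
print(df)
```

Both approaches should produce identical strings for this sample; the only difference is how the work is expressed.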

Unfortunately, this is much slower than the list-comprehension way:

df['time'] = ["%06d" % t for t in df['time'].tolist()]
In [5]: %timeit df.time.astype(str).str.zfill(6)
228 µs ± 4.99 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit ["%06d" % t for t in df['time'].tolist()]
17.5 µs ± 208 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
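
The answer does not cover the Dask part of the question. One possible approach, offered as a sketch rather than a tested recipe, is to put the same expression in a helper that uses only `assign`, `astype`, and the `.str` accessor, all of which Dask DataFrames also provide. The helper name `pad_time` and its parameters are hypothetical. It is exercised below on a pandas frame only.

```python
import pandas as pd


def pad_time(frame, column="time", width=6):
    """Return a new frame with `column` zero-padded to `width` characters.

    Hypothetical helper: it relies only on assign/astype/.str.zfill, so it
    should accept either a pandas or a Dask DataFrame (an assumption for Dask).
    """
    return frame.assign(**{column: frame[column].astype(str).str.zfill(width)})


# Exercise the helper on a pandas frame built from the question's sample.
pdf = pd.DataFrame({
    "date": [20160921] * 5,
    "time": [5947, 5948, 235949, 235950, 235951],
})
result = pad_time(pdf)
print(result)
```

With Dask installed, something like `pad_time(dd.from_pandas(pdf, npartitions=2)).compute()` (after `import dask.dataframe as dd`) would be expected to give the same result, with the padding evaluated lazily per partition. If the `.str` accessor turns out to be a bottleneck, `ddf.map_partitions` with a pandas-level function such as the list comprehension is another option to measure.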