Python: How to determine the length of lists in a DataFrame column

python pandas

How can I determine the length of the lists in a column without iterating?

I have a dataframe like this:

                                                    CreationDate
2013-12-22 15:25:02                  [ubuntu, mac-osx, syslinux]
2009-12-14 14:29:32  [ubuntu, mod-rewrite, laconica, apache-2.2]
2013-12-22 15:42:00               [ubuntu, nat, squid, mikrotik]

I am calculating the length of the lists in the CreationDate column and creating a new Length column like this:

df['Length'] = df.CreationDate.apply(lambda x: len(x))

This gives me:

                                                    CreationDate  Length
2013-12-22 15:25:02                  [ubuntu, mac-osx, syslinux]       3
2009-12-14 14:29:32  [ubuntu, mod-rewrite, laconica, apache-2.2]       4
2013-12-22 15:42:00               [ubuntu, nat, squid, mikrotik]       4

Is there a more Pythonic way to do this?

You can also use the str accessor for some list operations. In this example,

df['CreationDate'].str.len()

returns the length of each list. See the documentation for details.
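As a minimal sketch of the str accessor approach, the following rebuilds the example frame from the question and stores the result in a Length column. The frame construction is an assumption about how the data was created; only the .str.len() call comes from the answer.

```python
import pandas as pd

# Rebuild the example frame from the question (values copied from the post).
df = pd.DataFrame(
    {"CreationDate": [
        ["ubuntu", "mac-osx", "syslinux"],
        ["ubuntu", "mod-rewrite", "laconica", "apache-2.2"],
        ["ubuntu", "nat", "squid", "mikrotik"],
    ]},
    index=["2013-12-22 15:25:02", "2009-12-14 14:29:32", "2013-12-22 15:42:00"],
)

# .str.len() takes the length of each element, so it also works for lists.
df["Length"] = df["CreationDate"].str.len()
print(df["Length"].tolist())  # [3, 4, 4]
```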

For these operations, plain Python is usually faster. pandas can handle NaNs, though. Here are the timings:

import random
import string

import pandas as pd

ser = pd.Series([random.sample(string.ascii_letters,
                               random.randint(1, 20)) for _ in range(10**6)])

%timeit ser.apply(lambda x: len(x))
1 loop, best of 3: 425 ms per loop

%timeit ser.str.len()
1 loop, best of 3: 248 ms per loop

%timeit [len(x) for x in ser]
10 loops, best of 3: 84 ms per loop

%timeit pd.Series([len(x) for x in ser], index=ser.index)
1 loop, best of 3: 236 ms per loop
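The NaN handling mentioned above can be illustrated with a small sketch. The series here is hypothetical (it is not part of the timing data) and simply contains one missing entry: str.len() returns NaN for it, while calling len directly fails.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the middle row has no list at all.
ser_nan = pd.Series([["ubuntu", "nat"], np.nan, ["squid"]])

# The str accessor passes the missing value through as NaN.
lengths = ser_nan.str.len()
print(lengths.tolist())  # [2.0, nan, 1.0]

# Plain len() has no such handling and raises on the float NaN.
try:
    ser_nan.apply(len)
except TypeError as exc:
    print("apply(len) failed:", exc)
```

Because of the NaN, the result is a float column. If every row holds a list, a list comprehension or map(len) avoids that overhead.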

pandas.Series.map(len) and pandas.Series.apply(len) are equivalent in execution time, and slightly faster than pandas.Series.str.len().


import pandas as pd

data = {'os': [['ubuntu', 'mac-osx', 'syslinux'], ['ubuntu', 'mod-rewrite', 'laconica', 'apache-2.2'], ['ubuntu', 'nat', 'squid', 'mikrotik']]}
index = ['2013-12-22 15:25:02', '2009-12-14 14:29:32', '2013-12-22 15:42:00']

df = pd.DataFrame(data, index)

# create the Length column
df['Length'] = df.os.map(len)

# display(df)
                                                              os  Length
2013-12-22 15:25:02                  [ubuntu, mac-osx, syslinux]       3
2009-12-14 14:29:32  [ubuntu, mod-rewrite, laconica, apache-2.2]       4
2013-12-22 15:42:00               [ubuntu, nat, squid, mikrotik]       4

%timeit
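import pandas as pd
import random
import string

random.seed(365)

ser = pd.Series([random.sample(string.ascii_letters, random.randint(1, 20)) for _ in range(10**6)])

%timeit ser.str.len()
252 ms ± 12.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit ser.map(len)
220 ms ± 7.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit ser.apply(len)
222 ms ± 8.31 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)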