Python pandas Series operations

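Suppose there are two series

a=[1,2,3,4,5]

and

b=[60,7,80,9,100]

I want to create a new variable computed as follows:

C = a/b if b > 10 else a/b + 1

I can do this with a list comprehension:

C = [a[i]/b[i] if b[i] > 10 else a[i]/b[i] + 1 for i in range(len(b))]

My question is:

Is there another way (for example using lambda, map, apply, etc.) to avoid the for loop?

(The series a, b, c may also be columns of a pd.DataFrame.)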

The first idea is to divide the values and add 1 by condition, converting the boolean mask to the integers 1 and 0:

c  = a/b + (b <=10).astype(int)
#alternative
#c  = a/b + (~(b > 10)).astype(int)
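
As a quick check, the masking expression above can be run on the sample values from the question. This is a minimal sketch: the variable name `mask` is introduced here only for readability, and the comparison against the list comprehension is an added sanity check rather than part of the original answer.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4, 5])
b = pd.Series([60, 7, 80, 9, 100])

# True where 1 has to be added, i.e. where b is NOT greater than 10.
mask = b <= 10

# astype(int) turns True/False into 1/0, so the addition is elementwise.
c = a / b + mask.astype(int)

print(c.round(4).tolist())  # [0.0167, 1.2857, 0.0375, 1.4444, 0.05]
```

Only the elements at positions 1 and 3 (where b is 7 and 9) receive the extra 1.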

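Computing the division in both branches, as in np.where(b > 10, a/b, a/b + 1), is also possible; it should be somewhat slower on large data.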
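
The two np.where variants that appear in the timings can be sketched on the small sample data as follows. The names `c1` and `c2` are chosen here for illustration. Note that np.where returns a plain NumPy array, which is why the second variant wraps the result in pd.Series to get the original index back.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 2, 3, 4, 5])
b = pd.Series([60, 7, 80, 9, 100])

# Variant 1: divide once, then add 0 or 1 depending on the condition.
# Series + ndarray keeps the Series type and index.
c1 = a / b + np.where(b > 10, 0, 1)

# Variant 2: compute both branches and pick one per element.
# The division runs twice, which is the extra cost mentioned above.
c2 = pd.Series(np.where(b > 10, a / b, a / b + 1), index=a.index)
```

Both variants give the same values; the first avoids the second division.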

Setup:

a = pd.Series([1,2,3,4,5])
b = pd.Series([60,7,80,9,100])

Performance:

np.random.seed(2019)

a = pd.Series(np.random.randint(1,100, size=100000))
b = pd.Series(np.random.randint(1,100, size=100000))

In [322]: %timeit [a[i] /b[i] if b[i] > 10 else a[i] /b[i] +1 for i in range(len(b))]
3.08 s ± 84.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [323]: %timeit a/b + (b <=10).astype(int)
1.71 ms ± 44.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [324]: %timeit a/b + np.where(b > 10, 0, 1)
1.67 ms ± 66.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [325]: %timeit np.where(b >10, a/b, a/b +1)
2.7 ms ± 13.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [326]: %timeit pd.Series(np.where(b >10, a/b, a/b +1), index=a.index)
2.74 ms ± 21.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
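
The %timeit lines above are IPython output. Outside IPython, a rough comparison could be reproduced with the standard timeit module, as in the sketch below. The smaller size `n`, the `candidates` dictionary and its labels are assumptions made here to keep the loop version quick; absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(2019)
n = 10_000  # assumption: smaller than the 100000 used above

a = pd.Series(np.random.randint(1, 100, size=n))
b = pd.Series(np.random.randint(1, 100, size=n))

# Each candidate is a zero-argument callable so timeit can call it directly.
candidates = {
    "list comprehension": lambda: [
        a[i] / b[i] if b[i] > 10 else a[i] / b[i] + 1 for i in range(len(b))
    ],
    "mask.astype(int)": lambda: a / b + (b <= 10).astype(int),
    "np.where, add 0/1": lambda: a / b + np.where(b > 10, 0, 1),
    "np.where, both branches": lambda: np.where(b > 10, a / b, a / b + 1),
}

# Best of 3 single runs per candidate, in seconds.
timings = {
    name: min(timeit.repeat(fn, number=1, repeat=3))
    for name, fn in candidates.items()
}

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms")
```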

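Proposals 2 and 3 seem to work well. The problem with the first one is that it is tied to the specific example, where you add 1. To generalize, I would like to know of any alternative for applying a given arithmetic operation between two series under some constraint. I understand that your np.where suggestion works here because there is no "elif" clause.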
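
One possible way to handle the "elif" case raised in this comment is np.select, which takes a list of conditions and a matching list of choices and uses the first condition that holds for each element. This is a sketch and not part of the answer above: the threshold 70 and the doubled-ratio branch are hypothetical, invented only to show a third branch, and the data is placed in DataFrame columns as the question allows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [60, 7, 80, 9, 100]})

ratio = df["a"] / df["b"]

# Conditions are checked in order; the first one that is True wins,
# like an if / elif chain.
conditions = [df["b"] > 70, df["b"] > 10]

# Hypothetical branches for illustration: plain ratio, then doubled ratio.
choices = [ratio, ratio * 2]

# default plays the role of the final else branch (b <= 10 here).
df["c"] = np.select(conditions, choices, default=ratio + 1)
```

With these made-up rules, rows with b of 80 and 100 keep the plain ratio, the row with b of 60 gets the doubled ratio, and the rows with b of 7 and 9 get ratio + 1. All branches are still evaluated for every row, so this trades some extra computation for readability.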
