Why is Python's built-in array.fromlist() slower than Cython code?

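Usually, when gluing Python and C code together, one needs to convert a Python list into contiguous memory, e.g. an array.array. Not surprisingly, this conversion step often becomes the bottleneck, so I find myself doing silly things with Cython because it is faster than the built-in Python solutions.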

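For example, to convert a Python list lst into contiguous int32 memory, I know of two possibilities:

a=array.array('i', lst)

and

a=array.array('i'); a.fromlist(lst)

However, both are slower than the following Cython version: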

%%cython
import array
from cpython cimport array
def array_from_list_iter(lst):
    cdef Py_ssize_t n=len(lst)
    cdef array.array res=array.array('i')
    cdef int cnt=0
    array.resize(res, n)  #preallocate memory
    for i in lst:
        res.data.as_ints[cnt]=i
        cnt+=1
    return res
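For readers without a Cython setup, the same preallocate-then-fill pattern can be sketched in plain Python. This is only a behavioral model (array_from_list_model is a hypothetical name, not from the question): it preallocates n zeroed C ints and then writes each item by index. It is not fast, since every store still goes through array.array's normal item assignment, but it is handy for checking that the Cython function returns the same thing as array.array('i', lst).

```python
import array


def array_from_list_model(lst):
    """Pure-Python model of array_from_list_iter: preallocate, then fill."""
    n = len(lst)
    # Preallocate n zero-initialized C ints (the role of array.resize(res, n)).
    res = array.array('i', [0]) * n
    # Write each item at position cnt; each store converts the Python int
    # to a C int and raises OverflowError if it does not fit.
    for cnt, value in enumerate(lst):
        res[cnt] = value
    return res


print(array_from_list_model([3, 1, 2]))  # array('i', [3, 1, 2])
```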

My timings (Linux, Python 3.6; Windows and/or Python 2.7 give very similar results) show that the Cython solution is about 6 times faster:

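Size      new_array   from_list   cython_iter   factor
1             284ns       347ns         176ns      1.6
10            599ns       621ns         209ns      2.9
10**2         3.7µs       3.5µs         578ns      6.1
10**3        38.5µs        32µs         4.3µs      7.4
10**4         343µs       316µs        40.4µs      7.8
10**5         3.5ms       3.4ms         481µs      7.1
10**6        34.1ms      31.5ms         5.0ms      6.3
10**7         353ms       316ms        53.3ms      5.9

new_array: array.array('i', lst); from_list: a=array.array('i'); a.fromlist(lst); cython_iter: array_from_list_iter(lst); factor: speed-up of cython_iter relative to the faster of new_array and from_list.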
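As an arithmetic check on the factor column, the sketch below re-enters the table values by hand (converted to nanoseconds) and recomputes min(new_array, from_list) / cython_iter for each size. It only checks the numbers already in the table; it does not re-run any benchmark.

```python
# Timings from the table, in nanoseconds:
# (size, new_array, from_list, cython_iter, factor as printed)
US, MS = 1_000, 1_000_000
rows = [
    ("1", 284, 347, 176, 1.6),
    ("10", 599, 621, 209, 2.9),
    ("10**2", 3.7 * US, 3.5 * US, 578, 6.1),
    ("10**3", 38.5 * US, 32 * US, 4.3 * US, 7.4),
    ("10**4", 343 * US, 316 * US, 40.4 * US, 7.8),
    ("10**5", 3.5 * MS, 3.4 * MS, 481 * US, 7.1),
    ("10**6", 34.1 * MS, 31.5 * MS, 5.0 * MS, 6.3),
    ("10**7", 353 * MS, 316 * MS, 53.3 * MS, 5.9),
]


def speedup(new_array, from_list, cython_iter):
    """Ratio of the faster pure-Python timing to the Cython timing."""
    return min(new_array, from_list) / cython_iter


for size, new_array, from_list, cython_iter, printed in rows:
    factor = speedup(new_array, from_list, cython_iter)
    print(f"{size:>6}  recomputed {factor:.1f}  (table: {printed})")
```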

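With my limited knowledge of CPython, I would say that the from_list solution uses the built-in array_array_fromlist function from arraymodule.c (its source is reproduced further below).

Side note: the numpy solution, np.array(lst, dtype=np.int32), is about 2 times slower than the Python versions.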

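The biggest difference seems to be in the actual unboxing of the ints. The CPython array implementation uses PyArg_Parse, whereas Cython calls PyLong_AsLong, at least I think so, through a few layers of macros: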

%%cython -a
from cpython cimport PyArg_Parse
def arg_parse(obj):
    cdef int i
    for _ in range(100000):
        PyArg_Parse(obj, "i;array item must be integer", &i)
    return i

def cython_parse(obj):
    cdef int i
    for _ in range(100000):
        i = obj
    return i

%timeit arg_parse(1)
# 2.52 ms ± 67.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit cython_parse(1)
# 299 µs ± 1.86 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

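I suspect that if you look at the Cython output, you'll find that setting each item of the output array bypasses the self->ob_descr->setitem function call and writes the memory directly. That would behave incorrectly if the array weren't actually made of ints, but in exchange it replaces a function call through several pointers (including a type check each time, to determine the width of the target) with a direct memory assignment. Also, iterating over the list is probably slightly faster than repeated calls to PyList_GetItem (which performs a bounds check on every call).

Your cython code just returns the result, but a Python array.array has:

['append', 'buffer_info', 'byteswap', 'count', 'extend', 'fromfile', 'fromlist', 'fromstring', 'fromunicode', 'index', 'insert', 'itemsize', 'pop', 'read', 'remove', 'reverse', 'tofile', 'tolist', 'tostring', 'tounicode', 'typecode', 'write']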

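@ShadowRanger Cython writes directly to memory, but it also checks that the passed Python integer is in the correct range. For example, array_from_list_iter([2**33]) fails with a "Python int too large to convert to C long" error.

@ead: the setitem function has to do the same, and it has to pick what counts as "too large" dynamically. As @ShadowRanger said, the C code generated by Cython doesn't use setitem. Instead, after converting with __Pyx_PyInt_As_int (which is where the range check comes from), it assigns directly to the memory at the array index. That is the only major difference from the fromlist C code. By using the known size of the list being iterated, the Cython code can also avoid bounds checks, e.g. for i in lst[:n], which is slightly faster. The assignment in Cython looks like:

(__pyx_v_res->data.as_ints[__pyx_v_cnt]) = __pyx_t_3;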
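The range check on the built-in side can be observed from Python. A small sketch using only the standard array module (the helper names are hypothetical): it derives the range of the 'i' typecode from itemsize (usually 4 bytes, but platform-dependent) and shows that the built-in setitem path, used by both array.array('i', lst) and fromlist, raises OverflowError just past either end of that range.

```python
import array


def int_typecode_range(typecode='i'):
    """Smallest and largest values a signed integer typecode can hold."""
    bits = 8 * array.array(typecode).itemsize
    return -2 ** (bits - 1), 2 ** (bits - 1) - 1


def accepts(value, typecode='i'):
    """True if array.array(typecode, [value]) succeeds, False on OverflowError."""
    try:
        array.array(typecode, [value])
    except OverflowError:
        return False
    return True


lo, hi = int_typecode_range('i')
print(lo, hi)                        # typically -2147483648 2147483647
print(accepts(hi), accepts(hi + 1))  # True False
print(accepts(lo), accepts(lo - 1))  # True False
print(accepts(2 ** 33))              # False where C int is 32 bits
```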

Thanks, you're right: when I replace PyArg_Parse with PyLong_AsLong in arraymodule.c, Python's version is almost as fast as Cython's. Looking at Cython's __Pyx_PyInt_As_int, I can understand why someone would stick with the slow PyArg_Parse :)

The CPython fromlist implementation referred to in the question (array_array_fromlist in arraymodule.c):

static PyObject *
array_array_fromlist(arrayobject *self, PyObject *list)
{
    Py_ssize_t n;

    if (!PyList_Check(list)) {
        PyErr_SetString(PyExc_TypeError, "arg must be list");
        return NULL;
    }
    n = PyList_Size(list);
    if (n > 0) {
        Py_ssize_t i, old_size;
        old_size = Py_SIZE(self);
        if (array_resize(self, old_size + n) == -1)
            return NULL;
        for (i = 0; i < n; i++) {
            PyObject *v = PyList_GetItem(list, i);
            if ((*self->ob_descr->setitem)(self,
                            Py_SIZE(self) - n + i, v) != 0) {
                array_resize(self, old_size);
                return NULL;
            }
        }
    }
    Py_RETURN_NONE;
}
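To make the control flow of array_array_fromlist easier to follow, here is a rough pure-Python model of it. fromlist_model is a hypothetical name, and it uses Python-level operations where the C code calls array_resize and the per-typecode setitem pointer; it assumes a numeric typecode. It grows the array by n, stores each element through normal item assignment (which dispatches to the same setitem function), and shrinks back to the old size if any element fails, so a failed call leaves the array unchanged.

```python
import array


def fromlist_model(arr, items):
    """Rough Python model of CPython's array_array_fromlist."""
    if not isinstance(items, list):
        raise TypeError("arg must be list")
    n = len(items)
    if n > 0:
        old_size = len(arr)
        # Like array_resize(self, old_size + n): grow by n placeholder slots.
        arr.extend(array.array(arr.typecode, [0]) * n)
        try:
            for i, v in enumerate(items):
                # Like (*self->ob_descr->setitem)(...): converts and range-checks v.
                arr[old_size + i] = v
        except Exception:
            # Like array_resize(self, old_size): undo the growth, then propagate.
            del arr[old_size:]
            raise


a = array.array('i', [1])
fromlist_model(a, [2, 3])
print(a)  # array('i', [1, 2, 3])
try:
    fromlist_model(a, [4, "x"])
except TypeError as exc:
    print("rejected:", exc)
print(a)  # array('i', [1, 2, 3])
```

The real a.fromlist([4, "x"]) likewise raises TypeError and leaves the array unchanged, because the C code calls array_resize(self, old_size) before returning NULL.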

The timing code used for the question's table (run in IPython, with array_from_list_iter compiled in a %%cython cell):

import array
import numpy as np
for n in [1, 10,10**2, 10**3, 10**4, 10**5, 10**6, 10**7]:
    print ("N=",n)
    lst=list(range(n))
    print("python:")
    %timeit array.array('i', lst)
    print("python, from list:")
    %timeit a=array.array('i'); a.fromlist(lst)
    print("numpy:")
    %timeit np.array(lst, dtype=np.int32)
    print("cython_iter:")
    %timeit array_from_list_iter(lst)
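Outside IPython, the two pure-Python variants from this loop can be timed with the standard timeit module. This is a minimal sketch (new_array, from_list and time_conversions are hypothetical helper names mirroring the table columns); it leaves out numpy and the compiled Cython function, and absolute numbers will depend on the machine and Python version.

```python
import array
import timeit


def new_array(lst):
    return array.array('i', lst)


def from_list(lst):
    a = array.array('i')
    a.fromlist(lst)
    return a


def time_conversions(n, number=100, repeat=5):
    """Best per-call time in seconds for each conversion of list(range(n))."""
    lst = list(range(n))
    results = {}
    for name, func in (("new_array", new_array), ("from_list", from_list)):
        # The lambda runs inside this iteration, so it uses the current func.
        times = timeit.repeat(lambda: func(lst), number=number, repeat=repeat)
        results[name] = min(times) / number
    return results


for n in [1, 10, 10**2, 10**3, 10**4]:
    print("N =", n, time_conversions(n))
```

Taking the minimum over repeats, rather than the mean, follows the usual timeit advice that the fastest run is the least disturbed by other activity on the machine.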
