Numpy Cython: typing a list of strings

I am trying to use Cython to improve the performance of a loop, but I am running into some issues declaring the types of the inputs.

How do I include a field in my typed struct that is a string which can be either "front" or "back"?

I have an np.recarray (note that the recarray is not known at compile time), along with inputs consisting of a list of strings and a timestamp:

import pandas as pd
ts = pd.Timestamp("2015-01-01")
contracts = ["CLX16", "CLZ16"]
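The definition of the recarray itself is not shown above. As a rough sketch only, the following layout is consistent with the field names a, b, c and the outputs that appear later in this thread; the specific rows are an assumption reconstructed from those outputs, not the original data.

```python
import numpy as np

# Assumed layout: integer generation number, 5-character position, float weight.
weights_dtype = [("a", np.int64), ("b", np.str_, 5), ("c", np.float64)]

# Assumed rows, chosen so that the results shown later in the thread are reproduced.
weights = np.array(
    [
        (0, "front", 0.5),
        (0, "back", 0.5),
        (1, "front", 1.0),
        (1, "back", 0.0),  # zero weight, so the loop skips this row
    ],
    dtype=weights_dtype,
).view(np.recarray)

print(weights.dtype)
print(weights.tolist())
```

Note that on Python 3, np.str_ yields a unicode field, which matters for the buffer error discussed further down.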

I am trying to Cythonize the following loop:

def ploop(weights, contracts, timestamp):
    cwts = []
    for gen_num, position, weighting in weights:
        if weighting != 0:
            if position == "front":
                cntrct_idx = gen_num
            elif position == "back":
                cntrct_idx = gen_num + 1
            else:
                raise ValueError("transition.columns must contain "
                                 "'front' or 'back'")
            cwts.append((gen_num, contracts[cntrct_idx], weighting, timestamp))
    return cwts

My attempt involved typing the weights input as a struct in Cython, in a file struct_test.pyx, as follows:

import numpy as np
cimport numpy as np


cdef packed struct tstruct:
    np.int64_t gen_num
    char[5] position
    np.float64_t weighting


def cloop(tstruct[:] weights_array, contracts, timestamp):
    cdef tstruct weights
    cdef int i
    cdef int cntrct_idx

    cwts = []
    for k in xrange(len(weights_array)):
        w = weights_array[k]
        if w.weighting != 0:
            if w.position == "front":
                cntrct_idx = w.gen_num
            elif w.position == "back":
                cntrct_idx = w.gen_num + 1
            else:
                raise ValueError("transition.columns must contain "
                                 "'front' or 'back'")
            cwts.append((w.gen_num, contracts[cntrct_idx], w.weighting,
                         timestamp))
    return cwts

But I get a runtime error, which I believe is related to the char[5] position field:

import pyximport
pyximport.install()
import struct_test

struct_test.cloop(weights, contracts, ts)

ValueError: Does not understand character buffer dtype format string ('w')

Additionally, I am a bit unclear on how I would go about typing contracts as well as timestamp.

Your ploop (without the timestamp variable) produces:

In [226]: ploop(weights, contracts)
Out[226]: [(0, 'CLX16', 0.5), (0, 'CLZ16', 0.5), (1, 'CLZ16', 1.0)]

An equivalent function without a loop:

def ploopless(weights, contracts):
    arr_contracts = np.array(contracts) # to allow array indexing
    wgts1 = weights[weights['c']!=0]
    mask = wgts1['b']=='front'
    wgts1['b'][mask] = arr_contracts[wgts1['a'][mask]]
    mask = wgts1['b']=='back'
    wgts1['b'][mask] = arr_contracts[wgts1['a'][mask]+1]
    return wgts1.tolist()

In [250]: ploopless(weights, contracts)
Out[250]: [(0, 'CLX16', 0.5), (0, 'CLZ16', 0.5), (1, 'CLZ16', 1.0)]

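I am taking advantage of the fact that the returned list of tuples has the same (int, str, float) layout as the input weights array. So I just make a copy of weights and replace selected values of the b field.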

Note that I use the field selection index before the mask. The boolean mask produces a copy, so we have to be careful about the indexing order.
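The indexing-order point can be seen on a tiny structured array. This is only an illustrative sketch; the two-row array and the placeholder value "XXXXX" are made up for the demonstration.

```python
import numpy as np

arr = np.array(
    [(0, "front"), (1, "back")],
    dtype=[("a", np.int64), ("b", "U5")],
)
mask = arr["b"] == "front"

# Mask first: arr[mask] is a copy, so this assignment is lost.
arr[mask]["b"] = "XXXXX"
after_mask_first = arr["b"].tolist()

# Field first: arr["b"] is a view, so the masked assignment reaches arr.
arr["b"][mask] = "CLX16"
after_field_first = arr["b"].tolist()

print(after_mask_first)   # ['front', 'back']
print(after_field_first)  # ['CLX16', 'back']
```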
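I am guessing that the loopless array version will be competitive in time with cloop (on realistic arrays). The string and list operations in cloop probably limit its speedup.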
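One way to check that guess, at least against the pure-Python loop, is a small timing harness. The sketch below is not a benchmark of cloop; the array size, the random data, and the third contract name "CLF17" are assumptions made up so that every index stays in range.

```python
import timeit
import numpy as np


def ploop(weights, contracts):
    # Loop version from the question, without the timestamp argument.
    cwts = []
    for gen_num, position, weighting in weights:
        if weighting != 0:
            if position == "front":
                cntrct_idx = gen_num
            elif position == "back":
                cntrct_idx = gen_num + 1
            else:
                raise ValueError("position must be 'front' or 'back'")
            cwts.append((gen_num, contracts[cntrct_idx], weighting))
    return cwts


def ploopless(weights, contracts):
    # Array version from the answer.
    arr_contracts = np.array(contracts)
    wgts1 = weights[weights["c"] != 0]
    mask = wgts1["b"] == "front"
    wgts1["b"][mask] = arr_contracts[wgts1["a"][mask]]
    mask = wgts1["b"] == "back"
    wgts1["b"][mask] = arr_contracts[wgts1["a"][mask] + 1]
    return wgts1.tolist()


# Hypothetical test data: n random rows with gen_num in {0, 1}.
rng = np.random.default_rng(0)
n = 10_000
dt = [("a", np.int64), ("b", "U5"), ("c", np.float64)]
weights = np.empty(n, dtype=dt).view(np.recarray)
weights["a"] = rng.integers(0, 2, size=n)
weights["b"] = rng.choice(["front", "back"], size=n)
weights["c"] = rng.choice([0.0, 0.5, 1.0], size=n)
contracts = ["CLX16", "CLZ16", "CLF17"]  # third name is made up

same = ploop(weights, contracts) == ploopless(weights, contracts)
t_loop = timeit.timeit(lambda: ploop(weights, contracts), number=3)
t_array = timeit.timeit(lambda: ploopless(weights, contracts), number=3)
print("same result:", same)
print("ploop: %.4fs  ploopless: %.4fs" % (t_loop, t_array))
```

The timings depend on the machine and on the array size, so they are worth rerunning on data shaped like the real input.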
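My limited experience is that compound dtypes and strings are hard to use in Cython, and the speedup is limited. But at first glance, your ploop looks like it could be written with numpy array methods that operate on all the weights at once. I may try that later.

I think you are using Python 3, which interprets np.str_ as unicode. If you use np.bytes_ instead, a simplified version of your code works for me. (I am not posting this as an answer because I don't really want to get into the second part of your question, about contracts and timestamp.)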
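To make the np.str_ versus np.bytes_ difference concrete, here is a small sketch on the Python side only. It assumes the a, b, c field layout used in the answer above and made-up rows. It shows that the unicode field takes 4 bytes per character and is exported with a 'w' code in the buffer format string, whereas the bytes version has the same 21-byte record size as the packed struct with char[5].

```python
import numpy as np

unicode_dt = np.dtype([("a", np.int64), ("b", np.str_, 5), ("c", np.float64)])
bytes_dt = np.dtype([("a", np.int64), ("b", np.bytes_, 5), ("c", np.float64)])

# Assumed sample rows, for illustration only.
weights = np.array([(0, "front", 0.5), (0, "back", 0.5)], dtype=unicode_dt)

# Cast field by field; the ASCII position strings become 5-byte fields.
weights_bytes = weights.astype(bytes_dt)

print(unicode_dt.itemsize)  # 36 = 8 + 4 * 5 + 8
print(bytes_dt.itemsize)    # 21 = 8 + 5 + 8, same as the packed struct

# Buffer format strings that numpy hands to consumers such as Cython.
unicode_format = memoryview(weights).format
bytes_format = memoryview(weights_bytes).format
print(unicode_format)
print(bytes_format)
```

With the bytes version, comparisons inside the Cython loop would need to be made against bytes values rather than unicode strings; how best to do that with a char[5] field is left open here.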