Python: apply a function to each list in a numpy array of lists

python numpy

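I have a function that accepts a list (of strings). It does some processing on that list and returns another list of strings, possibly shorter.

Now I have a numpy array of input lists of strings. I want to apply this transformation function to each list in the array.

From the searching I have done so far, np.apply_along_axis or np.vectorize seem like good candidates, but neither is working as expected.

I would like to do this as efficiently as possible. Eventually, the input array will contain on the order of 100K lists.

I suppose I could iterate over the numpy array in a for loop and append each output list, one at a time, to a new output array, but that seems horribly inefficient.

Here is what I have tried. For testing purposes, I made a simplified transformation function, and the input array contains only 3 lists.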

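import numpy as np

def my_func(l):
    # accepts list, returns another list
    # dumbed down list transformation function
    # for testing, just return the first 2 elems of original list
    return l[0:2]

# dtype=object keeps this a 1-D array of lists; recent NumPy versions
# refuse to build an array from ragged nested lists without it
test_arr = np.array([['the', 'quick', 'brown', 'fox'], ['lorem', 'ipsum'], ['this', 'is', 'a', 'test']], dtype=object)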

np.apply_along_axis(my_func, 0, test_arr)
Out[51]: array([['the', 'quick', 'brown', 'fox'], ['lorem', 'ipsum']], dtype=object)

# Rather than applying item by item, this returns the first 2 elements of the entire outer array!!

# Expected:
# array([['the', 'quick'], ['lorem', 'ipsum'], ['this', 'is']])

# Attempt 2...

my_func_vec = np.vectorize(my_func)
my_func_vec(test_arr)

Result:

Traceback (most recent call last):

  File "<ipython-input-56-f9bbacee645c>", line 1, in <module>
    my_func_vec(test_arr)

  File "C:\Users\Tony\Anaconda2\lib\site-packages\numpy\lib\function_base.py", line 2218, in __call__
    return self._vectorize_call(func=func, args=vargs)

  File "C:\Users\Tony\Anaconda2\lib\site-packages\numpy\lib\function_base.py", line 2291, in _vectorize_call
    copy=False, subok=True, dtype=otypes[0])

ValueError: cannot set an array element with a sequence

From the vectorize docstring, regarding the optional otypes parameter:

otypes : str or list of dtypes, optional
    The output data type. It must be specified as either a string of
    typecode characters or a list of data type specifiers. There should
    be one data type specifier for each output.

It lets you create structured arrays with complex output, but it also solves your problem of having lists as array elements:

my_func_vec = np.vectorize(my_func, otypes=[list])
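To make the otypes fix concrete, here is a minimal sketch. It assumes the simplified my_func from the question and uses otypes=[object], the spelling that appears in the timing session further down; building the input with np.empty and a loop is my own choice, made so that each array element is guaranteed to be one Python list.

```python
import numpy as np

def my_func(l):
    # simplified transformation from the question: keep the first two elements
    return l[0:2]

# Build a 1-D object array explicitly, one Python list per element.
lists = [['the', 'quick', 'brown', 'fox'], ['lorem', 'ipsum'], ['this', 'is', 'a', 'test']]
test_arr = np.empty(len(lists), dtype=object)
for i, item in enumerate(lists):
    test_arr[i] = item

# otypes tells vectorize the output dtype up front, so it does not try to
# infer it from a trial call (which is what raised the ValueError).
my_func_vec = np.vectorize(my_func, otypes=[object])
result = my_func_vec(test_arr)

print(result.shape)     # 1-D, one entry per input list
print(result.tolist())  # the shortened lists
```

The result is an object array whose elements are lists, so the output lists may have different lengths.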
You need to go one level down: your solution outputs only the first two items of the array, rather than the first two items of each item in the array.

Some comparisons and timing tests; keep in mind that this is only a small example.

In [106]: test_arr = np.array([['the', 'quick', 'brown', 'fox'], ['lorem', 'ipsum'], ['this', 'is', 'a', 'test']], dtype=object)
     ...: 
In [107]: def my_func(l):
     ...:     # accepts list, returns another list
     ...:     # dumbed down list transformation function
     ...:     # for testing, just return the first 2 elems of original list
     ...:     return l[0:2]
     ...: 

The list comprehension approach returns a 2d array of strings, because the function returns a 2-element list each time:

In [108]: np.array([my_func(x) for x in test_arr])
Out[108]: 
array([['the', 'quick'],
       ['lorem', 'ipsum'],
       ['this', 'is']],
      dtype='<U5')
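The 2d result above depends on every output list having the same length. When the real transformation returns lists of different lengths, one option is to fill an object array by hand. The sketch below is only an illustration; to_object_array is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def to_object_array(list_of_lists):
    # hypothetical helper: store each list as a single element of a 1-D object array
    out = np.empty(len(list_of_lists), dtype=object)
    for i, item in enumerate(list_of_lists):
        out[i] = item
    return out

def my_func(l):
    # simplified transformation from the question
    return l[0:2]

lists = [['the', 'quick', 'brown', 'fox'], ['lorem'], ['this', 'is', 'a', 'test']]
result = to_object_array([my_func(x) for x in lists])

print(result.shape)  # stays 1-D regardless of the inner list lengths
print(result[1])     # the second output list, shorter than the others
```

If the downstream code does not need an array at all, the plain list returned by the comprehension is usually enough.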

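
frompyfunc returns an object dtype array. Consistent with my past tests, it is modestly faster (2x, but never an order of magnitude).

vectorize uses frompyfunc but adds more overhead. It needs otypes to avoid the "cannot set an array element with a sequence" error; otherwise it tries to infer the return type from a trial calculation.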
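The frompyfunc call itself did not survive in this copy of the answer. A minimal sketch of what such a call can look like, assuming the same my_func and a 1-D object array of lists (the variable names are mine):

```python
import numpy as np

def my_func(l):
    # simplified transformation from the question
    return l[0:2]

lists = [['the', 'quick', 'brown', 'fox'], ['lorem', 'ipsum'], ['this', 'is', 'a', 'test']]
test_arr = np.empty(len(lists), dtype=object)
for i, item in enumerate(lists):
    test_arr[i] = item

# np.frompyfunc(func, nin, nout): one input, one output.
# The resulting ufunc always returns object dtype arrays.
my_func_ufunc = np.frompyfunc(my_func, 1, 1)
result = my_func_ufunc(test_arr)

print(result.dtype)
print(result.tolist())
```

Because the output is always object dtype, no otypes argument is needed here.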

numpy is geared toward efficient processing of numeric arrays. I would double-check whether the vectorization you did is faster than … @Wli

@IgnacioVergaraKausel People prefer complicated answers :)

This has been discussed many times. Neither vectorize nor apply... improves efficiency. They still have to call your function on each list and accumulate the results into an array or list. Overall, it is running the function that is slow, not the iteration framework. In past tests I have found np.frompyfunc to be a good tool for iterating over object arrays.

Actually, I had exactly the same idea, but decided to answer the question rather than tell him how to do it. But then I tested it, and for some reason vectorize was 10x faster than a list comprehension or the map() function... Any idea why?

Nice, though I would argue that for a pure Python solution, converting to a numpy array or taking the input as a numpy array makes no sense. As I said, I tested it the same way, and vectorize was 10x faster than the list comprehension. But I used a "very long" test... something like 100k

The vectorize call with otypes=[object], and its timing on the small test array:

In [113]: np.vectorize(my_func,otypes=[object])(test_arr)
Out[113]: 
array([list(['the', 'quick']), list(['lorem', 'ipsum']),
       list(['this', 'is'])], dtype=object)
In [114]: timeit np.vectorize(my_func,otypes=[object])(test_arr)
30.4 µs ± 132 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
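
Since the comments disagree about which approach is faster, it is worth measuring on data closer to the real size. The following is a rough timing harness, not a benchmark from the original thread: the generated input, the array size n, and the names candidates and timings are all assumptions for illustration, and the numbers will vary by machine and NumPy version.

```python
import timeit
import numpy as np

def my_func(l):
    # simplified transformation from the question
    return l[0:2]

# Synthetic input (an assumption): n lists of 1 to 5 short strings each.
n = 10_000
lists = [['w%d' % j for j in range(1 + i % 5)] for i in range(n)]
arr = np.empty(n, dtype=object)
for i, item in enumerate(lists):
    arr[i] = item

candidates = {
    'list comprehension': lambda: [my_func(x) for x in arr],
    'frompyfunc': lambda: np.frompyfunc(my_func, 1, 1)(arr),
    'vectorize': lambda: np.vectorize(my_func, otypes=[object])(arr),
}

# Best of 3 repeats, 5 calls per repeat, reported per call.
timings = {
    name: min(timeit.repeat(fn, number=5, repeat=3)) / 5
    for name, fn in candidates.items()
}

for name, seconds in timings.items():
    print('%-20s %.3f ms per call' % (name, seconds * 1e3))
```

With a real transformation function, most of the time is likely to be spent inside the function itself, so differences between the three wrappers may matter less than they do with this trivial slice.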