Python: efficiently multiprocessing an array of strings

I have an array of strings that I need to process. Since the strings can be processed independently, I process them in parallel:

import multiprocessing
import numpy as np

def func(x):
    ls = ["this", "is"]
    return [i.upper() for i in x.split(' ') if i not in ls]

arr = np.asarray(["this is a test", "this is not a test", "see my good example"])
pool = multiprocessing.Pool(processes=2)
tst = pool.map(func, arr)
pool.close()

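My question: are there any obvious ways to improve my code in terms of memory usage and CPU time? For example:

- using numpy arrays inside `func`
- using Python lists instead of numpy arrays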
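
The two options listed above can be tried with only small changes to the question's code. The following is a minimal sketch, not a benchmarked recommendation: it passes a plain Python list to `pool.map` (so no numpy array is needed for the per-string work), keeps pool creation under an `if __name__ == "__main__":` guard, and sets `chunksize` so that many strings are sent to a worker per task. The helper name `process_all` and the value `chunksize=1000` are assumptions made for illustration; they do not appear in the original post.

```python
import multiprocessing


def func(x):
    # Same logic as in the question: drop the listed words, upper-case the rest
    ls = ["this", "is"]
    return [i.upper() for i in x.split(' ') if i not in ls]


def process_all(strings, processes=2, chunksize=1000):
    # Hypothetical helper: chunksize groups many strings into one task,
    # which reduces the per-item communication overhead between processes.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(func, strings, chunksize=chunksize)


if __name__ == "__main__":
    # A plain list is enough here; func only needs one str at a time
    arr = ["this is a test", "this is not a test", "see my good example"]
    print(process_all(arr))
```

For a very long input, a larger `chunksize` usually reduces inter-process overhead, but the best value depends on the data and is worth measuring.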

You can use numpy to vectorize the whole execution. This is much faster than a native Python implementation.

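import numpy as np


def func(x):
    # Words to drop from each string
    ls = ["this", "is"]
    # Split on spaces (as in the question) and return the result,
    # so that the wrapped function yields a value for every element
    return [i.upper() for i in x.split(' ') if i not in ls]


x = np.array(["this is a test", "this is not a test", "see my good example"])
# Apply func to each element; the result is an object array of lists
np.frompyfunc(func, 1, 1)(x)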

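Why are you using a numpy array?

@roganjosh I was under the impression that numpy arrays are more efficient than Python lists (though I may be wrong about that).

Yes :) numpy arrays are not inherently more efficient than lists. In many cases (for example, appending in a loop) they are slower. They only pay off when you use numpy methods; then they can outperform list operations by orders of magnitude. But not everything can be done with arrays.

If `ls` is fairly large in your real problem, the first thing to try is converting it to a set.

It has 100+ words, so I will definitely do that. `arr` is also very long: more than 1 million entries.

Will it be faster than parallelizing with `Joblib` or `multiprocessing`?

That is what the documentation says: vectorization is faster than a normal implementation; numpy vectorize is a wrapper around pyfunc() of the original function.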
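
Whether the `np.frompyfunc` version is actually faster is easiest to settle by timing it. Note that `np.frompyfunc` still calls the Python function once per element, and the NumPy documentation for `np.vectorize` states that it is provided primarily for convenience, not for performance. The sketch below compares a plain list comprehension with `np.frompyfunc` on the same data, using a set for the word filter as suggested in the comments. The names `STOP_WORDS`, `with_list` and `with_frompyfunc`, the repeat factor of 10,000 and the use of an object-dtype array are assumptions chosen for illustration.

```python
import timeit

import numpy as np

# A set gives constant-time membership tests on average,
# which matters once the word list has 100+ entries
STOP_WORDS = {"this", "is"}


def func(x):
    return [w.upper() for w in x.split(' ') if w not in STOP_WORDS]


# Hypothetical test data: the three sample strings repeated many times
strings = ["this is a test", "this is not a test", "see my good example"] * 10_000
# Object dtype so that each element is handed to func as a plain str
arr = np.array(strings, dtype=object)
vfunc = np.frompyfunc(func, 1, 1)


def with_list():
    return [func(s) for s in strings]


def with_frompyfunc():
    return vfunc(arr)


# Both approaches must agree before the timings mean anything
assert with_list() == list(with_frompyfunc())

print("list comprehension:", timeit.timeit(with_list, number=5))
print("np.frompyfunc:     ", timeit.timeit(with_frompyfunc, number=5))
```

The timings depend on the machine and the data, so no expected numbers are given here. The same harness can be extended with a `multiprocessing` or `Joblib` variant to compare against parallel execution.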