Python 3.x parallel quicksort, can someone help me? (python-3.x, cuda, pycuda)


I am trying to parallelize quicksort by having the kernel split the list into two other lists according to a comparison against the pivot (pivo). I have a syntax problem and cannot save the pointers at the end of the two new lists. How do I get rid of the syntax error and save the list sizes at the end of the kernel?


import pycuda.autoinit
import pycuda.driver as cuda
from pycuda import gpuarray, compiler
from pycuda.compiler import SourceModule
import time
import numpy as np

def quickSort_paralleloGlobal(listElements: list) -> list:
    if len(listElements)

There are many problems with the code; I don't think I can list them all. One of the central issues, however, is that you have tried to convert a serial quicksort into a thread-parallel quicksort, and such a simple conversion is not possible.

To allow the threads to work in parallel while partitioning the input list into one of two separate output lists would require substantial changes to the kernel code.

However, we can work around most of the other problems by limiting the kernel launch to a single thread.

With that idea, the following code appears to sort the given input correctly:

$ cat t18.py
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda import gpuarray, compiler
from pycuda.compiler import SourceModule
import time
import numpy as np


def quickSort_paralleloGlobal(listElements):

        if len(listElements) <= 1:

            return listElements

        else:

            pivo = listElements.pop()
            pivo = np.int32(pivo)

            kernel_code_template = """
                    __global__ void separateQuick(int *listElements, int *list1, int *list2, int *l1_size, int *l2_size, int pivo)
                    {
                        int index1 = 0, index2 = 0;
                        int index = blockIdx.x * blockDim.x + threadIdx.x;
                        int stride = blockDim.x * gridDim.x;
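                        // grid-stride loop: with this 1-thread launch it simply walks the whole array serially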
                        for (int i = index; i < %(ARRAY_SIZE)s; i+= stride)
                            if (listElements[i] < pivo)
                            {
                                list1[index1] = listElements[i];
                                index1++;
                            }
                            else
                            {
                                list2[index2] = listElements[i];
                                index2++;
                            }
                        *l1_size = index1;
                        *l2_size = index2;
                    }
                    """
            SIZE = len(listElements)

            listElements = np.asarray(listElements)
            listElements = listElements.astype(np.int32)
            lista_gpu = cuda.mem_alloc(listElements.nbytes)
            cuda.memcpy_htod(lista_gpu, listElements)

            list1_gpu = cuda.mem_alloc(listElements.nbytes)
            list2_gpu = cuda.mem_alloc(listElements.nbytes)
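            # each size output is a single int32, hence the 4-byte allocations below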
            l1_size   = cuda.mem_alloc(4)
            l2_size   = cuda.mem_alloc(4)
            BLOCK_SIZE = 1
            NUM_BLOCKS = 1
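            # one block, one thread: the per-thread counters index1/index2 stay race-free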
            kernel_code = kernel_code_template % {
                'ARRAY_SIZE': SIZE
            }

            mod = compiler.SourceModule(kernel_code)
            arraysQuick = mod.get_function("separateQuick")

            arraysQuick(lista_gpu, list1_gpu, list2_gpu, l1_size, l2_size, pivo, block=(BLOCK_SIZE, 1, 1), grid=(NUM_BLOCKS, 1))
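            # read back how many elements the kernel placed in each output list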
            l1_sh = np.zeros(1, dtype = np.int32)
            l2_sh = np.zeros(1, dtype = np.int32)
            cuda.memcpy_dtoh(l1_sh, l1_size)
            cuda.memcpy_dtoh(l2_sh, l2_size)
            list1 = np.zeros(l1_sh, dtype=np.int32)
            list2 = np.zeros(l2_sh, dtype=np.int32)
            cuda.memcpy_dtoh(list1, list1_gpu)
            cuda.memcpy_dtoh(list2, list2_gpu)
            list1 = list1.tolist()
            list2 = list2.tolist()
            return quickSort_paralleloGlobal(list1) + [pivo] + quickSort_paralleloGlobal(list2)

print(quickSort_paralleloGlobal([1, 5, 4, 2, 0]))
$ python t18.py
[0, 1, 2, 4, 5]
$
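For reference, the partition that the separateQuick kernel performs is equivalent to the following host-side NumPy sketch (the array and pivot values here are made-up examples, not taken from the post):

import numpy as np

arr = np.array([1, 5, 4, 2], dtype=np.int32)   # example input
pivo = np.int32(4)                             # example pivot
list1 = arr[arr < pivo]                        # elements smaller than the pivot   -> [1, 2]
list2 = arr[arr >= pivo]                       # elements greater than or equal    -> [5, 4]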
The if(lista[i] line … is another error. It is the only error actually shown in your question (the compiler even tells you exactly what it is). Thank you, any further help is appreciated.
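Here is a modified version of the code that lets multiple threads work in parallel: instead of per-thread counters, it uses atomicAdd to reserve output slots in list1 and list2, so the kernel can be launched with a full grid of threads: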
$ cat t18.py
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda import gpuarray, compiler
from pycuda.compiler import SourceModule
import time
import numpy as np


def quickSort_paralleloGlobal(listElements):

        if len(listElements) <= 1:

            return listElements

        else:

            pivo = listElements.pop()
            pivo = np.int32(pivo)

            kernel_code_template = """
                    __global__ void separateQuick(int *listElements, int *list1, int *list2, int *l1_size, int *l2_size, int pivo)
                    {
                        int index = blockIdx.x * blockDim.x + threadIdx.x;
                        int stride = blockDim.x * gridDim.x;
                        for (int i = index; i < %(ARRAY_SIZE)s; i+= stride)
                            if (listElements[i] < pivo)
                            {
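                                // atomicAdd returns the old counter value, giving this thread a unique output slot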
                                list1[atomicAdd(l1_size, 1)] = listElements[i];
                            }
                            else
                            {
                                list2[atomicAdd(l2_size, 1)] = listElements[i];
                            }
                    }
                    """
            SIZE = len(listElements)

            listElements = np.asarray(listElements)
            listElements = listElements.astype(np.int32)
            lista_gpu = cuda.mem_alloc(listElements.nbytes)
            cuda.memcpy_htod(lista_gpu, listElements)

            list1_gpu = cuda.mem_alloc(listElements.nbytes)
            list2_gpu = cuda.mem_alloc(listElements.nbytes)
            l1_size   = cuda.mem_alloc(4)
            l2_size   = cuda.mem_alloc(4)
            BLOCK_SIZE = 256
            NUM_BLOCKS = (SIZE + BLOCK_SIZE - 1) // BLOCK_SIZE
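            # launch enough threads to cover the whole array, one element per thread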
            kernel_code = kernel_code_template % {
                'ARRAY_SIZE': SIZE
            }

            mod = compiler.SourceModule(kernel_code)
            arraysQuick = mod.get_function("separateQuick")
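            # zero the device-side counters before launch; the kernel now only increments them via atomicAdd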
            l1_sh = np.zeros(1, dtype = np.int32)
            l2_sh = np.zeros(1, dtype = np.int32)
            cuda.memcpy_htod(l1_size, l1_sh)
            cuda.memcpy_htod(l2_size, l2_sh)
            arraysQuick(lista_gpu, list1_gpu, list2_gpu, l1_size, l2_size, pivo, block=(BLOCK_SIZE, 1, 1), grid=(NUM_BLOCKS, 1))
            cuda.memcpy_dtoh(l1_sh, l1_size)
            cuda.memcpy_dtoh(l2_sh, l2_size)
            list1 = np.zeros(l1_sh, dtype=np.int32)
            list2 = np.zeros(l2_sh, dtype=np.int32)
            cuda.memcpy_dtoh(list1, list1_gpu)
            cuda.memcpy_dtoh(list2, list2_gpu)
            list1 = list1.tolist()
            list2 = list2.tolist()
            return quickSort_paralleloGlobal(list1) + [pivo] + quickSort_paralleloGlobal(list2)

print(quickSort_paralleloGlobal([1, 5, 4, 2, 0]))
$ python t18.py
[0, 1, 2, 4, 5]
$
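As a quick sanity check (this is not part of the original post and assumes a working CUDA/PyCUDA installation; the test values are arbitrary), the result can be compared against Python's built-in sorted():

import random

data = [random.randint(0, 1000) for _ in range(200)]
result = quickSort_paralleloGlobal(list(data))   # pass a copy, since the function pops the pivot in place
assert result == sorted(data)
print("ok:", len(result), "elements sorted")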