F# Alea not disposing of memory properly (tags: F#, aleagpu)


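The following F# code crashes on the third call with an out-of-memory exception. Either I am missing something, or Alea is for some reason not freeing memory correctly. I have tried it both in F# Interactive and as a compiled program. I also tried calling Dispose manually, but that did not help. Any idea why?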

let squareGPU (inputs:float[]) =
        use dInputs = worker.Malloc(inputs)
        use dOutputs = worker.Malloc(inputs.Length)
        let blockSize = 256
        let numSm = worker.Device.Attributes.MULTIPROCESSOR_COUNT
        let gridSize = Math.Min(16 * numSm, divup inputs.Length blockSize)
        let lp = new LaunchParam(gridSize, blockSize)
        worker.Launch <@ squareKernel @> lp dOutputs.Ptr dInputs.Ptr inputs.Length
        dOutputs.Gather()


let x = squareGPU [|0.0..0.001..100000.0|]
printfn "1" 
let y = squareGPU [|0.0..0.001..100000.0|]
printfn "2" 
let z = squareGPU [|0.0..0.001..100000.0|]
printfn "3"

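I assume you got a System.OutOfMemoryException, right? That does not mean the GPU device ran out of memory; it means the host ran out of memory. In your example you create a fairly large array on the host, compute on it, and then gather another large array as the output. The key point is that you store the output arrays under different variable names (x, y and z), so the GC never gets a chance to free them, and eventually you run out of host memory.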
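To put a number on "fairly large", here is a rough sketch that estimates the host memory one such array needs without allocating it. The helper names estimateFloatArrayBytes and toMegabytes are made up for this illustration, and the element count is an approximation of what the range expression produces.

```fsharp
// Rough host-memory estimate for an array built as [|0.0 .. step .. stop|].
// estimateFloatArrayBytes is a hypothetical helper, not part of Alea or FSharp.Core.
let estimateFloatArrayBytes (stop: float) (step: float) : int64 =
    // Approximate element count; rounding absorbs floating-point error in the division.
    let count = int64 (System.Math.Round(stop / step)) + 1L
    // Each F# float is a 64-bit double, i.e. 8 bytes.
    count * int64 sizeof<float>

let toMegabytes (bytes: int64) = float bytes / (1024.0 * 1024.0)

printfn "stop = 100000: about %.0f MB per array" (toMegabytes (estimateFloatArrayBytes 100000.0 0.001))
printfn "stop = 30000:  about %.0f MB per array" (toMegabytes (estimateFloatArrayBytes 30000.0 0.001))
```

With a stop value of 100000 each array holds about 100 million doubles, so every call to squareGPU needs one such array for the input and another for the gathered output, and x, y and z each keep an output alive.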

I ran a very simple test (using a stop value of 30000 instead of the 100000 in your example) that uses only host code and no GPU code:

let x1 = [|0.0..0.001..30000.0|]
printfn "1" 
let x2 = [|0.0..0.001..30000.0|]
printfn "2" 
let x3 = [|0.0..0.001..30000.0|]
printfn "3"
let x4 = [|0.0..0.001..30000.0|]
printfn "4"
let x5 = [|0.0..0.001..30000.0|]
printfn "5"
let x6 = [|0.0..0.001..30000.0|]
printfn "6"

I ran this code in F# Interactive (a 32-bit process) and got the following result:

Microsoft (R) F# Interactive version 12.0.30815.0
Copyright (c) Microsoft Corporation. All Rights Reserved.

For help type #help;;

> 
1
2
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.Collections.Generic.List`1.set_Capacity(Int32 value)
   at System.Collections.Generic.List`1.EnsureCapacity(Int32 min)
   at System.Collections.Generic.List`1.Add(T item)
   at Microsoft.FSharp.Collections.SeqModule.ToArray[T](IEnumerable`1 source)
   at <StartupCode$FSI_0002>.$FSI_0002.main@() in C:\Users\Xiang\Documents\Inbox\ConsoleApplication6\Script1.fsx:line 32
Stopped due to error
> 
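
If I add the GPU kernel, it still works. Here every result is bound to the same name x inside a function, so the earlier output arrays are no longer referenced and can be collected: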

let foo() =
    let x = squareGPU [|0.0..0.001..30000.0|]
    printfn "1" 
    let x = squareGPU [|0.0..0.001..30000.0|]
    printfn "2" 
    let x = squareGPU [|0.0..0.001..30000.0|]
    printfn "3"
    let x = squareGPU [|0.0..0.001..30000.0|]
    printfn "4"
    let x = squareGPU [|0.0..0.001..30000.0|]
    printfn "5"
    let x = squareGPU [|0.0..0.001..30000.0|]
    printfn "6"
    let x = squareGPU [|0.0..0.001..30000.0|]
    printfn "7"
    let x = squareGPU [|0.0..0.001..30000.0|]
    printfn "8"

> foo();;
1
2
3
4
5
6
7
8
val it : unit = ()
> 

Alternatively, you can try running in a 64-bit process.
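Before changing any project settings, it can help to confirm which kind of process the code is actually running in. A minimal sketch using the standard .NET Environment and IntPtr members:

```fsharp
open System

// True when the current process is 64-bit and can address more than 4 GB.
let is64Bit = Environment.Is64BitProcess

// IntPtr.Size is 8 in a 64-bit process and 4 in a 32-bit one.
printfn "64-bit process: %b (pointer size %d bytes)" is64Bit IntPtr.Size
```

Running this in the same F# Interactive session or test runner that hits the exception shows whether the 32-bit address space limit applies.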

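The GC works on a separate background thread, so if you allocate new large arrays frequently, it is easy to hit an out-of-memory exception before the memory has been reclaimed.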
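The reachability point can be sketched on the host alone. In the snippet below, sumOfSquares and managedBytesAfterFullCollection are names invented for this example; the forced GC.Collect calls are there only to make the measurement repeatable and are not a recommendation for production code.

```fsharp
open System

// Allocates a large array, uses it, and returns only a small summary,
// so no reference to the array survives the call.
let sumOfSquares (n: int) =
    let data = Array.init n float
    data |> Array.sumBy (fun v -> v * v)

// Forces a full collection and reports the managed heap size afterwards.
let managedBytesAfterFullCollection () =
    GC.Collect()
    GC.WaitForPendingFinalizers()
    GC.Collect()
    GC.GetTotalMemory(true)

let before = managedBytesAfterFullCollection ()
let total = sumOfSquares 1000000
let after = managedBytesAfterFullCollection ()
printfn "sum = %g, managed heap delta = %d bytes" total (after - before)
```

Because the array is local to sumOfSquares, the heap delta should stay small. If the array were instead bound to a top-level name in an F# Interactive session, it would remain reachable and could not be collected for as long as that binding is alive.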

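For arrays this large, I suggest an "in-place modification" style, which is more stable. I wrote a test to demonstrate it; the full squareGPUManyTimes listing appears at the end of this page. (Note that because the array is very large, you should go to the project properties page and, on the Build tab, uncheck "Prefer 32-bit" to make sure it runs as a 64-bit process.)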

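Comments:

I know nothing about aleagpu, but I find it odd that you call dOutputs.Gather() and not dInputs.Gather(). I assume this does the cleanup after the call to Malloc().

No, Gather() just copies data from the GPU to host memory.

You are right. The way I originally tested it in interactive mode was to call squareGPU [|0.0..0.001..100000.0|] repeatedly without any let binding. For some reason, when you repeat that on the same line, it does not trigger the GC. Even evaluating [|0.0..0.001..30000.0|] on its own, without any function call, makes memory pile up. I wonder what is going on?

Hmm, then I don't know what is going on. As you can see, I ran it 8 times and it worked fine. Would you mind posting your test that does not have the host memory problem?

Your example works for me, and memory is indeed released properly once the variables go out of scope. If you open Task Manager while evaluating [|0.0..0.001..30000.0|] [|0.0..0.001..30000.0|] directly in F# Interactive, rather than inside a function, you will see what I mean: memory usage goes up but is never released, because the value never goes out of scope. As you said in your post below, the GC works on a separate thread. Thanks for the lesson. I wish I had the rep so I could upvote you.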

The in-place update test from the answer:

open System
open Alea.CUDA
open Alea.CUDA.Utilities
open NUnit.Framework

[<ReflectedDefinition>]
let squareKernel (outputs:deviceptr<float>) (inputs:deviceptr<float>) (n:int) =
    let start = blockIdx.x * blockDim.x + threadIdx.x
    let stride = gridDim.x * blockDim.x
    let mutable i = start 
    while i < n do
        outputs.[i] <- inputs.[i] * inputs.[i]
        i <- i + stride

let squareGPUInplaceUpdate (worker:Worker) (lp:LaunchParam) (hData:float[]) (dData:DeviceMemory<float>) =
    // Instead of allocating new device memory on every call, reuse the existing
    // device buffer dData and scatter the new host data into it.
    dData.Scatter(hData)
    worker.Launch <@ squareKernel @> lp dData.Ptr dData.Ptr hData.Length
    // Ideally there would be a dData.Gather(hData) counterpart to dData.Scatter(hData),
    // but it is missing, so worker.Gather is used as a workaround.
    worker.Gather(dData.Ptr, hData)

let squareGPUManyTimes (iters:int) =
    let worker = Worker.Default

    // Across all iterations only two host arrays (data and expected values) and one
    // device array are allocated. They are reused because they are large.
    // Allocating huge arrays very frequently puts the GC under pressure, and since the GC
    // works on a separate thread, you would get System.OutOfMemoryException from time to time.
    let hData = [|0.0..0.001..100000.0|]
    let n = hData.Length
    let expected = Array.zeroCreate n
    use dData = worker.Malloc<float>(n)

    let rng = Random()
    let update () =
        // update the data in place
        for i = 0 to n - 1 do
            hData.[i] <- rng.NextDouble()
            expected.[i] <- hData.[i] * hData.[i]

    let lp =
        let blockSize = 256
        let numSm = worker.Device.Attributes.MULTIPROCESSOR_COUNT
        let gridSize = Math.Min(16 * numSm, divup n blockSize)
        new LaunchParam(gridSize, blockSize)

    for i = 1 to iters do
        update()
        squareGPUInplaceUpdate worker lp hData dData
        Assert.AreEqual(expected, hData)
        printfn "iter %d passed..." i

[<Test>]
let test() =
    squareGPUManyTimes 5
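
The same reuse idea can be sketched on the host alone, without Alea or a GPU. This is a minimal illustration, assuming a small buffer of 1000 elements; squareInPlace is a name made up for the example.

```fsharp
// Squares every element of an existing buffer in place; no new array is allocated.
let squareInPlace (data: float[]) =
    for i = 0 to data.Length - 1 do
        data.[i] <- data.[i] * data.[i]

// Allocate the buffer once, then refill and reuse it on every iteration.
let buffer : float[] = Array.zeroCreate 1000
for iter = 1 to 5 do
    for i = 0 to buffer.Length - 1 do
        buffer.[i] <- float (i * iter)
    squareInPlace buffer
    printfn "iter %d: last element = %g" iter buffer.[buffer.Length - 1]
```

Only one array is ever allocated, so the number of iterations has no effect on peak memory use. The GPU version above does the same with hData and dData.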