How do I synchronize the CPU and GPU using a fence in DirectX/Direct3D 12?


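I'm starting to learn Direct3D 12 and am having trouble understanding CPU-GPU synchronization. As far as I understand, a fence (ID3D12Fence) is nothing more than a UINT64 (unsigned long long) value used as a counter, but its methods confuse me. Below is part of the source code from a D3D12 sample.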


mCommandQueue->Signal(mFence.Get(), mCurrentFence) sets the fence value to mCurrentFence once all commands previously queued on the command queue have been executed. In this case, the "specified value" is mCurrentFence.

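At startup, both the fence and mCurrentFence are 0. Next, mCurrentFence is set to 1. We then call mCommandQueue->Signal(mFence.Get(), 1), which sets the fence to 1 as soon as everything submitted to that queue has been executed. Finally, we call mFence->SetEventOnCompletion(1, eventHandle) followed by WaitForSingleObject to wait until the fence reaches 1.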

On the next iteration, replace 1 with 2, and so on.

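Note that mCommandQueue->Signal is a non-blocking operation: it does not set the fence value immediately, but only after all GPU commands queued before it have been executed. In this example you can generally assume that mFence->GetCompletedValue() < mCurrentFence is true at the time of the check.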
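To make that ordering concrete, here is a small CPU-only model of the behavior described above. It does not use Direct3D at all: ModelFence, ModelQueue, execute, signal and runNext are hypothetical names invented for this sketch, and the modeled "GPU" only makes progress when runNext is called.

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for ID3D12Fence: just a 64-bit counter.
struct ModelFence {
    std::uint64_t completed = 0;
    std::uint64_t getCompletedValue() const { return completed; }
};

// Hypothetical stand-in for a command queue. Items run strictly in
// submission order, and only when runNext() is called.
class ModelQueue {
public:
    // Enqueue some GPU work (models ExecuteCommandLists).
    void execute(std::function<void()> work) { items_.push_back(std::move(work)); }

    // Enqueue a fence update (models Signal). Returns immediately;
    // the fence is NOT written here, only when the item is reached.
    void signal(ModelFence& fence, std::uint64_t value) {
        items_.push_back([&fence, value] { fence.completed = value; });
    }

    // Let the modeled GPU process one queued item. Returns false when idle.
    bool runNext() {
        if (items_.empty()) return false;
        auto item = std::move(items_.front());
        items_.pop_front();
        item();
        return true;
    }

private:
    std::deque<std::function<void()>> items_;
};

struct Observed {
    std::uint64_t afterSignal;  // fence value right after signal() returned
    std::uint64_t afterDrain;   // fence value after the queue was drained
    int drawsDone;              // work items executed before the fence changed
};

inline Observed demoSignalOrdering() {
    ModelFence fence;
    ModelQueue queue;
    int draws = 0;
    queue.execute([&draws] { ++draws; });
    queue.execute([&draws] { ++draws; });
    queue.signal(fence, 1);

    Observed r{};
    r.afterSignal = fence.getCompletedValue();  // nothing has run yet
    while (queue.runNext()) {}                  // modeled GPU catches up
    r.afterDrain = fence.getCompletedValue();
    r.drawsDone = draws;
    return r;
}
```

In this model the fence still reads 0 right after signal() returns and only becomes 1 once both earlier work items have run, which is the reason the real code has to wait on the fence instead of assuming it was updated by the Signal call.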

Why is the mCurrentFence value needed?

I don't think it is strictly needed, but tracking the fence value this way avoids an additional API call. In this case, you could also do the following:

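// Retrieve the last value of the fence and increment it by one (additional API call).
auto nextFence = mFence->GetCompletedValue() + 1;
ThrowIfFailed(mCommandQueue->Signal(mFence.Get(), nextFence));

// Wait until the GPU has completed commands up to this fence point.
if (mFence->GetCompletedValue() < nextFence)
{
    HANDLE eventHandle = CreateEventEx(nullptr, nullptr, 0, EVENT_ALL_ACCESS);
    ThrowIfFailed(mFence->SetEventOnCompletion(nextFence, eventHandle));
    WaitForSingleObject(eventHandle, INFINITE);
    CloseHandle(eventHandle);
}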
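One caveat, as far as I can tell: deriving the next value from GetCompletedValue() is only safe when no earlier Signal is still pending, which holds for the snippet above because it waits right after signaling. If two batches are signaled without a wait in between, the completed value has not moved yet, so both get the same fence value and a wait on the second can return after only the first has finished. The sketch below models this with plain counters; ModelGpu and the helper functions are hypothetical names, not Direct3D API.

```cpp
#include <cstdint>
#include <deque>

// Hypothetical model: the fence is a counter, and the queue holds fence
// values that were signaled but not yet reached by the GPU.
struct ModelGpu {
    std::uint64_t completed = 0;        // what GetCompletedValue() would return
    std::deque<std::uint64_t> pending;  // queued Signal values, in order

    void signal(std::uint64_t value) { pending.push_back(value); }

    // The modeled GPU reaches the next queued Signal.
    void reachNextSignal() {
        if (pending.empty()) return;
        completed = pending.front();
        pending.pop_front();
    }
};

// Signals batch A, then batch B, then lets the GPU reach only A's signal.
// Returns true if a wait on B's value would already be satisfied.
template <typename NextValue>
bool waitOnBReturnsEarly(NextValue next) {
    ModelGpu gpu;
    gpu.signal(next(gpu));              // batch A
    const std::uint64_t b = next(gpu);  // value chosen for batch B
    gpu.signal(b);
    gpu.reachNextSignal();              // GPU finished A only
    return gpu.completed >= b;          // what the "if" check would see
}

// Strategy from the snippet above: completed value + 1.
inline bool derivedValueIsUnsafe() {
    return waitOnBReturnsEarly([](const ModelGpu& g) { return g.completed + 1; });
}

// Strategy from the sample: a CPU-side counter (the role of mCurrentFence).
inline bool trackedValueIsUnsafe() {
    std::uint64_t current = 0;
    return waitOnBReturnsEarly([&current](const ModelGpu&) { return ++current; });
}
```

With the derived value both batches end up with fence value 1, so the check for batch B passes too early; with the tracked counter batch B gets 2 and the check correctly fails until the GPU reaches the second signal.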

As an addition to Felix's answer:

Tracking the fence value (e.g. mCurrentFence) is useful for waiting on more specific points in the command queue.

For example, suppose we are using this setup:

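ComPtr<ID3D12CommandQueue> queue;
ComPtr<ID3D12Fence> queueFence;
UINT64 fenceVal = 0;
HANDLE fenceEv = nullptr; // event used by waitFor; assumed to be created at startup, e.g. with CreateEvent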

UINT64 incrementFence()
{
    fenceVal++;
    queue->Signal(queueFence.Get(), fenceVal); // CHECK HRESULT
    return fenceVal;
}

void waitFor(UINT64 fenceVal, DWORD timeout = INFINITE)
{
    if (queueFence->GetCompletedValue() < fenceVal)
    {
        queueFence->SetEventOnCompletion(fenceVal, fenceEv); // CHECK HRESULT
        WaitForSingleObject(fenceEv, timeout);
    }
}

This example is a little more involved than the original question, but I think it matters when asking about tracking mCurrentFence.
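As a way of splitting the submit part from the wait part, could it be coded like this?

void SynchronizeWithGPU()
{
    if (mFence->GetCompletedValue() < m_nextFence)
    {
        HANDLE eventHandle = CreateEventEx(nullptr, nullptr, 0, EVENT_ALL_ACCESS);
        ThrowIfFailed(mFence->SetEventOnCompletion(m_nextFence, eventHandle));
        WaitForSingleObject(eventHandle, INFINITE);
        CloseHandle(eventHandle);
    }
}

And then put the Signal part next to mCommandQueue->ExecuteCommandLists()? Since the signal is not processed immediately, this would seem to leave more time before the GPU executes the Signal command.

That looks fine to me.
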
// Suppose mCurrentFence is 1 after submitting 1 command list (Index 0), and the thread reached to here for the FIRST time
ThrowIfFailed(mCommandQueue->Signal(mFence.Get()));
// At this point Fence value inside mFence is updated
if (m_Fence->GetCompletedValue() < mCurrentFence)
{
...
}
mCommandQueue->ExecuteCommandLists(_countof(cmdsLists), cmdsLists);
mCurrentFence++;
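
With the incrementFence and waitFor helpers defined earlier, waiting on specific points looks like this:

SUBMIT COMMANDS 1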
cmds1Complete = incrementFence();
    .
    . <- CPU STUFF
    .
SUBMIT COMMANDS 2
cmds2Complete = incrementFence();
    .
    . <- CPU STUFF
    .
waitFor(cmds1Complete)
    .
    . <- CPU STUFF (that needs COMMANDS 1 to be complete,
      but COMMANDS 2 is NOT required to be completed [but also could be])
    .
waitFor(cmds2Complete)
    .
    . <- EVERYTHING COMPLETE
    .

If specific wait points are not needed, a full flush can be defined as:

void flushCmdQueue()
{
    waitFor(incrementFence());
}
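
For anyone who wants to try the wait-point pattern without a GPU, here is a rough, portable mock. It is only a sketch under stated assumptions: MockQueue is a hypothetical class in which a worker thread plays the GPU and a condition variable stands in for the event set by SetEventOnCompletion; incrementFence, waitFor and flush mirror the helpers above but are not Direct3D calls. It also assumes a single submitting thread.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical mock of a command queue plus its fence. The worker thread
// runs queued items strictly in submission order.
class MockQueue {
public:
    MockQueue() : worker_([this] { run(); }) {}
    ~MockQueue() {
        { std::lock_guard<std::mutex> lock(m_); stop_ = true; }
        workCv_.notify_all();
        worker_.join();  // the worker drains remaining items first
    }

    // Models ExecuteCommandLists: enqueue work and return immediately.
    void submit(std::function<void()> work) {
        { std::lock_guard<std::mutex> lock(m_); items_.push_back(std::move(work)); }
        workCv_.notify_one();
    }

    // Models Signal with a CPU-tracked counter. The returned value marks
    // "everything submitted so far has finished".
    std::uint64_t incrementFence() {
        const std::uint64_t value = ++fenceVal_;
        submit([this, value] {
            { std::lock_guard<std::mutex> lock(m_); completed_ = value; }
            fenceCv_.notify_all();
        });
        return value;
    }

    // Models SetEventOnCompletion + WaitForSingleObject.
    void waitFor(std::uint64_t value) {
        std::unique_lock<std::mutex> lock(m_);
        fenceCv_.wait(lock, [&] { return completed_ >= value; });
    }

    void flush() { waitFor(incrementFence()); }

private:
    void run() {
        for (;;) {
            std::function<void()> item;
            {
                std::unique_lock<std::mutex> lock(m_);
                workCv_.wait(lock, [&] { return stop_ || !items_.empty(); });
                if (items_.empty()) return;  // stop requested, nothing left
                item = std::move(items_.front());
                items_.pop_front();
            }
            item();  // run outside the lock
        }
    }

    std::mutex m_;
    std::condition_variable workCv_, fenceCv_;
    std::deque<std::function<void()>> items_;
    std::uint64_t fenceVal_ = 0;   // CPU-side counter
    std::uint64_t completed_ = 0;  // value the mock GPU has reached
    bool stop_ = false;
    std::thread worker_;           // declared last so it starts after the rest
};

struct DemoResult { bool batch1AfterWait1, batch2AfterWait1, batch2AfterWait2; };

inline DemoResult demoWaitPoints() {
    std::atomic<bool> batch1Done{false}, batch2Done{false}, gateOpen{false};
    DemoResult r{};
    MockQueue queue;

    queue.submit([&] { batch1Done = true; });          // SUBMIT COMMANDS 1
    const std::uint64_t cmds1Complete = queue.incrementFence();
    queue.submit([&] {                                 // SUBMIT COMMANDS 2
        while (!gateOpen) std::this_thread::yield();   // held back on purpose
        batch2Done = true;
    });
    const std::uint64_t cmds2Complete = queue.incrementFence();

    queue.waitFor(cmds1Complete);
    r.batch1AfterWait1 = batch1Done;  // COMMANDS 1 is complete here
    r.batch2AfterWait1 = batch2Done;  // COMMANDS 2 is still held back in this demo
    gateOpen = true;
    queue.waitFor(cmds2Complete);
    r.batch2AfterWait2 = batch2Done;  // now everything is complete
    return r;
}
```

The second batch is deliberately blocked until after the first wait returns, so the demo shows waitFor(cmds1Complete) coming back while later work is still outstanding. In real code the second batch may or may not have finished at that point.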