Objective-C: How to use VideoToolbox to decompress an H.264 video stream
I had a hard time figuring out how to use Apple's hardware-accelerated video framework to decompress an H.264 video stream. After a few weeks I figured it out, and I wanted to share an extensive example since I couldn't find one.

My goal is to give a thorough, instructive example of Video Toolbox (introduced in WWDC 2014's Session 513). My code will not compile or run as-is, because it needs to be integrated with an elementary H.264 stream (such as a video read from a file, streamed from online, etc.) and tweaked for the specific case.

I should mention that I have very little experience with video decoding aside from what I learned while googling the subject. I don't know all the details about video formats, parameter structures, etc., so I've only included what I think you need to know.

I am using Xcode 6.2 and have deployed to iOS devices running iOS 8.1 and 8.2.

Concepts:

NALUs: A NALU is simply a chunk of data of varying length that has a NALU start-code header 0x00 00 00 01 YY, where the first 5 bits of YY tell you what type of NALU this is, and therefore what type of data follows the header. (Since you only need those 5 bits, I use YY & 0x1F to get just the relevant bits.) I list all of these types in the NSString * const naluTypesStrings[] array below, but you don't need to know what they all are.
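For example, pulling the type out of a NALU that begins with a 4-byte start code is just this (a minimal sketch, using the same masking as the code below):

// assumes `frame` points at a NALU with a 4-byte start code: 0x00 00 00 01 YY ...
uint8_t headerByte = frame[4];     // YY, the first byte after the start code
int naluType = headerByte & 0x1F;  // low 5 bits = NALU type (e.g. 7 = SPS, 8 = PPS, 5 = IDR)
NSLog(@"NALU type: %d", naluType);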
Parameters: The decoder needs parameters so it knows how the H.264 video data is stored. The 2 you need to set are the Sequence Parameter Set (SPS) and the Picture Parameter Set (PPS), and they each have their own NALU type number. You don't need to know what the parameters mean; the decoder knows what to do with them.
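For reference, these two parameter sets are handed to the decoder through CMVideoFormatDescriptionCreateFromH264ParameterSets( ), which the full code below also calls; here it is in isolation (a sketch, assuming sps/pps point at the raw parameter-set bytes with the start codes already stripped):

const uint8_t * const parameterSetPointers[2] = { sps, pps };
size_t parameterSetSizes[2] = { spsSize, ppsSize };
CMVideoFormatDescriptionRef formatDesc = NULL;
OSStatus status = CMVideoFormatDescriptionCreateFromH264ParameterSets(
    kCFAllocatorDefault,
    2,                    // parameter set count (SPS + PPS)
    parameterSetPointers,
    parameterSetSizes,
    4,                    // NAL unit header length used by the AVCC stream later
    &formatDesc);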
H.264 stream format: In most H.264 streams, you will receive an initial set of PPS and SPS parameters followed by an i-frame (also called an IDR frame or flush frame) NALU. Then you will receive several P-frame NALUs (maybe a few dozen or so), then another set of parameters (which may be the same as the initial parameters) and an i-frame, more P-frames, and so on. i-frames are much bigger than P-frames. Conceptually, you can think of the i-frame as the entire image of the video, and the P-frames as just the changes made to that i-frame, until you receive the next i-frame.
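To make that ordering concrete, a typical elementary stream arrives roughly like this (the exact cadence depends on the encoder; this is just an illustration):

[0x00 00 00 01][SPS, type 7]
[0x00 00 00 01][PPS, type 8]
[0x00 00 00 01][IDR i-frame, type 5]       <- a complete picture
[0x00 00 00 01][non-IDR P-frame, type 1]   <- changes relative to prior frames
[0x00 00 00 01][non-IDR P-frame, type 1]
... dozens more P-frames ...
[0x00 00 00 01][SPS, type 7]               <- parameters repeat, then the pattern starts over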
Procedure:

- Generate individual NALUs from your H.264 stream. My method receivedRawVideoFrame: is called every time I receive a frame (uint8_t *frame), which is one of the two types. In the diagram, those two frame types are the two big purple boxes.

- Create a CMVideoFormatDescriptionRef from your SPS and PPS NALUs. A CMVideoFormatDescriptionRef is the description of your video data, such as width/height, format type (kCMPixelFormat_32BGRA, kCMVideoCodecType_H264, etc.), aspect ratio, color space, and so on. The decoder will hold onto these parameters until a new set arrives (sometimes parameters are resent regularly even when they haven't changed).

- Repackage your frame NALUs in the "AVCC" format: the start code is replaced with a 4-byte header that states the length of the NALU. Since that length is a UInt32 value, it must be byte-swapped with CFSwapInt32 before being copied into the CMBlockBuffer. (I do this in my code with the htonl function call; there is a short sketch of this step right after this list.)

- Package the frame NALUs into CMBlockBuffers. All you need to know about CMBlockBuffers is that they are a way of packaging arbitrary blocks of data in Core Media. (Any compressed video data in the video pipeline is packaged in one.)

- Package the CMBlockBuffers into CMSampleBuffers. All you need to know about CMSampleBuffers is that they wrap up our CMBlockBuffers with other information (here that would be the CMVideoFormatDescription and CMTime, if CMTime is used).

- Create a VTDecompressionSessionRef and feed the sample buffers into VTDecompressionSessionDecodeFrame( ). Alternatively, you can use AVSampleBufferDisplayLayer and its enqueueSampleBuffer: method, in which case you don't need a VTDecompressionSession at all. It is simpler to set up, but it will not throw errors if something goes wrong the way the VTD will.

- In the VTD callback, use the resulting CVImageBufferRef to display the video frame. If you need to convert your CVImageBuffer to a UIImage, see my other StackOverflow answer on that.

Other notes:

- H.264 streams can vary a lot. From what I learned, NALU start-code headers are sometimes 3 bytes (0x00 00 01) and sometimes 4 bytes (0x00 00 00 01). My code works with 4-byte start codes; you will need to change a few things if you work with 3-byte ones.

- If you want to know more about NALUs, I found this answer very helpful.
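Since the start-code-to-length swap is the least obvious step, here it is in isolation (a minimal sketch of what the full code below does in place; `data` and `blockLength` match the names used there):

// `data` holds one NALU that still begins with its 4-byte start code 0x00 00 00 01.
// AVCC wants [4-byte big-endian payload length][NALU payload] instead.
uint32_t dataLength32 = htonl((uint32_t)(blockLength - 4)); // host -> big-endian ("network") byte order
memcpy(data, &dataLength32, sizeof(uint32_t));              // overwrite the start code with the length

With those concepts in place, here is the code, starting with the framework import and instance variables: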
#import <VideoToolbox/VideoToolbox.h>

@property (nonatomic, assign) CMVideoFormatDescriptionRef formatDesc;
@property (nonatomic, assign) VTDecompressionSessionRef decompressionSession;
@property (nonatomic, retain) AVSampleBufferDisplayLayer *videoLayer;
@property (nonatomic, assign) int spsSize;
@property (nonatomic, assign) int ppsSize;
NSString * const naluTypesStrings[] =
{
    @"0: Unspecified (non-VCL)",
    @"1: Coded slice of a non-IDR picture (VCL)",    // P frame
    @"2: Coded slice data partition A (VCL)",
    @"3: Coded slice data partition B (VCL)",
    @"4: Coded slice data partition C (VCL)",
    @"5: Coded slice of an IDR picture (VCL)",       // I frame
    @"6: Supplemental enhancement information (SEI) (non-VCL)",
    @"7: Sequence parameter set (non-VCL)",          // SPS parameter
    @"8: Picture parameter set (non-VCL)",           // PPS parameter
    @"9: Access unit delimiter (non-VCL)",
    @"10: End of sequence (non-VCL)",
    @"11: End of stream (non-VCL)",
    @"12: Filler data (non-VCL)",
    @"13: Sequence parameter set extension (non-VCL)",
    @"14: Prefix NAL unit (non-VCL)",
    @"15: Subset sequence parameter set (non-VCL)",
    @"16: Reserved (non-VCL)",
    @"17: Reserved (non-VCL)",
    @"18: Reserved (non-VCL)",
    @"19: Coded slice of an auxiliary coded picture without partitioning (non-VCL)",
    @"20: Coded slice extension (non-VCL)",
    @"21: Coded slice extension for depth view components (non-VCL)",
    @"22: Reserved (non-VCL)",
    @"23: Reserved (non-VCL)",
    @"24: STAP-A Single-time aggregation packet (non-VCL)",
    @"25: STAP-B Single-time aggregation packet (non-VCL)",
    @"26: MTAP16 Multi-time aggregation packet (non-VCL)",
    @"27: MTAP24 Multi-time aggregation packet (non-VCL)",
    @"28: FU-A Fragmentation unit (non-VCL)",
    @"29: FU-B Fragmentation unit (non-VCL)",
    @"30: Unspecified (non-VCL)",
    @"31: Unspecified (non-VCL)",
};
-(void) receivedRawVideoFrame:(uint8_t *)frame withSize:(uint32_t)frameSize isIFrame:(int)isIFrame
{
    OSStatus status = noErr;  // initialized, since P-frames skip the paths above that set it

    uint8_t *data = NULL;
    uint8_t *pps = NULL;
    uint8_t *sps = NULL;

    // I know what my H.264 data source's NALUs look like so I know start code index is always 0.
    // if you don't know where it starts, you can use a for loop similar to how I find the 2nd and 3rd start codes
    int startCodeIndex = 0;
    int secondStartCodeIndex = 0;
    int thirdStartCodeIndex = 0;

    long blockLength = 0;

    CMSampleBufferRef sampleBuffer = NULL;
    CMBlockBufferRef blockBuffer = NULL;

    int nalu_type = (frame[startCodeIndex + 4] & 0x1F);
    NSLog(@"~~~~~~~ Received NALU Type \"%@\" ~~~~~~~~", naluTypesStrings[nalu_type]);

    // if we haven't already set up our format description with our SPS PPS parameters, we
    // can't process any frames except type 7 that has our parameters
    if (nalu_type != 7 && _formatDesc == NULL)
    {
        NSLog(@"Video error: Frame is not an I Frame and format description is null");
        return;
    }

    // NALU type 7 is the SPS parameter NALU
    if (nalu_type == 7)
    {
        // find where the second PPS start code begins (the 0x00 00 00 01 code)
        // from which we also get the length of the first SPS code
        for (int i = startCodeIndex + 4; i < startCodeIndex + 40; i++)
        {
            if (frame[i] == 0x00 && frame[i+1] == 0x00 && frame[i+2] == 0x00 && frame[i+3] == 0x01)
            {
                secondStartCodeIndex = i;
                _spsSize = secondStartCodeIndex;   // includes the header in the size
                break;
            }
        }

        // find what the second NALU type is
        nalu_type = (frame[secondStartCodeIndex + 4] & 0x1F);
        NSLog(@"~~~~~~~ Received NALU Type \"%@\" ~~~~~~~~", naluTypesStrings[nalu_type]);
    }

    // type 8 is the PPS parameter NALU
    if (nalu_type == 8)
    {
        // find where the NALU after this one starts so we know how long the PPS parameter is
        for (int i = _spsSize + 4; i < _spsSize + 30; i++)
        {
            if (frame[i] == 0x00 && frame[i+1] == 0x00 && frame[i+2] == 0x00 && frame[i+3] == 0x01)
            {
                thirdStartCodeIndex = i;
                _ppsSize = thirdStartCodeIndex - _spsSize;
                break;
            }
        }

        // allocate enough data to fit the SPS and PPS parameters into our data objects.
        // VTD doesn't want you to include the start code header (4 bytes long) so we subtract 4 here
        sps = malloc(_spsSize - 4);
        pps = malloc(_ppsSize - 4);

        // copy in the actual sps and pps values, again ignoring the 4 byte header
        memcpy(sps, &frame[4], _spsSize - 4);
        memcpy(pps, &frame[_spsSize + 4], _ppsSize - 4);

        // now we set our H264 parameters
        uint8_t *parameterSetPointers[2] = {sps, pps};
        size_t parameterSetSizes[2] = {_spsSize - 4, _ppsSize - 4};

        // suggestion from @Kris Dude's answer below
        if (_formatDesc)
        {
            CFRelease(_formatDesc);
            _formatDesc = NULL;
        }

        status = CMVideoFormatDescriptionCreateFromH264ParameterSets(kCFAllocatorDefault, 2,
                                                (const uint8_t *const*)parameterSetPointers,
                                                parameterSetSizes, 4,
                                                &_formatDesc);

        NSLog(@"\t\t Creation of CMVideoFormatDescription: %@", (status == noErr) ? @"successful!" : @"failed...");
        if (status != noErr) NSLog(@"\t\t Format Description ERROR type: %d", (int)status);

        // See if decomp session can convert from previous format description
        // to the new one, if not we need to remake the decomp session.
        // This snippet was not necessary for my applications but it could be for yours
        /*BOOL needNewDecompSession = (VTDecompressionSessionCanAcceptFormatDescription(_decompressionSession, _formatDesc) == NO);
        if (needNewDecompSession)
        {
            [self createDecompSession];
        }*/

        // now let's handle the IDR frame that (should) come after the parameter sets
        // I say "should" because that's how I expect my H264 stream to work, YMMV
        nalu_type = (frame[thirdStartCodeIndex + 4] & 0x1F);
        NSLog(@"~~~~~~~ Received NALU Type \"%@\" ~~~~~~~~", naluTypesStrings[nalu_type]);
    }

    // create our VTDecompressionSession. This isn't necessary if you choose to use AVSampleBufferDisplayLayer
    if ((status == noErr) && (_decompressionSession == NULL))
    {
        [self createDecompSession];
    }

    // type 5 is an IDR frame NALU. The SPS and PPS NALUs should always be followed by an IDR (or IFrame) NALU, as far as I know
    if (nalu_type == 5)
    {
        // find the offset, or where the SPS and PPS NALUs end and the IDR frame NALU begins
        int offset = _spsSize + _ppsSize;
        blockLength = frameSize - offset;
        data = malloc(blockLength);
        data = memcpy(data, &frame[offset], blockLength);

        // replace the start code header on this NALU with its size.
        // AVCC format requires that you do this.
        // htonl converts the unsigned int from host to network byte order
        uint32_t dataLength32 = htonl(blockLength - 4);
        memcpy(data, &dataLength32, sizeof(uint32_t));

        // create a block buffer from the IDR NALU
        status = CMBlockBufferCreateWithMemoryBlock(NULL, data,   // memoryBlock to hold buffered data
                                                    blockLength,  // block length of the mem block in bytes
                                                    kCFAllocatorNull, NULL,
                                                    0,            // offsetToData
                                                    blockLength,  // dataLength of relevant bytes, starting at offsetToData
                                                    0, &blockBuffer);

        NSLog(@"\t\t BlockBufferCreation: \t %@", (status == kCMBlockBufferNoErr) ? @"successful!" : @"failed...");
    }

    // NALU type 1 is non-IDR (or PFrame) picture
    if (nalu_type == 1)
    {
        // non-IDR frames do not have an offset due to SPS and PPS, so the approach
        // is similar to the IDR frames just without the offset
        blockLength = frameSize;
        data = malloc(blockLength);
        data = memcpy(data, &frame[0], blockLength);

        // again, replace the start header with the size of the NALU
        uint32_t dataLength32 = htonl(blockLength - 4);
        memcpy(data, &dataLength32, sizeof(uint32_t));

        status = CMBlockBufferCreateWithMemoryBlock(NULL, data,   // memoryBlock to hold data. If NULL, block will be alloc when needed
                                                    blockLength,  // overall length of the mem block in bytes
                                                    kCFAllocatorNull, NULL,
                                                    0,            // offsetToData
                                                    blockLength,  // dataLength of relevant data bytes, starting at offsetToData
                                                    0, &blockBuffer);

        NSLog(@"\t\t BlockBufferCreation: \t %@", (status == kCMBlockBufferNoErr) ? @"successful!" : @"failed...");
    }

    // now create our sample buffer from the block buffer
    if (status == noErr)
    {
        // here I'm not bothering with any timing specifics since in my case we displayed all frames immediately
        const size_t sampleSize = blockLength;
        status = CMSampleBufferCreate(kCFAllocatorDefault,
                                      blockBuffer, true, NULL, NULL,
                                      _formatDesc, 1, 0, NULL, 1,
                                      &sampleSize, &sampleBuffer);

        NSLog(@"\t\t SampleBufferCreate: \t %@", (status == noErr) ? @"successful!" : @"failed...");
    }

    if (status == noErr)
    {
        // set some values of the sample buffer's attachments
        CFArrayRef attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, YES);
        CFMutableDictionaryRef dict = (CFMutableDictionaryRef)CFArrayGetValueAtIndex(attachments, 0);
        CFDictionarySetValue(dict, kCMSampleAttachmentKey_DisplayImmediately, kCFBooleanTrue);

        // either send the samplebuffer to a VTDecompressionSession or to an AVSampleBufferDisplayLayer
        [self render:sampleBuffer];
    }

    // free memory to avoid a memory leak; the same goes for sps, pps, and the block buffer
    if (NULL != data)
    {
        free(data);
        data = NULL;
    }
    if (NULL != sps) { free(sps); sps = NULL; }
    if (NULL != pps) { free(pps); pps = NULL; }
    if (NULL != blockBuffer) { CFRelease(blockBuffer); blockBuffer = NULL; }
}
-(void) createDecompSession
{
    // make sure to destroy the old VTD session: invalidate and release it if one exists
    if (_decompressionSession != NULL)
    {
        VTDecompressionSessionInvalidate(_decompressionSession);
        CFRelease(_decompressionSession);
    }
    _decompressionSession = NULL;

    VTDecompressionOutputCallbackRecord callBackRecord;
    callBackRecord.decompressionOutputCallback = decompressionSessionDecodeFrameCallback;

    // this is necessary if you need to make calls to Objective-C "self" from within the callback method
    callBackRecord.decompressionOutputRefCon = (__bridge void *)self;

    // you can set some desired attributes for the destination pixel buffer. I didn't use this but you may.
    // if you need to set some attributes, be sure to uncomment the dictionary in VTDecompressionSessionCreate
    NSDictionary *destinationImageBufferAttributes = [NSDictionary dictionaryWithObjectsAndKeys:
                                                      [NSNumber numberWithBool:YES],
                                                      (id)kCVPixelBufferOpenGLESCompatibilityKey,
                                                      nil];

    OSStatus status = VTDecompressionSessionCreate(NULL, _formatDesc, NULL,
                                                   NULL, // (__bridge CFDictionaryRef)(destinationImageBufferAttributes)
                                                   &callBackRecord, &_decompressionSession);
    NSLog(@"Video Decompression Session Create: \t %@", (status == noErr) ? @"successful!" : @"failed...");
    if (status != noErr) NSLog(@"\t\t VTD ERROR type: %d", (int)status);
}
void decompressionSessionDecodeFrameCallback(void *decompressionOutputRefCon,
                                             void *sourceFrameRefCon,
                                             OSStatus status,
                                             VTDecodeInfoFlags infoFlags,
                                             CVImageBufferRef imageBuffer,
                                             CMTime presentationTimeStamp,
                                             CMTime presentationDuration)
{
    THISCLASSNAME *streamManager = (__bridge THISCLASSNAME *)decompressionOutputRefCon;

    if (status != noErr)
    {
        NSError *error = [NSError errorWithDomain:NSOSStatusErrorDomain code:status userInfo:nil];
        NSLog(@"Decompressed error: %@", error);
    }
    else
    {
        NSLog(@"Decompressed successfully");

        // do something with your resulting CVImageBufferRef that is your decompressed frame
        [streamManager displayDecodedFrame:imageBuffer];
    }
}
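displayDecodedFrame: is whatever your app does with the decoded frame. If what you want is a UIImage, a minimal Core Image version might look like the sketch below (my illustration, not part of the original answer; it mirrors the C# GetImageFromImageBuffer method further down):

- (void)displayDecodedFrame:(CVImageBufferRef)imageBuffer
{
    // the decoder hands us a CVPixelBuffer-backed image buffer
    CIImage *ciImage = [CIImage imageWithCVPixelBuffer:(CVPixelBufferRef)imageBuffer];
    CIContext *context = [CIContext contextWithOptions:nil]; // consider caching this in a real app
    CGRect rect = CGRectMake(0, 0,
                             CVPixelBufferGetWidth(imageBuffer),
                             CVPixelBufferGetHeight(imageBuffer));
    CGImageRef cgImage = [context createCGImage:ciImage fromRect:rect];
    UIImage *image = [UIImage imageWithCGImage:cgImage];
    CGImageRelease(cgImage);
    // hand `image` to your UI on the main thread
}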
- (void) render:(CMSampleBufferRef)sampleBuffer
{
    VTDecodeFrameFlags flags = kVTDecodeFrame_EnableAsynchronousDecompression;
    VTDecodeInfoFlags flagOut;
    NSDate *currentTime = [NSDate date];
    VTDecompressionSessionDecodeFrame(_decompressionSession, sampleBuffer, flags,
                                      (void *)CFBridgingRetain(currentTime), &flagOut);

    CFRelease(sampleBuffer);

    // if you're using AVSampleBufferDisplayLayer, you only need to use this line of code
    // [videoLayer enqueueSampleBuffer:sampleBuffer];
}
-(void) viewDidLoad
{
    // create our AVSampleBufferDisplayLayer and add it to the view
    videoLayer = [[AVSampleBufferDisplayLayer alloc] init];
    videoLayer.frame = self.view.frame;
    videoLayer.bounds = self.view.bounds;
    videoLayer.videoGravity = AVLayerVideoGravityResizeAspect;

    // set Timebase, you may need this if you need to display frames at specific times
    // I didn't need it so I haven't verified that the timebase is working
    CMTimebaseRef controlTimebase;
    CMTimebaseCreateWithMasterClock(CFAllocatorGetDefault(), CMClockGetHostTimeClock(), &controlTimebase);

    //videoLayer.controlTimebase = controlTimebase;
    CMTimebaseSetTime(self.videoLayer.controlTimebase, kCMTimeZero);
    CMTimebaseSetRate(self.videoLayer.controlTimebase, 1.0);

    [[self.view layer] addSublayer:videoLayer];
}
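For completeness, if you go the AVSampleBufferDisplayLayer route instead of a VTDecompressionSession, render: collapses to roughly the following (a sketch; the status check is my addition, since, as noted above, the layer won't surface decode errors the way the VTD does):

- (void)render:(CMSampleBufferRef)sampleBuffer
{
    // the layer reports failures through its status/error properties rather than via callbacks
    if (self.videoLayer.status == AVQueuedSampleBufferRenderingStatusFailed)
    {
        NSLog(@"Display layer error: %@", self.videoLayer.error);
        [self.videoLayer flush]; // reset the layer so it can accept buffers again
    }
    [self.videoLayer enqueueSampleBuffer:sampleBuffer];
    CFRelease(sampleBuffer); // we created the buffer, so we still release our reference
}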
If you run into VTD errors, here is the list of VideoToolbox error codes for reference:

kVTPropertyNotSupportedErr               = -12900,
kVTPropertyReadOnlyErr                   = -12901,
kVTParameterErr                          = -12902,
kVTInvalidSessionErr                     = -12903,
kVTAllocationFailedErr                   = -12904,
kVTPixelTransferNotSupportedErr          = -12905, // c.f. -8961
kVTCouldNotFindVideoDecoderErr           = -12906,
kVTCouldNotCreateInstanceErr             = -12907,
kVTCouldNotFindVideoEncoderErr           = -12908,
kVTVideoDecoderBadDataErr                = -12909, // c.f. -8969
kVTVideoDecoderUnsupportedDataFormatErr  = -12910, // c.f. -8970
kVTVideoDecoderMalfunctionErr            = -12911, // c.f. -8960
kVTVideoEncoderMalfunctionErr            = -12912,
kVTVideoDecoderNotAvailableNowErr        = -12913,
kVTImageRotationNotSupportedErr          = -12914,
kVTVideoEncoderNotAvailableNowErr        = -12915,
kVTFormatDescriptionChangeNotSupportedErr = -12916,
kVTInsufficientSourceColorDataErr        = -12917,
kVTCouldNotCreateColorCorrectionDataErr  = -12918,
kVTColorSyncTransformConvertFailedErr    = -12919,
kVTVideoDecoderAuthorizationErr          = -12210,
kVTVideoEncoderAuthorizationErr          = -12211,
kVTColorCorrectionPixelTransferFailedErr = -12212,
kVTMultiPassStorageIdentifierMismatchErr = -12213,
kVTMultiPassStorageInvalidErr            = -12214,
kVTFrameSiloInvalidTimeStampErr          = -12215,
kVTFrameSiloInvalidTimeRangeErr          = -12216,
kVTCouldNotFindTemporalFilterErr         = -12217,
kVTPixelTransferNotPermittedErr          = -12218,
And the Core Media error codes (format description, block buffer, and sample buffer):

kCMFormatDescriptionError_InvalidParameter      = -12710,
kCMFormatDescriptionError_AllocationFailed      = -12711,
kCMFormatDescriptionError_ValueNotAvailable     = -12718,

kCMBlockBufferNoErr                             = 0,
kCMBlockBufferStructureAllocationFailedErr      = -12700,
kCMBlockBufferBlockAllocationFailedErr          = -12701,
kCMBlockBufferBadCustomBlockSourceErr           = -12702,
kCMBlockBufferBadOffsetParameterErr             = -12703,
kCMBlockBufferBadLengthParameterErr             = -12704,
kCMBlockBufferBadPointerParameterErr            = -12705,
kCMBlockBufferEmptyBBufErr                      = -12706,
kCMBlockBufferUnallocatedBlockErr               = -12707,
kCMBlockBufferInsufficientSpaceErr              = -12708,

kCMSampleBufferError_AllocationFailed           = -12730,
kCMSampleBufferError_RequiredParameterMissing   = -12731,
kCMSampleBufferError_AlreadyHasDataBuffer       = -12732,
kCMSampleBufferError_BufferNotReady             = -12733,
kCMSampleBufferError_SampleIndexOutOfRange      = -12734,
kCMSampleBufferError_BufferHasNoSampleSizes     = -12735,
kCMSampleBufferError_BufferHasNoSampleTimingInfo = -12736,
kCMSampleBufferError_ArrayTooSmall              = -12737,
kCMSampleBufferError_InvalidEntryCount          = -12738,
kCMSampleBufferError_CannotSubdivide            = -12739,
kCMSampleBufferError_SampleTimingInfoInvalid    = -12740,
kCMSampleBufferError_InvalidMediaTypeForOperation = -12741,
kCMSampleBufferError_InvalidSampleData          = -12742,
kCMSampleBufferError_InvalidMediaFormat         = -12743,
kCMSampleBufferError_Invalidated                = -12744,
kCMSampleBufferError_DataFailed                 = -16750,
kCMSampleBufferError_DataCanceled               = -16751,
And here is the memory-leak fix referenced in the code above (from Kris Dude's answer): release the previous format description before creating a new one:

if (_formatDesc)
{
    CFRelease(_formatDesc);
    _formatDesc = NULL;
}
For those who need it, here is the same approach ported to C# for Xamarin.iOS (it builds on Olivia's code above, as the inline comments note). First, the NAL unit types:

public enum NALUnitType : byte
{
    NALU_TYPE_UNKNOWN  = 0,
    NALU_TYPE_SLICE    = 1,
    NALU_TYPE_DPA      = 2,
    NALU_TYPE_DPB      = 3,
    NALU_TYPE_DPC      = 4,
    NALU_TYPE_IDR      = 5,
    NALU_TYPE_SEI      = 6,
    NALU_TYPE_SPS      = 7,
    NALU_TYPE_PPS      = 8,
    NALU_TYPE_AUD      = 9,
    NALU_TYPE_EOSEQ    = 10,
    NALU_TYPE_EOSTREAM = 11,
    NALU_TYPE_FILL     = 12,
    NALU_TYPE_13       = 13,
    NALU_TYPE_14       = 14,
    NALU_TYPE_15       = 15,
    NALU_TYPE_16       = 16,
    NALU_TYPE_17       = 17,
    NALU_TYPE_18       = 18,
    NALU_TYPE_19       = 19,
    NALU_TYPE_20       = 20,
    NALU_TYPE_21       = 21,
    NALU_TYPE_22       = 22,
    NALU_TYPE_23       = 23,
    NALU_TYPE_STAP_A   = 24,
    NALU_TYPE_STAP_B   = 25,
    NALU_TYPE_MTAP16   = 26,
    NALU_TYPE_MTAP24   = 27,
    NALU_TYPE_FU_A     = 28,
    NALU_TYPE_FU_B     = 29,
}
public static Dictionary<NALUnitType, string> GetDescription { get; } =
    new Dictionary<NALUnitType, string>()
{
    { NALUnitType.NALU_TYPE_UNKNOWN,  "Unspecified (non-VCL)" },
    { NALUnitType.NALU_TYPE_SLICE,    "Coded slice of a non-IDR picture (VCL) [P-frame]" },
    { NALUnitType.NALU_TYPE_DPA,      "Coded slice data partition A (VCL)" },
    { NALUnitType.NALU_TYPE_DPB,      "Coded slice data partition B (VCL)" },
    { NALUnitType.NALU_TYPE_DPC,      "Coded slice data partition C (VCL)" },
    { NALUnitType.NALU_TYPE_IDR,      "Coded slice of an IDR picture (VCL) [I-frame]" },
    { NALUnitType.NALU_TYPE_SEI,      "Supplemental Enhancement Information [SEI] (non-VCL)" },
    { NALUnitType.NALU_TYPE_SPS,      "Sequence Parameter Set [SPS] (non-VCL)" },
    { NALUnitType.NALU_TYPE_PPS,      "Picture Parameter Set [PPS] (non-VCL)" },
    { NALUnitType.NALU_TYPE_AUD,      "Access Unit Delimiter [AUD] (non-VCL)" },
    { NALUnitType.NALU_TYPE_EOSEQ,    "End of Sequence (non-VCL)" },
    { NALUnitType.NALU_TYPE_EOSTREAM, "End of Stream (non-VCL)" },
    { NALUnitType.NALU_TYPE_FILL,     "Filler data (non-VCL)" },
    { NALUnitType.NALU_TYPE_13,       "Sequence Parameter Set Extension (non-VCL)" },
    { NALUnitType.NALU_TYPE_14,       "Prefix NAL Unit (non-VCL)" },
    { NALUnitType.NALU_TYPE_15,       "Subset Sequence Parameter Set (non-VCL)" },
    { NALUnitType.NALU_TYPE_16,       "Reserved (non-VCL)" },
    { NALUnitType.NALU_TYPE_17,       "Reserved (non-VCL)" },
    { NALUnitType.NALU_TYPE_18,       "Reserved (non-VCL)" },
    { NALUnitType.NALU_TYPE_19,       "Coded slice of an auxiliary coded picture without partitioning (non-VCL)" },
    { NALUnitType.NALU_TYPE_20,       "Coded Slice Extension (non-VCL)" },
    { NALUnitType.NALU_TYPE_21,       "Coded Slice Extension for Depth View Components (non-VCL)" },
    { NALUnitType.NALU_TYPE_22,       "Reserved (non-VCL)" },
    { NALUnitType.NALU_TYPE_23,       "Reserved (non-VCL)" },
    { NALUnitType.NALU_TYPE_STAP_A,   "STAP-A Single-time Aggregation Packet (non-VCL)" },
    { NALUnitType.NALU_TYPE_STAP_B,   "STAP-B Single-time Aggregation Packet (non-VCL)" },
    { NALUnitType.NALU_TYPE_MTAP16,   "MTAP16 Multi-time Aggregation Packet (non-VCL)" },
    { NALUnitType.NALU_TYPE_MTAP24,   "MTAP24 Multi-time Aggregation Packet (non-VCL)" },
    { NALUnitType.NALU_TYPE_FU_A,     "FU-A Fragmentation Unit (non-VCL)" },
    { NALUnitType.NALU_TYPE_FU_B,     "FU-B Fragmentation Unit (non-VCL)" }
};
public void Decode(byte[] frame)
{
    uint frameSize = (uint)frame.Length;
    SendDebugMessage($"Received frame of {frameSize} bytes.");

    // I know what my H.264 data source's NALUs look like so I know start code index is always 0.
    // if you don't know where it starts, you can use a for loop similar to how I find the 2nd and 3rd start codes
    uint firstStartCodeIndex = 0;
    uint secondStartCodeIndex = 0;
    uint thirdStartCodeIndex = 0;

    // length of the NALU start code in bytes.
    // for h.264 the start code is 4 bytes and looks like this: 0x00 00 00 01
    const uint naluHeaderLength = 4;

    // check the first 8 bits after the NALU start code, mask out bits 0-2; the NALU type ID is in bits 3-7
    uint startNaluIndex = firstStartCodeIndex + naluHeaderLength;
    byte startByte = frame[startNaluIndex];
    int naluTypeId = startByte & 0x1F; // 0001 1111
    NALUnitType naluType = (NALUnitType)naluTypeId;
    SendDebugMessage($"1st Start Code Index: {firstStartCodeIndex}");
    SendDebugMessage($"1st NALU Type: '{NALUnit.GetDescription[naluType]}' ({(int)naluType})");

    // bits 1 and 2 are the NRI
    int nalRefIdc = startByte & 0x60; // 0110 0000
    SendDebugMessage($"1st NRI (NAL Ref Idc): {nalRefIdc}");

    // IF the very first NALU type is an IDR -> handle it like a slice frame (-> re-cast it to type 1 [Slice])
    if (naluType == NALUnitType.NALU_TYPE_IDR)
    {
        naluType = NALUnitType.NALU_TYPE_SLICE;
    }

    // if we haven't already set up our format description with our SPS PPS parameters,
    // we can't process any frames except type 7 that has our parameters
    if (naluType != NALUnitType.NALU_TYPE_SPS && this.FormatDescription == null)
    {
        SendDebugMessage("Video Error: Frame is not an I-Frame and format description is null.");
        return;
    }

    // NALU type 7 is the SPS parameter NALU
    if (naluType == NALUnitType.NALU_TYPE_SPS)
    {
        // find where the second PPS 4-byte start code begins (0x00 00 00 01)
        // from which we also get the length of the first SPS code
        for (uint i = firstStartCodeIndex + naluHeaderLength; i < firstStartCodeIndex + 40; i++)
        {
            if (frame[i] == 0x00 && frame[i + 1] == 0x00 && frame[i + 2] == 0x00 && frame[i + 3] == 0x01)
            {
                secondStartCodeIndex = i;
                this.SpsSize = secondStartCodeIndex; // includes the header in the size
                SendDebugMessage($"2nd Start Code Index: {secondStartCodeIndex} -> SPS Size: {this.SpsSize}");
                break;
            }
        }

        // find what the second NALU type is
        startByte = frame[secondStartCodeIndex + naluHeaderLength];
        naluType = (NALUnitType)(startByte & 0x1F);
        SendDebugMessage($"2nd NALU Type: '{NALUnit.GetDescription[naluType]}' ({(int)naluType})");

        // bits 1 and 2 are the NRI
        nalRefIdc = startByte & 0x60; // 0110 0000
        SendDebugMessage($"2nd NRI (NAL Ref Idc): {nalRefIdc}");
    }

    // type 8 is the PPS parameter NALU
    if (naluType == NALUnitType.NALU_TYPE_PPS)
    {
        // find where the NALU after this one starts so we know how long the PPS parameter is
        for (uint i = this.SpsSize + naluHeaderLength; i < this.SpsSize + 30; i++)
        {
            if (frame[i] == 0x00 && frame[i + 1] == 0x00 && frame[i + 2] == 0x00 && frame[i + 3] == 0x01)
            {
                thirdStartCodeIndex = i;
                this.PpsSize = thirdStartCodeIndex - this.SpsSize;
                SendDebugMessage($"3rd Start Code Index: {thirdStartCodeIndex} -> PPS Size: {this.PpsSize}");
                break;
            }
        }

        // allocate enough data to fit the SPS and PPS parameters into our data objects.
        // VTD doesn't want you to include the start code header (4 bytes long) so we subtract 4 here
        byte[] sps = new byte[this.SpsSize - naluHeaderLength];
        byte[] pps = new byte[this.PpsSize - naluHeaderLength];

        // copy in the actual sps and pps values, again ignoring the 4 byte header
        Array.Copy(frame, naluHeaderLength, sps, 0, sps.Length);
        Array.Copy(frame, this.SpsSize + naluHeaderLength, pps, 0, pps.Length);

        // create video format description
        List<byte[]> parameterSets = new List<byte[]> { sps, pps };
        this.FormatDescription = CMVideoFormatDescription.FromH264ParameterSets(parameterSets, (int)naluHeaderLength, out CMFormatDescriptionError formatDescriptionError);
        SendDebugMessage($"Creation of CMVideoFormatDescription: {((formatDescriptionError == CMFormatDescriptionError.None) ? $"Successful! (Video Codec = {this.FormatDescription.VideoCodecType}, Dimension = {this.FormatDescription.Dimensions.Height} x {this.FormatDescription.Dimensions.Width}px, Type = {this.FormatDescription.MediaType})" : $"Failed ({formatDescriptionError})")}");

        // re-create the decompression session whenever new PPS data was received
        this.DecompressionSession = this.CreateDecompressionSession(this.FormatDescription);

        // now let's handle the IDR frame that (should) come after the parameter sets
        // I say "should" because that's how I expect my H264 stream to work, YMMV
        startByte = frame[thirdStartCodeIndex + naluHeaderLength];
        naluType = (NALUnitType)(startByte & 0x1F);
        SendDebugMessage($"3rd NALU Type: '{NALUnit.GetDescription[naluType]}' ({(int)naluType})");

        // bits 1 and 2 are the NRI
        nalRefIdc = startByte & 0x60; // 0110 0000
        SendDebugMessage($"3rd NRI (NAL Ref Idc): {nalRefIdc}");
    }

    // type 5 is an IDR frame NALU.
    // The SPS and PPS NALUs should always be followed by an IDR (or IFrame) NALU, as far as I know.
    if (naluType == NALUnitType.NALU_TYPE_IDR || naluType == NALUnitType.NALU_TYPE_SLICE)
    {
        // find the offset, or where the IDR frame NALU begins (after the SPS and PPS NALUs end)
        uint offset = (naluType == NALUnitType.NALU_TYPE_SLICE) ? 0 : this.SpsSize + this.PpsSize;
        uint blockLength = frameSize - offset;
        SendDebugMessage($"Block Length (NALU type '{naluType}'): {blockLength}");

        var blockData = new byte[blockLength];
        Array.Copy(frame, offset, blockData, 0, blockLength);

        // write the size of the block length (IDR picture data) at the beginning of the IDR block.
        // this means we replace the start code header (0x00 00 00 01) of the IDR NALU with the block size.
        // AVCC format requires that you do this.

        // This next block is very specific to my application and wasn't in Olivia's example:
        // Because my stream is encoded by NVIDIA NVENC, I had to deal with additional 3-byte start codes within my IDR/SLICE frame.
        // These start codes must be replaced by 4-byte start codes that carry the block length as big endian.
        // ======================================================================================================================================================

        // find all 3-byte start code indices (0x00 00 01) within the block data (including the first 4 bytes of the NALU header)
        uint startCodeLength = 3;
        List<uint> foundStartCodeIndices = new List<uint>();
        for (uint i = 0; i < blockData.Length - 3; i++) // -3 so reading the type byte after the start code stays in range
        {
            if (blockData[i] == 0x00 && blockData[i + 1] == 0x00 && blockData[i + 2] == 0x01)
            {
                foundStartCodeIndices.Add(i);
                byte naluByte = blockData[i + startCodeLength];
                var tmpNaluType = (NALUnitType)(naluByte & 0x1F);
                SendDebugMessage($"3-Byte Start Code (0x000001) found at index: {i} (NALU type {(int)tmpNaluType} '{NALUnit.GetDescription[tmpNaluType]}')");
            }
        }

        // determine the byte length of each slice
        uint totalLength = 0;
        List<uint> sliceLengths = new List<uint>();
        for (int i = 0; i < foundStartCodeIndices.Count; i++)
        {
            // for convenience only
            bool isLastValue = (i == foundStartCodeIndices.Count - 1);

            // set the start index to the byte right after the start code
            uint startIndex = foundStartCodeIndices[i] + startCodeLength;

            // set the end index to the byte right before the beginning of the next start code, or the end of the frame
            uint endIndex = isLastValue ? (uint)blockData.Length : foundStartCodeIndices[i + 1];

            // now determine the slice length including the NALU header
            uint sliceLength = (endIndex - startIndex) + naluHeaderLength;

            // add the length to the list
            sliceLengths.Add(sliceLength);

            // sum up the total length of all slices (including NALU headers)
            totalLength += sliceLength;
        }

        // Arrange the slices like this:
        // [4-byte slice1 size][slice1 data][4-byte slice2 size][slice2 data]...[4-byte slice4 size][slice4 data]
        // Replace each 3-byte start code with a 4-byte start code, then replace the 4-byte start codes with the length of the following data block (big endian).
        // https://stackoverflow.com/questions/65576349/nvidia-nvenc-media-foundation-encoded-h-264-frames-not-decoded-properly-using

        byte[] finalBuffer = new byte[totalLength];
        uint destinationIndex = 0;

        // create a buffer for each slice and append it to the final block buffer
        for (int i = 0; i < sliceLengths.Count; i++)
        {
            // create a byte vector of the size of the current slice, with additional bytes for the NALU start code length
            byte[] sliceData = new byte[sliceLengths[i]];

            // now copy the data of the current slice into the byte vector:
            // start reading data after the 3-byte start code,
            // start writing data after the NALU start code
            uint sourceIndex = foundStartCodeIndices[i] + startCodeLength;
            long dataLength = sliceLengths[i] - naluHeaderLength;
            Array.Copy(blockData, sourceIndex, sliceData, naluHeaderLength, dataLength);

            // replace the NALU start code with the data length as big endian
            byte[] sliceLengthInBytes = BitConverter.GetBytes(sliceLengths[i] - naluHeaderLength);
            Array.Reverse(sliceLengthInBytes);
            Array.Copy(sliceLengthInBytes, 0, sliceData, 0, naluHeaderLength);

            // add the slice data to the final buffer
            Array.Copy(sliceData, 0, finalBuffer, destinationIndex, sliceData.Length);
            destinationIndex += sliceLengths[i];
        }
        // ======================================================================================================================================================

        // from here we are back on track with Olivia's code:

        // now create a block buffer from the final byte[] buffer
        CMBlockBufferFlags flags = CMBlockBufferFlags.AssureMemoryNow | CMBlockBufferFlags.AlwaysCopyData;
        var finalBlockBuffer = CMBlockBuffer.FromMemoryBlock(finalBuffer, 0, flags, out CMBlockBufferError blockBufferError);
        SendDebugMessage($"Creation of Final Block Buffer: {(blockBufferError == CMBlockBufferError.None ? "Successful!" : $"Failed ({blockBufferError})")}");
        if (blockBufferError != CMBlockBufferError.None) return;

        // now create the sample buffer
        nuint[] sampleSizeArray = new nuint[] { totalLength };
        CMSampleBuffer sampleBuffer = CMSampleBuffer.CreateReady(finalBlockBuffer, this.FormatDescription, 1, null, sampleSizeArray, out CMSampleBufferError sampleBufferError);
        SendDebugMessage($"Creation of Final Sample Buffer: {(sampleBufferError == CMSampleBufferError.None ? "Successful!" : $"Failed ({sampleBufferError})")}");
        if (sampleBufferError != CMSampleBufferError.None) return;

        // if the sample buffer was successfully created -> pass the sample to the decoder

        // set the sample attachments
        CMSampleBufferAttachmentSettings[] attachments = sampleBuffer.GetSampleAttachments(true);
        var attachmentSetting = attachments[0];
        attachmentSetting.DisplayImmediately = true;

        // enable async decoding
        VTDecodeFrameFlags decodeFrameFlags = VTDecodeFrameFlags.EnableAsynchronousDecompression;

        // add a time stamp
        var currentTime = DateTime.Now;
        var currentTimePtr = new IntPtr(currentTime.Ticks);

        // send the sample buffer to a VTDecompressionSession
        var result = DecompressionSession.DecodeFrame(sampleBuffer, decodeFrameFlags, currentTimePtr, out VTDecodeInfoFlags decodeInfoFlags);

        if (result == VTStatus.Ok)
        {
            SendDebugMessage($"Executing DecodeFrame(..): Successful! (Info: {decodeInfoFlags})");
        }
        else
        {
            NSError error = new NSError(CFErrorDomain.OSStatus, (int)result);
            SendDebugMessage($"Executing DecodeFrame(..): Failed ({(VtStatusEx)result} [0x{(int)result:X8}] - {error}) - Info: {decodeInfoFlags}");
        }
    }
}
private VTDecompressionSession CreateDecompressionSession(CMVideoFormatDescription formatDescription)
{
    VTDecompressionSession.VTDecompressionOutputCallback callBackRecord = this.DecompressionSessionDecodeFrameCallback;

    VTVideoDecoderSpecification decoderSpecification = new VTVideoDecoderSpecification
    {
        EnableHardwareAcceleratedVideoDecoder = true
    };

    CVPixelBufferAttributes destinationImageBufferAttributes = new CVPixelBufferAttributes();

    try
    {
        var decompressionSession = VTDecompressionSession.Create(callBackRecord, formatDescription, decoderSpecification, destinationImageBufferAttributes);
        SendDebugMessage("Video Decompression Session Creation: Successful!");
        return decompressionSession;
    }
    catch (Exception e)
    {
        SendDebugMessage($"Video Decompression Session Creation: Failed ({e.Message})");
        return null;
    }
}
private void DecompressionSessionDecodeFrameCallback(
    IntPtr sourceFrame,
    VTStatus status,
    VTDecodeInfoFlags infoFlags,
    CVImageBuffer imageBuffer,
    CMTime presentationTimeStamp,
    CMTime presentationDuration)
{
    if (status != VTStatus.Ok)
    {
        NSError error = new NSError(CFErrorDomain.OSStatus, (int)status);
        SendDebugMessage($"Decompression: Failed ({(VtStatusEx)status} [0x{(int)status:X8}] - {error})");
    }
    else
    {
        SendDebugMessage("Decompression: Successful!");

        try
        {
            var image = GetImageFromImageBuffer(imageBuffer);

            // In my application I do not use a display layer but send the decoded image directly by an event:
            ImageSource imgSource = ImageSource.FromStream(() => image.AsPNG().AsStream());
            OnImageFrameReady?.Invoke(imgSource);
        }
        catch (Exception e)
        {
            SendDebugMessage(e.ToString());
        }
    }
}
private UIImage GetImageFromImageBuffer(CVImageBuffer imageBuffer)
{
    if (!(imageBuffer is CVPixelBuffer pixelBuffer))
        return null;

    var ciImage = CIImage.FromImageBuffer(pixelBuffer);
    var temporaryContext = new CIContext();

    var rect = CGRect.FromLTRB(0, 0, pixelBuffer.Width, pixelBuffer.Height);
    CGImage cgImage = temporaryContext.CreateCGImage(ciImage, rect);
    if (cgImage == null)
        return null;

    var uiImage = UIImage.FromImage(cgImage);
    cgImage.Dispose();
    return uiImage;
}
private void SendDebugMessage(string msg)
{
    Debug.WriteLine($"VideoDecoder (iOS) - {msg}");
}
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Net;
using AvcLibrary;
using CoreFoundation;
using CoreGraphics;
using CoreImage;
using CoreMedia;
using CoreVideo;
using Foundation;
using UIKit;
using VideoToolbox;
using Xamarin.Forms;