Adetunji Dahunsi

Video streaming algorithms with Kotlin: H.264 NAL unit parsing

How Kotlin made writing a video streaming algorithm a little easier

TJ Dahunsi

Jul 19 2023 · 8 min read

Categories:
android
kotlin

Rotating video gif via Erik on Dribbble.

This post is part of a series of video streaming algorithms I've implemented in Kotlin.

I've come a long way from my first impression of Kotlin 4 years ago. I'm in love with the language and its features, from extension functions to coroutines and hopefully soon, context receivers. It improves the developer experience so much that I jump at any opportunity to think differently in Kotlin.

3 years ago I led the integration for the pivot to video streaming at GameChanger during the pandemic. Apart from helping select the streaming vendor (at the time Amazon's IVS was nascent, so we ended up choosing Mux as the streaming platform), I had to choose an RTMP library for video streaming on Android. I evaluated a few:

The first 2 were heavily based on the Android Camera APIs, not the newer Camera2 APIs. Not quite satisfied with them and wanting a little more control over the streaming pipeline, I decided to build our own Camera2-based RTMP streaming stack. I'm certain the streaming stack has changed in the time since I built it, but here's a bit about the h.264 NAL unit extraction pipeline that powered version 1.0 of the GameChanger live streaming experience.

The h.264 bit stream

The h.264 bitstream consists of a sequence of bytes chunked into descriptive pieces called NAL (Network Abstraction Layer) units. This bitstream can come in a variety of containers: it may be in a ByteArray, or in the case of Android's MediaCodec, in ByteBuffer instances. Regardless of the enclosing container, they all offer constant-time random access to a byte at an index. The first piece of abstraction therefore is a typealias, ByteRandomAccess:

```kotlin
typealias ByteRandomAccess<T> = (container: T, index: Int) -> Byte
```
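As a minimal sketch of how two different containers could satisfy this typealias, the accessors below wrap a ByteArray and a ByteBuffer. The names `byteArrayAccess` and `byteBufferAccess` are hypothetical and chosen only for illustration.

```kotlin
import java.nio.ByteBuffer

typealias ByteRandomAccess<T> = (container: T, index: Int) -> Byte

// Hypothetical accessors, named here for illustration only.
val byteArrayAccess: ByteRandomAccess<ByteArray> =
    { container, index -> container[index] }

// Uses the absolute get(index), which does not move the buffer's position.
val byteBufferAccess: ByteRandomAccess<ByteBuffer> =
    { container, index -> container.get(index) }

fun main() {
    val bytes = byteArrayOf(0, 0, 1, 0x67)
    // Both accessors read the same byte at index 3.
    println(byteArrayAccess(bytes, 3))
    println(byteBufferAccess(ByteBuffer.wrap(bytes), 3))
}
```

Because both values share one function type, any algorithm written against ByteRandomAccess<T> can run on either container unchanged.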

Each NAL unit represents different parameters of the encoded stream from picture size to full blown pictures. Examples of these NAL units include:

  • SPS and PPS: The sequence parameter set and picture parameter set contain metadata relevant for decoding the pictures in the bitstream; for example, picture resolution.
  • IDR frame: Instantaneous decoder refresh frames represent an image in the video stream that can be decoded independently. If you've ever wondered why seeking in a video always lands on a particular frame, this is why. The decoder has a seed frame that it applies diffs to for consecutive frames; this is how the video is compressed.

These NAL units can be represented in an h.264 bitstream in two formats:

  • The Annex B format
  • The AVCC format

The Annex B format will be the focus of this blog post for parsing the raw camera output from an Android phone. The AVCC format will be covered in the next post for FLV muxing and sending it to the server in an RTMP stream.

The Annex B format

The Annex B format bitstream uses start code prefixes (a sequence of distinct bytes that are never part of a NAL unit) to separate each NAL unit. The Android MediaCodec writes to a ByteBuffer in this format. Therefore to decode the bitstream, one first needs to detect whether a byte is part of a NAL unit or a start code.
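To make the layout concrete, here is a small sketch that assembles a hand-built Annex B buffer from two start codes and two made-up payloads. The helper names `toHex` and `sampleAnnexBStream` are hypothetical, and the payload bytes are placeholders rather than real encoder output.

```kotlin
// Hypothetical helper: renders bytes as space separated hex for inspection.
fun ByteArray.toHex(): String = joinToString(" ") { "%02x".format(it) }

// Builds: [4 byte start code][first NAL][3 byte start code][second NAL]
fun sampleAnnexBStream(): ByteArray {
    val fourByteStartCode = byteArrayOf(0, 0, 0, 1)
    val threeByteStartCode = byteArrayOf(0, 0, 1)
    // Made-up payloads: a header byte followed by one filler byte each.
    val firstNal = byteArrayOf(0x67, 0x42)
    val secondNal = byteArrayOf(0x68, 0x4E)
    return fourByteStartCode + firstNal + threeByteStartCode + secondNal
}

fun main() {
    println(sampleAnnexBStream().toHex())
    // prints: 00 00 00 01 67 42 00 00 01 68 4e
}
```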

The objective is the following API:

```kotlin
/**
 * A callback for Nal units in a range
 */
typealias NalAccessor = (startCode: NalStartCode, start: Int, size: Int) -> Unit

/**
 * Invokes [action] on each Annex B formatted NAL unit in this [ByteRandomAccess],
 * with the kind of [NalStartCode] it is prefixed by, and the range of the Unit.
 * Note that the start codes are included in this range.
 *
 */
inline fun <T> ByteRandomAccess<T>.onEachNalUnit(
    container: T,
    max: Int,
    action: NalAccessor
)
```

That is, an extension function on ByteRandomAccess<T> that invokes an action on each NAL unit in the bitstream. Let's define this algorithm in Kotlin.

Determining the NAL start code

In Annex B formatted bitstreams, the NAL start codes may either be:

  • A 3 byte sequence of [0, 0, 1]
  • A 4 byte sequence of [0, 0, 0, 1]

This works because this sequence of bytes is forbidden in any NAL unit without being properly escaped. Given a ByteRandomAccess<T>, determining if an index in the bitstream represents a start code can be performed by:

```kotlin
enum class NalStartCode(val bytes: ByteArray) {
    None(bytes = byteArrayOf()),
    Three(bytes = byteArrayOf(0, 0, 1)),
    Four(bytes = byteArrayOf(0, 0, 0, 1));

    /** The number of bytes in this start code. */
    val size: Int get() = bytes.size
}

/**
 * Tests whether a NAL start code exists at a given index for the Annex B
 * formatted [ByteRandomAccess].
 *
 * @param container The data.
 * @param index The index to test.
 * @param max The number of readable bytes in [container].
 * @return The kind of start code that begins at `index`, or [NalStartCode.None].
 */
fun <T> ByteRandomAccess<T>.nalStartCode(container: T, index: Int, max: Int): NalStartCode {
    // Guarantees index + 3 is readable, so the four byte check below stays in bounds.
    if (max - index <= NalStartCode.Three.size) return NalStartCode.None
    var startCode: NalStartCode = NalStartCode.Three

    for (j in NalStartCode.Three.bytes.indices) {
        if (this(container, index + j) == NalStartCode.Three.bytes[j]) continue

        // A mismatch against [0, 0, 1] may still be part of [0, 0, 0, 1].
        val matchesFour =
            this(container, index + j) == NalStartCode.Four.bytes[j]
                && this(container, index + j + 1) == NalStartCode.Four.bytes[j + 1]

        if (!matchesFour) return NalStartCode.None

        startCode = NalStartCode.Four
        continue
    }
    return startCode
}
```
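One way to sanity-check a detector like this is to probe a small hand-built buffer. The sketch below uses a simplified stand-in, `startCodeAt` (a hypothetical name), together with a trimmed `StartCode` enum so that it runs on its own. It is not the function above: it compares the two prefixes directly and, unlike the original, also reports a three byte start code sitting at the very end of the data.

```kotlin
typealias ByteRandomAccess<T> = (container: T, index: Int) -> Byte

// Trimmed, hypothetical version of the start code enum for this sketch.
enum class StartCode(val size: Int) { None(0), Three(3), Four(4) }

// Simplified stand-in: checks for [0, 0, 1] and [0, 0, 0, 1] directly.
fun <T> ByteRandomAccess<T>.startCodeAt(container: T, index: Int, max: Int): StartCode {
    val remaining = max - index
    if (index < 0 || remaining < 3) return StartCode.None
    // Both start codes begin with two zero bytes.
    if (this(container, index).toInt() != 0) return StartCode.None
    if (this(container, index + 1).toInt() != 0) return StartCode.None
    val third = this(container, index + 2).toInt()
    return when {
        third == 1 -> StartCode.Three
        third == 0 && remaining >= 4 &&
            this(container, index + 3).toInt() == 1 -> StartCode.Four
        else -> StartCode.None
    }
}

fun main() {
    val access: ByteRandomAccess<ByteArray> = { c, i -> c[i] }
    val stream = byteArrayOf(0, 0, 0, 1, 0x67, 0x42, 0, 0, 1, 0x68, 0x4E)
    println(access.startCodeAt(stream, 0, stream.size)) // Four
    println(access.startCodeAt(stream, 1, stream.size)) // Three
    println(access.startCodeAt(stream, 4, stream.size)) // None
    println(access.startCodeAt(stream, 6, stream.size)) // Three
}
```

The probe at index 1 shows that every four byte start code contains a three byte start code one position later, so a scanner has to walk the stream left to right and stop at the first match to classify the prefix correctly.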

Determining the end of a NAL unit

After determining the delimiting start code in the bitstream, the next step is extracting the NAL unit. This is done by finding the next delimiting start code in the bitstream, testing for start codes at each index after the previous start code. Once the next start code is found, the NAL unit is the byte sequence delimited by the two start codes:

```kotlin
// Assumed sentinel value; the definition of INDEX_UNSET is not shown in this post.
const val INDEX_UNSET = -1

/**
 * Finds the next occurrence of a NAL start code from a given index
 * for an Annex B formatted [ByteRandomAccess].
 *
 * @param container The data in which to search.
 * @param index The first index to test.
 * @param max The number of readable bytes in [container].
 * @return The index of the first byte of the found start code, or [INDEX_UNSET].
 */
fun <T> ByteRandomAccess<T>.nalStartCodeIndex(container: T, index: Int, max: Int): Int {
    val endIndex = max - NalStartCode.Three.size
    for (i in index..endIndex) when (nalStartCode(container, i, max)) {
        NalStartCode.Three, NalStartCode.Four -> return i
        NalStartCode.None -> Unit
    }
    return INDEX_UNSET
}
```

Accessing the NAL units

Now that consecutive NAL start codes and the NAL unit they delimit can be found, the final step is putting together the onEachNalUnit extension defined above:

```kotlin
/**
 * Invokes [action] on each Annex B formatted NAL unit in this [ByteRandomAccess],
 * with the kind of [NalStartCode] it is prefixed by, and the range of the Unit.
 * Note that the start codes are included in this range.
 *
 */
inline fun <T> ByteRandomAccess<T>.onEachNalUnit(container: T, max: Int, action: NalAccessor) {
    var startIndex = 0
    var startCode = nalStartCode(container, startIndex, max)

    if (startCode == NalStartCode.None) return

    while (startIndex != INDEX_UNSET) {
        val nextStartIndex = nalStartCodeIndex(container, startIndex + startCode.size, max)

        val endIndex = if (nextStartIndex == INDEX_UNSET) max else nextStartIndex
        val size = endIndex - startIndex

        action(startCode, startIndex, size)
        startIndex = nextStartIndex
        startCode =
            if (nextStartIndex == INDEX_UNSET) NalStartCode.None
            else nalStartCode(container, nextStartIndex, max) // O(3) or O(4)... effectively O(1)
    }
}
```
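To illustrate the ranges such a traversal reports, the sketch below walks a hand-built buffer and collects each range into a list instead of invoking a callback. It is a compact stand-in written to run on its own, not the extension above: `startCodeSizeAt`, `nalRanges`, and `NalRange` are hypothetical names, and start codes are represented by their byte count rather than by the enum.

```kotlin
typealias ByteRandomAccess<T> = (container: T, index: Int) -> Byte

// Returns 3 or 4 if a start code begins at index, otherwise 0 (stand-in detector).
fun <T> ByteRandomAccess<T>.startCodeSizeAt(container: T, index: Int, max: Int): Int {
    if (max - index < 3) return 0
    if (this(container, index).toInt() != 0) return 0
    if (this(container, index + 1).toInt() != 0) return 0
    val third = this(container, index + 2).toInt()
    return when {
        third == 1 -> 3
        third == 0 && max - index >= 4 &&
            this(container, index + 3).toInt() == 1 -> 4
        else -> 0
    }
}

// One NAL unit: its start code size, and a range that includes the start code.
data class NalRange(val startCodeSize: Int, val start: Int, val size: Int)

fun <T> ByteRandomAccess<T>.nalRanges(container: T, max: Int): List<NalRange> {
    val ranges = mutableListOf<NalRange>()
    var start = 0
    var codeSize = startCodeSizeAt(container, start, max)
    while (codeSize != 0) {
        // Scan forward for the next start code, or stop at the end of the data.
        var next = start + codeSize
        while (next < max && startCodeSizeAt(container, next, max) == 0) next++
        ranges += NalRange(codeSize, start, next - start)
        start = next
        codeSize = if (next < max) startCodeSizeAt(container, next, max) else 0
    }
    return ranges
}

fun main() {
    val access: ByteRandomAccess<ByteArray> = { c, i -> c[i] }
    val stream = byteArrayOf(0, 0, 0, 1, 0x67, 0x42, 0, 0, 1, 0x68, 0x4E)
    println(access.nalRanges(stream, stream.size))
    // prints: [NalRange(startCodeSize=4, start=0, size=6), NalRange(startCodeSize=3, start=6, size=5)]
}
```

As in the extension above, each reported range starts at the first byte of the start code, so a consumer skips `startCodeSize` bytes to reach the NAL header.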

Consuming NAL Units

With the above, extracting NAL units from an Android MediaCodec is straightforward. The following snippet is edited for brevity and shows part of the video streaming pipeline:

MediaCodec callback

A ByteRandomAccess for a ByteBuffer is created using an unbound reference to the ByteBuffer class's absolute get(index) method:

```kotlin
val byteBufferRandomAccess: ByteRandomAccess<ByteBuffer> = ByteBuffer::get
```

Next, a MediaCodec.Callback is created to be notified when a ByteBuffer is filled:

```kotlin
private class VideoStreamingCallBack(
    ...
    private val writer: BufferConsumer
) : MediaCodec.Callback() {
    ...
    override fun onOutputBufferAvailable(codec: MediaCodec, index: Int, info: BufferInfo) {
        codec.getOutputBuffer(index)?.let { byteBuffer ->
            val endOfStream: Int = info.flags and MediaCodec.BUFFER_FLAG_END_OF_STREAM
            if (endOfStream == 0) writer(byteBuffer, info)
        }
        codec.releaseOutputBuffer(index, false)
    }
}
```

After the buffer is filled, it's passed to the FLVMuxer to insert the video frame after which it is streamed via RTMP.

FLVMuxer

The NAL units in the ByteBuffer instances from the Android media codec class are Annex B formatted. However in FLV, the NAL units are AVCC formatted. Therefore, the FLVMuxer needs to extract each NAL unit, and re-encode it in the AVCC format.

```kotlin
class FlvMuxer<T>(
    ...
    private val accessor: ByteRandomAccess<T>
) : Restartable, FrameModifier {

    /**
     * Muxes an H.264 video frame. The buffer passed in must represent a whole frame.
     */
    fun muxVideo(container: T, accessLength: Int) {
        if (!started) return

        var frameType: VideoFrameType = VideoFrameType.InterFrame
        val frame = videoPool.get()
        val output = frame.data
        var length = 0

        accessor.onEachNalUnit(container, accessLength) { startCode, start, size ->
            // Skip the start code to reach the NAL unit itself.
            val nalStart = start + startCode.size
            val nalSize = size - startCode.size

            when (val nalUnitType = accessor(container, nalStart).nalUnitType) {
                // Write sequence parameter set into the output frame
                NalUnitType.SPS -> ...
                // Write picture parameter set into the output frame
                NalUnitType.PPS -> ...
                // Write IDR frames, inter frames and other valid NAL units
                // into the output frame
                else -> ...
            }
        }
    }
}
```
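The `nalUnitType` extension used in the snippet is not shown in this post. A minimal sketch of how it could look follows, relying on the fact that the NAL unit type is stored in the low 5 bits of the first byte of a NAL unit. The enum entries other than SPS and PPS, the `code` property, and the `Other` fallback are assumptions made for illustration.

```kotlin
// Hypothetical enum; codes are nal_unit_type values defined by the H.264 spec.
enum class NalUnitType(val code: Int) {
    NonIdrSlice(1),
    IdrSlice(5),
    SEI(6),
    SPS(7),
    PPS(8),
    Other(-1) // Assumed catch-all for types this sketch does not model.
}

// Hypothetical extension: reads the type from a NAL unit's header byte.
val Byte.nalUnitType: NalUnitType
    get() {
        // The header is 1 forbidden bit, 2 reference bits, then 5 type bits.
        val code = toInt() and 0x1F
        return NalUnitType.values().firstOrNull { it.code == code }
            ?: NalUnitType.Other
    }

fun main() {
    println(0x67.toByte().nalUnitType) // SPS
    println(0x68.toByte().nalUnitType) // PPS
    println(0x65.toByte().nalUnitType) // IdrSlice
}
```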

The next post in this series will expand on the FLVMuxer and its Kotlin-specific algorithms.

Wrap up

The algorithm above used the following Kotlin features:

  • Type aliases: Allowed for better communication of the context of existing types, as with ByteRandomAccess and NalAccessor.
  • Extension functions: Allowed for easier to read method call sites for onEachNalUnit.
  • Inline functions: Allowed for defining onEachNalUnit on arbitrary byte containers with no object allocation overhead.
  • when expressions: For more readable and exhaustive control flow and pattern matching relative to if/else or switch statements.

The algorithm can still be written in Java without these; however, it is so much more concise in Kotlin. Consider the equivalent Java findNalUnit method that the algorithm above was adapted from. The Kotlin version is functionally equivalent and a bit easier on the eyes.

This along with other initiatives like Kotlin Multiplatform make me excited for the future of Kotlin for software development. I can't wait to build my next Kotlin thing.
