Go to the content


 Go back to Planet Debian
Full screen Suggest an article

Steinar H. Gunderson: Introducing Narabu, part 1: Introduction

October 18, 2017 8:01 , by Planet Debian - 0no comments yet | No one following this article yet.
Viewed 22 times

Narabu is a new intraframe video codec, from the Japanese verb narabu (並ぶ), which means to line up or be parallel.

Let me first state straight up that Narabu isn't where I hoped it would be at this stage; the encoder isn't fast enough, and I have to turn my attention to other projects for a while. Nevertheless, I think it is interesting as a research project in its own right, and I don't think it should stop me from trying to write up a small series. :-)

In the spirit of Leslie Lamport, I'll be starting off with describing what problem I was trying to solve, which will hopefully make the design decisions a lot clearer. Subsequent posts will dive into background information and then finally Narabu itself.

I want a codec to send signals between different instances of Nageru, my free software video mixer, and also longer-term between other software, such as recording or playout. The reason is pretty obvious for any sort of complex configuration; if you are doing e.g. both a stream mix and a bigscreen mix, they will naturally want to use many of the same sources, and sharing them over a single GigE connection might be easier than getting SDI repeaters/splitters, especially when you have a lot of them. (Also, in some cases, you might want to share synthetic signals, such as graphics, that never existed on SDI in the first place.)

This naturally leads to the following demands:

  • Intraframe-only; every frame must be compressed independently. (This isn't strictly needed for all use cases, but is much more flexible, and common in any kind of broadcast.)
  • Need to handle 4:2:2 color, since that's what most capture sources give out, and we want to transmit the raw signals as much as possible. Fairly flexible in input resolution (divisible by 16 is okay, limited to only a given set of resolutions is not).
  • 720p60 video in less than one CPU core (ideally much less); the CPU can already pretty be busy with other work, like x264 encoding of the finished stream, and sharing four more inputs at the same time is pretty common. What matters is mostly a single encode+decode cycle, so fast decode doesn't help if the encoder is too slow.
  • Target bitrates around 100-150 Mbit/sec, at similar quality to MJPEG (ie. 45 dB PSNR for most content). Multiple signals should fit into a normal GigE link at the same time, although getting it to work over 802.11 isn't a big priority.
  • Both encoder and decoder robust to corrupted or malicious data; a dropped frame is fine, a crash is not.
  • Does not depend on uncommon or expensive hardware, or GPUs from a specific manufacturer.
  • GPLv3-compatible implementation. I already link to GPLv3 software, so I don't have a choice here; I cannot link to something non-free (and no antics with dlopen(), please).

There's a bunch of intraframe formats around. The most obvious thing to do would be to use Intel Quick Sync to produce H.264 (intraframe H.264 blows basically everything else out of the sky in terms of PSNR, and QSV hardly uses any power at all), but sadly, that's limited to 4:2:0. I thought about encoding the three color planes as three different monochrome streams, but monochrome is not supported either.

Then there's a host of software solutions. x264 can do 4:2:2, but even on ultrafast, it gobbles up an entire core or more at 720p60 at the target bitrates (mostly in entropy coding). FFmpeg has implementations of all kinds of other codecs, like DNxHD, CineForm, MJPEG and so on, but they all use much more CPU for encoding than the target. NDI would seem to fit the bill exactly, but fails the licensing check, and also isn't robust to corrupted or malicious data. (That, and their claims about video quality are dramatically overblown for any kinds of real video data I've tried.)

So, sadly, this leaves only really one choice, namely rolling my own. I quickly figured I couldn't beat the world on CPU video codec speed, and didn't really want to spend my life optimizing AVX2 DCTs anyway, so again, the GPU will come to our rescue in the form of compute shaders. (There are some other GPU codecs out there, but all that I've found depend on CUDA, so they are NVIDIA-only, which I'm not prepared to commit to.) Of course, the GPU is quite busy in Nageru, but if one can make an efficient enough codec that one stream can work at only 5% or so of the GPU (meaning 1200 fps or so), it wouldn't really make a dent. (As a spoiler, the current Narabu encoder isn't there for 720p60 on my GTX 950, but the decoder is.)

In the next post, we'll look a bit at the GPU programming model, and what it means for how our codec needs to look like on the design level.

Source: http://blog.sesse.net/blog/tech/2017-10-18-09-25_introducing_narabu_part_1_introduction.html

0no comments yet

Post a comment

The fields are mandatory.

If you are a registered user, you can login and be automatically recognized.