We replaced H.264 streaming with JPEG screenshots (and it worked better)

blog.helix.ml

523 points by quesobob 3 months ago


qbow883 - 3 months ago

Setting aside the various formatting problems and the LLM writing style, this just seems all kinds of wrong throughout.

> “Just lower the bitrate,” you say. Great idea. Now it’s 10Mbps of blocky garbage that’s still 30 seconds behind.

10Mbps should be way more than enough for a mostly static image with some scrolling text. (And 40Mbps are ridiculous.) This is very likely to be caused by bad encoding settings and/or a bad encoder.

> “What if we only send keyframes?” The post goes on to explain how this does not work because some other component needs to see P-frames. If that is the case, just configure your encoder to have very short keyframe intervals.

> And the size! A 70% quality JPEG of a 1080p desktop is like 100-150KB. A single H.264 keyframe is 200-500KB.

A single H.264 keyframe can be whatever size you want, *depending on how you configure your encoder*, which was apparently never seriously attempted. Why are we badly reinventing MJPEG instead of configuring the tools we already have? Lower the bitrate and keyint, use a better encoder for higher quality, lower the frame rate if you need to. (If 10 fps JPEGs are acceptable, surely you should try 10 fps H.264 too?)

But all in all the main problem seems to be squeezing an entire video stream through a single TCP connection. There are plenty of existing solutions for this. For example, this article never mentions DASH, which is made for these exact purposes.

mikepavone - 3 months ago

> When the network is bad, you get... fewer JPEGs. That’s it. The ones that arrive are perfect.

This would make sense... if they were using UDP, but they are using TCP. All the JPEGs they send will get there eventually (unless the connection drops). JPEG does not fix your buffering and congestion control problems. What presumably happened here is the way they implemented their JPEG screenshots, they have some mechanism that minimizes the number of frames that are in-flight. This is not some inherent property of JPEG though.

> And the size! A 70% quality JPEG of a 1080p desktop is like 100-150KB. A single H.264 keyframe is 200-500KB. We’re sending LESS data per frame AND getting better reliability.

h.264 has better coding efficiency than JPEG. For a given target size, you should be able to get better quality from an h.264 IDR frame than a JPEG. There is no fixed size to an IDR frame.

Ultimately, the problem here is a lack of bandwidth estimation (apart from the sort of binary "good network"/"cafe mode" thing they ultimately implemented). To be fair, this is difficult to do and being stuck with TCP makes it a bit more difficult. Still, you can do an initial bandwidth probe and then look for increasing transmission latency as a sign that the network is congested. Back off your bitrate (and if needed reduce frame rate to maintain sufficient quality) until transmission latency starts to decrease again.

WebRTC will do this for you if you can use it, which actually suggests a different solution to this problem: use websockets for dumb corporate network firewall rules and just use WebRTC everything else