How problematic is resampling audio from 44.1 to 48 kHz?
kevinboone.me | 41 points by brewmarche 4 days ago
Changing the sample rate of audio only affects the frequency range. The audio signal is _perfectly_ represented in digital form.
I am ashamed to admit this took me a long time to properly understand. For further reading I'd recommend:
https://people.xiph.org/~xiphmont/demo/neil-young.html
https://www.youtube.com/watch?v=cIQ9IXSUzuM
Thanks for the links, very interesting to read.
I buy loads of DJ music on Bandcamp and "downsample" (I think the term is) to 16-bit if they only offer 24-bit, for smaller size and wider compatibility.
> The audio signal is _perfectly_ represented in digital form.
That is not true... a 22kHz signal sampled at 44.1kHz only gets about 2 data points per period of the sinusoidal waveform. Those 2 points could land anywhere in the cycle, i.e. you could read 0 both times the waveform is sampled... see the Nyquist theorem.
From memory, changing the sample rate can cause other issues with aliasing due to the algorithms used...
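To see the edge case concretely, here's a rough numpy sketch (it samples a tone exactly at Nyquist, the degenerate case the theorem deliberately excludes):

```python
import numpy as np

fs = 44_100            # sample rate in Hz
f = fs / 2             # a tone exactly at the Nyquist frequency: 22,050 Hz
n = np.arange(8)       # eight consecutive sample indices

# Sampling sin(2*pi*f*t + phase) at t = n/fs gives sin(pi*n + phase):
for phase in (0.0, np.pi / 2):
    samples = np.sin(2 * np.pi * f * n / fs + phase)
    print(f"phase={phase:.2f}:", np.round(samples, 3))

# phase=0.00 reads ~0 at every sample: the tone vanishes entirely.
# phase=1.57 reads alternating +1/-1: full amplitude.
# This is why the theorem requires f strictly below fs/2; a 22 kHz tone
# at fs = 44.1 kHz is representable in principle, but only just.
```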
The xiphmont link is pretty good. Reminded me of the nearly-useless (and growing more so every day) fact that incandescent bulbs not only make some noise, but the noise increases when the bulb is near end of life. I know this from working in an anechoic chamber lit by bare bulbs hanging by cords in the chamber. We would do calibration checks at the start of the day, and sometimes a recording of a silent chamber would be louder than normal and then we'd go in and shut the door and try to figure out which bulb was the loud one.
Does the noise increase when the bulb is near end of life, or does lifespan decrease dramatically when the noise increases?
I imagine the noise increases when one of the supports fails and the filament starts oscillating, leading to mechanical stress and failure
(not that it makes a difference, just thinking out loud)
There’s one thing that bothers me about this. Sure, PCM sampling is a lossless representation of the low frequency portions of a continuous signal. But it is not a latency-free representation. To recover a continuous signal covering the low frequencies (up to 20kHz) from PCM pulses at a sampling frequency f_s (f_s >= 40kHz), you turn each pulse into the appropriate kernel (sinc works and is ideal in a sense, but you probably want to low-pass filter the result as well), and that gives you the decoded signal. But it’s not causal! To recover the signal at time t, you need some pulses from times beyond t. If you’re using the sinc kernel, you need quite a lot of lookahead, because sinc decays very slowly and you don’t want to cut it off until it’s decayed enough.
So if you want to take a continuous (analog) signal, digitize it, then convert back to analog, you are fundamentally adding latency. And if you want to do DSP operations on a digital signal, you also generally add some latency. And the higher the sampling rate, the lower the latency you can achieve, because you can use more compact approximations of sinc that are still good enough below 20kHz.
None of this matters, at least in principle, for audio streaming over the Internet or for a stored library — there is a ton of latency, and up to a few ms extra is irrelevant as long as it's managed correctly when synchronizing different devices. But for live sound, or for a potentially long chain of DSP effects, I can easily imagine this making a difference, especially at 44.1ksps.
I don’t work in audio or DSP, I haven’t extensively experimented, and I haven’t run the numbers. But I suspect that a couple of passes of DSP effects or digitization at 44.1ksps may become noticeable to ordinary humans in terms of added latency if there are multiple different speakers with different effects or if A/V sync is carelessly involved.
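For a rough sense of the numbers, a sketch (illustrative only; the tap counts are made-up assumptions, not taken from any real converter):

```python
# A linear-phase FIR low-pass (e.g. a windowed sinc) delays the signal by
# (taps - 1) / 2 samples. Passing 20 kHz while rejecting everything above
# fs/2 leaves a 2.05 kHz transition band at 44.1 kHz but 4 kHz at 48 kHz,
# so the 44.1 kHz filter needs roughly twice as many taps.

def latency_ms(taps: int, fs: float) -> float:
    """Group delay of a linear-phase FIR filter, in milliseconds."""
    return (taps - 1) / 2 / fs * 1000

# Hypothetical tap counts, chosen only to make the scaling visible:
for fs, taps in [(44_100, 511), (48_000, 255)]:
    print(f"fs={fs} Hz, {taps} taps -> {latency_ms(taps, fs):.2f} ms")
```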
This is all true, but it is also true for most _other_ filters and effects, too; you always get some added delay. You generally don't have a lot of conversions in your chain, and they are more on the order of 16 samples and such, so the extra delay from chunking/buffering (you never really process sample-by-sample from the sound card, the overhead would be immense) tends to be more significant.
Wouldn't each sample be just an amplitude (say, 16-bit), not a sinc function? You can't recover frequency data without a significant number of pulses, but that's what the low-pass filter is for. Digital audio is cool, but PCM is just a collection of analog samples. There's no reason why it couldn't be an energy signal.
This is the sampling theorem. You start with a continuous band-limited signal (e.g. sound pressure [0], low-pass filtered such that there is essentially no content above 20kHz [1]). You then sample it by measuring and recording the pressure, f_s times per second (e.g. 48 kHz). The result is called PCM (Pulse Code Modulation).
Now you could play it back wrong by emitting a sharp pulse f_s times per second with the indicated level. This will have a lot of frequency content above 20kHz and, in fact, above f_s/2. It will sound all kinds of nasty. In fact, it’s what you get by multiplying the time-domain signal by a pulse train, which is equivalent to convolving the frequency-domain signal with some sort of comb, and the result is not pretty.
Or you do what the sampling theorem says and emit a sinc-shaped pulse for each sample, and you get exactly the original signal. Except that sinc pulses are infinitely long in both directions.
[0] Energy is proportional to pressure squared. You’re sampling pressure, not energy.
[1] This is necessary to prevent aliasing. If you feed this algorithm a signal at f_s/2 + 5kHz, it would come back out at f_s/2 - 5kHz, which may be audible.
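A minimal sketch of the reconstruction step described above (truncated to finitely many sinc pulses, so it is only approximate, which is exactly the practical compromise):

```python
import numpy as np

fs = 48_000                              # sample rate
n = np.arange(-200, 200)                 # a finite window of sample indices
f0 = 1_000                               # 1 kHz test tone, well below fs/2
x = np.sin(2 * np.pi * f0 * n / fs)      # the PCM samples

def reconstruct(t: float) -> float:
    """Whittaker-Shannon interpolation: one sinc-shaped pulse per sample.

    Truncated to the 400 samples we kept, so it is only approximate;
    the exact formula needs samples infinitely far in both directions."""
    return float(np.sum(x * np.sinc(t * fs - n)))

# Evaluate between the sample instants: the original sine reappears.
for t in (0.0001, 0.00025, 0.0004):
    print(f"t={t*1000:.2f} ms: reconstructed={reconstruct(t):+.4f}, "
          f"true={np.sin(2 * np.pi * f0 * t):+.4f}")
```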
Sampling does not lose information below the Nyquist limit, but quantization does introduce errors that can't be fixed. And resampling at a different rate might introduce extra errors, like when you recompress a JPEG.
I see I lose data in the [18kHz..) range, but at the same time, as a male, I'm not supposed to hear that past my early 30s; sprinkle concerts on top and make it more like 16kHz :/
At least I don't have tinnitus.
Here's my test,
```fish
set -l sample ~/Music/your_sample_song.flac # NOTE: Maybe clip a 30s sample beforehand
set -l borked /tmp/borked.flac # WARN: Will get overwritten (but more likely won't exist yet)
cp -f $sample $borked
for i in (seq 10)
    echo "$i: Resampling to 44.1kHz..."
    ffmpeg -i $borked -ar 44100 -y $borked.tmp.flac 2>/dev/null
    mv $borked.tmp.flac $borked
    echo "$i: Resampling to 48kHz..."
    ffmpeg -i $borked -ar 48000 -y $borked.tmp.flac 2>/dev/null
    mv $borked.tmp.flac $borked
end
echo "Playing original $sample"
ffplay -nodisp -autoexit $sample 2>/dev/null
echo "Playing borked file $borked"
ffplay -nodisp -autoexit $borked 2>/dev/null
echo "Diffing..."
set -l spec_config 's=2048x1024:start=0:stop=22000:scale=log:legend=1'
ffmpeg -i $sample -lavfi showspectrumpic=$spec_config /tmp/sample.png -y 2>/dev/null
ffmpeg -i $borked -lavfi showspectrumpic=$spec_config /tmp/borked.png -y 2>/dev/null
echo "Spectrograms,"
ls -l /tmp/sample.png /tmp/borked.png
```
In the audio world, quantization is usually discussed in terms of bit-depth rather than sample rate.
Yeah, they know, and their comment reflects that knowledge. They're saying that if we had infinite bit depth, we could arbitrarily resample anything to anything as long as the sample rate is above the Nyquist rate; however, we don't have infinite bit depth, we have a finite bit depth (i.e. the samples are quantized), which limits the dynamic range (i.e. introduces noise). This noise can compound when resampling.
The key point is that, even with finite bit depth (as long as you dither properly), the effect of finite bit depth is easily controlled noise with a spectrum the program can choose. I.e. as long as your sampling isn't doing anything really dumb, the noise it introduces is well below the noise floor.
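A hedged sketch of that claim (TPDF dither before truncating to 16 bits; the tone level and random seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
fs, f0 = 48_000, 1_000
t = np.arange(fs) / fs
x = 0.25 * np.sin(2 * np.pi * f0 * t)     # high-precision input signal

def quantize16(signal, dither):
    """Round to 16-bit steps, optionally adding TPDF dither first."""
    step = 1 / 2**15                       # one 16-bit LSB (full scale = 1.0)
    if dither:
        # Triangular-PDF dither: the sum of two uniform variables, +-1 LSB
        signal = signal + (rng.uniform(-0.5, 0.5, len(signal)) +
                           rng.uniform(-0.5, 0.5, len(signal))) * step
    return np.round(signal / step) * step

# Undithered error is distortion correlated with the signal; dithered
# error is slightly louder but benign, signal-independent noise.
for dither in (False, True):
    err = quantize16(x, dither) - x
    print(f"dither={dither}: error floor ~ "
          f"{20 * np.log10(np.sqrt(np.mean(err ** 2))):.1f} dBFS")
```

Either way, the error floor sits around -96 dBFS or below, which is the sense in which finite bit depth is "easily controlled".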
This is a nice video. But I’m wondering: do we even need to get back the original signal from the samples? The zero-order hold output actually contains the same audible frequencies, doesn’t it? If we only want to listen to it, the stepped wave would be enough, then.
In theory yes, in practice no.
The article explains why.
tl;dr: the formula for regenerating the signal at time t uses an infinite number of samples in the past and future.
As a real world example, on Windows, unless you take exclusive access of the audio output device, everything is already resampled to 48 kHz in the mixer. Well, technically it gets resampled to the default configured device sample rate, but I haven't seen anything other than 48 kHz in at least a decade, if ever. Practically this is a non-issue, though I could understand wanting bit-perfect reproduction of a 44.1 kHz source.
For those looking to delve into this topic more, the term of art is ASRC: Asynchronous Sample Rate Conversion.
> it's probably worth avoiding the resampling of 44.1 to 48 kHz
Ehhm, yeah, duh? You don't resample unless there is a clear need, and even then you only downsample, never upsample, and you tell anyone who tries to convince you otherwise to go away and find the original (analog) source, so you can do a proper transfer.
That seems a rather shallow - and probably incorrect - reading of the source. This is an efficiency and trust trade-off, as noted:
> given sufficient computing resources, we can resample 44.1 kHz to 48 kHz perfectly. No loss, no inaccuracies.
and then further
> Your smartphone probably can resample 44.1 kHz to 48 kHz in such a way that the errors are undetectable even in theory, because they are smaller than the noise floor. Proper audio equipment can certainly do so.
That is you don't need the original source to do a proper transfer. The author is simply noting
> Although this conversion can be done in such a way as to produce no audible errors, it's hard to be sure it actually is.
That is, re-sampling is not a bad idea in this case, because it isn't going to introduce any sort of error if done properly; it's just that the author notes you cannot trust any random given re-sampler to do so.
Therefore if you do need to resample, you can do so without the analog source, as long as you have a re-sampler you can trust, or do it yourself.
Speaking of a resampler you trust, I’ve had good experience with libsamplerate (http://www.mega-nerd.com/SRC/), which as of 2016 is BSD licensed.
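For quick experiments without the C API, here's a sketch using scipy's polyphase resampler instead (my substitution, not libsamplerate itself; the test tone is arbitrary):

```python
import numpy as np
from scipy.signal import resample_poly

# 48000 / 44100 reduces to the exact rational ratio 160 / 147
# (gcd = 300), so a polyphase resampler converts in a single step.
fs_in, fs_out = 44_100, 48_000
up, down = 160, 147

t = np.arange(fs_in) / fs_in
x = np.sin(2 * np.pi * 1_000 * t)    # one second of a 1 kHz tone

y = resample_poly(x, up, down)       # windowed-sinc FIR under the hood
print(len(x), "->", len(y))          # 44100 -> 48000 samples
```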
If only it was that simple T_T
I'm working on a game. My game stores audio files as 44.1kHz .ogg files. If my game is the only thing playing audio, then great, the system sound mixer can configure the DAC to work in 44.1kHz mode.
But if other software is trying to play 48kHz sound files at the same time? Either my game has to resample from 44.1kHz to 48kHz before sending it to the system, or the system sound mixer needs to resample it to 48kHz, or the system sound mixer needs to resample the other software from 48kHz to 44.1kHz.
Unless I'm missing something?
You are right; the system sound mixer should handle all resampling unless you explicitly take exclusive control of the audio device. On Windows at least, this means everything generally gets resampled to 48 kHz. If you are trying to get the lowest latency possible, this can be an obstacle... on the order of single-digit milliseconds.
Is this not the job of the operating system or its supporting parts, to deal with audio from various sources? It should not be necessary to inspect the state of the OS your game is running on to know what kind of audio you can play back. In fact, that could even be considered spying on things you shouldn't see. Maybe the OS or its sound system does not abstract that away and I am wrong about the state of OSes in reality, but this seems to me like a pretty big oversight, if true. If I extrapolate from your use case, that would mean any application performing any playback of sound needs to inspect whether something else is running on the system. That seems like a pretty big overreach.
As an example, let's say I change the frequency in Audacity and press the play button. Does Audacity now go and inspect whether anything else on my system is making any sound?
It is, and it is done, but you might not have control over the process.
In PulseAudio you can choose the resampling method used by the whole mixing daemon, but I don't think that's an option on Windows/macOS.
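(For reference, assuming a stock PulseAudio setup: the knob lives in `/etc/pulse/daemon.conf` or `~/.config/pulse/daemon.conf`, e.g. `resample-method = soxr-vhq`, and `pulseaudio --dump-resample-methods` lists what your build actually supports; the soxr methods only exist if PulseAudio was built with libsoxr.)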
Depends on platform. But yes.
It is also the job of the operating system or its supporting parts to allow applications to configure audio devices to specific sample rates if that's what the application needs.
It's fine to just take whatever you get if you are a game app, and either allow the OS to resample, or do the resampling yourself on the fly.
Not so fine if you are authoring audio, where the audio device rate ABSOLUTELY has to match the rate of content that's being created. It is NOT acceptable to have the OS doing resampling when that's the case.
Audacity allows you to force the sample rate of the input and output devices on both Windows and Linux. Much easier on Windows; utterly chaotic and bug-filled and miserable and unpredictable on Linux (although up-to-date versions of Pipewire can almost mostly sometimes do the right thing, usually).
> Is this not the job of the operating system or its supporting parts, to deal with audio from various sources
I think that's the point? In practice the OS (or its supporting parts) resamples audio all the time. It's "under the hood", but the only way to actually avoid it would be to limit all audio files and playback systems to a single rate.
I don't understand then, why they need to deal with that when making a game, unless they are not satisfied with the way that the OS resamples under the hood.
My reading is not that they're saying it's something they necessarily have to deal with themselves, but that it's something they can't practically avoid.
But they CAN practically avoid it. lol. Just let the system do it for them.
If my audio files are 44.1kHz, and the user plays 48kHz audio at the same time, how do I practically avoid my audio being resampled?
You cannot avoid it either way then, I guess. Either you let the system do it for you, or you take matters into your own hands. But why do you feel it necessary to take matters into your own hands? I think that's the actual question that begs answering. Are you unsatisfied with how the system does the resampling? Does it result in a worse quality than your own implementation of resampling? Or is there another reason?
I suppose, if you interpret "avoid" as "not care about".
I interpret them to mean "avoid doing it oneself" not "avoid it happening entirely".