DSP FILTERS

References

https://www.allaboutcircuits.com/technical-articles/design-of-fir-filters-using-frequency-sampling-method/

https://github.com/T3sl4co1l/Reverb/blob/master/asmdsp.S

Relevant parts, in order of complexity?

dspHighpass:

This is a simple, very low cutoff (around 10 Hz) single-pole IIR filter. It is easy to understand: the gain factor means previous signals gradually bleed away (weighted towards zero, in a geometric series – hence exponential decay). Note “hp” is 16.16 fixed point: the top half (the “DC” bias) is subtracted from the sample, then a (in this case very small) gain factor is applied, and the result is accumulated into “hp”, which is stored back into RAM for next time. The #if…#else is just an optimization to take advantage of an 8-bit-sized coefficient. (In both cases, the MAC operation is far more compact than avr-gcc’s verbose library-driven solution.)
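As a rough high-level model of the idea (plain Python, not the asm; the gain value and sample rate here are illustrative assumptions, not the source’s actual constants):

```python
FS = 15625    # assumed sample rate; the real value lives in the project source
K = 264       # assumed 0.16 fixed-point gain, about 2*pi*10 Hz / FS scaled by 65536

def dsp_highpass(sample, hp):
    """One step of a single-pole DC-blocking highpass; hp is the 16.16 accumulator."""
    out = sample - (hp >> 16)   # subtract the DC bias (top half of hp)
    hp += out * K               # accumulate a small fraction of the residual
    return out, hp

# Feed a constant (pure DC) input: the output decays away exponentially.
hp = 0
outs = []
for _ in range(20000):
    out, hp = dsp_highpass(1000, hp)
    outs.append(out)
# outs[0] == 1000, outs[-1] == 0: DC is rejected, hp's top half settles at 1000
```

A smaller K lowers the cutoff (slower bleed); the asm gets the same effect with a single MAC per sample.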

These are signed operations, for the most part. It may be helpful to note that AVR MUL instructions return their 16-bit result in r1:r0, and also set the carry flag to the top bit of that result, so that sign extension can be done instantly with a subsequent carry chain.

See also dspMix at the bottom: an 8×16 MAC operation, setting the gains of the “clean” and “reverb” channels. Another quirk: because reverb is generated by feedback only, the mix is always at least 50% “clean”. These settings are signed, so you can cancel out the direct term without messing up the reverb loop, and make it very “dirty” if you like – or just plain old delayed, if you set a single tap.
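A sketch of that mixing step, assuming Q7 gain scaling (the function name and exact shift are my guesses, not lifted from the source):

```python
def dsp_mix(clean, reverb, g_clean, g_reverb):
    """Mix two 16-bit samples with signed 8-bit (Q7) gains: two 8x16 MACs."""
    acc = g_clean * clean + g_reverb * reverb   # 8x16 multiply-accumulates
    return acc >> 7                             # drop the 7 fractional gain bits

# g_clean = 64 is half scale; a negative g_clean cancels the direct term.
```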

More advanced, dspReverbTaps:

Specifically, the .LdspRT_loop section. (“.L” marks a local label in GCC / GAS.) Two functions are inlined here, which appear earlier in the file (either unused or commented out), to remove call overhead and register shuffling (GCC’s ABI isn’t very convenient here; instead, we can set up all the pointers once and just crank away). This inlines all the GPIO register access; the port functions are descriptively named. The external RAM is faster than the CPU–GPIO propagation delay, so a few NOPs are needed. A loop increment is also inlined (see lines 445, 475 and 494). RAM access is two bytes at a time, effectively a 32k-word memory space, giving a maximum delay of 1.31 seconds.

The sample retrieved from RAM is then multiplied by the tap coefficient (loaded from [Z], aka [r30:r31], which was also inlined, at line 479). The 8×16 multiply should be familiar from above.

(Below the loop, a shift adjusts the 16.8 fixed-point accumulator once to account for the signed operands (gains are −128 to +127), with a check for saturating arithmetic; then an implicit byte shift on output truncates the fractional byte.)
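The whole tap loop, modeled at a high level (Python sketch; the 32k-word buffer and saturation come from the text above, while the Q7 scaling, tap layout and ~25 kHz rate are assumptions):

```python
N = 32768   # 32k words of delay RAM: about 1.31 s at an assumed ~25 kHz rate

def reverb_taps(buf, pos, taps):
    """Sum the taps, each a (delay_in_samples, signed 8-bit gain) pair."""
    acc = 0
    for delay, gain in taps:
        acc += gain * buf[(pos - delay) % N]   # one 8x16 MAC per tap
    acc >>= 7                                  # undo the Q7 gain scaling
    return max(-32768, min(32767, acc))        # saturate rather than wrap
```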

Advanced: dspBiquadFilter:

This is a bit of juggling to get everything into a useful form; dsp.c gives a clearer high-level view. The canonical form is a bunch of coefficients, three source samples and two output samples. I’ve rearranged these for easier access, so that the core loop can just let it rip over five consecutive pairs of coefficients and samples (no matter where they came from, source or past output!). This is done at 16×16 = 32-bit precision to stave off rounding errors; in practice, I can set a pretty narrow bandpass filter and it’s still stable. (Or, I’m not actually sure where a biquad filter is least stable; if it’s more a matter of proximity to Fs/2 than of bandwidth, maybe I just haven’t checked the right end?) This also causes some adjustments in magnitude, so the coefficients are a bit non-traditional, and there are additional shifts at the end to make up the difference.

In any case, you can see the core loop .LdspBQ_for: is even more compact than the tap loop; it’s just that there are five passes, so it takes a fair chunk of time to compute. (The outer loop is over the two selectable filter taps.)
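The five-pair core loop, sketched in Python (the Q14 coefficient scaling and the state layout here are assumptions; the asm keeps everything in registers):

```python
def biquad(x, coef, state, shift=14):
    """coef = (b0, b1, b2, a1, a2); state = [x1, x2, y1, y2] (prior inputs/outputs).
    One output sample: five MACs over consecutive coefficient/sample pairs."""
    x1, x2, y1, y2 = state
    acc = 0
    for c, s in zip(coef, (x, x1, x2, y1, y2)):   # source or past output alike
        acc += c * s                              # 16x16 -> 32-bit products
    y = acc >> shift                              # undo the Q14 scaling (assumed)
    return y, [x, x1, y, y1]                      # shuffle the state along
```

With a unity b0 (1 << 14) and all other coefficients zero, the input passes straight through, which makes a handy sanity check.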

Convolution

The most important operation is convolution. This has many general applications in signal processing and mathematical analysis, but the discrete-time, finite-window implementation is simple:

  • Take two arrays, and an accumulator (initialized to zero every time step).

  • For every time step:

      • Input a new sample to the one array, and discard the oldest sample (circular buffer or queue).

      • Take the product of each pair of elements a[k] and b[k] in the arrays.

      • Add each product to the accumulator.

      • Multiply the accumulator by an output coefficient, and output it.

Congratulations, you have an FIR filter. If a[k] is a list of past input samples, then the coefficients b[k] are the filter coefficients, effectively setting the gains on a very dense array of reverb taps. By setting these judiciously, any desired frequency response can be had.
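The bulleted procedure above, as a Python sketch (the output coefficient is folded into the taps here; names are illustrative):

```python
from collections import deque

def fir_step(history, b, x):
    """One time step: push the new sample, discard the oldest, MAC over pairs."""
    history.appendleft(x)   # newest sample at the front...
    history.pop()           # ...oldest discarded (fixed-length queue)
    return sum(a_k * b_k for a_k, b_k in zip(history, b))

taps = [0.25, 0.25, 0.25, 0.25]     # 4-point moving average: a crude lowpass
hist = deque([0.0] * len(taps))
outs = [fir_step(hist, taps, 1.0) for _ in range(5)]
# step input ramps up over 4 samples: 0.25, 0.5, 0.75, 1.0, 1.0
```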

(And for closure with signals, we can treat the input and output series as functions of time, and express the operation for a general time step n, instead of iterating through the process. In that case, the array a[k] is a partial sequence of the total input sequence. Or for continuous-time, the functions are also continuous, and a very similar integral (in the sense of the equivalence between Riemann summation and integration) is used.)

The core operation is also the dot product from linear algebra: treat the arrays as vectors of numbers, and take the sum of pairwise products. It should be no surprise at all that DSP processors have been used for graphics accelerators time and time again (3D geometry calculations involve lots of dot, cross and matrix products); or that “multimedia extensions” are really just short* vectorized operations (SIMD).

  • Relatively speaking; the original MMX was two and four elements wide, IIRC. Nowadays, with AVX512 and friends (and likely even bigger ones on the horizon), the amount of numerical power available in single instructions is immense. :o

Back to signals and DSP: the frequency response is given by the Z-transform, the discrete-time equivalent of the Laplace transform. Indeed, the equivalence is so direct, there exists a fairly simple conformal mapping between them.
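That mapping, for reference: the exact relation between the s- and z-planes, with the bilinear transform as its usual rational approximation (T is the sample period):

```latex
z = e^{sT}
\quad\Longleftrightarrow\quad
s = \frac{1}{T}\ln z \;\approx\; \frac{2}{T}\,\frac{z-1}{z+1}
```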

We can also record output states into a queue c[k], and produce the general sum:

c[n] = c[n-1] * d[0] + c[n-2] * d[1] + ...
+ a[n] * b[0] + a[n-1] * b[1] + ...

where the array d[k] is the set of coefficients of the feedback terms. (Note I’m numbering arrays in counter-flow order, i.e. n-minus-something is a prior input or output, while the coefficients are just numbered in order: 0, 1, 2, … They might end up this way in memory, or both along incrementing addresses; whatever works.)
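A direct Python transcription of that sum, one time step at a time (list-based queues for clarity; the counter-flow indexing matches the note above):

```python
def iir_step(a_hist, c_hist, b, d, x):
    """One step of c[n] = sum(c[n-1-k]*d[k]) + sum(a[n-k]*b[k])."""
    a_hist.insert(0, x)   # a_hist[0] = a[n], a_hist[1] = a[n-1], ...
    a_hist.pop()
    y = sum(ck * dk for ck, dk in zip(c_hist, d)) \
      + sum(ak * bk for ak, bk in zip(a_hist, b))
    c_hist.insert(0, y)   # becomes c[n-1] on the next step
    c_hist.pop()
    return y

# e.g. a one-pole lowpass: d = [0.5], b = [0.5]
a_hist, c_hist = [0.0], [0.0]
ys = [iir_step(a_hist, c_hist, [0.5], [0.5], 1.0) for _ in range(3)]
# step response climbs: 0.5, 0.75, 0.875
```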

Obviously(?), the feedback system is harder to work with – if you just sit and have a play with this, you’ll find that stable coefficients, for some given frequency response, don’t make much sense, or start to look random rather quickly, especially for high-order filters. You might not even be able to find stable coefficients at all, depending on numerical precision!

The difference is: whereas a continuous-time filter must have its poles in the left half-plane, a discrete-time filter must have its poles* within the unit circle. (The LHP is mapped to the circle’s interior, the RHP to its exterior, and the vertical axis to the circle itself. Crazy, huh?) Well, solving for poles in very specific locations tends to need very precise and random-looking coefficients, so that’s why these filters tend to need high precision, especially at high orders.

  • Yup, the roots of the characteristic polynomial, whose coefficients are the feedback gains on the powers of z – analogous to the coefficients of the polynomial in terms of omega (or s) for continuous-time filters.
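For the second-order case this is easy to check directly: with feedback c[n] = d0·c[n-1] + d1·c[n-2] + …, the poles are the roots of z² − d0·z − d1 (a sketch; the coefficient names are mine):

```python
import cmath

def pole_radius(d0, d1):
    """Largest |root| of z^2 - d0*z - d1; the filter is stable iff this is < 1."""
    disc = cmath.sqrt(d0 * d0 + 4 * d1)      # complex sqrt handles conjugate pairs
    return max(abs((d0 + disc) / 2), abs((d0 - disc) / 2))

# pole_radius(0.0, -0.81) -> 0.9 (stable); pole_radius(2.0, -0.5) -> ~1.707 (unstable)
```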

Also, feedback means that a stable filter’s impulse response decays exponentially with time, never quite reaching zero. Hence, infinite impulse response (IIR).

So, because high-order IIR filters are very easily unstable, we tend not to use them, and stick to cascades of 2nd-order sections – just like active op-amp filters, where we get 2 (sometimes 3) poles per stage, and stack as many stages, of whatever Q factor and all that, as are needed to get the final overall response.
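A minimal sketch of such a cascade (Python; using the same additive feedback sign convention as the sum above, with each section's output feeding the next):

```python
def sos_step(x, sections, state):
    """Cascade of second-order sections. sections: list of (b0, b1, b2, a1, a2);
    state: one [x1, x2, y1, y2] list per section."""
    for i, (b0, b1, b2, a1, a2) in enumerate(sections):
        x1, x2, y1, y2 = state[i]
        y = b0*x + b1*x1 + b2*x2 + a1*y1 + a2*y2   # one biquad step
        state[i] = [x, x1, y, y1]
        x = y                                      # feed the next stage
    return x
```

Each stage contributes its own pair of poles, so the overall order builds up while every individual stage stays easy to keep stable.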