grafx.processors.core
- class FIRConvolution(mode='causal', flashfftconv=True, max_input_len=131072)
Bases:
Module
A FIR convolution backend, which can use either native FFT-based convolution or FlashFFTConv [FKNRe23]. Allows for causal and zero-phase convolution modes.
For an input and a filter, the operation is defined as a usual convolution. However, the output length equals the input length, and the number of output channels is determined by broadcasting.
- Parameters:
mode (str, optional) – The convolution mode, either "causal" or "zerophase" (default: "causal").
flashfftconv (bool, optional) – An option to use FlashFFTConv as a backend (default: True).
max_input_len (int, optional) – When flashfftconv is set to True, the max input length must also be given (default: 2**17).
- forward(input_signals, fir)
Performs the convolution operation.
- Parameters:
input_signals (FloatTensor) – A batch of input audio signals.
fir (FloatTensor) – A batch of FIR filters.
- Returns:
A batch of convolved signals with the same length as the input signals.
- Return type:
FloatTensor
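The output-length convention of the causal mode can be sketched in plain Python. This is a direct-form, single-channel reference under the assumption that the causal output is simply truncated to the input length; it is not the FFT-based or FlashFFTConv path the class uses, and the function name `causal_fir` is hypothetical.

```python
def causal_fir(signal, fir):
    """Direct-form causal convolution, truncated to len(signal).

    Hypothetical helper for illustration only: one channel, Python lists.
    """
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(fir):
            if n - k >= 0:  # causal: only current and past input samples
                acc += tap * signal[n - k]
        out.append(acc)
    return out


x = [1.0, 2.0, 3.0, 4.0]
h = [1.0, 0.5]
y = causal_fir(x, h)  # same length as x; the convolution tail is dropped
```

A full convolution of these two sequences would have five samples; here the last one is discarded so that the output matches the input length.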
- class TriangularFilterBank(num_frequency_bins, num_filters=50, scale='bark_traunmuller', f_min=40, f_max=None, sr=44100, low_half_triangle=True)
Bases:
Module
Creates a triangular filterbank for the given frequency range. Code adapted from torchaudio and Diff-MST.
We provide both analysis and synthesis modes. In the synthesis mode, the input energy, given per filterbank band, is expanded to the linear FFT scale using the standard triangular filterbank matrix. The analysis mode downsamples the frequency axis by multiplying with the normalized filterbank matrix (each filter sums to 1, so it acts as an adaptive weighted average pooling).
- Parameters:
num_frequency_bins (int) – Number of frequency bins from linear FFT.
num_filters (int, optional) – Number of the filterbank filters (default: 50).
scale (str, optional) – Frequency scale to use: "bark_traunmuller", "bark_schroeder", "bark_wang", "mel_htk", "mel_slaney", "linear", "log" (default: "bark_traunmuller").
f_min (float, optional) – Minimum frequency (default: 40).
f_max (float, optional) – Maximum frequency (default: None).
low_half_triangle (bool, optional) – Attach the remaining low-freq parts (default: True).
- forward(energy, mode='synthesis')
Apply the filterbank to the energy tensor.
- Parameters:
energy (FloatTensor) – A batch of energy tensors.
mode (str, optional) – Mode of operation: "analysis" or "synthesis" (default: "synthesis").
- Returns:
The energy tensor after applying the filterbank.
- Return type:
FloatTensor
- static compute_matrix(num_frequency_bins, num_filters, scale, f_min, f_max, sr, low_half_triangle)
Compute the triangular filterbank matrix.
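The analysis mode described above (a normalized filterbank acting as weighted average pooling) can be sketched as follows. This is a minimal plain-Python illustration that assumes a linear frequency scale with equally spaced band edges; the Bark and mel scales, `f_min`/`f_max` handling, and `low_half_triangle` are not modeled, and both function names are hypothetical.

```python
def triangular_filterbank(num_bins, num_filters):
    """Triangular filters on a linear scale, shape (num_filters, num_bins)."""
    step = (num_bins - 1) / (num_filters + 1)
    points = [i * step for i in range(num_filters + 2)]  # band edges in bins
    matrix = []
    for i in range(num_filters):
        lo, mid, hi = points[i], points[i + 1], points[i + 2]
        row = []
        for b in range(num_bins):
            rising = (b - lo) / (mid - lo)
            falling = (hi - b) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        matrix.append(row)
    return matrix


def analysis(energy, matrix):
    """Pool linear-frequency energy into bands with row-normalized filters."""
    pooled = []
    for row in matrix:
        weighted = sum(w * e for w, e in zip(row, energy))
        pooled.append(weighted / sum(row))  # each filter sums to 1
    return pooled


matrix = triangular_filterbank(num_bins=17, num_filters=3)
pooled = analysis([2.0] * 17, matrix)  # flat spectrum stays flat after pooling
```

Because each normalized filter sums to one, a constant input spectrum is mapped to the same constant in every band.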
- class IIRFilter(order=2, backend='fsm', flashfftconv=True, fsm_fir_len=4000, fsm_max_input_len=131072, fsm_regularization=False)
Bases:
Module
A serial stack of second-order filters (biquads) with the given coefficients.
The transfer function of the stacked biquads is the product of the individual biquad transfer functions [Smi07b].
We provide three backends for the filtering.
The first one, "lfilter", is the time-domain method that computes the difference equation exactly. It uses torchaudio.lfilter, which uses the direct form I implementation with the coefficients normalized by the leading denominator coefficient [YMC+24].
The second one, "fsm", is the frequency-sampling method (FSM), which approximates the filter with a finite impulse response (FIR) by sampling the discrete-time Fourier transform (DTFT) of the filter uniformly at a finite number of points [KPE20, RGM70]. The sampling points are chosen so that each sample becomes a bin of the discrete Fourier transform (DFT) of the same length. The FIR filter is then obtained by taking the inverse DFT of the sampled DTFT, and the final output signal is computed by convolving the input signal with the FIR filter. This "fsm" backend is faster than "lfilter" but is only an approximation. The error is called time-domain aliasing: the frequency-sampled FIR equals the true infinite impulse response (IIR) summed over all shifts by multiples of the number of samples [Smi07a]. Clearly, increasing the number of samples reduces the error.
The third one, "ssm", is based on the diagonalisation of the state-space model (SSM) of the biquad filter, so it only works for second-order filters. This idea is based on Ben Hayes’s derivation of associative scan for parallel IIR filter computation and was implemented by Chin-Yun Yu. The direct form II implementation of the biquad filter can be written in state-space form [Smi07b]. Its state-transition matrix can be decomposed so that the core factor is either a diagonal matrix with real poles on the diagonal or a scaled rotation matrix, which can be represented by one of the complex conjugate poles. Using this decomposition, the filter can be implemented as first-order recursive filters on the projected signal, where we leverage Parallel Scan [MC18] to speed up the computation on the GPU. Finally, the output is projected back to the original basis.
We recommend using the "ssm" backend over the "lfilter" backend in general, not only because it runs several times faster on the GPU but also because it is more numerically stable.
- Parameters:
num_filters (int, optional) – Number of biquads to use (default: 1).
normalized (bool, optional) – If set to True, the filter coefficients are assumed to be normalized by the leading denominator coefficient, making the number of learnable parameters per biquad 5 instead of 6 (default: False).
backend (str, optional) – The backend to use for the filtering, which can either be the frequency-sampling method "fsm" or exact time-domain filters, "lfilter" or "ssm" (default: "fsm").
fsm_fir_len (int, optional) – The length of FIR approximation when backend == "fsm" (default: 4000).
- forward(input_signal, Bs, As)
Apply the IIR filter to the input signal and the given coefficients.
- Parameters:
input_signal (FloatTensor) – A batch of input audio signals.
Bs (FloatTensor) – A batch of biquad numerator coefficients, stacked in the last dimension.
As (FloatTensor) – A batch of biquad denominator coefficients, stacked in the last dimension.
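The frequency-sampling idea and its time-domain aliasing error can be checked numerically. The sketch below is a plain-Python illustration for a single biquad, using a naive DFT instead of an FFT; the helper names and the example coefficients `b` and `a` are hypothetical and are not part of the library.

```python
import cmath


def biquad_impulse_response(b, a, length):
    """Exact impulse response from the difference equation (a[0] != 0)."""
    h = []
    for n in range(length):
        feedforward = b[n] if n < 3 else 0.0  # a unit impulse picks out b[n]
        y1 = h[n - 1] if n >= 1 else 0.0
        y2 = h[n - 2] if n >= 2 else 0.0
        h.append((feedforward - a[1] * y1 - a[2] * y2) / a[0])
    return h


def fsm_fir(b, a, n_points):
    """Sample H(z) at the N DFT frequencies, then take the inverse DFT."""
    spectrum = []
    for k in range(n_points):
        z_inv = cmath.exp(-2j * cmath.pi * k / n_points)
        num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
        den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
        spectrum.append(num / den)
    fir = []
    for n in range(n_points):
        acc = sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / n_points)
                  for k in range(n_points))
        fir.append((acc / n_points).real)
    return fir


b, a = [1.0, 0.0, 0.0], [1.0, -1.2, 0.81]  # stable: pole radius 0.9
true_ir = biquad_impulse_response(b, a, 640)
fir_32 = fsm_fir(b, a, 32)
fir_128 = fsm_fir(b, a, 128)

# Maximum deviation from the true impulse response over the FIR support.
err_32 = max(abs(f - h) for f, h in zip(fir_32, true_ir))
err_128 = max(abs(f - h) for f, h in zip(fir_128, true_ir))

# Time-domain aliasing: the 32-tap FIR equals the true IR folded every 32 taps.
aliased_32 = [sum(true_ir[n + 32 * m] for m in range(20)) for n in range(32)]
```

With a pole radius of 0.9 the impulse response has not decayed after 32 samples, so the 32-point FIR carries a visible aliasing error, while the 128-point FIR is much closer to the exact response.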
- class SurrogateDelay(N, straight_through=True, radii_loss=True, normalize_gradients=True)
Bases:
Module
A surrogate FIR processor for a learnable delay line.
A single delay can be represented as a FIR filter consisting of a unit impulse shifted by the delay length we want to optimize. We exploit the fact that each delay corresponds to a complex sinusoid in the frequency domain. Such a sinusoid’s angular frequency can be optimized with gradient descent if we allow it to be inside the unit disk, i.e., to have a magnitude of at most one [HSF23]. We first start with an unconstrained complex parameter and restrict it to be inside the unit disk (in the same way as restricting the poles [NSW21]) with an activation function.
Then, we compute a damped sinusoid with the normalized frequency and use its inverse FFT as a surrogate of the delay. Clearly, it is not a sparse delay line unless the restricted parameter is an integer power of the corresponding root of unity (on the unit circle with an exact angle). Instead, it becomes a time-aliased and low-passed sinc kernel. We can use this soft delay as is, or we can use straight-through estimation (STE) [BLC13] so that the forward pass uses the hard delays and the backward pass uses the soft delays.
For a stable and faster convergence, we provide two additional options. The first one is to normalize the gradients of the complex conjugate to have a unit norm.
The second one is to use the radii loss to encourage the complex angular frequency to be near the unit circle, making the delays “sharp.” We empirically found this regularization to be helpful, especially when we use the STE, as it alleviates the discrepancy between the hard and soft delays while still having the benefits of the soft FIR.
- Parameters:
N (int) – The length of the surrogate FIR, which is also the largest delay length minus one.
straight_through (bool, optional) – Use hard delays for the forward passes and surrogate soft delays for the backward passes with straight-through estimation (default: True).
normalize_gradients (bool, optional) – Normalize the complex conjugate gradients to unit norm (default: True).
radii_loss (bool, optional) – Use the radii loss to encourage the delays to be close to the unit circle (default: True).
- forward(z)
Computes the surrogate delay FIRs from the complex angular frequencies.
- Parameters:
z (ComplexTensor, any shape) – The unnormalized complex angular frequencies.
- Returns:
A batch of FIRs, either hard delays (when using the straight-through estimation) or soft surrogate delays. The returned tensor has an additional dimension (last) for the FIR taps.
- Return type:
FloatTensor or Tuple[FloatTensor, FloatTensor]
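The relation between the complex parameter, the soft surrogate FIR, and the hard delay can be illustrated with a small plain-Python sketch. Several details here are assumptions rather than the library's confirmed behavior: the unit-disk activation is taken to be a `tanh` scaling of the magnitude, a delay of `d` samples is mapped to the angle `-2*pi*d/N`, the real part of a full naive inverse DFT stands in for the inverse FFT, and all function names are hypothetical.

```python
import cmath
import math


def to_unit_disk(z):
    """Assumed activation: keep the angle, squash the radius with tanh."""
    radius = abs(z)
    return z * math.tanh(radius) / radius if radius > 0 else 0j


def soft_delay(z, n_taps):
    """Real part of the inverse DFT of the damped sinusoid z**k."""
    fir = []
    for n in range(n_taps):
        acc = sum(z ** k * cmath.exp(2j * math.pi * k * n / n_taps)
                  for k in range(n_taps))
        fir.append((acc / n_taps).real)
    return fir


def hard_delay(z, n_taps):
    """One-hot FIR at the delay implied by the angle of z."""
    d = round(-cmath.phase(z) * n_taps / (2 * math.pi)) % n_taps
    return [1.0 if n == d else 0.0 for n in range(n_taps)]


z = 4.0 * cmath.exp(-2j * math.pi * 3 / 16)  # unconstrained parameter
z_tilde = to_unit_disk(z)                    # radius tanh(4), just below 1
soft = soft_delay(z_tilde, 16)               # smooth kernel peaking at tap 3
hard = hard_delay(z_tilde, 16)               # exact impulse at tap 3

# On the unit circle at an exact angle, the soft delay is already sparse.
exact = soft_delay(cmath.exp(-2j * math.pi * 3 / 16), 16)
```

In an autograd framework, straight-through estimation is commonly written so that the forward value is `hard` while gradients flow through `soft`; that part is omitted here because it needs a tensor library.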
- class TruncatedOnePoleIIRFilter(iir_len=16384, **backend_kwargs)
Bases:
Module
A one-pole IIR filter with a truncated impulse response.
The true one-pole IIR filter is defined as a recursive filter with a single coefficient. Here, for speed-up, we calculate its truncated impulse response analytically and convolve it with the input signal.
The length of the truncated FIR is given as the argument iir_len.
- forward(input_signals, z_alpha)
Processes input audio with the processor and given coefficients.
- Parameters:
input_signals (FloatTensor) – A batch of input audio signals.
z_alpha (FloatTensor) – A batch of one-pole coefficients.
- Returns:
A batch of smoothed signals.
- Return type:
FloatTensor
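The effect of truncating the impulse response can be sketched in plain Python. The recursion used here, `y[n] = alpha * x[n] + (1 - alpha) * y[n - 1]` with `alpha` obtained from `z_alpha` through a sigmoid, is an assumed convention for illustration, and both function names are hypothetical.

```python
import math


def one_pole_recursive(x, alpha):
    """Reference recursion: y[n] = alpha * x[n] + (1 - alpha) * y[n - 1]."""
    y, prev = [], 0.0
    for sample in x:
        prev = alpha * sample + (1.0 - alpha) * prev
        y.append(prev)
    return y


def one_pole_truncated(x, alpha, iir_len):
    """Convolve with the analytic impulse response cut to iir_len taps."""
    fir = [alpha * (1.0 - alpha) ** m for m in range(iir_len)]
    return [sum(fir[k] * x[n - k] for k in range(min(n + 1, iir_len)))
            for n in range(len(x))]


z_alpha = 0.0
alpha = 1.0 / (1.0 + math.exp(-z_alpha))  # sigmoid(0) = 0.5
x = [1.0] * 32                            # unit step input

exact = one_pole_recursive(x, alpha)
short = one_pole_truncated(x, alpha, iir_len=8)
full = one_pole_truncated(x, alpha, iir_len=32)

err_short = max(abs(e - s) for e, s in zip(exact, short))
err_full = max(abs(e - f) for e, f in zip(exact, full))
```

When `iir_len` covers the whole signal, the truncated form matches the recursion up to rounding; with a shorter `iir_len`, the error is bounded by the discarded tail of the impulse response, which is `(1 - alpha) ** iir_len` for this step input.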
- class Ballistics
Bases:
Module
A ballistics processor that smooths the input signal with a recursive filter.
An input signal is smoothed recursively, with different coefficients for the “attack” and the “release”.
We calculate the coefficients from the inputs with the sigmoid function. We use diffcomp for the optimized forward and backward computation [YMC+24].
- forward(input_signals, z_alpha)
Processes input audio with the processor and given coefficients.
- Parameters:
input_signals (FloatTensor) – A batch of input audio signals.
z_alpha (FloatTensor) – A batch of attack and release coefficients stacked in the last dimension.
- Returns:
A batch of smoothed signals.
- Return type:
FloatTensor
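A sample-by-sample reference for this kind of attack/release smoothing might look like the sketch below. The switching rule (attack coefficient when the input exceeds the previous output, release coefficient otherwise), the zero initial state, and the function names are assumptions for illustration; the library itself relies on diffcomp for the actual computation.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def ballistics(x, z_attack, z_release):
    """Recursive smoothing with separate attack and release coefficients."""
    a_attack, a_release = sigmoid(z_attack), sigmoid(z_release)
    y, prev = [], 0.0  # assumed zero initial state
    for sample in x:
        # Rising input uses the attack coefficient, falling input the release.
        alpha = a_attack if sample > prev else a_release
        prev = (1.0 - alpha) * prev + alpha * sample
        y.append(prev)
    return y


# A burst followed by silence: fast attack (z = 2), slow release (z = -2).
envelope = ballistics([1.0] * 4 + [0.0] * 4, z_attack=2.0, z_release=-2.0)
```

With these values the envelope reaches almost the full level within the four-sample burst and then decays slowly, which is the asymmetric behavior the two coefficients are meant to provide.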