grafx.processors.reverb
- class STFTMaskedNoiseReverb(ir_len=60000, processor_channel='pseudo_midside', n_fft=384, hop_length=192, fixed_noise=True, gain_envelope=False, flashfftconv=True, max_input_len=131072)
Bases: Module
A filtered noise model [EHGR20] (or pseudo-random noise method) with mid/side controls.
We employ two fixed-length uniform noise signals, \(v_{\mathrm{m}}[n]\) and \(v_{\mathrm{s}}[n] \sim \mathcal{U}[-1, 1)\), that correspond to the mid and side channels, respectively. Next, we apply a magnitude mask \(M_{\mathrm{x}}[k, m] \in \mathbb{R}^{K\times M}\) to each noise’s short-time Fourier transform (STFT) \(V_{\mathrm{x}}[k, m] \in \mathbb{C}^{K\times M}\).
\[ H_{\mathrm{x}}[k, m] = V_{\mathrm{x}}[k, m] \odot M_{\mathrm{x}}[k, m] \quad (\mathrm{x} \in \{\mathrm{m}, \mathrm{s}\}). \]
Here, \(k\) and \(m\) denote the frequency and time frame indices, respectively. Each mask is parameterized with an initial filter \(H^0_{\mathrm{x}}[k] \in \mathbb{R}^K\) and an absorption filter \(H^\Delta_{\mathrm{x}}[k] \in \mathbb{R}^K\), both in log-magnitudes. Also, a frequency-independent gain envelope \(G_{\mathrm{x}}[m] \in \mathbb{R}^{M}\) can be optionally added.
The resulting mask is
\[ M_{\mathrm{x}}[k, m] = \exp ({H^0_{\mathrm{x}}[k] + (m-1) H^\Delta_{\mathrm{x}}[k]} + \underbrace{G_{\mathrm{x}}[m]}_{\mathrm{optional}}). \]
Next, we convert the masked noises to the time-domain responses, \(h_\mathrm{m}[n]\) and \(h_\mathrm{s}[n]\), via inverse STFT. We obtain the desired FIR \(h[n]\) by converting the mid/side responses to stereo. Finally, we apply channel-wise convolutions (not a full 2-by-2 stereo convolution) to the input \(u[n]\) and obtain the wet output \(y[n]\). Hence, the learnable parameters are \(p = \{ H^0_{\mathrm{m}}, H^0_{\mathrm{s}}, H^\Delta_{\mathrm{m}}, H^\Delta_{\mathrm{s}}, G_{\mathrm{m}}, G_{\mathrm{s}} \}\), where the latter two are optional.
- Parameters:
ir_len (int, optional) – The length of the impulse response (default: 60000).
n_fft (int, optional) – FFT size of the STFT (default: 384).
hop_length (int, optional) – Hop length of the STFT (default: 192).
fixed_noise (bool, optional) – If set to True, we use fixed-seed random noises (\(v_{\mathrm{m}}[n]\) and \(v_{\mathrm{s}}[n]\)) for every forward pass. If set to False, we create different uniform noises for every forward pass (default: True).
gain_envelope (bool, optional) – If set to True, we use the log-magnitude gain envelope \(G_{\mathrm{x}}[m]\) (default: False).
flashfftconv (bool, optional) – An option to use FlashFFTConv [FKNRe23] as a backend to perform the causal convolution efficiently (default: True).
max_input_len (int, optional) – When flashfftconv is set to True, the maximum input length must also be given (default: 2**17).
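The mask formula above can be checked numerically with a small sketch. This is a plain-Python illustration and not the library's implementation; the function name `magnitude_mask` and its arguments are hypothetical, and the frame index is taken to run from 1 to \(M\) as in the formula.

```python
import math


def magnitude_mask(init_log_mag, delta_log_mag, num_frames, gain_env_log_mag=None):
    """Return M[k][m] = exp(H0[k] + (m - 1) * Hdelta[k] + G[m]) as nested lists.

    The frame index m runs from 1 to num_frames, following the formula.
    """
    if gain_env_log_mag is None:
        # The gain envelope is optional; a zero log-gain leaves the mask unchanged.
        gain_env_log_mag = [0.0] * num_frames
    mask = []
    for h0, h_delta in zip(init_log_mag, delta_log_mag):
        row = [
            math.exp(h0 + (m - 1) * h_delta + gain_env_log_mag[m - 1])
            for m in range(1, num_frames + 1)
        ]
        mask.append(row)
    return mask


# Two frequency bins, four frames: bin 0 does not decay,
# bin 1 loses 0.5 in log-magnitude per frame.
mask = magnitude_mask([0.0, 0.0], [0.0, -0.5], num_frames=4)
```

A negative absorption value makes the corresponding bin decay exponentially over frames, which is what gives the synthesized response its frequency-dependent decay.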
- forward(input_signals, init_log_magnitude, delta_log_magnitude, gain_env_log_magnitude=None)
Processes input audio with the processor and given parameters.
- Parameters:
input_signals (FloatTensor, \(B\times 2\times L\)) – A batch of input audio signals.
init_log_magnitude (FloatTensor, \(B\times 2\times K\)) – A batch of log-magnitudes of the initial filters. We assume that the mid- and side-channel responses, \(H^0_{\mathrm{m}}\) and \(H^0_{\mathrm{s}}\) respectively, are stacked together (the same applies to the remaining tensors).
delta_log_magnitude (FloatTensor, \(B\times 2\times K\)) – A batch of log-magnitudes of the absorption filters.
gain_env_log_magnitude (FloatTensor, \(B\times 2\times M\), optional) – A batch of log-gain envelopes.
- Returns:
A batch of output signals of shape \(B \times C \times L\).
- Return type:
FloatTensor
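The mid/side-to-stereo conversion and the channel-wise convolution from the class description can be illustrated with toy signals. This is only a sketch under stated assumptions: it assumes the unscaled convention left = mid + side and right = mid - side (the scaling used by the library may differ), it uses a direct-form convolution instead of FlashFFTConv, and the helper names are hypothetical.

```python
def midside_to_stereo(mid, side):
    """Convert mid/side signals to left/right, assuming L = M + S and R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def causal_convolve(signal, impulse_response):
    """Direct-form causal convolution, truncated to the input length."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        for j, h in enumerate(impulse_response):
            if j > n:
                break  # Causality: no contribution from future input samples.
            out[n] += h * signal[n - j]
    return out


# Toy mid/side responses (hypothetical values, not produced by the library).
h_mid = [1.0, 0.5, 0.25]
h_side = [0.0, 0.25, -0.25]
h_left, h_right = midside_to_stereo(h_mid, h_side)

# Channel-wise convolution: the left input sees only h_left and the right
# input sees only h_right; there are no cross-channel terms.
u_left = [1.0, 0.0, 0.0, 0.0]
u_right = [0.0, 1.0, 0.0, 0.0]
y_left = causal_convolve(u_left, h_left)
y_right = causal_convolve(u_right, h_right)
```

Truncating the result to the input length matches the documented output shape, which has the same length \(L\) as the input.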
- parameter_size()
- Returns:
A dictionary that contains each parameter tensor’s shape.
- Return type:
Dict[str, Tuple[int, ...]]
- class FilteredNoiseShapingReverb(ir_len=60000, num_bands=12, processor_channel='midside', f_min=31.5, f_max=15000, scale='log', sr=30000, zerophase=True, order=2, noise_randomness='pseudo-random', use_fade_in=False, min_decay_ms=50, max_decay_ms=2000, flashfftconv=True, max_input_len=131072)
Bases: Module
A time-domain FIR filter based on the envelope “shaping” of filterbank noise signals [SIC21].
From a noise signal \(v[n] \sim \mathcal{U}[-1, 1)\), we apply a \(K\)-band filterbank to obtain a set of filtered noise signals \(v_1[n], \cdots, v_K[n]\). Then, we apply a time-domain envelope shaping, \(a_i[n]\), to each filtered noise signal as follows,
\[ h[n] = \sum_{i=1}^K a_i[n] v_i[n]. \]
Each envelope shaping is parameterized by a decay \(r_i\) and an initial gain \(g_i\). Furthermore, a fade-in envelope, set to be shorter than the decay time, can be added to the shaping.
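The sum above can be sketched in a few lines of plain Python. The exact envelope used by the library is not given here, so this example assumes a simple exponential form \(a_i[n] = g_i \exp(-r_i n)\); the function name `shaped_noise_ir` is hypothetical, and unfiltered uniform noise stands in for the filterbank outputs.

```python
import math
import random


def shaped_noise_ir(band_noises, decays, gains):
    """Return h[n] = sum_i a_i[n] * v_i[n] with a_i[n] = g_i * exp(-r_i * n).

    The exponential envelope is an assumed form, chosen for illustration.
    """
    ir_len = len(band_noises[0])
    h = [0.0] * ir_len
    for v_i, r_i, g_i in zip(band_noises, decays, gains):
        for n in range(ir_len):
            h[n] += g_i * math.exp(-r_i * n) * v_i[n]
    return h


# Stand-in "band" signals: plain uniform noise between -1 and 1. A real model
# would use the outputs of a K-band filterbank instead.
rng = random.Random(0)
num_bands, ir_len = 3, 8
band_noises = [
    [rng.uniform(-1.0, 1.0) for _ in range(ir_len)] for _ in range(num_bands)
]
h = shaped_noise_ir(band_noises, decays=[0.1, 0.5, 1.0], gains=[1.0, 0.5, 0.25])
```

Bands with a larger decay value die out faster, so each frequency band of the resulting impulse response can have its own decay time.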
- Parameters:
ir_len (int, optional) – The length of the impulse response (default: 60000).
num_bands (int, optional) – The number of frequency bands (default: 12).
processor_channel (str, optional) – The channel type of the processor, either "midside", "stereo", or "mono" (default: "midside").
f_min (float, optional) – The minimum frequency of the filtered noise (default: 31.5).
f_max (float, optional) – The maximum frequency of the filtered noise (default: 15000).
scale (str, optional) – Frequency scale to use: "bark_traunmuller", "bark_schroeder", "bark_wang", "mel_htk", "mel_slaney", "linear", "log" (default: "log").
sr (int, optional) – The sample rate of the filtered noise (default: 30000).
zerophase (bool, optional) – If set to True, we use a zero-phase crossover filter (default: True).
order (int, optional) – The order of the crossover filter (default: 2).
noise_randomness (str, optional) – The randomness of the filtered noise, either "pseudo-random", "fixed", or "random" (default: "pseudo-random").
use_fade_in (bool, optional) – If set to True, we use a fade-in envelope (default: False).
min_decay_ms (float, optional) – The minimum decay time in milliseconds (default: 50).
max_decay_ms (float, optional) – The maximum decay time in milliseconds (default: 2000).
flashfftconv (bool, optional) – An option to use FlashFFTConv [FKNRe23] as a backend to perform the causal convolution efficiently (default: True).
max_input_len (int, optional) – When flashfftconv is set to True, the maximum input length must also be given (default: 2**17).
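To get a feel for the default values, the following sketch computes geometrically spaced frequencies between f_min and f_max and converts the decay bounds to samples at the default sample rate. Both helper functions are hypothetical, and how the library maps the "log" scale to band edges or centers is not stated here, so the spacing is only an assumption.

```python
def log_spaced_frequencies(f_min, f_max, num_points):
    """Geometrically spaced frequencies from f_min to f_max, endpoints included."""
    ratio = (f_max / f_min) ** (1.0 / (num_points - 1))
    return [f_min * ratio**i for i in range(num_points)]


def ms_to_samples(duration_ms, sr):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(duration_ms * sr / 1000)


# Defaults from the parameter list: f_min=31.5, f_max=15000, num_bands=12.
freqs = log_spaced_frequencies(31.5, 15000.0, 12)

# Defaults: min_decay_ms=50, max_decay_ms=2000, sr=30000.
min_decay_samples = ms_to_samples(50, 30000)    # 1500 samples
max_decay_samples = ms_to_samples(2000, 30000)  # 60000 samples
```

With these defaults, the maximum decay time of 2000 ms at 30000 Hz corresponds to 60000 samples, which equals the default ir_len.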
- forward(input_signals, log_decay, log_gain, log_fade_in=None, z_fade_in_gain=None)
Processes input audio with the processor and given parameters.
- Parameters:
input_signals (FloatTensor, \(B\times C\times L\)) – A batch of input audio signals.
log_decay (FloatTensor, \(B\times C_{\mathrm{rev}}\times K\)) – A batch of log-decay values.
log_gain (FloatTensor, \(B\times C_{\mathrm{rev}}\times K\)) – A batch of log-gain values.
log_fade_in (FloatTensor, \(B\times C_{\mathrm{rev}}\times K\), optional) – A batch of log-fade-in values (default: None).
- Returns:
A batch of output signals of shape \(B \times C \times L\).
- Return type:
FloatTensor
- parameter_size()
- Returns:
A dictionary that contains each parameter tensor’s shape.
- Return type:
Dict[str, Tuple[int, ...]]