grafx.processors.reverb

class STFTMaskedNoiseReverb(ir_len=60000, processor_channel='pseudo_midside', n_fft=384, hop_length=192, fixed_noise=True, gain_envelope=False, flashfftconv=True, max_input_len=131072)

Bases: Module

A filtered noise model [EHGR20] (or pseudo-random noise method) with mid/side controls.

We employ two fixed-length uniform noise signals, \(v_{\mathrm{m}}[n]\) and \(v_{\mathrm{s}}[n] \sim \mathcal{U}[-1, 1)\), which correspond to the mid and side channels, respectively. Next, we apply a magnitude mask \(M_{\mathrm{x}}[k, m] \in \mathbb{R}^{K\times M}\) to each noise’s short-time Fourier transform (STFT) \(V_{\mathrm{x}}[k, m] \in \mathbb{C}^{K\times M}\).

\[ H_{\mathrm{x}}[k, m] = V_{\mathrm{x}}[k, m] \odot M_{\mathrm{x}}[k, m] \quad (\mathrm{x} \in \{\mathrm{m}, \mathrm{s}\}). \]
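The noise generation and masking step above can be sketched in NumPy as follows. This is a minimal illustration, not the library's code: the helper `stft` is hypothetical, it uses a Hann window with no centering or padding (the library's STFT conventions may differ), and the mask is an all-ones placeholder. The fixed seed mimics the behavior described for `fixed_noise=True`.

```python
import numpy as np

def stft(x, n_fft=384, hop_length=192):
    """Hypothetical minimal STFT without padding: Hann-windowed frames -> one-sided spectra."""
    window = np.hanning(n_fft)
    num_frames = 1 + (len(x) - n_fft) // hop_length
    frames = np.stack(
        [x[m * hop_length : m * hop_length + n_fft] * window for m in range(num_frames)]
    )                                          # (M, n_fft)
    return np.fft.rfft(frames, axis=-1).T      # (K, M) with K = n_fft // 2 + 1

ir_len = 60000
rng = np.random.default_rng(0)                 # fixed seed, mimicking fixed_noise=True
v = rng.uniform(-1.0, 1.0, size=(2, ir_len))   # rows: v_m[n] and v_s[n], drawn from U[-1, 1)

V = np.stack([stft(v_x) for v_x in v])         # (2, K, M), complex STFTs V_x[k, m]
mask = np.ones(V.shape)                        # all-ones placeholder for the masks M_x[k, m]
H = V * mask                                   # elementwise product H_x = V_x * M_x
```

With `n_fft=384`, each STFT has \(K = 193\) frequency bins; the number of frames \(M\) depends on the padding convention.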

Here, \(k\) and \(m\) denote the frequency and time-frame indices, respectively. Each mask is parameterized by an initial filter \(H^0_{\mathrm{x}}[k] \in \mathbb{R}^K\) and an absorption filter \(H^\Delta_{\mathrm{x}}[k] \in \mathbb{R}^K\), both given as log-magnitudes. Optionally, a frequency-independent gain envelope \(G_{\mathrm{x}}[m] \in \mathbb{R}^{M}\) can also be added.

\[ M_{\mathrm{x}}[k, m] = \exp ({H^0_{\mathrm{x}}[k] + (m-1) H^\Delta_{\mathrm{x}}[k]} + \underbrace{G_{\mathrm{x}}[m]}_{\mathrm{optional}}). \]
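The mask formula above can be evaluated with broadcasting, as in the sketch below. The helper `build_mask` and the example values are hypothetical; frames are indexed from 0, so the factor \((m-1)\) of the 1-based formula becomes `m`.

```python
import numpy as np

def build_mask(init_log_mag, delta_log_mag, num_frames, gain_env_log_mag=None):
    """Hypothetical helper: M[k, m] = exp(H0[k] + m * H_delta[k] + G[m]), 0-based m.

    init_log_mag, delta_log_mag: shape (K,); gain_env_log_mag: shape (num_frames,) or None.
    """
    m = np.arange(num_frames)                                               # 0-based frame index
    log_mask = init_log_mag[:, None] + m[None, :] * delta_log_mag[:, None]  # (K, M)
    if gain_env_log_mag is not None:
        log_mask = log_mask + gain_env_log_mag[None, :]   # frequency-independent gain G[m]
    return np.exp(log_mask)

K, num_frames = 193, 311
h0 = np.zeros(K)                 # initial log-magnitude of 0, i.e. unit magnitude
h_delta = np.full(K, -0.05)      # same absorption per frame in every bin
mask = build_mask(h0, h_delta, num_frames)
```

Because the log-magnitude changes linearly with the frame index, a negative \(H^\Delta_{\mathrm{x}}[k]\) yields an exponential decay in bin \(k\); letting \(H^\Delta_{\mathrm{x}}\) vary over \(k\) gives frequency-dependent decay rates.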
Next, we convert the masked noises to the time-domain responses, \(h_\mathrm{m}[n]\) and \(h_\mathrm{s}[n]\), via the inverse STFT. We obtain the desired stereo FIR \(h[n]\) by converting the mid/side responses to left/right. Finally, we apply channel-wise convolutions (not a full 2-by-2 stereo convolution) to the input \(u[n]\) to obtain the wet output \(y[n]\). The learnable parameters are therefore \(p = \{ H^0_{\mathrm{m}}, H^0_{\mathrm{s}}, H^\Delta_{\mathrm{m}}, H^\Delta_{\mathrm{s}}, G_{\mathrm{m}}, G_{\mathrm{s}} \}\), where the last two are optional.
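The last two steps, mid/side-to-stereo conversion and channel-wise convolution, can be sketched as below. This assumes the convention left = mid + side and right = mid - side (the library may scale these differently), and both helper names are hypothetical. Each output channel is convolved only with its own FIR, so there are no cross-channel terms.

```python
import numpy as np

def midside_to_stereo(h_mid, h_side):
    """Assumed convention: left = mid + side, right = mid - side."""
    return np.stack([h_mid + h_side, h_mid - h_side])   # (2, N)

def channelwise_conv(u, h):
    """Convolve each input channel with its own FIR and keep the first L samples (causal)."""
    L = u.shape[-1]
    return np.stack([np.convolve(u_c, h_c)[:L] for u_c, h_c in zip(u, h)])

h_mid = np.array([1.0, 0.5, 0.25])
h_side = np.array([0.0, 0.1, 0.0])
h = midside_to_stereo(h_mid, h_side)   # left [1.0, 0.6, 0.25], right [1.0, 0.4, 0.25]
u = np.zeros((2, 8))
u[:, 0] = 1.0                          # stereo unit impulse
y = channelwise_conv(u, h)             # each channel of y reproduces its own FIR
```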

Parameters:
  • ir_len (int, optional) – The length of the impulse response (default: 60000).

  • n_fft (int, optional) – FFT size of the STFT (default: 384).

  • hop_length (int, optional) – Hop length of the STFT (default: 192).

  • fixed_noise (bool, optional) – If set to True, we use fixed-seed random noise signals (\(v_{\mathrm{m}}[n]\) and \(v_{\mathrm{s}}[n]\)) for every forward pass. If set to False, new uniform noise signals are drawn for every forward pass (default: True).

  • gain_envelope (bool, optional) – If set to True, we use the log-magnitude gain envelope \(G[m]\) (default: False).

  • flashfftconv (bool, optional) – An option to use FlashFFTConv [FKNRe23] as a backend to perform the causal convolution efficiently (default: True).

  • max_input_len (int, optional) – When flashfftconv is set to True, the maximum input length must also be given (default: 2**17).
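FlashFFTConv is an optimized backend for FFT-based convolution. The plain NumPy sketch below is not FlashFFTConv; it only shows the computation such a backend performs: a zero-padded FFT convolution truncated to the input length, checked against direct convolution. Here the FFT size is chosen per call; a backend that fixes its FFT size in advance presumably needs the maximum length up front, which would explain why `max_input_len` is required.

```python
import numpy as np

def fft_causal_conv(u, h):
    """Causal linear convolution via zero-padded FFTs, truncated to the input length."""
    L = u.shape[-1]
    n = L + h.shape[-1] - 1                 # full linear-convolution length
    n_fft = 1 << (n - 1).bit_length()       # next power of two >= n, so no circular wrap-around
    y = np.fft.irfft(np.fft.rfft(u, n_fft) * np.fft.rfft(h, n_fft), n_fft)
    return y[..., :L]

rng = np.random.default_rng(1)
u = rng.standard_normal((2, 1000))          # stereo input u[n]
h = rng.standard_normal((2, 300))           # stereo FIR h[n]
y_fft = fft_causal_conv(u, h)
y_ref = np.stack([np.convolve(u_c, h_c)[:1000] for u_c, h_c in zip(u, h)])
```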

forward(input_signals, init_log_magnitude, delta_log_magnitude, gain_env_log_magnitude=None)

Processes input audio with the processor and given parameters.

Parameters:
  • input_signals (FloatTensor, \(B\times 2\times L\)) – A batch of input audio signals.

  • init_log_magnitude (FloatTensor, \(B\times 2\times K \:\!\)) – A batch of log-magnitudes of the initial filters. We assume that the mid- and side-channel responses, \(H^0_{\mathrm{m}}\) and \(H^0_{\mathrm{s}}\), respectively, are stacked along the channel dimension (the same applies to the remaining tensors).

  • delta_log_magnitude (FloatTensor, \(B\times 2\times K \:\!\)) – A batch of log-magnitudes of the absorption filters.

  • gain_env_log_magnitude (FloatTensor, \(B\times 2\times M\), optional) – A batch of log-gain envelopes.

Returns:

A batch of output signals of shape \(B \times 2 \times L\).

Return type:

FloatTensor

parameter_size()
Returns:

A dictionary that contains each parameter tensor’s shape.

Return type:

Dict[str, Tuple[int, ...]]
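The sketch below shows one plausible way the parameter shapes relate to the constructor arguments. It is hypothetical throughout: the dictionary keys mirror the `forward` argument names, the batch dimension is omitted, and the frame count assumes a centered (padded) STFT. Only \(K = \) `n_fft // 2 + 1` follows directly from using a one-sided STFT; `parameter_size()` itself is the authoritative source.

```python
def expected_parameter_shapes(ir_len=60000, n_fft=384, hop_length=192, gain_envelope=False):
    """Hypothetical helper mirroring what parameter_size() might report (batch dim omitted)."""
    K = n_fft // 2 + 1                 # one-sided STFT frequency bins
    M = 1 + ir_len // hop_length       # frames, ASSUMING a centered (padded) STFT
    shapes = {
        "init_log_magnitude": (2, K),  # mid and side stacked along the first axis
        "delta_log_magnitude": (2, K),
    }
    if gain_envelope:
        shapes["gain_env_log_magnitude"] = (2, M)
    return shapes

shapes = expected_parameter_shapes()
shapes_with_env = expected_parameter_shapes(gain_envelope=True)
```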

class FilteredNoiseShapingReverb(ir_len=60000, num_bands=12, processor_channel='midside', f_min=31.5, f_max=15000, scale='log', sr=30000, zerophase=True, order=2, noise_randomness='pseudo-random', use_fade_in=False, min_decay_ms=50, max_decay_ms=2000, flashfftconv=True, max_input_len=131072)

Bases: Module

A time-domain FIR filter based on the envelope “shaping” of filterbank noise signals [SIC21].

Starting from a noise signal \(v[n] \sim \mathcal{U}[-1, 1)\), we apply a \(K\)-band filterbank to obtain a set of filtered noise signals \(v_1[n], \cdots, v_K[n]\). Then, we shape each filtered noise signal with a time-domain envelope \(a_i[n]\) and sum the results:

\[ h[n] = \sum_{i=1}^K a_i[n] v_i[n]. \]
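The sum above can be sketched as follows. The filterbank here is a swapped-in brick-wall FFT band split, not the library's (optionally zero-phase) crossover filterbank; it is chosen because its bands partition the spectrum exactly, so with all envelopes set to one the bands sum back to the original noise. The helper name, crossover frequencies, and placeholder envelopes are hypothetical.

```python
import numpy as np

def brickwall_filterbank(v, edges_hz, sr):
    """Hypothetical stand-in filterbank: split v into K bands by zeroing FFT bins.

    edges_hz holds K - 1 crossover frequencies; the bands partition the spectrum.
    """
    V = np.fft.rfft(v)
    freqs = np.fft.rfftfreq(len(v), d=1.0 / sr)
    bounds = np.concatenate([[0.0], edges_hz, [np.inf]])
    bands = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        keep = (freqs >= lo) & (freqs < hi)                # bins belonging to this band
        bands.append(np.fft.irfft(V * keep, n=len(v)))
    return np.stack(bands)                                 # (K, N): v_1[n], ..., v_K[n]

sr, ir_len = 30000, 30000
rng = np.random.default_rng(0)
v = rng.uniform(-1.0, 1.0, ir_len)                         # v[n] ~ U[-1, 1)
v_bands = brickwall_filterbank(v, edges_hz=[250.0, 1000.0, 4000.0], sr=sr)   # K = 4
envelopes = np.ones_like(v_bands)                          # placeholder envelopes a_i[n]
h = np.sum(envelopes * v_bands, axis=0)                    # h[n] = sum_i a_i[n] v_i[n]
```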

Each envelope \(a_i[n]\) is parameterized by a decay \(r_i\) and an initial gain \(g_i\). Optionally, a fade-in envelope, constrained to be shorter than the decay time, can be applied to each shaping.
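A minimal sketch of one envelope, assuming an exponential form \(a_i[n] = g_i r_i^n\) with \(0 < r_i < 1\) and an optional fade-in factor \(1 - q_i^n\). Both forms and the helper `envelope` are hypothetical illustrations; the source does not specify the exact formula. Choosing \(q_i < r_i\) makes the fade-in settle faster than the decay, i.e. shorter than the decay time.

```python
import numpy as np

def envelope(n_samples, gain, decay, fade_in=None):
    """Hypothetical envelope a[n] = gain * decay**n, optionally times (1 - fade_in**n).

    decay and fade_in lie in (0, 1); fade_in < decay gives a fade-in shorter than the decay.
    """
    n = np.arange(n_samples)
    a = gain * decay ** n                 # exponential decay starting at the initial gain
    if fade_in is not None:
        a = a * (1.0 - fade_in ** n)      # rises from 0 toward 1
    return a

a_plain = envelope(1000, gain=0.8, decay=0.995)
a_fade = envelope(1000, gain=0.8, decay=0.995, fade_in=0.9)
```

Without the fade-in, the envelope peaks at \(n = 0\); with it, the envelope starts at zero and peaks a few dozen samples later before decaying.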

Parameters:
  • ir_len (int, optional) – The length of the impulse response (default: 60000).

  • num_bands (int, optional) – The number of frequency bands (default: 12).

  • processor_channel (str, optional) – The channel type of the processor, either "midside", "stereo", or "mono" (default: "midside").

  • f_min (float, optional) – The minimum frequency of the filtered noise (default: 31.5).

  • f_max (float, optional) – The maximum frequency of the filtered noise (default: 15000).

  • scale (str, optional) – Frequency scale to use: "bark_traunmuller", "bark_schroeder", "bark_wang", "mel_htk", "mel_slaney", "linear", "log" (default: "log").

  • sr (int, optional) – The sample rate of the filtered noise (default: 30000).

  • zerophase (bool, optional) – If set to True, we use a zero-phase crossover filter (default: True).

  • order (int, optional) – The order of the crossover filter (default: 2).

  • noise_randomness (str, optional) – The randomness of the filtered noise, either "pseudo-random", "fixed", or "random" (default: "pseudo-random").

  • use_fade_in (bool, optional) – If set to True, we use a fade-in envelope (default: False).

  • min_decay_ms (float, optional) – The minimum decay time in milliseconds (default: 50).

  • max_decay_ms (float, optional) – The maximum decay time in milliseconds (default: 2000).

  • flashfftconv (bool, optional) – An option to use FlashFFTConv [FKNRe23] as a backend to perform the causal convolution efficiently (default: True).

  • max_input_len (int, optional) – When flashfftconv is set to True, the maximum input length must also be given (default: 2**17).
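To illustrate how `f_min`, `f_max`, `num_bands`, and `scale` could interact, the sketch below places \(K - 1\) crossover frequencies between `f_min` and `f_max`: geometrically spaced for `"log"` and evenly spaced for `"linear"`. The helper and the choice of \(K - 1\) crossovers for a \(K\)-band filterbank are assumptions; the auditory scales (Bark, mel) are not covered.

```python
import numpy as np

def crossover_frequencies(f_min=31.5, f_max=15000.0, num_bands=12, scale="log"):
    """Hypothetical helper: K - 1 crossover frequencies (Hz) for a K-band filterbank."""
    n = num_bands - 1
    if scale == "log":
        return np.geomspace(f_min, f_max, n)   # constant ratio between neighbors
    if scale == "linear":
        return np.linspace(f_min, f_max, n)    # constant spacing in Hz
    raise NotImplementedError(f"scale={scale!r} is not covered by this sketch")

freqs = crossover_frequencies()
```

With the defaults, the 11 frequencies are spaced by a constant ratio of \((15000/31.5)^{1/10} \approx 1.85\).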

forward(input_signals, log_decay, log_gain, log_fade_in=None, z_fade_in_gain=None)

Processes input audio with the processor and given parameters.

Parameters:
  • input_signals (FloatTensor, \(B\times C\times L\)) – A batch of input audio signals.

  • log_decay (FloatTensor, \(B\times C_{\mathrm{rev}}\times K \:\!\)) – A batch of log-decay values.

  • log_gain (FloatTensor, \(B\times C_{\mathrm{rev}}\times K \:\!\)) – A batch of log-gain values.

  • log_fade_in (FloatTensor, \(B\times C_{\mathrm{rev}}\times K\), optional) – A batch of log-fade-in values (default: None).

Returns:

A batch of output signals of shape \(B \times C \times L\).

Return type:

FloatTensor

parameter_size()
Returns:

A dictionary that contains each parameter tensor’s shape.

Return type:

Dict[str, Tuple[int, ...]]