grafx.processors.reverb
- class STFTMaskedNoiseReverb(ir_len=60000, processor_channel='pseudo_midside', n_fft=384, hop_length=192, fixed_noise=True, gain_envelope=False, flashfftconv=True, max_input_len=131072)
Bases: Module
A filtered noise model [EHGR20] (or pseudo-random noise method) with mid/side controls.
We employ two fixed-length uniform noise signals, \(v_{\mathrm{m}}[n]\) and \(v_{\mathrm{s}}[n] \sim \mathcal{U}[-1, 1)\), that correspond to the mid and side channels, respectively. Next, we apply a magnitude mask \(M_{\mathrm{x}}[k, m] \in \mathbb{R}^{K\times M}\) to each noise’s short-time Fourier transform (STFT) \(V_{\mathrm{x}}[k, m] \in \mathbb{C}^{K\times M}\).
\[ H_{\mathrm{x}}[k, m] = V_{\mathrm{x}}[k, m] \odot M_{\mathrm{x}}[k, m] \quad (\mathrm{x} \in \{\mathrm{m}, \mathrm{s}\}). \]
Here, \(k\) and \(m\) denote the frequency and time-frame indices, respectively. Each mask is parameterized by an initial filter \(H^0_{\mathrm{x}}[k] \in \mathbb{R}^K\) and an absorption filter \(H^\Delta_{\mathrm{x}}[k] \in \mathbb{R}^K\), both given as log-magnitudes. Optionally, a frequency-independent log-gain envelope \(G_{\mathrm{x}}[m] \in \mathbb{R}^{M}\) can be added.
\[ M_{\mathrm{x}}[k, m] = \exp\Big(H^0_{\mathrm{x}}[k] + (m-1) H^\Delta_{\mathrm{x}}[k] + \underbrace{G_{\mathrm{x}}[m]}_{\mathrm{optional}}\Big). \]
- Parameters:
ir_len (int, optional) – The length of the impulse response (default: 60000).
n_fft (int, optional) – FFT size of the STFT (default: 384).
hop_length (int, optional) – Hop length of the STFT (default: 192).
fixed_noise (bool, optional) – If set to True, we use fixed-seed random noise signals (\(v_{\mathrm{m}}[n]\) and \(v_{\mathrm{s}}[n]\)) for every forward pass. If set to False, we draw new uniform noise signals for every forward pass (default: True).
gain_envelope (bool, optional) – If set to True, we use the log-magnitude gain envelope \(G_{\mathrm{x}}[m]\) (default: False).
flashfftconv (bool, optional) – An option to use FlashFFTConv [FKNRe23] as a backend to perform the causal convolution efficiently (default: True).
max_input_len (int, optional) – When flashfftconv is set to True, the max input length must also be given (default: 2**17).
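To make the masking step concrete, here is a minimal NumPy sketch for a single channel (run it once with \(v_{\mathrm{m}}[n]\) and once with \(v_{\mathrm{s}}[n]\)). It is not the GRAFX implementation: the helper name `masked_noise_ir`, the periodic Hann analysis window, the one-sided spectrum with K = n_fft // 2 + 1 bins, and the plain overlap-add resynthesis (exact only at 50% overlap, as with the defaults n_fft=384, hop_length=192) are all assumptions made for illustration.

```python
import numpy as np

def masked_noise_ir(noise, init_log_mag, delta_log_mag, gain_env_log_mag=None,
                    n_fft=384, hop_length=192):
    """Sketch: shape one noise channel with the STFT mask M[k, m] and resynthesize.

    noise: (L,) uniform noise v[n]; init_log_mag, delta_log_mag: (K,) with
    K = n_fft // 2 + 1 (one-sided STFT, an assumption); gain_env_log_mag: (M,) or None.
    """
    # Assumption of this sketch: 50% overlap, so periodic Hann frames sum to exactly 1.
    assert n_fft == 2 * hop_length, "sketch assumes hop_length == n_fft // 2"
    L = len(noise)
    window = np.hanning(n_fft + 1)[:-1]                # periodic Hann window
    padded = np.concatenate([noise, np.zeros(n_fft)])  # room for the last frames
    num_frames = 1 + L // hop_length                   # M
    frames = np.stack([padded[m * hop_length : m * hop_length + n_fft]
                       for m in range(num_frames)]) * window
    V = np.fft.rfft(frames, axis=-1).T                 # V[k, m], shape (K, M)

    # log M[k, m] = H0[k] + (m - 1) * H_delta[k] (+ G[m]); frames are 0-indexed here.
    frame_idx = np.arange(num_frames)
    log_mask = init_log_mag[:, None] + frame_idx[None, :] * delta_log_mag[:, None]
    if gain_env_log_mag is not None:
        log_mask = log_mask + gain_env_log_mag[None, :]
    H = V * np.exp(log_mask)                           # H[k, m] = V[k, m] * M[k, m]

    # Inverse FFT of each frame, then plain overlap-add back to the time domain.
    out = np.zeros(L + n_fft)
    for m, frame in enumerate(np.fft.irfft(H.T, n=n_fft, axis=-1)):
        out[m * hop_length : m * hop_length + n_fft] += frame
    return out[:L]
```

With all-zero log-magnitudes the mask equals 1 everywhere, so the sketch returns the noise unchanged except within the first hop (where only one window overlaps). A negative absorption filter scales each successive frame by a constant factor per hop, which is the role of \(H^\Delta_{\mathrm{x}}[k]\) in the mask equation.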
- forward(input_signals, init_log_magnitude, delta_log_magnitude, gain_env_log_magnitude=None)
Processes input audio with the processor and given parameters.
- Parameters:
input_signals (FloatTensor, \(B\times 2\times L\)) – A batch of input audio signals.
init_log_magnitude (FloatTensor, \(B\times 2\times K\)) – A batch of log-magnitudes of the initial filters. We assume that the mid- and side-channel responses, \(H^0_{\mathrm{m}}\) and \(H^0_{\mathrm{s}}\) respectively, are stacked along the channel dimension (the same applies to the remaining tensors).
delta_log_magnitude (FloatTensor, \(B\times 2\times K\)) – A batch of log-magnitudes of the absorption filters.
gain_env_log_magnitude (FloatTensor, \(B\times 2\times M\), optional) – A batch of log-gain envelopes (default: None).
- Returns:
A batch of output signals of shape \(B \times C \times L\).
- Return type:
FloatTensor
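The flashfftconv option concerns how the causal convolution with the generated impulse responses is computed. As a backend-agnostic reference, an FFT-based causal convolution can be sketched in NumPy as follows. `fft_causal_conv` is a hypothetical helper, not part of GRAFX, and the sketch only convolves each input channel with the correspondingly stacked response; it does not model how the processor maps input channels to mid/side.

```python
import numpy as np

def fft_causal_conv(x, h):
    """Causal convolution of x with h along the last axis, truncated to x's length.

    x: (..., L) input signals; h: (..., N) impulse responses whose leading axes
    broadcast against x (e.g. a B x 2 x L batch against B x 2 x N responses).
    """
    L = x.shape[-1]
    full_len = L + h.shape[-1] - 1              # length of the full linear convolution
    n_fft = 1 << (full_len - 1).bit_length()    # next power of two >= full_len
    X = np.fft.rfft(x, n=n_fft, axis=-1)
    H = np.fft.rfft(h, n=n_fft, axis=-1)
    y = np.fft.irfft(X * H, n=n_fft, axis=-1)
    return y[..., :L]                           # keep the causal part aligned with x
```

Zero-padding to at least \(L + N - 1\) samples turns the FFT's circular convolution into a linear one, so truncating to the first \(L\) samples matches `np.convolve(x, h)[:L]` for each channel.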
- parameter_size()
- Returns:
A dictionary that contains each parameter tensor’s shape.
- Return type:
Dict[str, Tuple[int, ...]]
- class FilteredNoiseShapingReverb(ir_len=60000, num_bands=12, processor_channel='midside', f_min=31.5, f_max=15000, scale='log', sr=30000, zerophase=True, order=2, noise_randomness='pseudo-random', use_fade_in=False, min_decay_ms=50, max_decay_ms=2000, flashfftconv=True, max_input_len=131072)
Bases: Module
A time-domain FIR filter based on the envelope “shaping” of filterbank noise signals [SIC21].
Starting from a noise signal \(v[n] \sim \mathcal{U}[-1, 1)\), we apply a \(K\)-band filterbank to obtain a set of filtered noise signals \(v_1[n], \cdots, v_K[n]\). Then, we apply a time-domain envelope \(a_i[n]\) to each filtered noise signal as follows,
\[ h[n] = \sum_{i=1}^K a_i[n] v_i[n]. \]
Each envelope is parameterized by a decay \(r_i\) and an initial gain \(g_i\). Optionally, a fade-in envelope, whose length is kept shorter than the decay time, can also be applied.
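The following NumPy sketch spells out \(h[n] = \sum_i a_i[n] v_i[n]\) under several assumptions. It swaps the class's crossover filterbank (configured by zerophase and order) for a hypothetical brick-wall FFT filterbank with \(K-1\) crossovers taken from \(K\) log-spaced bands spanning f_min to f_max, and it assumes an exponential envelope \(a_i[n] = g_i \cdot 10^{-3n/(f_s r_i)}\), i.e., \(r_i\) (in seconds; the sketch takes milliseconds, like min_decay_ms and max_decay_ms) is read as the time for a 60 dB drop. The helper names `filterbank_split` and `shaped_noise_ir` are hypothetical, and the fade-in is omitted.

```python
import numpy as np

def filterbank_split(v, sr, crossovers):
    """Hypothetical brick-wall FFT filterbank: v -> (K, N) band signals v_i[n].

    Each rFFT bin goes to exactly one band, so the band signals sum back to v.
    """
    V = np.fft.rfft(v)
    freqs = np.fft.rfftfreq(len(v), d=1.0 / sr)
    band_of_bin = np.searchsorted(crossovers, freqs, side="right")  # 0 .. K-1
    return np.stack([np.fft.irfft(V * (band_of_bin == i), n=len(v))
                     for i in range(len(crossovers) + 1)])

def shaped_noise_ir(ir_len, sr, decay_ms, gain, f_min=31.5, f_max=15000.0, seed=0):
    """Sketch of h[n] = sum_i a_i[n] v_i[n] with an assumed exponential envelope."""
    num_bands = len(decay_ms)
    rng = np.random.default_rng(seed)
    v = rng.uniform(-1.0, 1.0, size=ir_len)                   # v[n] ~ U[-1, 1)
    # Assumption: K log-spaced bands over [f_min, f_max] -> K - 1 interior crossovers.
    crossovers = np.geomspace(f_min, f_max, num_bands + 1)[1:-1]
    v_bands = filterbank_split(v, sr, crossovers)              # (K, ir_len)
    n = np.arange(ir_len)[None, :]
    r = np.asarray(decay_ms, dtype=float)[:, None] / 1000.0    # decay r_i in seconds
    g = np.asarray(gain, dtype=float)[:, None]                 # initial gain g_i
    a = g * 10.0 ** (-3.0 * n / (sr * r))                      # assumed a_i[n]: -60 dB at r_i
    return np.sum(a * v_bands, axis=0)                         # h[n]
```

Because the brick-wall bands partition the spectrum, the band signals add back up to \(v[n]\); the per-band decays then set how quickly each frequency region of the response dies away.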
- Parameters:
ir_len (int, optional) – The length of the impulse response (default: 60000).
num_bands (int, optional) – The number of frequency bands (default: 12).
processor_channel (str, optional) – The channel type of the processor, either "midside", "stereo", or "mono" (default: "midside").
f_min (float, optional) – The minimum frequency of the filtered noise (default: 31.5).
f_max (float, optional) – The maximum frequency of the filtered noise (default: 15000).
scale (str, optional) – Frequency scale to use: "bark_traunmuller", "bark_schroeder", "bark_wang", "mel_htk", "mel_slaney", "linear", or "log" (default: "log").
sr (int, optional) – The sample rate of the filtered noise (default: 30000).
zerophase (bool, optional) – If set to True, we use a zero-phase crossover filter (default: True).
order (int, optional) – The order of the crossover filter (default: 2).
noise_randomness (str, optional) – The randomness of the filtered noise, either "pseudo-random", "fixed", or "random" (default: "pseudo-random").
use_fade_in (bool, optional) – If set to True, we use a fade-in envelope (default: False).
min_decay_ms (float, optional) – The minimum decay time in milliseconds (default: 50).
max_decay_ms (float, optional) – The maximum decay time in milliseconds (default: 2000).
flashfftconv (bool, optional) – An option to use FlashFFTConv [FKNRe23] as a backend to perform the causal convolution efficiently (default: True).
max_input_len (int, optional) – When flashfftconv is set to True, the max input length must also be given (default: 2**17).
- forward(input_signals, log_decay, log_gain, log_fade_in=None, z_fade_in_gain=None)
Processes input audio with the processor and given parameters.
- Parameters:
input_signals (FloatTensor, \(B\times C\times L\)) – A batch of input audio signals.
log_decay (FloatTensor, \(B\times C_{\mathrm{rev}}\times K\)) – A batch of log-decay values.
log_gain (FloatTensor, \(B\times C_{\mathrm{rev}}\times K\)) – A batch of log-gain values.
log_fade_in (FloatTensor, \(B\times C_{\mathrm{rev}}\times K\), optional) – A batch of log-fade-in values (default: None).
- Returns:
A batch of output signals of shape \(B \times C \times L\).
- Return type:
FloatTensor
- parameter_size()
- Returns:
A dictionary that contains each parameter tensor’s shape.
- Return type:
Dict[str, Tuple[int, ...]]