grafx.processors.eq
- class ZeroPhaseFIREqualizer(num_magnitude_bins=1024)
Bases:
Module
A single-channel zero-phase finite impulse response (FIR) filter [EHGR20, Smi07b, Smi11].
From the input log-magnitude \(H_{\mathrm{log}}\), we compute inverse FFT (IFFT) of the magnitude response and multiply it with a zero-centered window \(v[n]\). Each input channel is convolved with the following FIR.
\[ h[n] = v[n] \cdot \frac{1}{N} \sum_{k=0}^{N-1} \exp \left( H_{\mathrm{log}}[k] \right) \cdot w_{N}^{kn}. \]
Here, \(-(N+1)/2 \leq n \leq (N+1)/2\) and \(w_{N} = \exp(j\cdot 2\pi/N)\). This equalizer’s learnable parameter is \(p = \{ H_{\mathrm{log}} \}\).
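The construction above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function name `zero_phase_fir`, the `half_length` argument, the mirroring of the one-sided bins into a full spectrum of size \(N = 2(K-1)\), and the specific Hann formula are all assumptions made for this example.

```python
import cmath
import math


def zero_phase_fir(log_magnitude, half_length):
    """Return taps h[-R..R] of a zero-phase FIR from K >= 2 one-sided log-magnitudes."""
    K = len(log_magnitude)
    N = 2 * (K - 1)  # full FFT size implied by K one-sided bins (assumption)
    # Mirror the one-sided magnitudes so the full spectrum is real and even.
    mag = [math.exp(x) for x in log_magnitude]
    full = mag + mag[-2:0:-1]
    taps = []
    for n in range(-half_length, half_length + 1):
        # Inverse DFT evaluated at index n, with w_N = exp(j*2*pi/N).
        acc = sum(full[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N))
        # Zero-centered Hann window v[n], which peaks at n = 0.
        v = 0.5 * (1.0 + math.cos(math.pi * n / (half_length + 1)))
        taps.append(v * acc.real / N)
    return taps


# Five one-sided bins give N = 8; keep taps for n = -3..3.
h = zero_phase_fir([0.0, 0.5, -0.5, 0.0, 0.2], half_length=3)
```

Because the mirrored spectrum is real and even, the taps are real and symmetric around \(n = 0\), which is what makes the filter zero-phase.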
- Parameters:
num_magnitude_bins (int, optional) – The number of FFT magnitude bins (default: 1024).
window (str or FloatTensor, optional) – The window function to use for the FIR filter. If str is given, we create the window internally. It can be: "hann", "hamming", "blackman", "bartlett", and "kaiser". If FloatTensor is given, we use it as a window (default: "hann").
**window_kwargs (Dict[str, Any], optional) – Additional keyword arguments for the window function.
- forward(input_signals, log_magnitude)
Processes input audio with the processor and given parameters.
- Parameters:
input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals.
log_magnitude (FloatTensor, \(B \times K\)) – A batch of log-magnitude vectors of the FIR filter.
- Returns:
A batch of output signals of shape \(B \times C \times L\).
- Return type:
FloatTensor
- parameter_size()
- Returns:
A dictionary that contains each parameter tensor’s shape.
- Return type:
Dict[str, Tuple[int, ...]]
- class NewZeroPhaseFIREqualizer(num_frequency_bins=1024, processor_channel='mono', use_filterbank=False, filterbank_kwargs={}, window='hann', window_kwargs={}, eps=1e-07, flashfftconv=False)
Bases:
Module
A single-channel zero-phase finite impulse response (FIR) filter [EHGR20, Smi07b, Smi11].
From the input log-magnitude \(H_{\mathrm{log}}\), we compute inverse FFT (IFFT) of the magnitude response and multiply it with a zero-centered window \(w[n]\). Each input channel is convolved with the following FIR.
\[ h[n] = w[n] \cdot \frac{1}{N} \sum_{k=0}^{N-1} \exp \left( H_{\mathrm{log}}[k] \right) \cdot z_{N}^{kn}. \]
Here, \(-(N+1)/2 \leq n \leq (N+1)/2\) and \(z_{N} = \exp(j\cdot 2\pi/N)\). This equalizer’s learnable parameter is \(p = \{ H_{\mathrm{log}} \}\).
When the filterbank is used, the input is instead a log-energy vector \(H_{\mathrm{fb}} \in \mathbb{R}^{K_{\mathrm{fb}}}\), from which we compute the FFT magnitudes as
\[ H_{\mathrm{log}} = \sqrt { M \exp (H_{\mathrm{fb}}) + \epsilon} \]
where \(M \in \mathbb{R}^{K \times K_{\mathrm{fb}}}\) is the filterbank matrix (\(K\) and \(K_{\mathrm{fb}}\) are the number of FFT magnitude bins and filterbank bins, respectively). We use the standard triangular filterbank. In this case, the equalizer’s learnable parameter is \(p = \{ H_{\mathrm{fb}} \}\).
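As a rough sketch of the filterbank mapping above, the following builds a small triangular filterbank matrix and evaluates \(\sqrt{M \exp(H_{\mathrm{fb}}) + \epsilon}\) bin by bin. The helper names `triangular_filterbank` and `filterbank_to_magnitude` and the linear spacing of the triangles are assumptions for illustration; the library's actual filterbank follows the selected frequency scale and may be normalized differently.

```python
import math


def triangular_filterbank(num_bins, num_filters):
    """Build a K x K_fb matrix of linearly spaced triangular filters (illustrative)."""
    # Filter centers are spread evenly over the bin axis, endpoints excluded.
    step = (num_bins - 1) / (num_filters + 1)
    matrix = []
    for k in range(num_bins):
        row = []
        for i in range(1, num_filters + 1):
            center = i * step
            # Triangle with unit peak at `center`, reaching zero one step away.
            row.append(max(0.0, 1.0 - abs(k - center) / step))
        matrix.append(row)
    return matrix


def filterbank_to_magnitude(log_energy, matrix, eps=1e-7):
    """Evaluate sqrt(M exp(H_fb) + eps) for every FFT bin."""
    energy = [math.exp(x) for x in log_energy]
    return [
        math.sqrt(sum(m * e for m, e in zip(row, energy)) + eps)
        for row in matrix
    ]


# K = 9 FFT bins driven by K_fb = 3 filterbank log-energies.
M = triangular_filterbank(num_bins=9, num_filters=3)
H = filterbank_to_magnitude([0.0, 1.0, -1.0], M)
```

The \(\epsilon\) term keeps the square root away from zero at bins that no triangle covers, such as the first and last bins in this toy layout.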
- Parameters:
num_frequency_bins (int, optional) – The number of FFT energy bins (default: 1024).
processor_channel (str, optional) – The channel configuration of the equalizer, which can be "mono", "stereo", "midside", or "pseudo_midside" (default: "mono").
use_filterbank (bool, optional) – Whether to use the filterbank (default: False).
scale (str, optional) – The frequency scale to use, which can be: "bark_traunmuller", "bark_schroeder", "bark_wang", "mel_htk", "mel_slaney", "linear", and "log" (default: "bark_traunmuller").
n_filters (int, optional) – Number of filterbank bins (default: 80).
f_min (float, optional) – Minimum frequency in Hz (default: 40).
f_max (float or None, optional) – Maximum frequency in Hz. If None, the sampling rate sr must be provided, and half of the sampling rate is used (default: None).
sr (float or None, optional) – The underlying sampling rate. Only used when using the filterbank (default: None).
window (str or FloatTensor, optional) – The window function to use for the FIR filter. If str is given, we create the window internally. It can be: "hann", "hamming", "blackman", "bartlett", and "kaiser". If FloatTensor is given, we use it as a window (default: "hann").
**window_kwargs (Dict[str, Any], optional) – Additional keyword arguments for the window function.
- forward(input_signals, log_magnitude)
Processes input audio with the processor and given parameters.
- Parameters:
input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals.
log_magnitude (FloatTensor, \(B \times C_\mathrm{eq} \times K\) or \(B \times C_\mathrm{eq} \times K_\mathrm{fb}\)) – A batch of log-magnitude vectors of the FIR filter.
- Returns:
A batch of output signals of shape \(B \times C \times L\).
- Return type:
FloatTensor
- parameter_size()
- Returns:
A dictionary that contains each parameter tensor’s shape.
- Return type:
Dict[str, Tuple[int, ...]]
- class ParametricEqualizer(num_filters=10, processor_channel='mono', use_shelving_filters=True, **backend_kwargs)
Bases:
Module
A parametric equalizer (PEQ) based on second-order filters.
We cascade \(K\) biquad filters to form a parametric equalizer,
\[ H(z) = \prod_{k=1}^{K} H_k(z) \]
By default, \(k=1\) and \(k=K\) are low-shelf and high-shelf filters, respectively, and the remaining ones are peaking filters. See LowShelf, PeakingFilter, and HighShelf for the filter details.
- Parameters:
num_filters (int, optional) – The number of filters to use (default: 10).
processor_channel (str, optional) – The channel configuration of the equalizer, which can be "mono", "stereo", or "midside" (default: "mono").
use_shelving_filters (bool, optional) – Whether to use a low-shelf and a high-shelf filter. If false, we use only peaking filters (default: True).
**backend_kwargs (Dict[str, Any], optional) – Additional keyword arguments for the backend.
- forward(input_signals, w0, q_inv, log_gain)
Processes input audio with the processor and given parameters.
- Parameters:
input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals.
w0 (FloatTensor, \(B \times K\)) – A batch of cutoff frequencies.
q_inv (FloatTensor, \(B \times K\)) – A batch of quality factors (or resonance).
log_gain (FloatTensor, \(B \times K\)) – A batch of log-gains.
- Returns:
A batch of output signals of shape \(B \times C \times L\).
- Return type:
FloatTensor
- parameter_size()
- Returns:
A dictionary that contains each parameter tensor’s shape.
- Return type:
Dict[str, Tuple[int, ...]]
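The cascade \(H(z) = \prod_{k=1}^{K} H_k(z)\) used by the parametric equalizer can be illustrated by multiplying biquad frequency responses on the unit circle. This is only a sketch: `biquad_response` and `cascade_response` are hypothetical helper names, and the coefficients below are made up rather than produced by the library's LowShelf, PeakingFilter, or HighShelf designs.

```python
import cmath
import math


def biquad_response(coeffs, omega):
    """Evaluate one biquad H_k(z) at z = exp(j*omega)."""
    b0, b1, b2, a0, a1, a2 = coeffs
    z1 = cmath.exp(-1j * omega)  # z^{-1}
    z2 = z1 * z1                 # z^{-2}
    return (b0 + b1 * z1 + b2 * z2) / (a0 + a1 * z1 + a2 * z2)


def cascade_response(biquads, omega):
    """Multiply the individual responses: H(z) = prod_k H_k(z)."""
    total = 1.0 + 0.0j
    for coeffs in biquads:
        total *= biquad_response(coeffs, omega)
    return total


# Two arbitrary stable biquads (made-up coefficients, not taken from the library).
biquads = [
    (1.2, -1.5, 0.5, 1.0, -1.4, 0.6),
    (0.9, -0.3, 0.1, 1.0, -0.2, 0.3),
]
omega = 0.25 * math.pi
gain_db = 20.0 * math.log10(abs(cascade_response(biquads, omega)))
```

Since the responses multiply, the cascade's gain in decibels is the sum of the per-filter gains in decibels, which is why each band can be reasoned about separately.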
- class GraphicEqualizer(processor_channel='mono', scale='bark', sr=44100, **backend_kwargs)
Bases:
Module
A graphic equalizer (GEQ) based on second-order peaking filters [LV+17].
We cascade \(K\) biquad filters to form a graphic equalizer, whose transfer function is given as \(H(z) = \prod_{k=1}^{K} H_k(z)\) where each biquad \(H_k(z)\) is as follows,
\[ H_k(z)=\frac{1+g_k \beta_k-2 \cos (\omega_k) z^{-1}+(1-g_k \beta_k) z^{-2}}{1+\beta_k-2 \cos (\omega_k) z^{-1}+(1-\beta_k) z^{-2}}. \]
Here, \(g_k\) is the linear gain and \(\omega_k\) is the center frequency. \(\beta_k\) is given as
\[ \beta_k = \sqrt{\frac{\left|\tilde{g}_k^2-1\right|}{\left|g_k^2-\tilde{g}_k^2\right|}} \tan {\frac{B_k}{2}} \]
where \(B_k\) is the bandwidth frequency and \(\tilde{g}_k\) is the gain at the neighboring band frequency, pre-determined to be \(\tilde{g}_k = g_k^{0.4}\). The frequency values (\(\omega_k\) and \(B_k\)) and the number of bands \(K\) are also determined by the frequency scale. The learnable parameter is a concatenation of the log-magnitudes, i.e., \(\smash{p = \{ \mathbf{g}^{\mathrm{log}} \}}\) where \(\smash{g_k = \exp g_k^{\mathrm{log}}}\).
Note that the log-gain parameters are different from the equalizer’s log-magnitude response values at the center frequencies, known as “control points”. To set the log-gains to match the control points, we can use least-squares optimization methods [LV+17, VR19].
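To make the band formula concrete, the sketch below evaluates a single \(H_k(z)\) on the unit circle from a log-gain, a center frequency, and a bandwidth. The function name `geq_band_response` and the example frequencies are assumptions for illustration; in the library, \(\omega_k\) and \(B_k\) come from the chosen frequency scale.

```python
import cmath
import math


def geq_band_response(log_gain, omega_k, bandwidth, omega):
    """Evaluate one GEQ band H_k(z) at z = exp(j*omega)."""
    g = math.exp(log_gain)  # linear gain g_k
    g_tilde = g ** 0.4      # gain at the neighboring band, g_k^0.4
    if abs(g * g - g_tilde * g_tilde) < 1e-12:
        # At g_k = 1 the ratio inside beta_k is 0/0, but the numerator and
        # denominator of H_k coincide, so the band is transparent.
        return 1.0 + 0.0j
    beta = math.sqrt(abs(g_tilde**2 - 1.0) / abs(g**2 - g_tilde**2))
    beta *= math.tan(bandwidth / 2.0)
    z1 = cmath.exp(-1j * omega)  # z^{-1}
    z2 = z1 * z1                 # z^{-2}
    c = 2.0 * math.cos(omega_k)
    num = 1.0 + g * beta - c * z1 + (1.0 - g * beta) * z2
    den = 1.0 + beta - c * z1 + (1.0 - beta) * z2
    return num / den


# Example band: center at 0.3*pi rad/sample, bandwidth 0.1*pi (arbitrary values).
omega_k, bandwidth = 0.3 * math.pi, 0.1 * math.pi
peak = geq_band_response(0.5, omega_k, bandwidth, omega_k)
dc = geq_band_response(0.5, omega_k, bandwidth, 0.0)
```

Substituting \(z = e^{j\omega_k}\) into the formula gives exactly \(g_k\), and substituting \(z = 1\) gives 1, so a single band hits its gain at the center frequency and leaves DC untouched. The full equalizer response at a center frequency still differs from \(g_k\) because neighboring bands overlap.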
- Parameters:
scale (str, optional) – The frequency scale to use, which can be: 24-band "bark" and 31-band "third_oct" (default: "bark").
sr (int, optional) – The underlying sampling rate of the input signal (default: 44100).
backend (str, optional) – The backend to use for the filtering, which can be either the frequency-sampling method "fsm" or the exact time-domain filter "lfilter" (default: "fsm").
fsm_fir_len (int, optional) – The length of the FIR approximation when backend == "fsm" (default: 8192).
- forward(input_signals, log_gains)
Processes input audio with the processor and given parameters.
- Parameters:
input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals.
log_gains (FloatTensor, \(B \times K\)) – A batch of log-gain vectors of the GEQ.
- Returns:
A batch of output signals of shape \(B \times C \times L\).
- Return type:
FloatTensor
- parameter_size()
- Returns:
A dictionary that contains each parameter tensor’s shape.
- Return type:
Dict[str, Tuple[int, ...]]