grafx.processors.eq
- class ZeroPhaseFIREqualizer(num_magnitude_bins=1024)
Bases: Module
A single-channel zero-phase finite impulse response (FIR) filter [EHGR20, Smi07b, Smi11].
From the input log-magnitude \(H_{\mathrm{log}}\), we compute the inverse FFT (IFFT) of the magnitude response and multiply it by a zero-centered window \(v[n]\). Each input channel is convolved with the following FIR.
\[ h[n] = v[n] \cdot \frac{1}{N} \sum_{k=0}^{N-1} \exp H_{\mathrm{log}}[k] \cdot w_{N}^{kn}. \]
Here, \(-(N+1)/2 \leq n \leq (N+1)/2\) and \(w_{N} = \exp(j\cdot 2\pi/N)\). This equalizer’s learnable parameter is \(p = \{ H_{\mathrm{log}} \}\).
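The formula above can be evaluated directly for a small number of bins. The following is a minimal pure-Python sketch, not the library's implementation: `zero_phase_fir` is a hypothetical helper, and it assumes that the log-magnitude vector holds \(K = N/2 + 1\) one-sided bins (DC to Nyquist), that the taps run over \(n = -(K-1), \dots, K-1\), and that \(v[n]\) is a Hann window with zeros at both ends.

```python
import cmath
import math


def zero_phase_fir(log_magnitude):
    """Return (n_values, taps) for a one-sided log-magnitude response.

    Assumption: `log_magnitude` holds K = N/2 + 1 bins (K >= 2), so the
    full N-point spectrum is obtained by mirroring the interior bins.
    """
    K = len(log_magnitude)
    N = 2 * (K - 1)
    # Linear magnitudes, mirrored so that the spectrum is real and even.
    mag = [math.exp(x) for x in log_magnitude]
    full = mag + mag[-2:0:-1]
    half = K - 1  # taps run over n = -half, ..., half (assumed range)
    n_values = list(range(-half, half + 1))
    taps = []
    for n in n_values:
        # Inverse DFT sum evaluated at a (possibly negative) index n.
        s = sum(full[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N))
        # Zero-centered Hann window: 1 at n = 0, 0 at n = +/- half.
        v = 0.5 * (1.0 + math.cos(math.pi * n / half))
        taps.append(v * s.real / N)
    return n_values, taps


# A flat 0 dB response (all log-magnitudes zero) yields a unit impulse at n = 0.
n_values, h = zero_phase_fir([0.0] * 5)
```

Because the mirrored spectrum is real and even, the IFFT is real and symmetric around \(n = 0\), which is what makes the resulting filter zero-phase.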
- Parameters:
num_magnitude_bins (int, optional) – The number of FFT magnitude bins (default: 1024).
window (str or FloatTensor, optional) – The window function to use for the FIR filter. If str is given, we create the window internally. It can be: "hann", "hamming", "blackman", "bartlett", and "kaiser". If FloatTensor is given, we use it as a window (default: "hann").
**window_kwargs (Dict[str, Any], optional) – Additional keyword arguments for the window function.
- forward(input_signals, log_magnitude)
Processes input audio with the processor and given parameters.
- Parameters:
input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals.
log_magnitude (FloatTensor, \(B \times K\)) – A batch of log-magnitude vectors of the FIR filter.
- Returns:
A batch of output signals of shape \(B \times C \times L\).
- Return type:
FloatTensor
- parameter_size()
- Returns:
A dictionary that contains each parameter tensor’s shape.
- Return type:
Dict[str, Tuple[int, ...]]
- class NewZeroPhaseFIREqualizer(num_frequency_bins=1024, processor_channel='mono', use_filterbank=False, filterbank_kwargs={}, window='hann', window_kwargs={}, eps=1e-07, flashfftconv=False)
Bases: Module
A single-channel zero-phase finite impulse response (FIR) filter [EHGR20, Smi07b, Smi11].
From the input log-magnitude \(H_{\mathrm{log}}\), we compute the inverse FFT (IFFT) of the magnitude response and multiply it by a zero-centered window \(w[n]\). Each input channel is convolved with the following FIR.
\[ h[n] = w[n] \cdot \frac{1}{N} \sum_{k=0}^{N-1} \exp H_{\mathrm{log}}[k] \cdot z_{N}^{kn}. \]
Here, \(-(N+1)/2 \leq n \leq (N+1)/2\) and \(z_{N} = \exp(j\cdot 2\pi/N)\). This equalizer’s learnable parameter is \(p = \{ H_{\mathrm{log}} \}\).
When the filterbank is used, the input is instead a log-energy vector \(H_{\mathrm{fb}} \in \mathbb{R}^{K_{\mathrm{fb}}}\), from which we compute the FFT magnitudes as
\[ H_{\mathrm{log}} = \sqrt { M \exp (H_{\mathrm{fb}}) + \epsilon} \]
where \(M \in \mathbb{R}^{K \times K_{\mathrm{fb}}}\) is the filterbank matrix (\(K\) and \(K_{\mathrm{fb}}\) are the number of FFT magnitude bins and filterbank bins, respectively). We use the standard triangular filterbank. In this case, the equalizer’s learnable parameter is \(p = \{ H_{\mathrm{fb}} \}\).
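To make the filterbank mapping concrete, here is a small pure-Python sketch of \(\sqrt{M \exp(H_{\mathrm{fb}}) + \epsilon}\). Both helpers are hypothetical and not part of the library; the filterbank built here uses evenly spaced triangles over the bin axis, which corresponds only to a linear frequency scale, and the bin counts are chosen arbitrarily for illustration.

```python
import math


def triangular_filterbank(num_bins, num_filters):
    """Build a K x K_fb matrix of evenly spaced triangular filters.

    Hypothetical helper: the centers are spaced linearly over the bin
    axis, so no Bark or Mel warping is applied.
    """
    # num_filters + 2 edge points: each filter spans three consecutive edges.
    step = (num_bins - 1) / (num_filters + 1)
    edges = [i * step for i in range(num_filters + 2)]
    M = [[0.0] * num_filters for _ in range(num_bins)]
    for j in range(num_filters):
        lo, mid, hi = edges[j], edges[j + 1], edges[j + 2]
        for k in range(num_bins):
            if lo < k <= mid:
                M[k][j] = (k - lo) / (mid - lo)  # rising slope
            elif mid < k < hi:
                M[k][j] = (hi - k) / (hi - mid)  # falling slope
    return M


def filterbank_to_magnitude(log_energy, M, eps=1e-7):
    """Evaluate sqrt(M exp(H_fb) + eps) for one log-energy vector."""
    energy = [math.exp(x) for x in log_energy]
    return [math.sqrt(sum(m * e for m, e in zip(row, energy)) + eps) for row in M]


M = triangular_filterbank(num_bins=9, num_filters=3)
magnitudes = filterbank_to_magnitude([0.0, 0.0, 0.0], M)
```

The \(\epsilon\) term keeps the square root away from zero at bins that no filter covers, such as the first and last bins in this toy filterbank.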
- Parameters:
num_frequency_bins (int, optional) – The number of FFT energy bins (default: 1024).
processor_channel (str, optional) – The channel configuration of the equalizer, which can be "mono", "stereo", "midside", or "pseudo_midside" (default: "mono").
filterbank (bool, optional) – Whether to use the filterbank (default: False).
scale (str, optional) – The frequency scale to use, which can be: "bark_traunmuller", "bark_schroeder", "bark_wang", "mel_htk", "mel_slaney", "linear", and "log" (default: "bark_traunmuller").
n_filters (int, optional) – Number of filterbank bins (default: 80).
f_min (float, optional) – Minimum frequency in Hz (default: 40).
f_max (float or None, optional) – Maximum frequency in Hz. If None, the sampling rate sr must be provided and we use half of the sampling rate (default: None).
sr (float or None, optional) – The underlying sampling rate. Only used when using the filterbank (default: None).
window (str or FloatTensor, optional) – The window function to use for the FIR filter. If str is given, we create the window internally. It can be: "hann", "hamming", "blackman", "bartlett", and "kaiser". If FloatTensor is given, we use it as a window (default: "hann").
**window_kwargs (Dict[str, Any], optional) – Additional keyword arguments for the window function.
- forward(input_signals, log_magnitude)
Processes input audio with the processor and given parameters.
- Parameters:
input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals.
log_magnitude (FloatTensor, \(B \times C_\mathrm{eq} \times K\) or \(B \times C_\mathrm{eq} \times K_\mathrm{fb}\)) – A batch of log-magnitude vectors of the FIR filter.
- Returns:
A batch of output signals of shape \(B \times C \times L\).
- Return type:
FloatTensor
- parameter_size()
- Returns:
A dictionary that contains each parameter tensor’s shape.
- Return type:
Dict[str, Tuple[int, ...]]
- class ParametricEqualizer(num_filters=10, processor_channel='mono', use_shelving_filters=True, **backend_kwargs)
Bases: Module
A parametric equalizer (PEQ) based on second-order filters.
We cascade \(K\) biquad filters to form a parametric equalizer,
\[ H(z) = \prod_{k=1}^{K} H_k(z) \]
By default, \(k=1\) and \(k=K\) are low-shelf and high-shelf filters, respectively, and the remaining ones are peaking filters. See LowShelf, PeakingFilter, and HighShelf for the filter details.
- Parameters:
num_filters (int, optional) – The number of filters to use (default: 10).
processor_channel (str, optional) – The channel configuration of the equalizer, which can be "mono", "stereo", or "midside" (default: "mono").
use_shelving_filters (bool, optional) – Whether to use a low-shelf and high-shelf filter. If false, we use only peaking filters (default: True).
**backend_kwargs (Dict[str, Any], optional) – Additional keyword arguments for the backend.
- forward(input_signals, w0, q_inv, log_gain)
Processes input audio with the processor and given parameters.
- Parameters:
input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals.
w0 (FloatTensor, \(B \times K\)) – A batch of cutoff frequencies.
q_inv (FloatTensor, \(B \times K\)) – A batch of quality factors (or resonance).
log_gain (FloatTensor, \(B \times K\)) – A batch of log-gains.
- Returns:
A batch of output signals of shape \(B \times C \times L\).
- Return type:
FloatTensor
- parameter_size()
- Returns:
A dictionary that contains each parameter tensor’s shape.
- Return type:
Dict[str, Tuple[int, ...]]
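As a rough illustration of the cascade \(H(z) = \prod_{k=1}^{K} H_k(z)\) used by ParametricEqualizer, the sketch below multiplies the frequency responses of several peaking biquads. The coefficient formulas follow the widely used Audio EQ Cookbook peaking filter and serve only as a stand-in; the library's PeakingFilter may be parameterized differently. The helper names, the interpretation of `q_inv` as \(1/Q\), the unit of `w0` (radians per sample), and the example values are all assumptions.

```python
import cmath
import math


def peaking_biquad(w0, q_inv, log_gain):
    """Peaking-filter coefficients (b, a) in the Audio EQ Cookbook form.

    Assumptions: w0 is in radians per sample, q_inv = 1 / Q, and the
    gain at the center frequency is exp(log_gain).
    """
    A = math.exp(log_gain / 2.0)
    alpha = 0.5 * math.sin(w0) * q_inv
    b = (1.0 + alpha * A, -2.0 * math.cos(w0), 1.0 - alpha * A)
    a = (1.0 + alpha / A, -2.0 * math.cos(w0), 1.0 - alpha / A)
    return b, a


def cascade_response(biquads, w):
    """Evaluate H(e^{jw}) as the product of the biquad responses."""
    z_inv = cmath.exp(-1j * w)
    H = 1.0 + 0.0j
    for b, a in biquads:
        num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
        den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
        H *= num / den
    return H


# Three hypothetical peaking bands: (w0, q_inv, log_gain).
params = [(0.3, 1.0, 0.5), (1.0, 0.5, -0.3), (2.0, 2.0, 0.2)]
biquads = [peaking_biquad(*p) for p in params]
magnitude_at_band_2 = abs(cascade_response(biquads, 1.0))
```

Since the sections are in series, the overall log-magnitude response is the sum of the per-filter log-magnitude responses.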
- class GraphicEqualizer(processor_channel='mono', scale='bark', sr=44100, **backend_kwargs)
Bases: Module
A graphic equalizer (GEQ) based on second-order peaking filters [LV+17].
We cascade \(K\) biquad filters to form a graphic equalizer, whose transfer function is given as \(H(z) = \prod_{k=1}^{K} H_k(z)\) where each biquad \(H_k(z)\) is as follows,
\[ H_k(z)=\frac{1+g_k \beta_k-2 \cos (\omega_k) z^{-1}+(1-g_k \beta_k) z^{-2}}{1+\beta_k-2 \cos (\omega_k) z^{-1}+(1-\beta_k) z^{-2}}. \]
Here, \(g_k\) is the linear gain and \(\omega_k\) is the center frequency. \(\beta_k\) is given as
\[ \beta_k = \sqrt{\frac{\left|\tilde{g}_k^2-1\right|}{\left|g_k^2-\tilde{g}_k^2\right|}} \tan {\frac{B_k}{2}} \]
where \(B_k\) is the bandwidth and \(\tilde{g}_k\) is the gain at the neighboring band frequency, pre-determined to be \(\tilde{g}_k = g_k^{0.4}\). The frequency values (\(\omega_k\) and \(B_k\)) and the number of bands \(K\) are also determined by the frequency scale. The learnable parameter is a concatenation of the log-gains, i.e., \(\smash{p = \{ \mathbf{g}^{\mathrm{log}} \}}\) where \(\smash{g_k = \exp g_k^{\mathrm{log}}}\).
Note that the log-gain parameters are different from the equalizer’s log-magnitude response values at the center frequencies, known as “control points”. To set the log-gains to match the control points, we can use least-squares optimization methods [LV+17, VR19].
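The distinction between log-gains and control points can be checked numerically. The sketch below evaluates the band biquad \(H_k\) defined above on the unit circle: a single band reaches exactly \(g_k\) at its center frequency, but once a neighboring band is cascaded, the response at that frequency no longer equals \(g_k\). The center frequencies and bandwidths are made-up values in radians per sample rather than the library's Bark or third-octave bands, and the function names are hypothetical.

```python
import cmath
import math


def geq_band_response(log_gain, omega_c, bandwidth, w):
    """Evaluate one band H_k(e^{jw}) of the graphic equalizer.

    log_gain must be nonzero here, since beta_k becomes 0/0 at unit gain.
    """
    g = math.exp(log_gain)
    g_tilde = g ** 0.4  # gain at the neighboring band frequency
    beta = math.sqrt(abs(g_tilde ** 2 - 1.0) / abs(g ** 2 - g_tilde ** 2))
    beta *= math.tan(bandwidth / 2.0)
    z_inv = cmath.exp(-1j * w)
    c = 2.0 * math.cos(omega_c)
    num = 1.0 + g * beta - c * z_inv + (1.0 - g * beta) * z_inv ** 2
    den = 1.0 + beta - c * z_inv + (1.0 - beta) * z_inv ** 2
    return num / den


# Two hypothetical neighboring bands: (center frequency, bandwidth).
bands = [(0.5, 0.2), (0.8, 0.3)]
log_gains = [0.6, -0.4]


def geq_response(w):
    """Cascade of all bands evaluated at frequency w."""
    H = 1.0 + 0.0j
    for lg, (wc, bw) in zip(log_gains, bands):
        H *= geq_band_response(lg, wc, bw, w)
    return H


single = abs(geq_band_response(0.6, 0.5, 0.2, 0.5))  # equals exp(0.6)
cascaded = abs(geq_response(0.5))  # differs from exp(0.6) due to band overlap
```

This overlap between neighboring bands is why matching a set of target control points requires solving for the log-gains jointly rather than setting each one independently.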
- Parameters:
scale (str, optional) – The frequency scale to use, which can be: 24-band "bark" and 31-band "third_oct" (default: "bark").
sr (int, optional) – The underlying sampling rate of the input signal (default: 44100).
backend (str, optional) – The backend to use for the filtering, which can either be the frequency-sampling method "fsm" or exact time-domain filter "lfilter" (default: "fsm").
fsm_fir_len (int, optional) – The length of FIR approximation when backend == "fsm" (default: 8192).
- forward(input_signals, log_gains)
Processes input audio with the processor and given parameters.
- Parameters:
input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals.
log_gains (FloatTensor, \(B \times K\)) – A batch of log-gain vectors of the GEQ.
- Returns:
A batch of output signals of shape \(B \times C \times L\).
- Return type:
FloatTensor
- parameter_size()
- Returns:
A dictionary that contains each parameter tensor’s shape.
- Return type:
Dict[str, Tuple[int, ...]]