grafx.processors.eq

class ZeroPhaseFIREqualizer(num_magnitude_bins=1024)

Bases: Module

A single-channel zero-phase finite impulse response (FIR) filter [EHGR20, Smi07b, Smi11].

From the input log-magnitude \(H_{\mathrm{log}}\), we compute the inverse FFT (IFFT) of the magnitude response and multiply it by a zero-centered window \(v[n]\). Each input channel is convolved with the following FIR filter.

\[ h[n] = v[n] \cdot \frac{1}{N} \sum_{k=0}^{N-1} \exp H_{\mathrm{log}}[k] \cdot w_{N}^{kn}. \]

Here, \(-(N+1)/2 \leq n \leq (N+1)/2\) and \(w_{N} = \exp(j\cdot 2\pi/N)\). This equalizer’s learnable parameter is \(p = \{ H_{\mathrm{log}} \}\).
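As a minimal sketch of the formula above (not the library's implementation), the following pure-Python function builds the windowed zero-phase FIR from a one-sided log-magnitude vector. The name `zero_phase_fir`, the odd FFT size \(N = 2K - 1\), and the fixed Hann window are assumptions made for illustration.

```python
import cmath
import math


def zero_phase_fir(log_magnitude):
    """One-sided log-magnitude (K bins) -> windowed zero-phase FIR (2K - 1 taps)."""
    K = len(log_magnitude)
    N = 2 * K - 1  # assumed odd FFT size, so the filter has a centre tap at n = 0
    mag = [math.exp(x) for x in log_magnitude]
    # Mirror the one-sided magnitudes into a full, even-symmetric spectrum.
    full = mag + mag[:0:-1]
    half = N // 2
    taps = []
    for n in range(-half, half + 1):
        # Inverse DFT of a real, symmetric spectrum; the imaginary part vanishes.
        acc = sum(full[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N))
        # Zero-centred Hann window v[n] (assumed window choice).
        v = 0.5 * (1.0 + math.cos(math.pi * n / half)) if half else 1.0
        taps.append(v * acc.real / N)
    return taps
```

A flat log-magnitude of zeros yields a unit impulse at the centre tap, and any input yields taps that are symmetric around \(n = 0\), which is what makes the filter zero-phase.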

Parameters:
  • num_magnitude_bins (int, optional) – The number of FFT magnitude bins (default: 1024).

  • window (str or FloatTensor, optional) – The window function to use for the FIR filter. If a str is given, we create the window internally; supported names are "hann", "hamming", "blackman", "bartlett", and "kaiser". If a FloatTensor is given, we use it as the window (default: "hann").

  • **window_kwargs (Dict[str, Any], optional) – Additional keyword arguments for the window function.

forward(input_signals, log_magnitude)

Processes input audio with the processor and given parameters.

Parameters:
  • input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals.

  • log_magnitude (FloatTensor, \(B \times K\)) – A batch of log-magnitude vectors of the FIR filter.

Returns:

A batch of output signals of shape \(B \times C \times L\).

Return type:

FloatTensor
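Because the FIR is zero-centered and the output keeps the input length \(L\), the filtering step can be pictured as a "same"-mode convolution applied to every channel. The sketch below uses nested lists in place of tensors; `convolve_same` and `apply_to_batch` are hypothetical helper names, and the library may well use a different (for example FFT-based) convolution internally.

```python
def convolve_same(signal, taps):
    """Convolve with an odd-length, zero-centred FIR and keep the input length."""
    half = len(taps) // 2
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for i, tap in enumerate(taps):
            m = n - (i - half)  # tap index i corresponds to lag i - half
            if 0 <= m < len(signal):
                acc += tap * signal[m]
        out.append(acc)
    return out


def apply_to_batch(batch, taps_per_item):
    """batch: B x C x L nested lists; one FIR per batch item, shared by its channels."""
    return [
        [convolve_same(channel, taps) for channel in item]
        for item, taps in zip(batch, taps_per_item)
    ]
```

Since the centre tap sits at lag zero, a symmetric FIR introduces no delay: an impulse in the input stays at the same sample index in the output.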

parameter_size()
Returns:

A dictionary that contains each parameter tensor’s shape.

Return type:

Dict[str, Tuple[int, ...]]

class NewZeroPhaseFIREqualizer(num_frequency_bins=1024, processor_channel='mono', use_filterbank=False, filterbank_kwargs={}, window='hann', window_kwargs={}, eps=1e-07, flashfftconv=False)

Bases: Module

A single-channel zero-phase finite impulse response (FIR) filter [EHGR20, Smi07b, Smi11].

From the input log-magnitude \(H_{\mathrm{log}}\), we compute the inverse FFT (IFFT) of the magnitude response and multiply it by a zero-centered window \(w[n]\). Each input channel is convolved with the following FIR filter.

\[ h[n] = w[n] \cdot \frac{1}{N} \sum_{k=0}^{N-1} \exp H_{\mathrm{log}}[k] \cdot z_{N}^{kn}. \]

Here, \(-(N+1)/2 \leq n \leq (N+1)/2\) and \(z_{N} = \exp(j\cdot 2\pi/N)\). This equalizer’s learnable parameter is \(p = \{ H_{\mathrm{log}} \}\).

When the filterbank is used, we instead start from the input log-energy \(H_{\mathrm{fb}} \in \mathbb{R}^{K_{\mathrm{fb}}}\) and compute the FFT magnitudes as

\[ H_{\mathrm{log}} = \sqrt { M \exp (H_{\mathrm{fb}}) + \epsilon} \]

where \(M \in \mathbb{R}^{K \times K_{\mathrm{fb}}}\) is the filterbank matrix (\(K\) and \(K_{\mathrm{fb}}\) are the number of FFT magnitude bins and filterbank bins, respectively). We use the standard triangular filterbank. In this case, the equalizer’s learnable parameter is \(p = \{ H_{\mathrm{fb}} \}\).
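The filterbank mapping above can be sketched in plain Python as follows. The linearly spaced triangle layout and the helper names `triangular_filterbank` and `filterbank_to_magnitude` are assumptions for illustration; the Bark, mel, and log scales offered by the class are not reproduced here.

```python
import math


def triangular_filterbank(num_bins, num_filters):
    """K x K_fb matrix of linearly spaced triangular filters (assumed layout)."""
    centers = [i * (num_bins - 1) / (num_filters + 1) for i in range(num_filters + 2)]
    M = [[0.0] * num_filters for _ in range(num_bins)]
    for j in range(num_filters):
        lo, mid, hi = centers[j], centers[j + 1], centers[j + 2]
        for k in range(num_bins):
            if lo < k <= mid:
                M[k][j] = (k - lo) / (mid - lo)  # rising edge of the triangle
            elif mid < k < hi:
                M[k][j] = (hi - k) / (hi - mid)  # falling edge of the triangle
    return M


def filterbank_to_magnitude(log_energy, M, eps=1e-7):
    """Evaluate sqrt(M exp(H_fb) + eps) for each of the K FFT bins."""
    energy = [math.exp(e) for e in log_energy]
    return [math.sqrt(sum(m * e for m, e in zip(row, energy)) + eps) for row in M]
```

The \(\epsilon\) term keeps the square root away from zero in bins that no triangle covers, such as the first and last bins in this layout.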

Parameters:
  • num_frequency_bins (int, optional) – The number of FFT energy bins (default: 1024).

  • processor_channel (str, optional) – The channel configuration of the equalizer, which can be "mono", "stereo", "midside", or "pseudo_midside" (default: "mono").

  • use_filterbank (bool, optional) – Whether to use the filterbank (default: False).

  • scale (str, optional) – The frequency scale to use, which can be one of "bark_traunmuller", "bark_schroeder", "bark_wang", "mel_htk", "mel_slaney", "linear", or "log" (default: "bark_traunmuller").

  • n_filters (int, optional) – Number of filterbank bins (default: 80).

  • f_min (float, optional) – Minimum frequency in Hz (default: 40).

  • f_max (float or None, optional) – Maximum frequency in Hz. If None, the sampling rate sr must be provided and we use half of the sampling rate (default: None).

  • sr (float or None, optional) – The underlying sampling rate. Only used when using the filterbank (default: None).

  • window (str or FloatTensor, optional) – The window function to use for the FIR filter. If a str is given, we create the window internally; supported names are "hann", "hamming", "blackman", "bartlett", and "kaiser". If a FloatTensor is given, we use it as the window (default: "hann").

  • **window_kwargs (Dict[str, Any], optional) – Additional keyword arguments for the window function.

forward(input_signals, log_magnitude)

Processes input audio with the processor and given parameters.

Parameters:
  • input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals.

  • log_magnitude (FloatTensor, \(B \times C_\mathrm{eq} \times K\) or \(B \times C_\mathrm{eq} \times K_\mathrm{fb}\)) – A batch of log-magnitude vectors of the FIR filter.

Returns:

A batch of output signals of shape \(B \times C \times L\).

Return type:

FloatTensor

parameter_size()
Returns:

A dictionary that contains each parameter tensor’s shape.

Return type:

Dict[str, Tuple[int, ...]]

class ParametricEqualizer(num_filters=10, processor_channel='mono', use_shelving_filters=True, **backend_kwargs)

Bases: Module

A parametric equalizer (PEQ) based on second-order filters.

We cascade \(K\) biquad filters to form a parametric equalizer,

\[ H(z) = \prod_{k=1}^{K} H_k(z) \]

By default, \(k=1\) and \(k=K\) are low-shelf and high-shelf filters, respectively, and the remaining ones are peaking filters. See LowShelf, PeakingFilter, and HighShelf for the filter details.
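To illustrate the cascade, the sketch below evaluates \(H(e^{j\omega})\) as a product of biquad responses. The peaking coefficients follow the widely used Audio EQ Cookbook form and are an assumption here, as is the mapping from `q_inv` and `log_gain` to the coefficients; the library's own LowShelf, PeakingFilter, and HighShelf definitions may differ.

```python
import cmath
import math


def peaking_biquad(w0, q_inv, log_gain):
    """Cookbook-style peaking filter; peak gain at w0 equals exp(log_gain) (assumed mapping)."""
    A = math.exp(log_gain / 2.0)
    alpha = 0.5 * math.sin(w0) * q_inv
    b = (1.0 + alpha * A, -2.0 * math.cos(w0), 1.0 - alpha * A)
    a = (1.0 + alpha / A, -2.0 * math.cos(w0), 1.0 - alpha / A)
    return b, a


def biquad_response(b, a, w):
    """Frequency response of one biquad at angular frequency w (radians per sample)."""
    z1 = cmath.exp(-1j * w)
    return (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)


def cascade_response(biquads, w):
    """H(z) = prod_k H_k(z), evaluated on the unit circle."""
    H = 1.0 + 0.0j
    for b, a in biquads:
        H *= biquad_response(b, a, w)
    return H
```

Because the filters are cascaded, the overall log-magnitude response is the sum of the individual log-magnitude responses.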

Parameters:
  • num_filters (int, optional) – The number of filters to use (default: 10).

  • processor_channel (str, optional) – The channel configuration of the equalizer, which can be "mono", "stereo", or "midside" (default: "mono").

  • use_shelving_filters (bool, optional) – Whether to use a low-shelf and a high-shelf filter. If False, we use only peaking filters (default: True).

  • **backend_kwargs (Dict[str, Any], optional) – Additional keyword arguments for the backend.

forward(input_signals, w0, q_inv, log_gain)

Processes input audio with the processor and given parameters.

Parameters:
  • input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals.

  • w0 (FloatTensor, \(B \times K\)) – A batch of cutoff frequencies.

  • q_inv (FloatTensor, \(B \times K\)) – A batch of quality factors (or resonance).

  • log_gain (FloatTensor, \(B \times K\)) – A batch of log-gains.

Returns:

A batch of output signals of shape \(B \times C \times L\).

Return type:

FloatTensor

parameter_size()
Returns:

A dictionary that contains each parameter tensor’s shape.

Return type:

Dict[str, Tuple[int, ...]]

class GraphicEqualizer(processor_channel='mono', scale='bark', sr=44100, **backend_kwargs)

Bases: Module

A graphic equalizer (GEQ) based on second-order peaking filters [LV+17].

We cascade \(K\) biquad filters to form a graphic equalizer, whose transfer function is given as \(H(z) = \prod_{k=1}^{K} H_k(z)\) where each biquad \(H_k(z)\) is as follows,

\[ H_k(z)=\frac{1+g_k \beta_k-2 \cos (\omega_k) z^{-1}+(1-g_k \beta_k) z^{-2}}{1+\beta_k-2 \cos (\omega_k) z^{-1}+(1-\beta_k) z^{-2}}. \]

Here, \(g_k\) is the linear gain and \(\omega_k\) is the center frequency. \(\beta_k\) is given as

\[ \beta_k = \sqrt{\frac{\left|\tilde{g}_k^2-1\right|}{\left|g_k^2-\tilde{g}_k^2\right|}} \tan {\frac{B_k}{2}} \]

where \(B_k\) is the bandwidth frequency and \(\tilde{g}_k\) is the gain at the neighboring band frequency, pre-determined to be \(\tilde{g}_k = g_k^{0.4}\). The frequency values (\(\omega_k\) and \(B_k\)) and the number of bands \(K\) are also determined by the frequency scale. The learnable parameter is a concatenation of the log-magnitudes, i.e., \(\smash{p = \{ \mathbf{g}^{\mathrm{log}} \}}\) where \(\smash{g_k = \exp g_k^{\mathrm{log}}}\).

Note that the log-gain parameters differ from the equalizer’s log-magnitude response values at the center frequencies, known as “control points”. To set the log-gains so that the response matches the control points, we can use least-squares optimization methods [LV+17, VR19].
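The biquad \(H_k(z)\) and the \(\beta_k\) expression above can be checked numerically with a small sketch. The function names `geq_beta` and `geq_band_response` are hypothetical, and the handling of the unit-gain case, where \(\beta_k\) becomes \(0/0\), is an assumption of this sketch rather than something the text specifies.

```python
import cmath
import math


def geq_beta(gain, bandwidth):
    """beta_k from the linear gain g_k and bandwidth B_k, with g~_k = g_k ** 0.4."""
    neighbor = gain ** 0.4
    den = abs(gain ** 2 - neighbor ** 2)
    if den == 0.0:
        # At unit gain the ratio is 0/0, and the biquad is unity for any beta,
        # so a finite placeholder is returned (assumption of this sketch).
        return math.tan(bandwidth / 2.0)
    return math.sqrt(abs(neighbor ** 2 - 1.0) / den) * math.tan(bandwidth / 2.0)


def geq_band_response(log_gain, center, bandwidth, w):
    """Response of one GEQ band H_k at angular frequency w (radians per sample)."""
    g = math.exp(log_gain)  # g_k = exp(g_k^log)
    beta = geq_beta(g, bandwidth)
    z1 = cmath.exp(-1j * w)
    c = -2.0 * math.cos(center)
    num = (1.0 + g * beta) + c * z1 + (1.0 - g * beta) * z1 * z1
    den = (1.0 + beta) + c * z1 + (1.0 - beta) * z1 * z1
    return num / den
```

Evaluating a single band at its center frequency returns exactly \(g_k\), and at \(\omega = 0\) it returns unity. Once several bands are cascaded, neighboring bands contribute at each center frequency as well, which is why the log-gains do not coincide with the control points.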

Parameters:
  • scale (str, optional) – The frequency scale to use, which can be either the 24-band "bark" or the 31-band "third_oct" (default: "bark").

  • sr (int, optional) – The underlying sampling rate of the input signal (default: 44100).

  • backend (str, optional) – The backend to use for the filtering, which can either be the frequency-sampling method "fsm" or exact time-domain filter "lfilter" (default: "fsm").

  • fsm_fir_len (int, optional) – The length of FIR approximation when backend == "fsm" (default: 8192).

forward(input_signals, log_gains)

Processes input audio with the processor and given parameters.

Parameters:
  • input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals.

  • log_gains (FloatTensor, \(B \times K\)) – A batch of log-gain vectors of the GEQ.

Returns:

A batch of output signals of shape \(B \times C \times L\).

Return type:

FloatTensor

parameter_size()
Returns:

A dictionary that contains each parameter tensor’s shape.

Return type:

Dict[str, Tuple[int, ...]]