grafx.processors.nonlinear

class TanhDistortion(pre_post_gain=True, inverse_post_gain=True, remove_dc=False, use_bias=False)

Bases: Module

A simple distortion processor based on the hyperbolic tangent function. [PP24]

In the simplest setting, the processor only applies the hyperbolic tangent function to the input signal: \( y[n] = \tanh(u[n]). \) The processor can optionally apply a pre-gain \(g_{\mathrm{pre}}\) and a post-gain \(g_{\mathrm{post}}\) before and after the nonlinearity, respectively. We can also add a bias \(b\) for asymmetric and increased distortion. The full processing is then given by

\[ y[n] = g_{\mathrm{post}} (\tanh(g_{\mathrm{pre}} \cdot u[n] + b) - \tanh b ). \]

This processor’s parameters are \(p = \{\tilde{g}_{\mathrm{pre}}, \tilde{g}_{\mathrm{post}}, b\}\), where \(\tilde{g}_{\mathrm{pre}} = \log g_{\mathrm{pre}}\) and \(\tilde{g}_{\mathrm{post}} = \log g_{\mathrm{post}}\). Depending on the initialization arguments, some of these parameters can be omitted.
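
As a reference for the math only, here is a minimal standalone sketch of the full processing equation, assuming the gains and bias are given in the \(B \times 1\) layout used by forward() below; the function name tanh_distortion is illustrative and not part of the library.

    import torch

    def tanh_distortion(u, log_pre_gain, log_post_gain, bias):
        # Sketch of y[n] = g_post * (tanh(g_pre * u[n] + b) - tanh(b)).
        # u: B x C x L; log_pre_gain, log_post_gain, bias: B x 1.
        g_pre = torch.exp(log_pre_gain)[:, :, None]    # B x 1 x 1, broadcasts over C and L
        g_post = torch.exp(log_post_gain)[:, :, None]  # B x 1 x 1
        b = bias[:, :, None]                           # B x 1 x 1
        return g_post * (torch.tanh(g_pre * u + b) - torch.tanh(b))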

Parameters:
  • pre_post_gain (bool, optional) – If True, we apply the pre- and post-gain (default: True).

  • inverse_post_gain (bool, optional) – If True, we set the post-gain to the inverse of the pre-gain (default: True).

  • remove_dc (bool, optional) – If True, we pre-process the input signal to remove the DC component (default: False).

  • use_bias (bool, optional) – If True, we apply the bias term (default: False).

forward(input_signals, log_pre_gain=None, log_post_gain=None, bias=None)

Processes input audio with the processor and given parameters.

Parameters:
  • input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals.

  • log_pre_gain (FloatTensor, \(B \times 1\), optional) – A batch of log pre-gain values, only required if pre_post_gain is True (default: None).

  • log_post_gain (FloatTensor, \(B \times 1\), optional) – A batch of log post-gain values, only required if pre_post_gain is True and inverse_post_gain is False (default: None).

  • bias (FloatTensor, \(B \times 1\), optional) – A batch of bias values, only required if use_bias is True (default: None).

Returns:

A batch of output signals of shape \(B \times C \times L\).

Return type:

FloatTensor
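
A hypothetical usage sketch with the default inverse post-gain, following the shapes documented above (the exact call may differ in practice):

    import torch
    from grafx.processors.nonlinear import TanhDistortion

    processor = TanhDistortion(use_bias=True)  # pre_post_gain and inverse_post_gain keep their True defaults
    u = torch.randn(4, 2, 44100)               # B x C x L
    log_pre_gain = torch.zeros(4, 1)           # post-gain is then the inverse of the pre-gain
    bias = 0.1 * torch.randn(4, 1)
    y = processor(u, log_pre_gain=log_pre_gain, bias=bias)
    print(y.shape)                             # expected: torch.Size([4, 2, 44100])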

parameter_size()
Returns:

A dictionary that contains each parameter tensor’s shape.

Return type:

Dict[str, Tuple[int, ...]]

class PiecewiseTanhDistortion(pre_post_gain=True, inverse_post_gain=True, remove_dc=False)

Bases: Module

A distortion processor based on the piecewise hyperbolic tangent function [Eic20].

The nonlinearity is split into three parts. The middle part is the standard hyperbolic tangent function, and the other two parts are its scaled and shifted versions. From two segment thresholds \(0 < k_p, k_n < 1\) and hardness controls \(h_p, h_n > 0\), the nonlinearity is given as

\[\begin{split} \xi(u[n]) = \begin{cases} a_p \cdot \tanh \left(h_p \cdot\left(u[n]-k_p\right)\right)+b_p & k_p < u[n], \\ \tanh (u[n]) & -k_n \leq u[n] \leq k_p, \\ a_n \cdot \tanh \left(h_n \cdot\left(u[n]+k_n\right)\right)+b_n & u[n]<-k_n \end{cases} \end{split}\]

where \(a_p = (1-\tanh^2 k_p) / h_p\), \(a_n = (1-\tanh^2 k_n) / h_n\), \(b_p = \tanh k_p\), and \(b_n = -\tanh k_n\). In the simplest setting, the output is given as \(y[n] = \xi(u[n])\). As with TanhDistortion, we can optionally apply pre- and post-gain as \(y[n] = g_{\mathrm{post}} \cdot \xi(g_{\mathrm{pre}} \cdot u[n])\). This processor’s parameters are \(\smash{p = \{\tilde{g}_{\mathrm{pre}}, \tilde{g}_{\mathrm{post}}, \smash{\tilde{\mathbf{k}}}, \tilde{\mathbf{h}}\}}\), where \(\smash{\tilde{\mathbf{k}} = [\tilde{k}_p, \tilde{k}_n]}\) and \(\smash{\tilde{\mathbf{h}} = [\tilde{h}_p, \tilde{h}_n]}\). The internal parameters are recovered with \(\smash{k_p = \sigma (\tilde{k}_p)}\), \(\smash{k_n = \sigma (\tilde{k}_n)}\), \(\smash{h_p = \exp \tilde{h}_p}\), and \(\smash{h_n = \exp \tilde{h}_n}\), where \(\sigma\) denotes the logistic sigmoid.
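
The coefficients \(a_p\), \(a_n\), \(b_p\), and \(b_n\) make the three segments join continuously with matching slopes at the thresholds. Below is a standalone sketch of \(\xi\) that assumes the thresholds and hardness values have already been recovered from their raw parameters; piecewise_tanh is an illustrative name, not a library function.

    import torch

    def piecewise_tanh(u, k_p, k_n, h_p, h_n):
        # k_* and h_* are tensors already recovered via sigmoid/exp from the raw
        # parameters and broadcastable against u (e.g., shaped B x 1 x 1).
        a_p = (1 - torch.tanh(k_p) ** 2) / h_p   # matches the slope of tanh at +k_p
        a_n = (1 - torch.tanh(k_n) ** 2) / h_n   # matches the slope of tanh at -k_n
        b_p, b_n = torch.tanh(k_p), -torch.tanh(k_n)
        upper = a_p * torch.tanh(h_p * (u - k_p)) + b_p
        lower = a_n * torch.tanh(h_n * (u + k_n)) + b_n
        return torch.where(u > k_p, upper, torch.where(u < -k_n, lower, torch.tanh(u)))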

forward(input_signals, log_hardness, z_threshold, log_pre_gain=None, log_post_gain=None)

Processes input audio with the processor and given parameters.

Parameters:
  • input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals.

  • log_hardness (FloatTensor, \(B \times 2\)) – A batch of hardness controls stacked along the last dimension.

  • z_threshold (FloatTensor, \(B \times 2\)) – A batch of threshold values stacked along the last dimension.

  • log_pre_gain (FloatTensor, \(B \times 1\), optional) – A batch of log pre-gain values, only required if pre_post_gain is True (default: None).

  • log_post_gain (FloatTensor, \(B \times 1\), optional) – A batch of log post-gain values, only required if pre_post_gain is True and inverse_post_gain is False (default: None).

Returns:

A batch of output signals of shape \(B \times C \times L\).

Return type:

FloatTensor
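
A hypothetical usage sketch following the shapes documented above; the ordering of the two stacked controls is not specified here, and the exact call may differ in practice:

    import torch
    from grafx.processors.nonlinear import PiecewiseTanhDistortion

    processor = PiecewiseTanhDistortion()  # pre_post_gain and inverse_post_gain default to True
    u = torch.randn(4, 2, 44100)           # B x C x L
    log_hardness = torch.zeros(4, 2)       # two raw hardness values per item
    z_threshold = torch.zeros(4, 2)        # two raw threshold values per item
    log_pre_gain = torch.zeros(4, 1)
    y = processor(u, log_hardness, z_threshold, log_pre_gain=log_pre_gain)
    print(y.shape)                         # expected: torch.Size([4, 2, 44100])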

parameter_size()
Returns:

A dictionary that contains each parameter tensor’s shape.

Return type:

Dict[str, Tuple[int, ...]]

class PowerDistortion(max_order=10, pre_gain=True, remove_dc=False, use_tanh=False)

Bases: Module

A distortion processor based on polynomials [PP24].

The distortion output is simply given as an elementwise (memoryless) polynomial of each sample of the input signal,

\[ y[n] = \sum_{k=0}^{K-1} w_k u^k[n], \]

where \(w_k\) are the polynomial coefficients, which we also call basis_weights in this implementation. We also allow an optional hyperbolic tangent with a pre-gain applied to each basis term [CCR22]. In this case, the output is given as

\[ y[n] = \sum_{k=0}^{K-1} w_k \tanh (g_{\mathrm{pre}} u^k[n]). \]

The processor’s parameters are \(p = \{\mathbf{w}, \tilde{\mathbf{g}}_{\mathrm{pre}}\}\), where the former is a stack of the coefficients and the latter is the optional log pre-gain.
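
A standalone sketch of the two equations above; power_distortion is an illustrative name, and the optional tanh and pre-gain are folded into a single optional argument for brevity (the class exposes them as separate pre_gain and use_tanh flags).

    import torch

    def power_distortion(u, basis_weights, log_pre_gain=None):
        # u: B x C x L, basis_weights: B x K, log_pre_gain: B x 1 or None.
        K = basis_weights.shape[-1]
        powers = torch.stack([u ** k for k in range(K)], dim=-1)  # B x C x L x K
        if log_pre_gain is not None:  # optional tanh with pre-gain on every term
            powers = torch.tanh(torch.exp(log_pre_gain)[:, :, None, None] * powers)
        w = basis_weights[:, None, None, :]                       # B x 1 x 1 x K
        return (w * powers).sum(-1)                               # B x C x L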

forward(input_signals, basis_weights, log_pre_gain=None)

Processes input audio with the processor and given parameters.

Parameters:
  • input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals.

  • basis_weights (FloatTensor, \(B \times K\)) – A batch of polynomial coefficients.

  • log_pre_gain (FloatTensor, \(B \times 1\), optional) – A batch of log pre-gain values, only required if pre_gain is True (default: None).

Returns:

A batch of output signals of shape \(B \times C \times L\).

Return type:

FloatTensor

parameter_size()
Returns:

A dictionary that contains each parameter tensor’s shape.

Return type:

Dict[str, Tuple[int, ...]]

class ChebyshevDistortion(max_order=10, pre_gain=True, remove_dc=False, use_tanh=False)

Bases: Module

A distortion processor based on Chebyshev polynomials [PP24].

We combine the outputs of Chebyshev polynomials evaluated on the input signal,

\[ y[n] = \sum_{k=0}^{K-1} w_k T_k(u[n]). \]

where \(T_k(u[n])\) is the Chebyshev polynomial of order \(k\), defined by the recurrence

\[\begin{split} \begin{aligned} T_0(u[n])&=1, \\ T_1(u[n])&=u[n], \\ T_k(u[n])&=2 u[n] T_{k-1}(u[n])-T_{k-2}(u[n]). \end{aligned} \end{split}\]

As with PowerDistortion, we can optionally apply a pre-gain to the input and a hyperbolic tangent to each Chebyshev basis output [CCR22].

\[ y[n] = \sum_{k=0}^{K-1} w_k \tanh T_k(g_{\mathrm{pre}} u[n]). \]

The processor’s parameters are \(p = \{\mathbf{w}, \tilde{\mathbf{g}}_{\mathrm{pre}}\}\), where the latter is optional.
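
A standalone sketch of the recurrence and the weighted sum; as above, chebyshev_distortion is an illustrative name and the optional pre-gain and tanh are folded into one argument for brevity.

    import torch

    def chebyshev_distortion(u, basis_weights, log_pre_gain=None):
        # u: B x C x L, basis_weights: B x K, log_pre_gain: B x 1 or None.
        x = u if log_pre_gain is None else torch.exp(log_pre_gain)[:, :, None] * u
        K = basis_weights.shape[-1]
        T = [torch.ones_like(x), x]                 # T_0, T_1
        for _ in range(2, K):
            T.append(2 * x * T[-1] - T[-2])         # T_k = 2 u T_{k-1} - T_{k-2}
        basis = torch.stack(T[:K], dim=-1)          # B x C x L x K
        if log_pre_gain is not None:                # optional tanh on each basis output
            basis = torch.tanh(basis)
        w = basis_weights[:, None, None, :]         # B x 1 x 1 x K
        return (w * basis).sum(-1)                  # B x C x L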

forward(input_signals, basis_weights, log_pre_gain=None)

Processes input audio with the processor and given parameters.

Parameters:
  • input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals.

  • basis_weights (FloatTensor, \(B \times K\)) – A batch of Chebyshev polynomial coefficients.

  • log_pre_gain (FloatTensor, \(B \times 1\), optional) – A batch of log pre-gain values, only required if pre_gain is True (default: None).

Returns:

A batch of output signals of shape \(B \times C \times L\).

Return type:

FloatTensor

parameter_size()
Returns:

A dictionary that contains each parameter tensor’s shape.

Return type:

Dict[str, Tuple[int, ...]]