grafx.processors.container
- class DryWet(processor, external_param=True)
Bases: Module
A utility module that mixes the input (dry) with the wrapped processor’s output (wet).
For each pair of input \(u[n]\) and wet output \(y[n] = f(u[n]; p)\), where \(f\) and \(p\) denote the wrapped processor and its parameters, respectively, we mix the dry and wet signals with a dry/wet weight \(0 < w < 1\) as follows,
\[ \hat{y}[n] = (1 - w)u[n] + w y[n]. \]
Here, the dry/wet weight is further parameterized as \(w = \sigma(z_w)\), where \(z_w\) is an unbounded logit and \(\sigma\) is the logistic sigmoid. Hence, this processor’s learnable parameters are \(p \cup \{z_w\}\).
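The mix above can be sketched in a few lines of PyTorch. This is a minimal illustration of the formula, not the module’s actual implementation; the \((B, 1, 1)\) logit shape is an assumption chosen so the weight broadcasts over channels and time:

```python
import torch

def drywet_mix(u, wet, z_w):
    """Mix a dry signal u with a wet signal using an unbounded logit z_w.

    Both signals have shape (B, C, L); z_w has shape (B, 1, 1) so the
    weight broadcasts over the channel and time axes.
    """
    w = torch.sigmoid(z_w)        # dry/wet weight, 0 < w < 1
    return (1 - w) * u + w * wet  # convex combination of dry and wet

u = torch.randn(2, 2, 100)       # dry input
wet = 2.0 * u                    # a toy "processor": a gain of 2
z_w = torch.zeros(2, 1, 1)       # sigmoid(0) = 0.5, i.e., an equal mix
y = drywet_mix(u, wet, z_w)      # 0.5 * u + 0.5 * 2u = 1.5 * u
```

With the logit at zero the output is exactly halfway between dry and wet; driving \(z_w\) to large positive values recovers the fully wet signal.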
- Parameters:
  - processor (Module) – Any SISO processor with forward and parameter_size methods implemented properly.
  - external_param (bool, optional) – If set to True, we do not add the dry/wet weight shape to the parameter_size method. This is useful when every processor uses DryWet and it is more convenient to keep a single dry/wet tensor for all nodes instead of a separate tensor for each processor type (default: True).
- forward(input_signals, drywet_weight, **processor_kwargs)
Processes input audio with the processor and given parameters.
- Parameters:
  - input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals that will be passed to the processor.
  - drywet_weight (FloatTensor) – A batch of unbounded dry/wet logits \(z_w\).
  - **processor_kwargs (optional) – Keyword arguments (i.e., mostly parameters) that will be passed to the processor.
- Returns:
A batch of output signals of shape \(B \times C \times L\).
- Return type:
FloatTensor
- parameter_size()
- Returns:
The wrapped processor’s parameter_size(), optionally added with the dry/wet weight when external_param is set to False.
- Return type:
Dict[str, Tuple[int, ...]]
- class SerialChain(processors)
Bases: Module
A utility module that serially connects the provided processors.
For processors \(f_1, \cdots, f_K\) with their respective parameters \(p_1, \cdots, p_K\), the serial chain \(f = f_K \circ \cdots \circ f_1\) applies each processor in order to an input \(s[n]\), feeding the output of each processor to the next one.
\[ y[n] = (f_K \circ \cdots \circ f_1)(s[n]; p_1, \cdots, p_K). \]
The set of all learnable parameters is given as \(p = \{p_1, \cdots, p_K\}\).
Note that, from the audio processing perspective, exactly the same result can be achieved by connecting the processors \(f_1, \cdots, f_K\) as individual nodes in a graph. Yet, this module can be useful when we use the same chain of processors repeatedly so that encapsulating them in a single node is more convenient.
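The composition can be sketched as follows. This is a minimal illustration with toy callables standing in for the processors, not the module’s actual implementation; note that "dictionary order" relies on Python dictionaries preserving insertion order:

```python
import torch

def serial_chain(x, processors, params):
    """Apply processors in order: f_K ∘ ... ∘ f_1.

    processors maps names to callables f(x, **p); params maps the same
    names to their keyword-argument dictionaries.
    """
    for name, f in processors.items():
        x = f(x, **params[name])  # each stage's output feeds the next
    return x

# Two toy stages: a gain followed by a DC offset.
processors = {
    "gain": lambda x, g: g * x,
    "offset": lambda x, c: x + c,
}
params = {"gain": {"g": 2.0}, "offset": {"c": 1.0}}
x = torch.ones(1, 1, 4)
y = serial_chain(x, processors, params)  # (2 * 1) + 1 = 3
```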
- Parameters:
  - processors (Dict[str, Module]) – A dictionary of processors with their names as keys. The order of the processors will be the same as the dictionary order. We assume that each processor has forward() and parameter_size() methods implemented properly.
- forward(input_signals, **processors_kwargs)
Processes input audio with the processors and given parameters.
- Parameters:
  - input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals.
  - **processors_kwargs (optional) – Keyword arguments (i.e., mostly parameters) that will be passed to the processors.
- Returns:
A batch of output signals of shape \(B \times C \times L\) along with a dictionary of intermediate/auxiliary results.
- Return type:
Tuple[FloatTensor, Dict[str, Any]]
- parameter_size()
- Returns:
A nested dictionary of depth at least 2 that contains each processor name as a key and its parameter_size() as a value.
- Return type:
Dict[str, Dict[str, Union[dict, Tuple[int, ...]]]]
- class ParallelMix(processors, activation='softmax')
Bases: Module
A container that mixes multiple processors’ outputs.
We create a single processor from \(K\) processors \(f_1, \cdots, f_K\), mixing their outputs with weights \(w_1, \cdots, w_K\).
\[ y[n] = \sum_{k=1}^K w_k f_k(s[n]; p_k). \]
By default, we take the pre-activation weights \(\tilde{w}_1, \cdots, \tilde{w}_K\) as input. Then, for each \(\tilde{w}_k\), we apply \(w_k = \log (1 + \exp \tilde{w}_k) / (K \log 2)\), making it non-negative and equal to \(1/K\) when the pre-activation input is near zero. Alternatively, we can force the weights to sum to 1 by applying a softmax, \(w_k = \exp \tilde{w}_k / \sum_{i=1}^K \exp \tilde{w}_i\). The latter resembles differentiable architecture search (DARTS) [LSY19] if our aim is to select the best one among the \(K\) processors. The set of all learnable parameters is given as \(p = \{\tilde{\mathbf{w}}, p_1, \cdots, p_K\}\).
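Both weight activations can be sketched as follows; this is a minimal illustration of the two formulas, and the function name and signature are hypothetical:

```python
import math
import torch
import torch.nn.functional as F

def parallel_weights(w_tilde, activation="softplus"):
    """Turn pre-activation weights w̃_1..w̃_K into mixing weights w_1..w_K."""
    K = w_tilde.shape[-1]
    if activation == "softplus":
        # Non-negative; equals exactly 1/K when the input is zero,
        # since softplus(0) = log 2.
        return F.softplus(w_tilde) / (K * math.log(2))
    elif activation == "softmax":
        # Non-negative and sums to one across the K processors.
        return torch.softmax(w_tilde, dim=-1)
    raise ValueError(f"unknown activation: {activation}")

w_tilde = torch.zeros(4)                      # K = 4, all logits at zero
w_sp = parallel_weights(w_tilde, "softplus")  # each weight is 1/4
w_sm = parallel_weights(w_tilde, "softmax")   # each weight is 1/4, sum is 1
```

Both activations give a uniform \(1/K\) mix at zero input; they differ in that only the softmax constrains the weights to sum to one for nonzero inputs.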
- forward(input_signals, parallel_weights, **processors_kwargs)
Processes input audio with the processor and given parameters.
- Parameters:
  - input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals.
  - parallel_weights (FloatTensor, \(B \times K\)) – A batch of pre-activation mixing weights \(\tilde{\mathbf{w}}\).
  - **processors_kwargs (optional) – Keyword arguments (i.e., mostly parameters) that will be passed to the processors.
- Returns:
A batch of output signals of shape \(B \times C \times L\).
- Return type:
FloatTensor
- parameter_size()
- Returns:
A nested dictionary of depth at least 2 that contains each processor name as a key and its parameter_size() as a value.
- Return type:
Dict[str, Dict[str, Union[dict, Tuple[int, ...]]]]
- class GainStagingRegularization(processor, key='gain_reg')
Bases: Module
A regularization module that wraps an audio processor and calculates the energy difference between the input and output audio. It can be used to guide the processors to mimic gain staging, a practice that aims to keep the signal energy roughly the same throughout the processing chain.
For each pair of input \(u[n]\) and output signal \(y[n] = f(u[n]; p)\), where \(f\) and \(p\) denote the wrapped processor and its parameters, respectively, we calculate their loudness difference with an energy function \(g\) as follows,
\[ d = \left| g(y[n]) - g(u[n]) \right|. \]
The energy function \(g\) computes the log of the mean energy across the time and channel axes. If the signals are stereo, this is equivalent to calculating the log of the mid-channel energy.
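The energy function and the resulting regularization term can be sketched as follows; a minimal illustration of the formula, not the module’s actual implementation:

```python
import torch

def log_energy(x):
    """Log of the mean energy across the channel and time axes."""
    return torch.log(x.square().mean(dim=(-2, -1)))

def gain_reg(u, y):
    """Absolute log-energy difference d = |g(y) - g(u)| per batch item."""
    return (log_energy(y) - log_energy(u)).abs()

u = torch.randn(3, 2, 1000)  # a batch of 3 stereo signals
y = 2.0 * u                  # a gain of 2 quadruples the energy
d = gain_reg(u, y)           # |log 4| for every batch item
```

Because \(g\) is logarithmic, a fixed linear gain produces the same penalty regardless of the input’s absolute level, which is what makes the term suitable as a gain-staging regularizer.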
- Parameters:
  - processor (Module) – Any SISO processor with forward and parameter_size methods implemented properly.
  - key (str, optional) – A dictionary key that will be used to store the energy difference in the intermediate results (default: "gain_reg").
- forward(input_signals, **processor_kwargs)
Processes input audio with the processor and given parameters.
- Parameters:
  - input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals that will be passed to the processor.
  - **processor_kwargs (optional) – Keyword arguments (i.e., mostly parameters) that will be passed to the processor.
- Returns:
A batch of output signals of shape \(B \times C \times L\) and a dictionary of intermediate/auxiliary results with the regularization loss added.
- Return type:
Tuple[FloatTensor, dict]
- parameter_size()
- Returns:
The wrapped processor’s parameter_size().
- Return type:
Dict[str, Tuple[int, ...]]