grafx.processors.container
- class DryWet(processor, external_param=True)
Bases: Module
A utility module that mixes the input (dry) with the wrapped processor’s output (wet).
For each pair of input \(u[n]\) and wet output \(y[n] = f(u[n]; p)\), where \(f\) and \(p\) denote the wrapped processor and its parameters, respectively, we mix the dry and wet signals with a dry/wet weight \(0 < w < 1\) as follows,
\[ \hat{y}[n] = (1 - w)u[n] + w y[n]. \]
Here, the dry/wet weight is further parameterized as \(w = \sigma(z_w)\), where \(z_w\) is an unbounded logit and \(\sigma\) is the logistic sigmoid. Hence, this processor’s learnable parameters are \(p \cup \{z_w\}\).
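The mix above can be sketched in a few lines of PyTorch. This is a minimal illustration of the formula, not the module’s actual implementation; the \((B, 1, 1)\) logit shape is an assumption chosen so the weight broadcasts over channels and time:

```python
import torch

def drywet_mix(u, wet, z_w):
    """Mix a dry signal u with a wet signal using an unbounded logit z_w.

    Both signals have shape (B, C, L); z_w has shape (B, 1, 1) so the
    weight broadcasts over the channel and time axes.
    """
    w = torch.sigmoid(z_w)        # dry/wet weight, 0 < w < 1
    return (1 - w) * u + w * wet  # convex combination of dry and wet

u = torch.randn(2, 2, 100)       # dry input
wet = 2.0 * u                    # a toy "processor": a gain of 2
z_w = torch.zeros(2, 1, 1)       # sigmoid(0) = 0.5, i.e., an equal mix
y = drywet_mix(u, wet, z_w)      # 0.5 * u + 0.5 * 2u = 1.5 * u
```

With the logit at zero the output is exactly halfway between dry and wet; driving \(z_w\) to large positive values recovers the fully wet signal.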
- Parameters:
  - processor (Module) – Any SISO processor with forward and parameter_size methods implemented properly.
  - external_param (bool, optional) – If set to True, we do not add the dry/wet weight shape to the parameter_size method. This is useful when every processor uses DryWet and it is more convenient to keep a single dry/wet tensor for all nodes instead of a separate tensor for each processor type (default: True).
- forward(input_signals, drywet_weight, **processor_kwargs)
Processes input audio with the processor and given parameters.
- Parameters:
  - input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals that will be passed to the processor.
  - drywet_weight (FloatTensor) – A batch of unbounded dry/wet logits \(z_w\).
  - **processor_kwargs (optional) – Keyword arguments (i.e., mostly parameters) that will be passed to the processor.
- Returns:
A batch of output signals of shape \(B \times C \times L\).
- Return type:
FloatTensor
- parameter_size()
- Returns:
The wrapped processor’s parameter_size(), optionally added with the dry/wet weight when external_param is set to False.
- Return type:
Dict[str, Tuple[int, ...]]
- class SerialChain(processors)
Bases: Module
A utility module that serially connects the provided processors.
For processors \(f_1, \cdots, f_K\) with their respective parameters \(p_1, \cdots, p_K\), the serial chain \(f = f_K \circ \cdots \circ f_1\) applies each processor in order to an input \(s[n]\), feeding the output of each processor to the next one.
\[ y[n] = (f_K \circ \cdots \circ f_1)(s[n]; p_1, \cdots, p_K). \]
The set of all learnable parameters is given as \(p = \{p_1, \cdots, p_K\}\).
Note that, from the audio processing perspective, exactly the same result can be achieved by connecting the processors \(f_1, \cdots, f_K\) as individual nodes in a graph. Yet, this module can be useful when we use the same chain of processors repeatedly so that encapsulating them in a single node is more convenient.
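The composition can be sketched as follows. This is a minimal illustration with toy callables standing in for the processors, not the module’s actual implementation; note that "dictionary order" relies on Python dictionaries preserving insertion order:

```python
import torch

def serial_chain(x, processors, params):
    """Apply processors in order: f_K ∘ ... ∘ f_1.

    processors maps names to callables f(x, **p); params maps the same
    names to their keyword-argument dictionaries.
    """
    for name, f in processors.items():
        x = f(x, **params[name])  # each stage's output feeds the next
    return x

# Two toy stages: a gain followed by a DC offset.
processors = {
    "gain": lambda x, g: g * x,
    "offset": lambda x, c: x + c,
}
params = {"gain": {"g": 2.0}, "offset": {"c": 1.0}}
x = torch.ones(1, 1, 4)
y = serial_chain(x, processors, params)  # (2 * 1) + 1 = 3
```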
- Parameters:
  - processors (Dict[str, Module]) – A dictionary of processors with their names as keys. The order of the processors will be the same as the dictionary order. We assume that each processor has forward() and parameter_size() methods implemented properly.
- forward(input_signals, **processors_kwargs)
Processes input audio with the processors and given parameters.
- Parameters:
  - input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals.
  - **processors_kwargs (optional) – Keyword arguments (i.e., mostly parameters) that will be passed to the processors.
- Returns:
A batch of output signals of shape \(B \times C \times L\) along with a dictionary of intermediate/auxiliary results.
- Return type:
Tuple[FloatTensor, Dict[str, Any]]
- parameter_size()
- Returns:
A nested dictionary of depth at least 2 that contains each processor name as a key and its parameter_size() as a value.
- Return type:
Dict[str, Dict[str, Union[dict, Tuple[int, ...]]]]
- class ParallelMix(processors, activation='softmax')
Bases: Module
A container that mixes multiple processors’ outputs.
We create a single processor from \(K\) processors \(f_1, \cdots, f_K\), mixing their outputs with weights \(w_1, \cdots, w_K\).
\[ y[n] = \sum_{k=1}^K w_k f_k(s[n]; p_k). \]
By default, we take the pre-activation weights \(\tilde{w}_1, \cdots, \tilde{w}_K\) as input. Then, for each \(\tilde{w}_k\), we apply \(w_k = \log (1 + \exp \tilde{w}_k) / (K \log 2)\), making it non-negative and equal to \(1/K\) when the pre-activation input is near zero. Alternatively, we can force the weights to sum to 1 by applying a softmax, \(w_k = \exp \tilde{w}_k / \sum_{i=1}^K \exp \tilde{w}_i\). The latter resembles differentiable architecture search (DARTS) [LSY19] if our aim is to select the best one among the \(K\) processors. The set of all learnable parameters is given as \(p = \{\tilde{\mathbf{w}}, p_1, \cdots, p_K\}\).
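Both weight activations can be sketched as follows; this is a minimal illustration of the two formulas, and the function name and signature are hypothetical:

```python
import math
import torch
import torch.nn.functional as F

def parallel_weights(w_tilde, activation="softplus"):
    """Turn pre-activation weights w̃_1..w̃_K into mixing weights w_1..w_K."""
    K = w_tilde.shape[-1]
    if activation == "softplus":
        # Non-negative; equals exactly 1/K when the input is zero,
        # since softplus(0) = log 2.
        return F.softplus(w_tilde) / (K * math.log(2))
    elif activation == "softmax":
        # Non-negative and sums to one across the K processors.
        return torch.softmax(w_tilde, dim=-1)
    raise ValueError(f"unknown activation: {activation}")

w_tilde = torch.zeros(4)                      # K = 4, all logits at zero
w_sp = parallel_weights(w_tilde, "softplus")  # each weight is 1/4
w_sm = parallel_weights(w_tilde, "softmax")   # each weight is 1/4, sum is 1
```

Both activations give a uniform \(1/K\) mix at zero input; they differ in that only the softmax constrains the weights to sum to one for nonzero inputs.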
- forward(input_signals, parallel_weights, **processors_kwargs)
Processes input audio with the processor and given parameters.
- Parameters:
  - input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals.
  - parallel_weights (FloatTensor, \(B \times K\)) – A batch of pre-activation mixing weights \(\tilde{\mathbf{w}}\).
  - **processors_kwargs (optional) – Keyword arguments (i.e., mostly parameters) that will be passed to the processors.
- Returns:
A batch of output signals of shape \(B \times C \times L\).
- Return type:
FloatTensor
- parameter_size()
- Returns:
A nested dictionary of depth at least 2 that contains each processor name as a key and its parameter_size() as a value.
- Return type:
Dict[str, Dict[str, Union[dict, Tuple[int, ...]]]]
- class GainStagingRegularization(processor, key='gain_reg')
Bases: Module
A regularization module that wraps an audio processor and calculates the energy difference between the input and output audio. It can be used to guide the processors to mimic gain staging, a practice that aims to keep the signal energy roughly the same throughout the processing chain.
For each pair of input \(u[n]\) and output signal \(y[n] = f(u[n]; p)\), where \(f\) and \(p\) denote the wrapped processor and its parameters, respectively, we calculate their loudness difference with an energy function \(g\) as follows,
\[ d = \left| g(y[n]) - g(u[n]) \right|. \]
The energy function \(g\) computes the log of the mean energy across the time and channel axes. If the signals are stereo, this is equivalent to calculating the log of the mid-channel energy.
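The energy function and the resulting regularization term can be sketched as follows; a minimal illustration of the formula, not the module’s actual implementation:

```python
import torch

def log_energy(x):
    """Log of the mean energy across the channel and time axes."""
    return torch.log(x.square().mean(dim=(-2, -1)))

def gain_reg(u, y):
    """Absolute log-energy difference d = |g(y) - g(u)| per batch item."""
    return (log_energy(y) - log_energy(u)).abs()

u = torch.randn(3, 2, 1000)  # a batch of 3 stereo signals
y = 2.0 * u                  # a gain of 2 quadruples the energy
d = gain_reg(u, y)           # |log 4| for every batch item
```

Because \(g\) is logarithmic, a fixed linear gain produces the same penalty regardless of the input’s absolute level, which is what makes the term suitable as a gain-staging regularizer.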
- Parameters:
  - processor (Module) – Any SISO processor with forward and parameter_size methods implemented properly.
  - key (str, optional) – A dictionary key that will be used to store the energy difference in the intermediate results (default: "gain_reg").
- forward(input_signals, **processor_kwargs)
Processes input audio with the processor and given parameters.
- Parameters:
  - input_signals (FloatTensor, \(B \times C \times L\)) – A batch of input audio signals that will be passed to the processor.
  - **processor_kwargs (optional) – Keyword arguments (i.e., mostly parameters) that will be passed to the processor.
- Returns:
A batch of output signals of shape \(B \times C \times L\) and a dictionary of intermediate/auxiliary results with the regularization loss added.
- Return type:
Tuple[FloatTensor, dict]
- parameter_size()
- Returns:
The wrapped processor’s parameter_size().
- Return type:
Dict[str, Tuple[int, ...]]