References

[BSEP23]

Anders R Bargum, Stefania Serafin, Cumhur Erkut, and Julian D Parker. Differentiable allpass filters for phase response estimation and automatic signal alignment. arXiv:2306.00860, 2023.

[BLC13]

Yoshua Bengio, Nicholas Leonard, and Aaron Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv:1308.3432, 2013.

[CKBB23]

Alistair Carson, Simon King, Cassia Valentini-Botinhao, and Stefan Bilbao. Differentiable grey-box modelling of phaser effects using frame-based spectral processing. In DAFx. 2023.

[CMS22]

Franco Caspe, Andrew McPherson, and Mark Sandler. DDX7: differentiable FM synthesis of musical instrument sounds. In ISMIR. 2022.

[CWL+23]

Guangyu Chen, Yu Wu, Shujie Liu, Tao Liu, Xiaoyong Du, and Furu Wei. WavMark: watermarking for audio generation. arXiv:2308.12770, 2023.

[CZS23]

Hongrong Cheng, Miao Zhang, and Javen Qinfeng Shi. A survey on deep neural network pruning: taxonomy, comparison, analysis, and recommendations. arXiv:2308.06767, 2023.

[CKM+22]

Hyungjin Chung, Jeongsol Kim, Michael T McCann, Marc L Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems. arXiv:2209.14687, 2022.

[Col23]

J Colonel. Music production behaviour modelling. 2023.

[CCR22]

Joseph T Colonel, Marco Comunità, and Joshua Reiss. Reverse engineering memoryless distortion effects with differentiable waveshapers. In AES Convention 153. 2022.

[CR23]

Joseph T Colonel and Joshua Reiss. Reverse engineering a nonlinear mix of a multitrack recording. JAES, 71(9):586–595, 2023.

[CR21]

Joseph T Colonel and Joshua Reiss. Reverse engineering of a recording mix with differentiable digital signal processing. JASA, 150(1):608–619, 2021.

[DSPSValimaki23]

Gloria Dal Santo, Karolina Prawda, Sebastian Schlecht, and Vesa Välimäki. Differentiable feedback delay network for colorless reverberation. In International Conference on Digital Audio Effects, 244–251. Aalborg University, 2023.

[Eic20]

Felix Eichas. System Identification of Nonlinear Audio Circuits. PhD thesis, Universität der Bundeswehr Hamburg, 2020.

[EHGR20]

Jesse Engel, Lamtharn (Hanoi) Hantrakul, Chenjie Gu, and Adam Roberts. DDSP: differentiable digital signal processing. In ICLR. 2020.

[FL19]

Matthias Fey and Jan Eric Lenssen. Fast graph representation learning with PyTorch Geometric. arXiv:1903.02428, 2019.

[FKNRe23]

Daniel Y Fu, Hermann Kumbong, Eric Nguyen, and Christopher Ré. FlashFFTConv: efficient convolutions for long sequences with tensor cores. ICLR, 2023.

[GMR12]

Dimitrios Giannoulis, Michael Massberg, and Joshua D Reiss. Digital dynamic range compressor design—a tutorial and analysis. JAES, 60(6):399–408, 2012.

[GM23]

Jinyue Guo and Brian McFee. Automatic recognition of cascaded guitar effects. In DAFx. 2023.

[HSS08]

Aric Hagberg, Pieter J Swart, and Daniel A Schult. Exploring network structure, dynamics, and function using NetworkX. Technical Report, Los Alamos National Laboratory (LANL), Los Alamos, NM (United States), 2008.

[HSF23]

Ben Hayes, Charalampos Saitis, and György Fazekas. Sinusoidal frequency estimation by gradient descent. In IEEE ICASSP, 1–5. 2023.

[HSF+23]

Ben Hayes, Jordie Shier, György Fazekas, Andrew McPherson, and Charalampos Saitis. A review of differentiable digital signal processing for music and speech synthesis. Frontiers in Signal Processing, article 1284100, 2023.

[HX23]

Yang He and Lingao Xiao. Structured pruning for deep convolutional neural networks: a survey. arXiv:2303.00566, 2023.

[KPE20]

Boris Kuznetsov, Julian D Parker, and Fabián Esqueda. Differentiable IIR filters for machine learning applications. In Proc. Int. Conf. Digital Audio Effects (eDAFx-20), 297–303. 2020.

[LMKGotz+24]

Kyung Yun Lee, Nils Meyer-Kahlen, Georg Götz, U Peter Svensson, Sebastian J Schlecht, and Vesa Välimäki. Fade-in reverberation in multi-room environments using the common-slope model. arXiv:2407.13242, 2024.

[LCL22]

Sungho Lee, Hyeong-Seok Choi, and Kyogu Lee. Differentiable artificial reverberation. IEEE/ACM TASLP, 30:2541–2556, 2022.

[LMRL+24a]

Sungho Lee, Marco A Martinez-Ramirez, Wei-Hsiang Liao, Stefan Uhlich, Giorgio Fabbro, Kyogu Lee, and Yuki Mitsufuji. Searching for music mixing graphs: a pruning approach. In DAFx. 2024.

[LMRL+24b]

Sungho Lee, Marco A Martinez-Ramirez, Wei-Hsiang Liao, Stefan Uhlich, Giorgio Fabbro, Kyogu Lee, and Yuki Mitsufuji. GRAFX: an open-source library for audio processing graphs in PyTorch. In DAFx. 2024.

[LPPL23]

Sungho Lee, Jaehyun Park, Seungryeol Paik, and Kyogu Lee. Blind estimation of audio processing graph. In IEEE ICASSP, 1–5. 2023.

[LV+17]

Juho Liski, Vesa Välimäki, and others. The quest for the best graphic equalizer. In Proc. Int. Conf. Digital Audio Effects (DAFx-17), Edinburgh, UK, 95–102. 2017.

[LSY19]

Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: differentiable architecture search. In ICLR. 2019.

[MC18]

Eric Martin and Chris Cundy. Parallelizing linear recurrent neural nets over sequence length. In ICLR. 2018.

[MRBR20]

Marco A Martinez-Ramirez, Emmanouil Benetos, and Joshua D Reiss. Deep learning for black-box modeling of audio effects. Applied Sciences, 10(2):638, 2020.

[MRWSB21]

Marco A Martinez-Ramirez, Oliver Wang, Paris Smaragdis, and Nicholas J Bryan. Differentiable signal processing with black-box audio effects. In IEEE ICASSP. 2021.

[MS23]

Naotake Masuda and Daisuke Saito. Improving semi-supervised differentiable synthesizer sound matching for practical applications. IEEE/ACM TASLP, 31:863–875, 2023.

[MK21]

Christopher Mitcheltree and Hideki Koike. SerumRNN: step by step audio VST effect programming. In Artificial Intelligence in Music, Sound, Art and Design, 218–234. 2021.

[Ner20]

Shahan Nercessian. Neural parametric equalizer matching using differentiable biquads. In DAFx, 265–272. 2020.

[NSW21]

Shahan Nercessian, Andy Sarroff, and Kurt James Werner. Lightweight and interpretable neural modeling of an audio distortion effect using hyperconditioned differentiable biquads. In IEEE ICASSP, 890–894. 2021.

[P+19]

Adam Paszke and others. PyTorch: an imperative style, high-performance deep learning library. NeurIPS, 2019.

[PP24]

Côme Peladeau and Geoffroy Peeters. Blind estimation of audio effects using an auto-encoder approach and differentiable digital signal processing. In IEEE ICASSP, 856–860. 2024.

[RGM70]

L. Rabiner, B. Gold, and C. McGonegal. An approach to the approximation problem for nonrecursive digital filters. IEEE Transactions on Audio and Electroacoustics, 18(2):83–106, 1970.

[RFDefossez+24]

Robin San Roman, Pierre Fernandez, Alexandre Défossez, Teddy Furon, Tuan Tran, and Hady Elsahar. Proactive detection of voice cloning with localized watermarking. arXiv:2401.17264, 2024.

[SM23]

Simon Schwär and Meinard Müller. Multi-scale spectral loss revisited. IEEE Signal Processing Letters, 30:1712–1716, 2023.

[Smi07a]

Julius O Smith. Mathematics of the discrete Fourier transform (DFT): with audio applications. Julius Smith, 2007.

[Smi07b]

Julius Orion Smith. Introduction to digital filters: with audio applications. Volume 2. Julius Smith, 2007.

[Smi11]

Julius Orion Smith. Spectral Audio Signal Processing. Volume 4. Julius Smith, 2011.

[SBR22]

Christian J Steinmetz, Nicholas J Bryan, and Joshua D Reiss. Style transfer of audio effects with differentiable signal processing. JAES, 70(9):708–721, 2022.

[SIC21]

Christian J Steinmetz, Vamsi Krishna Ithapu, and Paul Calamia. Filtered noise shaping for time domain room impulse response estimation from reverberant speech. In WASPAA. IEEE, 2021.

[SWR23]

Christian J Steinmetz, Thomas Walther, and Joshua D Reiss. High-fidelity noise reduction with differentiable signal processing. In AES Convention 155. 2023.

[SPPS21]

Christian J Steinmetz, Jordi Pons, Santiago Pascual, and Joan Serrà. Automatic multitrack mixing with a differentiable mixing console of neural audio effects. In IEEE ICASSP, 71–75. 2021.

[TPR24]

Bernardo Torres, Geoffroy Peeters, and Gaël Richard. Unsupervised harmonic parameter estimation using differentiable DSP and spectral optimal transport. In IEEE ICASSP, 1176–1180. 2024.

[U+24]

Noy Uzrad and others. DiffMoog: a differentiable modular synthesizer for sound matching. arXiv:2401.12570, 2024.

[VR19]

Vesa Välimäki and Jussi Rämö. Neurally controlled graphic equalizer. IEEE/ACM TASLP, 27(12):2140–2149, 2019.

[WM20]

Kurt James Werner and Russell McClellan. Moog ladder filter generalizations based on state variable filters. In Proceedings of the International Conference on Digital Audio Effects, 70–77. 2020.

[YXT+23]

Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, and Yike Guo. NAS-FM: neural architecture search for tunable and interpretable sound synthesis based on frequency modulation. arXiv:2305.12868, 2023.

[YMC+24]

Chin-Yun Yu, Christopher Mitcheltree, Alistair Carson, Stefan Bilbao, Joshua D Reiss, and György Fazekas. Differentiable all-pole filters for time-varying audio systems. arXiv:2404.07970, 2024.

[Zav20]

V. Zavalishin. The Art of VA Filter Design. V. Zavalishin, 2020.