Audio Samples: Singing Voice Effect Estimation

Results on seen speakers

① Autoencoding: the graph decoder paired with a separate graph encoder.
② Unconditioned: the graph decoder conditioned on dummy all-zero latents.
③ Estimation (proposed: token, 2-stage): token-by-token decoding + 2-stage (categorical/continuous) decoding.
④ Node, 2-stage: node-by-node decoding + 2-stage decoding.
⑤ Token, 1-stage: token-by-token decoding + single-stage autoregressive decoding.
⑥ Oracle source: the proposed method ③ with a reference encoder conditioned on the dry source.
Dry source: the dry source (or the sum of the dry sources) without any processing.
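Conditions ③ and ⑤ differ in how graph tokens are ordered during decoding: a 2-stage scheme that separates categorical and continuous predictions versus a single autoregressive pass. The following is a generic illustration of that distinction, not the authors' model. It assumes a hypothetical graph that is a simple chain of processors with one continuous parameter each, and every name in it (`decode_two_stage`, `decode_one_stage`, the toy predictors and their values) is made up for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical simplification: a graph is a chain of processors, each with ONE
# continuous parameter. Real effect graphs are richer than this.
Processor = Tuple[str, float]
END = "<end>"


def decode_two_stage(
    next_type: Callable[[Sequence[str]], str],
    param_for: Callable[[Sequence[str], int], float],
    max_nodes: int = 8,
) -> List[Processor]:
    """Stage 1 decodes every categorical token; stage 2 fills in continuous values."""
    types: List[str] = []
    while len(types) < max_nodes:
        t = next_type(types)  # categorical prediction given the prefix so far
        if t == END:
            break
        types.append(t)
    # Stage 2 can look at the complete categorical structure for every value.
    return [(t, param_for(types, i)) for i, t in enumerate(types)]


def decode_one_stage(
    next_type: Callable[[Sequence[Processor]], str],
    param_for: Callable[[Sequence[Processor], str], float],
    max_nodes: int = 8,
) -> List[Processor]:
    """One autoregressive pass: each categorical token is followed by its value."""
    graph: List[Processor] = []
    while len(graph) < max_nodes:
        t = next_type(graph)
        if t == END:
            break
        # The value is predicted before any later processor type is known.
        graph.append((t, param_for(graph, t)))
    return graph


# Toy stand-ins for trained predictors (hypothetical values).
PLAN = ["eq", "compressor", "reverb"]
VALUES = {"eq": 3.0, "compressor": -12.0, "reverb": 0.4}


def toy_next_type(prefix: Sequence[object]) -> str:
    return PLAN[len(prefix)] if len(prefix) < len(PLAN) else END


def toy_param_two_stage(types: Sequence[str], i: int) -> float:
    return VALUES[types[i]]


def toy_param_one_stage(graph: Sequence[Processor], t: str) -> float:
    return VALUES[t]


if __name__ == "__main__":
    print(decode_two_stage(toy_next_type, toy_param_two_stage))
    print(decode_one_stage(toy_next_type, toy_param_one_stage))
```

With these toy predictors both functions return `[('eq', 3.0), ('compressor', -12.0), ('reverb', 0.4)]`. The ordering only matters with a learned model: in the two-stage variant, each continuous value can be conditioned on the full categorical structure, whereas the single-stage variant must commit to each value before later processor types are decoded.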


Sample #31
Ground-truth
full
① Autoencoding
prototype full
② Unconditioned
prototype full
Dry source
(bypass graph)
③ Estimation (proposed: token, 2-stage)
full

④ Node, 2-stage
prototype full
⑤ Token, 1-stage
prototype full
⑥ Oracle source
prototype full


Sample #32
Ground-truth
full
① Autoencoding
prototype full
② Unconditioned
prototype full
Dry source
(bypass graph)
③ Estimation (proposed: token, 2-stage)
full

④ Node, 2-stage
prototype full
⑤ Token, 1-stage
prototype full
⑥ Oracle source
prototype full


Sample #33
Ground-truth
full
① Autoencoding
prototype full
② Unconditioned
prototype full
Dry source
(bypass graph)
③ Estimation (proposed: token, 2-stage)
full

④ Node, 2-stage
prototype full
⑤ Token, 1-stage
prototype full
⑥ Oracle source
prototype full


Sample #34
Ground-truth
full
① Autoencoding
prototype full
② Unconditioned
prototype full
Dry source
(bypass graph)
③ Estimation (proposed: token, 2-stage)
full

④ Node, 2-stage
prototype full
⑤ Token, 1-stage
prototype full
⑥ Oracle source
prototype full


Sample #35
Ground-truth
full
① Autoencoding
prototype full
② Unconditioned
prototype full
Dry source
(bypass graph)
③ Estimation (proposed: token, 2-stage)
full

④ Node, 2-stage
prototype full
⑤ Token, 1-stage
prototype full
⑥ Oracle source
prototype full


Sample #36
Ground-truth
full
① Autoencoding
prototype full
② Unconditioned
prototype full
Dry source
(bypass graph)
③ Estimation (proposed: token, 2-stage)
full

④ Node, 2-stage
prototype full
⑤ Token, 1-stage
prototype full
⑥ Oracle source
prototype full


Sample #37
Ground-truth
full
① Autoencoding
prototype full
② Unconditioned
prototype full
Dry source
(bypass graph)
③ Estimation (proposed: token, 2-stage)
full

④ Node, 2-stage
prototype full
⑤ Token, 1-stage
prototype full
⑥ Oracle source
prototype full


Sample #38
Ground-truth
full
① Autoencoding
prototype full
② Unconditioned
prototype full
Dry source
(bypass graph)
③ Estimation (proposed: token, 2-stage)
full

④ Node, 2-stage
prototype full
⑤ Token, 1-stage
prototype full
⑥ Oracle source
prototype full


Sample #39
Ground-truth
full
① Autoencoding
prototype full
② Unconditioned
prototype full
Dry source
(bypass graph)
③ Estimation (proposed: token, 2-stage)
full

④ Node, 2-stage
prototype full
⑤ Token, 1-stage
prototype full
⑥ Oracle source
prototype full


Sample #40
Ground-truth
full
① Autoencoding
prototype full
② Unconditioned
prototype full
Dry source
(bypass graph)
③ Estimation (proposed: token, 2-stage)
full

④ Node, 2-stage
prototype full
⑤ Token, 1-stage
prototype full
⑥ Oracle source
prototype full