Audio Samples

Results on


Sample #1
Ground-truth RIR | Wet (target) | Dry (source)
AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR)
AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved)
FVN (RIR) | Non-AR, discrete (RIR) | Non-AR, continuous (RIR)
FVN (convolved) | Non-AR, discrete (convolved) | Non-AR, continuous (convolved)


Sample #2
Ground-truth RIR | Wet (target) | Dry (source)
AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR)
AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved)
FVN (RIR) | Non-AR, discrete (RIR) | Non-AR, continuous (RIR)
FVN (convolved) | Non-AR, discrete (convolved) | Non-AR, continuous (convolved)


Sample #3
Ground-truth RIR | Wet (target) | Dry (source)
AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR)
AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved)
FVN (RIR) | Non-AR, discrete (RIR) | Non-AR, continuous (RIR)
FVN (convolved) | Non-AR, discrete (convolved) | Non-AR, continuous (convolved)


Sample #4
Ground-truth RIR | Wet (target) | Dry (source)
AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR)
AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved)
FVN (RIR) | Non-AR, discrete (RIR) | Non-AR, continuous (RIR)
FVN (convolved) | Non-AR, discrete (convolved) | Non-AR, continuous (convolved)


Sample #5
Ground-truth RIR | Wet (target) | Dry (source)
AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR)
AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved)
FVN (RIR) | Non-AR, discrete (RIR) | Non-AR, continuous (RIR)
FVN (convolved) | Non-AR, discrete (convolved) | Non-AR, continuous (convolved)


Sample #6
Ground-truth RIR | Wet (target) | Dry (source)
AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR)
AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved)
FVN (RIR) | Non-AR, discrete (RIR) | Non-AR, continuous (RIR)
FVN (convolved) | Non-AR, discrete (convolved) | Non-AR, continuous (convolved)


Sample #7
Ground-truth RIR | Wet (target) | Dry (source)
AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR)
AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved)
FVN (RIR) | Non-AR, discrete (RIR) | Non-AR, continuous (RIR)
FVN (convolved) | Non-AR, discrete (convolved) | Non-AR, continuous (convolved)


Sample #8
Ground-truth RIR | Wet (target) | Dry (source)
AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR)
AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved)
FVN (RIR) | Non-AR, discrete (RIR) | Non-AR, continuous (RIR)
FVN (convolved) | Non-AR, discrete (convolved) | Non-AR, continuous (convolved)


Sample #9
Ground-truth RIR | Wet (target) | Dry (source)
AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR)
AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved)
FVN (RIR) | Non-AR, discrete (RIR) | Non-AR, continuous (RIR)
FVN (convolved) | Non-AR, discrete (convolved) | Non-AR, continuous (convolved)


Sample #10
Ground-truth RIR | Wet (target) | Dry (source)
AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR)
AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved)
FVN (RIR) | Non-AR, discrete (RIR) | Non-AR, continuous (RIR)
FVN (convolved) | Non-AR, discrete (convolved) | Non-AR, continuous (convolved)