| Sample #1 | |||
|---|---|---|---|
| Ground-truth RIR | Wet (target) | Dry (source) | |
| AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | |
| AudioLM-like (convolved) | RQ-Transf. (convolved) | VALL-E-like (convolved) | |
| FVN (RIR) | Non-AR, discrete (RIR) | Non-AR, continuous (RIR) | |
| FVN (convolved) | Non-AR, discrete (convolved) | Non-AR, cont. (convolved) | |

| Sample #2 | |||
|---|---|---|---|
| Ground-truth RIR | Wet (target) | Dry (source) | |
| AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | |
| AudioLM-like (convolved) | RQ-Transf. (convolved) | VALL-E-like (convolved) | |
| FVN (RIR) | Non-AR, discrete (RIR) | Non-AR, continuous (RIR) | |
| FVN (convolved) | Non-AR, discrete (convolved) | Non-AR, cont. (convolved) | |

| Sample #3 | |||
|---|---|---|---|
| Ground-truth RIR | Wet (target) | Dry (source) | |
| AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | |
| AudioLM-like (convolved) | RQ-Transf. (convolved) | VALL-E-like (convolved) | |
| FVN (RIR) | Non-AR, discrete (RIR) | Non-AR, continuous (RIR) | |
| FVN (convolved) | Non-AR, discrete (convolved) | Non-AR, cont. (convolved) | |

| Sample #4 | |||
|---|---|---|---|
| Ground-truth RIR | Wet (target) | Dry (source) | |
| AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | |
| AudioLM-like (convolved) | RQ-Transf. (convolved) | VALL-E-like (convolved) | |
| FVN (RIR) | Non-AR, discrete (RIR) | Non-AR, continuous (RIR) | |
| FVN (convolved) | Non-AR, discrete (convolved) | Non-AR, cont. (convolved) | |

| Sample #5 | |||
|---|---|---|---|
| Ground-truth RIR | Wet (target) | Dry (source) | |
| AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | |
| AudioLM-like (convolved) | RQ-Transf. (convolved) | VALL-E-like (convolved) | |
| FVN (RIR) | Non-AR, discrete (RIR) | Non-AR, continuous (RIR) | |
| FVN (convolved) | Non-AR, discrete (convolved) | Non-AR, cont. (convolved) | |

| Sample #6 | |||
|---|---|---|---|
| Ground-truth RIR | Wet (target) | Dry (source) | |
| AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | |
| AudioLM-like (convolved) | RQ-Transf. (convolved) | VALL-E-like (convolved) | |
| FVN (RIR) | Non-AR, discrete (RIR) | Non-AR, continuous (RIR) | |
| FVN (convolved) | Non-AR, discrete (convolved) | Non-AR, cont. (convolved) | |

| Sample #7 | |||
|---|---|---|---|
| Ground-truth RIR | Wet (target) | Dry (source) | |
| AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | |
| AudioLM-like (convolved) | RQ-Transf. (convolved) | VALL-E-like (convolved) | |
| FVN (RIR) | Non-AR, discrete (RIR) | Non-AR, continuous (RIR) | |
| FVN (convolved) | Non-AR, discrete (convolved) | Non-AR, cont. (convolved) | |

| Sample #8 | |||
|---|---|---|---|
| Ground-truth RIR | Wet (target) | Dry (source) | |
| AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | |
| AudioLM-like (convolved) | RQ-Transf. (convolved) | VALL-E-like (convolved) | |
| FVN (RIR) | Non-AR, discrete (RIR) | Non-AR, continuous (RIR) | |
| FVN (convolved) | Non-AR, discrete (convolved) | Non-AR, cont. (convolved) | |

| Sample #9 | |||
|---|---|---|---|
| Ground-truth RIR | Wet (target) | Dry (source) | |
| AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | |
| AudioLM-like (convolved) | RQ-Transf. (convolved) | VALL-E-like (convolved) | |
| FVN (RIR) | Non-AR, discrete (RIR) | Non-AR, continuous (RIR) | |
| FVN (convolved) | Non-AR, discrete (convolved) | Non-AR, cont. (convolved) | |

| Sample #10 | |||
|---|---|---|---|
| Ground-truth RIR | Wet (target) | Dry (source) | |
| AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | |
| AudioLM-like (convolved) | RQ-Transf. (convolved) | VALL-E-like (convolved) | |
| FVN (RIR) | Non-AR, discrete (RIR) | Non-AR, continuous (RIR) | |
| FVN (convolved) | Non-AR, discrete (convolved) | Non-AR, cont. (convolved) | |