| Sample #1 | |||
|---|---|---|---|
| Ground-truth RIR | Reverberant speech | Dry speech | |
| AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | FVN (RIR) |
| AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved) | FVN (convolved) |

| Sample #2 | |||
|---|---|---|---|
| Ground-truth RIR | Reverberant speech | Dry speech | |
| AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | FVN (RIR) |
| AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved) | FVN (convolved) |

| Sample #3 | |||
|---|---|---|---|
| Ground-truth RIR | Reverberant speech | Dry speech | |
| AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | FVN (RIR) |
| AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved) | FVN (convolved) |

| Sample #4 | |||
|---|---|---|---|
| Ground-truth RIR | Reverberant speech | Dry speech | |
| AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | FVN (RIR) |
| AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved) | FVN (convolved) |

| Sample #5 | |||
|---|---|---|---|
| Ground-truth RIR | Reverberant speech | Dry speech | |
| AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | FVN (RIR) |
| AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved) | FVN (convolved) |

| Sample #6 | |||
|---|---|---|---|
| Ground-truth RIR | Reverberant speech | Dry speech | |
| AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | FVN (RIR) |
| AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved) | FVN (convolved) |

| Sample #7 | |||
|---|---|---|---|
| Ground-truth RIR | Reverberant speech | Dry speech | |
| AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | FVN (RIR) |
| AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved) | FVN (convolved) |

| Sample #8 | |||
|---|---|---|---|
| Ground-truth RIR | Reverberant speech | Dry speech | |
| AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | FVN (RIR) |
| AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved) | FVN (convolved) |

| Sample #9 | |||
|---|---|---|---|
| Ground-truth RIR | Reverberant speech | Dry speech | |
| AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | FVN (RIR) |
| AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved) | FVN (convolved) |

| Sample #10 | |||
|---|---|---|---|
| Ground-truth RIR | Reverberant speech | Dry speech | |
| AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | FVN (RIR) |
| AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved) | FVN (convolved) |