Sample #1 | |||
---|---|---|---|
Ground-truth RIR | Reverberant speech | Dry speech | |
AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | FVN (RIR) |
AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved) | FVN (convolved) |

Sample #2 | |||
---|---|---|---|
Ground-truth RIR | Reverberant speech | Dry speech | |
AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | FVN (RIR) |
AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved) | FVN (convolved) |

Sample #3 | |||
---|---|---|---|
Ground-truth RIR | Reverberant speech | Dry speech | |
AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | FVN (RIR) |
AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved) | FVN (convolved) |

Sample #4 | |||
---|---|---|---|
Ground-truth RIR | Reverberant speech | Dry speech | |
AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | FVN (RIR) |
AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved) | FVN (convolved) |

Sample #5 | |||
---|---|---|---|
Ground-truth RIR | Reverberant speech | Dry speech | |
AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | FVN (RIR) |
AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved) | FVN (convolved) |

Sample #6 | |||
---|---|---|---|
Ground-truth RIR | Reverberant speech | Dry speech | |
AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | FVN (RIR) |
AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved) | FVN (convolved) |

Sample #7 | |||
---|---|---|---|
Ground-truth RIR | Reverberant speech | Dry speech | |
AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | FVN (RIR) |
AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved) | FVN (convolved) |

Sample #8 | |||
---|---|---|---|
Ground-truth RIR | Reverberant speech | Dry speech | |
AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | FVN (RIR) |
AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved) | FVN (convolved) |

Sample #9 | |||
---|---|---|---|
Ground-truth RIR | Reverberant speech | Dry speech | |
AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | FVN (RIR) |
AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved) | FVN (convolved) |

Sample #10 | |||
---|---|---|---|
Ground-truth RIR | Reverberant speech | Dry speech | |
AudioLM-like (RIR) | RQ-Transformer-like (RIR) | VALL-E-like (RIR) | FVN (RIR) |
AudioLM-like (convolved) | RQ-Transformer-like (convolved) | VALL-E-like (convolved) | FVN (convolved) |