EmoPilot MOS Listening Set
Five target-emotion cases were selected for subjective mean opinion score (MOS) evaluation. Each case pairs a ground-truth (GT) target audio reference with the retrieved emotion prompts and the speech generated by each evaluated system.
Evaluation protocol
R-MOS: Score each retrieved prompt by its emotional similarity to the paired GT reference. Ignore lexical mismatch.
Score each generated speech sample by its emotional similarity to the paired GT reference and by how well it fits the transcript.
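Once listeners have rated every item, their scores are usually averaged into one MOS per system. The sketch below shows one way to do that. It is not the procedure used for this set. It assumes a 1 to 5 rating scale, which the listening set does not state, and it assumes a hypothetical record layout of (system, case_id, listener_id, score). It reports each mean together with the half-width of a 95% confidence interval computed with a normal approximation.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Aggregate listener ratings into {system: (mos, ci_half_width)}.

    `ratings` is an iterable of hypothetical (system, case_id, listener_id, score)
    tuples. The 1-5 score range is an assumption, not part of the listening set.
    """
    by_system = defaultdict(list)
    for system, _case_id, _listener_id, score in ratings:
        # Reject scores outside the assumed 1-5 scale.
        if not 1 <= score <= 5:
            raise ValueError(f"score out of assumed 1-5 range: {score}")
        by_system[system].append(score)

    result = {}
    for system, scores in by_system.items():
        m = mean(scores)
        # Normal-approximation CI half-width; zero when there is only one rating.
        hw = z * stdev(scores) / math.sqrt(len(scores)) if len(scores) > 1 else 0.0
        result[system] = (m, hw)
    return result


# Usage with made-up ratings for two hypothetical systems.
ratings = [
    ("system_a", 1, "L1", 4),
    ("system_a", 1, "L2", 5),
    ("system_b", 1, "L1", 3),
    ("system_b", 1, "L2", 3),
]
print(mos_with_ci(ratings))
```

With only a handful of listeners per case, a Student's t critical value gives a more accurate interval than the fixed z = 1.96 used here. The same function applies to R-MOS ratings of retrieved prompts and to ratings of generated speech, provided each set is aggregated separately.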