资讯

Target Speech Extraction (TSE) traditionally relies on explicit clues about the speaker’s identity like enrollment audio, face images, or videos, which may not always be available. In this paper, we ...