资讯

Open-vocabulary semantic segmentation strives to distinguish pixels into different semantic groups from an open set of categories. Most existing methods explore utilizing pre-trained vision-language ...
The SP-AED method contains acoustic pre-training for the encoder, linguistic pre-training for the decoder, and an adaptive combination fine-tuning for the whole system. We first design a linguistic ...