资讯
Answering semantically complicated questions according to an image is challenging in a visual question answering (VQA) task. Although the image can be well represented by deep learning, the question ...
Audio-visual event localization (AVEL) aims to identify both the categories and temporal boundaries of events that are both audible and visible in unconstrained videos. However, the inherent semantic ...
一些您可能无法访问的结果已被隐去。
显示无法访问的结果