Abstract: We present Relocate, a simple training-free baseline designed to perform the challenging task of visual query localization in long videos. To eliminate the need for task-specific training ...
AquaVLM: A Domain-Specific Vision–Language Model for Structured Understanding of Oceanarium Scenes
Abstract: Vision-Language Models (VLMs) have advanced cross-modal understanding and generation, yet their domain adaptability remains limited. To address the lack of high-quality captions for fish ...