Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection

Abstract

To detect objects beyond a predefined set of categories, prior work on open-vocabulary object detection (OVD) typically relies on pretrained vision-language models (VLMs) for base-to-novel category generalization. However, to mitigate the misalignment between upstream image-text pretraining and downstream region-level perception, these methods require additional supervision, e.g., image-text pairs or pseudo annotations generated via self-training strategies. In this work, we propose CCKT-Det, which is trained without any extra supervision. The proposed framework constructs a cyclic and dynamic knowledge transfer from language queries and visual region features extracted from VLMs, which forces the detector to closely align with the visual-semantic space of VLMs. Specifically, we 1) prefilter and inject semantic priors to guide the learning of queries, and 2) introduce a regional contrastive loss to improve the awareness of queries on novel objects. CCKT-Det consistently improves as the scale of VLMs increases, while keeping the detector's computation overhead at a moderate level. Comprehensive experimental results demonstrate that our method achieves performance gains of +2.9% and +10.2% AP50 over previous state-of-the-art methods on the challenging COCO benchmark, without and with a stronger teacher model, respectively. The code is provided at https://github.com/ZCHUHan/CCKT-Det.
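
As a rough illustration of what a contrastive loss between detector queries and VLM region features can look like, the sketch below implements a generic InfoNCE-style objective in plain Python. It is not taken from the CCKT-Det code base: the function names `l2_normalize` and `info_nce`, the temperature value, and the assumption that `queries[i]` is already matched to `regions[i]` are all hypothetical choices made for this example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(queries, regions, temperature=0.07):
    """Generic InfoNCE loss; assumes queries[i] is the positive for regions[i].

    Both arguments are lists of equal-length float vectors. The temperature
    is an illustrative default, not a value reported by the paper.
    """
    q = [l2_normalize(v) for v in queries]
    r = [l2_normalize(v) for v in regions]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of query i to every region, scaled by temperature.
        logits = [sum(a * b for a, b in zip(qi, rj)) / temperature for rj in r]
        # Numerically stable log-sum-exp over all candidate regions.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matched region as the target class.
        total += log_denom - logits[i]
    return total / len(q)


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(info_nce(queries, aligned) < info_nce(queries, swapped))
```

The loss is small when each query embedding lies close to its matched region feature and far from the others, which is the general sense in which a contrastive objective pulls a detector toward a VLM's embedding space.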

Author and article information

Publication date Created: 13 March 2025

Article

arXiv ID: 2503.11005

SO-VID: a59ffbc2-d71a-4d57-86f1-49453dab49e6

License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Journal reference: Proceedings of the 13th International Conference on Learning Representations (ICLR 2025), Paper ID: 4226
