VirTex is a pretraining approach which uses semantically dense captions to learn visual representations. We train CNN + Transformers from scratch on COCO Captions, and transfer the CNN to downstream ...
Abstract: Convolutional neural networks (CNNs) have been successfully applied to the single target tracking task in recent years. Generally, training a deep CNN model requires numerous labeled ...
25 years ago, Jianbo Shi introduced Normalized Cuts (spectral clustering), a graph-theoretic approach to perceptual grouping that became a staple in unsupervised image segmentation. While the original ...
Abstract: Compositional Zero-Shot Learning (CZSL) aims to learn visual concepts (i.e., attributes and objects) from seen compositions and combine them to predict unseen compositions. Existing visual ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results
Feedback