Learning to Better Segment Objects from Unseen Classes with Unlabeled Videos

In this paper, we explore the use of unlabeled videos to improve the performance of an instance segmentation model on unseen classes. The ability to localize and segment objects from unseen classes would open the door to new applications, such as autonomous object learning in active vision. Given unlabeled videos containing unseen classes, our baseline instance segmentation model, pretrained on seen classes, generates masks for objects in the videos. We then use our proposed Bayesian method to remove false positives by exploiting the consistency among consecutive frames. The resulting masks can then be used to help the model better localize and recognize objects from unseen classes. Our method starts from a set of object proposals and relies on (non-realistic) analysis-by-synthesis to select the correct ones, performing an efficient optimization over all the frames simultaneously. Through extensive experiments, we show that our method generates a high-quality training set that significantly boosts the performance of segmenting objects from unseen classes. We therefore believe that our method could pave the way for open-world instance segmentation using abundant Internet videos.
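The abstract only summarizes the filtering step. To illustrate the underlying intuition (a real object tends to persist across consecutive frames, while a false positive often appears in only one), the sketch below keeps a mask proposal only if a sufficiently overlapping proposal exists in an adjacent frame. This is a simple IoU-based heuristic swapped in for illustration; it is not the paper's Bayesian method or its analysis-by-synthesis optimization. The function names, the set-of-pixels mask representation, and the 0.5 IoU threshold are all assumptions.

```python
from typing import List, Set, Tuple

# Assumption: a mask is represented as a set of (row, col) pixel coordinates.
Pixel = Tuple[int, int]
Mask = Set[Pixel]


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two pixel-set masks."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return len(a & b) / union


def filter_by_temporal_consistency(
    frames: List[List[Mask]], iou_thresh: float = 0.5, min_support: int = 1
) -> List[List[Mask]]:
    """Hypothetical heuristic, not the paper's method: keep a proposal only if
    at least `min_support` adjacent frames contain a proposal overlapping it
    with IoU >= `iou_thresh`."""
    kept: List[List[Mask]] = []
    for t, candidates in enumerate(frames):
        # Previous and next frames, when they exist.
        neighbours = [frames[s] for s in (t - 1, t + 1) if 0 <= s < len(frames)]
        frame_kept = []
        for mask in candidates:
            support = sum(
                1
                for other_frame in neighbours
                if any(iou(mask, other) >= iou_thresh for other in other_frame)
            )
            if support >= min_support:
                frame_kept.append(mask)
        kept.append(frame_kept)
    return kept


def box(r0: int, c0: int, h: int, w: int) -> Mask:
    """Helper that builds a rectangular mask for the toy example."""
    return {(r, c) for r in range(r0, r0 + h) for c in range(c0, c0 + w)}


# Toy video: one object drifting right by one pixel per frame, plus a
# spurious detection that appears only in the middle frame.
frames = [
    [box(0, 0, 4, 4)],
    [box(0, 1, 4, 4), box(10, 10, 2, 2)],
    [box(0, 2, 4, 4)],
]
result = filter_by_temporal_consistency(frames)
print([len(f) for f in result])  # the one-frame detection is dropped
```

Adjacent boxes in the toy example overlap with IoU 0.6, so the moving object survives in every frame, while the isolated detection has no support and is discarded. A heuristic like this only looks at pairwise overlaps; the paper instead describes a joint optimization over all frames.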

BibTeX

@InProceedings{Du_2021_ICCV,
    author    = {Du, Yuming and Xiao, Yang and Lepetit, Vincent},
    title     = {Learning To Better Segment Objects From Unseen Classes With Unlabeled Videos},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {3375-3384}
}