University of Electronic Science and Technology of China
Institute of Software, Chinese Academy of Sciences
Wuhan University
Tongji University
Mirror detection (MD) aims to overcome interference caused by reflections and locate mirror regions. Existing methods focus on designing components that explicitly establish associations between physical entities and their corresponding imagings, or on using rotation to construct symmetric consistency. We observe that: a) the correspondence between entities and imagings can be incomplete and incorrect; b) other physical materials (e.g., glass) exhibit characteristics partially similar to mirrors, causing confusion when they co-occur; c) complex interfering factors (e.g., occlusion) and reflection mechanisms may expand the vector space several times over. To address these issues in a unified manner, we formulate the scene-aware visual reasoning network (SVRNet) based on visual prompts. Specifically, we construct the prototype-guided prompt chain reasoning (PPCR), which generates mixed chain-of-thought reasoning based on maximal difference heterogeneous prototypes to build comprehensive spatial location and semantic perception. Because noise may accumulate gradually through the chain and crucial clues may disappear, we design the prompt evolution (PE) to filter out noise and enhance the coupling between prompts. We further develop the mixture of prompt injection expert (MPIE) to dynamically select the optimal injection strategy in the low-rank space based on the specific scene. Because reflection interference and the random parameter space introduce potential ambiguity, we formulate the three-way evidence-aware (TEA) loss to quantify uncertainty, thereby providing reliable predictions. To leverage historical knowledge and further disentangle representations, we propose the frequency prototype contrastive (FPC) loss for learning more generalizable features across images. Finally, we relabel 25,828 images and formulate the first point-supervised MD framework.
Extensive experiments conducted on four mirror benchmarks under three settings demonstrate that our method surpasses state-of-the-art approaches. Promising results are also achieved on six related benchmarks, showing its generality.
If you find this work helpful, please consider citing it. Thank you! :)
@article{zha2026svrnet,
title={Think Twice Before Determining: Towards Scene-aware Visual Reasoning for Mirror Detection},
author={Zha, Mingfeng and Wang, Guoqing and Pei, Yunqiang and Li, Tianyu and Tang, Xiongxin and Ma, Jiayi and Yang, Yang and Shen, Heng Tao},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
year={2026},
}