
SPSVO: a self-supervised surgical perception stereo visual odometer for endoscopy

Published online by Cambridge University Press:  29 September 2023

Junjie Zhao
Affiliation:
College of Mechanical and Vehicle Engineering, Chongqing University, Chongqing, China
Yang Luo*
Affiliation:
College of Mechanical and Vehicle Engineering, Chongqing University, Chongqing, China
Qimin Li
Affiliation:
College of Mechanical and Vehicle Engineering, Chongqing University, Chongqing, China
Natalie Baddour
Affiliation:
Department of Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada
Md Sulayman Hossen
Affiliation:
College of Civil Engineering, Chongqing University, Chongqing, China
Corresponding author: Yang Luo; Email: yluo688@cqu.edu.cn

Abstract

Accurate tracking and reconstruction of surgical scenes are critical enabling technologies for autonomous robotic surgery. In endoscopic examinations, computer vision assists in many tasks, such as diagnosis and scene reconstruction. Estimating camera motion and reconstructing the scene from intra-abdominal images is challenging due to the irregular illumination and weak texture of endoscopic images. Current surgical 3D perception algorithms for camera and object pose estimation rely on geometric information (e.g., points, lines, and surfaces) extracted from optical images. Unfortunately, standard hand-crafted local features for pose estimation usually perform poorly in laparoscopic environments. In this paper, a novel self-supervised Surgical Perception Stereo Visual Odometer (SPSVO) framework is proposed to accurately estimate endoscope pose and better assist surgeons in locating and diagnosing lesions. The proposed SPSVO system combines a self-learning feature extraction method with a self-supervised matching procedure to overcome the adverse effects of irregular illumination in endoscopic images. The SPSVO pipeline comprises image pre-processing, feature extraction, stereo matching, feature tracking, keyframe selection, and pose graph optimization. The SPSVO can simultaneously associate the appearance of extracted feature points with textural information for fast and accurate feature tracking, and a nonlinear pose graph optimization method is adopted in the backend. The effectiveness of the proposed SPSVO framework is demonstrated on a public endoscopic dataset, with the root mean square error of trajectory tracking reaching 0.278 to 0.690 mm and a processing time of 71 ms per frame.
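To illustrate the nonlinear pose-graph optimization used in the backend described above, the following is a minimal, self-contained Gauss-Newton sketch on a one-dimensional toy trajectory. This is not the authors' implementation: the graph, the odometry and loop-closure measurements, and the unit information weights are invented purely for illustration.

```python
import numpy as np

def optimize_pose_graph(n_poses, edges, n_iters=5):
    """Gauss-Newton optimization of a 1-D pose graph.

    edges: list of (i, j, z), each encoding a relative measurement
           x[j] - x[i] = z (odometry or loop closure), unit weight.
    Pose 0 is anchored at zero with a strong prior to fix the gauge.
    """
    x = np.zeros(n_poses)
    for _ in range(n_iters):
        H = np.zeros((n_poses, n_poses))  # approximate Hessian J^T J
        b = np.zeros(n_poses)             # gradient J^T r
        # Gauge prior: residual (x[0] - 0) with a large weight.
        H[0, 0] += 1e6
        b[0] += 1e6 * x[0]
        for i, j, z in edges:
            r = (x[j] - x[i]) - z  # residual; Jacobian is -1 at i, +1 at j
            H[i, i] += 1.0; H[j, j] += 1.0
            H[i, j] -= 1.0; H[j, i] -= 1.0
            b[i] -= r
            b[j] += r
        x += np.linalg.solve(H, -b)  # Gauss-Newton step
    return x

# Three odometry steps of 1.0 each, plus a loop closure saying the
# total displacement from pose 0 to pose 3 is only 2.7: the optimizer
# distributes the 0.3 discrepancy evenly across the chain.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 3, 2.7)]
poses = optimize_pose_graph(4, edges)
print(poses)  # approximately [0.0, 0.925, 1.85, 2.775]
```

In the 1-D case the problem is linear, so a single iteration already reaches the least-squares optimum; for real SE(3) poses the residuals are nonlinear in the state and the same relinearize-and-solve loop is iterated, which is the role the nonlinear backend plays in such a system.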

Type
Research Article
Copyright
© Chongqing University, 2023. Published by Cambridge University Press

