A visual SLAM-based lightweight multi-modal semantic framework for an intelligent substation robot

Shaohu Li; Jason Gu; Zhijun Li; Shaofeng Li; Bixiang Guo; Shangbing Gao; Feng Zhao; Yuwei Yang; Guoxin Li; Lanfang Dong

doi:10.1017/S0263574724000511

Published online by Cambridge University Press: 19 April 2024

Shaohu Li: Affiliation:
Institute of Advanced Technology, University of Science and Technology of China, Hefei, China
Jason Gu: Affiliation:
Department of Electrical and Computer Engineering, Dalhousie University, Halifax, Canada
Zhijun Li*: Affiliation:
School of Mechanical Engineering, Tongji University, Shanghai, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
Shaofeng Li: Affiliation:
State Grid Anhui Electric Power Company Suzhou Power Supply Company, Suzhou, China
Bixiang Guo: Affiliation:
State Grid Anhui Electric Power Company Fuyang Power Supply Company, Fuyang, China
Shangbing Gao: Affiliation:
State Grid Anhui Electric Power Company Fuyang Power Supply Company, Fuyang, China
Feng Zhao: Affiliation:
State Grid Anhui Electric Power Company Fuyang Power Supply Company, Fuyang, China
Yuwei Yang: Affiliation:
Department of Automation, University of Science and Technology of China, Hefei, China
Guoxin Li: Affiliation:
Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
Lanfang Dong: Affiliation:
School of Computer Science and Technology, University of Science and Technology of China, Hefei, China
* Corresponding author: Zhijun Li; Email: zjli@ieee.org

Abstract

Visual simultaneous localisation and mapping (vSLAM) has shown considerable promise in positioning and navigating across a variety of indoor and outdoor settings, significantly enhancing the mobility of robots employed in industrial and everyday services. Nonetheless, the prevalent reliance of vSLAM technology on the assumption of static environments has led to suboptimal performance in practical implementations, particularly in unstructured and dynamically noisy environments such as substations. Despite advancements in mitigating the influence of dynamic objects through the integration of geometric and semantic information, existing approaches have struggled to strike an equilibrium between performance and real-time responsiveness. This study introduces a lightweight, multi-modal semantic framework predicated on vSLAM, designed to enable intelligent robots to adeptly navigate the dynamic environments characteristic of substations. The framework notably enhances vSLAM performance by mitigating the impact of dynamic objects through a synergistic combination of object detection and instance segmentation techniques. Initially, an enhanced lightweight instance segmentation network is deployed to ensure both the real-time responsiveness and accuracy of the algorithm. Subsequently, the algorithm’s performance is further refined by amalgamating the outcomes of detection and segmentation processes. With a commitment to maximising performance, the framework also ensures the algorithm’s real-time capability. Assessments conducted on public datasets and through empirical experiments have demonstrated that the proposed method markedly improves both the accuracy and real-time performance of vSLAM in dynamic environments.
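
The idea of combining detection and segmentation results to suppress dynamic objects can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so the rule used here is an assumption made purely for illustration: a keypoint is discarded if it lies on a segmentation mask pixel, or if it lies inside a detection box for which the segmentation produced no mask at all. The function names, the data layout (`mask[y][x]`, inclusive boxes), and the toy values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]          # (x, y) pixel coordinates
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive
Mask = Sequence[Sequence[int]]   # mask[y][x] == 1 marks a dynamic pixel


def box_has_mask(box: Box, mask: Mask) -> bool:
    """Return True if any mask pixel falls inside the detection box."""
    x0, y0, x1, y1 = box
    return any(mask[y][x] for y in range(y0, y1 + 1) for x in range(x0, x1 + 1))


def in_any_box(point: Point, boxes: Sequence[Box]) -> bool:
    """Return True if the point lies inside at least one box."""
    x, y = point
    return any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in boxes)


def filter_static_keypoints(
    keypoints: Sequence[Point], mask: Mask, boxes: Sequence[Box]
) -> List[Point]:
    """Keep only keypoints assumed to belong to the static scene.

    Assumed fusion rule (illustrative, not the authors' method):
      * points on a mask pixel are dynamic;
      * points inside a box that contains no mask pixels are dynamic,
        i.e. the whole box is used as a fallback when segmentation missed.
    """
    unsegmented = [b for b in boxes if not box_has_mask(b, mask)]
    static = []
    for x, y in keypoints:
        if mask[y][x]:
            continue  # on a segmented dynamic object
        if in_any_box((x, y), unsegmented):
            continue  # inside a detected object the segmentation missed
        static.append((x, y))
    return static


# Toy 6x6 frame: one segmented object, one detected-only object.
mask = [[0] * 6 for _ in range(6)]
for yy in (1, 2):
    for xx in (1, 2):
        mask[yy][xx] = 1
boxes = [(0, 0, 3, 3), (4, 4, 5, 5)]
keypoints = [(1, 1), (0, 0), (4, 4), (5, 0), (3, 5)]
static_points = filter_static_keypoints(keypoints, mask, boxes)
print(static_points)
```

In this toy frame the point on the mask and the point inside the unsegmented box are dropped, while the point that sits inside the first box but off its mask is kept, which is the practical benefit of a pixel-level mask over a bounding box alone.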

Keywords

vSLAM; deep learning; multi-modal framework; substation inspection

Type: Research Article
Information: Robotica, First View, pp. 1-15

DOI: https://doi.org/10.1017/S0263574724000511
Copyright: © The Author(s), 2024. Published by Cambridge University Press
