
A visual SLAM-based lightweight multi-modal semantic framework for an intelligent substation robot

Published online by Cambridge University Press: 19 April 2024

Shaohu Li
Affiliation:
Institute of Advanced Technology, University of Science and Technology of China, Hefei, China
Jason Gu
Affiliation:
Department of Electrical and Computer Engineering, Dalhousie University, Halifax, Canada
Zhijun Li*
Affiliation:
School of Mechanical Engineering, Tongji University, Shanghai, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
Shaofeng Li
Affiliation:
State Grid Anhui Electric Power Company Suzhou Power Supply Company, Suzhou, China
Bixiang Guo
Affiliation:
State Grid Anhui Electric Power Company Fuyang Power Supply Company, Fuyang, China
Shangbing Gao
Affiliation:
State Grid Anhui Electric Power Company Fuyang Power Supply Company, Fuyang, China
Feng Zhao
Affiliation:
State Grid Anhui Electric Power Company Fuyang Power Supply Company, Fuyang, China
Yuwei Yang
Affiliation:
Department of Automation, University of Science and Technology of China, Hefei, China
Guoxin Li
Affiliation:
Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
Lanfang Dong
Affiliation:
School of Computer Science and Technology, University of Science and Technology of China, Hefei, China
*Corresponding author: Zhijun Li; Email: zjli@ieee.org

Abstract

Visual simultaneous localisation and mapping (vSLAM) has shown considerable promise for positioning and navigation in a variety of indoor and outdoor settings, significantly enhancing the mobility of robots used in industrial and everyday services. However, because vSLAM generally assumes a static environment, it performs poorly in many practical deployments, particularly in unstructured environments with dynamic noise such as substations. Although integrating geometric and semantic information has helped mitigate the influence of dynamic objects, existing approaches struggle to balance accuracy against real-time responsiveness. This study introduces a lightweight, multi-modal semantic framework built on vSLAM that enables intelligent robots to navigate the dynamic environments typical of substations. The framework improves vSLAM performance by suppressing the impact of dynamic objects through a combination of object detection and instance segmentation. First, an improved lightweight instance segmentation network is deployed to keep the algorithm both accurate and real-time. Second, the detection and segmentation results are fused to further refine performance. In this way, the framework seeks to maximise accuracy while preserving real-time capability. Evaluations on public datasets and empirical experiments demonstrate that the proposed method markedly improves both the accuracy and the real-time performance of vSLAM in dynamic environments.
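The abstract does not specify how the detection and segmentation outputs are combined. As a rough illustration of the general idea of rejecting feature points on dynamic objects before tracking, the Python sketch below fuses a detector box with an instance mask: it trusts the pixel-accurate mask only when the mask's bounding box agrees with the detector box, and otherwise falls back to the box. The class list `DYNAMIC_CLASSES`, the `Detection` type, the IoU-agreement rule and the `iou_threshold` default are hypothetical assumptions for illustration, not the authors' method. Masks are plain nested lists so the sketch needs only the standard library.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# (x_min, y_min, x_max, y_max) with inclusive pixel bounds.
Box = Tuple[int, int, int, int]

# Hypothetical set of classes treated as potentially moving; an assumption,
# not the class list used in the paper.
DYNAMIC_CLASSES = {"person"}


@dataclass
class Detection:
    """One detector output, optionally paired with an instance mask."""
    label: str
    box: Box
    # Binary mask over the whole image, indexed mask[y][x]; None if the
    # segmentation branch produced nothing for this detection.
    mask: Optional[List[List[int]]] = None


def box_area(b: Box) -> int:
    return (b[2] - b[0] + 1) * (b[3] - b[1] + 1)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two inclusive pixel boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = ix * iy
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def mask_bbox(mask: List[List[int]]) -> Optional[Box]:
    """Tight bounding box of the non-zero mask pixels, or None if the mask is empty."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        return None
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def on_dynamic_object(x: int, y: int, det: Detection, iou_threshold: float) -> bool:
    """Hypothetical fusion rule: use the mask only if it agrees with the box."""
    if det.label not in DYNAMIC_CLASSES:
        return False  # static classes never cause a keypoint to be rejected
    mb = mask_bbox(det.mask) if det.mask is not None else None
    if mb is not None and box_iou(mb, det.box) >= iou_threshold:
        # Pixel-accurate test against the instance mask.
        return 0 <= y < len(det.mask) and 0 <= x < len(det.mask[y]) and bool(det.mask[y][x])
    # Fallback: coarser but conservative test against the detection box.
    return det.box[0] <= x <= det.box[2] and det.box[1] <= y <= det.box[3]


def filter_keypoints(
    keypoints: Sequence[Tuple[int, int]],
    detections: Sequence[Detection],
    iou_threshold: float = 0.5,
) -> List[Tuple[int, int]]:
    """Keep only keypoints that do not lie on any dynamic object."""
    return [
        (x, y)
        for x, y in keypoints
        if not any(on_dynamic_object(x, y, d, iou_threshold) for d in detections)
    ]
```

Falling back to the box when mask and box disagree is a conservative choice: it may discard some static features next to a moving object, but it avoids keeping features on that object when segmentation fails.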

Type: Research Article
Copyright: © The Author(s), 2024. Published by Cambridge University Press