
Navigation for multi-humanoid using MFO-aided reinforcement learning approach

Published online by Cambridge University Press:  30 September 2022

Abhishek Kumar Kashyap*
Affiliation:
Robotics Laboratory, Mechanical Engineering Department, National Institute of Technology, Rourkela, Odisha 769008, India Robotics Laboratory, Mechanical Engineering Department, MIT Art Design & Technology University, Pune, Maharashtra 412201, India
Dayal R. Parhi
Affiliation:
Robotics Laboratory, Mechanical Engineering Department, National Institute of Technology, Rourkela, Odisha 769008, India
Vikas Kumar
Affiliation:
Robotics Laboratory, Mechanical Engineering Department, National Institute of Technology, Rourkela, Odisha 769008, India
*
*Corresponding author. E-mail: Akkashyapmech@gmail.com

Abstract

The given article emphasizes the development and modeling of a hybrid navigational controller to optimize the path length and time taken. The proposed navigational controller is developed by hybridizing the metaheuristic moth–flame optimization (MFO) approach and the reinforcement learning (RL) approach. Input parameters like obstacle and target locations are fed to the MFO controller, which selects a proper navigational direction. Its output is forwarded to the RL controller, which further refines the turning angle around obstacles. The collaboration of the global MFO approach with the local RL approach helps to optimize the path traversed by the humanoid robot in an unknown environment. A major contribution of this article is the utilization of humanoid robots for navigation between various checkpoints. The humanoid robots are placed in a cluttered environment and assigned specific target positions to complete the assigned tasks. In the case of a multi-humanoid robot system, a Petri-Net controller is configured in the navigation system to prevent deadlock situations and ensure smooth completion of tasks without inter-collision among the humanoid robots. Simulations and real-time experiments are undertaken using different controllers involving single- and multi-humanoid robot systems. The robustness of the proposed controller is also validated in a dynamic environment. Comparisons are carried out with an established navigational controller in a similar environmental setup, which proves the proposed hybrid controller to be robust and efficient.

Type
Research Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press

1. Introduction

Humanoid robots are a prominent area of interest among current researchers due to their well-perceived ability to emulate human behavior and replicate task deportment. Various sensors and interaction devices implanted on humanoid robots enable smooth interaction between the robot and its environment, hence easing task completion. Due to humanoid robots’ consistent behavior in performing repetitive jobs, a significant chunk of production and manufacturing lines employ robots to complete monotonous work. The application of humanoid robots in various segments of industry considerably improves the work efficiency of the process and reduces the accident probability to a minimum level. The implementation of humanoid robots in perilous conditions extends the domain of their application and supersedes man’s applicability in these conditions. Given these advantages, the humanoid robot is employed in various sectors like automobile assembly, underwater repairing, offshore drilling, and basic household services. Since the humanoid robot is being widely used in various sectors, energy minimization during task performance is a major concern from an economic point of view. Various researchers have employed artificial intelligence (AI) approaches for the path-planning of mobile robots and humanoids, which are discussed below.

The authors of ref. [Reference Berquin and Zell1] have provided a novel forward-model-based technique for the analytical treatment of lidar. Existing mapping methods have a number of flaws that can be traced back to the absence of a clear forward concept. The goal of Wu et al. [Reference Wu, Du and Zhang2] has been to create a generalized wavefront method for mobile robot route mapping. Multi-objective point combinations, layered grid expenses, logarithmic expansion across barriers, and route improvement are all part of the research. Liang and Xu [Reference Liang and Xu3] have emphasized the development of a mutated simulated annealing (SA) approach applied to wheeled robot global navigation. They modified the conventional method since it suffers from a slow convergence rate. Zhu et al. [Reference Zhu, Wang, He and Sun4] have discussed the memetic approach (MA) of path planning, examined it on simulated maps, and correlated it with other counterpart approaches. They also reported that the results obtained from the MA have better efficiency than the other conventional approaches. Botzheim et al. [Reference Botzheim, Toda and Kubota5] have emphasized a bacterial MA to recognize a collision-free route for the wheeled robot by minimizing the path length and the number of divergences in the trodden path. Ant Colony Optimization (ACO) has also been used to determine the shortest path from the intra-class pheromone secreted by other ants. Liang et al. [Reference Liang, Li, Wu and Chen6] have emphasized the use of bacterial behavior, with the help of the bacterial foraging approach, to find an optimal path without collision with hindrances, thereby making a bio-inspired route-outlining approach for a wheeled robot.

The authors have created a navigation framework that uses an evolutionary threshold filtering technique to identify barriers in a sliding window, categorizes the identified barriers with a tree structure, heuristically forecasts future collisions, and uses a simplified Morphin method to select the best path [Reference Chen, Wu and He7]. Gao and Tian [Reference Gao and Tian8] have evaluated the refined SA mixed optimization approach together with an artificial neural network and implemented it for wheeled robot motion planning in an unknown environment. This approach improves the protective measures and enhances the convergence of the SA approach. It also minimizes the computing time of route outlining; thereby, it becomes easy and fast to obtain the optimal global solution. Jun and Qingbao [Reference Jun and Qingbao9] have presented a multi-objective mobile robot route planning approach based on an enhanced genetic algorithm. The method introduces a chaotic series and a heuristic technique based on ecological expertise to initialize the population in order to enhance the individuals’ ergodicity and practicability in the search area. Yue and Wang [Reference Yue and Wang10] have focused on applying SA with a compound shape method to the route outlining of the neural network of wheeled robots. Along with achieving the globally optimal result characteristic of the SA approach, an ideal declining gradient has been used, thus refining the convergence. Sharifi and Vinke [Reference Janabi-Sharifi and Vinke11] have focused on the use of an SA approach to prevent the entrapment of robots in a local minima position. The authors have applied it to local path planning in a stationary environment to increase the effectiveness of path selection and to ensure robustness. Alfaro and Garcia [Reference Martınez-Alfaro and Gomez-Garcıa12] have proposed a fuzzy logic and SA approach to design an automated route-outlining method for a wheeled robot. The proposed approach was used to regulate the velocity of the wheeled robot during navigation and was applied among fixed polygonal hindrances by applying 49 fuzzy rules to obtain a collision-free optimal path.

A behavior-based neural network and reactive command framework for mobile robot guidance is proposed by Pandey and Parhi [Reference Pandey and Parhi13]. Two distinct reactive behaviors, based on the location and angle of the obstacles, have been used as inputs. Ganganath and Cheng [Reference Ganganath and Cheng14] have presented an off-line path planner for wheeled robots based on the ACO approach (ACO2 Gauss). Chang et al. [Reference Chang, Shan, Jiang and Dai15] have presented an advanced dynamic window approach based on Q-learning. To improve the efficiency of global guidance, the fundamental assessment algorithms are updated and enhanced by introducing two additional evaluation processes. Wei and Zhao [Reference Wei and Zhao16] have developed a three-part technique that includes motion primitives, a Bayesian network, and a unique coupling neural network. Motion elements are utilized to decouple human arm motions. Arm motion categorization enhances the precision of human-like gestures.

Kusuma et al. [Reference Kusuma, Riyanto and Machbub17] have emphasized the usage of the A* search approach for the path-planning of a humanoid robot in various home assistance operations. The methodology aims at efficient path sketching and rerouting of the humanoid robot’s path in case a given checkpoint is unreachable. Sabe et al. [Reference Sabe, Fukuchi, Gutmann, Ohashi, Kawamoto and Yoshigahara18] have proposed stereo vision for the path planning and obstacle detection of a QRIO humanoid robot in home environments. The grid detection methodology is implemented for obstacle avoidance along the proposed path. Huang et al. [Reference Huang, Kim and Atkeson19] have proposed an A* search navigational strategy for efficient footstep planning of humanoid robots with a major emphasis on energy consumption in the navigation process. The given methodology is based on the optimization of the step length, step width, and turning angle of the humanoid robot. Lee et al. [Reference Lee, Heo, Park, Yang, Jang, Benz, Park, Kweon and Oh20] have proposed the concept of, and modeled, a service-providing humanoid robot for performing day-to-day tasks in an unknown environment with an in-built 3D object detection system. Simulations in real-time environments show that the proposed robot performs the given tasks at about 24% of the efficiency of human beings. Lagaza et al. [Reference Lagaza, Kashyap and Pandey21] have emphasized the use of the Spider Monkey algorithm for the optimization of the path traced between checkpoints in static and dynamic environments.

As evident from the study of the presented research papers, a significant amount of work has been done on path planning for mobile robots. Research work involving the navigation of humanoid robots in unknown environments is scarce. Path planning approaches involving humanoid robots constitute a meager share of research works, with almost no substantial work done on the navigation of robots between multiple targets. In the current scenario, the robots employed in various industries are multidimensional, requiring various skills to complete the task. This includes the design of a mechanical engine, repair works, and various other breakdown maintenance tasks. Due to major complications in the humanoid robots’ path planning, a hybrid navigational strategy combining the global moth–flame optimization approach with the accurate and local reinforcement learning (MFO-RL) approach has been proposed in this article. The MFO is a global path planning approach that cancels out the probability of local minima entrapment, whereas RL is a local path planning approach that optimizes the turning angle of the humanoid robot around various obstacles. The paper also focuses on a multi-humanoid system’s path planning in an unknown environment, which has not been actively pursued yet. In the case of multi-humanoid navigation, a situation of deadlock arises when a robot enters another robot’s proximity zone. To avoid inter-collision among the robots, a Petri-Net controller is configured alongside the hybrid controller for the path-planning of the multi-humanoid system. The Petri-Net controller, in this scenario, negotiates between the humanoid robots and cancels out any hindrance during the navigation. The various research works done on the MFO approach, RL approach, and Petri-Net controller are listed below.

Jalali et al. [Reference Jalali, Hedjam, Khosravi, Heidari, Mirjalili and Nahavandi22] have proposed an MFO-based multilayer perceptron network for trajectory planning of an autonomous robot in an unknown environment. The proposed MFO methodology, entrusted with controlling the network’s weight and bias parameters, demonstrates clear superiority when compared with various evolutionary and gradient-based approaches. Abdullah et al. [Reference Abdullah, Rashid, Ponnambalam and Ghazalli23] have developed an MFO-based approach for the optimization of energy usage at assembly stations. The idle energy in the assembly sequencing problems is optimized using an MFO-based controller, which performs much better than the genetic algorithm (GA), Particle Swarm Optimization (PSO), and ACO controllers in terms of robustness, feasibility, and computational time. Mehne and Mirjalili [Reference Mehne and Mirjalili24] have proposed the use of the MFO approach for feedback control of a general nonlinear problem. The methodology is used to convert a given infinite-dimensional problem to a finite one using a set of coefficients. The application of the proposed controller to certain established problems demonstrates its efficiency in achieving optimum results in this regard. Elaziz et al. [Reference Elaziz, Ewees, Ibrahim and Lu25] have explored the applications of MFO-based approaches in various benchmark works in conjunction with opposition-based learning (OBL) and differential evolution (DE) approaches for better initial insect population generation and exploration facilities. The experimental results evidence that the proposed approach performs better than the established ones.

Gao et al. [Reference Gao, Liu, Wu and Wang26] have illustrated the use of the Q-learning reinforcement approach for the path-planning of mobile robots in cluttered environments. The proposed methodology involves the generation of an obstacle-free path in the first step and the optimization of the generated path in the next step. The emphasis was to establish the proposed method as superior to the breadth-first search (BFS) and rapidly exploring random trees (RRT) methods. Fakoor et al. [Reference Fakoor, Kosari and Jafarzadeh27] have addressed the latter challenge by using a unique method wherein the speed is decided using a fuzzy Markov decision process (MDP).

Trinh et al. [Reference Trinh, Ekström and Cürüklü28] have emphasized the usage of dipole interaction systems for the path-planning of humanoid robots in cluttered environments. Furthermore, a Petri-Net framework with a dynamic window approach has been used for dynamic obstacle avoidance in unknown environment navigation. Parhi and Mohanta [Reference Parhi and Mohanta29] have emphasized the use of fuzzy logic controller combined with the Petri-Net framework, which helps in avoidance of dynamic obstacles during the navigation of humanoid robots in the given terrain. Kumar et al. [Reference Kumar, Muni and Parhi30] have proposed the usage of a hybrid RA-fuzzy logic controller for hassle-free travel of humanoid robots in unknown environments. The Petri-Net controller attached to the proposed framework helps in the prevention of inter-collision among the humanoid robots during task completion.

Implementation of a sole AI approach for the path-planning of a robot takes an enormous amount of time and renders the strategy uneconomical. Furthermore, a single technique often leads to the robot being captive in a local minima situation where the conditions created are unresolvable. The hybrid navigational strategy proposed in the article deals with the entrapment of the robot in a local minima condition and guides faster convergence toward the target with minimum deviations. The hybrid controller is implemented in single- and multi-humanoid robots on a common platform, which is still a topic for research. It is assessed against various standalone approaches based on path length. The hybrid controller leads to the successful completion of the task and, to the authors’ knowledge, has not been used anywhere yet. The MFO approach is a global search strategy that is based on the generation of fitness functions over various sample spaces in the unknown environment to sketch out an optimum path for the humanoid robot navigation. The RL approach is a more accurate local search strategy that helps in the determination of the optimum turning angle of the humanoid for collision-free navigation. RL optimizes the footstep planning to efficiently avoid obstacles of any shape. The algorithm also helps the robot to identify the prioritized target to decrease the overall length of the path. The hybridization of these two approaches is implemented in a single controller and configured alongside a Petri-Net controller for dynamic obstacle clearance. The proposed controller is also capable of solving the problem of avoidance of dynamic obstacles (multi-humanoid robots). The layout of the article is as follows. Section 2 emphasizes the MFO approach, whereas the RL approach is described in Section 3. The Petri-Net controller, which is used to solve the conflict problem, is presented in Section 4. The hybrid controller based on the MFO and RL approaches is presented in Section 5. Various simulations and experimental results using the proposed hybrid controller are shown in Section 6, and comparisons with previously established approaches are discussed in Section 7. Section 8 demonstrates the conclusion of the presented work and gives the scope for future improvements in the given context.

2. Moth–flame optimization

It is a bio-inspired swarm optimization method based on the moth’s navigational strategy during the night. The navigational mechanism used here is the transverse orientation, in which moths can travel in a straight line for long distances by maintaining a fixed angle with respect to the moon. However, in the presence of artificial light sources in the environment, moths in close proximity to these sources are strongly influenced and maintain a fixed angle with them, resulting in a spiral travel path. In this approach, the moths traverse the spiral path while exploiting the search space to reach the maximum fitness function value attainable by the flame position with respect to other moths.

In this optimization approach, the moths are randomly placed in the solution space, where each moth is assigned a solution in the sample space. Moths are assigned a specific fitness function and a flame that stores the best solution found by the moth. With every iteration, the moth traverses a spiral path around the flame, thus updating its fitness function and new position. The position matrix for ‘ $x$ ’ number of moths in ‘ $n$ ’-dimensional solution space is as follows:

(1) \begin{equation} M=\left[\begin{array}{l@{\quad}l@{\quad}l@{\quad}l} m_{1,1} & m_{1,2} & \ldots & m_{1,n}\\[3pt] m_{2,1} & \ldots & \ldots & \ldots \\[3pt] \ldots & \ldots & \ldots & \ldots \\[3pt] m_{x,1} & \ldots & \ldots & m_{x,n} \end{array}\right] \end{equation}

The fitness function for ‘ $x$ ’ number of moths is as follows:

(2) \begin{equation} MF=\left[\begin{array}{l} MF_{1}\\[3pt] MF_{2}\\ \ldots \\[3pt] MF_{x} \end{array}\right] \end{equation}

The position matrix for ‘ $x$ ’ number of flames in ‘ $n$ ’ dimensional solution space are as follows:

(3) \begin{equation} F=\left[\begin{array}{l@{\quad}l@{\quad}l} F_{1,1} & \ldots & F_{1,n}\\ \ldots & \ldots & \ldots \\ F_{x,1} & \ldots & F_{x,n} \end{array}\right] \end{equation}

The complete fitness function of the flames, based on the fitness function of each flame, is given as follows:

(4) \begin{equation} FF=\left[\begin{array}{l} FF_{1}\\[3pt] FF_{2}\\ \ldots \\[3pt] FF_{x} \end{array}\right] \end{equation}

where $F_{j}$ is the $j^{th}$ flame. The fitness function refers to the evaluation of the positions based on the suitability of the given path for travel, that is, the higher the fitness function value, the higher the chances of treading via that path. The functions used for the position updating of moths include $M$, which denotes the random positioning of moths, and $E$, which represents the termination of the search operation.

The random positioning of moths is done by the $M$ function which is given as:

(5) \begin{equation} M\!\left(i,j\right)=\left(U\!\left(j\right)-L\!\left(j\right)\right)*rand\!\left(\right)+L\!\left(j\right) \end{equation}

where $U$ and $L$ are arrays defining upper and lower bounds of variables, respectively.

The movement of moths in the search space is based on traverse orientation, and the path model used is the logarithmic spiral curve. The curve has the following properties:

(i) Spiral’s initial point is the starting point of the moth and the final point is the next position of the flame.

(ii) Fluctuations in the spiral curve’s range should be within the search space domain.

The movement of moths in the search space is given by the H function whose equation is given as:

(6) \begin{equation} H\!\left(M_{i},F_{j}\right)=D_{i}e^{pt}\cos \!\left(2\pi t\right)+F_{j} \end{equation}

where $H$ represents the movement of moths in search space, $M_{i}$ is the $i^{th}$ moth, $F_{j}$ is the $j^{th}$ flame, $p$ is the defining parameter for the spiral shape, $t$ is a random parameter in the range $[-1,+1]$, and $D_{i}$ is the distance between the $i^{th}$ moth and the $j^{th}$ flame and is given by:

(7) \begin{equation} D_{i}=\left| F_{j}-M_{i}\right| \end{equation}

Thus, with the completion of each iteration, the fitness values of the moths and flames get updated, which ensures that the exact location of the best solution is found. The terminal function $E$ is based on the termination criterion as proposed by the programmer, which is either a limit on the number of epochs or a certain minimum fitness function value. The flow diagram of the working of the MFO approach is presented in Fig. 1.

Figure 1. Flow chart of MFO approach in humanoid robot navigation.

Steps for carrying out the MFO approach:

  • Step 0: Set the number of moths and flames and the maximum number of iterations ‘ $\delta$ ’ to be carried out. Set the upper and lower bounds of the function and initialize the random generation of the moths.

  • Step 1: Calculate the fitness function for each moth and flame and tag the best position by comparing the fitness function of flames.

  • Step 2: Update the flame position by the fitness function criteria.

  • Step 3: Calculate the value of $D_{i}$ for the corresponding moth and update its position $M\!\left(i,j\right)$ using the relation:

    (8) \begin{equation} M_{i+1}=H\!\left(M_{i},F_{j}\right) \end{equation}
  • Step 4: If the number of iterations $\geq \delta$ , terminate the process; otherwise, reiterate from Step 1. A minimal sketch of this loop is given below.
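For concreteness, the following Python sketch mirrors the loop above (Eqs. (5)–(8)). It is a minimal sketch, assuming a cost function to be minimized (such as the multi-objective navigational cost defined in Section 2.1); the population size, iteration count, spiral constant, and the linearly shrinking flame count are illustrative choices, not the authors' settings.

```python
import numpy as np

def mfo(cost, lower, upper, n_moths=50, n_iter=200, p=1.0):
    """Minimal moth-flame optimization sketch following Eqs. (5)-(8)."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    # Eq. (5): random initialization of moth positions within [L, U]
    moths = (upper - lower) * np.random.rand(n_moths, dim) + lower
    flames, flame_cost = None, None
    for it in range(n_iter):
        c = np.array([cost(m) for m in moths])
        # Flames keep the best solutions found so far, sorted by cost
        pool = moths if flames is None else np.vstack([flames, moths])
        pool_cost = c if flame_cost is None else np.concatenate([flame_cost, c])
        order = np.argsort(pool_cost)
        flames, flame_cost = pool[order[:n_moths]], pool_cost[order[:n_moths]]
        # Standard MFO: the number of active flames shrinks over iterations
        n_flames = max(1, round(n_moths - it * (n_moths - 1) / n_iter))
        for i in range(n_moths):
            j = min(i, n_flames - 1)            # pair moth i with flame j
            D = np.abs(flames[j] - moths[i])    # Eq. (7): moth-flame distance
            t = np.random.uniform(-1, 1, dim)   # parameter in [-1, +1]
            # Eq. (6): logarithmic spiral flight around the flame
            moths[i] = D * np.exp(p * t) * np.cos(2 * np.pi * t) + flames[j]
            moths[i] = np.clip(moths[i], lower, upper)  # stay inside bounds
    return flames[0], flame_cost[0]             # best position and its cost
```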

2.1. Description of the optimal navigational fitness function

Assume a humanoid NAO going through an environment having a starting location $\left(X_{s},Y_{s}\right)$ , a goal location $\left(X_{t},Y_{t}\right)$ , and a barrier at $\left(X_{o},Y_{o}\right)$ . The environment is made up of a number of robots that behave as dynamic barriers to one another. The main purpose is to make the humanoid NAO smart enough to evade static and dynamic barriers (another humanoid NAO) and reach the goal in the least period of time. In addition, the humanoid NAO should create the shortest and best route possible while also ensuring its smoothness. In this part, the objective navigational mechanism is built taking these factors into account. As there is a possibility of inter-collision, the motion of another robot also disrupts their activities. As a result, in order to comprehend the situation and evade inter-collision, the robot’s movement is also evaluated.

The objective feature is based on the minimum length to the goal, the evasion of barriers, and the smoothness of the path. These aims should be met in order to complete the assignments with minimal computational effort.

2.1.1. Minimum route length

The design of the minimum route length is considered the primary goal of the humanoid NAO’s trajectory mapping. Between the starting position and the goal, the robot must take the shortest path, measured as a Euclidean length. At each cycle, the robot’s location ought to be accurately adjusted to obtain the shortest travel interval between it and the destination. The module is based on the humanoid NAO’s optimal positioning and goal. It is defined as [Reference Ajeil, Ibraheem, Sahib and Humaidi31]:

(9) \begin{equation} f_{1}\!\left(X,Y\right)=d\!\left[\left(X_{Hr}\!\left(i\right),\mathrm{Y}_{Hr}\!\left(i\right)\right),\left(X_{t},\mathrm{Y}_{t}\right)\right] \end{equation}

where $\left(X_{Hr}\!\left(i\right),\mathrm{Y}_{Hr}\!\left(i\right)\right)$ is the humanoid NAO’s location at the $i^{th}$ point. There could be $n$ locations between the start and the goal, with the coordinate of the humanoid NAO at the $n^{th}$ location being $\left(X_{Hr}\!\left(n\right),\mathrm{Y}_{Hr}\!\left(n\right)\right)$ . The overall path distance is equivalent to the total of the lengths between the humanoid NAO’s various sites. It is defined as [Reference Ajeil, Ibraheem, Sahib and Humaidi31]:

(10) \begin{equation} E_{spl}=\sum _{i=1}^{n}d\!\left[\left(X_{Hr}\!\left(i\right),\mathrm{Y}_{Hr}\!\left(i\right)\right),\left(X_{t},\mathrm{Y}_{t}\right)\right]=\sum _{i=1}^{n}spl \end{equation}
(11) \begin{equation} spl=\sqrt{\left(Y_{Hr}\!\left(i+1\right)-Y_{Hr}\!\left(i\right)\right)^{2}+\left(X_{Hr}\!\left(i+1\right)-X_{Hr}\!\left(i\right)\right)^{2}}+\sqrt{\left(Y_{Hr}\!\left(n\right)-Y_{t}\right)^{2}+\left(X_{Hr}\!\left(n\right)-X_{t}\right)^{2}} \end{equation}

2.1.2. Barrier evasion

For safe trajectory mapping, barrier evasion should be considered in addition to the minimal path length. The utility is determined by the location of $j^{th}$ barrier and the humanoid NAO at $i^{th}$ location. The following is a representation of the framework [Reference Ajeil, Ibraheem, Sahib and Humaidi31]:

(12) \begin{equation} f_{2}\!\left(X,Y\right)=d\!\left[\left(X_{o}\!\left(j\right),\mathrm{Y}_{o}\!\left(j\right)\right),\left(X_{Hr}\!\left(i\right),\mathrm{Y}_{Hr}\!\left(i\right)\right)\right] \end{equation}

The range between the humanoid NAO at the $i^{th}$ location and the barrier at the $j^{th}$ location must be kept at a minimum safe value. The humanoid NAO must travel through the closest secure location to the barrier. The entire length of the route between them is denoted by [Reference Ajeil, Ibraheem, Sahib and Humaidi31]:

(13) \begin{equation} E_{oa}=\sum _{j=1}^{m}\sum _{i=1}^{n}d\!\left[\left(X_{o}\!\left(j\right),\mathrm{Y}_{o}\!\left(j\right)\right),\left(X_{Hr}\!\left(i\right),\mathrm{Y}_{Hr}\!\left(i\right)\right)\right]=\sum _{j=1}^{m}\sum _{i=1}^{n}oa \end{equation}

The above equation represents the discrete situation while reaching the target. It is the summation of the distances between the robot at the $i^{th}$ position and the barrier at the $j^{th}$ position, where $m$ is the count of barriers. Likewise, it draws a line between the start point and the target by taking the barriers into consideration.

(14) \begin{equation} oa=\sqrt{\left(Y_{Hr}\!\left(i\right)-Y_{o}\!\left(j\right)\right)^{2}+\left(X_{Hr}\!\left(i\right)-X_{o}\!\left(j\right)\right)^{2}} \end{equation}

2.1.3. Smoothing of route

The purpose of this performance measure is to maintain a smooth path while evading barriers. It denotes the reduction of the angle fluctuation from the Euclidean route to the smallest possible value (from the humanoid NAO at the $i^{th}$ location to the goal). It is formulated as [Reference Ajeil, Ibraheem, Sahib and Humaidi31]:

(15) \begin{equation} f_{3}\!\left(X,Y\right)=\left| \alpha \!\left[H\!\left(i\right),H\!\left(i+1\right)\right],\alpha \!\left[H\!\left(i\right),\mathrm{T}\right]\right| \end{equation}

where $\alpha \!\left[H\!\left(i\right),H\!\left(i+1\right)\right]$ and $\alpha \!\left[H\!\left(i\right),\mathrm{T}\right]$ are the angles between the humanoid NAO’s route at $i^{th}$ and $\left(i+1\right)^{th}$ spots and the angle between the humanoid NAO’s route at $i^{th}$ location and the goal, respectively.

$\alpha \!\left[H\!\left(i\right),H\!\left(i+1\right)\right]=\tan ^{-1}\!\left[\dfrac{Y_{hr}\!\left(i+1\right)-Y_{hr}\!\left(i\right)}{X_{hr}\!\left(i+1\right)-X_{hr}\!\left(i\right)}\right]$ , and $\alpha \!\left[H\!\left(i\right),\mathrm{T}\right]=\tan ^{-1}\!\left[\dfrac{Y_{t}-Y_{hr}\!\left(i\right)}{X_{t}-X_{hr}\!\left(i\right)}\right]$

(16) \begin{equation} E_{ts}=\sum _{i=1}^{n}\left| \alpha \!\left[H\!\left(i\right),H\!\left(i+1\right)\right],\alpha \!\left[H\!\left(i\right),\mathrm{T}\right]\right| \end{equation}

The multi-objective activity is calculated as follows [Reference Ajeil, Ibraheem, Sahib and Humaidi31]:

(17) \begin{equation} f\!\left(mof\right)=w_{1}\sum _{i=1}^{n}E_{spl}\!\left(i\right)+w_{2}\sum _{i=1}^{n}\sum _{j=1}^{m}E_{oa}\!\left(i\right)\left(j\right)+w_{3}\sum _{i=1}^{n}E_{ts}\!\left(i\right) \end{equation}

where the weight functions $w_{1}, w_{2}$ , and $w_{3}$ represent the relative importance of specific objectives. They must adhere to the following restrictions:

(18) \begin{equation} w_{1}+w_{2}+w_{3}=1 \end{equation}

The multi-objective equations’ ultimate fitness is expressed as:

(19) \begin{equation} f_{\textit{fitness}}=\frac{1}{f\!\left(mof\right)+\varepsilon } \end{equation}

where $\varepsilon$ is a small value that is inserted to avoid division by zero. Because all factors must be reduced to produce the best solution, the multi-objective expression is a minimization task.
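For illustration, the following Python sketch evaluates Eqs. (10)–(19) literally for a candidate path given as a list of $(x, y)$ waypoints. The weights and $\varepsilon$ here are illustrative placeholder values that satisfy Eq. (18), not the values used in the experiments.

```python
import math

def path_fitness(path, target, obstacles, w1=0.5, w2=0.3, w3=0.2, eps=1e-6):
    """Multi-objective fitness of Eqs. (10)-(19) for a waypoint path."""
    def d(a, b):                      # Euclidean distance between two points
        return math.hypot(a[0] - b[0], a[1] - b[1])

    # E_spl, Eqs. (10)-(11): segment lengths plus the final leg to the target
    e_spl = sum(d(path[i], path[i + 1]) for i in range(len(path) - 1))
    e_spl += d(path[-1], target)

    # E_oa, Eqs. (13)-(14): summed robot-barrier distances over all pairs
    e_oa = sum(d(p, o) for p in path for o in obstacles)

    # E_ts, Eqs. (15)-(16): deviation between heading and bearing to target
    e_ts = 0.0
    for i in range(len(path) - 1):
        heading = math.atan2(path[i + 1][1] - path[i][1],
                             path[i + 1][0] - path[i][0])
        bearing = math.atan2(target[1] - path[i][1], target[0] - path[i][0])
        e_ts += abs(heading - bearing)

    mof = w1 * e_spl + w2 * e_oa + w3 * e_ts  # Eq. (17), weights obey Eq. (18)
    return 1.0 / (mof + eps)                  # Eq. (19): fitness to maximize
```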

To achieve our objective, it is necessary to create a genotype layout that is easy for the MFO agents to interpret while preserving the necessary routing path for the autonomously moving robot. As a result, as illustrated in Fig. 2, a simple cell layout has been employed for the global space. The parameters (length and width) of the global space are presumed to be available. The following hypotheses are used in this concept:

  1. There is a layout of the space where navigation occurs. The exploration space’s length and width will be determined by the route organizer, who will then implement a cell layout to the space. As a result, the space is split into rows and columns. The strategy of assuming that the quantity of rows equals the number of columns is used. A cell’s populated area represents the positions of recognized barriers.

  2. The source and target of the planned robot’s motion are also specified in predefined dimensions.

  3. The robot is able to travel on all viable cells, with its center moving along an imaginary path connecting the center of one cell and the center of an adjacent cell.

A route in that domain is defined by a genotype [Reference Hosseinzadeh and Izadkhah32] having N genes, provided a navigable scenario defined by N rows. Every gene value correlates with a column reference inside that row, while each gene locus correlates with a row reference. Consider, for example, the chromosome 2,3,3,6,8,8. This genotype denotes a route that begins in row 1, column 2 (1,2) and finishes in row 6, column 8 (6,8). This route’s interim locations are (2,3), (3,3), (4,6), and (5,8). The passage through this route in world space is shown in Fig. 3(a), with point (1,1) presumed to be in the upper left corner.

Figure 2. Representation of global space example with cell system.

Figure 3. Representation of global space navigation: (a) row wise and (b) column wise.

The interim phases, or vertices, of a route are represented by the directional data of a chromosome. Moving the robot in a continuous trajectory from one cell’s center to another cell’s center, on the other hand, might result in the robot moving in a diagonal route over several neighboring cells. If any neighboring cells on the robot’s path from one row to another contain a barrier, this will create issues. A better solution is to divide the diagonal route section into a horizontal and a vertical portion, allowing the robot to navigate around barriers. As a result, orientation information has been added to the chromosomal architecture to signify the robot’s initial turning as it moves to another vertex. The genotype architecture has been enhanced by introducing a guidance bit to every chromosome: 0 indicates column-wise configuration and 1 indicates row-wise configuration. On a column-wise chromosome, each locus correlates with a column reference, whereas every gene value correlates with a row reference. The movement for the column-wise chromosome 4,4,2,4,6,5,3,3 is shown in Fig. 3(b). This genotype shows a route that begins in row 4, column 1 (4,1) and continues to row 3, column 8 (3,8) via intermediary locations (4,2), (2,3), (4,4), (6,5), (5,6), and (3,7).

The process has one limitation: the genetic individual representation disallows moving backward simultaneously in both rows and columns. The proposed optimization technique’s parameters are the number of search agents (50), the number of features (400), the number of iterations (200), and the dimension (400).
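To make the decoding rule concrete, the short Python sketch below applies the locus-to-index rule literally (1-based cell indices), reproducing the two chromosomes discussed above as worked examples.

```python
def decode(genes, row_wise=True):
    """Decode a direction-bit genotype into a sequence of (row, column) cells.

    Row-wise: the k-th gene gives the column visited in row k.
    Column-wise: the k-th gene gives the row visited in column k.
    """
    if row_wise:
        return [(row, col) for row, col in enumerate(genes, start=1)]
    return [(gene, col) for col, gene in enumerate(genes, start=1)]

# Row-wise chromosome from the text: starts at (1,2) and ends at (6,8)
print(decode([2, 3, 3, 6, 8, 8], row_wise=True))
# [(1, 2), (2, 3), (3, 3), (4, 6), (5, 8), (6, 8)]

# Column-wise chromosome from the text: starts at (4,1) and ends at (3,8)
print(decode([4, 4, 2, 4, 6, 5, 3, 3], row_wise=False))
# [(4, 1), (4, 2), (2, 3), (4, 4), (6, 5), (5, 6), (3, 7), (3, 8)]
```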

3. Reinforcement learning

RL is a learning approach implemented in unknown environments for the assessment and improvement of the agent’s learning skills when an incomplete output data set is provided. The approach is based on converting a continuous space scenario into a discrete space scenario by implementing the sampling criteria used in the MDP and the Q-learning algorithm. This random sampling of the given environment continues until the final target is attained. The approach considers obtaining a specific Q-value from a specific action and moving on from the current state to a newer state. The obtained Q-value is compared to the maximum Q-value desired by carrying out value iterations. The policy network is the major deciding criterion, which determines the course of action taken by the agent in the current state. Based on the evaluation of the Q-value received from a certain iteration, the network either gives a reward or imposes a penalty on the agent’s action, which governs its future actions.

The penalty imposed on the actions reduces the probability of a similar action occurring in the future, thus, filtering out the action from the agent’s behavior pattern. Similarly, a rewarded action has a greater probability of occurrence in the future in similar states. The agent has two primary behaviors: exploration and exploitation. Exploration refers to assessing the immediate surroundings by the agent’s inquisitive behavior and drawing out a certain conclusion, while exploitation is the calculation of optimum action based on provided information to generate a suitable path for locomotion in the environment. Q-learning is based on the agent’s exploration behavior in the sample space and discovers optimum solutions that are not provided in the input data set. The schematic diagram for the RL approach has been presented in Fig. 4.

Figure 4. Schematic diagram of reinforcement learning approach [Reference Alfaverh, Denaï and Sun33].

3.1. State representation and reward function

The state comprises ultrasonic sensor data that includes range statistics from the working region, the robot’s forward speed, and the respective x and y ranges from the robot to the goal location:

(20) \begin{equation} z=z_{\textit{ultrasonic}}+z_{\textit{target}}+z_{\textit{velocity}} \end{equation}

where $z_{\textit{ultrasonic}}$ is the ultrasonic sensor information that measures range and recognizes obstacles to identify the interaction between barriers and the robot. Additionally, the ultrasonic information has been utilized to explicitly estimate an object’s position and velocity over three steps. The comparative distance between the robot and the goal point is $z_{\textit{target}}$ , whose unambiguous data allows the moving orientation, and whether the travel route is correct, to be efficiently determined. The robot’s speed data is given by $z_{\textit{velocity}}$ . Furthermore, $z_{\textit{velocity}}$ can be used to determine the robot’s maximum velocity and inertia, as well as the robot’s evasion technique based on velocity.

The robot’s objective in this investigation is to reach the goal point $\left(X_{t},Y_{t}\right)$ . Traveling with collision avoidance using RL and staying within the efficiency limit is required to reach the goal location. As a result, each requirement is taken into account independently in the reward. The combination of the following two reward functions is considered:

(21) \begin{equation} R_{t}=R_{G}+R_{CA} \end{equation}

The system earns a huge reward of 20 if the robot achieves the goal location. Furthermore, if the range to the goal grows shorter than previously while advancing to the goal location, the robot is rewarded for moving along the correct route:

(22) \begin{equation} R_{G}=\left\{\begin{array}{c@{\quad}c} 20 & if\,p_{c}\lt 0.5\\[4pt] p_{p}-p_{c} & \textit{otherwise} \end{array}\right. \end{equation}

where $p_{c}$ is the current distance between the robot and the goal, $p_{p}$ is the previous distance, and the distance $p$ is computed as:

(23) \begin{equation} p=\sqrt{\left(Y_{Hr}\!\left(i\right)-Y_{t}\!\left(j\right)\right)^{2}+\left(X_{Hr}\!\left(i\right)-X_{t}\!\left(j\right)\right)^{2}} \end{equation}

Walking away from the desired location results in a penalty equal to the distance traveled in one step, while traveling toward it results in a reward equal to the distance traveled in one step.

Whenever the robot collides with a barrier, the reward $R_{CA}$ applies a severe penalty of –20, which is a condition that the robot must be trained to escape:

(24) \begin{equation} R_{CA}=\left\{\begin{array}{c@{\quad}c} -20 & if\ {collisions}\ {occur}\\[4pt] 0 & {otherwise} \end{array}\right. \end{equation}

The random sampling carried out in the Q-learning approach is economical as it significantly reduces the number of grids to be computed for the generation of an optimum path. After the sampling, paths that interfere with the obstacles are removed, and the remaining connections are considered for the next stage. The prime purpose is to generate a smaller and smoother path based on the current location according to the given state of the function. The Q-learning approach is based on the existence of a Q-function $Q\!\left(p_{t},q_{t}\right)$ described as:

(25) \begin{equation} Q\!\left(p_{t},q_{t}\right)=\max \!\left(R_{t+1}\right) \end{equation}

where $R_{t}$ is the total future reward from any time point $t$ and is defined as:

(26) \begin{equation} R_{t}=r_{t}+wr_{t+1}+w^{2}r_{t+2}+\cdots +w^{n-t}r_{n} \end{equation}

Here, $p_{t}$ is the state at time step $t$ and $q_{t}$ is the action at time step $t$ . Substituting the value of $R_{t+1}$ in Eq. (25) yields:

(27) \begin{equation} Q\!\left(p_{t},q_{t}\right)=r_{t+1}+w\left\{\max Q\!\left(p_{t+1},q_{t+1}\right)\right\} \end{equation}

where $w$ is the discount factor ($0<w<1$).

The action value function is denoted as $Q\!\left(p_{t},q_{t}\right)$ , which is updated by the following rule:

(28) \begin{equation} Q\!\left(p_{t},q_{t}\right) \leftarrow Q\!\left(p_{t},q_{t}\right) + \beta^{*}\left[r_{t}+\delta^{*}\left\{\max Q\!\left(p_{t+1},q_{t+1}\right)\right\}-Q\!\left(p_{t},q_{t}\right)\right] \end{equation}

where $\beta$ is the learning rate ($0<\beta <1$), which decides the learning convergence rate, and $\delta$ is the discount rate ($0<\delta <1$), which determines the relative ratio between rewards at two consecutive states.

The steps for carrying out the Q-learning process in path planning are as follows:

  1. The employed humanoid robot evaluates the input data provided to it and decides upon exploration or exploitation of the environment.

  2. An action is determined by the humanoid robot based on the policy network by the usage of policy gradients.

  3. The action is carried out in the given environment.

  4. Based on the action undertaken, the agent receives a reward or penalty as a reinforcement, and a new state is formed.

  5. If $Q\!\left(p_{t},q_{t}\right)$ < $Q_{\max }\!\left(p_{t},q_{t}\right)$ , reiterate the process; otherwise, stop the iteration and feed the final data to the path planning approach.

With the increase in the number of iterations, if the Q-value also increases, then the path planning approach is in a learning state. As the value of Q reaches an optimum value, the iterations stop, and the stable state is reached. This data is fed to the path planning approach as the optimized path is obtained. The RL algorithm has been presented in Fig. 5.
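As a concrete illustration of the update rules above, the following Python sketch implements the reward of Eqs. (21)–(24) and the tabular value update of Eq. (28) with epsilon-greedy action selection. The learning rate, discount factor, and exploration constant are illustrative values, not the trained settings of this work.

```python
import random
from collections import defaultdict

BETA, DELTA, EPSILON = 0.1, 0.9, 0.2   # illustrative hyperparameters
Q = defaultdict(float)                 # Q[(state, action)], zero-initialized

def reward(p_prev, p_curr, collided):
    """Step reward from Eqs. (21)-(24); p_* are distances to the goal."""
    r_g = 20.0 if p_curr < 0.5 else (p_prev - p_curr)   # Eq. (22)
    r_ca = -20.0 if collided else 0.0                   # Eq. (24)
    return r_g + r_ca                                   # Eq. (21)

def choose_action(state, actions):
    """Epsilon-greedy trade-off between exploration and exploitation."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, r, next_state, actions):
    """Eq. (28): Q <- Q + beta * (r + delta * max_a' Q(s', a') - Q)."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += BETA * (r + DELTA * best_next - Q[(state, action)])
```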

Figure 5. Flow chart of reinforcement learning approach in humanoid robot navigation.

3.2. Collision detection problem

When the humanoid robot navigates in a global space to complete tasks, it utilizes its onboard sensors to evaluate whether the range between the robot and an obstacle is smaller than the threshold distance $T_{r}$ and, subsequently, whether there is a risk of collision. If a collision is possible, the humanoid robot must execute a collision prevention strategy to maintain safe navigation. Due to the limitations of movement of humanoid robots, they have a safety zone, and whenever an obstacle enters this safety zone, avoiding movements are required. As a result, an avoidance movement is necessary before any obstacle enters the robot’s safety zone.

The detection zone for the humanoid robot is part of a circular region with threshold radius $T_{safe}$ around its present location $\left(x_{r},y_{r}\right)$ , as depicted in Fig. 6. In the safe region, the humanoid robot will move safely if there is no obstacle; otherwise, it has a high risk of collision.

Figure 6. Representation of safety range of the robot and detection of static obstacle.

In Fig. 6, x p oy p is the relative coordinate system and XOY is the absolute coordinate system, with the location of the humanoid robot taken as the origin of the relative system. The safe range of the robot is denoted by $T_{safe}$ and the threshold range is represented as $T_{r}$ . Let $R$ be the range between a barrier and the humanoid robot at its present location $\left(x_{r},y_{r}\right)$ . Once $R\lt T_{r}$ , the sensor indicates that the humanoid robot has identified the barrier, and a collision prevention decision should be considered. If $R\lt T_{safe}$ , it signifies that the barrier has already reached the robot’s safe area and cannot be efficiently avoided.
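The range checks above amount to a three-way classification, sketched below in Python; the threshold values are placeholders rather than the robot's calibrated ranges.

```python
import math

def collision_status(robot_xy, obstacle_xy, t_r=1.0, t_safe=0.4):
    """Classify an obstacle by range R against T_r and T_safe (Section 3.2)."""
    r = math.hypot(obstacle_xy[0] - robot_xy[0], obstacle_xy[1] - robot_xy[1])
    if r < t_safe:
        return "unavoidable"  # barrier already inside the safety zone
    if r < t_r:
        return "avoid"        # detected within the threshold: plan avoidance
    return "safe"             # beyond the detection threshold
```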

4. Petri-Net controller for dynamic obstacle avoidance

Petri-Nets, consisting of places, transitions, and arcs, were described for the modeling of systems by Peterson in 1981 [Reference Peterson34]. A Petri-Net setup is often characterized by marks known as tokens, which form a configuration called a marking. The firing of an enabled transition in a Petri-Net leads to the consumption of the required input tokens and the creation of output tokens. Unless defined, the firing order in a Petri-Net is often nondeterministic, that is, in any random order. The transfer of tokens from one state to another occurs via a transition between them.

In the scenario of obstacle avoidance among humanoid robots, where they act as dynamic obstacles to each other, the Petri-Net is used to impose a supervised firing order. The property of concurrent firing of humanoid robots is utilized in this controller. The proposed Petri-Net network for obstacle avoidance is shown in Fig. 7. The configuration net consists of six places and eight possible transitions. The initial position of the robot (token) is shown using a filled black circle, and a transition is denoted by a black bar.

The steps for working of the Petri-Net controller in a humanoid robot navigation system are as follows:

  Step 1: Initially, the token is enabled at state 1, which is the state of the humanoid robot’s initialization. The humanoid robot is at its initial position waiting for the signal to start the task.

  Step 2: As the first transition occurs, the token is transferred to state 2, which is the state of navigation of the humanoid in the cluttered environment while avoiding static obstacles by following the path decided by the proposed path planning approach. The robot navigates in the unknown environment unless it comes into the proximity of another humanoid robot.

  Step 3: The proximity issue leads to the transition of the token to state 3, where the humanoid robot comes into a conflict scenario with another robot, hence creating dynamic obstacles in its pathway.

  Step 4: The token then undergoes another transition from state 3 to state 4, which involves the implementation of a negotiation strategy among the humanoid robots based on various deciding parameters and the assignment of priority to the preferred robot. The priority parameters often prioritize the robot which is at a lesser distance from its goal and has fewer obstacles left in its path to overcome.

  Step 5: The token thereafter transfers to state 5, where the situation is similar to state 2, that is, navigation in the unknown environment consisting of static obstacles and treading of the optimal path decided by the path planning approach. The robot which has been prioritized earlier undertakes this task and moves forward, whereas the lower-priority robot’s state is transferred to state 6.

  Step 6: State 6 involves the lower-priority robot waiting in its original place till the superior robot leaves its proximity zone. Once the higher-priority robot clears out of its proximity zone, the state of the lower-priority humanoid robot transitions to state 2.

  Step 7: In case a third robot enters the conflict zone in between, it is assigned state 6.

Figure 7. Petri-Net framework for multi-humanoid robot navigation system.

This proposed configuration net helps in the smooth completion of each humanoid robot’s task without causing any conflict between the robots.
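A minimal Python sketch of this six-place supervisor follows. The place names mirror Steps 1–7, and the priority rule (the robot closer to its goal moves first) is the one stated in Step 4; this is an illustration of the token flow, not the authors' implementation.

```python
class PetriNetSupervisor:
    """Token bookkeeping over the six places of Fig. 7."""

    PLACES = ("init", "navigate", "conflict", "negotiate", "resume", "wait")

    def __init__(self):
        self.marking = {}                  # robot id -> current place (token)

    def start(self, robot):
        self.marking[robot] = "navigate"   # Step 1 -> Step 2

    def on_proximity(self, r1, r2, dist_to_goal):
        """Steps 3-6: conflict, negotiation, and prioritization."""
        self.marking[r1] = self.marking[r2] = "conflict"       # Step 3
        winner = min((r1, r2), key=lambda r: dist_to_goal[r])  # Step 4
        loser = r2 if winner == r1 else r1
        self.marking[winner] = "resume"    # Step 5: prioritized robot moves on
        self.marking[loser] = "wait"       # Step 6: the other robot holds
        return winner

    def on_clearance(self, robot):
        self.marking[robot] = "navigate"   # waiting robot re-enters Step 2
```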

5. Hybridization of MFO and RL (hybrid MFO-RL) controller

The proposed path planning approach in the hybrid controller is a combination of the bio-inspired MFO approach and the RL approach. From the literature review section, it is evident that various classical approaches have earlier been used for the path-planning of humanoid robots, which are usually faster but less accurate while converging to final solutions. On the other hand, AI approaches are precise but are less efficient in providing optimum solutions at a steady rate. Furthermore, the usage of a single approach often results in the entrapment of the humanoid robot in local minima, which presses the need for another approach to remove this complication. To overcome this problem, we have combined a classical approach with a metaheuristic approach, strengthening the benefits of both for efficient path planning.

In the proposed hybrid controller, input parameters of the humanoid robot’s surroundings are fed to the sensor system implanted on the humanoid robot’s body. The sensor system forwards this information to the navigation system, which is controlled by the MFO controller. The MFO controller targets the obstacles and the final goal and designs a proper path for the navigation of the humanoid robot. In case of conflict with an obstacle, the RL controller is activated: the data fed to the RL controller helps further refine the optimal turning angles, as shown in Fig. 8. This process continues till the humanoid robot reaches the final goal. The hybrid controller works in a fashion similar to a feedback loop mechanism to obtain optimum results.

Figure 8. Block diagram of proposed hybrid MFO-RL controller.

The steps of the MFO-RL hybrid controller for humanoid robot navigation are listed below:

  1. The start and end positions of the humanoid robot are initialized.

  2. The MFO controller is activated, which determines the fitness function of the obstacles and the target and sketches out an optimal path for the robot.

  3. The robot navigates freely through the cluttered environment until it encounters an obstacle.

  4. The MFO controller is activated again. In case of any obstacle, the input parameters (left obstacle distance, right obstacle distance, and front obstacle distance) are fed to the robot, and an initial turning angle is determined to avoid getting stuck in local minima; the reward significance of every particular optimum solution, as well as the global best location in the trajectory planning process, is computed and recorded.

  5. The RL controller is activated, and the final turning angle is determined by refinement using the RL approach. It is utilized to determine the cumulative reward depending on all global best locations and the reward value of each particular optimum solution.

  6. The ideal particle location and searching vector of the MFO approach, as well as the fitness of every individual’s optimum response and the global optimum position, are modified.

    The mechanism of correction is as follows:

    \begin{equation*} \vec{G}\!\left(x\right)=G\!\left(x\right)-\pi \end{equation*}

    For the $j^{th}$ particle, $g\!\left(x_{j}^{j+1}\right)=g\!\left(x_{i}\right)-\pi \!\left(i\right)$

  7. If multiple humanoid robots (dynamic obstacles to one another) arrive at a single coordinate, a conflict situation arises. In this situation, the Petri-Net comes into play and prioritizes the robot which can reach its target quickest (as described in Fig. 7).

  8. The final turning angle is implemented to steer the humanoid out of the conflict zone.

  9. Once out of the conflict zone, the MFO is activated to sketch out the humanoid’s new optimal path.

  10. Steps 3–9 are repeated till the humanoid robot reaches the end position.

The optimum route is chosen using the optimal route assessment variable system, with the predefined weight limit variable as the most important criterion.

The RL method is used to train the MFO approach in this section. The adjustment parameter generated from the total reward maintains the fitness of the optimum particles and the global ideal location of the MFO; this combination is called the hybrid MFO-RL approach. Then, using the variables in the optimal route assessment mechanism, this method is applied to the trajectory tracking of the autonomous navigation robot, and an effective optimum route is established. The trajectory planning for the robot is carried out using the steps listed above; a compact sketch of the resulting control loop is given below.
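To show how the pieces fit together, the following Python sketch mirrors Steps 1–10 and the block diagram of Fig. 8. The robot, planner, rl, and petri_net objects and all of their methods are hypothetical stand-ins for the components sketched in the earlier sections; the sketch illustrates the intended control flow, not the authors' implementation.

```python
def navigate(robot, planner, rl, petri_net, target):
    """Hybrid MFO-RL control loop following Steps 1-10 (hypothetical API)."""
    path = planner.plan(robot.position, target)    # Steps 1-2: global MFO path
    while not robot.at(target):                    # Step 10: repeat to the end
        left, front, right = robot.sense()         # obstacle distances
        if min(left, front, right) < robot.threshold:
            theta0 = planner.initial_turn(left, front, right)   # Step 4 (MFO)
            theta = rl.refine_turn(theta0, left, front, right)  # Step 5 (RL)
            if robot.in_conflict():                # Step 7: another robot near
                robot.resolve_conflict(petri_net)  # prioritize via Fig. 7
            robot.turn(theta)                      # Step 8: steer out
            path = planner.plan(robot.position, target)  # Step 9: replan
        robot.step_along(path)                     # Step 3: follow the path
```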

The hybrid MFO-RL approach has been developed and tested against a few standalone approaches with respect to optimal travel length over the number of iterations. Five algorithms (genetic algorithm, particle swarm optimization, MFO, RL, and the hybrid MFO-RL approach) are tested with reference to travel length. As presented in Fig. 9, the graph shows that the hybrid MFO-RL approach provides the optimal travel length, and in the least number of iterations. Therefore, further research has proceeded with the hybrid MFO-RL approach.

Figure 9. Representation of convergence curve between standalone algorithms and hybrid MFO-RL approach.

6. Simulation and experimental results

6.1. Robotic platform

The robotic platform used in the given study is the NAO V4 developed by Aldebaran Robotics of France. It is a compact, automated, and configurable robot with 25 degrees of freedom, weighing about 5.2 kg. Various specifications of the robot are given in Table I.

Table I. Specification of robotic platform.

Figure 10. Simulation result of navigation of humanoid robot using hybrid MFO-RL approach for single robot.

6.2. Path-planning of a single-humanoid robot using hybrid MFO-RL approach

For the justification of the efficiency of the proposed controller, it is employed for the navigation of a single-humanoid robot in a cluttered environment. The controller is employed in both the simulated and the real NAO. WEBOT, a 3D simulator, is utilized first to test the developed controller; it is then verified on the real NAO. The comparison of outcomes from simulation and experiment is performed based on navigational parameters. The navigational parameters are recorded from the interface in WEBOT, and using measuring instruments like a measuring tape and a stopwatch in the experiments. In Fig. 10, the robot is placed at its initial Start position and proceeds toward the first checkpoint, Target T1. Henceforth, it traverses from Target T1 to Target T2. Similar environmental conditions are created for the real-time experiments, as shown in Fig. 11. The hybrid model works according to the steps directed earlier. The path length and time taken are directly measured through the WEBOT interface in simulation, while in the experimental demonstration, a measuring tape and a stopwatch are used to measure path length and travel time.

The tabulated data of path length and time taken by the humanoid robot during simulation and experiment show a minimal deviation of about 5% between them, as shown in Table II. Hence, it justifies the usage of the proposed hybrid approach. Figures 10 and 11 show that the robot reaches both targets conveniently in the simulation and experimental setups.

Table II. Comparison of travel length (Tl) and travel time (Tt) for simulations and experiments using hybrid MFO-RL approach for single robot.

Figure 11. Experimental result of navigation of humanoid robot using hybrid MFO-RL approach for single robot.

The deviation of 5% indicates that the simulation result is validated and demonstrates the robustness of the proposed controller. The deviation is due to some external disturbances in the experimental demonstration that are absent in simulation.

Figure 12. Simulation result of navigation of humanoid robot using hybrid MFO-RL approach for two robots.

Figure 13. Experimental result of navigation of humanoid robot using hybrid MFO-RL approach for two robots.

6.3. Path-planning of multi-humanoid robot using hybrid MFO-RL approach

For the acceptance of the proposed hybrid path planning controller, the approach is used as the prime navigational strategy in the path-planning of two humanoid robots, N1 and N2, in an unknown environment. The environment is modeled with various static obstacles, and similar conditions are created for both simulation and real-time experiments, as shown in Figs. 12 and 13. The robots are placed at their initial start positions S1 and S2 and are navigated through the environment to Target T1 and Target T2, respectively. It is observed that although the hybrid approach proves its worth in static obstacle clearance, it reaches a stalemate in the scenario of a dynamic obstacle.

Thus, the hybrid approach combined with the Petri-Net controller is encoded in the humanoid robot for dynamic obstacle clearance. On entering the proximity zone of each other, the robots activate the Petri-Net controller, which prioritizes the robots for further navigation based on a given set of rules. The data values for the path length traversed and the time taken to complete the two humanoid robots’ tasks are recorded in tabular form using a method similar to that discussed in the previous section. Further analysis confirms that the simulation and experimental values agree with each other, with a deviation of about 6%, which confirms the hybrid approach’s robustness and feasibility in multi-humanoid navigation. Tables III and IV show the travel length and time taken obtained through the simulations and experimental results, respectively.

Table III. Comparison of travel length for simulations and experiments using hybrid MFO-RL approach for two robots.

Table IV. Comparison of travel time for simulations and experiments using hybrid MFO-RL approach for two robots.

Figure 14. Simulation result of navigation of humanoid robot using hybrid MFO-RL approach for a dynamic environment.

Figure 15. Experimental result of navigation of humanoid robot using hybrid MFO-RL approach for a dynamic environment.

The proposed controller has been examined in terrains having a single robot and multiple robots. In the multiple-robot situation, one robot acts as a dynamic obstacle to the other; however, the programmers do not know in advance which robot will act as the navigating robot and which will act as a dynamic obstacle. Therefore, a different scenario has been considered, where one robot (a Khepera III with an obstacle mounted on it) acts as a dynamic obstacle. The proposed controller has been checked in both simulation and experimental scenarios, as shown in Figs. 14 and 15, respectively. The situation shows that the dynamic obstacle moves along its path, and the humanoid robot needs to change its direction to find a safe path to the target. The results displayed in tabular form (Table V) demonstrate that the simulation and experimental results are in good agreement with each other, with deviations under 5% for both travel length and travel time.

Table V. Comparison of travel length (Tl) and travel time (Tt) for simulations and experiments using hybrid MFO-RL approach for a dynamic environment.

Figure 16. Comparison of path adaptation of humanoid robot using (a) the proposed hybrid MFO-RL approach against (b) the existing approach [35].

7. Comparison

The proposed hybrid MFO-RL controller, in combination with the Petri-Net controller, is configured into the humanoid robot NAO. The results obtained from the simulations and real-time experiments show the efficacy of the proposed hybrid approach. To analyze the effectiveness of the path-planning approach, it is weighed against previous work in path-planning research. The paper used for comparison presents an online multi-objective evolutionary approach for the navigation of humanoid robots [35]. A similar environmental scenario has been created to check the efficiency of the proposed controller, and both the previously reported method and the proposed hybrid approach have been tested in it, as shown in Fig. 16. The results show the successful clearance of the obstacles and the attainment of the final goal by both strategies.

However, a detailed analysis of the recorded data shows that the proposed hybrid approach edges out the previously established method by 9.8% in terms of path length traversed between the initial and final positions. The shorter path also entails lower computational effort, making the proposed controller economical. These comparisons prove the method to be highly feasible and robust for humanoid robot path planning in an unknown environment. Table VI shows the deviation between the two path-planning strategies.
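The reported figure corresponds to the usual percentage reduction in path length relative to the baseline. A back-of-envelope check, using hypothetical lengths rather than the values of Table VI:

```python
# Percentage reduction in path length over the baseline controller.
# The two lengths are hypothetical stand-ins, not the Table VI data.
def path_improvement(baseline: float, proposed: float) -> float:
    return (baseline - proposed) / baseline * 100.0

print(f"{path_improvement(5.1, 4.6):.1f}% shorter")  # -> 9.8% shorter
```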

Table VI. Comparison of the previously developed approach [35] and the proposed hybrid MFO-RL approach based on path length.

Table VII. Comparison of the previously developed approach [36] and the proposed hybrid MFO-RL approach based on path length.

The proposed approach has also been evaluated against the improved ACO and MDP approach [36] with regard to path length. As in the previous comparison, utmost care has been taken to replicate the environment. The process starts by feeding the proposed controller to the robot, which then finds its path and reaches the target while avoiding the obstacles along the way. Figure 17(b) shows the path adopted by the robot using the improved ACO and MDP approach, and the path produced by the proposed approach in the same environment is presented in Fig. 17(a). The comparison is also summarized in tabular form in Table VII, which shows an overall improvement in path length of about 11% over that obtained by the improved ACO and MDP approach.

Figure 17. Comparison of path adaptation of humanoid robot using (a) the proposed hybrid MFO-RL approach against (b) the existing approach [36].

8. Conclusion

A path-planning approach using a hybrid MFO-RL controller is successfully applied and tested on single- and multi-humanoid robot navigation in an unknown environment. A total of 150 epochs were carried out to optimize the training weights used for finding the optimal turning angle of the humanoid robot, and the reward function is used to optimize the trajectory selection by each moth. For the path-planning of a multi-humanoid robot system in an unknown environment, a Petri-Net controller is configured alongside the MFO-RL controller to prevent deadlock situations during navigation in the terrain; this is accomplished by prioritization among the humanoid robots. To analyze the robustness and effectiveness of the applied approach, factors such as ease of obstacle avoidance, turning angle, clearance, and task completion time are considered. The hybrid controller is then assessed on path length against standalone approaches, which demonstrates the superiority of the hybrid MFO-RL approach. The simulation and experimental results for both single- and multi-humanoid robot navigation agree within a deviation of around 6%, which is acceptable by common standards. In the case of single-humanoid robot navigation, comparison with a previously established approach shows an over 9% reduction in path length in favor of the proposed approach. A second comparison, also based on path length, again favors the proposed controller. These results justify that the proposed hybrid controller is efficient and feasible for the path-planning of single- and multi-humanoid robots in an unknown environment with multiple targets. In future work, the proposed controller can be combined with a classical path-planning approach for navigation over uneven terrains.

Authors' contributions

Abhishek Kumar Kashyap: Conceptualization, Methodology, Software, Investigation, and Validation. Dayal R. Parhi: Supervision and Writing – Reviewing and Editing. Vikas Kumar: Writing – Original draft preparation and Visualization.

Financial support

The authors did not receive any financial support for this work.

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethical considerations

This article does not contain any studies with human participants or animals performed by any of the authors.

References

[1] Berquin, Y. and Zell, A., "A physics perspective on lidar data assimilation for mobile robots," Robotica 40(4), 862–887 (2022).
[2] Wu, S., Du, Y. and Zhang, Y., "Mobile robot path planning based on a generalized wavefront algorithm," Math. Probl. Eng. 1(2), 2020–2012 (2020).
[3] Liang, Y. and Xu, L., "Global Path Planning for Mobile Robot Based Genetic Algorithm and Modified Simulated Annealing Algorithm," In: Proceedings of the First ACM/SIGEVO Summit on Genetic and Evolutionary Computation (2009) pp. 303–308.
[4] Zhu, Z., Wang, F., He, S. and Sun, Y., "Global path planning of mobile robots using a memetic algorithm," Int. J. Syst. Sci. 46(11), 1982–1993 (2015).
[5] Botzheim, J., Toda, Y. and Kubota, N., "Bacterial memetic algorithm for offline path planning of mobile robots," Memetic Comput. 4(1), 73–86 (2012).
[6] Liang, X. D., Li, L. Y., Wu, J. G. and Chen, H. N., "Mobile robot path planning based on adaptive bacterial foraging algorithm," J. Cent. South Univ. 20(12), 3391–3400 (2013).
[7] Chen, M.-Y., Wu, Y.-J. and He, H., "A novel navigation system for an autonomous mobile robot in an uncertain environment," Robotica 40(3), 421–446 (2022).
[8] Gao, M. and Tian, J., "Path Planning for Mobile Robot Based on Improved Simulated Annealing Artificial Neural Network," In: Third International Conference on Natural Computation (ICNC), vol. 3 (IEEE, 2007) pp. 8–12.
[9] Jun, H. and Qingbao, Z., "Multi-objective Mobile Robot Path Planning Based on Improved Genetic Algorithm," In: International Conference on Intelligent Computation Technology and Automation, vol. 2 (IEEE, 2010) pp. 752–756.
[10] Yue, H. and Wang, Z.-M., "Path Planning of Mobile Robot Based on Compound Shape and Simulated Annealing Hybrid Algorithm," In: IEEE International Conference on Robotics and Biomimetics (ROBIO) (IEEE, 2005) pp. 186–189.
[11] Janabi-Sharifi, F. and Vinke, D., "Integration of the Artificial Potential Field Approach with Simulated Annealing for Robot Path Planning," In: Proceedings of the 8th IEEE International Symposium on Intelligent Control (IEEE, 1993) pp. 536–541.
[12] Martínez-Alfaro, H. and Gómez-García, S., "Mobile robot path planning and tracking using simulated annealing and fuzzy logic control," Expert Syst. Appl. 15(3–4), 421–429 (1998).
[13] Pandey, K. K. and Parhi, D. R., "Trajectory planning and the target search by the mobile robot in an environment using a behavior-based neural network approach," Robotica 38(9), 1627–1641 (2020).
[14] Ganganath, N. and Cheng, C.-T., "A 2-dimensional ACO-based Path Planner for Off-line Robot Path Planning," In: International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (IEEE, 2013) pp. 302–307.
[15] Chang, L., Shan, L., Jiang, C. and Dai, Y., "Reinforcement based mobile robot path planning with improved dynamic window approach in unknown environment," Auton. Robots 45(1), 51–76 (2021).
[16] Wei, Y. and Zhao, J., "Designing human-like behaviors for anthropomorphic arm in humanoid robot NAO," Robotica 38(7), 1205–1226 (2020).
[17] Kusuma, M., Riyanto and Machbub, C., "Humanoid Robot Path Planning and Rerouting Using A-Star Search Algorithm," In: 2019 IEEE International Conference on Signals and Systems (ICSigSys) (IEEE, 2019) pp. 110–115.
[18] Sabe, K., Fukuchi, M., Gutmann, J.-S., Ohashi, T., Kawamoto, K. and Yoshigahara, T., "Obstacle Avoidance and Path Planning for Humanoid Robots Using Stereo Vision," In: IEEE International Conference on Robotics and Automation (ICRA 2004) (IEEE, 2004) pp. 592–597.
[19] Huang, W., Kim, J. and Atkeson, C. G., "Energy-based Optimal Step Planning for Humanoids," In: IEEE International Conference on Robotics and Automation (IEEE, 2013) pp. 3124–3129.
[20] Lee, M., Heo, Y., Park, J., Yang, H. D., Jang, H. D., Benz, P., Park, H., Kweon, I. S. and Oh, J. H., "Fast Perception, Planning, and Execution for a Robotic Butler: Wheeled Humanoid M-Hubo," In: IEEE International Conference on Intelligent Robots and Systems (2019) pp. 5444–5451.
[21] Lagaza, K. P., Kashyap, A. K. and Pandey, A., "Spider Monkey Optimization Algorithm Based Collision-Free Navigation and Path Optimization for a Mobile Robot in the Static Environment," In: Advances in Mechanical Engineering (2020) pp. 1459–1473.
[22] Jalali, S. M. J., Hedjam, R., Khosravi, A., Heidari, A. A., Mirjalili, S. and Nahavandi, S., "Autonomous robot navigation using moth-flame-based neuroevolution," In: Evolutionary Machine Learning Techniques (Springer, 2020) pp. 67–83.
[23] Abdullah, A., Rashid, M. F. F. A., Ponnambalam, S. G. and Ghazalli, Z., "Energy efficient modeling and optimization for assembly sequence planning using moth flame optimization," Assem. Autom. 39(2), 356–368 (2019).
[24] Mehne, S. H. H. and Mirjalili, S., "Moth-flame optimization algorithm: Theory, literature review, and application in optimal nonlinear feedback control design," Nat.-Inspir. Optim. 811, 143–166 (2020).
[25] Elaziz, M. A., Ewees, A. A., Ibrahim, R. A. and Lu, S., "Opposition-based moth-flame optimization improved by differential evolution for feature selection," Math. Comput. Simul. 168(4), 48–75 (2020).
[26] Gao, P., Liu, Z., Wu, Z. and Wang, D., "A Global Path Planning Algorithm for Robots Using Reinforcement Learning," In: IEEE International Conference on Robotics and Biomimetics (ROBIO) (IEEE, 2019) pp. 1693–1698.
[27] Fakoor, M., Kosari, A. and Jafarzadeh, M., "Humanoid robot path planning with fuzzy Markov decision processes," J. Appl. Res. Technol. 14(5), 300–310 (2016).
[28] Trinh, L. A., Ekström, M. and Cürüklü, B., "Petri Net Based Navigation Planning with Dipole Field and Dynamic Window Approach for Collision Avoidance," In: 6th International Conference on Control, Decision and Information Technologies (CoDIT) (IEEE, 2019) pp. 1013–1018.
[29] Parhi, D. R. and Mohanta, J. C., "Navigational control of several mobile robotic agents using Petri-potential-fuzzy hybrid controller," Appl. Soft Comput. 11(4), 3546–3557 (2011).
[30] Kumar, P. B., Muni, M. K. and Parhi, D. R., "Navigational analysis of multiple humanoids using a hybrid regression-fuzzy logic control approach in complex terrains," Appl. Soft Comput. 89(3), 106088 (2020).
[31] Ajeil, F. H., Ibraheem, I. K., Sahib, M. A. and Humaidi, A. J., "Multi-objective path planning of an autonomous mobile robot using hybrid PSO-MFB optimization algorithm," Appl. Soft Comput. 89(4), 106076 (2020).
[32] Hosseinzadeh, A. and Izadkhah, H., "Evolutionary approach for mobile robot path planning in complex environment," IJCSI Int. J. Comput. Sci. Issues 7(8), 1–9 (2010).
[33] Alfaverh, F., Denaï, M. and Sun, Y., "Demand response strategy based on reinforcement learning and fuzzy reasoning for home energy management," IEEE Access 8, 39310–39321 (2020).
[34] Peterson, J. L., "Petri nets," ACM Comput. Surv. 9(3), 223–252 (1977).
[35] Lee, K. B., Myung, H. and Kim, J. H., "Online multiobjective evolutionary approach for navigation of humanoid robots," IEEE Trans. Ind. Electron. 62(9), 5586–5597 (2015).
[36] Ali, H., Gong, D., Wang, M. and Dai, X., "Path planning of mobile robot with improved ant colony algorithm and MDP to produce smooth trajectory in grid-based environment," Front. Neurorobot. 14, 1–13 (2020).