(1)

The rotation also has some effects on reducing the depth range of the FOV. By small-angle approximation, the reduction in this dimension can be approximated as

(2)
The error in the Kinect’s roll component is complex and is therefore approximated as an additional reduction of the horizontal and vertical viewing angles. Error in roll, however, does not reduce the FOV’s depth range.
In the final stage, the translational errors Δ are applied by further trimming the FOV along the x, y, and z axes.
Possible placement locations
In reality, cameras cannot be placed without constraints. A camera could be mounted 2 m above the floor in the center of the room, but it would need a supporting structure, such as a tripod, which can obstruct people and is therefore undesirable.
The possible and suitable locations for the cameras are usually along the edges of the room, which increases coverage and avoids obstructing the space. This also reduces the complexity of optimizing the camera locations: instead of searching the full 3-D xyz space, the search over the xy-plane is reduced to a search along those edges, removing one dimension. Moreover, if the camera height is fixed by the height of the existing structure, z is also restricted, so the 3-D search problem is reduced to only 1-D.
Camera placement optimization algorithm
In order to obtain an optimal, or nearly optimal, solution, cameras are added one by one so as to minimize the uncovered area. After each addition, the configurations of all cameras are adjusted around their current values, again to minimize the uncovered area. These two steps are repeated until the whole area is covered. After that, a final optimization is performed to maximize the coverage: each camera is readjusted one by one to maximize the sum, over all points, of the number of cameras covering each point, while keeping every point covered. All optimization processes are carried out with PSO22 following previous works,19,21 using the Python library pyswarm.23 The process is illustrated as a flowchart in Figure 3.
Particle’s state
PSO requires a definition of the state of each particle. Each variable in the state is a value that can be adjusted to optimize the objective function.
In this coverage optimization problem, the state of each camera represents the camera’s location and orientation. The location is constrained by the possible placement locations, and the orientation consists of the pitch and yaw angles about the fixed y-axis and z-axis, respectively. The roll angle is set to zero, so the state is (xstate, θstate, ψstate).
Coverage test
To evaluate the coverage of the system, we need to know whether a point P(x, y, z) within the region of interest (ROI) is in the FOV of camera Ci, denoted FOVCi, given the position Oi = (Ox,i, Oy,i, Oz,i) and orientation (ϕi, θi, ψi) of Ci, for all cameras in the system. This is done by transforming the point P(x, y, z), expressed in the world frame, into Pi(xi, yi, zi) expressed in camera Ci’s frame. By comparing xi, yi, and zi with the reduced model obtained above, we can judge whether P is in FOVCi.
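As an illustration, a minimal sketch of this coverage test in Python is given below. It assumes ZYX (yaw–pitch–roll) Euler angles and a hypothetical in_reduced_fov() predicate encoding the reduced model; the names and conventions are assumptions for illustration, not the original implementation.

    import numpy as np

    def rotation_matrix(roll, pitch, yaw):
        # Camera-to-world rotation from ZYX (yaw, pitch, roll) Euler angles.
        cr, sr = np.cos(roll), np.sin(roll)
        cp, sp = np.cos(pitch), np.sin(pitch)
        cy, sy = np.cos(yaw), np.sin(yaw)
        Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
        Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
        Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
        return Rz @ Ry @ Rx

    def to_camera_frame(P, O, angles):
        # Express world point P in the frame of a camera at O with orientation angles.
        R = rotation_matrix(*angles)                  # camera-to-world
        return R.T @ (np.asarray(P) - np.asarray(O))  # world-to-camera

    def is_covered(P, cameras, in_reduced_fov):
        # True if any camera (O, angles) in `cameras` sees P under the reduced FOV model.
        return any(in_reduced_fov(to_camera_frame(P, O, ang)) for O, ang in cameras)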
Process of adding a new camera
The first step in each loop is adding a camera to the system. To cover the ROI using the fewest cameras, the placement is chosen by minimizing the number of points in the area that are not covered by any camera. That is, at a point P(x, y, z), the index f(P) is defined as 0 if P is in the FOV of any camera, and 1 otherwise. The objective function fopt to be minimized is
fopt = ∑P∈ROI f(P)    (3)
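A minimal sketch of how this step could be driven by pyswarm is shown below; the helper functions covered() and to_camera(), as well as the swarm size and iteration count, are placeholders rather than the values used in the original implementation.

    from pyswarm import pso

    def f_opt(state, roi, existing, covered, to_camera):
        # Equation (3): number of ROI points not covered by the existing cameras
        # plus a candidate camera with state (x_state, theta_state, psi_state).
        candidate = existing + [to_camera(state)]
        return sum(0 if covered(P, candidate) else 1 for P in roi)

    def add_camera(roi, existing, covered, to_camera, lb, ub):
        # Minimize the uncovered count over the 1-D location and the two angles.
        best_state, best_cost = pso(f_opt, lb, ub,
                                    args=(roi, existing, covered, to_camera),
                                    swarmsize=50, maxiter=100)
        return to_camera(best_state), best_cost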
Process of readjustment
In this step, all configurations, including those of the newly added camera, are varied within a small boundary around their current settings; that is, for Ci with state (xstate,i, θstate,i, ψstate,i), Ci is varied within the ranges [xstate,i − xb, xstate,i + xb], [θstate,i − θb, θstate,i + θb], and [ψstate,i − ψb, ψstate,i + ψb], respectively, with the ranges truncated to the extreme limits. Here xb, θb, and ψb are the sizes of the boundaries of xstate, θstate, and ψstate, respectively.
The objective function used in this process is the same as that used in the adding process, that is, fopt. If the readjustment yields a lower number of uncovered points, the result is kept; otherwise, the process continues with the previous result.
After this process, if all points in the ROI are covered, that is, fopt = 0, the optimization breaks the loop and proceeds to the final readjustment. If some points are still uncovered, the loop continues and a new camera is added again.
Process of final optimization
This final optimization aims to maximize the coverage, while maintaining complete coverage, without adding a new camera. At each point P ∈ ROI, the number of cameras seeing the point is denoted n(P), and the objective function to be maximized is
fopt2 = ∑P∈ROI n(P)    (4)
subject to the constraint that n(P) > 0 for all P ∈ ROI.
First, fopt2 is calculated based on the camera configurations from the loop and stored as fcur and floop. Next, the configuration of each camera is adjusted one by one within the boundary, and fopt2 is recalculated. If fopt2 > fcur, the adjusted result is kept and fcur is set to fopt2.
After adjusting all cameras, fcur is compared with the floop recorded at the beginning of the pass. If fcur > floop, there was some improvement in the previous pass, so floop is set to fcur and the adjustment continues. On the other hand, fcur = floop indicates that there was no improvement in the previous pass; the process is therefore stopped, and the final result is obtained.
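The outer structure of this final readjustment could look like the following sketch, in which count_seen() and adjust_one() (the per-camera PSO step within its boundary) are hypothetical helpers; the constraint n(P) > 0 is enforced here by returning −∞ when any point becomes uncovered, which is one possible way to handle it.

    def f_opt2(cameras, roi, count_seen):
        # Equation (4): total number of camera-point coverings, invalid (-inf)
        # if any ROI point is left uncovered, i.e. the constraint n(P) > 0 is violated.
        counts = [count_seen(P, cameras) for P in roi]
        if min(counts) == 0:
            return float('-inf')
        return sum(counts)

    def final_readjustment(cameras, roi, count_seen, adjust_one):
        # Greedy per-camera readjustment: repeat passes while the objective improves.
        f_loop = f_opt2(cameras, roi, count_seen)
        while True:
            f_cur = f_loop
            for i in range(len(cameras)):
                trial = adjust_one(cameras, i)           # PSO within the local boundary
                f_trial = f_opt2(trial, roi, count_seen)
                if f_trial > f_cur:                      # keep only improving adjustments
                    cameras, f_cur = trial, f_trial
            if f_cur > f_loop:
                f_loop = f_cur                           # improvement: run another pass
            else:
                return cameras                           # no improvement: stop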
Implementation
A space in our laboratory is a good candidate for testing the optimization algorithm, as the possible camera locations form a complex shape. By measuring the dimensions of the space and its structures, a model of the area was obtained.
The experimental area is set as a 3 m × 3.5 m region, sampled at a 0.1-m step, drawn as dots in Figure 4. In order to cover the area so that both people’s and the quadrotor’s positions can be obtained, it must be covered from a height of 0.7 m to 2.5 m, also at a 0.1-m step. These sampled points form our ROI, and the optimization is done to cover all of them.
In the system, Kinect depth sensors were utilized. According to the specification of the Xbox Kinect sensor,24 the horizontal angle of view (2αc,h) is 57° and the vertical angle of view (2αc,v) is 43°. The depth range of the Kinect is set at 0.45–4.5 m. By placing the x-axis along the depth and the y-axis pointing to the left, the simplified model of the Kinect’s FOV can be illustrated as the dashed black lines in Figure 5.
With the reduced model discussed in the section “Placement error prevention”, for each degree of error δ with αc = 28°, ΔDmin ≈ 0.005 m on the minimum depth side (Dmin = 0.45 m) and ΔDmax ≈ 0.05 m on the maximum depth side (Dmax = 4.5 m). Error in the roll angle results in an additional 1° reduction of both the horizontal and vertical angles.
With the angular errors δ = 2° and the translational errors Δ = 0.03 m, the reduced FOV is shown in Figure 5. Point Pi(xi, yi, zi) in Kinect Ki‘s frame is in its FOV if
(5)
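Equation (5) is not reproduced here, but one plausible reading of the reduced model is sketched below: the half viewing angles are narrowed by the 2° angular error plus the 1° roll-related reduction, each face is trimmed by the 0.03-m translational error, and the depth limits are shrunk by the per-degree losses quoted above. The exact inequalities used in the original work may differ.

    import numpy as np

    ALPHA_H = np.radians(57.0 / 2)      # horizontal half viewing angle
    ALPHA_V = np.radians(43.0 / 2)      # vertical half viewing angle
    D_MIN, D_MAX = 0.45, 4.5            # nominal depth range (m)

    DELTA_ANG = np.radians(2.0 + 1.0)   # 2 deg angular error + 1 deg roll-related reduction
    DELTA_TRANS = 0.03                  # translational error (m)
    D_MIN_RED = D_MIN + 2 * 0.005 + DELTA_TRANS   # near-side depth loss for 2 deg of error
    D_MAX_RED = D_MAX - 2 * 0.05 - DELTA_TRANS    # far-side depth loss for 2 deg of error

    def in_reduced_fov(p):
        # p = (x, y, z) in the Kinect frame: x along the depth, y to the left, z up.
        x, y, z = p
        if not (D_MIN_RED <= x <= D_MAX_RED):
            return False
        if abs(y) > x * np.tan(ALPHA_H - DELTA_ANG) - DELTA_TRANS:
            return False
        if abs(z) > x * np.tan(ALPHA_V - DELTA_ANG) - DELTA_TRANS:
            return False
        return True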
- Limitation of possible placement locations
- Conversion of states to position and orientation
The extreme limits of the states are set as xstate ∈ [0 m, 12.73 m], θstate ∈ [10°, 30°], and ψstate ∈ [−70°, 70°]. The boundaries for readjustment are set as xb = 0.5 m for xstate and θb = ψb = 15° for θstate and ψstate, subject to the extreme limits. The other PSO parameters used are provided in Table 1.
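The actual conversion of states to a camera position and orientation depends on the room layout of Figure 6 and is not reproduced here. The sketch below only illustrates the general idea with a made-up edge polyline and mounting height: xstate is treated as an arc length along the allowed edges, and ψstate as a yaw offset from the edge’s inward normal; all of these are assumptions for illustration.

    import numpy as np

    # Hypothetical polyline of allowed mounting edges (xy corners in metres);
    # the real layout and its total length (12.73 m) are not reproduced here.
    EDGE_POINTS = np.array([[0.0, 0.0], [3.5, 0.0], [3.5, 3.0], [0.0, 3.0]])
    MOUNT_HEIGHT = 2.5   # assumed fixed camera height (m)

    def state_to_pose(x_state, theta_state, psi_state):
        # theta_state and psi_state are in degrees, as in the stated limits.
        seg = np.diff(EDGE_POINTS, axis=0)
        lengths = np.linalg.norm(seg, axis=1)
        s = np.clip(x_state, 0.0, lengths.sum() - 1e-9)
        i = int(np.searchsorted(np.cumsum(lengths), s, side='right'))
        s_local = s - np.concatenate(([0.0], np.cumsum(lengths)))[i]
        xy = EDGE_POINTS[i] + (s_local / lengths[i]) * seg[i]
        inward = np.arctan2(seg[i][0], -seg[i][1])    # edge normal pointing into the room
        yaw = inward + np.radians(psi_state)
        position = np.array([xy[0], xy[1], MOUNT_HEIGHT])
        return position, (0.0, np.radians(theta_state), yaw)   # (roll, pitch, yaw)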
Table 1. PSO’s parameters used in the optimization simulations.
Optimization result
The optimization script was run 20 times, with each run resulting in a different number of required cameras, ranging from five to seven. Each run took around 1–3 days to complete, depending on the number of Kinects used.
The run that resulted in the lowest number of cameras, that is, five, was selected. The final readjustment described in the section “Process of final optimization” was then applied to this result. This process took more than a week before converging to the final result. Plots showing the positions of the cameras with their reduced fields of view, and the number of cameras covering each point, are shown in Figure 7(a) and (b). The combination of all five Kinects is drawn in 3-D in Figure 7(c).
Optimization using only environmental cameras
The optimization problem using only environmental cameras was also solved, for comparison with the number of required cameras discussed in the section “System configuration.”
Using only environmental cameras to track the face, the coverage requirement is stricter: instead of merely covering all the points, we need to cover each point from all directions so that the face can be obtained. A camera must look into the face, with the deviation angle from the camera’s principal axis less than or equal to 30°.
As this optimization is for comparison with the proposed method, the simulation conditions and steps were kept as similar as possible. In addition to covering all points in the ROI, the face angles that can be covered by the quadrotor in the proposed system must also be covered. The final readjustment subroutine was removed to save time, as only the number of cameras is needed.
Using only the same possible camera locations obviously cannot cover all angles, as no camera can observe the face in the +y direction. Therefore, additional locations were added along the dashed blue lines in Figure 6, with black dotted arrows showing the central angles.
Figure 8 shows the top view of the person relative to the camera. θp is the angle of the person’s location from the camera’s principal axis, which can be obtained by
θp = atan2(yp − yc, xp − xc) − ψc    (6)
where (xp, yp) and (xc, yc) are the positions of the person and the camera, respectively, and ψc is the rotation angle of the camera from the x-axis. The person is inside the camera’s FOV if |θp| ≤ αc,h, the camera’s horizontal half viewing angle.
If (xp, yp) is not in the FOV, then none of the face angles can be seen by this camera. If it is inside, the angle α can be obtained by
(7)
where ψp is the rotation angle of the person from the x-axis. So
(8)
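Equations (7) and (8) are not reproduced here; the sketch below shows one plausible form of the overall face-visibility check, assuming that α measures the deviation between the person’s facing direction and the person-to-camera direction and that a face angle counts as covered when both |θp| ≤ αc,h and |α| ≤ 30°. The thresholds and the exact definition of α in the original work may differ.

    import numpy as np

    ALPHA_CH = np.radians(57.0 / 2)    # horizontal half viewing angle (Kinect-like)
    MAX_FACE_DEV = np.radians(30.0)    # maximum deviation allowed for face observation

    def wrap(angle):
        # Wrap an angle to (-pi, pi].
        return np.arctan2(np.sin(angle), np.cos(angle))

    def face_visible(person_xy, psi_p, cam_xy, psi_c):
        # True if a camera at cam_xy with yaw psi_c can observe the face of a
        # person at person_xy whose body faces psi_p.
        dx, dy = person_xy[0] - cam_xy[0], person_xy[1] - cam_xy[1]
        theta_p = wrap(np.arctan2(dy, dx) - psi_c)     # equation (6)
        if abs(theta_p) > ALPHA_CH:
            return False                               # person outside the camera FOV
        alpha = wrap(np.arctan2(-dy, -dx) - psi_p)     # person-to-camera vs facing direction
        return abs(alpha) <= MAX_FACE_DEV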
The simulation was run with angle steps of 1°, 2°, and 3°. Some face angles at positions in the top left corner cannot be covered from the possible camera positions, resulting in incomplete solutions. The simulations were stopped once there was no further improvement, and the lowest number of cameras obtained was 44, in the 1°-step simulation, as shown in Figure 9 (red areas show uncovered face angles and positions). This is a very large number of cameras for a real implementation compared with our proposed system, which requires only five environmental cameras and some moving cameras, basically one camera per person.
Sensor fusion and quadrotor’s tracking and control
As multiple Kinect sensors are used, a quadrotor or a person is likely to be detected by more than one Kinect, so it is necessary to correctly associate data coming from the same object. Data fusion is implemented by simple weighted averaging, with the weights determined empirically.
Person fusion and tracking
The positional data for a person are the 3-D position (x, y, z) relative to the world coordinate frame. For the orientation, only the rotation angle about the vertical axis is required, as looking up or down cannot be detected by the human-tracking algorithm. This rotation angle is taken as the orientation of the whole body, since turning the head left or right cannot be observed.
At the beginning, a tracking_list is initialized as an empty list. In each sample, if tracking_list is empty, all detected heads that are close to each other in Euclidean distance are grouped as the same head, and the position and orientation within each group are averaged. If tracking_list is not empty, the detected heads are matched to those in tracking_list; a detection is considered to belong to the same person if it is close enough. Old and new data are then averaged to xavg, yavg, zavg with a ratio of 3:1, where the new data are first averaged equally among themselves.
If there are any heads left after the matching process, the remaining heads are grouped, averaged, and added to tracking_list. Frames that are not updated for too long are eliminated (the person is considered to have left the area and the tracking is lost). At the end, tracking_list is published to Robot Operating System (ROS) in the form of tf frames. The fused head is published as a frame with origin at the averaged (xavg, yavg, zavg), rotated by ψz,avg around the world’s z-axis so that the frame’s x-axis points into the person’s face.
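A simplified sketch of one fusion step is given below, with ROS publishing and the stale-frame removal reduced to a comment. Detections are assumed to be (position, yaw) pairs with NumPy positions; the matching threshold and the 3:1 weighting follow the description above, while everything else, including the naive yaw averaging, is an illustrative assumption.

    import numpy as np

    DIST_THRESH = 0.5      # matching distance threshold (m)
    OLD_NEW_RATIO = 3.0    # old data weighted 3:1 against the averaged new data

    def fuse_heads(tracking_list, detections, now):
        # tracking_list entries: {'pos': np.array(3), 'yaw': float, 'stamp': float}
        remaining = set(range(len(detections)))
        for head in tracking_list:
            near = [i for i in remaining
                    if np.linalg.norm(detections[i][0] - head['pos']) < DIST_THRESH]
            if not near:
                continue
            new_pos = np.mean([detections[i][0] for i in near], axis=0)
            new_yaw = float(np.mean([detections[i][1] for i in near]))  # naive, ignores wrap
            w = OLD_NEW_RATIO / (OLD_NEW_RATIO + 1.0)
            head['pos'] = w * head['pos'] + (1 - w) * new_pos           # 3:1 old-to-new
            head['yaw'] = w * head['yaw'] + (1 - w) * new_yaw
            head['stamp'] = now
            remaining -= set(near)
        while remaining:                      # leftover detections start new tracked heads
            i = remaining.pop()
            group = [i] + [j for j in remaining
                           if np.linalg.norm(detections[j][0] - detections[i][0]) < DIST_THRESH]
            remaining -= set(group)
            tracking_list.append({'pos': np.mean([detections[j][0] for j in group], axis=0),
                                  'yaw': float(np.mean([detections[j][1] for j in group])),
                                  'stamp': now})
        # Heads whose 'stamp' is too old would be dropped here; tf publishing is omitted.
        return tracking_list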
Quadrotor fusion and tracking
From the depth images, the Kinect sensors can only detect the quadrotor’s 3-D position, not its orientation. The positions are fused to provide the quadrotor’s 3-D position in the world coordinate frame. To handle noise and false detections, the number of cameras seeing the same object and the number of quadrotors in use are exploited. The orientation of the quadrotor is obtained from the on-board IMU and later combined with the 3-D position.
The fusion process is generally similar to the fusion for humans, with some modifications. As the quadrotor is quite small, false detections are possible after background subtraction. If more than one sensor detects an object around the same position, it is more likely that the object really exists at that position. A boolean flag tracking is therefore introduced: it is set to false if only one Kinect sees the object at its first detection, and true if two or more Kinects see it. The number of detecting cameras is also used to order the quadrotors in tracking_list during the initialization.
After the tracking flag is set and the quadrotor is kept in tracking_list, it is used in the tracking process. If the flag is false and, in the next time step, more than one camera can see the object, it is considered a quadrotor and tracking becomes true. On the other hand, if no camera can see the object in the next time step, the previous detection is considered noise and it is removed from tracking_list.
The number of detected quadrotors is also limited to the number of quadrotors being used (nquad), which is generally known beforehand. The length of the list is truncated to nquad, as it is impossible to have more quadrotors than that.
The new data are matched with the tracking_list in the same way as fusion of the head. As the quadrotor tends to move faster, the ratio of old data to new data is 1:1.
After the matching process, if there are any points left, they are grouped together by distance, averaged, and added to an extra list. If there is a true flag in extra and the length of tracking_list is less than nquad, the quadrotor in extra with the true flag is added to tracking_list. If the length of tracking_list is already nquad but there is a false flag in the list, the extra quadrotor with a true flag replaces that false quadrotor. The process is repeated until there are no more true flags in extra or no false flags in tracking_list. Before publishing tracking_list to ROS’s tf topic, outdated quadrotors are removed from tracking_list.
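The promotion and replacement rules can be summarized by the following sketch, where each entry carries the boolean tracking flag described above; the data layout is an assumption made for illustration.

    N_QUAD = 1   # number of quadrotors in use, known beforehand

    def reconcile(tracking_list, extra):
        # Entries are dicts with a boolean 'tracking' flag (True when seen by >= 2 Kinects).
        for candidate in (e for e in extra if e['tracking']):
            if len(tracking_list) < N_QUAD:
                tracking_list.append(candidate)      # room left: add the confirmed object
                continue
            for idx, q in enumerate(tracking_list):
                if not q['tracking']:
                    tracking_list[idx] = candidate   # replace a doubtful (false) entry
                    break
            else:
                break                                # no room and nothing doubtful: stop
        del tracking_list[N_QUAD:]                   # never track more than N_QUAD objects
        return tracking_list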
Quadrotor control
We previously proposed the threshold algorithm,14 with experiments using one Kinect to control a quadrotor tracking a person. In this article, the same algorithm is used with multiple Kinect sensors controlling one quadrotor.
When a person enters the area, the head is detected and fused, and the goal is calculated. Instead of directly using this goal for the control, it is averaged for 1 s and then set as the goal for the quadrotor, that is, (xTd,f, ψd,f)T = (x̄Td, ψ̄d)T. A threshold (xTth, ψth)T is applied around this fixed goal position.
From this point on, the new head position and orientation are constantly used to compute a goal, while the quadrotor’s goal is kept unchanged. If the person is standing still, the new goal is only affected by noise and should not go beyond the threshold consecutively for any length of time. If the new goal breaks the threshold consecutively for 1 s, the person is recognized as moving and the quadrotor’s goal (xTd,f, ψd,f)T is recalculated; the goal-averaging process then starts again. In this way, the quadrotor’s goal does not change frequently, resulting in a less oscillating trajectory, while the quadrotor can still move to track the human face. A sudden change of the goal position can also cause a problem for the proportional integral derivative (PID) controller, as the derivative term can become very high, resulting in a large control command and instability of the control system. To reduce the effect of this surge and provide a smooth trajectory, a simple low-pass filter is applied to the goal position output from the threshold algorithm by
(xTd,q, ψd,q)[i] = w (xTd,q, ψd,q)[i − 1] + (1 − w) (xTd,f, ψd,f)[i]    (9)

where (xTd,f, ψd,f)[i] is the output of the threshold algorithm at time step i, (xTd,q, ψd,q)[i] is the filtered goal used by the PID controller to control the quadrotor at time step i, and 0 ≤ w ≤ 1 is the weight of the filter.
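The combination of the threshold algorithm and the low-pass filter can be sketched as below. The goal is packed as (x, y, z, ψ) for brevity, yaw wrap-around is ignored, and the re-averaging of the goal over 1 s after a threshold violation is omitted; these simplifications are assumptions of the sketch, not part of the original implementation.

    import numpy as np

    THRESH_POS = 0.15             # position threshold (m)
    THRESH_YAW = np.radians(15)   # yaw threshold
    W = 0.98                      # low-pass weight w of equation (9)
    HOLD_TIME = 1.0               # seconds of consecutive violation before moving the goal

    class GoalFilter:
        def __init__(self, initial_goal):
            self.fixed = np.asarray(initial_goal, float)   # held goal (x, y, z, yaw)
            self.filtered = self.fixed.copy()
            self.violation = 0.0

        def step(self, raw_goal, dt):
            raw_goal = np.asarray(raw_goal, float)
            pos_err = np.linalg.norm(raw_goal[:3] - self.fixed[:3])
            yaw_err = abs(raw_goal[3] - self.fixed[3])     # wrap-around ignored here
            if pos_err > THRESH_POS or yaw_err > THRESH_YAW:
                self.violation += dt
                if self.violation >= HOLD_TIME:
                    self.fixed = raw_goal                  # person moved: update held goal
                    self.violation = 0.0
            else:
                self.violation = 0.0
            # Equation (9): first-order low-pass toward the held goal.
            self.filtered = W * self.filtered + (1 - W) * self.fixed
            return self.filtered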
Experiment and results
To evaluate the optimization result and the tracking performance, the Kinects were set up following the optimization result and a tracking experiment was conducted. Position-tracking accuracy and the tracking success ratio were also investigated.
System and implementation
Hardware components
The system is composed of Xbox 360 Kinect sensors, a Crazyflie quadrotor, and controlling computers.
- Xbox 360 Kinect
The Xbox 360 Kinect uses a structured light pattern and triangulation for depth calculation. The IR emitter projects a constant pattern of IR light speckles onto the scene; the IR camera captures those speckles and correlates them with a memorized reference pattern to give depth values.25 Depth images are transmitted at 30 Hz with a 640 × 480 pixel resolution.
- Interference between multiple Kinect sensors

Multiple Kinects project unmodulated IR patterns into the same area and therefore detect multiple patterns, resulting in confusion and loss of depth information. A mechanical modification using a vibrating unit, made of a simple DC motor and an unbalanced load, to vibrate each Kinect was proposed by Maimone and Fuchs26 and Butler et al.27 The motion moves each Kinect’s projector and IR camera synchronously, so its own pattern remains clearly visible, while the patterns from other Kinects move with a different direction and frequency and are blurred. Altogether, the depth sensing of the Kinect is recovered.
As this mechanism is relatively easy to reproduce, it was chosen to solve the interference problem. The vibration has the drawback of blurry RGB images, but as these are not used in our processing, this does not affect the system. The other drawback is that the vibration creates a disturbing sound, possibly due to the multiple movable parts of the structures holding the cameras. Redesigning the structure by adding some cushioning and reducing excessive vibration may help reduce the noise, but this is ignored at this point.
- Small-sized quadrotor
The Crazyflie 2.0 is a tiny quadrotor built using its PCB as the frame.28 With open-source firmware and libraries, a wiki page, and an active community of developers, it is a suitable platform for development. Its tiny size and light weight make it a good choice for indoor use around people. The specification is given in Table 2.
Table 2. Specifications of Crazyflie 2.0.
Software implementation
- Quadrotor detection
Prior to the quadrotor detection, the background is created from the minimum value of each pixel over a period of time, which corresponds to the nearest observed depth. New depth images are then compared to the background for any additional objects, which are classified as a quadrotor based on the detected size, so that small noise and human beings are not detected as a quadrotor. The detected quadrotor positions are published into ROS in the form of tf frames, where they are processed further by the sensor fusion process to merge data belonging to the same quadrotor.
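A minimal sketch of this background subtraction is given below; pixel_to_metric() is a hypothetical helper that converts a pixel blob and the depth image into a metric size and a 3-D position using the camera intrinsics, and the connected-component labeling of a real implementation is reduced to a comment.

    import numpy as np

    SIZE_MIN, SIZE_MAX = 0.07, 0.13    # accepted object size, i.e. 10 +/- 3 cm

    def build_background(depth_frames):
        # Per-pixel minimum over a period of time = nearest observed depth.
        # (Zero/invalid depth readings would need masking in practice.)
        return np.min(np.stack(depth_frames), axis=0)

    def detect_quadrotor(depth, background, pixel_to_metric, min_diff=0.1):
        foreground = (background - depth) > min_diff   # anything closer than the background
        # A real implementation would label connected components (e.g. with scipy.ndimage);
        # here the whole foreground is treated as a single candidate blob for brevity.
        if not foreground.any():
            return []
        size, position = pixel_to_metric(foreground, depth)
        return [position] if SIZE_MIN <= size <= SIZE_MAX else []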
- Human skeleton tracking
- Sensor fusion and tracking
If data from different Kinects are close enough, they are considered to belong to the same person or quadrotor and are fused together by averaging. In the case of the quadrotor, if more than one Kinect can see the same object, that object is more likely to really be the quadrotor. The number of detected quadrotors is limited by the number of quadrotors being used, predefined by the user. Tracking of both the person and the quadrotor is done based on the previous data.
- Goal position setup
The fused head frame’s x-axis points into the person’s face, and its yaw angle is denoted ψ̂head. The goal position xd is defined at a distance dgoal in front of the face and hgoal above it. These values should be adjusted according to the camera’s specification and settings; for the current experiment, dgoal = 1.5 m and hgoal = 0.6 m. Assuming that the camera is installed along the quadrotor’s x-axis, rotating the quadrotor by ψ̂head around the world’s z-axis will point the camera toward the face, so we set ψd = ψ̂head. xd and ψd are then used to set the goal frame for the quadrotor.
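A sketch of this goal computation, assuming the fused head frame’s x-axis points into the face as described earlier, could look like the following.

    import numpy as np

    D_GOAL = 1.5    # distance in front of the face (m)
    H_GOAL = 0.6    # height above the head (m)

    def goal_from_head(head_pos, psi_head):
        # psi_head is the yaw of the published head frame (x-axis into the face),
        # so the direction "in front of the face" is the opposite of that axis.
        head_pos = np.asarray(head_pos, float)
        out_of_face = -np.array([np.cos(psi_head), np.sin(psi_head), 0.0])
        goal_pos = head_pos + D_GOAL * out_of_face + np.array([0.0, 0.0, H_GOAL])
        psi_d = psi_head        # quadrotor yaw equal to the head frame yaw (see text)
        return goal_pos, psi_d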
- Control system
PID controllers are used to control the quadrotor. With the quadrotor’s x-axis pointing in the forward direction, the y-axis to the left, and the z-axis upward, the pitch angle controls motion along the x-axis and the roll angle controls motion along the y-axis (Figure 13). The yaw angle ψ is the counterclockwise rotation angle about the z-axis.
The quadrotor receives a command set of roll, pitch, yaw speed, and thrust (uϕ, uθ, uψ, uT), and the on-board firmware calculates the power of each motor. The error is obtained by finding the transformation from the quadrotor’s frame to the goal frame, resulting in ex, ey, ez, and eψ, which are used to compute uθ, uϕ, uT, and uψ, respectively. For x, y, and ψ, proportional derivative (PD) controllers are used, with the same coefficients for x and y. An integral part is included for z to compensate for the power drop during the flight.
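A compact sketch of this control structure is given below. The gains, the hover-thrust offset, and the sign conventions are placeholders, not the values of Table 3.

    class AxisPID:
        # One control channel; x, y, and yaw use kp/kd only, while z also uses ki.
        def __init__(self, kp, ki, kd):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.integral = 0.0
            self.prev_err = 0.0

        def step(self, err, dt):
            self.integral += err * dt
            deriv = (err - self.prev_err) / dt
            self.prev_err = err
            return self.kp * err + self.ki * self.integral + self.kd * deriv

    def control_command(e, pid_x, pid_y, pid_z, pid_yaw, dt, hover_thrust):
        # Map the error e = (ex, ey, ez, epsi), expressed in the quadrotor frame,
        # to the Crazyflie command (u_phi, u_theta, u_psi, u_T).
        ex, ey, ez, epsi = e
        u_theta = pid_x.step(ex, dt)              # pitch drives motion along x
        u_phi = -pid_y.step(ey, dt)               # roll drives motion along y (sign assumed)
        u_T = hover_thrust + pid_z.step(ez, dt)   # thrust with the integral term for z
        u_psi = pid_yaw.step(epsi, dt)            # yaw-rate command
        return u_phi, u_theta, u_psi, u_T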
Camera placement and calibration
The method of calibrating the relative position and orientation between two cameras was used. Another camera was added as the reference for the calibration, with its FOV intersecting the other cameras’ FOVs. Its orientation was also set to be simple, that is, with all rotation angles at zero, to minimize possible error in setting up the reference, compared with setting it up at any specific non-zero angle of one of the five cameras.
In ROS, there is a package, camera_pose_calibration,33 for extrinsic camera calibration. To calibrate the depth sensors, the images from the IR camera are used, with the IR pattern projector covered. The calibration process involves holding a checkerboard at different locations and outputs each camera’s position and orientation.
The calibration was done one by one between each camera and the reference camera. For each camera, after the coarse placement, its relative position and orientation were calibrated. If the deviation from the optimization result exceeded the error limit assumed in the optimization (i.e. 2° for angles and 0.03 m for position), the placement was adjusted and the calibration repeated until the error was within the limit. Figure 14 shows the Kinects set in place.
Experimental setup
An experiment with one moving person was set up to test the constructed system. A path was created to cover the space as shown in Figure 15, where the blue dots are the approximate positions (marked on the floor for the person to stand on), the black arrows show the approximate facing directions, and the dashed red line shows the order of movement with the movement numbers. The path was designed such that the quadrotor’s goal would not fall outside the area, with some margin to allow for errors. The path ends in the center, where the person rotates and stops at around …, −π, π/2, and 0 rad, respectively, before finishing at −π/2 rad.
Safety concerns were not explicitly dealt with in this implementation. The goal position of the quadrotor was set 0.6 m above the height of the person and 1.5 m away in the horizontal direction to reduce the risk of collision, but there was no limit on the closest distance the quadrotor could come to the person.
For better evaluation, the motion capture system by Motion Analysis34 was used to obtain more accurate positions and orientations. As the two systems were run separately, time synchronization was done manually. Our motion capture setup uses 12 “Kestrel Digital RealTime System” units, with 3 units at each corner of the area. Each unit can record up to 2048 × 1088 pixels at 300 fps and can provide the position of each marker with millimeter-level accuracy. The output of the position of each marker was set at 120 Hz in the experiment.
For the head’s position and rotation, the person wore a cap with four markers. On the quadrotor, four markers were attached to the tips of the wings, with another marker in front.
The quadrotor’s detection size was set to (10 ± 3) × 10−2 m. The PID controllers’ coefficients are given in Table 3. Threshold values of 0.15 m and 15° were applied, with the weight w in equation (9) set to 0.98. The displacement threshold for quadrotor fusion and head fusion was set to 0.5 m.
Table 3. PID controllers for each dimension.
Results
Tracking of person’s position
Tracking and controlling of quadrotor
The quadrotor’s 2-D position tracked by the motion capture system (blue solid line) and by the proposed system (dashed red line) is plotted in Figure 18. This shows that the experiment involved controlling the quadrotor throughout the area. There were some moments in which the quadrotor went beyond the limits, but the system still managed to bring it back into the region.
Human-tracking result
Figure 19 shows snapshots of the tracking experiment. Graphs comparing the quadrotor’s position and orientation with the goal set up from the person’s position are plotted in Figure 20. The dotted black line indicates the thresholded goal position and orientation, the dashed red line indicates the quadrotor’s fused position and orientation, and the blue solid line shows the position and orientation tracked by the motion capture system. The average absolute errors, comparing the motion capture result to the goal, were 0.23 m (SD 0.17 m) for the 3-D position and 0.15 rad (SD 0.13 rad) for the rotation angle. A video of the tracking experiment can be found in the supplementary file.