Appl Intell, DOI 10.1007/s10489-014-0535-z

Real-time 3D human pose recovery from a single depth image using principal direction analysis

Dong-Luong Dinh · Myeong-Jun Lim · Nguyen Duc Thang · Sungyoung Lee · Tae-Seong Kim

© Springer Science+Business Media New York 2014

D.-L. Dinh · S. Lee
Department of Computer Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, Yongin-si, Gyeonggi-do, Republic of Korea
e-mail: sylee@oslab.khu.ac.kr; luongdd@oslab.khu.ac.kr

M.-J. Lim · T.-S. Kim
Department of Biomedical Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, Yongin-si, Gyeonggi-do, Republic of Korea
e-mail: tskim@khu.ac.kr; mjlim@khu.ac.kr

N. D. Thang
Department of Biomedical Engineering, International University, Ho Chi Minh City, Vietnam
e-mail: ndthang@hcmiu.edu.vn

Abstract  In this paper, we present a novel approach to recover a 3D human pose in real time from a single depth image using principal direction analysis (PDA). Human body parts are first recognized from a human depth silhouette via trained random forests (RFs). PDA is then applied to each recognized body part, represented as a set of points in 3D, to estimate its principal direction. Finally, a 3D human pose is recovered by mapping the principal direction of each body part onto a 3D synthetic human model. We perform both quantitative and qualitative evaluations of our proposed 3D human pose recovery methodology. We show that our approach has a low average reconstruction error of 7.07 degrees for four key joint angles and performs more reliably on a sequence of unconstrained poses than conventional methods. In addition, our methodology runs at a speed of 20 FPS on a standard PC, indicating that our system is suitable for real-time applications. Our 3D pose recovery methodology is applicable to applications ranging from human computer interaction to human activity recognition.

Keywords  3D human pose recovery · Depth image · Body part recognition · Principal direction analysis

1 Introduction

Recovering 3D human body poses from a sequence of images in real time is a challenging computer vision problem. Potential applications of this methodology in daily life include entertainment games, surveillance, sports science, health care technology, human computer interaction, motion tracking, and human activity recognition [14]. In conventional systems, human body poses are reconstructed by solving inverse kinematics using the motion information of optical markers attached to human body parts that are tracked by multiple cameras. These marker-based systems are capable of recovering accurate human body poses, but are not suitable for real-life applications because sensors have to be attached, multiple cameras need to be installed, expensive equipment is required, and the set-up is complicated [16].

In contrast to marker-based approaches, some recent studies have focused on markerless methods that can be utilized in daily applications. These markerless systems are typically based on a single RGB image [19] or multi-view RGB images [18, 22, 28]. Recently, with the introduction of depth imaging devices, depth images can be readily obtained from which human depth silhouettes can be derived, with each pixel providing distance information in 3D. Therefore, 3D human pose recovery from a single depth image or silhouette, without optical markers or multi-view RGB images, has become an active research topic in computer vision. Novel human pose estimation methodologies have been developed based on depth information [4].
In [20, 21], depth data was used to build a graph-based representation of a human body silhouette, from which a geodesic distance map of body parts was computed to find primary landmarks such as the head, hands, and feet; a 3D human pose was then reconstructed by fitting a human skeleton model to the landmarks. In [2], using the information of primary landmarks as features of each pose, the best matching pose was found from a set of poses coded in a hierarchical tree structure. In [7, 8, 15], depth data was represented as 3D surface meshes, and a set of geodesic feature points such as the head, hands, and feet was used to track poses. These approaches are generally based on alternative representations of the human depth silhouette and the detection of body parts.

Another 3D body pose recovery approach utilizes a learning methodology to recognize each body part [10, 12]; based on the recognized body parts, the corresponding 3D pose is reconstructed. In [26], the authors developed an algorithm based on expectation maximization (EM) with two-step iterations: body part labeling (E-step) and model fitting (M-step). The depth silhouette and the estimated 3D human body model were represented by a cloud of points in 3D and a set of ellipsoids, respectively. Each 3D point of the cloud was assigned and then fitted to one corresponding ellipsoid, and this process was iterated to minimize the discrepancies between the model and the depth silhouette. However, this algorithm is too slow for real-time applications due to the high computational cost of labeling. In [23], a human pose recognition approach based on body parts from a single depth image was developed. Human body parts were inferred from the depth image by per-pixel classification via randomized decision trees trained on a large database (DB) of synthetic depth images. This allowed efficient identification of human body parts in real time; the system was able to recognize up to 31 body parts from a single human depth silhouette. To model 3D human poses, the authors then applied the mean-shift algorithm [6] to the recognized human body parts to estimate body joint positions, and human body poses were recovered from these joint points. However, joint position estimation via the mean-shift algorithm generally suffers from the following limitations: (i) the position of the estimated joints depends on the shape and size of the subject, (ii) the computed information concentrates on the surface of the body parts, whereas joints are positioned inside these parts, and (iii) the method requires input values for parameters such as the window size.

To overcome the limitations of existing studies [23, 26] and develop a more robust 3D human pose recovery methodology, we propose a novel algorithm that recovers 3D human poses in real time based on principal direction analysis (PDA) of human body parts recognized from a series of depth images. Human body parts in the depth silhouette are first recognized via trained random forests (RFs) with our own synthetic training DB, similar to the method described in [11]. Using PDA, principal direction vectors are estimated from the recognized body parts. The directional vectors are then mapped to each body part of the 3D human body model to recover the 3D human pose. In our work, instead of using a simple human skeleton model without constraints on the configuration of joints for 3D human pose recovery as done in [23], we develop a more sophisticated model that uses a kinematic chain with predefined degrees of freedom (DOF) at each joint, which allows only feasible body movements during recovery of the 3D human pose.
The rest of the paper is organized as follows. In Section 2, we describe our overall system. In Sections 3, 4, and 5, we introduce our proposed methodology, including synthetic DB creation, RFs for pixel-based classification, body part recognition, PDA, and 3D human pose representation. Experimental results and comparisons with existing methodologies [23, 26] are presented in Section 6. Conclusions and a discussion of our findings are provided in Section 7.

2 Our methodology

Our goal in this study is to recover a 3D human pose from a single human depth silhouette. Figure 1 shows the key steps involved in our proposed 3D human pose recovery methodology. In the first step, a single depth image is captured by a depth camera, and the human depth silhouette is extracted by removing the background. In the second step, the human body parts of the silhouette are recognized via trained RFs. In the third step, the principal directions of the recognized body parts are estimated by PDA. Finally, these directions are mapped to the 3D synthetic human model, resulting in recovery of a 3D human body pose.

[Fig. 1  Key processing steps of our proposed system: taking the depth image, removing the background, labeling body parts, applying PDA to the body parts, and finally recovering a 3D human pose]

3 Body parts recognition

As mentioned above, to recognize body parts from a human depth silhouette, we utilize RFs as in [3, 11, 23]. This learning-based approach requires a training DB; we therefore created our own synthetic training DB [11]. More details are provided in the following sub-sections.

3.1 A synthetic DB of depth maps and corresponding body part-labeled maps

To create the training DB, we created synthetic human body models using 3Ds Max, a commercial 3D graphics package [1]. The human body model consists of a total of 31 body parts [11]. To create various poses, motion information from Carnegie Mellon University (CMU)'s motion DB [5] was mapped to the model. Finally, each depth silhouette and its corresponding body part-labeled map were saved in the DB. Our DB comprised 20,000 depth maps and corresponding body part-labeled maps. A sample of the human body model, a map of the labeled body parts, and the map's corresponding depth silhouette are shown in Fig. 2. Images in the DB had a size of 320 × 240 with 16-bit depth values.

[Fig. 2  (a) A 3D graphic human body model used in training DB generation, (b) a body part-labeled model, and (c) a depth silhouette in the synthetic DB]

3.2 Depth feature extraction

We computed depth features based on differences between neighboring pixel pairs. A depth feature f was extracted from a pixel x of the depth silhouette as described in [13, 23]:

$$ f_\theta(I, \mathbf{x}) = d_I\!\left(\mathbf{x} + \frac{\mathbf{o}_1}{d_I(\mathbf{x})}\right) - d_I\!\left(\mathbf{x} + \frac{\mathbf{o}_2}{d_I(\mathbf{x})}\right) \tag{1} $$

where d_I(x) is the depth value at pixel x in image I, and the parameters θ = (o_1, o_2) describe offsets o_1 and o_2 from pixel x. The maximum offset value of the (o_1, o_2) pairs was 60 pixels at the distance of the subject to the camera. Normalization of the offsets by 1/d_I(x) ensured that the features were distance invariant.
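As a concrete illustration of (1), a minimal NumPy sketch of the depth-difference feature is given below. This is our own reconstruction rather than the authors' code: the function name, the (row, col) pixel convention, and the constant used for off-image or background probes are assumptions.

```python
import numpy as np

def depth_feature(depth, x, o1, o2, background=10.0):
    """Depth-difference feature of Eq. (1).

    `depth` is an H x W array of depth values in meters, `x` is a
    (row, col) pixel, and `o1`, `o2` are 2D pixel offsets. Offsets are
    divided by the depth at `x`, which makes the feature invariant to
    the subject's distance from the camera.
    """
    d_x = depth[x]

    def probe(offset):
        u = (int(x[0] + offset[0] / d_x), int(x[1] + offset[1] / d_x))
        # Probes falling outside the image or on the removed background
        # take a large constant depth, a common convention for this
        # family of features.
        if (0 <= u[0] < depth.shape[0] and 0 <= u[1] < depth.shape[1]
                and depth[u] > 0):
            return depth[u]
        return background

    return probe(o1) - probe(o2)
```

In training, each sampled pixel is described by many such features computed with randomly drawn offset pairs, as detailed next.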
3.3 RFs for body part labeling

To create the trained RFs, we used an ensemble of five decision trees with a maximum tree depth of 20. Each tree in the RFs was trained with different pixels sampled randomly from the synthetic depth silhouettes and their corresponding body part indices. A subset of 2,000 training sample pixels was drawn randomly from each synthetic depth silhouette in the DB, and for each sample pixel 2,000 candidate features were computed using (1). At each splitting node in a tree, a subset of 50 candidate features was considered. The optimal threshold for splitting a node was determined by maximizing the information gain during training, and the probability distribution over the 31 human body parts was computed at the leaf nodes of each tree.

For pixel classification, the candidate features of each pixel x of a tested depth silhouette were computed. In each tree, starting from the root node, x was passed to the left child if the value of the splitting function was less than the threshold of the node and to the right child otherwise; this traversal was performed in all of the built trees in the RFs [3, 9]. The final decision to label each depth pixel as a specific body part was based on the voting result of all trees in the RFs.

4 3D human pose recovery from recognized body parts

4.1 Joint position proposal based on the mean shift

In [23], to recover a 3D human pose, a 3D human skeleton model was used, where the joints were fitted to the recognized body parts using a mean-shift algorithm with a weighted Gaussian kernel. The mean shift algorithm is a nonparametric density estimation technique used to find the nearest mode of a point sample distribution [27]; it is commonly used in the image segmentation and object tracking fields of computer vision [6, 24]. Given n data points x_i, i = 1, ..., n, in a d-dimensional space R^d, the multivariate density obtained with a kernel K(x), kernel window radius h, and weight function w is

$$ \hat{f}_h(\mathbf{x}) = \frac{1}{nh^d} \sum_{i=1}^{n} w_i\, K\!\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right) \tag{2} $$

The sample mean with kernel G (derived from the profile of K) at point x is defined as

$$ m_h(\mathbf{x}) = \frac{\sum_{i=1}^{n} \mathbf{x}_i\, w_i\, G\!\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right)}{\sum_{i=1}^{n} w_i\, G\!\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right)} \tag{3} $$

The difference between m_h(x) and x is called the mean shift. The mean shift vector always points toward the direction of maximum increase of the density; therefore, the mean shift procedure is guaranteed to converge to a point where the gradient of the density function approaches zero. The mean shift process is illustrated in Fig. 3: starting from the data point in cyan, the procedure is iterated to find the stationary point (in red) of the density function.

[Fig. 3  Mean shift iteration process to find the centroid of a cloud]

In [23], to optimize the parameters and improve the efficiency of the mean shift, the window size h was replaced by b_c, a learned per-part bandwidth, whereas the weight w_ic was obtained from the probability distribution of each pixel for class c and the given depth d_I(x_i) as follows:

$$ w_{ic} = P(c \mid I, \mathbf{x}_i) \cdot d_I(\mathbf{x}_i)^2 \tag{4} $$
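To make the procedure in (2)–(4) concrete, the sketch below runs the weighted mean-shift iteration with a Gaussian kernel. It is a generic illustration, not the implementation of [23]; the function name, convergence tolerance, and iteration cap are our own choices, and `h` stands in for the learned per-part bandwidth b_c.

```python
import numpy as np

def mean_shift_mode(points, weights, start, h, n_iter=50, tol=1e-5):
    """Weighted mean-shift iteration of Eq. (3) with a Gaussian kernel.

    `points` is an (n, d) array, `weights` is (n,) (e.g. the w_ic of
    Eq. (4)), `start` is the initial estimate, and `h` is the bandwidth.
    """
    x = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        # Kernel responses G((x - x_i) / h) for all points at once.
        g = weights * np.exp(-0.5 * np.sum(((x - points) / h) ** 2, axis=1))
        x_new = (g[:, None] * points).sum(axis=0) / g.sum()
        if np.linalg.norm(x_new - x) < tol:  # converged to a density mode
            return x_new
        x = x_new
    return x
```

Seeded near a body part's pixels with weights from (4), the iteration drifts to the nearest density mode, which [23] takes as the joint proposal.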
To reconstruct and visualize an estimated 3D human pose using the mean shift, a skeleton model is represented by joint points estimated from the recognized body parts. This is a simple model, and the mean shift used to estimate the joint points has some limitations: the optimal window size is difficult to find, and an inappropriate window size can cause modes to merge; moreover, the position of the estimated joints depends on the shape and size of the recognized body parts and is computed only on the surface of the body parts, whereas joints are positioned inside the parts. To overcome the limitations of this approach, we propose a PDA algorithm, which we describe in Section 4.2.

4.2 Principal direction analysis of recognized body parts

In this section, instead of estimating joint positions from the recognized body parts as in [23], we focus on analyzing the principal directions of human body parts, including the torso, upper arms, lower arms, upper legs, and lower legs, each of which is formed by grouping two, three, or four successive recognized body parts. We denote the human body parts as {P^1, P^2, ..., P^M}, where M is the number of human body parts (in our work M = 9). Each human body part is represented as a 3D point cloud P^m consisting of n 3D points, P^m = {x_i}, i = 1, ..., n, where the value of n changes with the size of the body part. The 3D point clouds {P^m}, m = 1, ..., M, are used by the PDA algorithm to determine the principal direction vectors V_d^1, V_d^2, ..., V_d^M. More details of PDA are provided in the following sub-sections.

4.2.1 Outlier removal

Recognized body parts, represented as clouds of points, contain some outliers and mislabeled points. These points can hinder PDA, resulting in inaccurate directional vectors for the human body parts. For this reason, before applying PDA, we devised a technique to select only points of interest from each labeled point cloud. To select these points, we determined a weight value for all points in the cloud using a logistic function and the Mahalanobis distance. The logistic function of a population w can be written as

$$ w(t_i) = \frac{L}{1 + e^{\alpha (t_i - t_0)}} \tag{5} $$

where t_0 denotes a rough threshold value defined based on the size of the cloud of points, α is a constant, and L is the limiting value of the output (in our case L = 1). Here, t_0 and α are chosen based on the shape and size of each human body part, and t_i is the Mahalanobis distance computed at the i-th point in the cloud as follows:

$$ t_i = \sqrt{(\mathbf{x}_i - \mu)^T S^{-1} (\mathbf{x}_i - \mu)} \tag{6} $$

where x_i is the i-th point in the cloud, μ is the mean vector of the cloud, and S is the covariance matrix of the cloud, computed as

$$ S = \frac{1}{n} \sum_{i=1}^{n} (\mathbf{x}_i - \mu)(\mathbf{x}_i - \mu)^T \tag{7} $$

Our proposed approach is illustrated in Fig. 4. Selected points subject to PDA, which are used to determine the direction vector, are shown in red; points in green are regarded as outliers. The size (population) of the region containing the selected points is controlled by the threshold parameter t_0, while the parameter α controls the weight values of the points. Assuming the weight function satisfies w ∈ [0, 1], the weight of points near the centroid of the cloud (in red) is approximately 1, the weight of points far from the centroid (in green) is approximately 0, and the weight of points around the threshold value t_0 is approximately 0.5.

[Fig. 4  (a) The logistic function for different values of t_0 and α; (b) the effect of the parameters t_0 and α on thresholding 3D point clouds to eliminate outliers (t_0 = 2, α = 4)]
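Read together, (5)–(7) amount to a soft inlier mask over the cloud. The sketch below is a minimal NumPy rendering under that reading; the function name and the biased 1/n covariance of (7) are our assumptions.

```python
import numpy as np

def outlier_weights(points, t0, alpha, L=1.0):
    """Per-point weights from Eqs. (5)-(7): Mahalanobis distance to the
    cloud's centroid, squashed by a logistic function so inliers get
    weights near 1 and outliers near 0.

    `points` is an (n, 3) body-part cloud; t0 and alpha are chosen per
    body part as described in the text.
    """
    mu = points.mean(axis=0)                      # mean vector of Eq. (7)
    S = np.cov(points, rowvar=False, bias=True)   # covariance of Eq. (7)
    diff = points - mu
    # Mahalanobis distance of Eq. (6) for every point in the cloud.
    t = np.sqrt(np.einsum('ij,jk,ik->i', diff, np.linalg.inv(S), diff))
    return L / (1.0 + np.exp(alpha * (t - t0)))   # logistic of Eq. (5)
```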
4.2.2 PDA

Here we describe how we estimate the directional vector V_d^m from the point cloud P^m. First, to reduce the influence of outliers on the estimated principal direction, a weight vector is computed using the logistic function with the threshold parameter as described in (5); the weight value of each point in the cloud is determined from its Mahalanobis distance as in (6). Then, based on a statistical approach and the known weight vector, the mean vector and covariance matrix of PDA are calculated as follows:

$$ \mu^* = \frac{\sum_{i=1}^{n} w(t_i)\, \mathbf{x}_i}{\sum_{i=1}^{n} w(t_i)} \tag{8} $$

$$ S^* = \frac{\sum_{i=1}^{n} w(t_i)^2\, (\mathbf{x}_i - \mu^*)(\mathbf{x}_i - \mu^*)^T}{\sum_{i=1}^{n} w(t_i)^2 - 1} \tag{9} $$

Finally, to estimate the principal direction vector V_d^m from the cloud P^m, the following maximization over the eigenvectors {E_k}, k = 1, 2, 3, of S* is solved:

$$ V_d^m = \arg\max_{\{E_k\}_{k=1}^{3}} E_k^T S^* E_k \tag{10} $$

that is, V_d^m is the eigenvector corresponding to the largest eigenvalue of S*.

Algorithm 1  Principal Direction Analysis (PDA)
Input: a 3D point cloud P^m. Output: a principal direction vector V_d^m.
Step 1. Find the mean vector μ and the covariance matrix S of the point cloud P^m, as in (7).
Step 2. Compute the Mahalanobis distance of all points in the cloud P^m using μ and S, as described in (6).
Step 3. Assign a weight value to every point in the cloud P^m using the logistic function and the vector of computed Mahalanobis distances, as in (5).
Step 4. Compute the PDA mean vector μ* and PDA covariance matrix S* of the point cloud P^m using the assigned weight of each point, as in (8) and (9).
Step 5. Find the eigenvector corresponding to the largest eigenvalue of the covariance matrix S*, as in (10). This eigenvector is the principal direction vector V_d^m.

We apply PDA to estimate the principal direction vectors of the human body parts from their corresponding 3D point clouds. Note that a 3D point cloud P^m is represented as an n × 3 matrix, where n denotes the number of 3D points in the cloud and each 3D point consists of three coordinates in the x, y, and z dimensions. To determine the principal direction vector V_d^m of the cloud P^m, PDA starts with the covariance matrix S determined from the matrix [P^m] (n × 3) and the mean vector [μ] (1 × 3), as in (7). Using the mean vector and covariance matrix, the Mahalanobis distances of all points in the cloud are computed by (6), and (5) returns a weight vector over all points in the cloud. Based on the weight vector, the PDA covariance matrix S* and mean vector μ* are determined by (9) and (8). Finally, the principal direction vector V_d^m of the cloud P^m is estimated by (10). Details are presented in Algorithm 1. The performance of PDA on point clouds with and without outlier removal is illustrated in Fig. 5; the estimated principal directions are shown as blue lines on the clouds.

[Fig. 5  Comparison of PDA results (a) without outlier removal and (b) with outlier removal. The resultant principal directions are blue lines superimposed on the point clouds. The two 3D point clouds are an upper arm part (left, cyan) and a lower arm part (right, green) with some outliers]
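Putting Algorithm 1 together, a compact sketch of PDA is shown below, reusing the `outlier_weights` helper from the previous sketch. It is our reconstruction of (8)–(10), not the authors' code; in particular, the squared-weight normalization follows our reading of (9).

```python
import numpy as np

def principal_direction(points, t0, alpha):
    """Algorithm 1 (PDA): weight the cloud, form the weighted mean and
    covariance of Eqs. (8)-(9), and return the eigenvector of S* with
    the largest eigenvalue, Eq. (10).

    `points` is an (n, 3) body-part cloud.
    """
    w = outlier_weights(points, t0, alpha)                 # Steps 1-3
    mu_star = (w[:, None] * points).sum(axis=0) / w.sum()  # Eq. (8)
    diff = points - mu_star
    w2 = w ** 2
    # Weighted covariance of Eq. (9): per-point outer products summed.
    S_star = (w2[:, None, None]
              * np.einsum('ij,ik->ijk', diff, diff)).sum(axis=0) \
             / (w2.sum() - 1.0)                            # Step 4
    # S* is symmetric, so eigh applies; eigenvalues come out ascending,
    # so the last eigenvector is the principal direction (Step 5).
    eigvals, eigvecs = np.linalg.eigh(S_star)
    return eigvecs[:, -1]
```

With the Fig. 4 setting, one would call `principal_direction(cloud, t0=2.0, alpha=4.0)` on each body-part cloud and map the returned vector onto the corresponding segment of the model described in Section 5.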
5 3D human pose representation

To represent a recovered 3D human pose, we utilized a 3D synthetic human model built from a set of super-quadrics. The joints of the model are connected by a kinematic chain and parameterized with rotational angles at each joint [25, 26]. Our 3D synthetic human body model is defined in 4-D projective space as

$$ m_e(X) = X^T V_\theta^T Q^T D\, Q\, V_\theta\, X - 1 = 0 \tag{11} $$

where X is the coordinate of a 3D point on the surface of the super-quadric, D is a diagonal matrix containing the size of the super-quadric, Q locates the center of the super-quadric in the local coordinate system, and V_θ is a matrix containing the relative kinematic parameters computed from the directional vectors V_d. Our model comprises 10 human body parts (head, torso, left and right upper and lower arms, and left and right upper and lower legs) and nine joints (two knees, two hips, two elbows, two shoulders, and one neck). There were a total of 24 DOFs, including two DOFs at each joint and six free transformations from the global coordinate system to the local coordinate system at the hip, as shown in Fig. 6. In Fig. 6a, the dashed lines and arrows superimposed on the model show the results of PDA, while the corresponding recovered 3D human pose and the 3D model of super-quadrics are shown in Fig. 6b.

[Fig. 6  3D synthetic human model: (a) orientation model and (b) 3D model with super-quadric shapes]

6 Experimental results

In this section, we evaluate our proposed methodology through quantitative and qualitative assessments using synthetic and real data, as well as through comparison with previous approaches [23, 26].

6.1 Experimental set-ups

To assess our approach quantitatively, we utilized synthetic depth silhouettes and ground-truth information extracted from synthetic 3D body poses. For each synthetic 3D human pose, we measured the joint angles of four major joints, the left and right elbows and knees, from the 3D human body model and saved these as the ground truth. Then, for each corresponding depth silhouette, body parts were recognized via the trained RFs, principal directions were estimated by PDA, and these directions were mapped onto the 3D human body model, resulting in recovery of the 3D human body pose. Finally, we derived the same joint angles from the recovered 3D pose and compared them to the ground truth. For qualitative assessment on real data, we utilized depth silhouettes captured by a depth camera [17] and performed visual inspection between the recovered 3D human poses and the corresponding RGB images. Pose recovery was performed on a standard desktop PC with an Intel Core i5, 3.4 GHz CPU.

6.2 Experimental results with synthetic data

We performed a quantitative evaluation using a series of 500 depth silhouettes containing various unconstrained movements. Evaluation results for the synthetic poses are shown in Figs. 7 and 8. Each plot in Fig. 8 corresponds to a joint angle estimated by PDA; solid and dashed lines indicate the PDA-estimated and the corresponding ground-truth joint angles, respectively. We computed the average reconstruction error between the estimated and ground-truth joint angles as

$$ \bar{\theta} = \frac{\sum_{i=1}^{n_f} \left| \theta_i^{est} - \theta_i^{grd} \right|}{n_f} \tag{12} $$

where n_f is the number of frames, i is the frame index, θ_i^grd is the ground-truth angle, and θ_i^est is the estimated angle. To assess the reconstruction errors, we then evaluated four different sequences of swimming, boxing, cleaning, and dancing activities, each containing 100 frames. The average errors at the four considered joint angles in this second experiment are given in Table 1. The average reconstruction error of the four sequences at the four considered joint angles was 7.07 degrees.

[Fig. 7  Sample results of our proposed 3D human pose estimation on synthetic data. The 1st and 3rd rows show the synthetic depth maps; the 2nd and 4th rows show the estimated 3D human poses]

[Fig. 8  Comparison of the ground-truth and estimated joint angles on synthetic data: (a) left elbow, (b) right elbow, (c) left knee, and (d) right knee]

6.3 Experimental results with real data

To evaluate real data, we asked three subjects to perform unconstrained movements, and two experiments were performed. In the first experiment, we examined principal direction estimation using PDA for one subject; Figure 9 shows the results, with the principal directions shown as lines superimposed on the subject's poses. In the second experiment, we assessed movements of the elbows and knees with arbitrary poses (some simple and some complex).
Table 1  Average reconstruction error of the evaluated joint angles (degrees), based on 100 frames of each activity

Activity                            Left elbow   Right elbow   Left knee   Right knee
Swimming                            5.11         5.12          8.34        8.67
Boxing                              6.78         6.57          8.12        9.24
Cleaning                            5.45         5.62          7.56        7.67
Dancing                             5.42         5.19          8.86        9.34
Average reconstruction error (°)    5.69         5.63          8.22        8.73

[Fig. 9  Sample results of PDA. Blue lines indicate the directions of the four body parts of the upper arms and legs; red lines indicate the directions of the lower arms and legs]

The experimental results for the arm and leg movements of the first subject are shown in Fig. 10; the 2nd and 3rd rows show the 3D human poses reconstructed from the front and side views, respectively. Because ground-truth joint angles are not available for the real data, we only performed qualitative assessments by visually inspecting the results in the 2nd and 3rd rows against the RGB images in the 1st row. Figure 11 shows the qualitative assessments of two other subjects whose body sizes and shapes differ from the first subject's and from each other's.

[Fig. 10  Sample results of our proposed 3D human pose estimation for four different arm and leg movements. The 1st row shows RGB images of four different poses, while the 2nd and 3rd rows show the estimated 3D human pose results from the front and side views, respectively]

[Fig. 11  Sample results of our proposed 3D human pose estimation for four different poses of differently-shaped subjects. RGB images are shown in the 1st row, and the estimated 3D human poses of the two subjects are shown in the 2nd row]

6.4 Comparisons to the conventional methods

We evaluated our proposed methodology by comparing its performance with those of the conventional methods [23, 26]. For comparison with the mean shift method [23], we implemented the real-time human pose recognition system as described in [23], using our synthetic DB to train its RFs, and evaluated it through quantitative and qualitative assessments on the same synthetic and real data. Quantitative assessments on the same tested synthetic data are presented in Table 2: our approach resulted in an average reconstruction error at the four considered joint angles of 7.07 degrees, compared to 9.79 degrees for the mean shift method. We also performed a qualitative assessment on the same real data, with the recovered 3D human poses represented on the same 3D synthetic pose model, as shown in Fig. 12. As can be seen in Fig. 12, our proposed methodology significantly improved accuracy compared with pose reconstruction based on the mean shift method; in particular, our method was more robust for poses involving overlapped or intersecting human body parts. In addition, our system utilized a 3D synthetic human model created from a set of super-quadrics connected with defined DOFs at each joint for recovering 3D human poses. Our system can therefore be used in real time for practical applications.

For comparison with the EM method [26], we used the average reconstruction errors computed from the four experiments reported in [26]. The average reconstruction errors of the EM method for the left and right elbows and knees were 7.50, 7.63, 8.03, and 13.81 degrees, compared to 5.69, 5.63, 8.22, and 8.73 degrees for our proposed method, as shown in Table 2.
Table 2  Comparison of the average reconstruction errors (degrees)

Evaluated angles                 Method in [26]   Method in [23]   Our proposed method
Left elbow                       7.50             9.24             5.69
Right elbow                      7.61             9.41             5.63
Left knee                        8.03             10.15            8.22
Right knee                       13.81            10.34            8.73
Average error of the four joints 9.24             9.79             7.07

[Fig. 12  Comparison of our approach versus that outlined in [23] for four different poses. The 1st row shows RGB images, the 2nd row shows depth silhouettes, the 3rd row shows the results obtained from the mean shift algorithm, and the 4th row shows the results obtained using our proposed PDA algorithm]

7 Conclusions and discussion

We have developed a novel method to recover a 3D human pose from a single depth silhouette. Our method estimates principal direction vectors from the recognized body parts using PDA, which is novel and effective for recovering 3D human poses. Quantitative assessments revealed that our method had an average reconstruction error of only 7.07 degrees for four key joint angles, whereas the mean shift and EM methods had average reconstruction errors of 9.79 and 9.24 degrees, respectively. Our algorithm runs at a speed of 20 FPS on a standard PC, which indicates that our system is suitable for real-time human activity recognition and human computer interaction applications such as personal life-care and health-care services. On real data, we demonstrated that our system performs reliably on sequences of unconstrained movements of subjects with different appearances and shapes.

Acknowledgments  This research was supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2013-(H0301-13-2001)). This work was also supported by the Industrial Strategic Technology Development Program (10035348, Development of a Cognitive Planning and Learning Model for Mobile Platforms) funded by the Ministry of Knowledge Economy (MKE, Korea).

References

1. Autodesk 3Ds MAX (2012)
2. Baak A, Müller M, Bharaj G, Seidel HP, Theobalt C (2011) Data-driven approach for real-time full body pose reconstruction from a depth camera. In: Proceedings of the 2011 International Conference on Computer Vision, pp 1092–1099
3. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
4. Chen L, Wei H, Ferryman J (2013) A survey of human motion analysis using depth imagery. Pattern Recognit Lett 34(15):1995–2006
5. CMU motion capture database. http://mocap.cs.cmu.edu
6. Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619
7. Ganapathi V, Plagemann C, Koller D, Thrun S (2010) Real time motion capture using a single time-of-flight camera. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 755–762
8. Ganapathi V, Plagemann C, Koller D, Thrun S (2012) Real-time human pose tracking from range data. In: Proceedings of the 12th European Conference on Computer Vision, pp 738–751
9. Hastie T, Tibshirani R, Friedman J (2008) The elements of statistical learning. Springer, New York
10. Holt B, Ong EJ, Bowden R (2013) Accurate static pose estimation combining direct regression and geodesic extrema. In: IEEE International Conference and Workshops on Automatic Face and Gesture Recognition
11. Jalal A, Sharif N, Kim JT, Kim TS (2013) Human activity recognition via recognized body parts of human depth silhouettes for residents monitoring services at smart home. J Indoor Built Environ 22:271–279
12. Jiu M, Wolf C, Taylor G, Baskurt A (2013) Human body part estimation from depth images via spatially-constrained deep learning. Pattern Recognit Lett
13. Lepetit V, Lagger P, Fua P (2005) Randomized trees for real-time keypoint recognition. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 775–781
14. Moeslund TB, Hilton A, Krüger V (2006) A survey of advances in vision-based human motion capture and analysis. Comput Vis Image Underst 104(2):90–126
15. Plagemann C, Ganapathi V, Koller D, Thrun S (2010) Real-time identification and localization of body parts from depth images. In: IEEE International Conference on Robotics and Automation (ICRA), pp 3108–3113
16. Poppe R (2007) Vision-based human motion analysis: an overview. Comput Vis Image Underst 108(1–2):4–18
17. PrimeSense Ltd. http://www.primesense.com
18. Rosenhahn B, Kersting UG, Smith AW, Gurney JK, Brox T, Klette R (2005) A system for marker-less human motion estimation. Lect Notes Comput Sci 3663:230–237
19. Rosenhahn B, Schmaltz C, Brox T, Weickert J, Cremers D, Seidel HP (2008) Markerless motion capture of man-machine interaction. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 23–28
20. Schwarz LA, Mkhitaryan A, Mateus D, Navab N (2011) Estimating human 3D pose from time-of-flight images based on geodesic distances and optical flow. In: IEEE Conference on Automatic Face and Gesture Recognition, pp 700–706
21. Schwarz LA, Mkhitaryan A, Mateus D, Navab N (2012) Human skeleton tracking from depth data using geodesic distances and optical flow. Image Vis Comput 30(3):217–226
22. Shen J, Yang W, Liao Q (2013) Part template: 3D representation for multiview human pose estimation. Pattern Recognit 46(7):1920–1932
23. Shotton J, Fitzgibbon A, Cook M, Sharp T, Finocchio M, Moore R, Kipman A, Blake A (2011) Real-time human pose recognition in parts from single depth images. In: Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, pp 1297–1304
24. Shuang Z, Yu-ping Q, Hao D, Gang J (2012) Analyzing of mean-shift algorithm in extended target tracking technology. Lect Notes Electr Eng 144:161–166
25. Sundaresan A, Chellappa R (2008) Model-driven segmentation of articulating humans in Laplacian eigenspace. IEEE Trans Pattern Anal Mach Intell 30(10):1771–1785
26. Thang ND, Kim TS, Lee YK, Lee SY (2011) Estimation of 3-D human body posture via co-registration of 3-D human model and sequential stereo information. Appl Intell 35(2):163–177
27. Vilaplana V, Marques F (2008) Region-based mean shift tracking: application to face tracking. In: IEEE International Conference on Image Processing, pp 2712–2715
28. Yuan ZH, Lu T (2013) Incremental 3D reconstruction using Bayesian learning. Appl Intell 39(4):761–771

Dong-Luong Dinh received his B.S. and M.S. degrees in Information and Communication Technology from Ha Noi University of Science and Technology, Vietnam, in 2001 and 2009, respectively. In 2001, he was a Lecturer with the Department of Information Technology, Nha Trang University, Vietnam. He is currently working toward his Ph.D. degree in the Department of Computer Engineering, Kyung Hee University, South Korea. His research interests include computer vision, machine learning, human computer interaction, and human motion analysis and estimation.

Myeong-Jun Lim received his B.S. degree in Biomedical Engineering from Kyung Hee University, South Korea. He is currently working toward his M.S. degree in the Department of
Biomedical Engineering at Kyung Hee University, Republic of Korea. His research interests include image processing, pattern recognition, artificial intelligence, and computer vision.

Nguyen Duc Thang received his B.E. degree in Computer Engineering from the Posts and Telecommunications Institute of Technology, Vietnam, in 2005, and his Ph.D. degree from the Department of Computer Engineering at Kyung Hee University, South Korea, in 2011. He is currently a lecturer in the Department of Biomedical Engineering at International University, National University Ho Chi Minh, Vietnam. His research interests include biomedical image and signal processing, computer vision, and machine learning.

Sungyoung Lee received his B.S. from Korea University, Seoul, South Korea, in 1978, and his M.S. and Ph.D. degrees in Computer Science from the Illinois Institute of Technology (IIT), Chicago, Illinois, USA, in 1987 and 1991, respectively. He has been a professor in the Department of Computer Engineering, Kyung Hee University, Korea, since 1993. He is a founding director of the Ubiquitous Computing Laboratory and has been director of the Neo Medicinal ubiquitous-Life Care Information Technology Research Center, Kyung Hee University, since 2006. Before joining Kyung Hee University, he was an assistant professor in the Department of Computer Science, Governors State University, Illinois, USA, from 1992 to 1993. His current research focuses on ubiquitous computing, cloud computing, intelligent computing, context-aware computing, WSN, embedded real-time and cyber-physical systems, and eHealth. He has authored or coauthored more than 405 technical articles (130 of which are published in archival journals). He is a member of the ACM and IEEE.

Tae-Seong Kim received the B.S. degree in Biomedical Engineering from the University of Southern California (USC) in 1991, M.S. degrees in Biomedical and Electrical Engineering from USC in 1993 and 1998, respectively, and a Ph.D. in Biomedical Engineering from USC in 1999. After his postdoctoral work in cognitive sciences at the University of California, Irvine, in 2000, he joined the Alfred E. Mann Institute for Biomedical Engineering and the Department of Biomedical Engineering at USC as a Research Scientist and Research Assistant Professor. In 2004, he moved to Kyung Hee University in Korea, where he is currently a Professor in the Biomedical Engineering Department. His research interests have spanned various areas of biomedical imaging, including Magnetic Resonance Imaging (MRI), functional MRI, E/MEG imaging, DT-MRI, transmission ultrasonic CT, and Magnetic Resonance Electrical Impedance Imaging. Lately, he has started research work in proactive computing at the u-Lifecare Research Center, where he serves as Vice Director. Dr. Kim has published more than 90 peer-reviewed papers and 150 proceedings and international book chapters, and holds 10 international and domestic patents. He was a member of IEEE from 2005 to 2013, is a member of KOSOMBE and Tau Beta Pi, and is listed in Who's Who in the World '09–'13 and Who's Who in Science and Engineering '11–'12.
