Human Pose Estimation: OpenPose vs. HRNet

Jul 13, 2020

Human Pose Estimation refers to the process of inferring the poses of humans in an image or video by locating their body joints. It is one of the most active areas of research in computer vision because of its diverse applications:

Gait/Posture Analysis
Activity recognition
Animation/Gaming
Virtual/Augmented reality
Surveillance

Due to recent advances in deep learning, numerous deep-learning-based pose estimation models have emerged in the last few years. In this article, we compare the performance of 2 models: OpenPose and HRNet.

OpenPose is one of the most famous real-time multi-person pose estimation methods. It is based on a bottom-up approach: it starts by detecting all body parts in an image and then groups the detected parts into individual persons.

HRNet, on the other hand, is based on a top-down approach: it starts with a person detector and then estimates the body parts of each detected person. HRNet is one of the most recent models and, as per the HRNet paper, has outperformed existing methods (including OpenPose) on two benchmark data-sets: COCO and MPII human pose. Unlike most other methods, HRNet maintains a high-resolution representation throughout the process, which is the key reason behind its superior performance.
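The difference between the two approaches is mostly one of control flow. The sketch below illustrates it with placeholder callables; the function names and the toy stand-ins are assumptions made for illustration and are not part of either OpenPose or HRNet.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Pose = Dict[str, Point]            # joint name -> (x, y)
Box = Tuple[int, int, int, int]    # x, y, width, height
Part = Tuple[str, Point]           # (joint name, location)


def bottom_up(image,
              detect_parts: Callable[[object], List[Part]],
              group_parts: Callable[[List[Part]], List[Pose]]) -> List[Pose]:
    """Detect every body part in the whole image, then group parts into people."""
    parts = detect_parts(image)
    return group_parts(parts)


def top_down(image,
             detect_people: Callable[[object], List[Box]],
             estimate_pose: Callable[[object, Box], Pose]) -> List[Pose]:
    """Detect people first, then estimate one pose per detected person."""
    return [estimate_pose(image, box) for box in detect_people(image)]


# Hypothetical toy stand-ins so the sketch runs; real models would replace these.
def toy_detect_parts(image) -> List[Part]:
    return [("nose", (10.0, 5.0)), ("nose", (50.0, 6.0))]


def toy_group_parts(parts: List[Part]) -> List[Pose]:
    # Pretend every detected nose belongs to a different person.
    return [{name: xy} for name, xy in parts]


def toy_detect_people(image) -> List[Box]:
    return [(0, 0, 20, 40), (40, 0, 20, 40)]


def toy_estimate_pose(image, box: Box) -> Pose:
    x, y, w, h = box
    return {"nose": (x + w / 2, y + h / 8)}


bottom_up_poses = bottom_up(None, toy_detect_parts, toy_group_parts)
top_down_poses = top_down(None, toy_detect_people, toy_estimate_pose)
```

One consequence is visible in the structure alone: the top-down loop calls the pose estimator once per detected person, so its cost grows with the number of people in the frame, while the bottom-up path processes the image once before grouping.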

We evaluated OpenPose and HRNet on 3 carefully selected videos:

The 1st video has high contrast between the dancing lady and the background.
The 2nd video is a bit challenging, as there is low contrast between the lady and the background (the color of the lady’s clothes is similar to the background, creating a partial camouflage situation).
The 3rd video is even more challenging, as the boy performs dance moves upside down, which tests the performance of the algorithms on non-upright poses.

Dancing Lady High Contrast

Dancing Lady Low Contrast

Boy Dancing Upside Down

Code and Computing Infrastructure

All comparisons and results below were obtained using this implementation of OpenPose and this implementation of HRNet. Performance was compared on a CPU system (Intel Xeon CPU @2.30 GHz, 8 GB RAM) as well as a GPU system (Tesla K80, 12 GB RAM).
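The article does not describe how the frames-per-second figures were measured, so the following is only a minimal sketch of one common way to do it: time a loop over the frames and divide the frame count by the elapsed time. The names `measure_fps` and `fake_inference` are hypothetical, and the sleep merely stands in for a model's forward pass.

```python
import time
from typing import Callable, Iterable


def measure_fps(process_frame: Callable[[object], object],
                frames: Iterable[object]) -> float:
    """Return frames per second for running process_frame over all frames."""
    count = 0
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    if count == 0 or elapsed <= 0:
        raise ValueError("need at least one frame and a measurable duration")
    return count / elapsed


# Hypothetical stand-in for pose inference: roughly 10 ms per frame.
def fake_inference(frame) -> None:
    time.sleep(0.01)


fps = measure_fps(fake_inference, range(20))
print(f"{fps:.1f} FPS")
```

With GPU frameworks that execute asynchronously, the device usually has to be synchronized before the timer is stopped, otherwise the measured time can be too short.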

We used the following pre-trained models for evaluation:

HRNet

Architecture: POSE_HRNET_W32
Input Size: 384 x 288
# Parameters: 28.5M
Model File Size: 112 MB

OpenPose

Architecture: POSE_ITER_584000
# Parameters: 26.2M
Model File Size: 102 MB
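As a quick sanity check on the figures above, the parameter counts can be converted into an approximate weight size. This sketch assumes the weights are stored as 32-bit floats (4 bytes each) and that MB means 10**6 bytes; neither is stated in the article, and the file may also contain a small amount of metadata.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights are stored as 32-bit floats

models = {
    "HRNet (POSE_HRNET_W32)": {"params_millions": 28.5, "file_mb": 112},
    "OpenPose (POSE_ITER_584000)": {"params_millions": 26.2, "file_mb": 102},
}


def float32_size_mb(params_millions: float) -> float:
    """Size of the weights alone in MB (10**6 bytes), at 4 bytes per parameter."""
    return params_millions * BYTES_PER_FLOAT32


ratios = {}
for name, info in models.items():
    estimate = float32_size_mb(info["params_millions"])
    ratios[name] = info["file_mb"] / estimate
    print(f"{name}: ~{estimate:.1f} MB estimated, "
          f"{info['file_mb']} MB reported (ratio {ratios[name]:.2f})")
```

Under these assumptions the reported file sizes are within a few percent of the estimates, which is consistent with both models storing their weights in single precision.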

Here are the results of running OpenPose on the above 3 videos:

OpenPose — Dancing Lady High Contrast

OpenPose — Dancing Lady Low Contrast

OpenPose — Boy Dancing Upside Down

Here are the results of running HRNet on the above 3 videos:

HRNet — Dancing Lady High Contrast

HRNet — Dancing Lady Low Contrast

HRNet — Boy Dancing Upside Down

For a clearer comparison, we now overlay the results from both HRNet and OpenPose on the same video:

HRNet (Red) vs OpenPose (Blue) — Dancing Lady High Contrast

HRNet (Red) vs OpenPose (Blue) — Dancing Lady Low Contrast

HRNet (Green) vs. OpenPose (Blue) — Boy Dancing Upside Down

Here is a quick comparison:

Key points: The HRNet model currently supports detection of only 18 key points, and foot key points are not available. OpenPose supports up to 135 key points (including hands, face and feet), though we used the 25-key-point model, which includes foot key points.
Invisible joints: OpenPose returns missing values when joints are not visible, whereas HRNet still tries to estimate their location. This may not be obvious in the videos above, but it is noticeable when stepping through individual frames of the 2nd video (when the lady rotates).
Accuracy: Both models perform similarly well on the first 2 videos, but performance degrades in the 3rd video. This is likely because the COCO training data contains too few images of non-upright poses. As per the HRNet paper, their best model achieves an mAP of 77.0 on the COCO data-set, compared to 61.8 for OpenPose. So HRNet is the winner in terms of accuracy (15.2 points higher mAP, or about 24.6% in relative terms).
Speed: We compared FPS for HRNet and OpenPose on GPU (Tesla K80, 12 GB RAM) and CPU (Intel Xeon CPU @2.30 GHz, 8 GB RAM). OpenPose achieves 0.0250 to 0.0325 FPS on CPU and 3.16 to 3.56 FPS on GPU, whereas HRNet achieves 0.040 to 0.060 FPS on CPU and 0.154 to 0.162 FPS on GPU (depending on the complexity of the video frame). On CPU, HRNet is therefore somewhat faster, and we need to examine further why OpenPose performs so poorly there. On GPU, however, OpenPose is the clear winner, running roughly 20 to 22 times faster.
Model size: HRNet has 28.5M parameters and a model file size of 112 MB, while OpenPose has 26.2M parameters and a model file size of 102 MB.
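The relative figures in this comparison follow directly from the reported numbers. The short sketch below reproduces the arithmetic. Pairing the low end of one FPS range with the low end of the other (and likewise for the high ends) is an assumption made here for illustration, since per-video figures are not given.

```python
def relative_gain_percent(new: float, baseline: float) -> float:
    """Relative improvement of new over baseline, in percent."""
    return (new - baseline) / baseline * 100.0


# mAP on COCO, as quoted from the HRNet paper.
hrnet_map, openpose_map = 77.0, 61.8
map_points = hrnet_map - openpose_map
map_gain = relative_gain_percent(hrnet_map, openpose_map)

# Reported FPS ranges as (low, high).
openpose_gpu, hrnet_gpu = (3.16, 3.56), (0.154, 0.162)
openpose_cpu, hrnet_cpu = (0.0250, 0.0325), (0.040, 0.060)

# Assumption: compare low end with low end and high end with high end.
gpu_speedup = tuple(o / h for o, h in zip(openpose_gpu, hrnet_gpu))  # OpenPose over HRNet
cpu_speedup = tuple(h / o for o, h in zip(openpose_cpu, hrnet_cpu))  # HRNet over OpenPose

print(f"mAP: +{map_points:.1f} points, +{map_gain:.1f}% relative")
print(f"GPU: OpenPose is {gpu_speedup[0]:.1f}x to {gpu_speedup[1]:.1f}x faster")
print(f"CPU: HRNet is {cpu_speedup[0]:.2f}x to {cpu_speedup[1]:.2f}x faster")
```

Note the difference between the absolute gap in mAP (in points) and the relative gain (in percent); the two are easy to confuse when comparing benchmark results.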

This article was written by Kartik Wason as part of a collaboration between DView, Zovaco and Carnot Research. DView provides Artificial Intelligence applied research and development services in the fields of Computer Vision, Natural Language Processing and Anomaly Detection. Contact us at: hello@dview.ai.

Written by Kartik Wason, Founder, Zovaco
