Generalizing to Unseen Head Poses in Facial Expression Recognition and Action Unit Intensity Estimation
Facial expression analysis is challenged by the numerous degrees of freedom in head pose, identity, illumination, occlusion, and the expression itself. Densely covering this enormous space with training data for a universal, well-performing expression recognition system currently seems hardly possible. In this paper we address the sub-challenge of generalizing to head poses that were not seen in the training data, with the aim of making do with sparse coverage of the pose subspace. For this purpose we (1) propose a novel face normalization method called Fan-C that massively reduces pose-induced image variance; (2) compare the impact of the proposed and other normalization methods on (a) action unit intensity estimation with the FERA 2017 challenge data (achieving a new state of the art) and (b) facial expression recognition with the Multi-PIE dataset; and (3) discuss the head pose distribution needed to train a pose-invariant CNN-based recognition system. The proposed Fan-C method normalizes pose and facial proportions while retaining expression information and runs in less than 2 ms. When a CNN is trained on the output images of Fan-C and of other normalization methods, Fan-C generalizes significantly better than the others to unseen poses that deviate by more than 20° from the poses available during training.
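
To make the "more than 20° from the training poses" criterion concrete, the following minimal sketch partitions test poses by their angular distance to the nearest training pose. It is an illustration only, not the evaluation code used in the paper: it assumes pose is described by a single yaw angle in degrees, and the function names and example angles are hypothetical.

```python
def min_pose_gap(test_yaw, train_yaws):
    """Smallest absolute yaw difference (degrees) between a test pose
    and any pose available during training."""
    return min(abs(test_yaw - t) for t in train_yaws)


def split_by_pose_gap(test_yaws, train_yaws, threshold_deg=20.0):
    """Partition test poses into 'near' (gap <= threshold) and
    'far' (gap > threshold) relative to the training poses."""
    near, far = [], []
    for yaw in test_yaws:
        if min_pose_gap(yaw, train_yaws) > threshold_deg:
            far.append(yaw)
        else:
            near.append(yaw)
    return near, far


# Hypothetical example: training on frontal faces only (yaw = 0 degrees)
# and testing on yaw angles spaced 15 degrees apart.
train_yaws = [0.0]
test_yaws = [-45.0, -30.0, -15.0, 0.0, 15.0, 30.0, 45.0]
near, far = split_by_pose_gap(test_yaws, train_yaws)
```

Under these assumptions, the "far" group holds the unseen poses for which the abstract reports the clearest advantage of Fan-C; recognition performance would be reported separately for the two groups.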