Clinical Evidence
Pharynx AI

Link to paper

Automatic segmentation of the pharyngeal airway space with convolutional neural network

Sohaib Shujaat a, Omid Jazil a, Holger Willems b, Adriaan Van Gerven b, Eman Shaheen a, Constantinus Politis a, Reinhilde Jacobs a c

a OMFS IMPATH Research Group, Department of Imaging & Pathology, Faculty of Medicine, KU Leuven & Oral and Maxillofacial Surgery, University Hospitals Leuven, Kapucijnenvoer 33, Leuven, Belgium

b Relu, R&D, Leuven, Belgium

c Department of Dental Medicine, Karolinska Institutet, Stockholm, Sweden



This study proposed and investigated the performance of a deep learning-based three-dimensional (3D) convolutional neural network (CNN) model for automatic segmentation of the pharyngeal airway space (PAS).


A dataset of 103 computed tomography (CT) and cone-beam CT (CBCT) scans was acquired from a database of orthognathic surgery patients. The acquisition devices consisted of one CT scanner (128-slice multi-slice spiral CT, Siemens Somatom Definition Flash, Siemens AG, Erlangen, Germany) and two CBCT devices (Promax 3D Max, Planmeca, Helsinki, Finland; Newtom VGi evo, Cefla, Imola, Italy) with different scanning parameters. A 3D CNN-based model (3D U-Net) was built for automatic segmentation of the PAS. The complete CT/CBCT dataset was split into three sets: a training set (n = 48) for training the model on ground-truth, observer-based manual segmentations; a test set (n = 25) for measuring the final performance of the model; and a validation set (n = 30) for evaluating the model's performance against observer-based segmentation.
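The 48/25/30 split described above can be sketched as a random partition of the 103 scan identifiers. The paper does not state how the split was made, so the seeded shuffle below (and the function name `split_dataset`) is purely illustrative:

```python
import random

def split_dataset(scan_ids, n_train=48, n_test=25, n_val=30, seed=42):
    """Randomly partition scan identifiers into training, test and validation sets.

    The counts default to the 48/25/30 split reported in the study; the
    seeded shuffle is an assumption for reproducibility, not the authors' method.
    """
    assert len(scan_ids) == n_train + n_test + n_val, "counts must sum to the dataset size"
    ids = list(scan_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for a fixed seed
    return (ids[:n_train],
            ids[n_train:n_train + n_test],
            ids[n_train + n_test:])
```

Each scan appears in exactly one subset, so no patient leaks from training into evaluation.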


The CNN model identified the segmented region with optimal precision (0.97 ± 0.01) and recall (0.96 ± 0.03). The maximal difference between the automatic segmentation and the ground truth, based on the 95% Hausdorff distance, was 0.98 ± 0.74 mm. A Dice score of 0.97 ± 0.02 confirmed the high similarity of the segmented region to the ground truth, and the intersection over union (IoU) was also high (0.93 ± 0.03). Among the acquisition devices, the Newtom VGi evo CBCT showed better performance than the Promax 3D Max and the CT device.
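The overlap metrics reported above are all derived from the voxel-wise confusion counts between the predicted and ground-truth binary masks. A minimal NumPy sketch (the function name is ours, not the authors'; the 95% Hausdorff distance additionally requires surface distance transforms and is omitted here):

```python
import numpy as np

def overlap_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Voxel-wise overlap metrics between a predicted and a ground-truth binary mask."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()    # voxels labelled airway in both masks
    fp = np.logical_and(pred, ~truth).sum()   # predicted airway, but background in truth
    fn = np.logical_and(~pred, truth).sum()   # missed airway voxels
    return {
        "precision": tp / (tp + fp),
        "recall":    tp / (tp + fn),
        "dice":      2 * tp / (2 * tp + fp + fn),
        "iou":       tp / (tp + fp + fn),
    }
```

Note that Dice and IoU are monotonically related (Dice = 2·IoU / (1 + IoU)), which is why the study's Dice of 0.97 and IoU of 0.93 are mutually consistent.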


The proposed 3D U-Net model offered an accurate and time-efficient method for segmentation of the PAS from CT/CBCT images.

Clinical significance

The proposed method can allow clinicians to accurately and efficiently diagnose, plan treatment for, and follow up patients with dento-skeletal deformities and obstructive sleep apnea, conditions which may influence the upper airway space, thereby further improving patient care.


The pharyngeal airway space (PAS), also referred to as the upper airway, is a complex anatomical region with an intimate relationship to the surrounding skeletal and soft tissue structures. It is mainly responsible for functions such as respiration, phonation and swallowing [1]. PAS assessment has been an area of interest for clinicians since several studies confirmed its relationship with craniofacial growth and development [2, 3, 4, 5]. Previously, two-dimensional (2D) lateral cephalometry was applied for assessing airway changes at the stages of diagnosis, treatment planning and follow-up in patients with dento-facial and skeletal abnormalities [6,7]. However, owing to the inherent limitations of 2D techniques, these have been widely replaced by computed tomography (CT), cone-beam CT (CBCT) and magnetic resonance imaging (MRI) as the clinical standard for analyzing PAS volume and dimensional changes, allowing a better understanding of its pathophysiology [8, 9, 10].

The first and most vital step in analyzing PAS volume is segmentation, which enables delineation and separation of the airway space from the rest of the scan, thereby allowing 3D visualization and quantification. Over the past decade, various upper airway segmentation tools and algorithms have been developed, which are manual, semi-automatic or automatic in nature [11]. Although manual segmentation offers the most accurate replication of the anatomical structure and is considered the gold standard, it is time-consuming and labor-intensive. Various threshold-based semi-automatic software packages have also been validated for volumetric assessment, where the user defines a volume of interest (VOI) and the software automatically combines the gray threshold values in that region without adequately accounting for image intensity and anatomical variation, so that manual post-processing is required for corrections [12]. Similarly, some studies have suggested a fixed threshold value for segmenting the PAS [13,14], which may vary depending on the CBCT device, scanning parameters, machine calibration and the noise caused by patient movement or metal artefacts [15,16]. Studies have also proposed fully automatic, sophisticated and hybrid image segmentation algorithms for PAS segmentation. However, these offer limited value owing to insufficient accuracy, fixed thresholding, localization of seed points through manual intervention, manual VOI selection, dependency on image orientation, or failure of the algorithm across variable scanning parameters [12,17,18,19].
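The fixed-threshold approach criticized above amounts to a single intensity cutoff applied to the whole volume. A minimal sketch, assuming a hypothetical cutoff of -400 HU (air is around -1000 HU on a calibrated CT; the specific value is illustrative, not from the cited studies):

```python
import numpy as np

def threshold_airway(volume: np.ndarray, hu_threshold: float = -400.0) -> np.ndarray:
    """Label voxels darker than a fixed intensity cutoff as airway candidates.

    On calibrated CT, air sits near -1000 HU, so a cutoff like -400 HU roughly
    separates air from soft tissue. On CBCT, however, gray values are not true
    Hounsfield units and drift with the device, scanning parameters and
    artefacts, which is exactly why a single fixed threshold generalizes poorly.
    """
    return volume < hu_threshold  # boolean mask of airway-candidate voxels
```

Because the cutoff is global, any device-dependent intensity shift moves voxels across the boundary wholesale, forcing the manual post-processing described above.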

Recently, deep learning convolutional neural networks (CNNs), multilayer structure-learning algorithms, have gained much attention in the dentomaxillofacial field. They process data in a manner that mimics brain functioning through neural networks and automatically learn complex patterns [20], thereby providing accurate voxel-wise segmentation. CNNs have been successfully applied to segment medical image data from different acquisition devices [21]. Nevertheless, evidence is lacking on automatic CT/CBCT-based segmentation of the PAS with a deep learning-based CNN model. This study lays the groundwork for overcoming the shortcomings of previous methods and for potential future investigations. Therefore, the aim of the study was to propose and investigate the performance of a deep learning-based 3D CNN model for PAS segmentation from CT/CBCT images.


Materials and methods

This study was conducted in compliance with the World Medical Association Declaration of Helsinki on medical research. Ethical approval was obtained from the Local Ethical Review Board and all data were anonymized.

Statistical analysis

The data were analyzed with IBM SPSS Statistics for Windows, version 21.0 (IBM Corp., Armonk, NY, USA). Means and standard deviations were calculated for each evaluation metric and for the error of manual segmentation. The intra-class correlation coefficient (ICC) was applied at a 95% confidence interval to calculate the inter- and intra-observer reliability of the manual segmentation. The performance metrics for each imaging modality were compared using one-way ANOVA followed by Tukey's multiple comparisons test.
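The per-modality comparison described above (one-way ANOVA across the three devices) can be sketched with SciPy in place of SPSS. The Dice values below are hypothetical, invented for illustration, not the study's data:

```python
from scipy.stats import f_oneway

# Hypothetical per-scan Dice scores for each acquisition device (illustrative only).
dice_ct     = [0.95, 0.96, 0.97, 0.96]
dice_promax = [0.96, 0.97, 0.96, 0.97]
dice_newtom = [0.98, 0.97, 0.98, 0.99]

# One-way ANOVA: does mean performance differ across the three devices?
f_stat, p_value = f_oneway(dice_ct, dice_promax, dice_newtom)
```

A significant ANOVA result would then be followed by a pairwise post-hoc test (such as Tukey's) to identify which devices differ, mirroring the analysis pipeline the study describes.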


The average time required for data processing and segmentation was 14.5 ± 2.3 s with the CNN model versus 70.0 ± 26.9 min with manual segmentation, irrespective of the imaging modality. The ICC for both inter- and intra-observer manual segmentation was excellent. The intra-observer ICC was 0.998 (95% confidence interval: 0.991–0.999) with a mean error of 0.06 ± 0.26 mm, whereas the inter-observer reliability was 0.996 (95% confidence interval: 0.986–0.999) with a mean error


In dentomaxillofacial surgery, the most important aspect from the perspectives of patient treatment planning, follow-up evaluation and research is to achieve precise and fast segmentation of anatomical structures. Manual delineation still remains the gold standard for accurate PAS segmentation, followed by semi- and fully automatic software programs, as currently no alternative offers accuracy comparable to manual delineation by an expert [24]. Some studies have


We established a self-constrained deep learning-based 3D U-Net model with encouraging performance for the detection and segmentation of the PAS, thus providing clinicians and researchers with a time-efficient and labor-saving method which can further increase their diagnostic efficiency. Future studies should investigate PAS segmentation with a larger training set offering anatomical variability and compare the performance of different CNN architectures.

Credit authorship contribution statement

Sohaib Shujaat: Conceptualization, Methodology, Software, Investigation, Formal analysis, Visualization, Writing – original draft, Writing – review & editing. Omid Jazil: Validation, Investigation, Software, Formal analysis. Holger Willems: Conceptualization, Methodology, Investigation, Software, Writing – review & editing. Adriaan Van Gerven: Conceptualization, Methodology, Investigation, Software, Writing – review & editing. Eman Shaheen: Conceptualization, Supervision, Writing

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (36)

Would you like to learn more?

Feel free to schedule a meeting with us.