Clinical Evidence
Full virtual patient generated by artificial intelligence-driven integrated segmentation of craniomaxillofacial structures from CBCT images



Highlights

  • The study compared automated and semi-automated approaches for integrated segmentation of craniomaxillofacial structures.
  • Minimal standard deviation values across all performance metrics substantiate the high reliability of the automated approach.
  • An accurate and consistent AI-driven virtual patient could enhance the efficacy of digital workflows.



Objectives

To assess the performance, time-efficiency, and consistency of a convolutional neural network (CNN)-based automated approach for integrated segmentation of craniomaxillofacial structures, compared with a semi-automated method, for creating a virtual patient from cone beam computed tomography (CBCT) scans.


Materials and methods

Thirty CBCT scans were selected. Six craniomaxillofacial structures, encompassing the maxillofacial complex bones, maxillary sinus, dentition, mandible, mandibular canal, and pharyngeal airway space, were segmented on these scans using a semi-automated method and a composite of previously validated CNN-based automated segmentation techniques for individual structures. A qualitative assessment of the automated segmentation revealed the need for minor refinements, which were manually corrected. These refined segmentations served as the reference for comparing the semi-automated and automated integrated segmentations.


Results

The majority of minor adjustments with the automated approach involved under-segmentation of sinus mucosal thickening and of regions with reduced bone thickness within the maxillofacial complex. The automated and semi-automated approaches required an average of 1.1 min and 48.4 min, respectively. The automated method demonstrated a greater degree of similarity to the reference (99.6 %) than the semi-automated approach (88.3 %). The standard deviation values for all metrics with the automated approach were low, indicating high consistency.


Conclusion

The CNN-driven integrated segmentation approach proved to be accurate, time-efficient, and consistent for creating a CBCT-derived virtual patient through simultaneous segmentation of craniomaxillofacial structures.

Clinical relevance

The creation of a virtual orofacial patient using an automated approach could potentially transform personalized digital workflows. This advancement could be particularly beneficial for treatment planning in a variety of dental and maxillofacial specialties.


Keywords

Computer-generated 3D imaging

Artificial intelligence

Computer neural networks

Cone-beam computed tomography

1. Introduction

A ‘virtual patient’ is defined as the amalgamation of various digital data pertaining to a single patient, which culminates in a comprehensive knowledge entity. This entity facilitates rehabilitation through a digital treatment plan and digital simulations [1]. In the field of dentistry, research on constructing virtual patients has primarily focused on identifying structures and biomechanical parameters [2]. To encapsulate the entire process, a digital dental workflow consists of several phases: imaging, diagnosis, planning, transfer, and follow-up. The segmentation method is pivotal in the initial stage of this workflow, where an accurate delineation of anatomical structures is crucial for identifying and constituting a complete virtual patient [3].

When considering the segmentation process, automation can mitigate the inconsistencies inherent in manual methodologies, leading to a more efficient process that aligns with daily clinical practice. The majority of current software programs provide options for manual or semi-automated segmentation, which has spurred a surge in research on artificial intelligence (AI)-based automated techniques for anatomical structure segmentation. Recently, significant progress has been made in the automated segmentation of individual structures in the dentomaxillofacial region, such as bones, sinuses, teeth, nerves, and the pharyngeal airway space, on cone-beam computed tomography (CBCT) scans using AI-driven convolutional neural networks (CNNs) [4], [5], [6], [7], [8], [9]. Specifically, 3D U-Net type CNNs [10], which feature convolution layers with an encoding contracting path and a symmetrical decoding expanding path, have proven to be the most effective for automated segmentation.

In order to provide a comprehensive view of the patient's anatomy, it is crucial to not only segment dentomaxillofacial structures individually but also collectively. This simultaneous segmentation could allow dental clinicians to evaluate the interrelation between these structures, thereby enhancing their diagnostic and treatment decision-making capabilities in various scenarios. For instance, understanding the anatomical relationship between the posterior teeth and maxillary sinus is critical to avoid complications during dental procedures such as apical surgery, tooth extraction, endodontic treatment, and implant placement [6,11]. Therefore, an automated integrated segmentation that can generate a virtual patient would significantly enhance personalized care in dentistry.

The main limitations shared by semi-automated segmentation processes include the manual selection of threshold values, which vary for each anatomical region according to structural density, and the frequent need for manual corrections [12]. In this context, a fully automated segmentation of craniomaxillofacial structures has demonstrated satisfactory performance on CT images [13]. However, it is essential to assess the feasibility of automated integrated segmentation of craniomaxillofacial structures on CBCT scans, which are predominantly used in dentistry and possess specific characteristics and limitations related to their contrast resolution and noise. While the performance of semi-automated and automated methods has been validated for individual structures, evidence is lacking for integrated segmentation. Moreover, such an integration cannot be performed simultaneously by semi-automated methods, necessitating additional steps in the digital workflow that can add to the cumulative error.

Therefore, the present study aimed to evaluate the performance, time-efficiency, and consistency of an automated CNN-based integrated segmentation of CBCT-derived craniomaxillofacial structures in comparison with the semi-automated approach. We hypothesized that an automated integrated virtual model could be obtained with accuracy comparable to that of the semi-automated approach, while offering improved consistency and a considerable increase in time efficiency.

2. Materials and methods

This study received approval from the Research Ethics Committee of the University Hospitals Leuven (reference number: S65708) and was conducted in accordance with the principles of the Declaration of Helsinki on medical research.

2.1. Dataset

A virtual patient was created by integrated segmentation of six craniomaxillofacial structures: maxillofacial complex bones (palatine, maxillary, zygomatic, nasal, and lacrimal), maxillary sinus, upper and lower dentition, mandible, mandibular canal, and pharyngeal airway space. The sample size was determined based on the test set from previous comparable studies [14,15], using a priori power analysis with a power of 80 % and a significance level of 5 %. A paired t-test was assumed to compare the performance of the intervention groups (automated and semi-automated) in generating the integrated segmentation. The central limit theorem was followed to achieve a normal distribution in the sample [16]. Hence, a dataset of 30 scans (903 teeth, 60 maxillary sinuses, 30 maxillofacial complexes, 30 mandibles, 60 mandibular canals, 30 pharyngeal airway spaces) was selected. The dataset was distributed between two CBCT devices, Accuitomo 3D (n = 15) and Newtom VGi evo (n = 15), with varying scanning parameters (Table 1).

Table 1. CBCT scanning parameters of dataset.

Device | kVp | mA | Voxel size (mm) | Field of view, diameter × height (cm)
Newtom VGi evo (Cefla, Imola, Italy) | 110 | 6–12 | 0.25; 0.3 | 24 × 19
3D Accuitomo 170 (J. Morita, Kyoto, Japan) | 90 | 5 | 0.2; 0.25 | 17 × 12; 14 × 10; 10 × 10

kVp: kilovoltage peak; mA: milliampere.
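The a priori power analysis described above (80 % power, 5 % significance level, paired t-test) can be approximated with the standard normal-approximation formula. The sketch below is illustrative only: the standardized effect size is a hypothetical value chosen to reproduce the 30-scan sample, as the paper does not report the effect size actually used.

```python
from math import ceil
from statistics import NormalDist

def paired_sample_size(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a two-sided paired t-test:
    n = ((z_{1-alpha/2} + z_{power}) / d)^2, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power = 0.80
    return ceil(((z_alpha + z_beta) / effect_size) ** 2)

# A hypothetical standardized effect size of ~0.52 yields the 30 scans
# used in this study; the actual effect size is not reported.
n_scans = paired_sample_size(0.52)
```

Note that the normal approximation slightly underestimates the exact t-distribution-based sample size for small samples.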

The scans that fulfilled the eligibility criteria were obtained from patients with a gender distribution of 30 % males and 70 % females (average age: 22.1 years). The study involved CBCT scans from patients with permanent dentition, with (56.7 %) or without (43.3 %) coronal and/or root fillings. In addition, both completely dentulous (43.3 %) and partially edentulous (56.7 %) upper and lower jaws were included, ensuring that the maxillary posterior teeth were not in close proximity to or in contact with the floor of the maxillary sinus. The selected scans had fields of view that encompassed all or most of the craniomaxillofacial structures of interest. Furthermore, only scans that had not previously been utilized in training the specific CNNs were included. The exclusion criteria comprised images from patients with malformations, pathologies, post-orthognathic plates, implants, and edentulous areas in close proximity to the sinus floor, all of which could potentially interfere with the segmentation task.

2.2. Semi-automated and automated segmentation of integrated structures

Semi-automated segmentation was performed by an expert using the Mimics Innovation Suite software (version 23.0, Materialise N.V., Leuven, Belgium), with thresholding from a lower bound of 400–600 Hounsfield units (HU) up to the maximum value for the maxillofacial complex, mandible, and dentition; −1024 to 200 HU for the maxillary sinuses; and −1024 to 100 HU for the pharyngeal airway space. These values were based on thresholds used in previous studies [4,6,9,17], with slight variations applied according to the needs of each scan given the different acquisition parameters.
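The thresholding step can be illustrated with a minimal NumPy sketch. The HU ranges follow the values quoted above; the toy volume, its values, and the function name are hypothetical and stand in for real CBCT data.

```python
import numpy as np

def threshold_mask(volume, low, high=None):
    """Binary mask of voxels whose intensity lies within [low, high];
    if high is None, the range runs up to the scan maximum."""
    high = volume.max() if high is None else high
    return (volume >= low) & (volume <= high)

# Toy "scan" in HU (hypothetical values: air, soft tissue, mineralized tissue).
vol = np.array([[-1000, 40, 700],
                [1200, -500, 450]])

bone_mask = threshold_mask(vol, 400)           # lower bound 400 HU up to scan maximum
airway_mask = threshold_mask(vol, -1024, 100)  # pharyngeal airway range quoted above
sinus_mask = threshold_mask(vol, -1024, 200)   # maxillary sinus range quoted above
```

In practice, as the text notes, the lower bound for bone and dentition would be tuned per scan within the 400–600 HU range.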

Following the creation of segmentation masks through thresholding, each structure was isolated from its surroundings using edit tools. These masks were then converted into three-dimensional objects in Standard Triangle Language (STL) format and merged using boolean operation tools to achieve an integrated segmentation. The segmentation of the mandibular canal could not be produced through thresholding, so this structure was omitted from the comparison of the semi-automated and automated approaches. However, it was included when obtaining the performance and inter-observer consistency of the integrated automated segmentation against the refined reference.

To achieve integrated automated segmentation of the craniomaxillofacial structures, the same expert uploaded the CBCT images, in Digital Imaging and Communications in Medicine (DICOM) format, to a cloud-based online platform known as Virtual Patient Creator (Relu BV, version March 2022). In this platform, various 3D U-Net CNNs were amalgamated to operate concurrently as a single unit. The structure of the CNNs is shown in Fig. 1, along with the stages of the integration process, which begins with pre-processing the CBCT images to a lower resolution (Fig. 1.A) so that they serve as input for the previously validated networks. In these CNNs, the convolution layers stand out, with a coding contraction path and a symmetrical decoding expansion path forming a U-shaped architecture (Fig. 1.B).

Fig. 1. The integration process through the six 3D U-Net networks operated simultaneously to generate the automated integrated segmentation. (A) Pre-processing stage, in which the images are converted to a lower resolution to serve as input into the 3D U-Nets; (B) In each network, the gray boxes correspond to a multi-channel feature map and the white boxes to copied feature maps. The gray arrows represent the convolution operations in the analysis path. The coding contraction pathway is represented by red arrows, and the decoding expansion pathway by blue arrows. The orange arrow corresponds to the residual skip connections; (C) The outputs of each pipeline are automatically converted to separate 3D .stl files and combined into a single segmentation map. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.)

The cloud-based platform [15] enables the user to segment distinct structures, including the mandible, dentition, maxillofacial complex, pharyngeal airway, maxillary sinus, and mandibular canal. Each selected structure initiates a unique AI pipeline, as outlined in earlier validation studies [4], [5], [6], [7], [8], [9]. Regarding the training of these previously validated networks for individual structure segmentation, each model was evaluated on a distinct validation set, and the best-performing model was chosen based on binary cross-entropy loss.

The outputs from these pipelines are automatically transformed into individual 3D models in STL format. These are then combined into a single segmentation map (Fig. 1.C), allowing for the visualization of the class for each volume voxel. Both three-dimensional integrated segmentation objects, produced by semi-automated and automated methods, were exported in STL format.
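The fusion of per-structure outputs into a single multi-class segmentation map can be sketched as follows. This is a minimal illustration: the merging rule (later structures overwrite earlier ones at overlapping voxels) and all names are assumptions, as the platform's exact fusion logic is not described in the text.

```python
import numpy as np

def merge_masks(masks):
    """Combine per-structure binary masks into one labeled segmentation map.
    masks: dict of name -> boolean array, all the same shape. Background
    stays 0; where masks overlap, later entries win (an assumed rule)."""
    names = list(masks)
    label_map = np.zeros(masks[names[0]].shape, dtype=np.uint8)
    for label, name in enumerate(names, start=1):
        label_map[masks[name]] = label
    legend = {name: i for i, name in enumerate(names, start=1)}
    return label_map, legend

# Toy 2x2 "volume" with two non-overlapping structures.
masks = {
    "mandible": np.array([[True, False], [False, False]]),
    "dentition": np.array([[False, True], [False, False]]),
}
label_map, legend = merge_masks(masks)
```

The resulting label map allows the class of each voxel to be visualized, mirroring the single segmentation map described above (Fig. 1.C).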

2.3. Qualitative assessment of automated and refined segmentation

The qualitative analysis of the integrated segmented structures was conducted by visually inspecting their corresponding colors in the orthogonal planes of the CBCT images (Fig. 2) in a quiet, dimly lit environment. An expert identified the minor adjustments necessary for each structure's segmentation and recorded them in an Excel sheet. If under- or over-segmentation was observed in all three orthogonal planes (axial, coronal, and sagittal) across more than three consecutive reconstructions, it was deemed a refinement. After a month, the same expert reassessed 30 % of the integrated automated segmentations to establish intra-rater agreement. Subsequently, two dentomaxillofacial radiologists, each with a minimum of five years of experience in three-dimensional image evaluation, independently implemented the refinements identified in the qualitative assessment across the entire dataset.

Fig. 2. 3D "Full Virtual Patient" and its integrated automated segmentation on axial, coronal, and sagittal reconstructions in the Virtual Patient Creator (Relu BV, version March 2022).

It is worth noting that the individual validation studies for each structure demonstrated high accuracy, necessitating only minor adjustments for over- or under-segmentation in each structure [4], [5], [6]. Therefore, this qualitative analysis aimed to identify the primary types of minor refinements and their frequencies, and to facilitate the generation of refined segmentations as a reference for the quantitative analysis of the semi-automated and automated integrated segmentations.

2.4. Quantitative assessment

2.4.1. Timing

In terms of the time required for integrated automated segmentation, the algorithm directly logged the duration necessary to construct a full-resolution segmentation. As for semi-automated integrated segmentation, the time was measured from the moment of DICOM data import into the software until the final STL was exported. Lastly, for the refined segmentation reference, the duration of automated segmentation was combined with the average time taken for manual corrections performed by two dentomaxillofacial radiologists.

2.4.2. Evaluation metrics of semi-automated and automated segmentation

The performance of the integrated segmentation based on the semi-automated and automated approaches was assessed using the expert-refined automated integrated segmentation as a reference.

The evaluation of segmentation similarity was conducted using the Dice similarity coefficient (DSC), 95 % Hausdorff distance (HD), and root mean square (RMS). These metrics were computed by superimposing the three-dimensional objects derived from the segmentation maps produced by each type of integrated segmentation onto the reference. For the DSC, a value closer to 1 indicates a higher degree of correspondence between the superimposed objects. Conversely, the 95 % HD and RMS are discrepancy metrics, for which a value closer to zero represents a higher similarity between the objects. These two metrics provide complementary information, as the 95 % HD is utilized to mitigate the influence of potential outliers. The performance of the CNN models in segmenting the entire virtual patient was determined for each comparison metric of interest, such as the DSC. The dentition metric was defined as the average of all individual tooth types.
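The three similarity metrics can be implemented directly. The sketch below is a minimal illustration on small point sets: it assumes surfaces represented as point clouds and a pooled-distance definition of the 95 % HD (toolkits differ in whether they pool both directed distance sets or take the maximum of per-direction percentiles).

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary volumes."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def directed_distances(pts_a, pts_b):
    """Nearest-neighbour distance from each point in A to the set B
    (brute force; adequate for small illustrative point sets)."""
    diff = pts_a[:, None, :] - pts_b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

def hd95_and_rms(pts_a, pts_b):
    """Pooled 95th-percentile Hausdorff distance and RMS surface distance."""
    d = np.concatenate([directed_distances(pts_a, pts_b),
                        directed_distances(pts_b, pts_a)])
    return np.percentile(d, 95), np.sqrt((d ** 2).mean())

# Two toy 2D "surfaces" differing by 0.1 mm at one point.
a = np.array([[0.0, 0.0], [1.0, 0.0]])
b = np.array([[0.0, 0.0], [1.0, 0.1]])
hd95, rms = hd95_and_rms(a, b)
```

For real STL superimpositions, distances would be computed between mesh surfaces rather than sparse points, but the metric definitions are the same.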

2.4.3. Inter-rater consistency of refined segmentations

Segmentations corresponding to the structures that required refinement were combined to assess inter-rater consistency. This was evaluated by superimposing the STL files generated from the segmentations carried out by each expert.

2.5. Statistical analysis

Data were analyzed using IBM SPSS software (Armonk, NY). For the qualitative analysis, the weighted kappa test with a 95 % confidence interval was utilized to determine the intra-rater agreement for the number of minor refinements identified in each integrated automated segmentation. For the quantitative analysis, the mean and standard deviation of all evaluation metrics were calculated. A paired t-test was conducted to compare the evaluation metrics of the automated and semi-automated approaches.
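The weighted kappa statistic used for the intra-rater agreement can be sketched in a few lines. Linear weights are assumed here for illustration, as the text does not specify the weighting scheme used in SPSS.

```python
import numpy as np

def weighted_kappa(rater1, rater2, n_categories, weights="linear"):
    """Cohen's weighted kappa for two ratings on an ordinal 0..n-1 scale."""
    observed = np.zeros((n_categories, n_categories))
    for i, j in zip(rater1, rater2):
        observed[i, j] += 1
    observed /= observed.sum()                    # observed proportions
    expected = np.outer(observed.sum(axis=1),     # chance agreement from
                        observed.sum(axis=0))     # the marginal totals
    idx = np.arange(n_categories)
    w = np.abs(idx[:, None] - idx[None, :]).astype(float)
    if weights == "quadratic":
        w **= 2
    return 1.0 - (w * observed).sum() / (w * expected).sum()

# Perfect agreement on a 3-level scale yields kappa = 1.
kappa = weighted_kappa([0, 1, 2, 1], [0, 1, 2, 1], 3)
```

A kappa of 0.855, as reported below, would indicate strong agreement under the McHugh classification.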

3. Results

3.1. Qualitative assessment

When considering all 30 integrated automated segmentations, the need for refinements ranged from 1 to 13, culminating in a total of 193 refinements. On average, each integrated segmentation required 6.43 minor refinements. The distribution of minor refinements is described in Table 2, where maxillary sinuses showed the highest required refinements (42.5 %) followed by maxillofacial complex (38.4 %), teeth (9.8 %), and pharyngeal airway space (8.3 %). Both mandible and mandibular canal required minimal refinements (0.5 % each). Out of 193 refinements, 148 were categorized as under-segmentation and 45 as over-segmentation. The teeth, maxillary sinuses, and oropharyngeal space predominantly exhibited under-segmentation. In contrast, the bones of the maxillofacial complex displayed a balanced rate of under and over-segmentation. The weighted Kappa test yielded an intra-rater agreement of 0.855 with a 95 % confidence interval from 0.746 to 0.965, indicating a strong agreement according to the McHugh classification.

Table 2. Frequency and description of refinements detected based on individual anatomical structure and integrated automated segmentation.

Structure | Refinement description | Refinement type | Frequency by structure | Frequency by IAS
Maxillary sinuses | Mucosal thickening and air voids | Under | 97.6 % | 42.5 %
Maxillary sinuses | Overextension in ethmoidal air sinus | Over | 2.4 % |
Maxillofacial complex | Bone discontinuities in maxillary sinus walls and septa, around palatine foramina | Under | 50.0 % | 38.4 %
Maxillofacial complex | Partially closed infraorbital and palatine foramina, and nasopalatine canal | Over | 50.0 % |
Dentition | Small missing parts in the tooth contour | Under | 84.2 % | 9.8 %
Dentition | Small extra parts overextending the tooth contour | Over | 15.8 % |
Pharyngeal airway space | Air voids inside the pharyngeal airway space | Under | 93.8 % | 8.3 %
Pharyngeal airway space | Overextension to the soft tissue | Over | 6.2 % |
Mandible | Not found | Under | N/A | 0.5 %
Mandible | Small extra part overextending its contour | Over | 100 % |
Mandibular canal | Not found | Under | N/A | 0.5 %
Mandibular canal | Small extra part overextending its contour | Over | 100 % |

Under: under segmentation; Over: over segmentation; IAS: Integrated automated segmentation.

The analysis of specific locations and types of refinement most frequently detected in craniomaxillofacial structures is depicted in Fig. 3. The most common areas of bone discontinuity were found around the regions of palatine foramina, medial wall of maxillary sinus, incisive canal, and foramina within the maxillofacial complex. The teeth were most often under-segmented at their root level and apex, while the errors in the pharyngeal airway space were primarily found in the middle region (oropharynx). Fig. 4 provides a visual representation of the most common refinements in the four most representative structures [18].

Fig. 3. Frequency of refinement locations by structure. Each bar chart considers only the refinements associated with the different descriptions of each row.
Fig. 4. CBCT reconstructions illustrating the refinements most frequently detected in the qualitative assessment: (A) under-segmentation of the tooth root contour; (B) under-segmentation associated with the superior wall of the maxillary sinus; (C) under-segmentation associated with the lateral wall of the maxillary sinus; (D) under-segmentation around the palatine foramina; (E) over-segmentation related to the nasopalatine canal location; (F) under-segmentation at the middle part (oropharynx) of the pharyngeal airway space.

3.2. Quantitative assessment

3.2.1. Timing

The average duration for achieving integrated automated segmentation for a single patient was approximately 1 min and 8 s (1.1 min). In contrast, the semi-automated segmentation process through thresholding took an average of 48 min and 22 s (48.4 min). The average time for refined segmentation was calculated by adding the average time per case between the two experts (16.8 min) to the average time of automated segmentation (1.1 min), resulting in a total of 17 min and 54 s (17.9 min). The time per case varied from 4 to 37.5 min, and after adding their respective AI times, the refined segmentation ranged from 5.3 to 38.1 min.

The first observer required an average of 2 min and 24 s (2.4 min), with times ranging from 48 s (0.8 min) to 4 min and 36 s (4.6 min). The second observer required an average of 2 min and 54 s (2.9 min), with times ranging from 48 s (0.8 min) to 4 min. Thus, the average time needed by both radiologists for refinement was approximately 2.7 min. The average times and range values are detailed in Table 3.

Table 3. Time (minutes) required for achieving one integrated segmentation.

Time | Semi-automated | Automated | Refined automated
Average | 48.4 | 1.1 | 17.9
Minimum | 40.0 | 0.6 | 5.3
Maximum | 55.0 | 1.4 | 38.1

3.2.2. Performance and inter-rater consistency

Table 4 presents the results of the evaluation metrics. The superimposed STLs demonstrated a remarkable overlap between the automated and refined automated segmentations, with a DSC of 99.6 %, indicating that only minor adjustments were necessary. The 95 % HD and RMS values of 0.012 mm and 0.067 mm, respectively, also indicated a minimal difference between the integrated automated segmentation and reference standard.

Table 4. Metric results for evaluating the performance of the semi-automated and automated segmentations and the inter-observer consistency of the refined segmentations.

Metric | DA | Performance SA (Thresholding vs Refined) | Performance automated (AI vs Refined) | Performance automated (AI vs Refined) including MC | Inter-rater consistency (Refined1 vs Refined2)
DSC | Mean | 0.883 | 0.996 | 0.996 | 0.997
DSC | SD | 0.125 | 0.057 | 0.056 | 0.009
DSC | Min | 0.811 | 0.942 | 0.943 | 0.981
DSC | Max | 0.936 | 0.999 | 0.999 | 0.999
95 % HD (mm) | Mean | 2.795 | 0.012 | 0.012 | 0.060
95 % HD (mm) | SD | 1.491 | 0.095 | 0.094 | 0.221
95 % HD (mm) | Min | 2.090 | 0.000 | 0.000 | 0.000
95 % HD (mm) | Max | 3.581 | 0.095 | 0.094 | 0.221
RMS (mm) | Mean | 1.299 | 0.067 | 0.067 | 0.098
RMS (mm) | SD | 0.712 | 0.163 | 0.163 | 0.335
RMS (mm) | Min | 0.995 | 0.022 | 0.021 | 0.022
RMS (mm) | Max | 1.707 | 0.185 | 0.184 | 0.357

DA: descriptive analysis; SA: semi-automated; SD: standard deviation; DSC: Dice similarity coefficient; 95 % HD: 95 % Hausdorff distance; RMS: root mean square; MC: mandibular canal.

The semi-automated thresholding segmentation yielded a DSC of 88.3 %. Moreover, the 95 % HD and RMS metrics showed values further from zero, with discrepancies of 2.795 mm and 1.299 mm, respectively. High standard deviation values were observed across all metrics evaluated for the semi-automated approach. The paired t-test revealed significant differences in the values of the three metrics between the automated and semi-automated methods (p < 0.001). Fig. 5 illustrates the three-dimensional appearance of the STLs generated by the semi-automated (Fig. 5.A) and automated (Fig. 5.B) approaches from the same CBCT scan.

Fig. 5. 3D full virtual patient generated by semi-automated thresholding-based segmentation (A) and by integrated CNN models (B) in frontal, lateral, and back views.

The consistency between the refinements showed a DSC of 99.7 %, along with values very close to zero for the 95 % HD and RMS, which were 0.060 mm and 0.098 mm, respectively. The standard deviation values ranged from 0.009 to 0.335 across all evaluated metrics, suggesting a high consistency between the two experts. Additionally, the automated segmentation results that included the mandibular canal showed a 99.6 % similarity with the reference, a 95 % HD of 0.012 mm, and an RMS of 0.067 mm, with small variations in the standard deviation values compared to the metrics that did not include the mandibular canal.

4. Discussion

The need for simultaneous segmentation of anatomical structures of varying densities is crucial for integrating digital data from a single patient, offering the dentist a more comprehensive view. However, employing a semi-automated method for this task can complicate the digital workflow due to the increased steps involved and risk of observer variability. Therefore, this study assessed the efficiency of an integrated automated segmentation approach using previously validated models of craniomaxillofacial structures. The results demonstrated a significantly improved performance compared to semi-automated segmentation in terms of time efficiency and evaluation metrics. Additionally, there was a strong agreement amongst raters in relation to consistency, further supporting the creation of a comprehensive virtual patient.

Based on the qualitative assessment of the integrated automated segmentation, it was demonstrated that an observer is more likely to identify the need for minor corrections due to either over- or under-segmentation of the anatomical structures. The under-segmentation in the maxillary sinuses primarily occurred along the thin contours of their walls, particularly in regions with mucosal thickening. It is crucial to note that the results regarding specific location frequencies are directly related to the sample characteristics, which did not include specific sinus pathologies that could affect the particular wall exhibiting under-segmentation. In terms of the maxillofacial complex bones, unsatisfactory segmentation occurred around the palatal canals and the thin walls of the maxillary sinuses. Indeed, maxillofacial bones exhibit high anatomical complexity and reduced bone thickness in specific areas [4], especially the aforementioned ones, making them more susceptible to under-segmentation. At the same time, over-segmentation was noticed in small open spaces such as the nasopalatine canals and the palatine and infraorbital foramina. As for tooth segmentation, isolated under-segmentations were primarily associated with tooth roots and apices. This could be attributed to the low signal-to-noise ratio of CBCT images and the presence of filling artifacts, which can result in a lack of definition at the edges of tooth roots [19]. As for the pharyngeal airway space, under-segmentation was predominantly located in its middle part, which could be explained by its extension to the soft palate. However, it is important to highlight that these findings represent minor refinements, and a clinician has to determine whether such refinements are necessary based on the treatment plan required for each patient.

In the process of integrating the automated segmentations, the segmentation occurred concurrently, which resulted in no additional time requirement compared to the outcomes seen in the validation of the individual models. Moreover, there was no time loss, as the average duration for the current evaluation was 1.1 min, while the cumulative time for the previously validated single models was 2.2 min (maxillofacial complex: 39.1 s; maxillary sinus: 24.4 s; dentition: 13.7 s; mandible: 17 s; mandibular canal: 21.3 s; pharyngeal airway space: 14.5 s) [4], [5], [6], [7], [8], [9]. However, maintaining a constant AI processing time is a challenge due to technical variations, including processes not initiated by users [20]. This could have potentially led to the time variation among the thirty cases, which ranged from 37 s to 1.4 min.

The automated integration method demonstrated significantly higher time efficiency compared to the semi-automated segmentation. It is important to highlight that the existing semi-automated software programs do not support simultaneous integrated segmentation. In scenarios where segmentation of multiple combined structures is required, the expert has to manually set a distinct threshold for each structure and subsequently modify the generated mask, which often does not restrict itself to a specific anatomical structure. This limitation makes the digital workflow both time-consuming and impractical. As for the integrated automated segmentation, even if post-segmentation refinements are deemed necessary, the time saved per case (30.5 min) in comparison to the semi-automated method is unquestionably substantial.

The quantitative performance of both the integrated automated and semi-automated segmentation was assessed by comparing their STL outputs with those produced by the automated segmentation following post-expert refinements. The automated segmentation demonstrated superior accuracy, as observed by its higher degree of congruence with the reference standard, achieving a DSC of 99.6 %. This was further corroborated by a 95 % HD metric discrepancy of 0.012 mm and RMS value of 0.067 mm, emphasizing the resemblance between the two integrated segmentations. Previous validation studies of the frameworks employed in this integration, which also assessed the similarity between automated and refined segmentations, reported DSCs ranging from 99.6 % to 99.8 % [4,6]. Our findings are consistent with a study that compared multiclass and binary segmentations of the jaw and teeth and found similar high performance [13]. In the aforementioned study, a multiclass segmentation of the maxilla, mandible, and teeth was performed on CBCT images, which is analogous to the non-simultaneous binary segmentation of these structures. In contrast, our study introduced an integrated and automated segmentation of structures with significantly different densities, such as the maxilla and maxillary sinuses, demonstrating high performance and compatibility with routine clinical settings. This innovation is crucial for constructing a more comprehensive and personalized virtual patient through an optimized digital workflow.

Conversely, the degree of similarity between the integrated semi-automated segmentation and the refined reference standard was found to be lower, with a DSC of 88.3 % and larger discrepancies of 2.795 mm (95 % HD) and 1.299 mm (RMS). This performance can be attributed to the fact that the primary software used for this task has been fine-tuned to CT scan characteristics, which are not directly transferable to CBCT scans due to the non-calibration of Hounsfield units (HU), the low contrast resolution, and the fact that the estimated grayscale density values are influenced by the presence of artefacts [21], which is considered the main constraint for the use of semi-automated segmentation by thresholding on CBCT scans [22]. While these differences between the automated and semi-automated integration performances were anticipated and hypothesized, the minimal standard deviation values across all performance metrics for the automated approach emphasize the consistency achieved when segmenting craniomaxillofacial structures of varying densities with AI-based CNN models. The automated segmentation of the mandibular canal also showed excellent performance. Given that segmentation of the mandibular canal cannot be achieved by semi-automated thresholding, its inclusion in the integrated automated segmentation through CNNs represents a significant advancement for creating a virtual patient.

In addition to the consistency of the method, the inter-rater consistency showed a very high similarity in the overlap of the refined structures, with a DSC of 99.7 %. This finding not only confirms that detected refinements were minimal but also indicates that the integrated model can provide an automated foundation for users with varying levels of expertise to perform segmentation. Given the high inter-observer variability reported with manual and semi-automated segmentation techniques, automation of integration appears to be crucial.

The simultaneous segmentation of structures closely associated with teeth, such as the mandibular canal, paves the way for enhanced diagnosis and treatment planning for various surgical procedures, such as implant placement, bone grafting, orthognathic surgery, and general tooth extraction [8]. The segmentation of pharyngeal airway space is crucial in diagnosing, treating, and monitoring patients with dental-skeletal deformities and obstructive sleep apnea. This makes automated segmentation indispensable to measure changes in airway dimensions following orthognathic surgical procedures [23]. Moreover, precise segmentation of teeth and bones, including the mandible and all bones of the maxillofacial complex, is a fundamental step for 3D printing of personalized models used in clinical orthodontics, implant rehabilitation, and maxillofacial reconstructive surgery [4].

The automated segmentation produced by combining various CNNs has a limitation concerning the generalizability of its results. Although the CBCT images were selected from different devices and acquisition parameters, the current results cannot be generalized to CBCT datasets originating from devices or acquisition protocols not represented in this sample. Even across the devices used, it is known that contrast resolution, noise, and artifact expression in the presence of restorative filling materials vary between CBCT scanners, which could negatively affect the generation of segmentation maps [24]. This underscores the need for future studies to seek increased sample heterogeneity, either through multicenter studies or by using entirely external datasets for greater generalizability [25].

Moreover, the results of this study should be interpreted cautiously, as they may not be universally applicable to CBCT scans of patients who did not meet the eligibility criteria of this study. Consequently, the efficacy demonstrated does not extend to patients with primary teeth, mixed dentition, or dental implants. Nonetheless, considering the continual advancements and enhanced reliability of networks, better generalizability for the segmentation of variable anatomical and pathological structures is to be expected in the near future. Additionally, there is potential for augmenting the tool's capabilities by integrating intraoral and facial scan data, thereby creating a more comprehensive and personalized virtual patient model.

In light of the increasing utilization of U-Net and 3D U-Net architectures in medical image segmentation [26], their recent application to CBCT image processing suggests a need for future studies to evaluate their performance against other networks. In summary, the creation of a comprehensive virtual patient through high-performance, automated segmentation represents a significant advance in personalized digital workflows and could serve as a practical tool in routine clinical practice.

5. Conclusion

The CNN models proved to be accurate, time-efficient, and consistent for the integrated simultaneous segmentation of maxillofacial complex bones and mandible, maxillary sinus, dentition, mandibular canal, and the pharyngeal airway space.

CRediT authorship contribution statement

Fernanda Nogueira-Reis: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing, Funding acquisition, Project administration. Nermin Morgan: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Writing – review & editing. Isti Rahayu Suryani: Data curation, Investigation, Software, Writing – review & editing. Cinthia Pereira Machado Tabchoury: Conceptualization, Supervision, Writing – review & editing. Reinhilde Jacobs: Conceptualization, Resources, Supervision, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001.
