Machine learning for classification of cutaneous sebaceous neoplasms: implementing decision tree model using cytological and architectural features



This observational study aims to describe and compare the histopathological, architectural, and nuclear characteristics of sebaceous lesions and to use these characteristics to develop a predictive classification approach based on machine learning algorithms.


This cross-sectional study was conducted on Iranian patients with sebaceous tumors from two hospitals between March 2015 and March 2019. Pathology slides were reviewed by two pathologists and the architectural and cytological attributes were recorded. Multiple decision tree models were trained using 5-fold cross validation to determine the most important predictor variables and to develop a simple prediction model.


This study assessed the characteristics of 123 sebaceous tumors. Histopathological findings, including pagetoid appearance, neurovascular invasion, atypical mitosis, extensive necrotic area, poor cell differentiation, and non-lobular tumor growth pattern, as well as nuclear features, including highly irregular nuclear contour, and large nuclear size were exclusively observed in carcinomatous tumors. Among non-carcinomatous lesions, some sebaceoma and sebaceous adenoma cases had features like high mitotic activity, which can be misleading and complicate diagnosis. Based on multiple decision tree models, the five most critical variables for lesion categorization were identified as: basaloid cell count, peripheral basaloid cell layers, tumor margin, nuclear size, and chromatin.


This study implemented a machine learning modeling approach to help optimally categorize sebaceous lesions based on architectural and nuclear features. However, studies of larger sample sizes are needed to ensure the accuracy of our suggested predictive model.


Sebaceous glands are usually partnered with a hair follicle to form pilosebaceous units, which are widely distributed across the skin. Primarily, these holocrine glands secrete a yellowish, waxy substance called sebum [1]. A few sebaceous glands are also present in hairless regions of skin, such as Meibomian glands in the tarsal region, Fordyce spots in buccal skin and the vermilion of the lip, Montgomery tubercles in the areolae, and Tyson glands in the prepuce and labia minora [2,3,4]. The sebaceous glands consist of secretory lobules composed of sebaceous gland cells (sebocytes) and a short tubular squamous duct [5]. At the gland’s outer layer, sebocytes form a layer of undifferentiated germinal cells, which grow toward the center and gradually differentiate into mature sebocytes [6]. As these cells differentiate, their cytoplasm fills with lipid vacuoles, other organelles are compressed, and the nucleus becomes distorted [5].

There are a limited number of skin lesions with primarily sebaceous origins, namely sebaceous hyperplasia, sebaceous adenoma, sebaceoma, and sebaceous carcinoma [7]. Affecting around one in every four adults, sebaceous hyperplasia is the most prevalent sebaceous lesion; however, it is not commonly considered a true sebaceous neoplasm [7, 8]. Sebaceous adenoma and sebaceoma are benign neoplasms that often develop as yellowish papules on the forehead and cheeks [9]. Sebaceous carcinoma is the only malignant lesion in this list. It is commonly divided into periocular and extraocular subtypes and is rare, with a prevalence of around 0.5 to 2 cases per million [10, 11]. These lesions can develop independently or be associated with Muir-Torre syndrome [8, 12]. This syndrome is defined by the presence of sebaceous gland tumors or keratoacanthoma in association with visceral malignant diseases [13].

Differentiation of benign sebaceous lesions from low-grade malignant tumors has remained a challenge. This observational study aims to describe and compare the architectural, cytological, and histopathological characteristics of sebaceous lesions and use these characteristics to develop a predictive classification approach.


Study design and setting

This is a cross-sectional study conducted on patients with sebaceous neoplasms referred to two hospitals associated with Tehran University of Medical Sciences from March 2015 to March 2019. During the study, pathology slides of sebaceous lesions were retrieved and reviewed. Two independent dermatopathologists classified the retrieved slides according to the established diagnostic criteria, and any disagreement was resolved by discussion or by consulting a senior pathologist [14, 15]. The study was conducted in accordance with the Declaration of Helsinki and its later amendments. The ethical committee of Tehran University of Medical Sciences approved the study (registry code: IR.TUMS.MEDICINE.REC.1399.273).


We included patients with sebaceous neoplasms referred to two university-affiliated hospitals during a five-year period. No age or gender restrictions were imposed. We excluded patients with missing data.

Variables, data sources and measurement

Pathology slides of cases were retrieved from the two hospitals by a pathology resident. Two dermatopathologists classified the retrieved slides. Demographic and histopathologic characteristics of cases were recorded in a checklist for descriptive analysis of each class of sebaceous lesion. Each slide was assessed for architectural and cytological attributes. Architectural attributes included the cellular growth pattern, neural and vascular invasion, circumscribed or infiltrative margins, cystic pattern, necrosis, ductal differentiation, squamous differentiation, ulceration, pagetoid spread to the epidermis, basaloid cell count of more than 50%, and basaloid layer count. The following cytological features were also investigated: degree of cellular differentiation, mitotic activity per 10 high-power fields, atypical mitosis, nuclear contour, chromatin appearance, and presence of nucleoli. In addition, nuclear size was estimated by comparison with adjacent keratinocytes.

Statistical methods

The collected data were analyzed using the Statistical Package for the Social Sciences (SPSS) version 21 (IBM Corp., Armonk, N.Y., USA). Qualitative variables were reported as frequencies and percentages, and quantitative variables were reported as either median and interquartile range (IQR) or mean and standard deviation (SD). Chi-squared tests were used to compare categorical variables between groups. A decision tree method was used to determine the most important predictor variables and to develop a prediction model. We used Python’s scikit-learn library for this approach, and periocular and extraocular carcinomas were treated as a single class in this analysis. Multiple models were trained using the ExtraTreesClassifier technique, and the five most predictive variables were identified by averaging the feature importance of each variable across all models. To find a single simple and efficient decision tree, multiple decision tree models with depths of 1 up to 10 were cross-validated on our dataset, and the mean accuracy scores were calculated. Accuracy was calculated as the number of correct predictions divided by the number of all predictions. The best decision tree model with a depth of 2 was also identified using five-fold cross-validation and visualized.
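The feature-ranking step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the data are synthetic, the feature names are hypothetical labels loosely based on the attributes listed in this section, and the number of models (20) and trees per model (100) are assumptions, since the text does not report them.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Hypothetical feature names; the real checklist variables are not published here.
FEATURES = [
    "basaloid_cell_count_gt_50pct",
    "peripheral_basaloid_layers",
    "tumor_margin",
    "nuclear_size",
    "chromatin",
    "ulceration",
    "cystic_pattern",
    "ductal_differentiation",
]


def make_synthetic_slides(n_samples=123, seed=0):
    """Create ordinal codes (0, 1, 2) per feature and a 4-class label (synthetic)."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(n_samples, len(FEATURES)))
    # By construction, only the first three features determine the label.
    y = (X[:, 0] + X[:, 1] + X[:, 2]) // 2
    return X, y


def mean_feature_importance(X, y, n_models=20):
    """Average feature_importances_ over several differently seeded models."""
    total = np.zeros(X.shape[1])
    for seed in range(n_models):
        model = ExtraTreesClassifier(n_estimators=100, random_state=seed)
        model.fit(X, y)
        total += model.feature_importances_
    return total / n_models


X, y = make_synthetic_slides()
mean_importance = mean_feature_importance(X, y)
ranking = np.argsort(mean_importance)[::-1]
top_five = [FEATURES[i] for i in ranking[:5]]
for name in top_five:
    print(f"{name}: {mean_importance[FEATURES.index(name)]:.3f}")
```

Averaging over several seeds smooths out the randomness in how each extremely randomized tree picks its split thresholds, which matters when the dataset is as small as the one in this study.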


Participants and descriptive data

A total of 123 cases were included in the study: 52 sebaceous hyperplasias, 15 sebaceomas, 13 sebaceous adenomas, 20 extraocular sebaceous carcinomas, and 23 periocular sebaceous carcinomas. Of the patients, 65.9% were male, and the median age was 63.5 years (IQR 20). No significant difference was observed between the mean age of male (62.7, SD 16.5) and female (59.3, SD 19.0) patients (P value 0.33). However, a comparison of mean age between patients with extra- or periocular sebaceous carcinomas (68.8, SD 16.4) and those with benign lesions (56.87, SD 16.5) showed a significant difference (P value < 0.01). Age and gender distribution for each lesion are summarized in Table 1. The details of each tumor’s histopathological and nuclear characteristics are presented in Tables 2 and 3. Comparisons of characteristics between the combined cases of extra- and periocular sebaceous carcinomas and each benign lesion, and between sebaceous adenomas and other benign lesions, are summarized in Table 4. Here we briefly discuss each lesion.
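As a rough plausibility check, the two age comparisons above can be recomputed from the reported summary statistics. The text does not state which test was used, so the sketch below assumes an unequal-variance comparison of two means with a normal approximation to the null distribution. The carcinoma and benign group sizes follow from the case counts (20 + 23 = 43 and 52 + 15 + 13 = 80); the male and female group sizes (81 and 42) are an assumption derived from 65.9% of 123.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Compare two means from summary statistics (normal approximation)."""
    standard_error = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    statistic = (mean_a - mean_b) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(statistic)))
    return statistic, p_value


# Carcinomas (n = 43) versus benign lesions (n = 80), sizes from the case counts.
z_lesion, p_lesion = two_sided_p_from_summary(68.8, 16.4, 43, 56.87, 16.5, 80)

# Male versus female; 81 and 42 are assumed sizes (65.9% of 123, rounded).
z_sex, p_sex = two_sided_p_from_summary(62.7, 16.5, 81, 59.3, 19.0, 42)

print(f"carcinoma vs benign: z = {z_lesion:.2f}, p = {p_lesion:.4f}")
print(f"male vs female:      z = {z_sex:.2f}, p = {p_sex:.2f}")
```

Under these assumptions the sex comparison gives a p-value close to the reported 0.33, and the lesion comparison gives a p-value well below 0.01, so the reported figures are internally consistent with the summary statistics.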

Table 1 Demographic characteristics of different sebaceous lesions
Table 2 Histopathological characteristics of different sebaceous lesions
Table 3 Nuclear characteristics of different sebaceous lesions
Table 4 Comparison of histological characteristics between carcinomatous tumors and each benign lesion and between sebaceous adenoma and other benign lesions (P values)

Main results

Sebaceous hyperplasia

No atypical appearance, neurovascular invasion, atypical mitosis, necrotic area, or ulceration was observed. All lesions were well differentiated and circumscribed, and the number of peripheral basaloid layers did not exceed two (Fig. 1A). The nuclear contour was smooth, chromatin was fine, and nucleoli were inconspicuous in most cells. Nuclei were smaller than those of adjacent keratinocytes (Fig. 1B).

Fig. 1 A, Sebaceous hyperplasia reveals well-demarcated sebaceous lobules at low-power magnification. B, Hyperplastic sebaceous lobules resemble the normal sebaceous gland and consist of a maximum of two outer layers of basaloid cells surrounding mature sebaceous cells with eosinophilic bubbly cytoplasm


Sebaceoma

Cystic appearance and ulceration were present in some cases, and squamous and ductal differentiation was observed in more than half of the cases (Fig. 2A). The majority of the cases were moderately differentiated and circumscribed, and more than two peripheral basaloid layers were seen in all of them. The nuclear contour was mostly smooth, and at least one nucleolus was present in most cells (Fig. 2B). Chromatin was fine or coarse in most cases, and nuclear size was less than twice that of keratinocytes.

Fig. 2 A, Sebaceomas are well-circumscribed with conspicuous cyst formation. B, Mature sebocytes are mixed with basaloid cells in a high-power view

Sebaceous adenoma

Ulceration was observed in half of the cases, and some cases demonstrated cystic appearance and ductal differentiation. Most cases were well-differentiated and circumscribed without any necrotic regions (Fig. 3A). A variable number of peripheral basaloid cell layers was observed. No nuclear contour irregularity was seen in any of the cases; however, coarse and clumpy chromatin and prominent nucleoli were observed in several cases. Nucleus sizes were less than twice those of keratinocytes (Fig. 3B).

Fig. 3 A, Sebaceous adenoma with sharply circumscribed sebaceous lobules contiguous with the epidermis, surrounded by a compressed pseudocapsule of dermal stroma. B, Higher-power view of a sebaceous adenoma reveals an expanded population of germinative basaloid cells at the periphery, with a centrally located mature sebaceous cyst

Extraocular sebaceous carcinoma

Ductal differentiation and ulceration were prominently present, and atypical mitosis and necrotic areas were observed in several cases (Fig. 4A). Tumor margins were mostly infiltrative, and none of the cases were well differentiated. Most cases had more than two layers of peripheral basaloid cells. Vesicular chromatin was present in more than half of the cases, and nuclear contour irregularity was prevalent. Most cells had at least one prominent nucleolus, and their nuclei were larger than those of keratinocytes (Fig. 4B).

Fig. 4 A, Poorly differentiated extraocular sebaceous carcinoma with comedo necrosis. B, Tumoral cells show scant cytoplasmic vacuolation, marked atypical mitoses, and nuclear polymorphism at high magnification

Periocular sebaceous carcinoma

Atypical mitosis and ulceration were prevalently observed, and squamous differentiation and cystic and pagetoid appearances were seen in many cases (Fig. 5B). The majority of cases had infiltrative margins and moderate/poor cell differentiation (Fig. 5A). More than two peripheral basaloid cell layers were present in all cases. Nucleoli were observed in most of the cells.

Fig. 5 A, Periocular sebaceous carcinoma with an infiltrative pattern of tumoral cells in the desmoplastic stroma of the eyelid. B, Pagetoid invasion of sebaceous gland carcinoma in the epidermis of the eyelid

Prediction model

The five most important predictive variables included: basaloid cell count, peripheral basaloid cell layers, tumor margin, nuclear size, and chromatin. The mean accuracy after cross-validation for multiple models with depths ranging from 1 to 10 is summarized in Fig. 6. Details of the best prediction model with a depth of 2 are depicted in Fig. 7.
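The depth sweep behind Fig. 6 and the depth-2 tree behind Fig. 7 can be sketched with scikit-learn as follows. This is an illustrative sketch only: the data are synthetic, the feature names are hypothetical, and the labelling rule is invented so that a two-level tree can recover it. The printed rules therefore do not reproduce the published tree.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical ordinal features; values are synthetic codes, not study data.
FEATURES = ["peripheral_basaloid_layers", "tumor_margin", "nuclear_size", "chromatin"]

rng = np.random.default_rng(0)
n_cases = 123
X = np.column_stack([
    rng.integers(0, 3, n_cases),  # peripheral_basaloid_layers: 0, 1, 2
    rng.integers(0, 2, n_cases),  # tumor_margin: 0 circumscribed, 1 infiltrative
    rng.integers(0, 3, n_cases),  # nuclear_size: 0, 1, 2
    rng.integers(0, 3, n_cases),  # chromatin: 0, 1, 2
])

# Invented 4-class rule using two features, plus about 10% label noise.
y = 2 * (X[:, 0] >= 1) + X[:, 1]
noisy = rng.random(n_cases) < 0.1
y[noisy] = rng.integers(0, 4, noisy.sum())

# Mean five-fold cross-validated accuracy for depths 1 to 10 (cf. Fig. 6).
mean_accuracy = {}
for depth in range(1, 11):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(tree, X, y, cv=5, scoring="accuracy")
    mean_accuracy[depth] = scores.mean()

for depth, acc in mean_accuracy.items():
    print(f"depth {depth:2d}: mean accuracy {acc:.3f}")

# Fit a depth-2 tree on all cases and print its rules as text (cf. Fig. 7).
simple_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(simple_tree, feature_names=FEATURES))
```

Limiting `max_depth` to 2 caps the tree at four leaves, which keeps the rule set small enough to read at the microscope at the cost of some accuracy on harder cases.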

Fig. 6 The mean accuracy of multiple decision tree models with depths ranging from 1 to 10

Fig. 7 The most accurate decision tree prediction model with a depth of 2


This study describes and compares detailed nuclear, cytological, and architectural characteristics of sebaceous tumors and highlights the distinctive nuclear features of sebaceous neoplasms.

The distinction of sebaceous adenomas from carcinomas remains debated in light of current evidence. The currently available grading criteria are predominantly based on architectural features [7]. Only a fair to moderate degree of interobserver agreement among specialized dermatopathologists has been reported [16]. While some authors have proposed that sebaceous adenomas should be considered malignant due to mitotic features, nuclear crowding, disorganized arrangement of mature and immature cells, and pleomorphism [17,18,19,20], the majority of experts categorize them as benign lesions [21]. Moreover, the distinction of sebaceoma from sebaceous carcinoma is generally based on the invasive growth and pleomorphism of the latter. However, similar to the issue with sebaceous adenomas, the classification of a minor group of lesions that are well circumscribed but show a discordant degree of atypia remains controversial [22]. Our study suggests that nuclear features should be taken into account for a more accurate diagnosis of cutaneous sebaceous neoplasms. The most predominant pleomorphic features observed in our study were enlarged nuclear size, prominent nucleoli, and coarse chromatin, which were present in sebaceomas, sebaceous adenomas, and carcinomas. Multiple nucleoli were primarily observed in carcinomas, but the differences from benign lesions did not reach statistical significance. Nuclear contour irregularity was only observed in carcinomas and two cases of sebaceoma. Mitotic activity was observed in both benign lesions and carcinomas, but it was significantly higher in malignant cases. Nucleolar and chromatin features of sebaceous adenoma and sebaceoma were statistically closer to those of sebaceous carcinomas than to those of sebaceous hyperplasia. A statistical similarity was observed between sebaceoma and sebaceous carcinomas regarding the basaloid cell count. This finding can be explained by the current definition of sebaceoma, which uses a germinative cell count of more than 50% as the cut-off value for distinguishing sebaceoma from sebaceous adenoma [7]. Ulceration and erosion were seen in nearly half of our cases and reached statistical significance. Ulceration has previously been reported in sebaceous adenoma. However, some authors have suggested that it should be considered a feature of malignancy that prompts careful assessment [17, 19].

Enlarged nuclear and nucleolar sizes, hyperchromasia, mitotic figures, decreased differentiation, and necrosis have been previously reported in malignant sebaceous tumors. Our findings regarding the nuclear contour in sebaceous adenoma and sebaceoma were consistent with previous studies. However, in contrast to previous findings, we found coarse chromatin in a greater number of sebaceoma and sebaceous adenoma cases [8, 19, 21,22,23]. The pleomorphic features observed in both benign and malignant lesions of sebaceous glands, the relatively high interobserver variability [16], and the necessity of accurate diagnosis underline the importance of more reliable criteria to avoid mismanagement and tumor recurrence.

Decision tree algorithm

We proposed a machine-learning-based predictive modeling approach to obtain a descriptive model that classifies cases using decision rules inferred from the studied variables. Decision trees are powerful classification tools that are easy to interpret and visualize and can handle problems with multiple outputs. However, they must be applied carefully, since minor variations in the data may lead to a different tree. Additionally, a careful choice of parameters is necessary to avoid an over-complex model and to increase the model’s generalizability. The decision tree algorithm has been implemented in previous studies as well [24,25,26,27]. In our study, the main predictive variables were basaloid cell count, peripheral basaloid cell layers, chromatin characteristics, nuclear size, and tumor margin. Our suggested decision tree model is highly consistent with a recent study that implemented this method to classify sebaceous neoplasms. Nevertheless, we cross-validated our model on five different folds of the dataset to reduce overfitting and to mitigate the effects of chance associated with fitting on a single random data split [28]. Details of our most accurate model, with an accuracy of 83%, are shown in Fig. 7.
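To make the mechanics of five-fold cross-validation concrete, the sketch below spells out the fold construction and the accuracy definition (correct predictions divided by all predictions) using only the Python standard library. Everything in it is hypothetical: the single ordinal feature, the binary label, the 15% label noise, and the one-threshold "stump" classifier are stand-ins chosen for brevity, not the study's model or data.

```python
import random


def accuracy(y_true, y_pred):
    """Number of correct predictions divided by the number of all predictions."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the case indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_stump(xs, ys):
    """Choose the threshold on one ordinal feature with the best training accuracy."""
    best_threshold, best_score = None, -1.0
    for threshold in sorted(set(xs)):
        score = accuracy(ys, [int(x > threshold) for x in xs])
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold


def cross_validate(xs, ys, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, and repeat for every fold."""
    fold_scores = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        threshold = fit_stump([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        predictions = [int(xs[i] > threshold) for i in test_idx]
        fold_scores.append(accuracy([ys[i] for i in test_idx], predictions))
    return fold_scores


# Synthetic example: more basaloid layers loosely implies label 1, with 15% noise.
rng = random.Random(1)
layers = [rng.randint(1, 6) for _ in range(123)]
labels = [int(x > 2) if rng.random() > 0.15 else int(x <= 2) for x in layers]

fold_scores = cross_validate(layers, labels)
mean_score = sum(fold_scores) / len(fold_scores)
print("per-fold accuracy:", [round(s, 2) for s in fold_scores])
print("mean accuracy:", round(mean_score, 2))
```

The per-fold scores usually differ from one another, which is the point made in the paragraph above: a single random split can look better or worse by chance, whereas the mean over five folds is a steadier estimate.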


Several limitations of this study require consideration. The small sample size of our study undermines the generalizability of our model. We applied model training with different random train and test subsets of our data and reported the mean obtained accuracy. The possibility of misclassification of the specimens cannot be ruled out due to relatively high interobserver variation in the diagnosis of sebaceous lesions. Two independent specialized dermatopathologists, with guidance from a senior pathologist, reviewed the slides to reduce the interobserver variation effect. Another strength of our study lies in utilizing cross-validation to reduce the overfitting of the model.


The currently used classification criteria of sebaceous neoplasms rely mostly on architectural features and contain many diagnostic gray areas in cases of well-circumscribed architecture. This issue has led to high variability in diagnosis. We described these lesions’ cytological and nuclear features in a more detailed manner. We also implemented a novel modeling approach to help distinguish well-circumscribed lesions more easily. Studies of larger sample sizes are needed to ensure the accuracy of our suggested predictive model. Moreover, understanding the biological basis of these lesions may allow for a better concordant classification system.

Data Availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.


  1. Shamloul G, Khachemoune A. An updated review of the sebaceous gland and its role in health and diseases part 1: Embryology, evolution, structure, and function of sebaceous glands. Dermatol Ther. 2021;34(1):e14695.

  2. Butovich IA. Meibomian glands, meibum, and meibogenesis. Exp Eye Res. 2017;163:2–16.

  3. Lee JH, Lee JH, Kwon NH, Yu DS, Kim GM, Park CJ, et al. Clinicopathologic manifestations of patients with Fordyce’s spots. Ann Dermatol. 2012;24(1):103–6.

  4. Requena L, Sangüeza O. Ectopic sebaceous glands: Fordyce’s spots, Tyson’s glands, and Montgomery’s tubercles. Cutaneous Adnexal Neoplasms: Springer; 2017. pp. 785–92.

  5. Tsatsou F, Zouboulis CC. Anatomy of the sebaceous gland. Pathogenesis and treatment of Acne and Rosacea. Springer; 2014. pp. 27–31.

  6. Xia L, Zouboulis C, Detmar M, Mayer-da-Silva A, Stadler R, Orfanos CE. Isolation of human sebaceous glands and cultivation of sebaceous gland-derived cells as an in vitro model. J Invest Dermatol. 1989;93(3):315–21.

  7. Iacobelli J, Harvey NT, Wood BA. Sebaceous lesions of the skin. Pathology. 2017;49(7):688–97.

  8. Flux K. Sebaceous neoplasms. Surg Pathol Clin. 2017;10(2):367–82.

  9. Danialan R, Mutyambizi K, Aung PP, Prieto VG, Ivan D. Challenges in the diagnosis of cutaneous adnexal tumours. J Clin Pathol. 2015;68(12):992–1002.

  10. Dasgupta T, Wilson LD, Yu JB. A retrospective review of 1349 cases of sebaceous carcinoma. Cancer. 2009;115(1):158–65.

  11. Mulay K, Aggarwal E, White VA. Periocular sebaceous gland carcinoma: a comprehensive review. Saudi J Ophthalmol. 2013;27(3):159–65.

  12. Alsaad KO, Obaidat NA, Ghazarian D. Skin adnexal neoplasms—part 1: an approach to tumours of the pilosebaceous unit. J Clin Pathol. 2007;60(2):129–44.

  13. Ponti G, de Leon MP. Muir-Torre syndrome. Lancet Oncol. 2005;6(12):980–7.

  14. Patterson JW. Weedon’s Skin Pathology. 5th ed. Philadelphia, PA: Elsevier; 2019.

  15. Calonje JE, Brenn T, Lazar AJ, Billings SD. McKee’s Pathology of the Skin. 5th ed. Edinburgh, Scotland: Elsevier; 2018.

  16. Harvey NT, Budgeon CA, Leecy T, Beer TW, Kattampallil J, Yu L, et al. Interobserver variability in the diagnosis of circumscribed sebaceous neoplasms of the skin. Pathology. 2013;45(6):581–6.

  17. Chen S. A different view: sebaceous adenoma is sebaceous carcinoma in situ. Dermatopathol Pract Conceptual. 2010;16:16.

  18. Komforti MK, Asgari M, Chen S. Sebaceous carcinoma in situ as a concept and diagnostic entity. Dermatol Pract Concept. 2017;7(3):27–31.

  19. Nussen S, Ackerman AB. Sebaceous “adenoma” is sebaceous carcinoma. Dermatopathol Pract Concept. 1998;4:5–14.

  20. Ansai Si. Topics in histopathology of sweat gland and sebaceous neoplasms. J Dermatol. 2017;44(3):315–26.

  21. Harvey NT, Tabone T, Erber W, Wood BA. Circumscribed sebaceous neoplasms: a morphological, immunohistochemical and molecular analysis. Pathology. 2016;48(5):454–62.

  22. Kazakov DV, Kutzner H, Spagnolo DV, Rütten A, Mukensnabl P, Michal M. Discordant architectural and cytological features in cutaneous sebaceous neoplasms-a classification dilemma: report of 5 cases. Am J Dermatopathol. 2009;31(1):31–6.

  23. Rulon DB, Helwig EB. Cutaneous sebaceous neoplasms. Cancer. 1974;33(1):82–102.

  24. Dong F, Li Q, Xu D, Xiu W, Zeng Q, Zhu X, et al. Differentiation between pilocytic astrocytoma and glioblastoma: a decision tree model using contrast-enhanced magnetic resonance imaging-derived quantitative radiomic features. Eur Radiol. 2019;29(8):3968–75.

  25. Frings VG, Böer-Auer A, Breuer K. Histomorphology and Immunophenotype of Eczematous skin lesions revisited-skin biopsies are Not Reliable in differentiating allergic contact Dermatitis, Irritant Contact Dermatitis, and atopic dermatitis. Am J Dermatopathol. 2018;40(1):7–16.

  26. Payabvash S, Aboian M, Tihan T, Cha S. Machine learning decision Tree Models for differentiation of posterior Fossa Tumors using Diffusion Histogram Analysis and Structural MRI Findings. Front Oncol. 2020;10:71.

  27. Yazdanparast T, Yazdani K, Ahmad Nasrollahi S, Nazari M, Darooei R, Firooz A. Differentiation of inflammatory papulosquamous skin diseases based on skin biophysical and ultrasonographic properties: a decision tree model. Indian J Dermatol Venereol Leprol. 2020;86(6):752.

  28. Tirado M, Metze D, Sahlmann J, Böer-Auer A. Cytologic grading of cutaneous sebaceous neoplasms: does it help to Differentiate Benign from Malignant? Am J Dermatopathol. 2019;41(10):722–32.

Author information

Data acquisition: A.B., V.A., F.A.A.; Conduct: A.B., V.A., F.A.A.; Reporting: V.A., F.A.A., K.K.; Conception and Design: A.B., K.K.; Data analysis: A.A., A.N., A.H.; Data interpretation: A.A., A.N., A.H.; Drafting: A.B., A.A., A.N.; Review: A.H., K.K. All of the authors approved the final draft.

Corresponding author

Correspondence to Alireza Beikmarzehei.

Ethics declarations

Ethics approval and consent to participate

The study was conducted in accordance with the Declaration of Helsinki and its later amendments. The ethical committee of Tehran University of Medical Sciences approved the study (registry code: IR.TUMS.MEDICINE.REC.1399.273).

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Kamyab-Hesari, K., Azhari, V., Ahmadzade, A. et al. Machine learning for classification of cutaneous sebaceous neoplasms: implementing decision tree model using cytological and architectural features. Diagn Pathol 18, 89 (2023).
