The value of deep neural networks in the pathological classification of thyroid tumors



This study explored the diagnostic value and clinical application potential of deep neural networks (DNN) for distinguishing pathological images of thyroid tumors.


A total of 799 pathological thyroid images of 559 patients with thyroid tumors were retrospectively analyzed. The pathological types included papillary thyroid carcinoma (PTC), medullary thyroid carcinoma (MTC), follicular thyroid carcinoma (FTC), adenomatous goiter, adenoma, and normal thyroid gland. The dataset was divided into a training set and a test set. Resnet50, Resnext50, EfficientNet, and Densenet121 were trained using the training set data and tested with the test set data to determine the diagnostic efficiency of different pathology types and to further analyze the causes of misdiagnosis.


The recall, precision, negative predictive value (NPV), accuracy, specificity, and F1 scores of the four models ranged from 33.33% to 100.00%. The area under curve (AUC) ranged from 0.822 to 0.994, and the Kappa coefficient ranged from 0.7508 to 0.7713. However, the performance of diagnosing FTC, adenoma, and adenomatous goiter was slightly inferior to other types of pathological tissues.


The DNN model achieved satisfactory results in the task of classifying thyroid tumors by learning thyroid pathology images. These results indicate the potential of the DNN model for the efficient diagnosis of thyroid tumor histopathology.


As the incidence of thyroid tumors increases year by year, accurate diagnosis of their pathological types is increasingly important. The growing number of patients increases doctors' workload and reduces their efficiency. Common malignant thyroid tumors include PTC, MTC, and FTC, and benign nodules include adenomatous goiter and adenoma. All of these pathological tissues resemble one another to varying degrees [1]. A misdiagnosis affects the patient's subsequent treatment plan [2]. Improving the efficiency of differential diagnosis of thyroid tumors has therefore become a focus of current research.

Pathology remains the gold standard for thyroid tumor diagnosis, but it faces several challenges: (1) training a competent pathologist takes years, so the supply of pathologists cannot keep pace with the rapid increase in surgical workload; (2) competence varies among pathologists, so diagnostic accuracy is uneven; and (3) a heavy workload causes physician fatigue and increases the probability of misdiagnosis. Artificial intelligence (AI) has become a prominent research focus in science and technology, and the AI software developed in large numbers in recent years plays an increasingly significant role in medical treatment. A large number of studies have reported that AI can help address these problems. DNN models are good at learning intrinsic rules from large amounts of data, and applying efficient DNNs has become one of the important ways to relieve heavy clinical workloads. In particular, DNN models have developed rapidly and been applied successfully in clinical settings, where they have been shown to diagnose pathologies efficiently [3, 4] and to reduce misdiagnosis caused by pathologists' insufficient knowledge or fatigue.

In summary, this study used DNN models represented by Resnet50, Resnext50, EfficientNet, and Densenet121 to diagnose different types of thyroid tumors, to analyze the causes of misdiagnosis of different pathological tissues, and finally to analyze whether DNN models have the potential to efficiently diagnose thyroid tumor pathology.

Materials and methods

Patients and data

The data for this paper were obtained from patients who underwent surgical treatment for thyroid nodules from July 2014 to August 2022. Inclusion criteria were: (1) patients who underwent initial surgical treatment for thyroid nodules; and (2) patients with clear pathology of thyroid nodules. Exclusion criteria were: (1) patients with no postoperative thyroid nodule pathology or with unclear pathology; (2) patients who had received 131I treatment; and (3) patients who had received anti-tumor therapy. In total, there were 559 patients, including 381 with PTC, 38 with MTC, 41 with FTC, 40 with adenomatous goiter, and 59 with adenoma. One or two pathology images were taken from each patient's histopathology. In total, 799 pathological images were collected, including 426 PTC, 40 MTC, 41 FTC, 44 adenomatous goiter, 59 adenoma, and 189 normal thyroid (189 cases were randomly selected from the above patients, and images of paracancerous tissue were collected as normal thyroid).

A total of 799 hematoxylin and eosin (HE)-stained pathological sections were used in this study. For all patients, specimens obtained by surgery or puncture were fixed in 4% neutral formaldehyde solution, dehydrated, paraffin-embedded, sectioned at a thickness of 4 μm, and stained with HE. The Leica ASP300S fully enclosed tissue dehydrator was used for the dehydration process, and the Leica Auto Stainer XL automatic stainer was used for the staining process.

All pathological specimens were observed under a Leica DM4000B LED smart biomicroscope, and two highly qualified pathologists selected the area of interest and performed pathological diagnosis, selecting paracancerous tissue as normal thyroid tissue. The pathology images were captured manually and directly under the microscope using a Leica DFC495 microscope camera. Images that were controversial among physicians were excluded, and all pathology images were classified, as shown in Fig. 1. The acquired images were in TIF format and the average size of each image was 2500 pixels × 3200 pixels. The above instruments were manufactured by Leica Microsystems (Shanghai) Trading Co.

Fig. 1 Pathological images of the thyroid gland. a PTC; b MTC; c Adenomatous goiter; d Adenoma; e FTC; f Normal thyroid

Data augmentation

To achieve satisfactory classification results, the original images must be expanded, so data augmentation was performed on the pathology image data. We increased the amount of training data by applying random flipping (horizontal flip with 50% probability), random rotation (-10° to 10°), random scaling (100% to 110%), and random brightness enhancement (0–20%) to the images. For each image, only one of the four transformations was randomly applied.
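The augmentation policy described above can be sketched as a sampler that chooses one transformation per image and draws its parameter. This is a minimal illustration rather than the authors' implementation, which the text does not describe: the function name `sample_augmentation`, the returned dictionary layout, equal probability for the four branches, and uniform sampling within each range are all assumptions. The flip branch is read here as "flip horizontally with probability 0.5, otherwise leave the image unchanged".

```python
import random

# Parameter ranges taken from the text. Uniform sampling within each range
# and equal probability for the four branches are assumptions.
ROTATION_DEG = (-10.0, 10.0)
SCALE = (1.00, 1.10)
BRIGHTNESS_GAIN = (0.00, 0.20)


def sample_augmentation(rng):
    """Pick exactly one of the four transformations and draw its parameter."""
    kind = rng.choice(["flip", "rotate", "scale", "brightness"])
    if kind == "flip":
        # Assumed reading: flip horizontally with probability 0.5.
        return {"kind": kind, "horizontal": rng.random() < 0.5}
    if kind == "rotate":
        return {"kind": kind, "degrees": rng.uniform(*ROTATION_DEG)}
    if kind == "scale":
        return {"kind": kind, "factor": rng.uniform(*SCALE)}
    return {"kind": kind, "gain": rng.uniform(*BRIGHTNESS_GAIN)}


# One augmentation plan per image; a fixed seed keeps the run repeatable.
rng = random.Random(0)
plans = [sample_augmentation(rng) for _ in range(1000)]
```

Each plan would then be handed to an image library to perform the actual pixel operation; that step is omitted because the software used in the study is not stated.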

Network architecture


In traditional CNNs, the vanishing-gradient problem becomes more serious as the network deepens. Densenet121 consists mainly of multiple dense blocks, with ordinary convolutional (transition) layers between consecutive dense blocks. Each dense block is composed of multiple convolutional blocks, each of which in turn consists of convolutional kernels, as in Fig. 2. Within a dense block, the layers are joined by skip connections: the \(l\)-th layer receives as input the concatenated outputs of all preceding layers, i.e., \({x}_{0}\), \({x}_{1}\), …, \({x}_{l-1}\), and produces its output \({x}_{l}\) through the composite function \({H}_{l}\). Because features are reused rather than relearned, this structure reduces the number of parameters and the amount of computation. The connectivity is given by the following equation.

$$x_l=H_l(\lbrack x_0,\;x_1,\;...,\;x_{l-1}\rbrack)$$
Fig. 2 Structure diagram of Densenet121
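The dense connectivity \(x_l=H_l([x_0, x_1, ..., x_{l-1}])\) can be illustrated with a toy example in which feature maps are plain lists of numbers. The sketch below is only meant to show how the input of each layer grows by concatenation; `make_layer` is a hypothetical stand-in for the composite function \(H_l\) and does not perform a real convolution.

```python
def dense_block(x0, layers):
    """Apply x_l = H_l([x_0, ..., x_{l-1}]) for each layer H_l in turn."""
    features = [x0]                 # starts with x_0
    input_widths = []               # size of the concatenated input per layer
    for h in layers:
        # Concatenate the outputs of ALL preceding layers.
        concatenated = [v for x in features for v in x]
        input_widths.append(len(concatenated))
        features.append(h(concatenated))   # x_l
    return features, input_widths


def make_layer(growth_rate):
    """Hypothetical H_l: emits `growth_rate` values, each the input mean."""
    def h(inputs):
        mean = sum(inputs) / len(inputs)
        return [mean] * growth_rate
    return h


x0 = [1.0, 2.0, 3.0, 4.0]           # four input "channels"
features, input_widths = dense_block(x0, [make_layer(2) for _ in range(3)])
```

With four input channels and a growth rate of two, the three layers receive 4, 6, and 8 values respectively, which is the linear growth in input width that characterizes a dense block.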


Resnet50 builds on the VGG19 network by adding residual blocks through a shortcut mechanism (as in Fig. 3). A residual block creates a shortcut between its input and its output, so that the block only has to learn the residual, that is, the difference between the desired output and the input, rather than the whole mapping. This eases the flow of information from the input to the output and reduces the learning difficulty of the neural network. Resnet50 contains 49 convolutional layers and one fully connected layer; its structure is shown in Fig. 4, where ID BLOCK x2 in the second to the fifth stage represents two residual blocks that do not change the feature-map size. CONV is a convolution layer, Batch Norm is batch normalization, Relu is the activation function, MAX POOL is the max pooling operation, AVG POOL is the global average pooling layer, and stage1 to stage5 are the five stages of the network. After the successive convolution operations of the residual blocks, the result is output by the Softmax classifier.

Fig. 3 Structure of Resnet residual block

Fig. 4 Structure of Resnet50
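The shortcut mechanism amounts to computing \(y = F(x) + x\), where \(F\) is the residual branch. The following toy sketch shows that when the branch outputs zero the block reduces to the identity, which is why learning the residual is easier than learning the full mapping. The branch functions are hypothetical placeholders for the convolutional layers. The last lines check the count of 49 convolutional layers under the assumption of the standard ResNet50 layout (one initial convolution followed by 3, 4, 6, and 3 bottleneck blocks of three convolutions each, not counting the convolutions on projection shortcuts).

```python
def residual_block(x, residual_fn):
    """Return y = F(x) + x, where F is the residual branch."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("identity shortcut needs matching shapes")
    return [f + xi for f, xi in zip(fx, x)]


def zero_branch(x):
    """Hypothetical branch that has learned nothing: F(x) = 0."""
    return [0.0 for _ in x]


def small_branch(x):
    """Hypothetical branch that learned a small correction: F(x) = 0.1 * x."""
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_branch)    # the input passes through
adjusted_out = residual_block(x, small_branch)   # input plus a correction

# Layer count, assuming the standard ResNet50 configuration.
BLOCKS_PER_STAGE = (3, 4, 6, 3)
CONVS_PER_BOTTLENECK = 3
conv_layers = 1 + CONVS_PER_BOTTLENECK * sum(BLOCKS_PER_STAGE)
```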


ResNext is a special kind of residual network that combines a ResNet network with an Inception network. Its network block is composed of a simplified Inception structure block plus the shortcut of ResNet, which preserves the performance of the network while reducing the number of hyperparameters. The structure is shown in Fig. 5. The essence of ResNeXt is group convolution, where the number of groups is controlled by a variable called cardinality, and the three-layer convolution blocks of the original ResNet are replaced by a parallel stack of blocks with the same topology. Rather than following the conventional approach of improving network performance by deepening and widening the network, the design increases the number of paths with the same topology and performs group convolution using a split-transform-merge strategy in a simple and scalable manner. ResNext networks have shown remarkable results in various computer vision tasks. The formula is as follows (Fig. 5).

$$Y=\sum _{i=1}^{C}{T}_{i}\left(x\right)$$

where \(C\) denotes the cardinality, indicating the number of branches with the same topology in a module, and \({T}_{i}\left(x\right)\) represents the transformation of each branch with the same topology.

Fig. 5 Structure diagram of Resnext50
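The aggregated transformation, in which \(Y\) is the sum over the \(C\) branches of each branch transformation applied to \(x\), can be sketched with toy branches that share one topology and differ only in their weights. The helper names and the numbers are illustrative assumptions. The second function counts the weights of a grouped convolution, to show why splitting into groups lowers the parameter count; the channel and group values are examples, not figures reported in this study.

```python
def aggregate(x, branches):
    """Return the element-wise sum of T_i(x) over all C branches."""
    outputs = [t(x) for t in branches]           # split + transform
    return [sum(vals) for vals in zip(*outputs)]  # merge


def make_branch(weight):
    """Hypothetical T_i: same topology in every branch, different weight."""
    def t(x):
        return [weight * v for v in x]
    return t


def conv_weights(c_in, c_out, kernel, groups=1):
    """Weight count of a (grouped) convolution, bias terms ignored."""
    return (c_in // groups) * c_out * kernel * kernel


C = 4                                             # cardinality
branches = [make_branch(w) for w in (1.0, 2.0, 3.0, 4.0)]
y = aggregate([1.0, 0.5], branches)

dense_count = conv_weights(128, 128, 3)             # ordinary convolution
grouped_count = conv_weights(128, 128, 3, groups=32)  # 32 groups
```

In this example the grouped convolution needs 1/32 of the weights of the ordinary convolution with the same input and output width.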


EfficientNet is a lightweight network developed by Google Research using neural architecture search. It scales the three dimensions of network depth, number of channels, and input image resolution jointly, using a fixed set of scaling coefficients, which makes it easy to deploy, easy to train, and highly accurate. EfficientNet is a stack of Mobile Inverted Bottleneck Convolution (MBConv) modules, and each MBConv module contains a squeeze-and-excitation (SE) module. The SE module first applies a two-dimensional global pooling operation to the feature map, compressing the high-dimensional global feature map into a low-dimensional feature vector that captures channel-level global features, and then performs a nonlinear feature transformation using a multilayer perceptron (Fig. 6).

Fig. 6 Structure of EfficientNet
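The SE module described above can be sketched in three steps: global pooling of each channel, a small multilayer perceptron that turns the pooled vector into one gate per channel, and rescaling of each channel by its gate. The sketch follows the general squeeze-and-excitation formulation with ReLU and sigmoid activations; the activation actually used inside EfficientNet may differ, and the weights below are arbitrary illustrative values rather than trained parameters.

```python
import math


def squeeze(feature_map):
    """Global average pooling: one scalar per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_map]


def dense(vec, weights):
    """Fully connected layer without bias; `weights` has one row per output."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def excite(z, w_reduce, w_expand):
    """Two-layer perceptron: reduce, ReLU, expand, sigmoid."""
    hidden = [max(0.0, h) for h in dense(z, w_reduce)]
    return [1.0 / (1.0 + math.exp(-s)) for s in dense(hidden, w_expand)]


def se_block(feature_map, w_reduce, w_expand):
    """Rescale every channel by its learned gate in (0, 1)."""
    gates = excite(squeeze(feature_map), w_reduce, w_expand)
    return [[[g * v for v in row] for row in ch]
            for g, ch in zip(gates, feature_map)]


# Two channels of size 2 x 2; the weights are hypothetical.
fmap = [[[1.0, 1.0], [1.0, 1.0]],
        [[2.0, 2.0], [2.0, 2.0]]]
w_reduce = [[0.5, 0.5]]        # 2 channels -> 1 hidden unit
w_expand = [[0.0], [2.0]]      # 1 hidden unit -> 2 gates
z = squeeze(fmap)
out = se_block(fmap, w_reduce, w_expand)
```

Here the first channel receives a gate of 0.5 and the second a gate close to 1, so the module emphasizes the second channel relative to the first.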

Model training and testing

The pathology images were divided into a combined training and validation set and a test set at a ratio of 7:3. Resnet50, Resnext50, EfficientNet, and Densenet121 were trained on the pathological images in the training set and then used to diagnose the pathological images in the test set.
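A 7:3 division of this kind can be sketched as a per-class (stratified) random split, using the image counts given in the Patients and data section. Stratification, the random seed, and the rounding rule are assumptions made for illustration; the text does not state how the authors drew the split, and the test set implied by the results contains 243 images, so their procedure evidently differed in detail.

```python
import random

# Image counts per class as reported in the Patients and data section.
CLASS_COUNTS = {
    "PTC": 426, "MTC": 40, "FTC": 41,
    "adenomatous goiter": 44, "adenoma": 59, "normal thyroid": 189,
}


def stratified_split(labels, test_fraction, seed):
    """Split indices so that each class keeps roughly the same proportion."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


labels = [name for name, n in CLASS_COUNTS.items() for _ in range(n)]
train_idx, test_idx = stratified_split(labels, test_fraction=0.3, seed=42)
```

The training portion returned here corresponds to the combined training and validation set, which would be divided again before model fitting.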

Statistical analysis

Statistical analysis and processing were performed using SPSS 20.0 software. The receiver operating characteristic (ROC) curve was plotted for each model. The true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) were counted, and the corresponding performance metrics (recall, precision, NPV, accuracy, specificity, F1 score, Kappa coefficient, and AUC) were calculated to compare the diagnostic performance of the models.
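The metrics listed above follow from the four counts by their standard definitions. The sketch below derives one-vs-rest counts for a single class from a multi-class confusion matrix and then computes the per-class metrics and the Kappa coefficient. The 3 x 3 matrix is a made-up example, not data from this study, and the function names are illustrative; the authors performed these calculations in SPSS.

```python
def one_vs_rest_counts(confusion, k):
    """confusion[i][j] = images of true class i predicted as class j."""
    n = len(confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(confusion[i][k] for i in range(n)) - tp
    tn = sum(map(sum, confusion)) - tp - fn - fp
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Standard definitions; zero denominators are not handled here."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "recall": recall,
        "precision": precision,
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


def cohen_kappa(confusion):
    """Agreement between predicted and true labels, corrected for chance."""
    n = len(confusion)
    total = sum(map(sum, confusion))
    observed = sum(confusion[i][i] for i in range(n)) / total
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


# Made-up confusion matrix for three classes (rows: true, columns: predicted).
toy = [[8, 1, 1],
       [2, 6, 2],
       [0, 1, 9]]
class0 = metrics(*one_vs_rest_counts(toy, 0))
kappa = cohen_kappa(toy)
```

Because TN is large in a one-vs-rest setting with six classes, per-class accuracy, specificity, and NPV tend to be higher than recall, precision, and F1 for the same class.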


Resnet50 correctly classified 206 images from the test set. It misclassified 37 images, which contained 2 PTC images, 5 FTC, 10 adenoma, 6 adenomatous goiter, and 14 MTC. The specific classification results and related performance indicators are detailed in Tables 1, 2, and Fig. 7.

Table 1 Confusion matrix of Resnet50 classification results
Table 2 Performance indicators of the Resnet50 classification
Fig. 7 ROC curve of Resnet50

Densenet121 correctly classified 212 images from the test set. It misclassified 31 images, which contained 8 PTC images, 4 FTC, 9 adenoma, 6 adenomatous goiter, and 4 MTC. The specific classification results and related performance metrics are detailed in Tables 3, 4, and Fig. 8.

Table 3 Confusion matrix of Densenet121 classification results
Table 4 Performance indicators for the Densenet121 classification
Fig. 8 ROC curve of Densenet121

EfficientNet correctly classified 208 images from the test set. It misclassified 35 images, which contained 8 PTC images, 1 normal thyroid, 7 FTC, 8 adenoma, 6 adenomatous goiter, and 5 MTC. The specific classification results and related performance indicators are detailed in Tables 5, 6, and Fig. 9.

Table 5 Confusion matrix of EfficientNet classification results
Table 6 Performance indicators of EfficientNet classification
Fig. 9 ROC curve of EfficientNet

Resnext50 correctly classified 202 images from the test set. It misclassified 41 images, which contained 7 PTC images, 4 normal thyroid, 3 FTC, 11 adenoma, 5 adenomatous goiter, and 11 MTC. The specific classification results and related performance indicators are detailed in Tables 7, 8, and Fig. 10.

Table 7 Confusion matrix of Resnext50 classification results
Table 8 Performance metrics of Resnext50 classification
Fig. 10 ROC curve of Resnext50
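As a consistency check, the counts of correctly and incorrectly classified images reported above can be turned into an overall multi-class accuracy for each model. This quantity is derived here from the reported counts and is not a figure stated by the authors; it is different from the per-class (one-vs-rest) accuracy listed in the performance tables, which also credits true negatives.

```python
# (correct, misclassified) test images as reported for each model.
REPORTED = {
    "Resnet50": (206, 37),
    "Densenet121": (212, 31),
    "EfficientNet": (208, 35),
    "Resnext50": (202, 41),
}

# Overall accuracy = correct / (correct + misclassified).
overall_accuracy = {
    name: correct / (correct + wrong)
    for name, (correct, wrong) in REPORTED.items()
}

# Every model should have been evaluated on the same number of images.
test_set_sizes = {correct + wrong for correct, wrong in REPORTED.values()}
```

All four models were evaluated on the same 243 test images, and the overall accuracy computed this way lies between roughly 83% and 87%.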


Resnet50, Resnext50, EfficientNet, and Densenet121 all had high diagnostic performance. The AUC ranged from 0.822 to 0.994. The NPV, accuracy, and specificity of the four models for the diagnosis of the six kinds of pathological images ranged from 88.52% to 100.00%, showing stable performance. The study confirmed that the DNN models achieved satisfactory results in identifying pathological findings of thyroid tumors, with a high accuracy rate. The analysis of misdiagnosed pathologies showed that the DNN models performed slightly worse in diagnosing FTC, adenoma, and adenomatous goiter than in diagnosing the other pathological types. The recall, precision, and F1 score of the DNN models for these three types of pathological images ranged from 44.44% to 80.00%. The results indicate that the DNN models can diagnose thyroid tumor pathology efficiently but remain insufficient in the diagnosis of FTC, adenoma, and adenomatous goiter.


With the progress of science and technology, AI is steadily maturing, and it has made notable achievements in the medical field in particular. The convolutional neural network (CNN) is a class of deep neural network that performs convolutional computation and is one of the representative algorithms of DNN [5, 6]. CNN has been the core algorithm in image recognition technology and performs better when trained on large amounts of data [7]. With this technique, images can be used directly for learning without specialized feature extraction beforehand, which is a breakthrough in image recognition and classification [8]. CNNs can build a hierarchical classifier to handle a large number of image classification tasks, and can also extract image features for other classifiers to learn for fine-grained classification [9]. In fine-grained classification, the image parts used for feature extraction can be fed into a convolutional neural network manually and separately, or can be extracted by the CNN itself through unsupervised learning [10]. Resnet50, Resnext50, EfficientNet, and Densenet121 used in this study are all CNN models, and numerous studies have confirmed their ability to efficiently identify tumor pathology images. CNNs are still less studied in the field of thyroid tumors, but the existing studies have shown satisfactory results. For example, Wang et al. [2] used the VGG-19 and Inception-ResNet-v2 models to classify a variety of thyroid tumor pathologies with an accuracy of 88.33% to 100%. Li et al. [11] used InceptionV3, VGG16BN, and Resnet50 to predict intraoperative frozen section pathology of thyroid nodules and correctly predicted 95.3% of benign nodules and 96.7% of malignant nodules. Other similar studies using DNN to diagnose tumor pathology have also shown high accuracy rates. The accuracy of the models used in this study for diagnosing thyroid tumor pathology ranged from 92.18% to 97.53%. These results are similar to those of most previous studies and indicate that DNN models have high application value in the classification of thyroid tumor pathological images.

The learning effect of a DNN model depends on the number and quality of images. Because the number of patients eligible for enrollment was limited, we provided more data through data augmentation, which allowed the DNN models to learn more features. In the present study, the diagnosis of FTC, adenomatous goiter, and adenoma was relatively unsatisfactory. Analysis of the misclassified pathological images and a review of the literature showed that both FTC and PTC are derived from follicular epithelial cells and have similar pathological manifestations [12], and some FTC also have papillary structures. Thus, DNN models can easily confuse them [11, 13]. In addition, in some MTCs part of the tumor cells show a follicular arrangement, which can also lead to confusion with FTC [14]. Most of the misdiagnosed pathological images of FTC in this study were misclassified as PTC or MTC. Follicles within an adenoma may enlarge and fuse to form a cystic structure [15], and PTC also forms a cystic structure in some cases [16], which results in misdiagnosis between the two. Moreover, the epithelial cell morphology of adenoma and adenomatous goiter is very similar, and both often appear as nodular changes under the microscope [17]. Therefore, some of the adenomas in this study were easily misdiagnosed as PTC or adenomatous goiter. Nodular-like changes are seen microscopically in adenomatous goiter, and the DNN model may misclassify normal thyroid tissue when it has a large follicular structure [18, 19]. Therefore, adenomatous goiter in this study was not only easily misdiagnosed as adenoma but also partially misdiagnosed as normal thyroid gland.

A complete pathological section includes tumor tissue, normal thyroid tissue, follicular cells, blood vessels, muscle, etc. [14]. Moreover, differences in preparation methods and imaging equipment lead to variable representation of tissue images [20]. The pathological images used in this study were carefully selected by pathologists and had a clear diagnosis, and the DNN models performed well in diagnosing such images. The limitation of this approach is that it restricts the applicability of the DNN models to atypical pathology images. We plan to include more atypical pathological images and to collect radiomics and metabolomics data to build models in future studies. In conclusion, this study confirms that DNN models perform well in the pathological diagnosis of thyroid tumors and demonstrates their potential in clinical applications.

Availability of data and materials

Not applicable.


  1. Ancker OV, Krüger M, Wehland M, et al. Multikinase inhibitor treatment in thyroid cancer. Int J Mol Sci. 2020;21(1):10.

  2. Wang Y, Guan Q, Lao I, et al. Using deep convolutional neural networks for multi-classification of thyroid tumor by histopathology: a large-scale pilot study. Ann Transl Med. 2019;7(18):468.

  3. He L, Long LR, Antani S, et al. Histology image analysis for carcinoma detection and grading. Comput Methods Programs Biomed. 2012;107(3):538–56.

  4. Barker J, Hoogi A, Depeursinge A, et al. Automated classification of brain tumor type in whole-slide digital pathology images using local representative tiles. Med Image Anal. 2016;30:60–71.

  5. Stabinger S, Peer D, Rodríguez-Sánchez A. Arguments for the unsuitability of convolutional neural networks for non-local tasks. Neural Netw. 2021;142:171–9.

  6. Wang J, Hu X. Convolutional neural networks with gated recurrent connections. IEEE Trans Pattern Anal Mach Intell. 2022;44(7):3421–35.

  7. Kudriavtseva P, Kashkinov M, Kertész-Farkas A. Deep convolutional neural networks help scoring tandem mass spectrometry data in database-searching approaches. J Proteome Res. 2021;20(10):4708–17.

  8. Saha P, Dash S, Mukhopadhya S. Physics-incorporated convolutional recurrent neural networks for source identification and forecasting of dynamical systems. Neural Netw. 2021;144:359–71.

  9. Görmez Y, Sabzekar M, Aydın Z. IGPRED: Combination of convolutional neural and graph convolutional networks for protein secondary structure prediction. Proteins. 2021;89(10):1277–88.

  10. Lu Y, Lu G, Li J, et al. Multiscale conditional regularization for convolutional neural networks. IEEE Trans Cybernetics. 2022;52(1):444–58.

  11. Li Y, Chen P, Li Z, et al. Rule-based automatic diagnosis of thyroid nodules from intraoperative frozen sections using deep learning. Artif Intell Med. 2020;108:101918.

  12. Dov D, Kovalsky SZ, Assaad S, et al. Weakly supervised instance learning for thyroid malignancy prediction from whole slide cytopathology images. Med Image Anal. 2021;67:101814.

  13. Halicek M, Dormer JD, Little JV, et al. Tumor detection of the thyroid and salivary glands using hyperspectral imaging and deep learning. Biomedical Opt Express. 2020;11(3):1383.

  14. Yoon J, Lee E, Koo JS, et al. Artificial intelligence to predict the BRAFV600E mutation in patients with thyroid cancer. PLoS One. 2020;15(11):e242806.

  15. Stojsavljevic A, Rovcanin B, Jagodic J, et al. Alteration of trace elements in multinodular goiter, thyroid adenoma, and thyroid cancer. Biol Trace Elem Res. 2021;199(11):4055–65.

  16. Sutherland R, Tsang V, Clifton Bligh RJ, et al. Papillary thyroid microcarcinoma: is active surveillance always enough? Clin Endocrinol (Oxf). 2021;95(6):811–7.

  17. Reiners C, Drozd VM. Editorial: Differentiated thyroid cancer - risk adapted therapy, genetic profiling and clinical staging. Front Endocrinol (Lausanne). 2021;12:755323.

  18. Tao T, Gang Y, Ji S, et al. Giant cervical goiter in Hashimoto’s thyroiditis: a case report. J Int Med Res. 2022;50(5):665809989.

  19. Zuo T, Gao Z, Chen Z, et al. Surgical management of 48 patients with retrosternal goiter and tracheal stenosis: a retrospective clinical study from a single surgical center. Med Sci Monit. 2022;28:e936637.

  20. Xin C, Xie J, Fan H, et al. Association between serum cystatin C and thyroid diseases: a systematic review and meta-analysis. Front Endocrinol (Lausanne). 2021;12:766516.


Not applicable.


Construction project of Shanghai Key Laboratory of Molecular Imaging(18DZ2260400).

Author information

Authors and Affiliations



Chengwen Deng and Dan Li conceptualized the study, drafted, wrote, reviewed, and modified the manuscript and collected and analyzed the data. Ming Feng and Dongyan Han collected the data and reviewed the manuscript. Ming Feng assisted in data collation and discussion. Qingqing Huang provided guidance and revised and critically reviewed the manuscript. All the authors have reviewed the manuscript. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Qingqing Huang.

Ethics declarations

Ethics approval and consent to participate

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. This study is a retrospective study and received exemption from the institution’s review board.

Consent for publication

Written informed consent was obtained from all patients.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Deng, C., Li, D., Feng, M. et al. The value of deep neural networks in the pathological classification of thyroid tumors. Diagn Pathol 18, 95 (2023).
