
, Thanh Dat Nguyen2,3,*, Phu Qui Le Nguyen1, Phuong Thi Bui4, Minh Nam Nguyen1,2
1Faculty of Medicine, University of Health Sciences, Thu Duc District, Ho Chi Minh City, Vietnam
2Vietnam National University Ho Chi Minh City, Thu Duc District, Ho Chi Minh City, Vietnam
3Research Center for Genetics and Reproductive Health (CGRH), University of Health Sciences, Thu Duc District, Ho Chi Minh City, Vietnam
4Faculty of Pharmacy, University of Health Sciences, Thu Duc District, Ho Chi Minh City, Vietnam
© 2026 Yeungnam University College of Medicine, Yeungnam University Institute of Medical Science
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Conflicts of interest
All authors declare no conflict of interest related to this study.
Funding
This research was sponsored by Vietnam National University Ho Chi Minh City (VNU-HCMC) under project code C2024-44-27.
Author contributions
Conceptualization, Methodology: MNN, TTN; Data curation: TTN, TDN, PTB; Formal analysis: TTN, TDN, LPQN; Funding acquisition: TTN, MNN, TDN, PTB; Investigation: MNN, TDN, PTB; Supervision: MNN; Visualization: TDN, LPQN; Writing-original draft: TDN; Writing-review & editing: all authors.
The GSE14520 cohort was used for model training, and all other datasets were used for independent validation. Internal validation in the GSE14520 cohort was performed with nested 10-fold cross-validation to reduce optimistic bias in performance estimation; for this cohort, metrics are reported as mean±standard deviation across the 10 folds. The GSE84005 cohort consisted of paired tumor and adjacent nontumor tissues, which may have contributed to its perfect classification performance.
AUC, area under the curve; F1, F1-score.
| Dataset | No. of samples (cases/controls) | AUC | Accuracy | Sensitivity | Specificity | F1 |
|---|---|---|---|---|---|---|
| GSE14520 | 488 (247/241) | 0.988±0.012 | 0.971±0.028 | 0.976±0.028 | 0.967±0.047 | 0.972±0.026 |
| GSE25097 | 557 (268/289) | 0.942 | 0.890 | 0.888 | 0.892 | 0.893 |
| GSE45436 | 134 (95/39) | 0.992 | 0.985 | 0.989 | 0.974 | 0.989 |
| GSE102079 | 257 (152/105) | 0.971 | 0.926 | 0.908 | 0.952 | 0.936 |
| GSE121248 | 107 (70/37) | 0.947 | 0.944 | 0.943 | 0.946 | 0.957 |
| GSE84005 | 76 (38/38) | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| GSE49515 | 20 (10/10) | 0.910 | 0.950 | 0.900 | 1.000 | 0.947 |
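The threshold-dependent columns of the table (accuracy, sensitivity, specificity, F1) all derive from a 2×2 confusion matrix. The sketch below shows those standard definitions, assuming tumor samples (cases) are the positive class and that AUC refers to the ROC curve; `ConfusionCounts`, `classification_metrics`, and `roc_auc` are hypothetical helpers, not the authors' code. The usage lines take the GSE49515 row (10 cases, 10 controls, sensitivity 0.900, specificity 1.000), which fixes the counts at TP = 9, FN = 1, TN = 10, FP = 0. These counts reproduce the reported accuracy (0.950) and F1 (0.947). The AUC cannot be recomputed this way because it needs the per-sample scores, which are not given here.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # cases (tumor) predicted as cases
    fn: int  # cases predicted as controls
    tn: int  # controls (nontumor) predicted as controls
    fp: int  # controls predicted as cases


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Threshold-dependent metrics, treating cases as the positive class (assumption)."""
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp),
        "sensitivity": c.tp / (c.tp + c.fn),  # true-positive rate (recall)
        "specificity": c.tn / (c.tn + c.fp),  # true-negative rate
        # F1 = harmonic mean of precision and recall = 2TP / (2TP + FP + FN)
        "f1": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
    }


def roc_auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """ROC AUC as P(random case scores above random control); ties count one half."""
    wins = 0.0
    for s_case in case_scores:
        for s_ctrl in control_scores:
            if s_case > s_ctrl:
                wins += 1.0
            elif s_case == s_ctrl:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# GSE49515 row: 10 cases, 10 controls, sensitivity 0.900, specificity 1.000
# => TP = 9, FN = 1, TN = 10, FP = 0 (the only counts consistent with those rates).
m = classification_metrics(ConfusionCounts(tp=9, fn=1, tn=10, fp=0))
print({name: round(value, 3) for name, value in m.items()})
# {'accuracy': 0.95, 'sensitivity': 0.9, 'specificity': 1.0, 'f1': 0.947}
```

The pairwise form of `roc_auc` is quadratic in sample size. It is used here only because it states the definition directly; rank-based implementations give the same value faster.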
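The table note states that GSE14520 was evaluated with nested 10-fold cross-validation. The sketch below illustrates only the general technique: an outer loop that estimates performance and an inner loop that tunes a hyperparameter using only the outer-training data. This section does not describe the study's classifier, features, inner-fold count, or stratification. Everything here is therefore an assumption: a hypothetical one-feature k-nearest-neighbour classifier whose k is tuned by 5-fold inner cross-validation, synthetic data, unstratified random folds, and the sample standard deviation across outer folds. All helper names are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple

# (score, label) pairs; label 1 = case (tumor), 0 = control (nontumor).
# The single "score" feature is a hypothetical stand-in for real expression data.
Sample = Tuple[float, int]


def k_fold_indices(n: int, n_folds: int, seed: int) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into n_folds near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def split(data: Sequence[Sample], fold: List[int]) -> Tuple[List[Sample], List[Sample]]:
    """Return (training samples, held-out samples) for one fold."""
    held = set(fold)
    train = [s for i, s in enumerate(data) if i not in held]
    return train, [data[i] for i in fold]


def knn_predict(train: Sequence[Sample], x: float, k: int) -> int:
    """Majority vote of the k training samples nearest to x (odd k avoids ties)."""
    nearest = sorted(train, key=lambda s: abs(s[0] - x))[:k]
    return 1 if 2 * sum(label for _, label in nearest) > k else 0


def accuracy(train: Sequence[Sample], test: Sequence[Sample], k: int) -> float:
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def mean_cv_accuracy(data: Sequence[Sample], k: int, n_folds: int, seed: int) -> float:
    """Plain k-fold cross-validation; used here only as the inner loop."""
    scores = []
    for fold in k_fold_indices(len(data), n_folds, seed):
        train, test = split(data, fold)
        scores.append(accuracy(train, test, k))
    return statistics.mean(scores)


def nested_cv(data: Sequence[Sample], k_grid: Sequence[int], outer_folds: int = 10,
              inner_folds: int = 5, seed: int = 0) -> Tuple[float, float, List[int]]:
    """Outer loop estimates performance; inner loop tunes k on outer-training data only."""
    outer_scores, chosen_k = [], []
    for fold in k_fold_indices(len(data), outer_folds, seed):
        outer_train, outer_test = split(data, fold)
        # The held-out outer fold never influences the choice of k.
        best_k = max(k_grid, key=lambda k: mean_cv_accuracy(outer_train, k, inner_folds, seed + 1))
        chosen_k.append(best_k)
        outer_scores.append(accuracy(outer_train, outer_test, best_k))
    # Assumption: sample standard deviation across the outer folds.
    return statistics.mean(outer_scores), statistics.stdev(outer_scores), chosen_k


# Synthetic, well-separated data (not study data): 60 cases and 60 controls.
rng = random.Random(42)
data = [(rng.gauss(1.5, 1.0), 1) for _ in range(60)] + [(rng.gauss(-1.5, 1.0), 0) for _ in range(60)]
mean_acc, sd_acc, chosen_k = nested_cv(data, k_grid=[1, 3, 5, 7, 9])
print(f"accuracy {mean_acc:.3f}±{sd_acc:.3f}; k per outer fold: {chosen_k}")
```

Because no outer held-out fold is used to choose k, the mean of the outer-fold scores estimates the performance of the whole tune-then-fit procedure. It does not describe one specific tuned model, which is why the selected k may differ between folds.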