A Quantitative Association Model for Male Fetal Y Chromosome Concentration Based on Random Forest Regression and SHAP Interpretability

Authors

  • Tong Yu, School of Economics, Northeastern University at Qinhuangdao, Qinhuangdao, China, 066004
  • ShanRen Xiong, School of Control Engineering, Northeastern University at Qinhuangdao, Qinhuangdao, China, 066004
  • Rui Shen, School of Economics, Northeastern University at Qinhuangdao, Qinhuangdao, China, 066004

DOI:

https://doi.org/10.62051/c9kk6k39

Keywords:

Random forest regression; Y chromosome concentration; SHAP interpretability.

Abstract

This study aims to clarify the quantitative association between maternal biological indicators and fetal Y chromosome concentration in pregnancies with male fetuses. First, 549 male fetal detection records were preprocessed: records missing key values were deleted, outliers were handled using Z-scores, and derived features such as BMI groups and gestational age segments were constructed, to ensure data quality and suitability for modeling. Spearman's rank correlation analysis confirmed a significant monotonic negative relationship between maternal BMI and Y chromosome concentration. A random forest regression model was then built to capture the nonlinear relationship between the features and Y concentration. Model hyperparameters were tuned via grid search with 5-fold cross-validation, which selected T = 800 decision trees. On the test set, the model performed well in terms of the coefficient of determination R² and achieved a mean absolute error (MAE) of 0.0183 ng/mL. SHAP interpretability analysis then quantified feature contributions, showing that X chromosome concentration, number of blood draws, and Y chromosome Z-score were the three most influential features for Y concentration prediction. In addition, SHAP dependence plots showed a sharp increase in the positive contribution of gestational age during weeks 12–15, while high BMI systematically lowered the predicted Y concentration.
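
The first two analysis steps described in the abstract, Z-score outlier handling followed by Spearman's rank correlation, can be illustrated with a minimal sketch. This is not the authors' code: the field names (bmi, y_conc), the Z-score threshold of 2.0, the choice to drop outliers rather than cap them, and all data values are hypothetical and chosen only to make the example self-contained. It uses the Python standard library only.

```python
from statistics import mean, pstdev


def zscore_filter(rows, key, threshold):
    """Keep rows whose value at `key` lies within `threshold` population
    standard deviations of the mean (dropping outliers is an assumption;
    the abstract only says outliers were 'handled')."""
    values = [r[key] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(rows)
    return [r for r in rows if abs((r[key] - mu) / sigma) <= threshold]


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up records for illustration only; the last one is a deliberate outlier.
bmis = [28, 29, 30, 31, 32, 33, 34, 35, 36, 30]
y_concs = [0.110, 0.105, 0.100, 0.095, 0.090, 0.085, 0.080, 0.075, 0.070, 0.900]
rows = [{"bmi": b, "y_conc": y} for b, y in zip(bmis, y_concs)]

# Hypothetical threshold; the abstract does not state the one actually used.
clean = zscore_filter(rows, "y_conc", threshold=2.0)
rho = spearman_rho([r["bmi"] for r in clean], [r["y_conc"] for r in clean])
print(len(clean), rho)
```

A negative rho on the cleaned records corresponds to the monotonic negative BMI relationship reported above. A significance test, as well as the random forest and SHAP steps, would normally rely on dedicated libraries and is left out of this sketch.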

Published

09-04-2026

How to Cite

Yu, T., Xiong, S., & Shen, R. (2026). A Quantitative Association Model for Male Fetal Y Chromosome Concentration Based on Random Forest Regression and SHAP Interpretability. Transactions on Computer Science and Intelligent Systems Research, 12, 157-168. https://doi.org/10.62051/c9kk6k39