Clin Exp Reprod Med > Epub ahead of print

| Application area | Study | Dataset | AI methodology used | Key findings |
|---|---|---|---|---|
| Clinical counseling and outcome prediction | Liao et al. (2020) [7] | Retrospective study of 60,648 IVF-ET cycles. Key predictors: age, BMI, FSH, AFC, AMH, number of oocytes, and endometrial thickness | Entropy-based feature discretization, random forest algorithm | The grading system classified infertility into five grades (A–E), with pregnancy rates of 53.82% for grade A and 0.90% for grade E. Cross-validation confirmed 95.94% stability (95% CI, 95.14%–96.74%). |
| | McLernon et al. (2022) [8] | Population-based cohort study of 88,614 patients, using SART CORS data. Key predictors: age, BMI, previous full-term birth, male factor infertility, PCOS, DOR, uterine factor infertility, and unexplained infertility | Logistic regression method at two key time points: before starting the first complete IVF cycle and after an unsuccessful first complete IVF cycle | Each additional IVF cycle reduced live birth odds by 42%, while higher AMH and more retrieved eggs were associated with increased success rates. Women in a second cycle had higher rates of DOR (39%) and PCOS (38%). |

AI, artificial intelligence; IVF-ET, in vitro fertilization and embryo transfer; BMI, body mass index; FSH, follicle-stimulating hormone; AFC, antral follicle count; AMH, anti-Müllerian hormone; CI, confidence interval; SART CORS, Society for Assisted Reproductive Technology Clinical Outcomes Reporting System; PCOS, polycystic ovary syndrome; DOR, diminished ovarian reserve.

| Application area | Study | Dataset | AI methodology used | Key findings |
|---|---|---|---|---|
| IUI | Kozar et al. (2021) [9] | 1,029 IUI procedures in 413 couples, including various stimulation methods | Random forest, artificial neural network, partial least squares regression, support vector machine, and linear models | The best predictive model for IUI outcome was random forest (AUC, 0.66; sensitivity, 0.4; specificity, 0.8). Sperm parameters represented the most significant predictor. |
| | Ranjbari et al. (2021) [10] | Reproductive study of 11,255 IUI procedures. Included the 20 most important features for IUI prediction | Combining complex network-based feature engineering and stacked ensemble | The best predictive model for IUI outcome achieved an AUC of 0.84, with a sensitivity of 0.8, specificity of 0.9, and accuracy of 0.9. The most significant predictors were semen parameters (sperm concentration and motility) and female BMI. |
| IVF workflow management | Letterie et al. (2020) [12] | 2,603 IVF cycles (1,853 autologous and 750 donor cycles) | Classification and regression trees, random forests, support vector machines, logistic regression, and neural networks | The algorithm accurately predicted four key clinical decisions, aligning closely with expert decisions: (1) continuing or stopping stimulation (92% accuracy), (2) triggering or canceling retrieval (96%), (3) dose adjustments (82%), and (4) follow-up timing (87%). |
| | Robertson et al. (2021) [13] | Retrospective analysis of 9,294 ultrasound scans from 2,322 IVF cycles | Random forest regression and random forest classifier method | Stimulation day 5 was the best day for predicting the trigger date (mean squared error, 2.16±0.12) and over-response (AUC, 0.91±0.01). |
| | Letterie et al. (2022) [14] | 1,591 autologous IVF cycles | Linear regression, random forest, extra tree regression, K-nearest neighbor | The algorithm (1) predicted the optimal monitoring day (mean error, 1.4 days), (2) identified trigger day options within a 3-day retrieval range, with oocyte count variance of 0–3 across these days, and (3) estimated mature oocyte count with a sensitivity of approximately 0.8. |

| Application area | Study | Dataset | AI methodology used | Key findings |
|---|---|---|---|---|
| COS | Correa et al. (2022) [16] | Observational study of 3,487 patients from five centers. Predictor variables: age, BMI, AMH, AFC, and previous live births | Linear regression | The model outperformed clinician-prescribed FSH doses, achieving performance scores of 0.87 vs. 0.83 (p=2.44e-10) in development and 0.89 vs. 0.84 (p=3.81e-05) in validation. |
| | Murillo et al. (2023) [17] | Retrospective study of 20,081 IVF cycles, using SART CORS | Logistic regression, K-nearest neighbor | When comparing flare and antagonist protocols in poor responders, (1) CLBR was similar in the first cycles (14.2% vs. 13.6%); and (2) CLBR improvement was comparable across protocol transitions. |
| | Hariton et al. (2021) [19] | 7,886 IVF with ICSI cycles | T-learner with bagged decision trees | For trigger timing decisions, physician-model agreement was 53% for 2PNs and 62% for blastocysts. The algorithm yielded 1.4 more 2PNs and 0.6 more blastocysts per cycle compared to physicians. |
| | Fanton et al. (2022) [20] | 30,278 IVF cycles from three centers | Linear regression | When predicting mature oocyte outcomes for triggering today versus tomorrow: (1) early triggers resulted in 2.3 fewer MII oocytes, 1.8 fewer 2PNs, and 1.0 fewer blastocysts compared to optimal triggers; and (2) late triggers led to 2.7 fewer MII oocytes, 2.0 fewer 2PNs, and 0.7 fewer blastocysts than optimal triggers. |
| Follicular monitoring with 3D ultrasound | Noor et al. (2020) [23] | Randomized controlled trial of 130 IVF-ET recipients; automated 3D vs. manual 2D ultrasound | Sonography-based automated volume count | Both methods yielded comparable ART outcomes, including oocyte retrieval, fertilization, cleavage, and pregnancy rates. However, 2D ultrasound required significantly more time (p<0.01). |
| | Liang et al. (2022) [24] | 515 IVF cases | Deep learning-based segmentation algorithm | The optimal follicle volume cut-off for predicting mature oocytes was 0.5 cm³, outperforming conventional 2D ultrasound. The optimal leading follicle volume for hCG triggering was 3.0 cm³, significantly correlating with more mature oocytes (p=0.01). For predicting ovarian hyper-response, a multi-layer perceptron model achieved higher accuracy (0.890 vs. 0.785) than 2D measurements. |

AI, artificial intelligence; COS, controlled ovarian stimulation; 3D, three-dimensional; BMI, body mass index; AMH, anti-Müllerian hormone; AFC, antral follicle count; FSH, follicle-stimulating hormone; SART CORS, Society for Assisted Reproductive Technology Clinical Outcomes Reporting System; CLBR, clinical live birth rate; IVF, in vitro fertilization; ICSI, intracytoplasmic sperm injection; 2PN, fertilized oocytes; MII, metaphase II; ET, embryo transfer; 2D, two-dimensional; ART, assisted reproductive technology; hCG, human chorionic gonadotropin.

| Application area | Study | Dataset | AI methodology used | Key findings |
|---|---|---|---|---|
| Oocyte assessment | Murria et al. (2023) [25] | Retrospective study of 165 oocyte images from 24 patients | Violet (Future Fertility, Canada), 2D images | Predicted fertilization with 82% accuracy, blastocyst development with 67% accuracy, and live birth potential with 75% accuracy per oocyte cohort. |
| | Fjeldstad et al. (2022) [26] | Prospective multicenter study of images of 392 oocytes from 46 patients | Magenta (Future Fertility, Canada), 2D images | Magenta oocyte scores correlated significantly with blastocyst formation, with higher-scoring oocytes (7.1–10) developing into blastocysts at a 46.1% rate vs. 26.6% for lower-scoring oocytes (1.0–4.0) (p<0.005). |
| | Fjeldstad et al. (2024) [27] | 37,133 images of mature oocytes combined with clinical data from patients across eight fertility clinics in six countries | Deep learning | Test dataset: AUC, 0.64; accuracy, 0.60; specificity, 0.55; sensitivity, 0.65; best performance in 38–39-year age group (AUC, 0.68), minimal male factor impact, good clinic generalizability. External validation: AUC, 0.63; accuracy, 0.58; specificity, 0.57; sensitivity, 0.59. Higher scores were correlated with better blastocyst outcomes. |
| Semen analysis | Hicks et al. (2019) [29] | 85 sperm motility videos from the VISEM dataset | Deep learning (CNNs), linear regression | Deep learning-based prediction was fast and consistent, although adding participant data did not improve performance. |
| | Ottl et al. (2022) [30] | 85 sperm motility videos from the VISEM dataset | Multiple neural networks, support vector regression models | The model can predict the percentage of progressive, non-progressive, and immotile spermatozoa. The mean absolute error improved from 8.83 to 7.31 compared to previous studies. |
| | Javadi et al. (2019) [32] | 1,540 images from 235 male infertility patients from the MHSMA dataset | Deep learning (CNNs) | The model exhibited high accuracy in identifying morphological deformities in real time. The model achieved F0.5 scores of 84.74% (acrosome), 83.86% (head), and 94.65% (vacuole). |
| | Abbasi et al. (2021) [33] | 1,540 non-stained grayscale sperm images from the MHSMA dataset | Deep learning (CNNs) | The model improved accuracy rates to 84.00% (head), 80.66% (acrosome), and 94.00% (vacuole) compared to previous studies. |
| | Bachelot et al. (2023) [34] | 201 patients who underwent TESE | Eight machine learning models | Aim: to predict the success of TESE in NOA. Random forest performed best (AUC, 0.90; sensitivity, 100%; specificity, 69.2%). Inhibin B and varicocele history were key predictors. |
| | Wu et al. (2021) [35] | 702 images from 30 patients | Deep learning CASA system | Aim: to automate sperm identification for TESE. The model achieved a mean average precision of 0.741 and an average recall of 0.376, performing near human level. |

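
Several rows in the table above report sensitivity, specificity, accuracy, and F0.5 scores. As a reminder of how these quantities relate to one another, the following minimal Python sketch computes them from confusion-matrix counts. The helper name `classification_metrics` and the counts in the usage line are hypothetical; they are not taken from any of the cited studies.

```python
def classification_metrics(tp, fp, fn, tn, beta=0.5):
    """Binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is nonzero (at least one positive,
    one negative, and one positive prediction).
    """
    sensitivity = tp / (tp + fn)   # recall, true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)     # positive predictive value
    b2 = beta ** 2
    # F-beta weights recall by beta^2; beta = 0.5 therefore
    # emphasizes precision over recall.
    f_beta = (1 + b2) * precision * sensitivity / (b2 * precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f_beta": f_beta,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=80, fp=10, fn=20, tn=90)
```

With `beta=0.5`, false positives lower the score more than false negatives do, which is why F0.5 suits tasks where a wrongly flagged abnormality is the costlier error.
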
| Application area | Study | Dataset | AI methodology used | Key findings |
|---|---|---|---|---|
| Embryo assessment | Khosravi et al. (2019) [38] | Retrospective analysis of 12,001 time-lapse images from 1,774 embryos | STORK, a deep learning-based automated embryo quality assessment system | AUC >0.98, surpassing embryologists and generalizing across clinics. Pregnancy likelihood ranged from 13.8% (poor-quality embryos, age ≥41) to 66.3% (good-quality embryos, age <37). |
| | Tran et al. (2019) [39] | Retrospective analysis of time-lapse videos of 10,638 embryos from eight centers | IVY, a deep learning model that predicts pregnancy with fetal heartbeat | AUC of 0.93 (95% CI, 0.92–0.94) in cross-validation, with results reproducible across clinics (AUC, 0.90–0.95). |
| | Lee et al. (2021) [41] | Retrospective analysis of 690 time-lapse videos with PGT-A results | Deep learning | AUC of 0.74 in distinguishing aneuploid embryos from euploid/mosaic embryos. |
| | Illingworth et al. (2024) [42] | Multicenter, randomized, double-blind trial; 1,066 patients from 14 centers; deep learning-based embryo selection vs. standard morphology assessment | Deep learning | Clinical pregnancy rates of 46.5% (deep learning) vs. 48.2% (morphology) (risk difference, −1.7%; 95% CI, −7.7 to 4.3; p=0.62). A significant, 10-fold reduction in evaluation time compared to the morphology group (21.3±18.1 seconds vs. 208.3±144.7 seconds, p<0.001). |

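
The risk difference and 95% CI listed for Illingworth et al. [42] can be approximately reproduced with a simple Wald interval for the difference of two proportions. The sketch below assumes, purely for illustration, that the 1,066 patients were split equally into two arms of 533; the arm sizes, the choice of a Wald interval, and the helper name `risk_difference_ci` are assumptions and are not stated in the table.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(p1, n1, p2, n2, level=0.95):
    """Wald confidence interval for the difference of two proportions."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    rd = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return rd, rd - z * se, rd + z * se


# Clinical pregnancy rates from the table; arm sizes of 533 are assumed.
rd, lower, upper = risk_difference_ci(0.465, 533, 0.482, 533)
print(f"{rd * 100:.1f}% ({lower * 100:.1f} to {upper * 100:.1f})")
# prints: -1.7% (-7.7 to 4.3)
```

Because the interval spans zero, the sketch is consistent with the non-significant p-value reported for the pregnancy-rate comparison.
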
