Current state of artificial intelligence applications in assisted reproductive technology: A narrative review

Article information

Korean J Fertil Steril. 2025;.cerm.2024.07710
Publication date (electronic) : 2025 September 9
doi : https://doi.org/10.5653/cerm.2024.07710
Department of Obstetrics and Gynecology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
Corresponding author: Ju Hee Kim Department of Obstetrics and Gynecology, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-ro 43-gil, Songpa-gu, Seoul 05505, Republic of Korea Tel: +82-2-3010-0857 Fax: +82-2045-4564 E-mail: xjuheex@gmail.com
Received 2024 November 21; Revised 2025 April 25; Accepted 2025 May 21.

Abstract

Artificial intelligence (AI) has rapidly advanced in healthcare, demonstrating significant potential in analyzing large, heterogeneous datasets using optimized algorithms for disease prediction and personalized treatment. Assisted reproductive technology (ART), particularly in vitro fertilization (IVF) and embryo transfer, generates extensive data, making it especially suitable for AI-driven analysis. AI-based applications aim to improve clinical outcomes through personalized ART strategies and predictive algorithms, with potential applications categorized into various procedural stages. Despite its promising nature, most AI-related ART studies appear in general scientific journals rather than core obstetrics and gynecology publications. Moreover, limited clinician understanding of AI methodologies, strengths, and limitations represents a barrier to clinical implementation. This review summarizes recent advancements in AI applications within ART, covering areas such as clinical counseling, outcome prediction, IVF workflow management, controlled ovarian stimulation and follicular monitoring, oocyte and semen analysis, and embryo assessment. It also addresses future considerations for the responsible integration of AI technologies in ART, emphasizing the importance of multidisciplinary collaboration. Integrating AI into ART holds substantial promise and, with targeted research and development, is expected to meaningfully advance the achievement of successful pregnancies.

Introduction

Over the past 5 years, the emergence of artificial intelligence (AI) has rapidly spread to healthcare. Obstetrics and gynecology (OBGY) encompasses diverse medical practices; however, applications of AI concepts have only recently appeared in the relevant literature [1]. Approximately one-third of AI-related publications in OBGY indexed in PubMed focus on assisted reproductive technology (ART). These studies explore various subfields, including hypothesis generation for ART techniques and implantation pathophysiology, as well as the development of predictive algorithms. However, most related studies have appeared in general scientific journals, with only 18% published in core OBGY journals [1]. AI offers significant potential for addressing complex challenges in diagnosis, treatment, and large-scale biomedical data analysis. Nonetheless, clinicians face considerable barriers due to a general lack of understanding regarding AI methodologies and their associated strengths and limitations [2].

In general, AI is focused on enabling machines to perform tasks typically requiring human intelligence, such as language understanding or problem-solving. Machine learning (ML), a subset of AI, enables systems to improve their performance by learning from data rather than being explicitly programmed for each task. Deep learning (DL), a specialized branch of ML, utilizes multi-layered neural networks to identify complex patterns in large datasets, particularly excelling in tasks like image recognition and thus proving valuable in areas such as medical imaging [3]. The overarching concept of AI now encompasses subfields such as ML, computer vision, and language processing [4]. For clarity and to minimize terminological confusion, this review will consistently use the term ‘AI’ as a generalized reference to the studies discussed (Figure 1).

Figure 1.

Overview of artificial intelligence, machine learning, and deep learning approaches.

Research on the application of AI in healthcare has been primarily focused on the development of ML algorithms, including predictive modeling and decision support systems (DSS), leveraging large datasets to improve patient care [2]. ML enables automated analysis and interpretation of patient data from diverse sources using algorithmic methodologies. In ART, ML aids in designing personalized patient protocols and predicting procedure success rates [4]. ML approaches include supervised, semi-supervised, and unsupervised learning (Figure 1) [5]. Supervised learning is most frequently applied in real-world scenarios, employing algorithms such as decision trees, support vector machines, artificial neural networks, and random forests for prediction and classification tasks. It relies on labeled datasets to train algorithms that predict outcomes based on learned patterns. Human involvement is essential during dataset construction, but once trained, the model operates autonomously. The process has three stages: the use of a training set for model development, a validation set for performance optimization, and a test set to evaluate accuracy on unseen data [5]. In contrast, unsupervised learning aims to uncover hidden patterns in unlabeled data, often employing clustering techniques such as K-means clustering. However, assessing its success is challenging due to the absence of labeled outputs. Semi-supervised learning combines labeled and unlabeled data to enhance model performance, which is particularly beneficial in scenarios with limited labeled data. Despite these advantages, it is less commonly applied than supervised learning [5].
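The three-stage supervised workflow described above can be illustrated with a minimal sketch. The 70/15/15 split, the function name `split_dataset`, and the toy records are assumptions chosen for illustration; none of the cited studies necessarily partitioned their data this way.

```python
import random

def split_dataset(records, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle labeled records and split them into training, validation, and test sets."""
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]                 # used to fit the model
    val = shuffled[n_train:n_train + n_val]    # used to tune the model
    test = shuffled[n_train + n_val:]          # held out to estimate accuracy on unseen data
    return train, val, test

# Hypothetical labeled records: (cycle_id, outcome_label)
cycles = [(i, i % 2) for i in range(100)]
train_set, val_set, test_set = split_dataset(cycles)
print(len(train_set), len(val_set), len(test_set))
```

Keeping the test set untouched until the final evaluation is what allows the reported accuracy to be read as performance on unseen data.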

AI holds the potential to analyze large, diverse datasets using optimized algorithms to predict disease diagnoses and prognoses, as well as to deliver personalized treatment. By complementing clinical expertise, AI augments real-time, evidence-based decision-making, ultimately improving patient outcomes and healthcare quality. The ART field generates vast amounts of data during in vitro fertilization and embryo transfer (IVF-ET) cycles, making it especially suitable for leveraging AI-driven analyses to yield better clinical outcomes and increased efficiency [4]. Potential AI applications in IVF can be categorized into various stages, including pre-procedure planning, controlled ovarian stimulation (COS) and triggering, oocyte and sperm evaluation, embryo assessment, and luteal phase support following embryo transfer. Accordingly, this review summarizes the latest research on AI applications in ART and discusses future considerations for the integration and development of this technology.

Current applications

1. Clinical counseling and outcome prediction

Infertility diagnosis and treatment planning are typically guided by factors such as patient age, ovarian reserve, and sperm condition. However, due to the complexity of these parameters, establishing an effective treatment plan and accurately predicting treatment success, both before and during infertility treatment, is extremely challenging. AI technology can assist healthcare providers in offering more accurate counseling to patients. Additionally, AI can facilitate patient access to fertility-related information and appropriate medical care. Efforts to integrate AI into patient-provider interactions are already underway, and the potential of the large language model-based ChatGPT (OpenAI) service has recently been acknowledged [6]. ChatGPT has demonstrated the capacity to generate responses to infertility-related queries that are comparable in length, accuracy, and sentiment to responses from the US Centers for Disease Control and Prevention. However, 6.12% of its factual statements were found to be inaccurate, and only 0.68% of responses included citations. Despite its limitations, such as unreliable citation practices and a risk of misinformation, ChatGPT shows promise in addressing clinical questions.

Several AI algorithm models have been developed to diagnose infertility and predict pregnancy success rates using basic patient demographics and data from previous ART cycles. Liao et al. [7] developed a dynamic diagnosis and grading system for infertility by retrospectively analyzing data from 60,648 couples who underwent IVF-ET. Seven key indicators—age, body mass index (BMI), follicle-stimulating hormone (FSH) level, antral follicle count (AFC), anti-Müllerian hormone (AMH) level, number of oocytes, and endometrial thickness—were identified. The weights of these indicators were determined using entropy-based feature discretization and a random forest algorithm. The system categorized infertility into five grades, achieving a predictive stability of 95.94% through cross-validation. The highest grade, A, was associated with a 54% pregnancy success rate, while the lowest, E, had a success rate below 1%. The primary advantages of this scoring system include comprehensive diagnostic capability using various indicators and flexibility to update in real time as new samples are introduced. Additionally, McLernon et al. [8] developed a personalized cumulative live birth prediction model using logistic regression at two critical time points: before starting the first complete IVF cycle and after an unsuccessful initial IVF cycle. Their population-based cohort study included data from 88,614 patients, obtained from the Society for Assisted Reproductive Technology Clinical Outcomes Reporting System (SART CORS) and representing a nationwide United States cohort. The pre-treatment model indicated that cumulative live birth rates declined after age 30 and with higher BMI. AMH levels were significantly correlated with live birth probability up to 2 ng/mL. Key predictors identified included age, BMI, previous full-term birth, male factor infertility, polycystic ovary syndrome (PCOS), diminished ovarian reserve, uterine factor infertility, and unexplained infertility. 
Each additional IVF cycle reduced live birth odds by 42%, while higher AMH levels increased live birth odds. The post-treatment model indicated that women entering a second cycle had higher rates of diminished ovarian reserve (39%) and PCOS (38%) compared to their first cycle. A higher number of eggs retrieved during the initial cycle was associated with greater live birth odds. Other predictors in the post-treatment model closely aligned with the pre-treatment model. This model is publicly accessible online, allowing patients to input values such as their age, BMI, and the number of previously retrieved eggs to estimate their live birth rate (https://www.sart.org/). The cited studies are detailed in Table 1.
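Because logistic regression reports effects on the odds scale, a 42% reduction in odds is not the same as a 42% reduction in probability. The sketch below works through the conversion; the 40% baseline probability is a hypothetical value chosen for illustration and is not taken from McLernon et al. [8].

```python
def probability_to_odds(p):
    """Convert a probability (strictly between 0 and 1) to odds."""
    if not 0 < p < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    return p / (1 - p)

def odds_to_probability(odds):
    """Convert odds back to a probability."""
    return odds / (1 + odds)

def apply_odds_ratio(baseline_probability, odds_ratio):
    """Scale the baseline odds by an odds ratio and return the new probability."""
    return odds_to_probability(probability_to_odds(baseline_probability) * odds_ratio)

# A 42% reduction in odds corresponds to an odds ratio of 0.58.
ODDS_RATIO_PER_CYCLE = 1 - 0.42

baseline = 0.40  # hypothetical baseline live birth probability (assumption)
adjusted = apply_odds_ratio(baseline, ODDS_RATIO_PER_CYCLE)
print(f"baseline {baseline:.1%} -> adjusted {adjusted:.1%}")
```

Under this assumed baseline, the adjusted probability is roughly 28%, whereas naively multiplying the probability by 0.58 would give about 23%.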

Table 1. Summary of AI applications in clinical counseling and outcome prediction

2. Intrauterine insemination

Intrauterine insemination (IUI) has a lower success rate than IVF-ET but is less invasive and more convenient for patients, making it a relatively accessible option for subfertile couples [9]. However, AI applications in ART have predominantly focused on IVF-ET, with limited research addressing its application in IUI. Kozar et al. [9] investigated how AI could improve IUI success rates by selecting appropriate patients for the procedure. They analyzed 1,029 IUI procedures using various stimulation methods. The random forest model exhibited the best performance, achieving an area under the curve (AUC) of 66%, accuracy of 71%, sensitivity of 0.4, and specificity of 0.8. Semen parameters were identified as the most significant predictors. Although this model’s performance was modest compared to AI studies in IVF, the inherently lower success rates of IUI, which have not significantly improved over recent decades, complicate direct comparisons with IVF outcomes.
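The performance measures quoted for these classifiers (AUC, accuracy, sensitivity, and specificity) can be computed as in the following sketch. The labels and scores are toy values invented for illustration, not data from Kozar et al. [9], and the 0.5 decision threshold is likewise an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 1 (pregnancy) and 0 (no pregnancy)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def auc(y_true, scores):
    """Probability that a random positive case scores higher than a random negative one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical values)
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)        # share of true pregnancies correctly predicted
specificity = tn / (tn + fp)        # share of non-pregnancies correctly predicted
accuracy = (tp + tn) / len(y_true)  # share of all predictions that were correct
model_auc = auc(y_true, scores)     # threshold-independent ranking quality
```

When the outcome is uncommon, as with IUI pregnancies, accuracy can look acceptable even when sensitivity is low, which is why the measures are usually reported together.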

A large reproductive study introduced a novel method combining complex network-based feature engineering and a stacked ensemble (CNFE-SE) to predict IUI outcomes [10]. This dataset comprised 11,255 IUI procedures and included 20 key predictive features. The model achieved an AUC of 84%, sensitivity of 0.8, specificity of 0.9, and accuracy of 0.9. The most significant predictors were semen parameters (sperm concentration and motility) and female BMI. The superior performance of the CNFE-SE model suggests the potential for its use in a DSS to assist physicians in selecting optimal IUI treatment plans. The cited studies are detailed in Table 2.

Table 2. Summary of AI applications in IUI and IVF workflow management

3. IVF workflow management

AI can improve ovarian stimulation outcomes by assisting clinicians with optimized dose adjustments, precise trigger timing, and better patient monitoring. DSS tools, computer algorithms developed for patient management based on electronic medical records, are increasingly used in clinical settings [11]. Recently, a DSS for IVF management demonstrated high accuracy and alignment with evidence-based expert team decisions in four key areas: stopping or continuing stimulation (0.92), triggering or canceling the cycle (0.96), determining the number of days until follow-up (0.87), and adjusting the dose (0.82) [12]. The algorithm exhibited the lowest accuracy in dosage decisions, attributed to the infrequent and physician-preference-driven nature of dosage adjustments during ovarian stimulation. The trained model tended to be more conservative in maintaining dosages compared to expert decisions; notably, however, when experts opted to increase the dosage, the algorithm never suggested a reduction.

Additionally, AI-based studies have aimed to reduce the frequency of monitoring during COS. Robertson et al. [13] sought to identify the optimal follicular tracking day to predict trigger timing and the risk of over-response. By retrospectively analyzing 9,294 ultrasound scans from 2,322 IVF cycles using random forest regression and classification methods, they identified stimulation day 5 as optimal for predicting trigger timing (mean squared error, 2.16±0.12) and risk of over-response (AUC, 0.91±0.01). Furthermore, Letterie et al. [14] developed an AI-based workflow algorithm intended to limit COS monitoring to a single visit while optimizing and distributing oocyte retrieval schedules. This study defined three key objectives: first, to identify the optimal single day for the monitoring visit based on the pre-treatment clinical profile; second, to provide a range of three potential trigger days to maximize scheduling flexibility without compromising outcomes; and third, to predict the number of mature oocytes. The dataset comprised 1,591 autologous cycles, each from a unique patient. The algorithm achieved a mean prediction error of 1.4 days for predicting the optimal monitoring day, with an accuracy of 0.8. The variance in retrieved oocyte count was 0–3 across the recommended trigger days. Additionally, the sensitivity for predicting mature oocyte count based on pre-treatment profiles was approximately 0.8. Future DSS platforms could thus design individualized workflows using pre-IVF clinical data. These workflows could be adaptively modified based on data collected during COS, enabling outcome prediction and personalized treatment. The cited studies are detailed in Table 2.

4. COS

The use of COS protocols varies widely depending on clinician, ART center, and patient preferences, leading to a lack of standardized guidelines. AI-driven analysis of diverse and scattered datasets can help select appropriate, personalized stimulation protocols for individual patients [15]. Correa et al. [16] developed an ML model to recommend the initial dose of FSH to achieve 12 mature oocytes across patient types. Using baseline information including age, BMI, AMH, AFC, and prior successful pregnancies from over 3,000 patients across five centers, they analyzed the linear relationship between the initial FSH dose and the number of eggs retrieved. The model significantly outperformed clinician-prescribed doses in both development and validation datasets. However, one limitation of these simple linear models is their failure to capture nonlinear dose-response relationships [15]. Another AI-driven analysis compared COS protocols, analyzing approximately 20,000 cycles for poor responders. The results showed comparable outcomes between the antagonist (n=15,000) and flare protocols (n=5,000) [17]. Given these comparable outcomes across protocols, the conclusion of Wald et al. [18] that protocol choice appears to only minimally impact outcomes remains relevant. However, future AI studies may contribute to the development of methods that directly predict patient-specific outcomes for each protocol, enabling personalized recommendations for injection types, dosages, and egg retrieval protocols [15].

DSS can also assist in determining optimal trigger timing. A comparative study was performed to determine the optimal trigger timing to maximize the yield of fertilized oocytes (2PNs) and blastocysts [19]. Utilizing causal inference with a T-learner and bagged decision trees, the algorithm, despite having physician-model agreement rates of only 53% and 62%, yielded an average of 1.4 more 2PNs and 0.6 more usable blastocysts compared to physician decisions. Another AI model was designed to predict mature oocyte outcomes for triggering either today or tomorrow, employing follicle counts and estradiol levels in a linear regression and estradiol forecasting model [20]. This study, encompassing three ART centers and 30,278 IVF cycles, demonstrated that early or late triggers resulted in fewer mature oocytes (up to 2.7 fewer), 2PNs (up to 2.0 fewer), and usable blastocysts (up to 1.0 fewer) compared to optimal timing. These AI platforms, though promising, are not yet commercially available; furthermore, the research supporting them has limitations due to its retrospective nature and primary focus on decision timing. Nevertheless, the findings suggest that DSS could effectively support daily COS management decisions.

Ultimately, applying AI in COS requires extensive testing and validation across diverse regions and populations [21]. Recent efforts have focused on developing algorithms that predict the number of mature oocytes, incorporating not only basic patient demographics and endocrine hormone profiles but also genetic information [22]. Therefore, international multicenter prospective studies are essential to ensure that AI algorithms are broadly applicable across clinical settings. The cited studies are detailed in Table 3.

Table 3. Summary of AI applications in COS and follicular monitoring with 3D ultrasound

5. Follicular monitoring with three-dimensional ultrasound

AI research is widely employed in imaging studies, and in the field of ART, AI technology can be applied to follicular monitoring using ultrasound. A randomized controlled trial compared oocyte yield using AI-based three-dimensional (3D) automated volume calculation with traditional two-dimensional (2D) manual ultrasound follicle tracking in women undergoing IVF cycles [23]. The study found comparable ART outcomes between the two groups, with the 2D manual ultrasound group requiring significantly more time for scans (p<0.01). These results suggest that automated 3D ultrasound can be effectively used for follicle tracking, offering time-saving benefits. A DL-based segmentation algorithm using follicle volume measured by 3D ultrasound could also assist in assessing oocyte maturity, optimizing trigger timing, and predicting ovarian hyper-response [24]. The study identified a follicle volume of 0.5 cm³ or greater on triggering day as optimal for predicting mature oocytes, while a threshold of 3.0 cm³ for the leading follicle was significantly associated with a higher yield of mature oocytes compared to traditional 2D ultrasound. Furthermore, a multi-layer perceptron model outperformed traditional 2D ultrasound methods in predicting ovarian hyper-response. In conclusion, AI technology using 3D ultrasound may offer a promising alternative strategy for follicular monitoring beyond traditional 2D ultrasound methods, potentially improving ART success rates and broadening the range of available techniques. The cited studies are detailed in Table 3.

6. Oocyte assessment

The cumulus surrounding the oocyte poses challenges in assessing oocyte maturity, and selection based on morphology reduces the number of usable oocytes, making oocyte selection uncommon in standard IVF procedures. However, in cases where cumulus removal is necessary for selecting mature oocytes prior to intracytoplasmic sperm injection (ICSI), or specifically in fertility preservation scenarios, non-invasive AI-based methods can be valuable for assessing oocyte maturity and predicting reproductive potential [4].

Violet (Future Fertility), an oocyte AI image analysis tool trained with a convolutional neural network (CNN), predicted fertilization with 82% accuracy, blastocyst development with 67% accuracy, and live birth potential with 75% accuracy per oocyte cohort [25]. These findings suggest that AI-based analysis of oocyte morphology could inform clinical outcomes in IVF and support patient counseling, especially in fertility preservation. Similar software, Magenta (Future Fertility), analyzes oocyte images and scores each oocyte on a scale from 1 to 10 to predict blastocyst development quality [26]. Magenta scores revealed a significant difference in blastocyst development rates between the highest- and lowest-scored oocytes (46% vs. 27%, p<0.005). In a recent study, images of mature oocytes combined with clinical data from patients at eight fertility clinics across six countries were analyzed using AI to assess developmental competence in oocytes likely to progress to the blastocyst stage [27]. The study found the highest predictive performance in the 38 to 39 age group (AUC, 0.68), with minimal impact from male factors. Compared to previous studies on oocyte image analysis, this research incorporates clinical data and extends globally, highlighting the potential to standardize oocyte quality assessment. Although currently challenging to implement as an exclusive tool for oocyte selection or exclusion, it holds promise for application as part of a broader selection toolkit when combined with other clinical indicators, such as sperm evaluation and oocyte gene expression profiling, to accurately predict blastocyst formation and quality [4]. The cited studies are detailed in Table 4.

Table 4. Summary of oocyte assessment and semen analysis

7. Semen assessment

Computer-aided sperm analyzer (CASA)-based semen evaluation aims to reduce inter-observer variation and has demonstrated good correlation with manual assessments for analyzing sperm concentration and motility. However, for morphological evaluation, CASA typically shows the lowest correlation with manual assessment, likely due to semen sample heterogeneity and the subjective nature of morphological interpretation. Therefore, integrating AI techniques has the potential to improve the accuracy and consistency of sperm assessment [4].

Several studies assessing sperm motility (progressive, non-progressive, and immotile spermatozoa) have utilized the VISEM dataset, a multimodal video dataset of human spermatozoa [28]. Analyzing sperm video sequences using CNN methods or linear support vector regression can provide consistent and efficient predictions of sperm motility within 5 minutes [29,30]. The VISEM dataset was extended to VISEM-Tracking, which includes longer video clips with annotated bounding boxes and sperm tracking information, making it more valuable than the original dataset for training complex ML models [31]. Recently, AI has been used to analyze sperm motility patterns in real time to select sperm for ICSI, reducing the time spent by embryologists during selection and providing an objective method that may improve fertilization and blastocyst rates [4]. Open access to the Modified Human Sperm Morphology Analysis (MHSMA) dataset facilitates studies on sperm morphology analysis. CNN-based DL methods have demonstrated high accuracy, achieving detection rates of up to 94% for morphological deformities in the sperm acrosome, head, and vacuole in real time without staining [32,33]. Furthermore, attempts have been made to integrate AI technology into the operator-dependent field of testicular sperm extraction to increase sperm identification efficiency in testicular biopsy samples and predict the success of sperm retrieval [34,35]. However, these studies have demonstrated only moderate performance, necessitating further external validation. Various methods exist for sperm selection in ICSI, but none have yielded significant improvements in pregnancy rates [36]. A non-invasive AI approach could help build an extensive database and establish a global consensus on sperm selection. Mojo-AISA, a commercially available AI-based sperm analysis system, was developed to address the limitations of traditional CASA systems [37]. 
It provides accurate semen analysis results in half the time required by manual methods, improving the efficiency of embryologists. Despite challenges with extremely low-concentration samples, it has potential to minimize human error and improve the accuracy of semen analysis. The cited studies are detailed in Table 4.

8. Embryo assessment

Embryo grading traditionally involves visual morphological assessment. However, this subjective method leads to inconsistent results among embryologists, contributing to variability and potentially lowering IVF success rates. The introduction of time-lapse imaging (TLI) is known to allow for more comprehensive embryo evaluation by integrating morphokinetic data with the traditional morphological grading system [21]. Consequently, numerous studies have explored AI methods combined with TLI to assess embryos, including the prediction of embryo quality, implantation potential, and pregnancy outcomes.

STORK, an automated blastocyst selection algorithm based on deep neural networks (DNNs), was developed to predict blastocyst quality for single embryo transfer [38]. Using extensive TLI data, blastocysts were classified into good or poor quality based on the likelihood of pregnancy. The DNN was trained over 50,000 iterations using Google’s Inception model. STORK achieved an impressive AUC exceeding 98% in predicting blastocyst quality. Considering patient age, the model predicted a 66% pregnancy success rate for women under 37 years old. STORK demonstrated robustness across datasets from other clinics and outperformed individual embryologists. Another retrospective multicenter study developed IVY, a fully automated DL model that utilizes raw time-lapse videos (TLV) without manual annotation [39]. This study analyzed over 10,000 embryos across four countries. IVY demonstrated an AUC of 93% in predicting whether a blastocyst would result in a fetal heartbeat pregnancy. A recent systematic analysis of 20 studies on AI in embryo assessment demonstrated that AI models consistently outperformed embryologists in predicting clinical pregnancy (accuracy, 77.8% vs. 64%) and embryo morphology grade (accuracy, 75.5% vs. 65.4%) [40]. However, the findings were limited by the lack of prospective clinical evaluations and variability in AI models, databases, and study designs. Additionally, TLV can support DL models in recognizing blastocyst ploidy status. Lee et al. [41] investigated TLV metadata combined with known preimplantation genetic testing for aneuploidy (PGT-A) outcomes. The model, trained to classify blastocysts as either aneuploid or other (euploid and mosaic embryos), achieved an AUC of 74%. This result modestly exceeded those of previous studies, suggesting that further research could contribute to non-invasive PGT-A assessment.

Recently, Illingworth et al. [42] conducted the first multicenter, randomized, double-blind, noninferiority trial comparing DL with manual morphology-based embryo selection in IVF. Among 1,066 patients, the DL group exhibited a pregnancy rate of 46.5%, while the morphology group had a rate of 48.2% (risk difference, −1.7%; p=0.62). This result did not demonstrate noninferiority. However, the algorithm showed a significant and nearly tenfold reduction in evaluation time compared to the morphology group (21.3±18.1 seconds vs. 208.3±144.7 seconds, p<0.001), regardless of blastocyst count. The absence of significant improvement in clinical pregnancy rates with AI technologies raises questions about their cost-effectiveness. Thus, future well-designed randomized controlled trials and robust regulatory measures are essential to prevent premature adoption, protect patients from exploitation, and ensure that the true clinical value of AI is responsibly evaluated [43]. The cited studies are detailed in Table 5.
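The noninferiority logic can be made concrete with a short calculation. In the sketch below, the pregnancy rates come from the text above, whereas the equal allocation of 533 patients per arm, the 5 percentage point margin, and the use of a simple Wald interval are assumptions made for illustration. The trial's own analysis may have differed.

```python
from math import sqrt

def risk_difference_ci(p_new, n_new, p_ref, n_ref, z=1.96):
    """Risk difference (new minus reference) with a Wald confidence interval."""
    diff = p_new - p_ref
    se = sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)
    return diff, diff - z * se, diff + z * se

def is_noninferior(ci_lower, margin):
    """Noninferiority holds only if the whole interval lies above the negative margin."""
    return ci_lower > -margin

# Rates from the text; arm sizes and margin are assumptions (equal split of 1,066 patients).
diff, lower, upper = risk_difference_ci(p_new=0.465, n_new=533, p_ref=0.482, n_ref=533)
MARGIN = 0.05  # hypothetical noninferiority margin
noninferior = is_noninferior(lower, MARGIN)
print(f"risk difference {diff:+.3f}, 95% CI ({lower:+.3f}, {upper:+.3f}), noninferior: {noninferior}")
```

Under these assumptions, the lower confidence bound falls below the margin, which illustrates how a small and statistically nonsignificant difference can still fail to establish noninferiority.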

Table 5. Summary of embryo assessment

Future considerations and conclusion

The field of ART offers significant potential for integrating AI at various stages; however, careful consideration of multiple factors is necessary when implementing AI technologies. First, current studies mainly focus on optimizing decisions for a single cycle, but it is also essential to consider IVF cycles that account for individual biological variability [15]. Second, future studies should not only aim to increase oocyte retrieval, maturation, fertilization rates, and cumulative live birth rates but also analyze live birth outcomes [15]. Another key consideration is the need for prospective multicenter studies to ensure the generalizability of AI applications. In supervised ML models, labeled data are critical for developing well-structured algorithms, highlighting the importance of robust data standardization and quality control across ART centers [44].

However, several ethical concerns must be addressed before AI can be routinely implemented in clinical practice. These include issues related to informed consent, algorithmic bias, and the potential socioeconomic impacts of AI healthcare applications. As the scale of studies increases, establishing appropriate regulations to manage and safeguard patient data privacy becomes essential [45]. Some public-private AI projects have demonstrated weaknesses in privacy protection. Additionally, AI algorithms have shown the capability to re-identify anonymized patient data. These issues underscore the necessity for regulations that emphasize patient autonomy and informed consent in the collection, use, and protection of health information [45]. A second ethical issue is algorithmic bias. Bias in assessing surgical performance and racial bias have already been observed in several medical AI algorithms [46,47]. Algorithmic bias is not only a fairness issue but also a patient safety concern, as it may lead to inappropriate care and serious harm [48]. To address AI-driven discrimination, in 2024, the US Department of Health and Human Services Office for Civil Rights prohibited biased outcomes from DSS tools [48]. Since bias can emerge at any stage of the AI lifecycle (from conception through post-deployment), expanded efforts to implement mitigation strategies are essential [49]. Another ethical consideration is the socioeconomic impact of AI in healthcare. Although AI can improve patient outcomes through efficient data analysis, it also raises ethical concerns such as equitable access, anticompetitive practices, data misuse, and interference with human interaction [50]. Healthcare providers should be aware of these issues, apply AI cautiously, and contribute to establishing flexible regulatory frameworks.

The current state of AI technology in ART remains distant from routine clinical application. Prospective studies clearly demonstrating clinical benefits or cost-effectiveness compared to conventional practices remain limited. The envisioned role of AI in ART involves the seamless integration of clinical and laboratory data, ultrasound imaging, sperm analysis, and demographic information into a comprehensive system. Such a system could provide evidence-based recommendations, including pregnancy success rates for various ART treatments and optimal initial FSH dosages [51]. Achieving this vision requires collaborative, interdisciplinary research among clinicians, researchers, and technicians, as well as the establishment of integrated multi-institutional data registries [52]. Multiple AI models should be evaluated to identify the most accurate approach, with clearly defined training and testing phases. Additionally, continuous validation, bias reduction, and careful integration are essential to prevent algorithms from introducing statistical errors that could harm patients [15]. To ensure the responsible and effective deployment of these technologies, it is imperative to address the needs of ART stakeholders, including technicians, clinicians, embryologists, nursing teams, and administrators. Furthermore, robust regulatory frameworks and safeguards must be implemented to promote safe and ethical integration into clinical practice [51].

The application of AI in ART presents numerous opportunities. With thoughtful consideration of future directions and challenges, valuable research findings are expected to help advance the field.

Notes

Conflict of interest

No potential conflict of interest relevant to this article was reported.

References

1. Dhombres F, Bonnard J, Bailly K, Maurice P, Papageorghiou AT, Jouannic JM. Contributions of artificial intelligence reported in obstetrics and gynecology journals: systematic review. J Med Internet Res 2022;24:e35465. 10.2196/35465. 35297766.
2. Chen M, Decary M. Artificial intelligence in healthcare: an essential guide for health leaders. Healthc Manage Forum 2020;33:10–8. 10.1177/0840470419873123. 31550922.
3. Drukker L, Noble JA, Papageorghiou AT. Introduction to artificial intelligence in ultrasound imaging in obstetrics and gynecology. Ultrasound Obstet Gynecol 2020;56:498–505. 10.1002/uog.22122. 32530098.
4. Hanassab S, Abbara A, Yeung AC, Voliotis M, Tsaneva-Atanasova K, Kelsey TW, et al. The prospect of artificial intelligence to personalize assisted reproductive technology. NPJ Digit Med 2024;7:55. 10.1038/s41746-024-01006-x. 38429464.
5. Aldahiri A, Alrashed B, Hussain W. Trends in using IoT with machine learning in health prediction system. Forecasting 2021;3:181–206. 10.3390/forecast3010012.
6. Chervenak J, Lieman H, Blanco-Breindel M, Jindal S. The promise and peril of using a large language model to obtain clinical information: ChatGPT performs strongly as a fertility counseling tool with limitations. Fertil Steril 2023;120(3 Pt 2):575–83. 10.1016/j.fertnstert.2023.05.151. 37217092.
7. Liao S, Pan W, Dai WQ, Jin L, Huang G, Wang R, et al. Development of a dynamic diagnosis grading system for infertility using machine learning. JAMA Netw Open 2020;3:e2023654. 10.1001/jamanetworkopen.2020.23654. 33165608.
8. McLernon DJ, Raja EA, Toner JP, Baker VL, Doody KJ, Seifer DB, et al. Predicting personalized cumulative live birth following in vitro fertilization. Fertil Steril 2022;117:326–38. 10.1016/j.fertnstert.2021.09.015. 34674824.
9. Kozar N, Kovac V, Reljic M. Can methods of artificial intelligence aid in optimizing patient selection in patients undergoing intrauterine inseminations? J Assist Reprod Genet 2021;38:1665–73. 10.1007/s10815-021-02224-y. 34031765.
10. Ranjbari S, Khatibi T, Vosough Dizaji A, Sajadi H, Totonchi M, Ghaffari F. CNFE-SE: a novel approach combining complex network-based feature engineering and stacked ensemble to predict the success of intrauterine insemination and ranking the features. BMC Med Inform Decis Mak 2021;21:1. 10.1186/s12911-020-01362-0. 33388057.
11. Garg AX, Adhikari NK, McDonald H, Rosas-Arellano MP, Devereaux PJ, Beyene J, et al. Effects of computerized clinical decision support systems on practitioner performance and patient outcomes: a systematic review. JAMA 2005;293:1223–38. 10.1001/jama.293.10.1223. 15755945.
12. Letterie G, Mac Donald A. Artificial intelligence in in vitro fertilization: a computer decision support system for day-to-day management of ovarian stimulation during in vitro fertilization. Fertil Steril 2020;114:1026–31. 10.1016/j.fertnstert.2020.06.006. 33012555.
13. Robertson I, Chmiel FP, Cheong Y. Streamlining follicular monitoring during controlled ovarian stimulation: a data-driven approach to efficient IVF care in the new era of social distancing. Hum Reprod 2021;36:99–106. 10.1093/humrep/deaa251. 33147345.
14. Letterie G, MacDonald A, Shi Z. An artificial intelligence platform to optimize workflow during ovarian stimulation and IVF: process improvement and outcome-based predictions. Reprod Biomed Online 2022;44:254–60. 10.1016/j.rbmo.2021.10.006. 34865998.
15. Hariton E, Pavlovic Z, Fanton M, Jiang VS. Applications of artificial intelligence in ovarian stimulation: a tool for improving efficiency and outcomes. Fertil Steril 2023;120:8–16. 10.1016/j.fertnstert.2023.05.148. 37211063.
16. Correa N, Cerquides J, Arcos JL, Vassena R. Supporting first FSH dosage for ovarian stimulation with machine learning. Reprod Biomed Online 2022;45:1039–45. 10.1016/j.rbmo.2022.06.010. 35915001.
17. Murillo F, Fanton M, Baker VL, Loewke K. Causal inference indicates that poor responders have similar outcomes with the antagonist protocol compared with flare. Fertil Steril 2023;120:289–96. 10.1016/j.fertnstert.2023.04.007. 37044308.
18. Wald K, Hariton E, Morris JR, Chi EA, Jaswa EG, Cedars MI, et al. Changing stimulation protocol on repeat conventional ovarian stimulation cycles does not lead to improved laboratory outcomes. Fertil Steril 2021;116:757–65. 10.1016/j.fertnstert.2021.04.030. 34045067.
19. Hariton E, Chi EA, Chi G, Morris JR, Braatz J, Rajpurkar P, et al. A machine learning algorithm can optimize the day of trigger to improve in vitro fertilization outcomes. Fertil Steril 2021;116:1227–35. 10.1016/j.fertnstert.2021.06.018. 34256948.
20. Fanton M, Nutting V, Solano F, Maeder-York P, Hariton E, Barash O, et al. An interpretable machine learning model for predicting the optimal day of trigger during ovarian stimulation. Fertil Steril 2022;118:101–8. 10.1016/j.fertnstert.2022.04.003. 35589417.
21. Bhaskar D, Chang TA, Wang S. Current trends in artificial intelligence in reproductive endocrinology. Curr Opin Obstet Gynecol 2022;34:159–63. 10.1097/gco.0000000000000796. 35895955.
22. Zielinski K, Pukszta S, Mickiewicz M, Kotlarz M, Wygocki P, Zielen M, et al. Personalized prediction of the secondary oocytes number after ovarian stimulation: a machine learning model based on clinical and genetic data. PLoS Comput Biol 2023;19:e1011020. 10.1371/journal.pcbi.1011020. 37104276.
23. Noor N, Vignarajan CP, Malhotra N, Vanamail P. Three-dimensional automated volume calculation (sonography-based automated volume count) versus two-dimensional manual ultrasonography for follicular tracking and oocyte retrieval in women undergoing in vitro fertilization-embryo transfer: a randomized controlled trial. J Hum Reprod Sci 2020;13:296–302. 10.4103/jhrs.jhrs_91_20. 33627979.
24. Liang X, Liang J, Zeng F, Lin Y, Li Y, Cai K, et al. Evaluation of oocyte maturity using artificial intelligence quantification of follicle volume biomarker by three-dimensional ultrasound. Reprod Biomed Online 2022;45:1197–206. 10.1016/j.rbmo.2022.07.012. 36075848.
25. Murria L, Arnal LB, Chiva ES, Albert C, Cobo A, Meseguer M. Artificial intelligence oocyte image analysis predicts fertilization, blastocyst development, and live birth outcomes per cohort. Fertil Steril 2023;120:e42–3. 10.1016/j.fertnstert.2023.08.151.
26. Fjeldstad J, Mercuri N, Meriano J, Krivoi A, Campbell A, Smith R, et al. O-204 Non-invasive AI image analysis unlocks the secrets of oocyte quality and reproductive potential by assigning ‘Magenta’ scores from 2-dimensional (2-D) microscope images. Hum Reprod 2022;37(Supplement 1):deac104.119. 10.1093/humrep/deac104.119.
27. Fjeldstad J, Qi W, Mercuri N, Siddique N, Meriano J, Krivoi A, et al. An artificial intelligence tool predicts blastocyst development from static images of fresh mature oocytes. Reprod Biomed Online 2024;48:103842. 10.1016/j.rbmo.2024.103842. 38552566.
28. Haugen TB, Hicks SA, Andersen JM, Witczak O, Hammer HL, Borgli R, et al. VISEM: a multimodal video dataset of human spermatozoa. In: Proceedings of the 10th ACM Multimedia Systems Conference (ACM Mmsys'19); 2019 Jun 18-21; Amherst, MA. Association for Computing Machinery; 2019. p. 261-6. Available from: https://doi.org/10.1145/3304109.33258. 10.1145/3304109.33258.
29. Hicks SA, Andersen JM, Witczak O, Thambawita V, Halvorsen P, Hammer HL, et al. Machine learning-based analysis of sperm videos and participant data for male fertility prediction. Sci Rep 2019;9:16770. 10.1038/s41598-019-53217-y. 31727961.
30. Ottl S, Amiriparian S, Gerczuk M, Schuller BW. motilitAI: a machine learning framework for automatic prediction of human sperm motility. iScience 2022;25:104644. 10.1016/j.isci.2022.104644. 35856034.
31. Thambawita V, Hicks SA, Storas AM, Nguyen T, Andersen JM, Witczak O, et al. VISEM-Tracking, a human spermatozoa tracking dataset. Sci Data 2023;10:260. 10.1038/s41597-023-02173-4. 37156762.
32. Javadi S, Mirroshandel SA. A novel deep learning method for automatic assessment of human sperm images. Comput Biol Med 2019;109:182–94. 10.1016/j.compbiomed.2019.04.030. 31059902.
33. Abbasi A, Miahi E, Mirroshandel SA. Effect of deep transfer and multi-task learning on sperm abnormality detection. Comput Biol Med 2021;128:104121. 10.1016/j.compbiomed.2020.104121. 33246195.
34. Bachelot G, Dhombres F, Sermondade N, Haj Hamid R, Berthaut I, Frydman V, et al. A machine learning approach for the prediction of testicular sperm extraction in nonobstructive azoospermia: algorithm development and validation study. J Med Internet Res 2023;25:e44047. 10.2196/44047. 37342078.
35. Wu DJ, Badamjav O, Reddy VV, Eisenberg M, Behr B. A preliminary study of sperm identification in microdissection testicular sperm extraction samples with deep convolutional neural networks. Asian J Androl 2021;23:135–9. 10.4103/aja.aja_66_20. 33106465.
36. Baldini D, Ferri D, Baldini GM, Lot D, Catino A, Vizziello D, et al. Sperm selection for ICSI: do we have a winner? Cells 2021;10:3566. 10.3390/cells10123566. 34944074.
37. Sengupta P, Dutta S, Roychoudhury S, Vizzarri F, Slama P. Revolutionizing semen analysis: introducing Mojo AISA, the next-gen artificial intelligence microscopy. Front Cell Dev Biol 2023;11:1203708. 10.3389/fcell.2023.1203708. 37408534.
38. Khosravi P, Kazemi E, Zhan Q, Malmsten JE, Toschi M, Zisimopoulos P, et al. Deep learning enables robust assessment and selection of human blastocysts after in vitro fertilization. NPJ Digit Med 2019;2:21. 10.1038/s41746-019-0096-y. 31304368.
39. Tran D, Cooke S, Illingworth PJ, Gardner DK. Deep learning as a predictive tool for fetal heart pregnancy following time-lapse incubation and blastocyst transfer. Hum Reprod 2019;34:1011–8. 10.1093/humrep/dez064. 31111884.
40. Salih M, Austin C, Warty RR, Tiktin C, Rolnik DL, Momeni M, et al. Embryo selection through artificial intelligence versus embryologists: a systematic review. Hum Reprod Open 2023;2023:hoad031. 10.1093/hropen/hoad031. 37588797.
41. Lee CI, Su YR, Chen CH, Chang TA, Kuo EE, Zheng WL, et al. End-to-end deep learning for recognition of ploidy status using time-lapse videos. J Assist Reprod Genet 2021;38:1655–63. 10.1007/s10815-021-02228-8. 34021832.
42. Illingworth PJ, Venetis C, Gardner DK, Nelson SM, Berntsen J, Larman MG, et al. Deep learning versus manual morphology-based embryo selection in IVF: a randomized, double-blind noninferiority trial. Nat Med 2024;30:3114–20. 10.1038/s41591-024-03166-5. 39122964.
43. Kieslinger DC, Lambalk CB, Vergouw CG. The inconvenient reality of AI-assisted embryo selection in IVF. Nat Med 2024;30:3059–60. 10.1038/s41591-024-03289-9. 39354198.
44. Swain J, VerMilyea MT, Meseguer M, Ezcurra D; Fertility AI Forum Group. AI in the treatment of fertility: key considerations. J Assist Reprod Genet 2020;37:2817–24. 10.1007/s10815-020-01950-z. 32989510.
45. Murdoch B. Privacy and artificial intelligence: challenges for protecting health information in a new era. BMC Med Ethics 2021;22:122. 10.1186/s12910-021-00687-3. 34525993.
46. Kiyasseh D, Laca J, Haque TF, Otiato M, Miles BJ, Wagner C, et al. Human visual explanations mitigate bias in AI-based assessment of surgeon skills. NPJ Digit Med 2023;6:54. 10.1038/s41746-023-00766-2. 36997642.
47. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 2019;366:447–53. 10.1126/science.aax2342. 31649194.
48. Ratwani RM, Sutton K, Galarraga JE. Addressing AI algorithmic bias in health care. JAMA 2024;332:1051–2. 10.1001/jama.2024.13486. 39230911.
49. Hasanzadeh F, Josephson CB, Waters G, Adedinsewo D, Azizi Z, White JA. Bias recognition and mitigation strategies in artificial intelligence healthcare applications. NPJ Digit Med 2025;8:154. 10.1038/s41746-025-01503-7. 40069303.
50. Capraro V, Lentsch A, Acemoglu D, Akgun S, Akhmedova A, Bilancini E, et al. The impact of generative artificial intelligence on socioeconomic inequalities and policy making. PNAS Nexus 2024;3:pgae191. 10.1093/pnasnexus/pgae191. 38864006.
51. Letterie G. Three ways of knowing: the integration of clinical expertise, evidence-based medicine, and artificial intelligence in assisted reproductive technologies. J Assist Reprod Genet 2021;38:1617–25. 10.1007/s10815-021-02159-4. 33870475.
52. Barrera FJ, Brown ED, Rojo A, Obeso J, Plata H, Lincango EP, et al. Application of machine learning and artificial intelligence in the diagnosis and classification of polycystic ovarian syndrome: a systematic review. Front Endocrinol (Lausanne) 2023;14:1106625. 10.3389/fendo.2023.1106625. 37790605.

Figure 1.

Overview of artificial intelligence, machine learning, and deep learning approaches.

Table 1.

Summary of AI applications in clinical counseling and outcome prediction

| Application area | Study | Dataset | AI methodology used | Key findings |
|---|---|---|---|---|
| Clinical counseling and outcome prediction | Liao et al. (2020) [7] | Retrospective study of 60,648 IVF-ET cycles. Key predictors: age, BMI, FSH, AFC, AMH, number of oocytes, and endometrial thickness | Entropy-based feature discretization, random forest algorithm | The grading system classified infertility into five grades (A–E), with pregnancy rates of 53.82% for grade A and 0.90% for grade E. Cross-validation confirmed 95.94% stability (95% CI, 95.14%–96.74%). |
| | McLernon et al. (2022) [8] | Population-based cohort study of 88,614 patients, using SART CORS data. Key predictors: age, BMI, previous full-term birth, male factor infertility, PCOS, DOR, uterine factor infertility, and unexplained infertility | Logistic regression method at two key time points: before starting the first complete IVF cycle and after an unsuccessful first complete IVF cycle | Each additional IVF cycle reduced live birth odds by 42%, while higher AMH and more retrieved eggs were associated with increased success rates. Women in a second cycle had higher rates of DOR (39%) and PCOS (38%). |

AI, artificial intelligence; IVF-ET, in vitro fertilization and embryo transfer; BMI, body mass index; FSH, follicle-stimulating hormone; AFC, antral follicle count; AMH, anti-Müllerian hormone; CI, confidence interval; SART CORS, Society for Assisted Reproductive Technology Clinical Outcomes Reporting System; PCOS, polycystic ovary syndrome; DOR, diminished ovarian reserve.

Table 2.

Summary of AI applications in IUI and IVF workflow management

| Application area | Study | Dataset | AI methodology used | Key findings |
|---|---|---|---|---|
| IUI | Kozar et al. (2021) [9] | 1,029 IUI procedures in 413 couples, including various stimulation methods | Random forest, artificial neural network, partial least squares regression, support vector machine, and linear models | The best predictive model for IUI outcome was random forest (AUC, 0.66; sensitivity, 0.4; specificity, 0.8). Sperm parameters represented the most significant predictor. |
| | Ranjbari et al. (2021) [10] | Reproductive study of 11,255 IUI procedures. Included the 20 most important features for IUI prediction | Combining complex network-based feature engineering and stacked ensemble | The best predictive model for IUI outcome achieved an AUC of 0.84, with a sensitivity of 0.8, specificity of 0.9, and accuracy of 0.9. The most significant predictors were semen parameters (sperm concentration and motility) and female BMI. |
| IVF workflow management | Letterie et al. (2020) [12] | 2,603 IVF cycles (1,853 autologous and 750 donor cycles) | Classification and regression trees, random forests, support vector machines, logistic regression, and neural networks | The algorithm accurately predicted four key clinical decisions, aligning closely with expert decisions: (1) continuing or stopping stimulation (92% accuracy), (2) triggering or canceling retrieval (96%), (3) dose adjustments (82%), and (4) follow-up timing (87%). |
| | Robertson et al. (2021) [13] | Retrospective analysis of 9,294 ultrasound scans from 2,322 IVF cycles | Random forest regression and random forest classifier method | Stimulation day 5 was the best day for predicting the trigger date (mean squared error, 2.16±0.12) and over-response (AUC, 0.91±0.01). |
| | Letterie et al. (2022) [14] | 1,591 autologous IVF cycles | Linear regression, random forest, extra tree regression, K-nearest neighbor | The algorithm (1) predicted the optimal monitoring day (mean error, 1.4 days), (2) identified trigger day options within a 3-day retrieval range, with oocyte count variance of 0–3 across these days, and (3) estimated mature oocyte count with a sensitivity of approximately 0.8. |

AI, artificial intelligence; IUI, intrauterine insemination; IVF, in vitro fertilization; AUC, area under the curve; BMI, body mass index.

Table 3.

Summary of AI applications in COS and follicular monitoring with 3D ultrasound

| Application area | Study | Dataset | AI methodology used | Key findings |
|---|---|---|---|---|
| COS | Correa et al. (2022) [16] | Observational study of 3,487 patients from five centers. Predictor variables: age, BMI, AMH, AFC, and previous live births | Linear regression | The model outperformed clinician-prescribed FSH doses, achieving performance scores of 0.87 vs. 0.83 (p=2.44e-10) in development and 0.89 vs. 0.84 (p=3.81e-05) in validation. |
| | Murillo et al. (2023) [17] | Retrospective study of 20,081 IVF cycles, using SART CORS | Logistic regression, K-nearest neighbor | When comparing flare and antagonist protocols in poor responders, (1) CLBR was similar in the first cycles (14.2% vs. 13.6%); and (2) CLBR improvement was comparable across protocol transitions. |
| | Hariton et al. (2021) [19] | 7,886 IVF with ICSI cycles | T-learner with bagged decision trees | For trigger timing decisions, physician-model agreement was 53% for 2PNs and 62% for blastocysts. The algorithm yielded 1.4 more 2PNs and 0.6 more blastocysts per cycle compared to physicians. |
| | Fanton et al. (2022) [20] | 30,278 IVF cycles from three centers | Linear regression | When predicting mature oocyte outcomes for triggering today versus tomorrow: (1) early triggers resulted in 2.3 fewer MII oocytes, 1.8 fewer 2PNs, and 1.0 fewer blastocysts compared to optimal triggers; and (2) late triggers led to 2.7 fewer MII oocytes, 2.0 fewer 2PNs, and 0.7 fewer blastocysts than optimal triggers. |
| Follicular monitoring with 3D ultrasound | Noor et al. (2020) [23] | Randomized controlled trial of 130 IVF-ET recipients. Automated 3D vs. manual 2D ultrasound | Sonography-based automated volume count | Both methods yielded comparable ART outcomes, including oocyte retrieval, fertilization, cleavage, and pregnancy rates. However, 2D ultrasound required significantly more time (p<0.01). |
| | Liang et al. (2022) [24] | 515 IVF cases | Deep learning-based segmentation algorithm | The optimal follicle volume cut-off for predicting mature oocytes was 0.5 cm³, outperforming conventional 2D ultrasound. The optimal leading follicle volume for hCG triggering was 3.0 cm³, significantly correlating with more mature oocytes (p=0.01). For predicting ovarian hyper-response, a multi-layer perceptron model achieved higher accuracy (0.890 vs. 0.785) than 2D measurements. |

AI, artificial intelligence; COS, controlled ovarian stimulation; 3D, three-dimensional; BMI, body mass index; AMH, anti-Müllerian hormone; AFC, antral follicle count; FSH, follicle-stimulating hormone; SART CORS, Society for Assisted Reproductive Technology Clinical Outcomes Reporting System; CLBR, cumulative live birth rate; IVF, in vitro fertilization; ICSI, intracytoplasmic sperm injection; 2PN, two-pronuclear (fertilized) oocytes; MII, metaphase II; ET, embryo transfer; 2D, two-dimensional; ART, assisted reproductive technology; hCG, human chorionic gonadotropin.

Table 4.

Summary of oocyte assessment and semen analysis

| Application area | Study | Dataset | AI methodology used | Key findings |
|---|---|---|---|---|
| Oocyte assessment | Murria et al. (2023) [25] | Retrospective study of 165 oocyte images from 24 patients | Violet (Future Fertility, Canada), 2D images | Predicted fertilization with 82% accuracy, blastocyst development with 67% accuracy, and live birth potential with 75% accuracy per oocyte cohort |
| | Fjeldstad et al. (2022) [26] | Prospective multicenter study of images of 392 oocytes from 46 patients | Magenta (Future Fertility, Canada), 2D images | Magenta oocyte scores correlated significantly with blastocyst formation, with higher-scoring oocytes (7.1–10) developing into blastocysts at a 46.1% rate vs. 26.6% for lower-scoring oocytes (1.0–4.0) (p<0.005). |
| | Fjeldstad et al. (2024) [27] | 37,133 Images of mature oocytes combined with clinical data from patients across eight fertility clinics in six countries | Deep learning | Test dataset: AUC, 0.64; accuracy, 0.60; specificity, 0.55; sensitivity, 0.65; best performance in 38–39-year age group (AUC, 0.68), minimal male factor impact, good clinic generalizability. External validation: AUC, 0.63; accuracy, 0.58; specificity, 0.57; sensitivity, 0.59. Higher scores were correlated with better blastocyst outcomes. |
| Semen analysis | Hicks et al. (2019) [29] | 85 Sperm motility videos from the VISEM dataset | Deep learning (CNNs), linear regression | Deep learning-based prediction was fast and consistent, although adding participant data did not improve performance. |
| | Ottl et al. (2022) [30] | 85 Sperm motility videos from the VISEM dataset | Multiple neural networks, support vector regression models | The model can predict the percentage of progressive, non-progressive, and immotile spermatozoa. The mean absolute error improved from 8.83 to 7.31 compared to previous studies. |
| | Javadi et al. (2019) [32] | 1,540 Images from 235 male infertility patients from the MHSMA dataset | Deep learning (CNNs) | The model exhibited high accuracy in identifying morphological deformities in real time. The model achieved F0.5 scores of 84.74% (acrosome), 83.86% (head), and 94.65% (vacuole). |
| | Abbasi et al. (2021) [33] | 1,540 Non-stained grayscale sperm images from the MHSMA dataset | Deep learning (CNNs) | The model improved accuracy rates to 84.00% (head), 80.66% (acrosome), and 94.00% (vacuole) compared to previous studies. |
| | Bachelot et al. (2023) [34] | 201 Patients who underwent TESE | Eight machine learning models | Aim: to predict the success of TESE in NOA. Random forest performed best (AUC, 0.90; sensitivity, 100%; specificity, 69.2%). Inhibin B and varicocele history were key predictors. |
| | Wu et al. (2021) [35] | 702 Images from 30 patients | Deep learning CASA system | Aim: to automate sperm identification for TESE. The model achieved a mean average precision of 0.741 and an average recall of 0.376, performing near human level. |

AI, artificial intelligence; 2D, two-dimensional; AUC, area under the curve; CNN, convolutional neural network; MHSMA, Modified Human Sperm Morphology Analysis; TESE, testicular sperm extraction; NOA, non-obstructive azoospermia; CASA, computer-aided sperm analyzer.

Table 5.

Summary of embryo assessment

| Application area | Study | Dataset | AI methodology used | Key findings |
|---|---|---|---|---|
| Embryo assessment | Khosravi et al. (2019) [38] | Retrospective analysis of 12,001 time-lapse images from 1,774 embryos | STORK, a deep learning-based automated embryo quality assessment system | AUC >0.98, surpassing embryologists and generalizing across clinics. Pregnancy likelihood ranged from 13.8% (poor-quality embryos, age ≥41) to 66.3% (good-quality embryos, age <37). |
| | Tran et al. (2019) [39] | Retrospective analysis of time-lapse videos of 10,638 embryos from eight centers | IVY, a deep learning model that predicts pregnancy with fetal heartbeat | AUC of 0.93 (95% CI, 0.92–0.94) in cross-validation, with results reproducible across clinics (AUC, 0.90–0.95). |
| | Lee et al. (2021) [41] | Retrospective analysis of 690 time-lapse videos with PGT-A results | Deep learning | AUC of 0.74 in distinguishing aneuploid embryos from euploid/mosaic embryos. |
| | Illingworth et al. (2024) [42] | Multicenter, randomized, double-blind trial. 1,066 Patients from 14 centers. Deep learning-based embryo selection vs. standard morphology assessment | Deep learning | Clinical pregnancy rates of 46.5% (deep learning) vs. 48.2% (morphology) (risk difference, −1.7%; 95% CI, −7.7 to 4.3; p=0.62). A significant, 10-fold reduction in evaluation time compared to the morphology group (21.3±18.1 seconds vs. 208.3±144.7 seconds, p<0.001). |

AI, artificial intelligence; AUC, area under the curve; CI, confidence interval; PGT-A, preimplantation genetic testing for aneuploidy.
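
For readers less familiar with how a risk difference and its confidence interval are derived, the following sketch recomputes the interval reported for Illingworth et al. [42] from the two clinical pregnancy rates. It uses a simple Wald (normal-approximation) interval and assumes, hypothetically, an equal split of the 1,066 patients (533 per arm); the per-arm counts are not given in the table, and the trial's own statistical method may differ. The function name `risk_difference_ci` is illustrative.

```python
import math

def risk_difference_ci(p1, n1, p2, n2, z=1.96):
    """Wald 95% CI for the difference between two independent proportions."""
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

# Rates from Table 5; 533 per arm is an ASSUMED equal allocation of 1,066 patients.
diff, lower, upper = risk_difference_ci(0.465, 533, 0.482, 533)
print(f"risk difference {diff * 100:.1f}% "
      f"(95% CI, {lower * 100:.1f} to {upper * 100:.1f})")
```

Under these assumptions the result agrees with the reported risk difference of −1.7% (95% CI, −7.7 to 4.3). Because the interval includes zero, the difference between arms is not statistically significant, consistent with the reported p=0.62.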