Horizoning the Convergence of Artificial Intelligence and Healthcare: An Exploratory Analysis Using Latent Semantic Analysis
Clinical Research
VOLUME: 35 ISSUE: 3
P: 353 - 367
December 2025

Anatol J Gen Med Res 2025;35(3):353-367
1. İzmir Katip Celebi University Faculty of Economics and Administrative Sciences, Department of Health Management, İzmir, Turkiye
2. İzmir Provincial Health Directorate Personnel Services Directorate, İzmir, Turkiye
Received Date: 10.11.2025
Accepted Date: 18.11.2025
Online Date: 30.12.2025
Publish Date: 30.12.2025

Abstract

Objective

This study aims to provide insights for healthcare stakeholders by applying latent semantic analysis to the most-cited scientific publications in artificial intelligence and healthcare published over the past decade.

Methods

Publications were retrieved from the Web of Science database, focusing on the 1,000 most-cited papers published between 2015 and 2025. Latent semantic analysis was employed for text analysis, encompassing corpus creation, text preprocessing, tokenization, lowercasing, stopword removal, stemming, lemmatization, term-document matrix construction, weighting, singular value decomposition, dimensionality reduction, semantic construction, and interpretation. Model performance was assessed using singular values, explained variance, and cumulative variance, and the optimal number of dimensions was determined to be 160.

Results

The latent semantic analysis model effectively uncovered the underlying semantic relationships within the dataset. The gradual decline in singular values supported the appropriateness of the model’s structure and dimensionality. The analysis revealed that artificial intelligence and healthcare are converging on two primary clinical themes: deep learning and predictive applications. The deep learning theme reflects the training of artificial intelligence systems with patient data, whereas the predictive theme emphasizes the use of artificial intelligence in diagnostic and therapeutic decision-making. Additionally, the distinct semantic positioning of the coronavirus disease-2019 theme highlighted the model’s ability to differentiate thematic clusters accurately.

Conclusion

Findings indicate a clear convergence between artificial intelligence (AI) and healthcare, demonstrating increasing interconnectivity. Advances in AI are increasingly influencing clinical decision-support systems and patient-centered applications. Policymakers should develop strategic frameworks to ensure the safe, ethical, and effective integration of AI into healthcare, particularly in clinical decision-support and patient care. Such frameworks must emphasize regulatory standards, professional training, data privacy, and patient safety to maximize the benefits of AI while mitigating potential risks.

Keywords:
Healthcare, artificial intelligence, text analytics, natural language processing, latent semantic analysis

Introduction

Modern healthcare is evolving from a volume-based model to a value-based one, emphasizing the use of health data to optimize resources, enhance quality of care, increase patient satisfaction, and improve health outcomes(1). This transition highlights the growing importance of data analytics in healthcare, as analytics helps uncover valuable patterns in large, complex datasets to address new challenges. In this context, artificial intelligence (AI) has become a powerful tool in healthcare(2). Commonly defined as the simulation of human cognition, AI builds on advances in predictive modeling approaches such as machine learning (ML), in which computer algorithms learn from training data without being explicitly programmed, enabling algorithmic decision-making(2). As technology advances, AI’s ability to process and analyze data is becoming increasingly powerful, making it a key instrument in transforming healthcare(3). Recent studies have focused on promoting innovative algorithms that integrate novel solutions into health systems to provide more robust healthcare(4). Together, AI and big data have created opportunities to incorporate evidence-based decision-making more extensively into the healthcare system. Looking ahead, novel developments offer new potential, such as augmented decision-making support for physicians across the spectrum of care, thereby contributing to enhanced patient safety and improved health outcomes(5, 6).

In a technology-driven world, healthcare has witnessed breakthroughs and exponential advancements(7). Data, advanced analytics, and computational tools have created a new paradigm for managing health data to predict disease occurrence and treatment outcomes(8). In this context, AI has emerged through technological progress and is poised to revolutionize healthcare by enabling advanced algorithmic decision-making in domain-specific applications. These applications aim to mimic human thinking and cognitive functions, thereby transforming healthcare delivery. Healthcare and health services have traditionally been provided by experts and specialized institutions. The convergence of AI and healthcare providers therefore represents a paradigm-changing development, enabled by the increasing availability of healthcare data and rapid advances in analytic techniques. In particular, early detection and diagnosis, treatment optimization, and outcome prediction and prognostic evaluation are the three major areas of AI applications in healthcare(9). Aggregated healthcare data enable more powerful AI models capable of automating diagnosis and enhancing personalized medicine solutions(10). Generally, AI-driven technologies can be categorized into two main approaches. The first is the ML approach, which analyzes structured data such as imaging and genetic data. The second involves natural language processing (NLP) techniques, which extract information from unstructured data, such as clinical notes, to augment structured medical data for clinical use(9).

AI applications in healthcare domains generally utilize supervised learning techniques, including support vector machines, neural networks, decision trees, random forests, linear and logistic regression, Naive Bayes, discriminant analysis, and nearest neighbor algorithms. Clinically significant gains have been achieved particularly through supervised learning methods such as neural networks and support vector machines, rather than through unsupervised approaches such as clustering and principal component analysis(9). It is therefore paramount to combine data with advanced systems, applications, and algorithms to generate meaningful knowledge(3). The increasing volume and complexity of medical data underscore the rationale for using AI-driven technologies, especially deep learning and ML, in healthcare, and the development of new deep learning-based applications demonstrates AI’s substantial potential to transform healthcare. Kaku(11), author of Physics of the Future, describes the frontiers of AI in healthcare as follows:

“In the near future, you will simply approach a wall-mounted screen and consult a robo-doctor. A friendly face will ask you many questions. Then you will answer orally rather than in writing. After a few initial questions, the robo-doctor will diagnose your disease based on the most robust clinical experience of doctors worldwide.”

AI applications, particularly those based on deep learning, are already revolutionizing healthcare. For instance, IBM’s Watson platform enables healthcare providers to analyze complex health data more accurately and cost-effectively than human experts(3). Numerous studies have demonstrated the effectiveness of AI in medical applications such as image-based diagnostics(12). One study showed that AI could detect diabetic retinopathy, a leading cause of blindness among diabetic patients(13). In the United States, AI models trained on over 128,000 images achieved excellent diagnostic performance in detecting diabetic retinopathy(14). AI has also been used to predict pediatric diseases such as asthma and pneumonia, achieving results comparable to human expertise(12).

Pain prediction represents another promising domain for AI-driven algorithms, as pain directly affects patients’ quality of life and informs the selection of optimal treatments. Given the potential bias in pain assessment tools arising from patient and physician subjectivity, Liu et al.(15) emphasize the importance of machine-based pain assessment that uses facial expression data to provide more accurate, less biased evaluations. In radiology, Li et al.(16) introduced a deep learning-based model combining multimodal brain data to improve diagnostic accuracy. The model, trained on magnetic resonance imaging (MRI) and positron emission tomography (PET) images, predicts missing PET patterns from MRI data to assist diagnosis of neurodegenerative diseases such as Alzheimer’s disease. Similarly, in the field of surgery, robotic systems such as da Vinci have been used for minimally invasive procedures and have been associated with safe and effective patient outcomes(17-20). As AI’s predictive models continue to improve, even greater advancements in healthcare are expected in the near future.

Advancements in data, analytics, and computational tools are transforming how health data are used to predict disease occurrence and treatment outcomes(8). AI plays a critical role in this transformation by harnessing data to guide medical decisions. However, the ethical and legal implications of AI in healthcare remain insufficiently explored(21). While AI offers technical advantages, it also raises concerns regarding patient safety and privacy(22). Issues such as fairness, autonomy, and accountability pose significant challenges to the integration of AI into healthcare, as legal systems continue to grapple with how to address them(21). A notable research gap exists concerning liability for AI-driven decisions that result in patient harm, underscoring the urgent need for clear ethical and legal guidelines governing AI applications in healthcare. Because AI requires access to sensitive patient data, the risk of misuse or data breaches increases(23). Under the European Union’s General Data Protection Regulation (GDPR), individuals have the right not to be subject to decisions based solely on automated processing, such as AI-driven systems, that could significantly affect them. Although AI can assist clinicians with decision-making, it is unlikely to replace them in the near future(9, 24). The role of human clinicians, especially in critical decision-making, remains indispensable because AI is not yet capable of replicating human expertise(25, 26).

Little has been reported in the literature regarding liability for harm caused by AI-driven algorithmic decisions, which may call into question the adequacy of existing legal frameworks(27, 28). AI thus introduces various ethical and legal concerns, along with paradigmatic shifts, into a field in which harm to human health is unacceptable. It is therefore of paramount importance to analyze and evaluate AI-driven progress within an ethical and legal framework. Overall, these considerations highlight the need to clarify the legal nature of AI, a challenging task that includes identifying who bears liability for medical malpractice and ensuring appropriate compensation for harmed patients. Accordingly, healthcare authorities, such as the Ministry of Health, along with governmental and regulatory bodies, must act responsibly, monitor emerging problem areas, and establish governance mechanisms to prevent conflicting outcomes. Evidence from research on AI and healthcare should form the foundation for such decision-making, which underlines the importance of relying on evidence-based findings. NLP techniques offer valuable opportunities for policymakers in this regard through text analytics. This study aims to generate insights for healthcare stakeholders by applying latent semantic analysis (LSA) to the most-cited scientific publications from the past decade in AI and healthcare. The study is considered original in both its focus and methodology and is expected to make a meaningful contribution to the literature.

Materials and Methods

Objective

The aim of the study is to generate insights for healthcare stakeholders by applying LSA to the most-cited scientific publications from the past decade in AI and healthcare.

Research Question

What insights and future horizons emerge from the convergence of AI and healthcare in scholarly literature?

Data Source and Research Unit

The data used in this study were obtained from the Web of Science (WoS) Core Collection, selected for its wide recognition and established reliability as a source of scientific literature, particularly for bibliometric and text-analytics research(29). The research was conducted on the assumption that publications retrieved from the WoS database adequately represent developments in the scientific domain, allowing the study to effectively address its research question. Accordingly, the research unit of this study comprises the 1,000 most-cited articles published in the last ten years (2015-2025) identified in the database.

Search Strategy

The publication retrieval process was completed in three stages within the WoS database. In the first stage, study titles were searched using specific keywords related to AI. The keywords related to AI were defined as follows: artificial intelligence, AI, machine learning, deep learning, neural networks, artificial neural networks, convolutional neural networks, recurrent neural networks, reinforcement learning, supervised learning, unsupervised learning, semi-supervised learning, transfer learning, federated learning, explainable AI, generative AI, natural language processing, computer vision, speech recognition, pattern recognition, knowledge representation, expert systems, cognitive computing, symbolic AI, evolutionary algorithms, swarm intelligence, fuzzy logic, rule-based systems, agent-based modeling, data mining, data preprocessing, feature extraction, feature selection, predictive modeling, classification, clustering, regression analysis, dimensionality reduction, big data, data visualization, knowledge discovery, decision trees, random forest, support vector machine, gradient boosting, ensemble learning, text mining, sentiment analysis, topic modeling, latent Dirichlet allocation, latent semantic analysis, word embeddings, Word2Vec, GloVe, transformer models, BERT, GPT models, text classification, named entity recognition, machine translation, question answering, speech-to-text, text summarization, decision-support systems, predictive analytics, intelligent automation, robotics, autonomous vehicles, recommender systems, chatbots, virtual assistants, image recognition, fraud detection, anomaly detection, medical diagnosis, smart healthcare, precision medicine, remote monitoring, health informatics, personalized medicine, clinical decision-support, AI ethics, algorithm, training data, test data, validation set, model accuracy, overfitting, underfitting, hyperparameter tuning, cross-validation, loss function, gradient descent, optimization, feature engineering, model interpretability, bias and variance, computational complexity, algorithmic bias, data privacy, data security, fairness, transparency, accountability, explainability, trustworthy AI, responsible AI, human-centered AI, AI governance, artificial general intelligence, quantum machine learning, edge AI, internet of things, internet of medical things, cyber-physical systems, human-AI interaction, human-computer interaction, digital twins, cognitive robotics, sustainable AI, green AI.

In the second stage, a search was conducted using keywords related to healthcare. The keywords defined for the healthcare domain included healthcare, health care, health system, healthcare system, health services, public health, global health, primary healthcare, secondary healthcare, tertiary healthcare, preventive healthcare, curative healthcare, rehabilitation services, health policy, health management, health administration, health economics, health financing, health insurance, universal health coverage, health equity, health disparity, access to healthcare, quality of care, patient safety, health outcomes, health indicators, disease prevention, health promotion, telehealth, telemedicine, mobile health, mHealth, digital health, eHealth, virtual healthcare, remote monitoring, home healthcare, hospital care, primary care, ambulatory care, emergency care, intensive care, long-term care, palliative care, nursing care, mental health care, behavioral health, dental care, maternal health, child health, reproductive health, geriatric care, chronic disease management, preventive services, occupational health, environmental health, nutrition services, laboratory services, radiology services, pharmacy services, surgical services, health information systems, hospital information system, electronic health records, electronic medical records, health informatics, clinical decision-support systems, artificial intelligence in healthcare, machine learning in medicine, predictive analytics in healthcare, medical devices, digital therapeutics, wearable technology, smart healthcare, internet of medical things, precision medicine, personalized medicine, robotics in healthcare, remote diagnostics, virtual reality in healthcare, augmented reality in medicine, health expenditure, cost-effectiveness analysis, cost-utility analysis, cost-benefit analysis, economic evaluation, health technology assessment, resource allocation, efficiency in healthcare, healthcare cost, payment systems, reimbursement, sustainability in healthcare, financial risk protection, value-based healthcare, hospital efficiency, budget impact analysis, health workforce, healthcare professionals, physicians, nurses, pharmacists, allied health professionals, workforce planning, health leadership, hospital management, healthcare quality management, performance measurement, patient-centered care, integrated care, care coordination, patient satisfaction, patient experience, healthcare marketing, hospital accreditation, strategic management in healthcare, medical ethics, bioethics, patient rights, informed consent, data privacy in healthcare, confidentiality, medical law, healthcare regulation, health governance, ethical decision-making, health system strengthening, sustainable health systems, global burden of disease, health security, epidemics, pandemics, COVID-19, vaccine distribution, health resilience, social determinants of health, environmental determinants, climate change and health, one health, sustainable development goals, planetary health, health research, clinical trials, observational studies, epidemiology, evidence-based medicine, health data analytics, big data in healthcare, health statistics, population health, health outcome measurement, patient-reported outcomes, quality indicators, data-driven healthcare.

In the third and final stage, the publications obtained from the previous steps were combined using the AND operator, thereby completing the retrieval process. The inclusion criteria required that publications be written in English, categorized as research articles, published in citation-indexed journals (SCI, SSCI, SCI-Expanded, ESCI, or SCOPUS), and classified in health- or medical-related WoS categories. These parameters guided the final selection of studies included in the analysis.
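The three-stage strategy above amounts to two OR-joined keyword blocks intersected with AND. As a minimal Python sketch (not the authors' actual query), assuming Web of Science advanced-search syntax with the title field tag TI for both stages (the text states only that titles were searched in the first stage) and using short hypothetical excerpts of each keyword list, the combined query string could be assembled as follows:

```python
def or_block(terms, field="TI"):
    """Join terms into one OR-ed block of quoted phrases under a field tag.

    The default tag "TI" (title) is an assumption for both stages.
    """
    quoted = " OR ".join(f'"{t}"' for t in terms)
    return f"{field}=({quoted})"


# Hypothetical short excerpts of the AI and healthcare keyword lists above.
ai_terms = ["artificial intelligence", "machine learning", "deep learning"]
health_terms = ["healthcare", "health care", "public health"]

# Stage 3: intersect the two stage-level searches with AND.
query = f"{or_block(ai_terms)} AND {or_block(health_terms)}"
print(query)
```

The language, document-type, index, and category filters listed above would be applied separately as search refinements; they are not encoded in this string.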

Statistical Analysis

Latent Semantic Analysis

The ever-growing size and complexity of textual data make it increasingly challenging to identify meaningful documents and patterns, not only in other disciplines(30) but also in health and medicine. In this context, text analytics has recently emerged as a strategic tool for analyzing textual data in healthcare and medical research. Text analytics is a multi-stage process that aims to derive meaningful insights from unstructured textual data, and its methods are primarily implemented using NLP techniques. When the objective of NLP is to understand the contextual meaning of words, semantic modeling approaches are employed. Within this framework, LSA is a semantic modeling method that applies statistical and mathematical techniques to analyze textual data and extract significant insights(31); its primary goal is to uncover the contextual meaning of words within a large text corpus(32). Originating in the 1980s, LSA was initially designed as an information retrieval technique(33), but it was later introduced into psychological research as a theory and method for discovering and representing the meanings of words(34, 35). Similarly, Landauer and Dumais(36) proposed that LSA constitutes a fundamental computational theory of knowledge acquisition and representation. LSA can capture word-passage, passage-passage, and sentence-sentence relationships in ways that align with human cognition(32). In this regard, LSA stands out as a fully automated method that employs mathematical and statistical techniques to infer information about the contextual use of words in textual data(37). Foltz(38) discussed the applicability of LSA in text-based research in three domains, while Shen and Ho(39) demonstrated its usefulness in technology-assisted higher education; together, these studies indicate that LSA is a highly effective tool in text analytics.
The workflow used for the LSA analysis in this study is shown in Figure 1.

The 1,000 most-cited studies published in the last ten years were retrieved from the WoS database and exported to Microsoft Excel. A preliminary review of the publication titles revealed no duplicate entries in the dataset. The final dataset was then imported into the R programming environment(40) for LSA, which was conducted on the study abstracts, following similar approaches presented in the literature. Before the LSA model was constructed, the text corpus was preprocessed through tokenization, stopword removal, and lemmatization/stemming.
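The preprocessing steps can be illustrated with a minimal Python sketch. The regular-expression tokenizer and the short stopword list are assumptions made for illustration (the study itself used R packages), and a complete pipeline would also apply stemming or lemmatization, which this sketch deliberately omits.

```python
import re

# Illustrative stopword list only; a real pipeline would use a full English list.
STOPWORDS = {"the", "of", "and", "a", "an", "in", "to", "for",
             "is", "are", "with", "on", "using"}


def preprocess(abstract: str) -> list[str]:
    """Lowercase, split into alphanumeric tokens, and drop stopwords.

    Single-character tokens are also dropped. Stemming/lemmatization would
    normally follow this step and is not implemented here.
    """
    tokens = re.findall(r"[a-z0-9]+", abstract.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


print(preprocess("Deep learning for the prediction of COVID-19 outcomes"))
```

Note that the hyphen in "COVID-19" splits it into two tokens; tokenizer choices like this one affect which terms later appear in the semantic space.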

The first step in the LSA analysis involved transforming the text into a matrix in which each row represented a unique word and each column represented a document (here, an abstract). Each cell indicated the frequency with which the corresponding word appeared in the corresponding document. This term-document matrix (TDM) allowed the textual data to be represented as vectors. The raw word frequencies were then weighted to reflect the relative importance of words within the corpus; for this purpose, a term frequency-inverse document frequency (TF-IDF) weighting scheme was applied to the TDM.
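A minimal Python/NumPy sketch of the term-document matrix and one common TF-IDF variant follows: raw counts multiplied by the natural-log inverse document frequency, log(N/df). The toy documents are hypothetical, and several other weighting variants exist, so this is not necessarily the exact scheme applied by the study's R packages.

```python
import numpy as np


def term_document_matrix(docs):
    """Rows = unique terms (sorted), columns = documents, cells = raw counts."""
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    tdm = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for t in doc:
            tdm[index[t], j] += 1
    return vocab, tdm


def tfidf(tdm):
    """Scale each term's counts by log(N / df); a term in every document gets weight 0."""
    n_docs = tdm.shape[1]
    df = np.count_nonzero(tdm, axis=1)  # number of documents containing each term
    idf = np.log(n_docs / df)
    return tdm * idf[:, None]


# Hypothetical preprocessed abstracts.
docs = [["deep", "learning", "imaging"],
        ["deep", "learning", "risk", "prediction"],
        ["covid", "vaccine", "prediction"]]
vocab, tdm = term_document_matrix(docs)
weighted = tfidf(tdm)
```

Terms that occur in few documents (such as "covid" here) receive larger weights than terms spread across many documents (such as "deep").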

The second step of the LSA involved applying singular value decomposition (SVD) to the weighted matrix. Because LSA is based on SVD, a matrix decomposition technique similar to factor analysis(32), this operation can be interpreted as a form of factor extraction. SVD decomposes a rectangular matrix into the product of three matrices: one defining the rows as vectors of derived orthogonal factor values; another defining the columns in the same manner; and a third, diagonal matrix containing the singular values that scale the two orthogonal matrices. Multiplying these three matrices reconstructs the original matrix, whereas retaining only the largest k singular values and their corresponding vectors yields a reduced, rank-k approximation of it. The decomposition provided the singular values and explained variances, and the number of dimensions that optimized model performance was determined graphically, thereby defining the appropriate dimensional space for the textual data and achieving dimensionality reduction.
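The decomposition and truncation described above can be sketched with NumPy's np.linalg.svd, which returns the singular values in descending order. The random matrix stands in for a weighted term-document matrix, and k = 4 is an arbitrary illustrative choice rather than the study's 160 dimensions. Scaling the singular vectors by the singular values to obtain term and document coordinates is one common LSA convention and is assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((50, 12))  # stand-in weighted term-document matrix (terms x documents)

# Thin SVD: X = U @ diag(s) @ Vt, with s sorted from largest to smallest.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
assert np.allclose(U @ np.diag(s) @ Vt, X)  # the three factors reconstruct X exactly

# Dimensionality reduction: keep only the first k latent dimensions.
k = 4
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-k approximation of X
term_vectors = U[:, :k] * s[:k]              # terms in the k-dimensional semantic space
doc_vectors = Vt[:k, :].T * s[:k]            # documents in the same space
```

By the Eckart-Young theorem, X_k is the closest rank-k matrix to X in the least-squares (Frobenius) sense, which is why truncating the SVD is the standard route to LSA's reduced semantic space.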

Following these stages, construction of the semantic space, visualization, interpretation, and generation of insights were carried out as part of the LSA process. In the R environment(40), several packages are available for performing semantic analysis. In this study, the lsa and tm packages were used for modeling, while ggplot2 was employed for visualization(41). Descriptive findings regarding the publication dataset were summarized as frequencies, whereas analytical results from the LSA process were presented as exploratory graphical visualizations to facilitate deeper interpretation.

Results

Descriptive findings regarding the publications are presented in Table 1.

According to Table 1, the most highly cited studies published in the past decade at the intersection of AI and healthcare primarily focus on the integration of deep learning, ML, and data-driven approaches into clinical applications. Studies published in high-impact journals such as Nature Medicine and Circulation indicate a highly multidisciplinary area of impact, encompassing fields including medical informatics, biomedical sciences, and oncology. A semantic examination reveals that research employing electronic health records to develop deep federated learning methods, along with AI applications designed for the diagnosis and prediction of coronavirus disease-2019 (COVID-19), has become particularly prominent. Furthermore, emerging approaches related to explainable AI and automation-based diagnostic systems demonstrate a clear orientation toward trustworthy, transparent, and clinically meaningful solutions. Overall, the findings suggest a strong convergence between the domains of AI and healthcare, indicating that these fields are becoming increasingly intertwined. Technological advances in AI are being progressively reflected in clinical decision-support systems and patient-centered applications, signaling a significant transformation in healthcare delivery. Findings related to the factor loadings of the LSA model are presented in Table 2.

As shown in Table 2, the singular value of the first dimension was 160.88, considerably higher than that of the second dimension. This indicates that the first dimension represents the most dominant semantic axis within the text space. In subsequent dimensions, the singular values gradually decrease, so each successive dimension carries less information than the preceding one, reflecting the principle of diminishing marginal contribution. Because SVD orders singular values from largest to smallest by construction, the informative feature is the shape of this decline rather than the decline itself; the gradual, rather than abrupt, decrease observed here is consistent with a well-behaved decomposition in the LSA model. According to the findings, the first dimension explains approximately 9% of the total variance in the model. The first 10 dimensions collectively account for about 23% of the variance, and the first 25 dimensions explain approximately 33%. Thus, roughly one-third of the semantic structure of the analyzed texts is represented by the first 25 dimensions. In terms of cumulative variance, the initial dimensions contribute substantially to the model, whereas the contribution of dimensions beyond the 20th progressively decreases, suggesting that the model is approaching saturation. This trend is consistent with the cumulative variance curve presented in Figure 2. Therefore, the initial components of the LSA model capture a substantial portion of the semantic structure embedded within the text corpus, and the systematic decline in singular values indicates effective dimensionality reduction, with each retained component adding to the model’s explanatory capacity. Collectively, these results indicate that the LSA model effectively distinguishes the semantic structures within the text and that its performance improves with additional dimensions only up to the point of saturation.
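The explained-variance figures above can be derived from the singular values. The sketch below assumes the common convention in which the share of dimension i is its squared singular value divided by the sum of all squared singular values; the paper does not state its exact formula, and the input values are toy numbers rather than those in Table 2.

```python
import numpy as np


def variance_profile(singular_values):
    """Return each dimension's explained-variance share and the running total.

    Assumed convention: share_i = s_i**2 / sum_j s_j**2.
    """
    s2 = np.asarray(singular_values, dtype=float) ** 2
    explained = s2 / s2.sum()
    return explained, np.cumsum(explained)


# Toy singular values (not the study's), decreasing as in Table 2.
explained, cumulative = variance_profile([4.0, 2.0, 1.0, 1.0, 0.5, 0.5])
```

Because the singular values are squared, a dimension with half the singular value of another explains only a quarter as much variance, which is why the shares fall off faster than the singular values themselves.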
Findings regarding the explained variance, optimal dimensionality, and document similarity distribution of the LSA model are presented in Figure 2.

The explained variance graph in the upper-left section shows that cumulative variance rises steeply as the number of dimensions increases and eventually plateaus at around 160 dimensions, where the LSA model explains approximately 80% of the total variance. This number of dimensions therefore offers a balanced trade-off between model complexity and semantic representability: retaining additional dimensions beyond this threshold yields no meaningful improvement in model performance or in the proportion of variance explained. The document similarity distribution in the upper-right graph shows that most similarity scores between document pairs are concentrated around zero, suggesting substantial semantic diversity among the documents in the dataset and indicating that the LSA model effectively distinguishes semantically distinct content within the corpus. The explained-variance graph in the lower-left section illustrates that the first components account for a major portion of the variance in the LSA model, whereas the contributions of subsequent components are markedly lower. This implies that the dominant semantic structure within the dataset is represented by a limited number of latent semantic components. The cumulative variance curve in the lower-right graph further shows that the incremental contribution of each additional dimension to the explained variance progressively decreases. Overall, the results demonstrate that the LSA model captures the semantic structure of the dataset in a dimensionally efficient way. Accordingly, the selected number of dimensions represents an optimal solution for this corpus, identifying semantic patterns while limiting computational complexity.
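The two analyses in this paragraph, choosing the number of dimensions from the cumulative-variance curve and summarizing pairwise document similarities, can be sketched as follows. The 80% threshold rule and the use of cosine similarity are assumptions chosen to mirror the text (the study selected its dimensions graphically), and the inputs are toy values.

```python
import numpy as np


def smallest_k(cumulative, threshold=0.80):
    """Smallest number of dimensions whose cumulative explained variance reaches threshold."""
    cumulative = np.asarray(cumulative)
    hits = np.nonzero(cumulative >= threshold)[0]
    if hits.size == 0:
        raise ValueError("threshold never reached")
    return int(hits[0]) + 1  # convert a 0-based index into a dimension count


def pairwise_cosine(doc_vectors):
    """Cosine similarity for every unordered pair of documents (upper triangle).

    Rows must be non-zero vectors; an all-zero row would cause division by zero.
    """
    norms = np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    unit = doc_vectors / norms
    sims = unit @ unit.T
    i, j = np.triu_indices(len(doc_vectors), k=1)
    return sims[i, j]


cumulative = np.array([0.40, 0.60, 0.75, 0.81, 0.85, 0.88])  # toy values
k = smallest_k(cumulative)

docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy reduced document vectors
sims = pairwise_cosine(docs)
```

Passing the returned similarities to np.histogram would give a distribution comparable in form to the document similarity panel; a peak near zero corresponds to largely dissimilar document pairs.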
The correlation matrix illustrating the relationships among the documents is presented in Figure 3.

Figure 3 shows that the strength of correlations varies across document pairs but remains low overall, indicating that the documents included in the LSA model are not highly correlated. The predominance of light-colored areas in the correlation matrix indicates that most documents are either uncorrelated or weakly correlated. This finding is particularly important for the LSA model, whose main objective is to generate distinct semantic dimensions that differentiate between documents. The light-blue and light-red areas suggest weak, and in some cases negative, correlations between certain pairs of documents, implying that some documents partially share semantic patterns within the corpus. Overall, no strong correlations were observed, indicating that the LSA model produced no excessive semantic overlap or redundant components. Consequently, the semantic dimensions derived from the LSA model are meaningfully distinct from one another. At the same time, the presence of minor correlations, although not statistically significant, suggests the emergence of shared themes or semantic clusters within the dataset. The subsequent findings related to the LSA model are presented in Figure 4.
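A document-by-document correlation matrix like the one in Figure 3 could be computed from LSA document vectors as below. It is an assumption that Pearson correlation was computed on the reduced document vectors, and both the random stand-in data and the 0.3 cut-off for a weak correlation are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical stand-in: 8 documents (rows) x 5 LSA dimensions (columns).
doc_vectors = rng.normal(size=(8, 5))

# np.corrcoef treats each row as one variable, so this yields an 8 x 8
# document-by-document Pearson correlation matrix.
corr = np.corrcoef(doc_vectors)

# Off-diagonal entries are the pairwise correlations a heatmap would display.
off_diag = corr[~np.eye(len(corr), dtype=bool)]
WEAK = 0.3  # hypothetical cut-off for a "weak" correlation
share_weak = float(np.mean(np.abs(off_diag) < WEAK))
print(f"{share_weak:.0%} of document pairs are weakly correlated or uncorrelated")
```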

The upper-left graph shows the number of publications over time. A noticeable increase in AI-related healthcare studies was observed in 2020, reflecting the sharp rise in AI research in healthcare during the COVID-19 pandemic. The upward trend continued into 2021, after which the number of publications appeared to stabilize, returning to pre-pandemic levels. Because this analysis is based on the 1,000 most-cited publications, the observed trend should be compared with the overall publication trend in the field before it can be fully interpreted. The upper-right graph presents a two-dimensional representation of the five document clusters generated by LSA, with each color representing a distinct cluster. Clusters 1 and 2 appear semantically similar, indicating that the studies within them share thematic content. In contrast, the scattered data points represent documents focusing on divergent thematic areas, reflecting semantic diversity across the corpus. The group of points in the lower-left region of the graph clearly illustrates this dispersion.
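
The section does not state which algorithm produced the five document clusters. As an assumption for illustration only, the sketch below uses Lloyd's k-means on two-dimensional document coordinates, with a deterministic farthest-point initialization standing in for the usual randomized k-means++ seeding. The function `kmeans` and its parameters are hypothetical, and the study would use `k=5`.

```python
from __future__ import annotations

from typing import List, Tuple

Point = Tuple[float, float]


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def _init_centroids(points: List[Point], k: int) -> List[Point]:
    """Deterministic farthest-point seeding: start at the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(_dist2(p, c) for c in centroids)))
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means on 2-D document coordinates; returns (labels, centroids)."""
    centroids = _init_centroids(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Made-up coordinates forming three well-separated groups (not the study's data).
    pts = [(0, 0), (0.5, 0), (0, 0.5), (10, 0), (10.5, 0), (10, 0.5), (0, 10), (0.5, 10), (0, 10.5)]
    labels, _ = kmeans(pts, k=3)
    print(labels)
```

In practice, a library implementation such as scikit-learn's `KMeans(n_clusters=5)` would normally be preferred; the hand-written version only makes the assignment and update steps explicit.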

The lower-left graph displays the distribution of documents across the first two semantic dimensions. The dense grouping on the right side of the graph indicates that most documents share a common semantic orientation. Areas of higher point density correspond to documents with strong conceptual similarity, whereas points distributed toward the left or along the vertical axis likely represent unique or thematically distinct areas within the field. The lower-right graph shows the distribution of salient terms across the first two dimensions. The term “COVID” is clearly separated from the other terms, occupying a distinct region of the semantic space. This suggests that the pandemic constitutes a thematically distinct subdomain within AI research in healthcare. By contrast, terms such as “develop”, “deep”, “learn”, “pandemic”, “vaccine”, “healthcare”, “predict”, “outcome”, “identify”, and “risk” cluster centrally, implying a strong thematic association among studies on AI-driven healthcare, deep learning, health outcomes, risk prediction, and vaccine-related research.
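
The visual separation of “COVID” in the term map can also be quantified. The Python sketch below rests on two assumptions, neither confirmed by the section. First, term coordinates are taken as the first two columns of the left singular vectors scaled by their singular values ($U_2\Sigma_2$), a common LSA plotting convention. Second, a term counts as isolated when its distance from the centroid of all terms exceeds a hypothetical factor of 3 times the median distance of the remaining terms. All function names and the loadings in the example are made up for illustration.

```python
from __future__ import annotations

import math
import statistics
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float]


def term_coordinates(u_rows: Dict[str, Coord], sigmas: Coord) -> Dict[str, Coord]:
    """Scale each term's first two left-singular-vector entries by sigma_1 and sigma_2."""
    s1, s2 = sigmas
    return {term: (u1 * s1, u2 * s2) for term, (u1, u2) in u_rows.items()}


def isolation_ranking(coords: Dict[str, Coord]) -> List[Tuple[str, float]]:
    """Terms ordered by Euclidean distance from the centroid of all terms, farthest first."""
    n = len(coords)
    cx = sum(x for x, _ in coords.values()) / n
    cy = sum(y for _, y in coords.values()) / n
    dists = {term: math.hypot(x - cx, y - cy) for term, (x, y) in coords.items()}
    return sorted(dists.items(), key=lambda kv: kv[1], reverse=True)


def most_isolated(coords: Dict[str, Coord], factor: float = 3.0) -> Optional[str]:
    """Farthest term if its distance exceeds `factor` x the median distance of the
    other terms, else None. Needs at least two terms."""
    ranking = isolation_ranking(coords)
    top_term, top_dist = ranking[0]
    median_rest = statistics.median([d for _, d in ranking[1:]])
    return top_term if top_dist > factor * median_rest else None


if __name__ == "__main__":
    # Made-up loadings for illustration only; not values from the study.
    u_rows = {
        "covid": (0.10, 0.90),
        "predict": (0.30, 0.02),
        "deep": (0.28, 0.00),
        "learn": (0.29, -0.01),
        "risk": (0.31, 0.01),
    }
    coords = term_coordinates(u_rows, sigmas=(10.0, 5.0))
    print(most_isolated(coords))  # -> covid
```

Comparing against the median rather than the mean keeps the isolated term from inflating its own reference distance.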

The isolated placement of the term “COVID” on the semantic map indicates that pandemic-related AI studies constitute a distinct thematic domain within the LSA model. This also highlights the model’s capacity to distinguish pandemic-related AI research from other healthcare applications. Beyond this, the model underscores the significance of AI-assisted clinical decision systems, particularly those focusing on risk assessment, prediction, learning, and diagnostic processes. Furthermore, the clustering of AI-related studies around the terms “healthcare” and “outcome” suggests that health analytics within the field of AI has acquired an increasingly applied, data-driven orientation. In summary, the structure revealed by the LSA model shows that, while COVID-19 constitutes a semantically distinct theme, most other related concepts cluster around the broader domains of health technologies and AI applications in healthcare.

Discussion

The LSA model developed for the 1,000 most-cited studies at the intersection of AI and healthcare over the past decade demonstrates the model’s ability to reveal the latent semantic structure embedded in the textual data. The SVD results show that the LSA model explains approximately 80% of the total variance with 160 dimensions, indicating that it can distinguish semantic clusters within the corpus well. In the two-dimensional semantic space, the terms form three distinct clusters: two are semantically close to each other, whereas the third is positioned considerably farther away. Specifically, the words “predict” and “use” constituted the first cluster; “risk”, “outcome”, “deep”, “vaccine”, “medicine”, “precision”, and “image” formed the second; and “COVID” alone formed the third.
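
For reference, the quantities behind the 160-dimension, approximately 80% result can be written in standard LSA notation(33). This is a sketch under two assumptions that the section does not state explicitly: $X$ denotes the weighted term-document matrix (terms as rows, documents as columns), and explained variance is computed from squared singular values. The two-dimensional term and document coordinates used for the semantic maps are likewise assumed to be the usual singular-value-scaled projections.

```latex
\begin{aligned}
X &\approx X_k = U_k \Sigma_k V_k^{\top}, \qquad
  \Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k), \quad
  \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_k > 0, \\[4pt]
\mathrm{EV}_i &= \frac{\sigma_i^2}{\sum_{j=1}^{r} \sigma_j^2}, \qquad
\mathrm{CEV}(k) = \sum_{i=1}^{k} \mathrm{EV}_i, \qquad r = \operatorname{rank}(X), \\[4pt]
\mathrm{CEV}(160) &\approx 0.80 \quad \text{(as reported in this study)}, \\[4pt]
T_2 &= U_2 \Sigma_2 \ \text{(term coordinates)}, \qquad
D_2 = V_2 \Sigma_2 \ \text{(document coordinates)}.
\end{aligned}
```

Because $\sum_{j=1}^{r} \sigma_j^2 = \lVert X \rVert_F^2$, the denominator can also be obtained from the squared Frobenius norm of $X$ without computing every singular value.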

The concepts identified through the LSA model highlight the critical role of clinical integration between AI and healthcare, which is central to this study. The positioning of the word “predict” as a partially separate cluster underscores the privileged role of prediction and foresight in clinical decision-making in healthcare. The distinct placement of the word “COVID”, on the other hand, supports the validity of the model, as the COVID-19 pandemic has been one of the most significant global health phenomena since December 2019(42). Although studies on the relationship between AI and COVID-19 were prominent during the pandemic, their frequency has gradually declined in subsequent years. Nevertheless, the COVID-19 theme remains a central yet semantically distinct topic that differs substantially from mainstream AI-healthcare research areas. This is illustrated in the word cloud presented in Figure 5.

While COVID-19 remains a prominent topic in AI-related studies, the LSA model revealed that it does not constitute a mainstream research domain in the healthcare-related AI literature. Instead, it occupies a semantically distant and unique position in the two-dimensional semantic space, showing that the LSA model effectively differentiated COVID-19 from the broader semantic themes at the intersection of AI and healthcare. In this respect, the LSA model yielded unexpected yet insightful results by uncovering a hidden dimension of AI research in healthcare. The positioning of COVID-19 as a semantically distinct cluster highlights not only the value of text analytics as a strategic tool for capturing developments in the health sector but also the potential for enhancing digital health capacity during global crises, which are characterized by rapid growth in scientific publications. As highlighted in the literature, the pandemic saw a remarkable acceleration in telehealth initiatives and in the implementation of remote healthcare services worldwide(43). The findings of this study therefore emphasize the importance of leveraging scientific advances that emerge during crises to strengthen the potential of AI in healthcare. For this reason, policymakers are encouraged to transform this momentum into sustainable progress by investing strategically in digital health initiatives, particularly in chronic disease management, early diagnosis, and risk prediction.

The finding that the COVID-19 theme occupies a semantically distinct position demonstrates that research orientations in healthcare can shift rapidly during crises. Accordingly, it can be inferred that the pandemic acted as a significant catalyst in shaping the clinical uptake and practical adoption of AI in healthcare, an observation that aligns with findings reported in the existing literature(44). From a policymaking perspective, this underscores the need for flexible and adaptive research funding mechanisms that can respond swiftly to such thematic shifts during health crises such as COVID-19. Furthermore, the results of this study indicate that AI research in healthcare has become a sustained domain of inquiry. This, in turn, highlights the urgent need for a national health data strategy that establishes standards for data sharing, ethical frameworks, and interoperability across the healthcare system, a conclusion further supported by the literature(45). In this context, strengthening national data infrastructures is expected to yield significant advancements in the field of AI. The study also demonstrates that themes such as deep learning and prediction are of critical importance to health analytics, as these concepts play a key role in integrating AI applications into clinical practice. Within analytic frameworks that utilize patient data, AI exhibits a growing convergence with healthcare across a wide spectrum, from disease diagnosis to the measurement of health outcomes. These observations are strongly supported by findings reported in previous research(46). Moreover, these developments highlight not only the need to enhance technological capacity but also the importance of strengthening human resource competencies in this field.
Therefore, to maximize the benefits of AI applications in healthcare, it is recommended that AI-based health analytics training programs be developed collaboratively between universities and healthcare institutions, fostering multidisciplinary expertise.

Study Limitations

This study relied solely on the WoS database, and this choice of source may have influenced the findings on AI-healthcare convergence. The literature was analyzed from a healthcare management perspective using a holistic approach based on LSA, an NLP technique for text data. Future research could use alternative modeling approaches to compare results and improve generalizability. Given the study’s limited dataset, collecting additional data in future research could strengthen the LSA model’s representational capacity and provide more comprehensive insights into this domain. This study was limited to the 1,000 most-cited publications from the past decade. Future research could diversify or expand the unit of analysis, enabling a broader scope of analysis and more generalizable findings. Additionally, incorporating full-text data instead of abstracts could yield results with greater representational depth and semantic richness. To this end, future studies may consider employing word-embedding-based models, which could enhance precision and contextual understanding in the analysis of textual data.

Conclusion

The findings of the LSA model demonstrate that data derived from AI-related studies in healthcare can serve as a valuable resource for policy analysis. Policymakers can leverage such data-driven analytical approaches to adopt evidence-based strategies when shaping future health policies. The LSA model used in this study revealed a substantial convergence between the fields of AI and healthcare, indicating that these domains are becoming increasingly intertwined. It also showed that technological advances in AI are increasingly reflected in clinical decision-support systems and patient-centered healthcare applications.

Friedman’s(47) fundamental theorem of biomedical informatics holds that a person working in partnership with an information resource is “better” than that same person unassisted. By extension, it can be argued that a healthcare system enhanced by AI can outperform one without AI. Therefore, policymakers should develop a strategic framework to integrate AI into healthcare safely and ethically, particularly in the domains of clinical decision support and patient care. This framework must prioritize regulatory standards, professional training, data privacy, and patient safety to ensure responsible implementation. Moreover, the study demonstrates that text analytics can serve as a strategic tool for capturing and monitoring developments in healthcare. Consequently, policymakers can use such text-analytic approaches to adopt data-driven decision-making frameworks when shaping healthcare policies and strategies.

AI-Assisted Tools Disclosure

The methods used and the study’s main limitations have been clearly stated, and the results have been presented in an impartial and balanced manner. The article was written solely by the authors. OpenAI’s ChatGPT model(48) was used to translate the manuscript into English, and Anthropic’s Claude AI model(49) assisted in reviewing the translated text to ensure linguistic accuracy and professional quality.

Ethics

Ethics Committee Approval: Since the study used publicly available data, ethical approval was not required. However, all other ethical principles were followed throughout the research process.
Informed Consent: This study did not require informed consent as it involved no human participants, personal data collection, or interventional procedures.

Authorship Contributions

Concept: H.D., S.M., Design: H.D., S.M., Data Collection or Processing: H.D., Analysis or Interpretation: H.D., Literature Search: H.D., S.M., Writing: H.D., S.M.
Conflict of Interest: No conflict of interest was declared by the authors.
Financial Disclosure: The authors declared that this study received no financial support.

References

1
Abidi SSR, Abidi SR. Intelligent health data analytics: a convergence of artificial intelligence and big data. Healthc Manage Forum. 2019;32:178-82.
2
Fogel AL, Kvedar JC. Artificial intelligence powers digital medicine. NPJ Digit Med. 2018;1:5.
3
Marr B. Veri stratejisi: büyük veri ve nesnelerin interneti nasıl kar getirir? [Data strategy: how do big data and the internet of things generate profit?] In: Gündüz B, (editor). İstanbul: MediCat Yayınları; 2019.
4
Tuli S, Sandhu R, Buyya R. Shared data-aware dynamic resource provisioning and task scheduling for data intensive applications on hybrid clouds using Aneka. Future Gener Comput Syst. 2020;106:595-606.
5
Bates DW, Levine DM, Syrowatka A, et al. The potential of artificial intelligence to improve patient safety: a scoping review. NPJ Digit Med. 2021;4:54.
6
Alowais SA, Alghamdi SS, Alsuhebany N, et al. Revolutionizing healthcare: the role of artificial intelligence in clinical practice. BMC Med Educ. 2023;23:689.
7
Singhal S, Carlton S. The era of exponential improvement in healthcare? McKinsey & Company; 2019. Available from: https://www.mckinsey.com/industries/healthcare-systems-and-services/our-insights/the-era-of-exponential-improvement-in-healthcare
8
Chen B, Baur A, Stepniak M, Wang J. Finding the future of care provision: the role of smart hospitals. 2019. Available from: https://www.scribd.com/document/432936514/Finding-the-future-of-care-provision-the-role-of-smart-hospital
9
Jiang F, Jiang Y, Zhi H, et al. Artificial intelligence in healthcare: past, present and future. Stroke Vasc Neurol. 2017;2:230-43.
10
Panch T, Mattie H, Celi LA. The “inconvenient truth” about AI in healthcare. NPJ Digit Med. 2019;2:77.
11
Kaku M. Geleceğin fiziği [Physics of the future]. 3rd ed. In: Oymak YS, Oymak H, (editors). Ankara: ODTÜ Geliştirme Vakfı Yayıncılık; 2015.
12
Liang H, Tsui BY, Ni H, et al. Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence. Nat Med. 2019;25:433-8.
13
Abràmoff MD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med. 2018;1:39.
14
Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316:2402-10.
15
Liu D, Cheng D, Houle TT, Chen L, Zhang W, Deng H. Machine learning methods for automatic pain assessment using facial expression information: protocol for a systematic review and meta-analysis. Medicine (Baltimore). 2018;97:e13421.
16
Li R, Zhang W, Suk H, et al. Deep learning based imaging data completion for improved brain disease diagnosis. Med Image Comput Comput Assist Interv. 2014;17:305-12.
17
Economopoulos KP, Mylonas KS, Stamou AA, et al. Laparoscopic versus robotic adrenalectomy: a comprehensive meta-analysis. Int J Surg. 2017;38:95-104.
18
Wei S, Chen M, Chen N, Liu L. Feasibility and safety of robot-assisted thoracic surgery for lung lobectomy in patients with non-small cell lung cancer: a systematic review and meta-analysis. World J Surg Oncol. 2017;15:98.
19
Lauridsen SV, Tønnesen H, Jensen BT, Neuner B, Thind P, Thomsen T. Complications and health-related quality of life after robot-assisted versus open radical cystectomy: a systematic review and meta-analysis of four RCTs. Syst Rev. 2017;6:150.
20
Roh HF, Nam SH, Kim JM. Robot-assisted laparoscopic surgery versus conventional laparoscopic surgery in randomized controlled trials: a systematic review and meta-analysis. PLoS One. 2018;13:e0191628.
21
Schönberger D. Artificial intelligence in healthcare: a critical analysis of the legal and ethical implications. Int J Law Inf Technol. 2019;27:171-203.
22
Van Biesen W, Decruyenaere J, Sideri K, Cockbain J, Sterckx S. Remote digital monitoring of medication intake: methodological, medical, ethical and legal reflections. Acta Clin Belg. 2021;76:209-16.
23
Forcier MB, Gallois H, Mullan S, Joly Y. Integrating artificial intelligence into health care through data access: can the GDPR act as a beacon for policymakers? J Law Biosci. 2019;6:317-35.
24
Davenport T, Kalakota R. The potential for artificial intelligence in healthcare. Future Healthc J. 2019;6:94-8.
25
Asan O, Bayrak AE, Choudhury A. Artificial intelligence and human trust in healthcare: focus on clinicians. J Med Internet Res. 2020;22:e15154.
26
Pereira KR, Sinha R. Welcome the “new kid on the block” into the family: artificial intelligence in oral and maxillofacial surgery. Br J Oral Maxillofac Surg. 2020;58:83-4.
27
Jobin A, Ienca M, Vayena E. The global landscape of AI ethics guidelines. Nat Mach Intell. 2019;1:389-99.
28
Morley J, Machado CCV, Burr C, et al. The ethics of AI in health care: a mapping review. Soc Sci Med. 2020;260:113172.
29
Islam MM, Poly TN, Alsinglawi B, et al. Application of artificial intelligence in COVID-19 pandemic: bibliometric analysis. Healthcare (Basel). 2021;9:441.
30
Evangelopoulos N, Zhang X, Prybutok VR. Latent semantic analysis: five methodological recommendations. Eur J Inf Syst. 2012;21:70-86.
31
Wolfe MBW, Schreiner ME, Rehder B, et al. Learning from text: matching readers and texts by latent semantic analysis. Discourse Process. 1998;25:309-36.
32
Landauer TK, Foltz PW, Laham D. An introduction to latent semantic analysis. Discourse Process. 1998;25:259-84.
33
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R. Indexing by latent semantic analysis. J Am Soc Inf Sci. 1990;41:391-407.
34
Landauer TK, Laham D, Derr M. From paragraph to graph: latent semantic analysis for information visualization. Proc Natl Acad Sci U S A. 2004;101(Suppl 1):5214-9.
35
Landauer TK. LSA as a theory of meaning. In: Landauer TK, McNamara DS, Dennis S, Kintsch W, (editors). Handbook of latent semantic analysis. Mahwah, NJ: Lawrence Erlbaum Associates; 2007:3-32.
36
Landauer TK, Dumais ST. A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol Rev. 1997;104:211-40.
37
Valdez D, Pickett AC, Goodson P. Topic modeling: latent semantic analysis for the social sciences. Soc Sci Q. 2018;99:1665-79.
38
Foltz PW. Latent semantic analysis for text-based research. Behav Res Methods Instrum Comput. 1996;28:197-202.
39
Shen CW, Ho JT. Technology-enhanced learning in higher education: a bibliometric analysis with latent semantic approach. Comput Human Behav. 2020;104:106177.
40
R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2025. Available from: https://cran.r-project.org/doc/manuals/r-release/fullrefman.pdf
41
Günther F, Dudschig C, Kaup B. LSAfun: an R package for computations based on latent semantic analysis. Behav Res Methods. 2015;47:930-44.
42
Liu YC, Kuo RL, Shih SR. COVID-19: the first documented coronavirus pandemic in history. Biomed J. 2020;43:328-33.
43
Smolić Š, Blaževski N, Fabijančić M. Remote healthcare during the COVID-19 pandemic: findings for older adults in 27 European countries and Israel. Front Public Health. 2022;10:921379.
44
Adadi A, Lahmer M, Nasiri S. Artificial intelligence and COVID-19: a systematic umbrella review and roads ahead. J King Saud Univ Comput Inf Sci. 2022;34:5898-920.
45
Bajwa J, Munir U, Nori A, Williams B. Artificial intelligence in healthcare: transforming the practice of medicine. Future Healthc J. 2021;8:e188-94.
46
Suman AP, Srivastava NK, Sharma P, Tyagi P. The convergence of artificial intelligence and healthcare in revolutionizing diagnosis, treatment, and personalization: a review. In: Proc 3rd Int Conf Disruptive Technol (ICDT). Greater Noida, India; 2025:1026-32.
47
Friedman CP. A “fundamental theorem” of biomedical informatics. J Am Med Inform Assoc. 2009;16:169-70.
48
OpenAI. ChatGPT (Version 4) [AI model]. OpenAI; 2025. Available from: https://www.openai.com/chatgpt
49
Anthropic. Claude AI (Version 1.0) [AI language model]. Anthropic; 2025. Available from: https://www.anthropic.com