Abstract
Objectives
Keratan sulfate proteoglycans (KSPGs) are the least studied of the proteoglycans with emerging roles in cancer. Diverse roles of KSPGs are known; however, the KSPG interactome has not been defined to date. The aim here was to define a KSPG interactome and to determine biomarkers in cancer patients using published datasets and bioinformatic analysis tools.
Methods
The STRING database was utilized to compile a KSPG interactome centered around keratocan (KERA), lumican, fibromodulin and osteoglycin. STRING, Xena Browser, Gene Set Cancer Analysis Platform, and SmulTCan were used to perform data analyses.
Results
A 56-protein KSPG interactome network was discovered. Extracellular matrix (ECM) organization, PI3K-AKT signaling, ECM-receptor interactions, and nicotinamide metabolism pathways were enriched in the interactome. Multivariate prognostic modeling with Glmnet identified a seven-gene prognostic signature in kidney renal papillary cell carcinoma (KIRP), stratifying patients into distinct risk groups with strong predictive accuracy (area under the curve of 78.9%). Among these, KERA emerged as a biomarker of poor prognosis, while nicotinamide mononucleotide adenylyltransferase 1 (NMNAT1) emerged as a protective biomarker in KIRP.
Conclusion
This study describes a methodology to utilize available datasets for protein interactome discovery and for querying cancer prognostic genes. In this study, an extensive KSPG interactome was identified. This is the first report linking KERA and NMNAT1 to prognosis in KIRP.
Introduction
The extracellular matrix (ECM) surrounding the tumor is the first stromal component encountered by metastatic cancer cells before they move to distant sites. Traditionally considered a physical blockade, the ECM is now recognized as a dynamic regulator of tumor invasion, metastasis, and immune modulation. Increased ECM stiffness can suppress metastatic spread, whereas activation or overexpression of matrix metalloproteinases that digest the ECM has been associated with increased cellular migratory capacity and metastasis(1, 2). The influence of the ECM on metastatic cancer cells is complex, owing to its highly intricate and dynamic composition. The main macromolecular constituents are collagens and proteoglycans, both diverse groups that contribute to the organ- and tissue-specific composition of the ECM(3). Proteoglycans are classified into several types, named after their glycosaminoglycan side chains, such as heparan sulfate, dermatan sulfate, or keratan sulfate. The determinants of the side chains are the specific types of disaccharide repeats, such as glucuronic acid or iduronic acid paired with glucosamine for heparan sulfates, and galactose paired with N-acetylglucosamine for keratan sulfates. Each proteoglycan molecule may have different lengths of disaccharide repeats, different sulfate loads on these repeats, and different core proteins(3). Diversity in structure is likely to be reflected in functional diversity, making it especially challenging to pinpoint the roles of a particular proteoglycan(4).
Most research on the role of proteoglycans in cancer metastasis has focused on heparan sulfate proteoglycans(5). Keratan sulfate proteoglycans (KSPGs), on the other hand, are the least studied in this context(6). Recent transcriptomic studies have identified that carbohydrate sulfotransferase 6
(CHST6), a keratan sulfate-specific sulfotransferase, is part of a genetic signature associated with poor prognosis in cancer(7-9). Sulfation of KSPGs is a major influence on their function and solubility, as evidenced by the formation of KSPG aggregates in patients with macular corneal dystrophy, which is linked to CHST6 mutations(10-12). These data indicate that KSPG regulation is important in cancer progression.
Reports on the role of lumican (LUM) in cancer progression are contradictory(13). LUM overexpression has been correlated with poor prognosis in colorectal, lung, and squamous cell carcinomas(14-17). In contrast, LUM overexpression was shown to decrease the malignancy of melanoma cells and to correlate negatively with osteosarcoma tumor progression(18). These contradictory findings may be related to the location of the KSPG: in the ECM or within the cancer cell itself. Additionally, differences in chain lengths and sulfation status of LUM in different cell types influence its interactions with other proteins and its roles(19).
KSPG binding proteins are not fully defined in the literature, as exemplified by the absence of a dedicated gene ontology (GO) term (https://www.ebi.ac.uk/QuickGO/). Proteoglycan binding GO term contains child terms that include syndecan binding (GO:0045545), chondroitin sulfate proteoglycan binding (GO:0035373), heparan sulfate proteoglycan binding (GO:0043395). This study aimed to define the KSPG interactome and to explore the expression and prognostic significance of its components in cancer, using published datasets and openly available bioinformatic tools and databases.
Here, KSPG-interacting proteins were identified through manual queries in STRING(20). The STRING database integrates both experimentally validated interactions and computational predictions based on co-expression, literature mining, and curated databases(20). Then, data from The Cancer Genome Atlas (TCGA) Pan-Cancer study were analyzed using the SmulTCan app to investigate gene expression patterns and correlations with survival in cancer patients(21). This analysis revealed a candidate prognostic gene set in the tumor microenvironment of kidney renal papillary cell carcinoma (KIRP).
Materials and Methods
KSPG Interactome Discovery
Selected KSPGs were manually entered into the STRING search query. Network display options were adjusted to reveal only high-confidence interactions, defined as the “physical subnetwork”. A medium-confidence level was selected, and the first and second shells of the interactome were included in the network. Functional enrichment analysis was performed using STRING’s built-in functions(20).
Statistical Analysis
The KSPG interactome gene set was queried in the Gene Set Cancer Analysis Platform (https://guolab.wchscu.cn/GSCA/#/), to evaluate the correlation of the KSPG interactome expression with drug sensitivity(22, 23).
Multivariate Prognostic Modeling
Gene expression and survival data of the TCGA Pan-Cancer study was retrieved from Xena Browser and uploaded to the SmulTCan app(21, 24, 25). Univariate Cox proportional hazards (CPH) regression analyses and Glmnet analyses were performed for each cancer type(21). Genes with p-values less than 0.05 were considered statistically significant. Forest plots were generated using Python (v3.10) with the matplotlib and seaborn libraries to visualize hazard ratios, 95% confidence intervals, and corresponding p-values.
Ethical Aspects of the Research
This research was conducted in accordance with the principles of the Declaration of Helsinki. Datasets available in public repositories were used. Therefore there is no ethical consern related to the work reported here.
Results
KSPG Interactome Discovery and Functional Enrichment Analysis
To compile a list of proteins interacting with KSPGs, the STRING database was queried for the four major KSPG core proteins: LUM, keratocan (KERA), fibromodulin (FMOD), and osteoglycin (OSG/mimecan). Experimentally validated interactors, as well as first- and second-shell interaction partners, were compiled (Table 1). LUM exhibited the largest interaction network, with experimentally validated interactors: zinc finger protein 488 (ZNF488), cystathionine gamma-lyase (CTH), and matrix metalloproteinase 14 (MMP14). The KERA subnetwork included direct physical interactors bleomycin hydrolase (BLMH), nudix hydrolase 12 (NUDT12), and EP300 interacting inhibitor of differentiation 2 (EID2). FMOD showed high-confidence binding to transforming growth factor-beta (TGFB) 1, and this binding extended to include other TGFB isoforms and matrix proteins. OSG was experimentally linked to collagen type XIV alpha 1 chain (COL14A1), proto-oncogene tyrosine-protein kinase Src (SRC), major histocompatibility complex, class II, DR beta 1 (HLA-DRB1), and canopy FGF signaling regulator 3 (CNPY3).
The identified KSPG interactome contains 56 proteins and forms a highly interconnected network that is visualized and functionally annotated using the STRING database (Figure 1A, Table S1). GO biological process enrichment analysis highlighted ECM organization as the most significantly enriched term [false discovery rate (FDR) <1e-23], with more than 20 genes contributing (Figure 1B). Closely related terms included extracellular structure organization, collagen fibril organization, and connective tissue development. Additional enrichment in cell adhesion and in positive regulation of smooth muscle cell proliferation was detected. Proteoglycans in cancer was the top KEGG pathway (FDR <1e-9), followed by focal adhesion, ECM-receptor interaction, phosphoinositide 3-kinase-AKT (PI3K-AKT) signaling, and TGFB signaling pathways, Interestingly, terms related to nicotinamide adenine dinucleotide (NADH) metabolism were enriched in both GO and KEGG pathway analyses.
Prognostic Associations of the KSPG Interactome Across Cancer Types
Correlation between mRNA expression profiles and drug sensitivity was assessed using the genomics of drug sensitivity in cancer portal and the Pan-Cancer dataset(23). The top eight drugs with significant correlations with KSPG gene expression patterns and drug response were identified (Figure 2A, Table S2). Response to BHG712, a selective EphB4 inhibitor, was found to be strongly correlated with the expression levels of most genes in the KSPG interactome. Among these genes, NMNAT2, ZNF488, TPBG, ITGB1, ITGB2, CDCP1, and FN1 were positively correlated, while ENPP3, BLMH, HLA-DRB1, and CNPY3 were negatively correlated. Response to BRAF and MEK inhibitors (Dabrafenib, PLX4720, SB590885, CI-1040, and RDEA119) were negatively correlated with expression levels of LUM, FN1, CXCL1, COL16A1, MMP14, HSPG2 and MLANA. Expression levels of the interactome genes in skin cutaneous melanoma (SKCM), uterine endometrial cancer (UCEC), and KIRP are plotted in Figure 2B. SKCM was chosen because the KSPG interactome gene set alteration frequency is highest (data not shown); UCEC and KIRP were chosen to complement CPH analyses explained below. Most KSPG genes are overexpressed in cancer patients. Among the core KSPGs, LUM, FMOD, and biglycan were highly overexpressed; decorin, osteomodulin, and aggrecan were moderately overexpressed; KERA was not overexpressed in the analyzed cancer types.
Next, CPH regression and Glmnet analyses were using the SmulTCan app across TCGA cancer types to define prognostic gene sets. The CPH analysis revealed specific multi-gene signatures in various cancer types, with uterine corpus endometrial carcinoma (UCEC) and KIRP exhibiting the most extensive lists of genes (Table S3). CPH predicted a highly significant multigene prognostic signature comprising 18 genes associated with survival in KIRP (p<0.05). These genes included the proteoglycans LUM, FMOD, and KERA, which are significantly associated with poor prognosis (Figure 3A). The metabolic genes NMNAT1, NUDT12, and ENPP1 were significantly associated with a better prognosis. Other genes with prognostic value were CXCL1, MLANA, CHRM1, ZNF488, BLMH, and CTH. The model effectively stratified patients into high- and low-risk groups, which exhibited a significant difference in overall survival on the Kaplan-Meier plot (Figure 3B).
Next, the Glmnet method was applied to select the best gene subset; however, a fitted model was obtained only for KIRP. The KIRP prognostic signature included KERA, ZNF488 and CTH with positive coefficients, suggesting their expression is associated with increased risk and poorer survival (Figure 3C). In contrast, nicotinamide mononucleotide adenylyltransferase 1 (NMNAT1) was assigned a negative coefficient, consistent with a protective effect. Additional low-magnitude positive coefficients were retained for COL11A2, COL1A1, and COL5A2. Receiver operating characteristic (ROC) curve analysis yielded an area under the curve (AUC) of 78.9%, indicating strong predictive accuracy (Figure 3D).
Discussion
KSPGs are found both extracellularly and intracellularly, with capacity to interact with a large number and variety of proteins(4). Nevertheless, the experimental data on KSPG-interacting proteins are fragmented, and a definition of the KSPG interactome is needed. To address this gap, we used the STRING database to construct a physical interaction network centered on the KSPG core proteins KERA, LUM, FMOD, and OGN and identified a 56-protein KSPG interactome. The intereactome not only included ECM related proteins but also major cancer related pathways including PI3K-AKT signaling. Enrichment of focal adhesion, integrin binding, and TGFB binding may be linked to the structural remodeling required for cancer metastasis.
The composition of the ECM is known to modulate drug sensitivity in cancer(26). Correlations between gene expression and BRAF-MEK inhibitors suggest a potential modulation of drug sensitivity via the KSPG interactome. The prognostic potential of KSPG interactome genes was analyzed via univariate CPH, and best subsets were selected using Glmnet multivariate penalized Cox regression. This approach enables simultaneous variable selection and regularization, identifying a minimal gene set with optimal predictive performance while excluding redundant or non-informative features(21). The CPH approach identified numerous genes with significant associations with survival across tumor types, including uterine corpus endometrial carcinoma UCEC and KIRP. While this method is straightforward and sensitive to any gene-outcome association, it does not account for correlations among genes, potentially overestimating the number of relevant variables. By contrast, Glmnet applies lasso regularization to select a parsimonious multigene model with independent predictive contributions and improved generalizability(21). However, this increased stringency can lead to failure to retain variables in the final model when predictors are highly correlated or when the incremental predictive value is modest relative to noise. This scenario occurred in UCEC, where no multigene signature was selected despite numerous univariate associations. However, in KIRP, Glmnet successfully identified a multigene signature consisting of KERA, NMNAT1, ZNF488, COL5A2, COL1A1, COL11A2, and CTH. The transcription factor ZNF488 is functionally linked to LUM and COL5A3 in the STRING network. Recently, the roles of ZNF488 in cancer progression and resistance have been indicated, although the mechanism and its link to KSPGs have not been studied (27, 28). KERA had a positive coefficient, indicating that higher expression is associated with worse survival. Interestingly, KERA expression in KIRP is markedly lower than in other cancers and than that of other genes in the KSPG interactome. However, high variability in gene expression among patients was observed, suggesting that KERA expression must be kept very low for better patient prognosis.
KERA encodes a small leucine-rich proteoglycan involved in collagen organization and ECM architecture, but its role in cancer has been described in only a few studies(29, 30). NMNAT1 had the most negative coefficient in the penalized model, suggesting a protective association in KIRP. Enrichment of the nicotinate and nicotinamide metabolism pathways suggested involvement of the KSPG network in regulating energy metabolism, which is further supported by the prognostic value of NMNAT1 identified in this study. The NMNAT1 enzyme catalyzes a critical step in NAD⁺ biosynthesis, potentially supporting metabolic homeostasis and resilience to oxidative stress. Notably, NMNAT1 is part of the KERA-centered STRING subnetwork together with NMNAT2, ENPP1, ENPP3, and NUDT12. These proteins regulate nicotinate and nicotinamide metabolism. Human NMNAT enzymes catalyze the formation of NAD⁺ from adenosine triphosphate and nicotinamide mononucleotide (NMN) and the cleavage of NADH in the reverse reaction(31). NUDT12 is a peroxisomal NADH pyrophosphatase and an mRNA-decapping enzyme that specifically removes the NAD cap from a subset of transcripts, thereby generating NMN and promoting mRNA decay, particularly under nutrient stress conditions(32, 33). It also hydrolyzes free NAD(P)H and diadenosine diphosphates to regulate peroxisomal nicotinamide nucleotide pools required for oxidative metabolism. Together, these findings suggest a potentially relevant ECM-metabolism regulatory axis, linking KERA and its interactors to energy metabolism and nucleotide turnover within the tumor microenvironment.
This analysis leveraged the SmulTCan web application, which integrates both univariate and multivariate survival modeling in an accessible interface(21). SmulTCan proved particularly useful for rapidly evaluating prognostic associations without requiring advanced bioinformatics expertise or command-line tools, making it well suited for translational researchers and clinicians who wish to explore publicly available cancer datasets reproducibly. The methodology followed here illustrates that integrating curated protein interaction networks with public genomics data can reveal cancer-type-specific gene modules associated with prognosis and drug response.
Importantly, a KSPG interactome was identified and proposed in this study. KERA and NMNAT1 were identified as having high prognostic value in KIRP, a finding not previously reported. Future studies are needed to functionally validate the contributions of the candidate genes identified here and to determine whether they can inform stratified therapeutic strategies or combination treatments that target ECM remodeling and tumor metabolism.


