Identifying inflammatory bowel disease subtypes: a comprehensive exploration of transcriptomic data and machine learning-based approaches
Abstract
Background: Inflammatory bowel disease (IBD), encompassing Crohn's disease (CD) and ulcerative colitis (UC), is a heterogeneous condition characterised by chronic gastrointestinal inflammation and dysregulated immune responses. Despite advances in transcriptomic analysis and machine learning (ML), consistent molecular subtyping across datasets remains a challenge. There is a critical need for robust subtypes that reflect disease heterogeneity and correlate with clinical outcomes.
Objectives: Unlike prior studies focused on either UC or CD or based on small datasets, this study analyses a large-scale RNA sequencing (RNA-seq) dataset to identify transcriptomic subtypes in both UC and CD.
Design: We analysed RNA-seq data from four prospective cross-sectional cohorts from Gene Expression Omnibus: GSE193677, GSE186507, GSE137344 and GSE235236.
Methods: We analysed RNA-seq data from inflamed and non-inflamed intestinal biopsies of 2490 adult IBD patients. K-means clustering was applied independently to UC and CD samples to identify transcriptomic clusters. Gene set enrichment and network analyses were used to explore the molecular characteristics of each cluster. Associations with clinical metadata, including disease severity and anatomical involvement, were assessed using chi-square tests and analysis of variance.
Results: K-means clustering revealed three distinct transcriptomic subtypes in both UC and CD. In UC, Cluster 1 was enriched for RNA processing and DNA repair genes; Cluster 2 highlighted autophagy, stress responses and upregulation of ATG13, VPS37C and DVL2; Cluster 3 emphasised cytoskeletal organisation (SRF, SRC and ABL1). In CD, Cluster 1 featured cytoskeletal remodelling and suppressed protein synthesis (CFL1, F11R and RAD23A), while Cluster 2 upregulated stress and translation pathways. Cluster 3 again prioritised cytoskeletal structure over metabolic activity. Cluster 3 in both conditions was significantly associated with moderate-to-severe endoscopic activity; Cluster 1 was enriched in inactive or mild disease.
Conclusion: We report three transcriptomic subtypes in UC and CD, each with distinct molecular signatures and clinical relevance. These findings support a stratified approach to IBD diagnosis and therapy, enabling more personalised disease management strategies.
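Illustrative sketch (not part of the published abstract): the workflow summarised under Methods, clustering each disease separately and then testing cluster membership against a clinical category, can be pictured roughly as follows. This is a minimal, dependency-free sketch under stated assumptions, not the authors' pipeline. The abstract does not specify preprocessing, feature selection, software, or how the number of clusters was chosen, so the toy two-dimensional data, the helper names (kmeans, chi_square_statistic, make_toy_cohort) and the fixed k = 3 are all hypothetical. Only the chi-square statistic is computed; the p-value and the analysis of variance step are omitted.

```python
import random
from collections import Counter


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct samples
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def chi_square_statistic(labels, categories):
    """Pearson chi-square statistic for a cluster-by-category table."""
    n = len(labels)
    observed = Counter(zip(labels, categories))
    row_totals = Counter(labels)
    col_totals = Counter(categories)
    stat = 0.0
    for r, r_n in row_totals.items():
        for c, c_n in col_totals.items():
            expected = r_n * c_n / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


def make_toy_cohort(seed, n_per_group=30):
    """Synthetic stand-in for expression data: NOT real patient data."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    severities = ["inactive", "mild", "severe"]
    points, severity = [], []
    for centre, sev in zip(centres, severities):
        for _ in range(n_per_group):
            points.append([centre[0] + rng.gauss(0, 0.5),
                           centre[1] + rng.gauss(0, 0.5)])
            severity.append(sev)
    return points, severity


# Cluster each disease independently, then test cluster vs. severity.
results = {}
for disease, seed in [("UC", 1), ("CD", 2)]:
    points, severity = make_toy_cohort(seed)
    labels = kmeans(points, k=3, seed=seed)
    results[disease] = (labels, chi_square_statistic(labels, severity))
```

A large statistic relative to the degrees of freedom of the table would indicate that cluster membership and the clinical category are not independent. In practice an established statistics library would normally be used to obtain the p-value.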
Author
Saini N, Acharjee A
Date
2025-08-12
Type
Article
Subject
Inflammatory bowel diseases; Transcriptomics; Transcription, genetic
Collections
Citation
Saini N, Acharjee A. Identifying inflammatory bowel disease subtypes: a comprehensive exploration of transcriptomic data and machine learning-based approaches. Therap Adv Gastroenterol. 2025 Aug 12;18:17562848251362391. doi: 10.1177/17562848251362391.
Journal / Source Title
Therapeutic advances in gastroenterology
DOI
10.1177/17562848251362391
PMID
40808866
Publisher
Sage Publications
Publisher’s URL
https://journals.sagepub.com/home/tag
https://pmc.ncbi.nlm.nih.gov/journals/?term=101478893
