2025-04-11
[Academic Highlight] SUMO-LMNet: Lossless mapping network for predicting SUMOylation sites in SUMO1 and SUMO2 using high-dimensional features
Paper title | SUMO-LMNet: Lossless mapping network for predicting SUMOylation sites in SUMO1 and SUMO2 using high-dimensional features |
Journal | Computational and Structural Biotechnology Journal |
Year, volume, pages | 2025, 27, 1048-1059 |
Authors | Cheng-Hsun Ho, Yen-Wei Chu (朱彥煒)*, Lan-Ying Huang, Chi-Wei Chen |
DOI | 10.1016/j.csbj.2025.03.005 |
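Abstract (translated from Chinese) | Accurate prediction of SUMOylation (small ubiquitin-like modifier conjugation) sites is essential for understanding gene regulation mechanisms and disease-related pathways. However, because SUMO1 and SUMO2 are structurally similar, their modification sites are difficult to tell apart, which poses a challenge for prediction models. Conventional models often fail to distinguish these two paralogues effectively, limiting their use in biological research. We therefore propose SUMO-LMNet, a deep learning-based prediction framework designed to identify SUMO1 and SUMO2 modification sites precisely. SUMO-LMNet combines a lossless mapping strategy with deep learning architectures, improving both prediction accuracy and model interpretability. The model extracts high-dimensional features from protein sequences and converts them into two-dimensional feature maps, so that convolutional neural networks (CNNs) can learn local and global feature relationships in the data. With the Lossless Mapping Network (LM-Net), the feature space is preserved in full and no spatial information is lost. Although Grad-CAM can highlight key features for a single sample, it lacks cross-sample consistency and a dataset-wide assessment; we therefore propose CHFA (Combined Heatmap Feature Analysis), which aggregates feature importance across many samples to provide a more reliable, global view of the features. Experimental results show that SUMO1 and SUMO2 differ markedly in their feature dependencies, confirming the need to model each paralogue separately. A comparison of multiple neural network architectures further shows that our model achieves over 80% accuracy in distinguishing SUMO1 and SUMO2 modification sites. The model can support experimental design by prioritizing biologically meaningful modification sites, accelerating the discovery of SUMOylation targets. SUMO-LMNet is publicly available for download at https://predictor.isu.edu.tw/sumo-lmnet. |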
English abstract | Accurate SUMOylation site prediction is crucial for deciphering gene regulation and disease mechanisms. However, distinguishing SUMO1 and SUMO2 modifications remains a major challenge due to their structural similarities. Conventional prediction models often struggle to differentiate between these paralogues, limiting their applicability in biological research. To address this, we introduce SUMO-LMNet, a deep learning-based framework for the precise prediction of SUMO1 and SUMO2 sites. Unlike previous models, SUMO-LMNet integrates a lossless mapping strategy and deep learning architectures to enhance both prediction accuracy and interpretability. Our model extracts high-dimensional features from sequences and transforms them into two-dimensional feature maps, enabling convolutional neural networks (CNNs) to effectively capture both local and global dependencies within the data. By leveraging a Lossless Mapping Network (LM-Net), this approach preserves the original feature space, ensuring that feature integrity is retained without loss of spatial information. While Grad-CAM highlights key features in individual predictions, it lacks consistency across samples and does not provide a dataset-wide evaluation of feature importance. To address this, we introduce Combined Heatmap Feature Analysis (CHFA), which systematically aggregates feature importance across multiple samples, providing a more reliable and interpretable dataset-wide assessment. Experimental results reveal distinct feature dependencies between SUMO1 and SUMO2, underscoring the necessity of paralogue-specific predictive models. Through a systematic comparison of multiple neural network architectures, we demonstrate that our model achieves over 80% accuracy in distinguishing SUMO1 and SUMO2 modification sites. By prioritizing candidate sites for further study, our model aids experimental design and accelerates the discovery of biologically relevant SUMOylation targets. SUMO-LMNet is publicly available at https://predictor.isu.edu.tw/sumo-lmnet. |
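Relevance of the published work to the Center's research themes | This work aligns closely with the themes of the AI project. The SUMO-LMNet model demonstrates the strong potential of artificial intelligence in bioinformatics: it uses deep learning to predict SUMO1 and SUMO2 modification sites accurately, helping to elucidate gene regulation and disease mechanisms. The model introduces Lossless Mapping to convert protein sequence features into a two-dimensional image representation, so that convolutional neural networks (CNNs) can capture local and global dependencies; it also combines Grad-CAM with the authors' own CHFA mechanism to improve interpretability and support deeper analysis of how features relate to biological function. By building paralogue-specific models, SUMO-LMNet overcomes the prediction difficulty caused by high sequence similarity and significantly improves classification accuracy. Overall, the study demonstrates an innovative application of AI to protein modification prediction and provides a basis for experimental design, advancing data-driven acceleration of biological research in line with the current trend toward integrating artificial intelligence and the life sciences. |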
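
The summary above names two ideas without giving their details: mapping a high-dimensional feature vector onto a two-dimensional map without discarding any value, and combining per-sample heatmaps into one dataset-wide importance map. The sketch below is only an illustration of those two ideas under simple assumptions; it is not the authors' LM-Net or CHFA implementation. The square zero-padded layout, the min-max normalization, the averaging rule, and every function name (`lossless_map`, `inverse_map`, `combine_heatmaps`, `top_features`) are hypothetical choices made for this example.

```python
import math


def lossless_map(features, pad_value=0.0):
    """Lay a 1-D feature vector out row by row on the smallest square grid
    that can hold every value. Assumed layout, not the paper's."""
    n = len(features)
    if n == 0:
        raise ValueError("features must not be empty")
    side = math.isqrt(n - 1) + 1  # ceil(sqrt(n)) for n >= 1
    padded = list(features) + [pad_value] * (side * side - n)
    return [padded[r * side:(r + 1) * side] for r in range(side)]


def inverse_map(feature_map, n):
    """Recover the original n features, showing the mapping loses nothing."""
    flat = [value for row in feature_map for value in row]
    return flat[:n]


def _min_max(heatmap):
    """Rescale one heatmap to [0, 1]; a constant heatmap becomes all zeros."""
    flat = [value for row in heatmap for value in row]
    low, high = min(flat), max(flat)
    span = high - low
    if span == 0:
        return [[0.0 for _ in row] for row in heatmap]
    return [[(value - low) / span for value in row] for row in heatmap]


def combine_heatmaps(heatmaps):
    """Average per-sample heatmaps (for example Grad-CAM outputs) cell by cell
    after normalizing each one. Assumed aggregation rule, not the paper's."""
    if not heatmaps:
        raise ValueError("heatmaps must not be empty")
    normalized = [_min_max(h) for h in heatmaps]
    rows, cols = len(normalized[0]), len(normalized[0][0])
    count = len(normalized)
    return [
        [sum(h[r][c] for h in normalized) / count for c in range(cols)]
        for r in range(rows)
    ]


def top_features(combined, k):
    """Return the flat indices of the k highest-scoring cells, best first."""
    flat = [value for row in combined for value in row]
    order = sorted(range(len(flat)), key=lambda i: (-flat[i], i))
    return order[:k]
```

Because the grid is filled row by row, a flat index returned by `top_features` is also the position of that feature in the original vector, as long as it is smaller than the vector's length. Averaging is the simplest possible aggregation; a median or a frequency count of top-ranked cells would be equally plausible alternatives.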