2026-05-22
[Academic Highlight: Top-Tier Journal Paper] SMART: Structured Multimodal Agricultural Retrieval-Augmented Transformer for Plant Disease Diagnosis
Intelligent Service: Large-scale Agricultural AI Models
【Department of Civil Engineering / Ming-Der Yang / Tenured Distinguished Professor】
Posted: 2026-05-17
| Paper title | SMART: Structured multimodal agricultural retrieval-augmented transformer for plant disease diagnosis (Chinese title: SMART:用於植物病害診斷之結構化文本多模態農業檢索增強 Transformer) |
| Journal | Computers and Electronics in Agriculture (journal on the indicator list) |
| Year, volume, pages | 2026, vol. 250, article no. 111882 |
| Authors | Yuan-Chia Chan, Yao-Chung Fan (范耀中), Hung-Chung Li (李宏中), Ming-Der Yang (楊明德)* |
| DOI | 10.1016/j.compag.2026.111882 |
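| Abstract (translated from Chinese) | This study proposes SMART (Structured Multimodal Agricultural Retrieval-augmented Transformer), a structured multimodal retrieval-augmented Transformer architecture for plant disease diagnosis. Conventional image classification models for plant disease rely mostly on a single visual signal and are easily affected by background noise in field images, similar-looking symptoms, long-tailed class distributions, and incomplete user input, which limits diagnostic stability and interpretability in practice. To address these problems, the study integrates disease images, structured semantic annotations, symptom descriptions, and similar-case retrieval into a multimodal diagnostic pipeline that combines visual features with textual semantics. SMART strengthens symptom representations through structured disease labels and semantic anchors, and introduces a retrieval-augmentation mechanism so that the model can consult similar images and related agricultural knowledge at inference time, improving its ability to discriminate fine-grained differences between diseases. Compared with pure image classification or general text-generation methods, the study emphasizes the integration of visual recognition, semantic alignment, case retrieval, and interpretable diagnosis, providing a more contextualized basis for plant disease judgments. The results show that SMART effectively improves multimodal plant disease diagnosis performance and demonstrates potential for smart agriculture, plant health decision support, and field disease management. |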
| English abstract | Real-world plant disease diagnosis requires robustness to long-tailed class distributions and variable input quality, yet prevailing multimodal approaches do not explicitly address these challenges. This study presents SMART, a framework that integrates structured linguistic guidance with retrieval-augmented inference. Evaluation of 37,586 images across 48 disease categories yields three findings. First, structured captions outperform LLM-generated narratives, with the vision-only baseline exceeding all LLM approaches on the 20-class pilot, demonstrating that semantic precision, not descriptive complexity, drives effective vision-language alignment. Second, we distinguish between training-time and inference-time challenges: long-tailed degradation is mitigated through asymmetric loss design, while input quality variability is addressed through a dual-pathway mechanism in which semantic anchoring constrains predictions under related errors, and the retrieval-based safety net ensures bounded degradation under noisy inputs. Third, frequency-stratified analysis reveals a division of labor: semantic anchoring benefits high-frequency classes, while the retrieval-based safety net preferentially compensates rare classes. These complementary mechanisms support a human-in-the-loop workflow that transforms single-pass diagnosis into collaborative verification. SMART achieves F1 = 0.9767, surpassing fine-tuned CNNs (best F1 = 0.9118) and 22 multimodal foundation models (best F1 = 0.8904). The primary contributions of this work are: (i) a systematic benchmark of caption form (structured versus LLM-generated) as an experimental variable in agricultural vision-language alignment; (ii) a factorial decomposition that quantitatively isolates the independent and complementary roles of semantic anchoring and retrieval-based compensation across frequency strata; and (iii) a statistically validated, human-in-the-loop diagnostic framework applicable to large-scale, imbalanced disease taxonomies. |
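| Relevance of the published work to the AI project's research themes | This study proposes the SMART plant disease diagnosis system. Through structured multimodal learning and similar-case retrieval, it strengthens the model's semantic alignment between plant disease images and text descriptions, and improves the interpretability and practical value of the diagnostic results. |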
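
The abstract describes a retrieval-based safety net that backs up the classifier when inputs are noisy. The sketch below illustrates that general idea only and is not the authors' implementation: it assumes a confidence threshold decides when to fall back, and that the fallback is a similarity-weighted vote over the nearest labeled cases. The function names (`cosine`, `retrieve_votes`, `diagnose`), the threshold of 0.8, `k=3`, the 2-D embeddings, and the labels `rust` and `blight` are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_votes(query, gallery, k=3):
    """Similarity-weighted label votes from the k most similar gallery cases."""
    scored = sorted(((cosine(query, emb), label) for emb, label in gallery),
                    reverse=True)
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += max(sim, 0.0)  # ignore negative similarities
    total = sum(votes.values())
    return {label: v / total for label, v in votes.items()} if total else {}


def diagnose(class_probs, query, gallery, threshold=0.8, k=3):
    """Return (label, source): trust a confident classifier, else use retrieval."""
    top_label = max(class_probs, key=class_probs.get)
    if class_probs[top_label] >= threshold:
        return top_label, "classifier"
    votes = retrieve_votes(query, gallery, k)
    if not votes:  # nothing retrievable: keep the classifier's answer
        return top_label, "classifier"
    return max(votes, key=votes.get), "retrieval"


# Toy gallery of (embedding, label) pairs; values are made up for illustration.
gallery = [([1.0, 0.0], "rust"), ([0.9, 0.1], "rust"),
           ([0.0, 1.0], "blight"), ([0.1, 0.9], "blight")]

# An uncertain classifier output is overridden by the nearest labeled cases.
print(diagnose({"rust": 0.55, "blight": 0.45}, [0.2, 0.8], gallery))
# prints ('blight', 'retrieval')
```

Returning the source alongside the label is one way to support the human-in-the-loop workflow the abstract mentions: a reviewer can see whether a diagnosis came from the classifier or from retrieved cases.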