2024-08-26

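[Academic Highlight: Top-Tier Conference Paper] Learning-from-Mistakes Prompting for Large Language Models: The Case of Indigenous Language Translation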

AI Core Technology: Advanced Research and Resource Integration Platform for AI Technology / Department of Computer Science and Engineering / Yao-Chung Fan / Associate Professor
 
Paper title: Learning-From-Mistakes Prompting for Indigenous Language Translation
Chinese title: 大型語言模型錯誤學習命令提示:以原民語翻譯為例
Venue: The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) (indicator-list venue)
Publication details: In Proceedings of the ACL Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2024@ACL), pages 146–158, Bangkok, Thailand, August 15, 2024. Association for Computational Linguistics.
Authors: You Cheng Liao, Chen-Jui Yu, Chi-Yi Lin, He-Feng Yun, Yen-Hsiang Wang, Hsiao-Min Li, Yao-Chung Fan (范耀中)
ACL Anthology: https://aclanthology.org/2024.loresmt-1.15
Abstract: Using large language models (LLMs), this paper presents techniques to improve translation for extremely low-resource indigenous languages. Our approach relies on (1) a datastore containing a limited number of parallel translation examples, (2) the inherent capabilities of LLMs such as GPT-3.5, and (3) a word-level translation dictionary. In this setting, we combine LLMs with in-context learning so that they can serve as universal translators for extremely low-resource languages. Our methodology treats LLMs as language compilers for selected language pairs, hypothesizing that they can internalize the syntactic structures needed for accurate translation. We introduce three techniques: KNN-Prompting with Retrieved Prompting Context, Chain-of-Thought Prompting, and Learning-from-Mistakes Prompting, the last of which specifically targets and corrects past errors. The evaluation results suggest that, even with limited corpora, LLMs paired with appropriate prompting can effectively translate extremely low-resource languages.
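To make the KNN-Prompting idea more concrete, here is a minimal sketch of how a few-shot prompt might be assembled from a small parallel datastore and a word-level dictionary. It is not the authors' implementation: the bag-of-words cosine retriever, the prompt wording, and the names `retrieve_knn`, `dictionary_hints`, and `build_knn_prompt` are all hypothetical, and the returned string would then be sent to an LLM such as GPT-3.5.

```python
import math
from collections import Counter

# A parallel example: (source sentence, target sentence).
Pair = tuple[str, str]


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of whitespace-token count vectors (a stand-in retriever)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[word] for word, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def retrieve_knn(query: str, datastore: list[Pair], k: int = 3) -> list[Pair]:
    """Return the k pairs whose source side is most similar to the query."""
    # sorted() is stable even with reverse=True, so ties keep datastore order.
    ranked = sorted(datastore, key=lambda pair: bow_cosine(query, pair[0]), reverse=True)
    return ranked[:k]


def dictionary_hints(query: str, dictionary: dict[str, str]) -> list[str]:
    """Word-level glosses for query tokens that appear in the dictionary."""
    return [f"{w} = {dictionary[w]}" for w in query.lower().split() if w in dictionary]


def build_knn_prompt(query: str, datastore: list[Pair], dictionary: dict[str, str], k: int = 3) -> str:
    """Hypothetical prompt layout: retrieved examples, then glosses, then the query."""
    lines = ["Translate the last source sentence, following the examples.", ""]
    for src, tgt in retrieve_knn(query, datastore, k):
        lines += [f"Source: {src}", f"Target: {tgt}", ""]
    hints = dictionary_hints(query, dictionary)
    if hints:
        lines += ["Word glosses:", *hints, ""]
    lines += [f"Source: {query}", "Target:"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder data; real entries would be sentence pairs in the target language pair.
    demo_store = [("the dog runs fast", "T1"), ("a cat sleeps", "T2"), ("the dog eats", "T3")]
    print(build_knn_prompt("the dog sleeps", demo_store, {"dog": "D"}, k=2))
```

A real system would likely swap the bag-of-words score for a stronger sentence retriever; the structure (retrieve nearest examples, add glosses, append the query) is what the sketch is meant to show.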
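Relevance to the AI project's research theme: This work studies the in-context learning ability of large language models on low-resource languages and designs algorithms through which the models learn to correct themselves. In the paper, we evaluated TAIDE, Breeze, and GPT-3.5; the experimental results indicate that large language models have strong in-context learning and chain-of-thought reasoning abilities. The agricultural language model we plan to develop is also a low-resource setting, so this understanding should help advance the development of that model.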
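The self-correction idea behind Learning-from-Mistakes Prompting can be illustrated with a small sketch. This page does not describe the paper's exact procedure, so the following assumes that mistakes are gathered by running the model on held-out parallel pairs, judged by exact string match, and replayed in the next prompt as (source, wrong output, correct reference) triples. `Mistake`, `collect_mistakes`, and `build_lfm_prompt` are hypothetical names, and `translate` stands for any LLM call (for example to TAIDE, Breeze, or GPT-3.5), stubbed here with a lookup table.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Mistake:
    source: str     # input sentence
    wrong: str      # what the model produced
    reference: str  # the correct translation


def collect_mistakes(pairs: list[tuple[str, str]], translate: Callable[[str], str]) -> list[Mistake]:
    """Run the translator on held-out pairs and keep the outputs that are wrong."""
    mistakes = []
    for src, ref in pairs:
        hyp = translate(src)
        # Assumption: exact match after trimming counts as correct.
        if hyp.strip() != ref.strip():
            mistakes.append(Mistake(src, hyp, ref))
    return mistakes


def build_lfm_prompt(query: str, mistakes: list[Mistake], max_items: int = 3) -> str:
    """Show earlier mistakes with their corrections before the new query."""
    lines = ["Earlier translation mistakes and their corrections:", ""]
    for m in mistakes[:max_items]:
        lines += [f"Source: {m.source}", f"Wrong: {m.wrong}", f"Correct: {m.reference}", ""]
    lines += ["Avoid repeating these mistakes.", f"Source: {query}", "Translation:"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical stub standing in for an LLM translator.
    table = {"a": "x"}
    stub_translate = lambda s: table.get(s, "?")
    found = collect_mistakes([("a", "x"), ("b", "y")], stub_translate)
    print(build_lfm_prompt("c", found))
```

Exact string match is a deliberately crude correctness check; in practice a metric such as chrF or BLEU with a threshold would be a more forgiving way to decide which outputs count as mistakes.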
Posted: 2024/8/15
 