2026-06-22
【Research Highlight: Top-Tier Conference Paper】Saliency-Guided Embedding Alignment for Query-to-Document Legal Case Retrieval
AI Core Technology: Advanced Research and Resource Integration Platform for AI Technology【Department of Computer Science and Engineering / Yao-Chung Fan / Professor】
Posted: 2025/5/28
| Field | Details |
| --- | --- |
| Paper title | Saliency-Guided Embedding Alignment for Query-to-Document Legal Case Retrieval |
| Venue | WWW Companion '26: Companion Proceedings of the ACM Web Conference 2026 (conference on the designated indicator list) |
| Publication details (year, volume, pages) | In Companion Proceedings of the ACM Web Conference 2026 (WWW Companion '26), pp. 1046-1052, Dubai, United Arab Emirates, June 29 – July 7, 2026. |
| Authors | Yu-Han Shi; Yao-Chung Fan (范耀中)∗ |
| DOI | 10.1145/3774905.3795085 |
| Abstract | Legal Case Retrieval (LCR) aims to find cases relevant to a given query. Most existing approaches adopt a Document-to-Document (D2D) paradigm, treating a full legal document as the query. In real scenarios, however, users provide short descriptions, forming a Query-to-Document (Q2D) paradigm in which D2D-trained embeddings struggle because of the length mismatch. To address this, we propose a Q2D-oriented framework that improves matching between short queries and long documents through segment selection and semi-supervised training. We introduce a Chunk Saliency Assessor (CSA) to identify the legal document segments most relevant to a query. Building on this, we design a two-stage fine-tuning strategy: (1) a Legal Inverse Cloze Task (LICT) to strengthen understanding of legal structures, and (2) Legal Content-Aligned Embedding Tuning (LCAET) to align query and document semantics. Experiments across multiple crime categories, with category-specific fine-tuning, show significant improvements in retrieval accuracy. Ablation results confirm the effectiveness of the alignment mechanism, showing that it precisely identifies query-relevant segments and bridges the semantic gap. |
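| Relevance to the AI project's research themes | This paper addresses the gap that arises when long, specialized documents must be retrieved with short queries. It proposes the CSA saliency assessor to identify the document segments most relevant to a query, then uses two-stage fine-tuning (LICT and LCAET) to align the semantics of short queries and long documents, improving retrieval accuracy across multiple categories. Agricultural diagnosis often faces a similar situation: farmers or plant doctors may ask questions using brief symptom descriptions, while the supporting evidence usually sits in longer plant-protection literature, pest-control manuals, or field-trial reports, leaving a gap in both length and semantics between the two. The segment-saliency selection of CSA and its content-aligned fine-tuning could potentially be transferred to extracting the passages most relevant to a given symptom from long agricultural documents, improving the evidence-retrieval quality and traceability of our center's diagnostic system. |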
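The abstract does not describe how the Chunk Saliency Assessor (CSA) scores document segments, so the following is only a minimal sketch of the general idea of query-guided segment selection. It assumes a long document is split into fixed-size word chunks, each chunk is scored against the short query, and the top-k chunks are kept. The bag-of-words cosine scorer, the chunk size, and the names `chunk_document`, `cosine`, and `select_salient_chunks` are hypothetical stand-ins chosen for illustration; they are not the paper's CSA.

```python
import math
from collections import Counter


def chunk_document(text: str, chunk_size: int = 50) -> list[str]:
    """Split a document into consecutive, non-overlapping chunks of at most
    `chunk_size` words (hypothetical helper, not the paper's segmentation)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_salient_chunks(query: str, document: str, k: int = 2,
                          chunk_size: int = 50) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the k highest-scoring
    (score, chunk) pairs. The word-overlap scorer is an illustrative
    assumption standing in for a learned saliency model."""
    q_vec = Counter(query.lower().split())
    scored = [
        (cosine(q_vec, Counter(chunk.lower().split())), chunk)
        for chunk in chunk_document(document, chunk_size)
    ]
    # Sort by score only, highest first; Python's sort is stable,
    # so chunks with equal scores keep their original document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

The chunks here do not overlap; a sliding window with overlap would lower the chance of splitting a relevant passage across a chunk boundary, at the cost of scoring more chunks per document.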
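The abstract names a Legal Inverse Cloze Task (LICT) as the first fine-tuning stage but gives no construction details. For orientation only, the sketch below builds training pairs using the generic Inverse Cloze Task idea: one sentence from a passage becomes a pseudo-query, and the remaining sentences become its positive context. The period-based sentence splitter, the `keep_prob` parameter (the chance of leaving the query sentence inside its context), and the names `split_sentences` and `make_ict_pair` are hypothetical; any legal-specific choices the paper makes are not reflected here.

```python
import random


def split_sentences(passage: str) -> list[str]:
    """Naive sentence splitter on periods (hypothetical; real legal text
    would need a proper segmenter that handles abbreviations and citations)."""
    return [s.strip() for s in passage.split(".") if s.strip()]


def make_ict_pair(passage: str, rng: random.Random,
                  keep_prob: float = 0.1) -> tuple[str, str]:
    """Build one Inverse Cloze Task pair: a random sentence becomes the
    pseudo-query and the surrounding sentences become its positive context.

    With probability `keep_prob` (an illustrative default), the query sentence
    is left inside the context, so a model cannot rely on the query never
    appearing verbatim in its positive passage.
    """
    sentences = split_sentences(passage)
    if len(sentences) < 2:
        raise ValueError("need at least two sentences to form an ICT pair")
    idx = rng.randrange(len(sentences))
    query = sentences[idx]
    if rng.random() < keep_prob:
        context_sentences = sentences
    else:
        context_sentences = sentences[:idx] + sentences[idx + 1:]
    context = ". ".join(context_sentences) + "."
    return query, context
```

Pairs built this way need no manual labels, which is one reason ICT-style objectives are a common choice for self-supervised retrieval pre-training before any supervised alignment step.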
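The second stage, Legal Content-Aligned Embedding Tuning (LCAET), is described only as aligning query and document semantics. A common way to align query and document embeddings is a contrastive InfoNCE loss with in-batch negatives, sketched below in plain Python under that assumption. The loss form, the temperature value, and the names `cosine` and `info_nce_loss` are illustrative choices, not the objective reported in the paper.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two dense embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def info_nce_loss(query_embs: list[list[float]], doc_embs: list[list[float]],
                  temperature: float = 0.05) -> float:
    """Mean InfoNCE loss with in-batch negatives.

    Query i's positive is document i; every other document in the batch acts
    as a negative. The loss is small when each query is most similar to its
    own document and large when it is closer to another query's document.
    """
    n = len(query_embs)
    total = 0.0
    for i in range(n):
        logits = [cosine(query_embs[i], doc_embs[j]) / temperature for j in range(n)]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-softmax probability assigned to the positive document.
        total += log_denom - logits[i]
    return total / n
```

With in-batch negatives, a batch of n query-document pairs gives each query n-1 negatives without encoding any extra documents; the temperature of 0.05 is an arbitrary illustrative value.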