2026-06-22

[Academic Highlight: Top-Tier Conference Paper] STAR: Semantic Table Representation with Header-Aware Clustering and Adaptive Weighted Fusion

[AI Core Technology: Advanced Research and Resource Integration Platform for AI Technology / Department of Computer Science and Engineering / Professor Yao-Chung Fan (范耀中)]
 
Paper title (English): STAR: Semantic Table Representation with Header-Aware Clustering and Adaptive Weighted Fusion
Paper title (Chinese): STAR:結合表頭感知叢集與自適應加權融合之語義表格表示法
Publication venue: WWW '26: Proceedings of the ACM Web Conference 2026 (a conference on the indicator list)
Publication year, volume, page range: In Proceedings of the ACM Web Conference 2026 (WWW '26), pp. 8773–8776, Dubai, United Arab Emirates, April 13–17, 2026.
Authors: Shui-Hsiang Hsu; Tsung-Hsiang Chou; Chen-Jui Yu; Yao-Chung Fan (范耀中)
DOI: 10.1145/3774904.3792962
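Abstract (translated from the Chinese version): Table retrieval is the task of retrieving the most relevant tables from a large-scale corpus given a natural language query. However, the structural and semantic gap between unstructured text and structured tables makes embedding alignment especially difficult. Recent methods such as QGpT try to enrich table semantics by generating synthetic queries, but they still depend on coarse partial-table sampling and simple fusion strategies, which limits semantic diversity and hinders effective query-table alignment. We propose STAR (Semantic Table Representation), a lightweight framework that improves semantic table representation through semantic clustering and weighted fusion. STAR first uses header-aware K-means clustering to group semantically similar rows and selects representative centroid instances to build a semantically diverse partial table. It then generates dedicated synthetic queries for each cluster to cover the table's semantic space comprehensively. Finally, it uses a weighted fusion strategy to integrate the table and query embeddings, achieving fine-grained semantic alignment. This design lets STAR capture complementary information from structured and textual sources and improves the expressiveness of the table representation. Experiments on five benchmarks show that STAR's Recall is consistently higher than QGpT's on every dataset, confirming the effectiveness of semantic clustering and weighted fusion for robust table representation.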
Abstract (English): Table retrieval is the task of retrieving the most relevant tables from large-scale corpora given natural language queries. However, structural and semantic discrepancies between unstructured text and structured tables make embedding alignment particularly challenging. Recent methods such as QGpT attempt to enrich table semantics by generating synthetic queries, yet they still rely on coarse partial-table sampling and simple fusion strategies, which limit semantic diversity and hinder effective query–table alignment. We propose STAR (Semantic Table Representation), a lightweight framework that improves semantic table representation through semantic clustering and weighted fusion. STAR first applies header-aware K-means clustering to group semantically similar rows and selects representative centroid instances to construct a diverse partial table. It then generates cluster-specific synthetic queries to comprehensively cover the table's semantic space. Finally, STAR employs weighted fusion strategies to integrate table and query embeddings, enabling fine-grained semantic alignment. This design enables STAR to capture complementary information from structured and textual sources, improving the expressiveness of table representations. Experiments on five benchmarks show that STAR achieves consistently higher Recall than QGpT on all datasets, demonstrating the effectiveness of semantic clustering and weighted fusion for robust table representation.
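The steps named in the abstract (header-aware clustering, representative row selection, weighted fusion) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `embed` is a toy hashed bag-of-words stand-in for a real sentence encoder, `serialize_row` is one assumed way of making rows header-aware (pairing each cell with its column name), the fusion weight `alpha` is fixed rather than adaptive, and the synthetic query generation step is omitted because it would require a language model. All function names are hypothetical.

```python
import math
import random


def embed(text, dim=32):
    """Toy stand-in for a sentence encoder: hashed bag of tokens, L2-normalized."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        # Sum of character codes gives a deterministic bucket
        # (the built-in hash() is salted per process for strings).
        vec[sum(ord(c) for c in tok) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def serialize_row(headers, row):
    """Assumed header-aware serialization: each cell is paired with its column name."""
    return " ".join(f"{h} {v}" for h, v in zip(headers, row))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def representative_rows(headers, rows, k):
    """Pick, for each non-empty cluster, the row closest to its centroid."""
    points = [embed(serialize_row(headers, r)) for r in rows]
    centroids, assign = kmeans(points, k)
    reps = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            reps.append(min(members, key=lambda i: sq_dist(points[i], centroids[c])))
    return sorted(reps)


def weighted_fusion(table_vec, query_vecs, alpha=0.5):
    """alpha * table + (1 - alpha) * mean(queries), renormalized to unit length."""
    mean_q = [sum(col) / len(query_vecs) for col in zip(*query_vecs)]
    fused = [alpha * t + (1 - alpha) * q for t, q in zip(table_vec, mean_q)]
    norm = math.sqrt(sum(v * v for v in fused)) or 1.0
    return [v / norm for v in fused]
```

In this sketch, the rows returned by `representative_rows` would form the partial table, and `weighted_fusion` would combine its embedding with the embeddings of the queries generated for each cluster. Setting `alpha` closer to 1 favors the table embedding and closer to 0 favors the query embeddings.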
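Relevance of the published results to the AI project's research theme: Column names in structured agricultural data often carry key semantics, such as crop type, disease name, growth stage, dosage, and disease severity. These columns determine what each value means, so ignoring the header risks confusing values of different kinds. STAR incorporates header semantics into clustering to select more representative rows, then adjusts the relative weights of the table and query embeddings according to the characteristics of the data. Across five benchmarks, the paper reports an average R@1 improvement of 6.39% over the baseline, which indicates that the design is stable on tables with diverse schemas and more complex structure. STAR can help improve the retrieval consistency of the center's agricultural diagnosis and pesticide-use decision knowledge base across heterogeneous agricultural tables, so that the system can better map a query to a specific crop, disease, and treatment condition.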
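For reference, the Recall@k metric behind the R@1 figure can be computed as in the sketch below. It assumes each query has exactly one gold table, which may not hold for every benchmark used in the paper; `recall_at_k` and its argument names are hypothetical.

```python
def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of queries whose gold table appears among the top-k retrieved tables.

    ranked_ids: one ranked list of table IDs per query (best match first).
    gold_ids:   the single correct table ID for each query (an assumption here).
    """
    hits = sum(1 for ranked, gold in zip(ranked_ids, gold_ids) if gold in ranked[:k])
    return hits / len(gold_ids)


# Two queries: the first is answered at rank 1, the second only at rank 2.
rankings = [["t1", "t2"], ["t3", "t4"]]
gold = ["t1", "t4"]
r_at_1 = recall_at_k(rankings, gold, k=1)  # 0.5
r_at_2 = recall_at_k(rankings, gold, k=2)  # 1.0
```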
Posted: 2025/4/12
 