Volume 7 - Issue 14
Multi-view LDA for semantics-based document representation
Abstract
Latent Dirichlet Allocation (LDA) can model each document and word as a mixture of topics, but it does not incorporate any external semantic information. In this paper, we represent documents in two feature spaces, consisting of words and Wikipedia categories respectively, and propose a new method called Multi-View LDA (M-LDA) that combines LDA with explicit human-defined concepts from Wikipedia. M-LDA improves the document topic model by taking advantage of both feature spaces and the mapping relationship between them. Experimental results on classification and clustering tasks show that M-LDA outperforms traditional LDA.
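As a rough illustration of the two feature spaces mentioned in the abstract, the following Python sketch builds a word-count view and a category-count view for a single document. The `WORD_TO_CATEGORIES` mapping, its entries, and the function name `two_views` are hypothetical stand-ins: the abstract does not say how words are mapped to Wikipedia categories, and this sketch is not the M-LDA model itself.

```python
from collections import Counter

# Hypothetical word-to-category mapping (illustrative only; not taken from the paper).
WORD_TO_CATEGORIES = {
    "lda": ["Topic models"],
    "topic": ["Topic models"],
    "dirichlet": ["Probability distributions"],
    "wikipedia": ["Online encyclopedias"],
}


def two_views(text):
    """Return (word_counts, category_counts) for one document.

    The word view counts lowercase whitespace-separated tokens.
    The category view sums those counts over the categories that
    each word maps to in the assumed WORD_TO_CATEGORIES table.
    """
    word_view = Counter(text.lower().split())
    category_view = Counter()
    for word, count in word_view.items():
        for category in WORD_TO_CATEGORIES.get(word, []):
            category_view[category] += count
    return word_view, category_view


word_view, category_view = two_views(
    "LDA topic model with Wikipedia topic categories"
)
print(word_view)
print(category_view)
```

Each view could then be passed to a topic model separately; how M-LDA couples the two views through their mapping relationship is described in the paper, not in this sketch.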
Paper Details
PaperID: 83255186949
Authors: Yun, J., Jing, L., Huang, H., Yu, J.
Volume: Volume 7
Issues: Issue 14
Keywords: Latent Dirichlet allocation, Semantics, Topic model, Wikipedia category
Year: 2011
Month: December
Pages: 4999 - 5006