Multi-view LDA for semantics-based document representation – Journal of Computational Information Systems

Abstract

Latent Dirichlet Allocation (LDA) models each document as a mixture of topics and each topic as a distribution over words, but it uses no external semantic information. In this paper, we represent each document in two feature spaces, one of words and one of Wikipedia categories, and propose a new method called Multi-View LDA (M-LDA), which combines LDA with the explicit, human-defined concepts in Wikipedia. M-LDA improves the document topic model by exploiting both feature spaces and the mapping between them. Experimental results on classification and clustering tasks show that M-LDA outperforms traditional LDA.
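For readers unfamiliar with the baseline, here is a minimal sketch of standard LDA fitted by collapsed Gibbs sampling. It is not the authors' M-LDA, which models the two views jointly; this sketch only shows plain LDA on a single view. The helper `category_view` and its `word_to_categories` dictionary are hypothetical stand-ins for a word-to-Wikipedia-category mapping, which the abstract does not specify. The hyperparameters `alpha`, `beta`, and `iters` are illustrative assumptions.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Plain LDA via collapsed Gibbs sampling (baseline, not M-LDA).

    docs: list of documents, each a list of token strings.
    Returns (theta, phi): per-document topic proportions (lists) and
    per-topic token distributions (dicts keyed by token).
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    # Count tables: document-topic, topic-token, and per-topic totals.
    n_dk = [[0] * num_topics for _ in docs]
    n_kw = [defaultdict(int) for _ in range(num_topics)]
    n_k = [0] * num_topics

    # Assign every token a random initial topic.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token's assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the resampled topic and restore the counts.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Posterior-mean estimates; each row / dict sums to 1.
    theta = [
        [(n_dk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        {w: (n_kw[t][w] + beta) / (n_k[t] + V * beta) for w in vocab}
        for t in range(num_topics)
    ]
    return theta, phi


def category_view(doc, word_to_categories):
    """Hypothetical second view: replace each word by its mapped categories.

    word_to_categories is an illustrative dict, not the paper's mapping.
    Words without an entry contribute no category tokens.
    """
    return [c for w in doc for c in word_to_categories.get(w, [])]
```

As a usage example, `lda_gibbs([["apple", "banana"], ["car", "engine"]], num_topics=2)` returns a 2×2 `theta` and two token distributions. Running the same sampler separately on `category_view(doc, mapping)` would give a second, independent topic model. Per the abstract, M-LDA goes further by also exploiting the mapping between the word and category spaces.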

Paper Details

PaperID: 83255186949

Authors: Yun, J., Jing, L., Huang, H., Yu, J.

Volume: 7

Issue: 14

Keywords: Latent Dirichlet allocation, Semantics, Topic model, Wikipedia category

Year: 2011

Month: December

Pages: 4999–5006