A cross-document coreference solution with nonparametric Bayesian method
A novel approach based on nonparametric Bayesian methods is proposed for cross-document coreference (CDC) resolution. The main challenge in CDC is generating a summary for each ambiguous named-entity (NE) phrase. In this paper, nonparametric Bayesian methods are adopted to generate these summaries. A hierarchical Dirichlet process (HDP) model is used to generate latent-topic-based sentence vectors, and a Dirichlet process Gaussian mixture model (DPGMM) then clusters those vectors to produce the summaries. The proposed approach is compared with other classical approaches on six data sets. Coreference performance is evaluated with the BCubed metric and the standard purity and inverse purity clustering metrics. The experimental results indicate that the proposed approach has a significant advantage over all the compared classical approaches.
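The three evaluation measures named in the abstract compare a system clustering of NE mentions against gold-standard entity labels. The Python sketch below uses the textbook definitions of these metrics, which is an assumption on our part:

- Purity and inverse purity use majority-label overlap.
- BCubed precision and recall are computed per mention.
- The BCubed F-score is the plain harmonic mean of precision and recall.

The abstract does not state which exact variant the authors used. The function names and the toy labels are hypothetical and for illustration only.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence, Tuple


def purity(pred: Sequence[Hashable], gold: Sequence[Hashable]) -> float:
    """Purity: for each predicted cluster, count its most frequent gold label;
    sum those counts and divide by the number of mentions (assumed non-empty)."""
    clusters = defaultdict(list)
    for p, g in zip(pred, gold):
        clusters[p].append(g)
    hits = sum(Counter(members).most_common(1)[0][1] for members in clusters.values())
    return hits / len(gold)


def inverse_purity(pred: Sequence[Hashable], gold: Sequence[Hashable]) -> float:
    """Inverse purity: the same computation with the roles of the
    predicted clusters and the gold classes swapped."""
    return purity(gold, pred)


def bcubed(pred: Sequence[Hashable], gold: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Per-mention BCubed precision, recall, and (assumed) harmonic-mean F."""
    n = len(gold)
    cluster_size = Counter(pred)          # mentions per predicted cluster
    class_size = Counter(gold)            # mentions per gold entity
    overlap = Counter(zip(pred, gold))    # mentions sharing both cluster and entity
    # Precision of a mention: share of its cluster that has its gold label.
    precision = sum(overlap[(p, g)] / cluster_size[p] for p, g in zip(pred, gold)) / n
    # Recall of a mention: share of its gold entity that landed in its cluster.
    recall = sum(overlap[(p, g)] / class_size[g] for p, g in zip(pred, gold)) / n
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Hypothetical toy example: five mentions of two real-world entities A and B.
gold_labels = ["A", "A", "A", "B", "B"]
system_clusters = [0, 0, 1, 1, 1]
print(purity(system_clusters, gold_labels))          # 0.8
print(inverse_purity(system_clusters, gold_labels))  # 0.8
print(bcubed(system_clusters, gold_labels))          # each value equals 11/15
```

In the toy example, one mention of entity A is merged into B's cluster. That single error lowers both purity scores to 0.8. BCubed penalizes the error for every affected mention, so its scores drop further, to 11/15 (about 0.733).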
Authors: Xing, X., He, Z., Li, Y., Zhang, W.
Volume: 9
Issue: 19
Keywords: Cross-document coreference, Information fusion, Nonparametric Bayesian, Topic model