Shi Zhi, Bo Zhao, Wenzhu Tong, Jing Gao, Dian Yu, Heng Ji, Jiawei Han
In the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2015)
When integrating information from multiple sources, it is common to encounter conflicting answers to the same question. Truth discovery is to infer the most accurate and complete integrated answers from conflicting sources. In some cases, there exist questions for which the true answers are excluded from the candidate answers provided by all sources. Without any prior knowledge, these questions, named no-truth questions, are difficult to be distinguished from the questions that have true answers, named hastruth questions. In particular, these no-truth questions degrade the precision of the answer integration system. We address such a challenge by introducing source quality, which is made up of three finegrained measures: silent rate, false spoken rate and true spoken rate. By incorporating these three measures, we propose a probabilistic graphical model, which simultaneously infers truth as well as source quality without any a priori training involving ground truth answers. Moreover, since inferring this graphical model requires parameter tuning of the prior of truth, we propose an initialization scheme based upon a quantity named truth existence score, which synthesizes two indicators, namely, participation rate and consistency rate. Compared with existing methods, our method can effectively filter out no-truth questions, which results in more accurate source quality estimation. Consequently, our method provides more accurate and complete answers to both has-truth and no-truth questions. Experiments on three real-world datasets illustrate the notable advantage of our method over existing state-of-the-art truth discovery methods.