Paraphrasing Sentential Queries by Incorporating Coordinate Relationship

Abstract

Although long queries are still a minority of the queries submitted to web search engines, their usage tends to increase gradually. However, retrieval effectiveness decreases as query length increases. We target sentential queries and propose a method, called “query rewriting”, for improving their retrieval performance. Briefly, given a sentential query, our method acquires paraphrases from the noisy Web and uses them to avoid returning no answers. In particular, since a relation can be represented either intensionally (by what we call “paraphrase templates”) or extensionally (by what we call “coordinate tuples”), we take the mutual reinforcement between the two into account. Our motivation is the assumption that splitting a sentential query into separate terms or phrases may lose information or cause query drift; hence, we treat a long query as an integral whole. In contrast, previous work is mainly based on the assumption that long queries always contain extraneous terms, and therefore concentrates on removing or reweighting those terms to improve the retrieval performance of long queries. In this paper, we instead present a query paraphrasing technique that avoids missing information and thus preserves the completeness of the information in the query. Experimental results show that, for declarative sentences, the average precision of our method is 68.1%, compared with 44.2% for the baseline, and its relative recall is 95.9%, nearly three times that of the baseline. For questions, the average precision of our method is 46.9%, compared with 39.9% for the baseline. We also demonstrate the effectiveness of query paraphrasing in two applications: judging fact credibility and QA search.
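To make the idea of mutual reinforcement concrete, the following is a minimal sketch, not the method evaluated above. It assumes a tiny in-memory corpus in place of the Web, treats a paraphrase template as the text between two slots and a coordinate tuple as a pair of slot fillers, and alternates between the two without any scoring or noise filtering. The names `CORPUS`, `find_templates`, `find_tuples`, `bootstrap`, and `paraphrase` are hypothetical.

```python
import re

# Toy stand-in for the Web (an assumption for illustration only).
CORPUS = [
    "apples are rich in pectin",
    "apples contain a lot of pectin",
    "lemons are rich in vitamin c",
    "lemons contain a lot of vitamin c",
    "lemons are grown in italy",
]

def find_templates(tuples, corpus):
    """Intensional step: collect the text linking a known (x, y) pair."""
    templates = set()
    for sentence in corpus:
        for x, y in tuples:
            prefix, suffix = x + " ", " " + y
            if (sentence.startswith(prefix) and sentence.endswith(suffix)
                    and len(sentence) > len(prefix) + len(suffix)):
                templates.add(sentence[len(prefix):len(sentence) - len(suffix)])
    return templates

def find_tuples(templates, corpus):
    """Extensional step: collect the (x, y) pairs that fill a known template."""
    tuples = set()
    for template in templates:
        pattern = re.compile(r"^(.+?) " + re.escape(template) + r" (.+)$")
        for sentence in corpus:
            match = pattern.match(sentence)
            if match:
                tuples.add((match.group(1), match.group(2)))
    return tuples

def bootstrap(seed, corpus, max_rounds=5):
    """Alternate the two steps until neither set grows."""
    tuples, templates = {seed}, set()
    for _ in range(max_rounds):
        new_templates = templates | find_templates(tuples, corpus)
        new_tuples = tuples | find_tuples(new_templates, corpus)
        if new_templates == templates and new_tuples == tuples:
            break
        templates, tuples = new_templates, new_tuples
    return templates, tuples

def paraphrase(query_tuple, templates):
    """Instantiate every learned template with the query's own slot fillers."""
    x, y = query_tuple
    return sorted(f"{x} {t} {y}" for t in templates)

templates, tuples = bootstrap(("apples", "pectin"), CORPUS)
candidates = paraphrase(("apples", "pectin"), templates)
```

In this toy setting the seed tuple yields two templates, which in turn surface a second tuple that supports the same templates; the unrelated sentence about Italy contributes nothing. A realistic system would additionally need to weight templates and tuples to cope with noisy Web text.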

Industrial Applications and Fields of Use

Question-answering (QA) systems are frequently required to handle long queries, especially long natural-language queries. The retrieval performance of a QA system can be improved if it can automatically bridge the difference between a user’s question and the existing questions or answers. Paraphrasing users’ questions into forms that are widely used in these systems is a good way to achieve this.
Query paraphrasing is also effective for estimating the credibility of facts. We assume that the credibility of a fact is high if people often mention it on the Web. Hence, a naive way to judge the credibility of a fact is to check how often it occurs on the Web. However, this naive check can easily fail. Take the fact “apples pop a powerful pectin punch” as an example. This exact statement never appears on the Web, so we would draw the erroneous conclusion that apples do not contain a high amount of pectin. Observing only the occurrences of the original wording is therefore misleading. Its paraphrases, however, such as “apples contain a lot of pectin” and “apples are rich in pectin”, are widely used on the Web. If we take paraphrases into consideration, we reach the correct conclusion that apples are a high-pectin fruit.
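The credibility check described above can be sketched as follows. This is a minimal illustration under stated assumptions: `WEB_PAGES` is a made-up list of page texts standing in for the Web, `hit_count` is a hypothetical substring-based counter rather than a real search-engine API, and the paraphrases are supplied by hand instead of being acquired automatically.

```python
# Made-up page texts standing in for the Web (an assumption for illustration).
WEB_PAGES = [
    "Nutrition facts: apples contain a lot of pectin, a soluble fiber.",
    "Why apples are rich in pectin and how to use them for jam.",
    "Did you know apples contain a lot of pectin?",
]

def hit_count(phrase, pages):
    """Count the pages that mention the phrase (case-insensitive substring match)."""
    phrase = phrase.lower()
    return sum(phrase in page.lower() for page in pages)

def support(fact, paraphrases, pages):
    """Return per-wording hit counts for the fact and each of its paraphrases."""
    return {wording: hit_count(wording, pages) for wording in [fact, *paraphrases]}

fact = "apples pop a powerful pectin punch"
paraphrases = [
    "apples contain a lot of pectin",
    "apples are rich in pectin",
]

counts = support(fact, paraphrases, WEB_PAGES)
naive_evidence = counts[fact]                  # original wording only
paraphrase_evidence = sum(counts.values())     # original wording plus paraphrases
```

Here the original wording alone finds no supporting page, whereas the paraphrases together do, which mirrors the pectin example. How such counts should be turned into a credibility score is left open in this sketch.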

Researchers

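Name                         Department                          Laboratory                   Position / Year
Meng Zhao (趙 夢)             Department of Social Informatics    Katsumi Tanaka Laboratory    3rd-year doctoral student
Hiroaki Ohshima (大島 裕明)   Department of Social Informatics    Katsumi Tanaka Laboratory    Program-Specific Associate Professor
Katsumi Tanaka (田中 克己)    Department of Social Informatics    Katsumi Tanaka Laboratory    Professor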
