Searching for Similar Objects across Time in News Archive

Overview

Large-scale archival datasets are prevalent in many areas, including informatics, computational social science, and finance. However, searching and understanding such datasets is not trivial.

The key difficulty is that the entire context changes over time, so the contexts of related terms from different periods overlap very little. As a result, a serious problem in archival search is that terminology evolves over time. Because users formulate queries in current terminology, old documents that discuss similar concepts in different words are not retrieved. For example, users looking for 1980s music devices similar to the iPod may fail because they do not know the keyword Walkman, a device that played a role similar to the one the iPod plays today. We address this terminology gap through the temporal correspondence problem: given an input term (e.g., iPod) and a target time (e.g., the 1980s), the task is to find the counterpart of the query that existed in the target time. We propose a method that transforms word contexts across time based on their distributional representations. Knowledge of temporal counterparts can help alleviate the terminology gap for users searching temporal document collections such as archives. For example, given a user’s query and a target time frame, a modified query that carries the same meaning could be suggested to improve search results.
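
To make the idea of transforming word contexts across time more concrete, here is a minimal sketch in Python. It is an illustration under strong assumptions, not the method itself: each word is represented by a small dictionary of context-term weights, and a hand-written term mapping (term_map) stands in for whatever learned transformation the actual method uses. All names (present, past, term_map, find_counterparts) and all counts are hypothetical toy values.

```python
import math

# Toy distributional representations: word -> {context term: weight}.
# All counts below are invented for illustration only.
present = {
    "iPod": {"music": 9, "portable": 7, "listen": 8, "apple": 6, "mp3": 5},
}
past = {
    "Walkman": {"music": 8, "portable": 7, "listen": 9, "sony": 6, "cassette": 5},
    "VCR": {"video": 9, "tape": 7, "record": 8, "watch": 6},
    "typewriter": {"write": 9, "paper": 7, "office": 6, "portable": 2},
}

# Hypothetical mapping from present-day context terms to terms of the
# target period. Terms that are not listed are assumed to be stable.
term_map = {"apple": "sony", "mp3": "cassette"}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def transform_context(vec, mapping):
    """Rewrite a context vector into the vocabulary of the target period."""
    out = {}
    for term, weight in vec.items():
        target = mapping.get(term, term)  # unmapped terms are kept as-is
        out[target] = out.get(target, 0.0) + weight
    return out


def find_counterparts(query, present_vecs, past_vecs, mapping, k=3):
    """Rank words of the target period by similarity to the transformed query."""
    q = transform_context(present_vecs[query], mapping)
    scored = [(cosine(q, vec), word) for word, vec in past_vecs.items()]
    scored.sort(reverse=True)
    return [(word, round(score, 3)) for score, word in scored[:k]]


result = find_counterparts("iPod", present, past, term_map)
print(result)
```

With these toy counts, Walkman is ranked first for the query iPod, because the transformed context of iPod shares nearly all of its terms with the context of Walkman. On real archives the representations and the transformation would be derived from the document collections rather than written by hand.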

Essentially, this lets searchers use what they know about the present-day world to search unfamiliar collections, such as ones containing documents from the distant past. Furthermore, solving the temporal correspondence problem can support timeline construction, temporal summarization, reference forecasting, and similarity search, and can have applications in education.

Industrial Applications and Application Areas

Our research can help ordinary users search, support education, and extract useful information to enrich knowledge bases. Because our work is part of computational history, it also provides computational support for sociologists and historians. It makes access to archives more efficient and engaging, encouraging users to appreciate our heritage and to learn from history.

Applying our techniques will change the way people search. People often do not know the right keyword to search for. For example, the iPod may be unfamiliar to people born in the 1960s, who knew the Walkman as the most popular music device. When these people want to buy the latest Walkman, they will find that it is no longer sold. In this case, our system will recommend the iPod or other up-to-date music devices that are temporal counterparts of the Walkman. Our techniques can also be implemented as a “similarity” search function that handles queries such as “I want to search for a music device like Walkman” or “I want to search for a person like Bill Gates”. People no longer need to describe what they are looking for; they can simply query by example.

Researchers

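Name            Department                        Laboratory         Position / Year
Zhang Yating    Department of Social Informatics  Tanaka Laboratory  Doctoral student (3rd year)
Jatowt Adam     Department of Social Informatics  Tanaka Laboratory  Program-Specific Associate Professor
Tanaka Katsumi  Department of Social Informatics  Tanaka Laboratory  Professor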
