Component Recognition in Image Sequences in Recipe Data

Abstract

In this paper, we propose a method to recognize ingredients that appear in intermediate steps in multimedia recipes.
Each multimedia recipe describes the cooking steps as a sequence of image-text pairs, where ingredients shown in an image are often omitted from the associated text. This creates a need to recognize ingredients from the images. The task comes with unique challenges and opportunities: ingredients become less visually apparent in later steps, ingredients are inherited across steps, each recipe is associated with a short ingredient list, and each step contains a variable number of ingredients, with the final step including all of them.
To address these, we propose a method with the following features. First, we use prior steps as context, leveraging ingredient information from earlier, clearer steps. We also employ a masked loss function that restricts learning to the ingredients in the list associated with each recipe. This is further enhanced by an adaptive threshold strategy that determines the presence of each ingredient based on its predictive score in the final step.
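The masked loss described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: it assumes a multi-label setup where each step is scored against the full ingredient vocabulary, and a per-recipe binary mask zeroes out the loss for ingredients not in that recipe's list. The function name and NumPy formulation are illustrative.

```python
import numpy as np

def masked_bce_loss(logits, targets, recipe_mask):
    """Binary cross-entropy over the ingredient vocabulary, where
    recipe_mask (1 = ingredient in this recipe's list) zeroes out
    the contribution of off-list ingredients.

    logits, targets, recipe_mask: arrays of shape (batch, num_ingredients).
    """
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    eps = 1e-12  # numerical stability for log
    bce = -(targets * np.log(probs + eps)
            + (1.0 - targets) * np.log(1.0 - probs + eps))
    masked = bce * recipe_mask  # off-list ingredients contribute zero loss
    return masked.sum() / max(recipe_mask.sum(), 1.0)
```

The effect is that a confident but wrong score on an ingredient outside the recipe's list is not penalized, so the model focuses on distinguishing only the few candidate ingredients each recipe actually lists.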
In addition to proposing the method, we construct a new multimedia recipe dataset containing image-text pairs and ingredient annotations for every cooking step.
Few publicly available recipe datasets contain such image-text pairs for every cooking step.
Experimental results on this dataset demonstrate the improved performance of our method compared to baseline methods, showcasing the effectiveness of our approach.

Researchers

Name	Course	Laboratory	Position/Year
張訳心	Social Informatics Course	Social Media Unit	3rd-year doctoral student