The Search Evaluation Dataset

This dataset was created to support research on search evaluation in exploratory search. We conducted a user study comprising 166 search sessions in three domains. Users’ interactions and explicit feedback were collected during the search process. The clicked documents collected in the user study were annotated by external assessors.


Search evaluation is one of the central concerns in information retrieval (IR) studies. In this line of research, many existing studies have focused on estimating two important variables: user satisfaction and search success.

For search tasks with simple information needs, search success is usually consistent with user satisfaction. However, for search tasks with complex information needs, e.g. exploratory search, it is sometimes difficult for users to determine whether they have gained enough credible information to complete the search task. In these scenarios, user satisfaction might be different from search success.

Besides satisfying users, it is also very important to help them search successfully, i.e. obtain sufficient correct information via search. In this study, we investigate the relationship between user satisfaction and search success.

Data description

This dataset contains two parts: users’ search logs and document annotations. The search logs consist of 166 search sessions across 6 tasks in three domains: Environment, Medicine, and Politics. All tasks were designed by senior graduate students in the corresponding departments of our university.

Each participant first completes a pre-search questionnaire covering his/her domain knowledge level (1-5), predicted difficulty level (1-5), and interest level (1-5) for the task. The participant then gives a pre-task answer if he/she already knows something about the task. He/She is also asked to mark whether each clicked document was useful (4-level). Finally, he/she is required to give an answer to the search task and an overall 5-level graded satisfaction rating for the search experience in the task. The details are shown in the following table.

Measure          Type                                Description
user             index (30 users)                    Index used to distinguish users
task             index (6 tasks)                     Index used to distinguish tasks
pre_difficulty   1 (low) to 5 (high)                 User-perceived task difficulty
pre_knowledge    1 (low) to 5 (high)                 User’s prior knowledge about a task
pre_interest     1 (low) to 5 (high)                 User’s interest in a task
pre_answer       binary list, e.g. [1, 0, 0, 0, 1]   Whether the user’s pre-search answer contains each correct point (1/0)
post_answer      binary list, e.g. [1, 1, 0, 1, 1]   Whether the user’s post-search answer contains each correct point (1/0)
query            text                                Query submitted by the user
clicked_url      URL                                 URL clicked by the user
start/end time   numerical                           Time at which a user action occurs
usefulness       1 (low) to 4 (high)                 User’s usefulness feedback on a document
satisfaction     1 (low) to 5 (high)                 User’s satisfaction feedback on a search session
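
As a rough illustration of how the session-level measures listed above might be handled in code, the following Python sketch models one session and compares how many correct points appear in the answer before and after searching. It is only a sketch under stated assumptions: the class SessionRecord and the functions coverage and knowledge_gain are hypothetical helpers, the field name post_answer is assumed for the post-search answer, and the example values other than the two answer vectors are made up. The released files may be organized differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SessionRecord:
    """One search session. Hypothetical container, not the released file format."""
    user: int                # user index
    task: int                # task index
    pre_difficulty: int      # 1 (low) to 5 (high)
    pre_knowledge: int       # 1 (low) to 5 (high)
    pre_interest: int        # 1 (low) to 5 (high)
    pre_answer: List[int]    # 1/0 per correct point, before searching
    post_answer: List[int]   # 1/0 per correct point, after searching (assumed name)
    satisfaction: int        # 1 (low) to 5 (high)


def coverage(points: List[int]) -> float:
    """Fraction of correct points that an answer contains."""
    return sum(points) / len(points) if points else 0.0


def knowledge_gain(record: SessionRecord) -> float:
    """Change in correct-point coverage from the pre-search to the post-search answer."""
    return coverage(record.post_answer) - coverage(record.pre_answer)


if __name__ == "__main__":
    # Answer vectors are the examples from the table; the other values are illustrative.
    example = SessionRecord(
        user=1, task=1,
        pre_difficulty=3, pre_knowledge=2, pre_interest=4,
        pre_answer=[1, 0, 0, 0, 1],
        post_answer=[1, 1, 0, 1, 1],
        satisfaction=4,
    )
    print(coverage(example.pre_answer), coverage(example.post_answer))
    print(knowledge_gain(example))
```

Comparing a coverage-style score with the satisfaction rating of the same session is one simple way to look for sessions where the two disagree. It is not necessarily the success measure used in the paper.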

The dataset contains 1194 clicked documents, which were annotated by external assessors. For each clicked document, the following document-level information was obtained: (1) 4-level Relevance; (2) 4-level Credibility; (3) 4-level Readability. We also obtained the fine-grained findability (5-level) of three types of information points: Correct point, Incorrect point and Neutral point.

How to get the detailed dataset

We provide the data used in the paper we published at the WWW18 conference. For the whole dataset, which contains the detailed user behavior, please contact us. After you sign an online application form, we can send you the data.


If you use this dataset in your research, please cite it using the following BibTeX entry. A preprint of this paper can be found here.

  author    = {Mengyang Liu and
               Yiqun Liu and
               Jiaxin Mao and
               Cheng Luo and
               Min Zhang and
               Shaoping Ma},
  title     = {"Satisfaction with Failure" or "Unsatisfied Success": Investigating
               the Relationship between Search Success and User Satisfaction},
  booktitle = {Proceedings of the 2018 World Wide Web Conference on World Wide Web,
               {WWW} 2018, Lyon, France, April 23-27, 2018},
  pages     = {1533--1542},
  year      = {2018},
  crossref  = {DBLP:conf/www/2018},
  url       = {},
  doi       = {10.1145/3178876.3186065},
  timestamp = {Wed, 25 Apr 2018 16:17:00 +0200},
  biburl    = {},
  bibsource = {dblp computer science bibliography,}