The Search Evaluation Dataset

This dataset was created to support research on session search evaluation. We conducted a user study comprising 450 search sessions across 9 search tasks. Users' interactions and explicit feedback were collected during the search process. We collected document relevance assessments through a crowdsourcing platform.

Motivation

User satisfaction is an important variable in Web search evaluation studies and has received growing attention in recent years. Many studies regard user satisfaction as the ground truth for designing better evaluation metrics. However, most existing studies focus on designing Cranfield-like evaluation metrics that reflect user satisfaction at the query level.

As information needs become more complex, users often need multiple queries and multi-round search interactions to complete a search task (e.g., exploratory search). In those cases, how to characterize the user's satisfaction over a whole search session remains an open question.

In this study, we collect a dataset through a laboratory study in which users need to complete a set of complex search tasks. With the help of hierarchical linear models (HLM), we try to reveal how users' query-level and session-level satisfaction are affected by different cognitive effects.

Data description

The dataset consists of 450 search sessions across 9 tasks. For each search task, the participant first reads and memorizes the task description, then repeats it without viewing it. The participant can submit queries and click on results to collect information, as they usually do with commercial search engines. They are asked to mark whether each clicked document was useful (4-level) and to give 5-level graded satisfaction feedback on each query. Finally, they are required to give an answer to the search task and an overall 5-level graded satisfaction rating for the search experience in the whole task.

We also collected relevance assessments (4-level) of all the documents in our user study via a popular Chinese crowdsourcing platform. The fields of the dataset are described in the following table.

| Measure | Type | Description |
| --- | --- | --- |
| task | index (9 tasks) | index used to distinguish the tasks |
| query | text | query submitted by the user |
| clicked_url | URL | URL clicked by the user |
| start/end time | numerical | time at which the user behavior occurs |
| usefulness | 1(low)~4(high) | user's usefulness feedback on a clicked document |
| query satisfaction | 1(low)~5(high) | user's satisfaction feedback on a search query |
| session satisfaction | 1(low)~5(high) | user's satisfaction feedback on a search session |
| answer | text | user's answer to a task |
| relevance | 0(low)~3(high) | crowdsourced relevance annotation |
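To make the field table concrete, here is a minimal sketch of how a single session record with these fields might be represented and summarized. The record layout below is an assumption for illustration only; the released data defines its own file format and schema.

```python
# Hypothetical sketch: the nesting (session -> queries -> clicks) and the key
# names are assumptions based on the field table above, NOT the real schema
# of the released files.
from statistics import mean

# One imagined session record using the fields from the table.
session = {
    "task": 3,
    "session_satisfaction": 4,          # 1(low)~5(high)
    "answer": "user's free-text answer to the task",
    "queries": [
        {
            "query": "example query text",
            "query_satisfaction": 4,    # 1(low)~5(high)
            "clicks": [
                {"clicked_url": "http://example.com/a",
                 "start_time": 12.3, "end_time": 45.6,
                 "usefulness": 3},      # 1(low)~4(high)
                {"clicked_url": "http://example.com/b",
                 "start_time": 50.1, "end_time": 61.0,
                 "usefulness": 1},
            ],
        },
    ],
}

def mean_usefulness_per_query(record):
    """Average usefulness of clicked documents for each query in a session."""
    return [
        mean(c["usefulness"] for c in q["clicks"]) if q["clicks"] else None
        for q in record["queries"]
    ]

print(mean_usefulness_per_query(session))  # [2.0]
```

Per-query aggregates like this are the kind of features one would pair with the query-level and session-level satisfaction labels in analyses such as the HLM study described above.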

How to get the detailed dataset

We provide the data used in the paper we published at the KDD 2019 conference. For the full dataset containing the detailed user behavior, please contact us (maojiaxin@gmail.com). After you sign an application form online, we will send you the data.

Citation

If you use this dataset in your research, please cite it with the following BibTeX entry. A preprint of the paper can be found here.

@inproceedings{DBLP:conf/kdd/LiuMLZM19,
  author    = {Mengyang Liu and
               Jiaxin Mao and
               Yiqun Liu and
               Min Zhang and
               Shaoping Ma},
  title     = {Investigating Cognitive Effects in Session-level Search User Satisfaction},
  booktitle = {Proceedings of the 25th {ACM} {SIGKDD} International Conference on
               Knowledge Discovery {\&} Data Mining, {KDD} 2019, Anchorage, AK,
               USA, August 4-8, 2019},
  pages     = {923--931},
  year      = {2019},
  crossref  = {DBLP:conf/kdd/2019},
  url       = {https://doi.org/10.1145/3292500.3330981},
  doi       = {10.1145/3292500.3330981},
  timestamp = {Mon, 04 Nov 2019 09:51:27 +0100},
  biburl    = {https://dblp.org/rec/bib/conf/kdd/LiuMLZM19},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}