The Search Evaluation Dataset
This dataset was created to support research on session search evaluation. We conducted two user studies comprising 675 search sessions across 9 search tasks. Users' interactions and explicit feedback were collected during the search process.
Motivation
User satisfaction has received much attention in recent Web search evaluation studies and is regarded as the ground truth for designing better evaluation metrics. However, most existing studies focus on the relationship between satisfaction and evaluation metrics at the query level.
As search requests become increasingly complex, many scenarios require multiple queries and multi-round search interactions (e.g., exploratory search). In these cases, the relationship between session-level search satisfaction and session search evaluation metrics remains uninvestigated.
In this study, we conduct a laboratory study in which users are required to complete complex search tasks and to provide usefulness judgments of documents as well as session-level and query-level satisfaction feedback. This allows us to analyze how users' perceptions of satisfaction accord with a series of session-level evaluation metrics.
Data description
This dataset contains two parts: a main user study and a comparison user study. The main user study consists of 450 search sessions for 9 tasks, and the comparison user study consists of 225 search sessions for the same 9 tasks.
For each search task, the participant needs to read and memorize the task description, then repeat it without viewing it. The participant can submit queries and click on results to collect information, as they usually do with a commercial search engine. He/she is asked to mark whether each clicked document was useful (4-level) and to provide 5-level graded satisfaction feedback on each query. Finally, he/she is required to give an answer to the search task and an overall 5-level graded satisfaction rating of the search experience in the task. The detailed information is shown in the following table.
Measure | Type | Description |
---|---|---|
task | index (1~9) | index used to distinguish the 9 tasks |
query | text | the query submitted by the user |
clicked_url | URL | the URL clicked by the user |
start/end time | numerical | the time at which the user behavior starts/ends |
usefulness | 1 (low) ~ 4 (high) | the user's usefulness feedback on a clicked document |
query satisfaction | 1 (low) ~ 5 (high) | the user's satisfaction feedback on a search query |
session satisfaction | 1 (low) ~ 5 (high) | the user's satisfaction feedback on a search session |
answer | text | the user's answer to the task |
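As a rough illustration of how the fields in the table fit together, the sketch below models one search session in memory. All class and field names here are hypothetical (the released files' actual format is not specified in this README); only the field semantics follow the table above.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical in-memory layout of one session; mirrors the field
# table above but does NOT reflect the actual released file format.

@dataclass
class ClickedDoc:
    url: str
    usefulness: int        # 1 (low) ~ 4 (high)

@dataclass
class QueryRound:
    query: str
    start_time: float      # time the behavior starts
    end_time: float        # time the behavior ends
    clicks: List[ClickedDoc] = field(default_factory=list)
    query_satisfaction: int = 0   # 1 (low) ~ 5 (high)

@dataclass
class Session:
    task: int              # task index, 1 ~ 9
    rounds: List[QueryRound] = field(default_factory=list)
    session_satisfaction: int = 0  # 1 (low) ~ 5 (high)
    answer: str = ""

def mean_query_satisfaction(session: Session) -> float:
    """Average the 5-level query satisfaction over all query rounds."""
    return sum(r.query_satisfaction for r in session.rounds) / len(session.rounds)
```

For example, a two-query session with satisfaction ratings 4 and 2 yields a mean query satisfaction of 3.0, which can then be compared against the session-level rating.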
How to get the detailed dataset
We provide the data used in the paper we published at SIGIR 2018. For the full dataset containing the detailed user behavior, please contact us (thuir_datamanage@126.com). After you sign an application form online, we will send you the data.
Citation
If you use this dataset in your research, please cite it with the following BibTeX entry. A preprint of the paper can be found here.
@inproceedings{DBLP:conf/sigir/LiuLMLM18,
author = {Mengyang Liu and
Yiqun Liu and
Jiaxin Mao and
Cheng Luo and
Shaoping Ma},
title = {Towards Designing Better Session Search Evaluation Metrics},
booktitle = {The 41st International {ACM} {SIGIR} Conference on Research {\&}
Development in Information Retrieval, {SIGIR} 2018, Ann Arbor, MI,
USA, July 08-12, 2018},
pages = {1121--1124},
year = {2018},
crossref = {DBLP:conf/sigir/2018},
url = {http://doi.acm.org/10.1145/3209978.3210097},
doi = {10.1145/3209978.3210097},
timestamp = {Mon, 02 Jul 2018 08:24:13 +0200},
biburl = {https://dblp.org/rec/bib/conf/sigir/LiuLMLM18},
bibsource = {dblp computer science bibliography, https://dblp.org}
}