The Search-Brainwave Dataset
We present the Search-Brainwave Dataset to support research on human neurological states during the search process and on brain-machine interface (BMI)-enhanced search systems. We recorded the electroencephalogram (EEG) signals of 18 participants while each performed predefined search tasks within a one-hour session. During the user study, participants examined search results one by one according to a predefined task description. They then rated the task-level search difficulty and satisfaction, and annotated the usefulness of each search result.
Motivation
Information retrieval (IR) is a complex cognitive process that involves many brain activities. A growing body of interdisciplinary research focuses on the intersection of neuroscience and IR. Such research can reveal users' mental states directly, helping us understand the cognitive processes behind search interaction, and can leverage brain signals as user feedback to improve search evaluation and search performance.
Usefulness is a key concept in the user-centric evaluation of Web search. In contrast to relevance, which is usually annotated by external assessors, usefulness reflects users' own judgment of whether search results meet their information needs. To understand how users make usefulness judgments during search, we conducted a user study and collected brain signals while participants examined search results. In particular, the study captures brain signals during usefulness judgments of non-clicked results, a situation in which other implicit feedback such as clicks and dwell time cannot be collected. It therefore fills a gap in research on usefulness judgment for non-clicked results.
Dataset Collection
The dataset was collected in a carefully designed user study. Figure 2 illustrates the procedure of each search task in the main step of the study.
First, participants read a task description randomly selected from the dataset. Randomly selected search result screenshots are then displayed one at a time, each for 2.5 seconds. After that, three response options, "skip", "click", and "end search", appear on the screen while the search result remains visible. If a participant chooses "click", the landing page of the corresponding result is presented. After examining the landing page, the participant can either end the search or continue to examine the next result. When the participant decides to end the search, an end-mark page is shown on which they report their perceived task difficulty (five-point Likert scale) and the usefulness of each result (four-point Likert scale).
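To make the control flow of a single task concrete, the following minimal Python sketch replays one task from scripted participant choices. It is an illustration, not the software used in the study: all names (`run_task`, `end_mark_page`, `TaskLog`) are hypothetical, and it assumes that "skip" simply moves on to the next result and that a task also ends when no results remain.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional

# Hypothetical replay of one search task in the main step of the study.
# Timing and response options follow the text; every name is illustrative.
PREVIEW_SECONDS = 2.5                        # each screenshot is shown for 2.5 s
RESPONSES = ("skip", "click", "end search")  # options shown with the result


@dataclass
class TaskLog:
    task_id: str
    examined: List[str] = field(default_factory=list)  # results displayed, in order
    clicked: List[str] = field(default_factory=list)   # results whose landing page was opened
    difficulty: Optional[int] = None                   # five-point Likert scale (1-5)
    usefulness: Dict[str, int] = field(default_factory=dict)  # four-point Likert scale (1-4)


def run_task(task_id: str, results: List[str], responses: Iterator[str],
             after_landing: Iterator[str]) -> TaskLog:
    """Replay a task from scripted choices.

    results: result IDs in display order (assumed already randomized).
    responses: one entry of RESPONSES per displayed result.
    after_landing: "end search" or "continue" after each landing page.
    """
    log = TaskLog(task_id)
    for result in results:
        log.examined.append(result)  # screenshot shown for PREVIEW_SECONDS
        choice = next(responses)
        if choice not in RESPONSES:
            raise ValueError(f"unknown response: {choice!r}")
        if choice == "end search":
            break
        if choice == "click":
            log.clicked.append(result)  # landing page is presented
            if next(after_landing) == "end search":
                break
        # assumption: "skip", or "continue" after a landing page, moves on
    return log  # assumption: the task also ends when results run out


def end_mark_page(log: TaskLog, difficulty: int,
                  usefulness: Dict[str, int]) -> TaskLog:
    """Store the end-mark page answers after checking the scale ranges."""
    if not 1 <= difficulty <= 5:
        raise ValueError("difficulty must be on the five-point scale")
    for result, score in usefulness.items():
        if result not in log.examined or not 1 <= score <= 4:
            raise ValueError(f"invalid usefulness rating for {result!r}")
    log.difficulty = difficulty
    log.usefulness = dict(usefulness)
    return log
```

For example, `run_task("t1", ["r1", "r2", "r3"], iter(["skip", "click", "skip"]), iter(["continue"]))` records all three results as examined and `r2` as the only click.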
For apparatus, a Scan NuAmps Express system (Compumedics Ltd., VIC, Australia) and a 64-channel Quik-Cap (Compumedics Neuroscan) were used to capture the participants' EEG data. Computation and data pre-processing were performed with Curry V8.3 (Neuroscan, TX), a widely used commercial source localization software package.
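The layout of the released EEG files is only described upon request, so the following is a hedged sketch of one common preprocessing step rather than a description of the Curry pipeline. Assuming a continuous recording stored as a NumPy array of shape (channels, samples), a known sampling rate, and the sample index at which each result screenshot appeared, it cuts one epoch per displayed result covering the 2.5-second preview plus a short pre-stimulus baseline. The function name `extract_epochs`, the 0.2-second baseline, and the mean-baseline correction are our assumptions.

```python
import numpy as np

# Hypothetical sketch: one epoch per displayed search result, cut from a
# continuous (channels, samples) recording, e.g. 64 channels from the Quik-Cap.
# Sampling rate, onsets and baseline length are assumptions.
PREVIEW_SECONDS = 2.5  # duration of each result screenshot


def extract_epochs(eeg: np.ndarray, onsets: list, sfreq: float,
                   baseline_seconds: float = 0.2) -> np.ndarray:
    """Return epochs with shape (n_results, n_channels, n_samples).

    eeg: continuous recording, shape (n_channels, n_total_samples).
    onsets: sample index at which each result screenshot appeared.
    sfreq: sampling rate in Hz.
    baseline_seconds: pre-stimulus window used for baseline correction.
    """
    if eeg.ndim != 2:
        raise ValueError("expected a (channels, samples) array")
    n_base = int(round(baseline_seconds * sfreq))
    n_post = int(round(PREVIEW_SECONDS * sfreq))
    epochs = []
    for onset in onsets:
        start, stop = onset - n_base, onset + n_post
        if start < 0 or stop > eeg.shape[1]:
            raise ValueError(f"epoch around sample {onset} exceeds the recording")
        segment = eeg[:, start:stop].astype(float)  # copy, so eeg is untouched
        if n_base > 0:
            # subtract each channel's mean over the pre-stimulus baseline
            segment -= segment[:, :n_base].mean(axis=1, keepdims=True)
        epochs.append(segment)
    return np.stack(epochs)
```

If the files you receive already contain one segment per result, only the baseline-correction step would apply.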
Dataset Organization
The dataset contains the EEG signals recorded while each participant responded to each search result; the annotations of result usefulness, task difficulty, and task satisfaction; the experimental data of 150 search tasks; and pre-processed content and context features of each search result. These features are the similarity rank of the query-result pair (computed with a BERT encoder), the BM25 rank of the query-result pair, the result type, the average and maximum similarity rank with previous search results, the average, maximum, and total usefulness ratings of previous search results, and the number of previous search results. Further details are provided upon request after applying for access to the dataset.
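Several of these features can be approximated from task data. The sketch below is a simplified illustration, not the authors' feature extraction code: it assumes each task's results are already tokenized and listed in the order they were examined, ranks them within the task by a standard Okapi BM25 score (k1 = 1.2, b = 0.75, Lucene-style IDF), and derives the history features from the participant's usefulness ratings of earlier results. The BERT-based similarity features are omitted, and the ranking scope, parameter values, zero defaults for the first result, and all function names are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def bm25_scores(query: Sequence[str], docs: List[Sequence[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each tokenized document for the query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n if n else 0.0
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        norm = k1 * (1 - b + b * len(d) / avgdl) if avgdl else k1
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


def to_ranks(scores: Sequence[float]) -> List[int]:
    """Rank 1 = highest score; ties keep their original order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def task_features(query: Sequence[str],
                  results: List[Tuple[Sequence[str], int]]) -> List[Dict[str, float]]:
    """results: (tokens, usefulness 1-4) pairs in the order they were examined."""
    ranks = to_ranks(bm25_scores(query, [tokens for tokens, _ in results]))
    rows = []
    for i in range(len(results)):
        prev = [u for _, u in results[:i]]  # ratings of earlier results only
        rows.append({
            "bm25_rank": ranks[i],
            "n_prev": len(prev),
            # assumption: 0 when there is no earlier result
            "avg_prev_usefulness": sum(prev) / len(prev) if prev else 0.0,
            "max_prev_usefulness": max(prev, default=0),
            "total_prev_usefulness": sum(prev),
        })
    return rows
```

Passing precomputed BERT similarity scores to the same `to_ranks` helper would give a similarity-rank feature in the same form.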
How to Get Search-Brainwave
To gain access to the Search-Brainwave dataset, please contact us at thuir_datamanage@126.com. After you complete and sign the online application form, we will send you the data. If you encounter any problems with the dataset, please feel free to contact us.
References
[1] Ziyi Ye, Xiaohui Xie, Yiqun Liu, Zhihong Wang, Xuancheng Li, Jiaji Li, Xuesong Chen, Min Zhang, and Shaoping Ma. 2021. Why Don't You Click: Neural Correlates of Non-Click Behaviors in Web Search. arXiv:2109.10560.
[2] Ziyi Ye, Xiaohui Xie, Yiqun Liu, Zhihong Wang, Xuesong Chen, Min Zhang, and Shaoping Ma. 2021. Understanding Human Reading Comprehension with Brain Signals. arXiv:2108.01360.