

The Session Search (SS) task is a new pilot task in NTCIR-16 that supports intensive investigation of session search, also known as task-oriented search. Users increasingly depend on search engines to gain useful information or to complete tasks. In complex search scenarios, a single query may not fully cover a user's information need, so users submit further queries within a short time interval until they are satisfied or give up. Such a search process is called a search session or task. Because users' search intents may evolve within a session, their actions and decisions are affected as well. Going beyond ad-hoc search and considering the contextual information within sessions has proven effective for user intent modeling in the IR community. However, existing NTCIR tasks have not yet involved session-based retrieval. To this end, we propose the Session Search task to provide practical datasets and an evaluation methodology to researchers in related domains.


To better assess the search effectiveness at both query-level and session-level, we aim to set up two subtasks as follows:

  • Fully Observed Session Search (FOSS): For a session of length k, we provide the full session context for the first (k-1) queries. Participants need to re-rank the candidate documents for the last query of the session. This setting follows the TREC Session Tracks and enables ad-hoc evaluation with metrics such as nDCG, AP, and RBP.
  • Partially Observed Session Search (POSS): In this subtask, we truncate all sessions before the last query. For a session with k queries (k ≥ 2), we keep the session context only for the first n queries, where 1 ≤ n ≤ k-1. The value of n varies across sessions. Participants need to re-rank documents for the last k-n queries (or query) according to the partially observed contextual information from previous search rounds. Session-level metrics such as RS-DCG and RS-RBP will be adopted to evaluate system effectiveness.
Comparison between NTCIR-16 Session Search (SS) and TREC Session Tracks:

Number of Sessions
  - NTCIR-16 SS (in Chinese):
      - Training: 147,154, with human relevance labels for the last query of 2,000 sessions.
      - FOSS Testing Subtask: 1,817.
      - POSS Testing Subtask: 1,203.
  - TREC Session Tracks (in English): 76~1,257
Session Datasets
  - NTCIR-16 SS: Extracted from a Sogou search log and two field study datasets
  - TREC Session Tracks: Session Track 2011-2014
Document Collection
  - NTCIR-16 SS:
      - With about 1,000,000 documents.
      - Contains 81.23 candidate documents on average per query in the testing set.
Source/Generation of session data
  - NTCIR-16 SS: Refined from a search log or extracted from two large-scale field studies.
  - TREC Session Tracks: Generated by real search users based on manually designed topics.
Support from log analysis for annotation?
  - TREC Session Tracks: ×
Support session-level evaluation?
  - TREC Session Tracks: ×

Expected Results

We plan to attract at least 15 active participants. As this is the first year, we only construct Chinese-centric session collections. Participants can leverage the training set for either single-task or multi-task learning. For the testing collection, the number of sessions in each subtask is expected to exceed 1,000. After collecting the run results from all teams, we will recruit assessors for relevance annotation and then use both query-level and session-level metrics for evaluation. Through these efforts, we hope to see a technological breakthrough in optimizing session-level document ranking.

Important Dates

All deadlines are at 11:59pm in the Anywhere on Earth (AOE) timezone.
Session Search registration Due: Aug 25, 2021
Dataset Release: Aug 15, 2021
Formal Run: Oct 1, 2021 - Dec 31, 2021
Relevance Assessment: Jan, 2022 - Feb, 2022
Evaluation Result Release: Mar 1, 2022
Draft Task Overview Paper Release: Mar 1, 2022
Draft Participant Paper Submission Due: April 1, 2022
All Camera-ready Paper Submission Due: Jun 1, 2022
NTCIR-16 Conference in NII, Tokyo, Japan: Jun 2022

Evaluation Measures

For the FOSS subtask, we adopt query-level metrics such as nDCG, AP, and RBP.
For the POSS subtask, we use session-level metrics such as RS-DCG and RS-RBP.
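As a rough illustration of the query-level metrics named above, the following sketch computes nDCG@k and RBP for a single ranked list. It assumes the common log2 discount with the raw relevance label as gain, and an RBP persistence parameter `p` of 0.8 with gains already scaled to [0, 1]; the function names and these parameter choices are assumptions, and the official evaluation tool may use different gain mappings or parameters. The session-level metrics (RS-DCG, RS-RBP) are not sketched here.

```python
import math
from typing import Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k positions (log2 discount)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_gains: Sequence[float], all_gains: Sequence[float], k: int) -> float:
    """nDCG@k: DCG of the submitted ranking divided by the DCG of the ideal ranking.

    ranked_gains: relevance labels in the order the system ranked the documents.
    all_gains:    relevance labels of every judged candidate for the query.
    """
    ideal = dcg_at_k(sorted(all_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0


def rbp(ranked_gains: Sequence[float], p: float = 0.8) -> float:
    """Rank-biased precision; gains are assumed to lie in [0, 1]."""
    return (1 - p) * sum(g * p ** i for i, g in enumerate(ranked_gains))
```

For example, `ndcg_at_k([0, 2, 1], [2, 1, 0], 3)` scores a ranking that places the most relevant document second.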
The official evaluation tool is coming soon!

Data and File format

We provide three directories:

|----- ./document_collection
|        |----- ./doc  [A collection of about 1,000,000 pages. Each directory contains about 10,000 files.]
|        |----- qid2docs.json  [A file mapping testing query IDs to their corresponding candidate document IDs.]
|----- ./sessions
|        |----- ./training
|        |        |----- training_sessions.txt
|        |----- ./testing
|                 |----- ./FOSS
|                 |        |----- testing_sessions_foss.txt
|                 |----- ./POSS
|                          |----- testing_sessions_poss.txt
|----- ./training_human_labels
         |----- human_labels.txt

1) In all session files, sessions are separated by two line breaks (\n\n).

2) Each training session is formatted as follows:

SessionID	87
画杨桃	q198	1427848224.93
1	d1882	404	0	-1
2	d1883	<unk>	0	-1
3	d1884	画杨桃-搜索页	0	-1
4	d1885	画杨桃	0	-1
5	d1886	人教版小学三年级下册语文《画杨桃》教学设计优质课教案	0	-1
6	d5	微信,是一个生活方式	0	-1
7	d1887	【图文】画杨桃_百度文库	0	-1
8	d1888	画杨桃课件_	0	-1
9	d1889	搜狗搜索	0	-1
10	d1890	画杨桃_三年级语文下册课件_奥数网	0	-1
画杨桃ppt课件	q199	1427848230.2
1	d1894	【图文】画杨桃ppt课件精品_百度文库	1	1427848232.105
2	d1895	《画杨桃》PPT课件	0	-1
3	d1896	《画杨桃》PPT课件6	0	-1
4	d1897	《画杨桃》公开课ppt课件(24页)-免费高速下载	0	-1
5	d1898	<unk>	0	-1
6	d1899	画杨桃ppt课件下载	0	-1
7	d1900	<unk>	0	-1
8	d1901	11画杨桃PPT课件_管理资源吧	0	-1
9	d1902	《画杨桃》ppt课件-免费高速下载	0	-1
10	d1903	《画杨桃》ppt课件【13页】-免费高速下载	0	-1
画杨桃ppt	q200	1427848257.0
1	d1904	【图文】画杨桃PPT_百度文库	1	1427848258.188
2	d1895	《画杨桃》PPT课件	0	-1
3	d1905	【图文】画杨桃ppt课件_百度文库	0	-1
4	d1897	《画杨桃》公开课ppt课件(24页)-免费高速下载	0	-1
5	d1900	<unk>	0	-1
6	d1906	《画杨桃》ppt课件(19页)-免费高速下载	0	-1
7	d1907	<unk>	0	-1
8	d1903	《画杨桃》ppt课件【13页】-免费高速下载	0	-1
9	d1896	《画杨桃》PPT课件6	0	-1
10	d1908	【精品】:画杨桃PPT	0	-1
  • The first line: <SessionID><tab><session ID>, such as SessionID 87.
  • The first line of a query: <query string><tab><query ID><tab><query start time>, such as 画杨桃 q198 1427848224.93.
  • Each remaining line of a query: <rank><tab><url><tab><document ID><tab><document title><tab><clicked><tab><click timestamp>, such as 2 d1895 《画杨桃》PPT课件 0 -1.
  • If the title of a document is unknown, it is represented by <unk>.
  • If a document is not clicked, its click timestamp is -1.
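The training format described above can be read with a small parser along the following lines. This is only a sketch: the class and function names are hypothetical, and it follows the sample session shown above, whose result lines carry five tab-separated fields (rank, document ID, title, clicked, click timestamp) with no URL column, even though the field list mentions one. If the released files do include a URL column, the field unpacking would need adjusting.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Result:
    rank: int
    doc_id: str
    title: str          # "<unk>" when the title is unknown
    clicked: bool
    click_time: float   # -1.0 when the document was not clicked


@dataclass
class Query:
    text: str
    query_id: str
    start_time: float
    results: List[Result] = field(default_factory=list)


@dataclass
class Session:
    session_id: str
    queries: List[Query] = field(default_factory=list)


def parse_sessions(text: str) -> List[Session]:
    """Parse sessions separated by blank lines (two line breaks)."""
    sessions: List[Session] = []
    for block in text.strip().split("\n\n"):
        lines = [ln for ln in block.split("\n") if ln.strip()]
        if not lines:
            continue
        header = lines[0].split("\t")
        if len(header) != 2 or header[0] != "SessionID":
            raise ValueError(f"unexpected session header: {lines[0]!r}")
        session = Session(session_id=header[1])
        for line in lines[1:]:
            parts = line.split("\t")
            if len(parts) == 3:
                # Query line: query string, query ID, start time.
                query_text, query_id, start = parts
                session.queries.append(Query(query_text, query_id, float(start)))
            elif len(parts) == 5 and session.queries:
                # Result line (assumed layout, taken from the sample above).
                rank, doc_id, title, clicked, click_time = parts
                session.queries[-1].results.append(
                    Result(int(rank), doc_id, title, clicked == "1", float(click_time))
                )
            else:
                raise ValueError(f"unexpected line: {line!r}")
        sessions.append(session)
    return sessions
```

Distinguishing query lines from result lines by field count keeps the sketch short, but it relies on query strings and titles not containing tab characters.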

3) Each testing session is formatted as follows:

SessionID	8
Tensorflow	q64324	1596009033.208
1	d527264	TensorFlow	0	-1	2
2	d527265	TensorFlow教程:TensorFlow快速入门教程(非常详细)	0	-1	0
3	d527266	TensorFlow_百度百科	0	-1	0
4	d527267	TensorFlow - 机器学习系统	0	-1	0
5	d527267	TensorFlow - 机器学习系统	0	-1	0
6	d527268	tensorflow neural network playground - A Neural Network...	0	-1	2
7	d527269	GitHub - tensorflow/tensorflow: An Open Source Machine...	0	-1	0
8	d527270	TensorFlow入门极简教程(一) - 简书	0	-1	0
9	d527271	TensorFlow 如何入门,如何快速学习? - 知乎	0	-1	0
10	d527272	终于来了!TensorFlow 2.0入门指南(上篇)_机器学习算法..._CSDN博客	0	-1	0
Pytorch	q64325	1596009037.5
  • The first line: <SessionID><tab><session ID>, such as SessionID 8.
  • The first line of an observed or unobserved query: <query string><tab><query ID><tab><query start time>, such as Tensorflow q64324 1596009033.208.
  • Each remaining line of an observed query: <rank><tab><url><tab><document ID><tab><document title><tab><clicked><tab><click timestamp><tab><usefulness>, such as 1 d527264 TensorFlow 0 -1 2.
  • If the title of a document is unknown, it is represented by <unk>.
  • If a document is not clicked, its click timestamp is -1.
  • The usefulness ratings are on a 4-point scale (0-3), annotated by the first-tier search users. We provide this information so that participants can explore how much search systems improve when true and instant user feedback is available.
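To find which queries of a testing session need to be re-ranked, one option is to separate observed queries from unobserved ones. The sketch below assumes that an unobserved query appears as a query line with no result lines beneath it, as with Pytorch q64325 in the sample above; the function name is hypothetical and the rule is an inference from the sample, not a documented guarantee. Candidate documents for the unobserved queries would then be looked up in qid2docs.json.

```python
from typing import List, Optional, Tuple


def split_observed(block: str) -> Tuple[List[str], List[str]]:
    """Return (observed_query_ids, unobserved_query_ids) for one session block.

    A query line has exactly three tab-separated fields; any other non-empty
    line after the SessionID header is treated as a result line.
    """
    observed: List[str] = []
    unobserved: List[str] = []
    current_qid: Optional[str] = None
    has_results = False
    lines = [ln for ln in block.strip().split("\n") if ln.strip()]
    for line in lines[1:]:  # skip the SessionID header line
        parts = line.split("\t")
        if len(parts) == 3:
            # A new query starts: file the previous one first.
            if current_qid is not None:
                (observed if has_results else unobserved).append(current_qid)
            current_qid, has_results = parts[1], False
        else:
            has_results = True
    if current_qid is not None:
        (observed if has_results else unobserved).append(current_qid)
    return observed, unobserved
```

In a FOSS session this would be expected to yield exactly one unobserved query, while a POSS session may yield several.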

4) Each line in human_labels.txt is formatted as: <ID><tab><training session ID><tab><query ID><tab><document ID><tab><relevance><tab><valid>.
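A minimal sketch for loading these labels into a per-query lookup is shown below. The names are hypothetical, the relevance field is assumed to be an integer, and the valid field is kept as a raw string because its meaning is not specified here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class HumanLabel(NamedTuple):
    label_id: str
    session_id: str
    query_id: str
    doc_id: str
    relevance: int   # assumed to be an integer grade
    valid: str       # kept verbatim; semantics not documented here


def read_labels(lines: Iterable[str]) -> List[HumanLabel]:
    """Parse tab-separated label lines, skipping blank lines."""
    labels = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        label_id, session_id, query_id, doc_id, relevance, valid = line.split("\t")
        labels.append(
            HumanLabel(label_id, session_id, query_id, doc_id, int(relevance), valid)
        )
    return labels


def labels_by_query(labels: Iterable[HumanLabel]) -> Dict[str, Dict[str, int]]:
    """Group labels as {query_id: {doc_id: relevance}} for metric computation."""
    grouped: Dict[str, Dict[str, int]] = defaultdict(dict)
    for label in labels:
        grouped[label.query_id][label.doc_id] = label.relevance
    return dict(grouped)
```

With a file on disk, `read_labels(open(path, encoding="utf-8"))` would be the typical call; the encoding is likewise an assumption.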

Submission format

1) Each team can submit up to six NEW or REP runs for each subtask.
2) The submission file should be named [TEAMNAME]-{FOSS, POSS}-{NEW, REP}-[1-5].txt, such as THUIR1-FOSS-NEW-1.txt. For the organizers' convenience, the TEAMNAME must not contain any hyphen-minus (-). A NEW run uses a novel approach, while a REP run reproduces an existing model.
3) The first line of the file should be a short English description of the run, such as BM25F with Pseudo-Relevance Feedback. Each remaining line should be formatted as [SessionID]<tab>[QueryID]<tab>[QueryPosInSession]<tab>[DocumentID]<tab>[Rank]<tab>[Score]<tab>[RunName].



Please do not include more than 20 candidate documents per case!
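The naming rule and line format above could be checked and produced with a helper like the following. This is a sketch under assumptions: the function names are hypothetical, ranks are assumed to start at 1, scores are written with four decimals, and the QueryPosInSession and RunName values are supplied by the caller because their exact conventions are not spelled out here.

```python
import re
from typing import Iterable, List, Tuple

MAX_DOCS_PER_QUERY = 20  # limit stated in the submission rules

# [TEAMNAME]-{FOSS, POSS}-{NEW, REP}-[1-5].txt, with no hyphen in TEAMNAME.
RUN_NAME_RE = re.compile(r"^[^-]+-(FOSS|POSS)-(NEW|REP)-[1-5]\.txt$")

# (session_id, query_id, query_pos_in_session, [(doc_id, score), ...])
Ranking = Tuple[str, str, int, List[Tuple[str, float]]]


def is_valid_run_name(file_name: str) -> bool:
    """Check a file name against the stated naming pattern."""
    return RUN_NAME_RE.match(file_name) is not None


def format_run(description: str, run_name: str, rankings: Iterable[Ranking]) -> str:
    """Build the text of a run file: description line, then one line per document."""
    lines = [description]
    for session_id, query_id, query_pos, scored_docs in rankings:
        # Sort by descending score and keep at most 20 documents.
        top = sorted(scored_docs, key=lambda ds: ds[1], reverse=True)[:MAX_DOCS_PER_QUERY]
        for rank, (doc_id, score) in enumerate(top, start=1):
            lines.append("\t".join([
                str(session_id), query_id, str(query_pos),
                doc_id, str(rank), f"{score:.4f}", run_name,
            ]))
    return "\n".join(lines) + "\n"
```

Truncating inside the formatter keeps the 20-document limit from being exceeded by accident when a model scores every candidate.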

Submitting runs


Scores presented in the leaderboard are preliminary.


For each FOSS run, we calculate three metrics: nDCG@3, nDCG@5, and nDCG@10. Runs are ranked by their nDCG@3 scores.


For each POSS run, we calculate two metrics: RS_DCG and RS_RBP. Runs are ranked by their RS_DCG scores.


Yiqun Liu (Tsinghua University)
Jia Chen (Tsinghua University)
Weihao Wu (Tsinghua University)
Beining Wang (Tsinghua University)
Fan Zhang (Wuhan University)
Jiaxin Mao (Renmin University of China)
Weizhi Ma (Tsinghua University)
Xiaohui Xie (Tsinghua University)

Contact Email:
Please feel free to contact us! 😉


[1] Carterette, B., Kanoulas, E., Hall, M., & Clough, P. (2014). Overview of the TREC 2014 Session Track.
[2] Yang, G. H., & Soboroff, I. (2016). TREC 2016 Dynamic Domain Track Overview. In TREC.
[3] Zhang, F., Mao, J., Liu, Y., Ma, W., Zhang, M., & Ma, S. (2020, July). Cascade or Recency: Constructing Better Evaluation Metrics for Session Search. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 389-398).
[4] Chen, J., Mao, J., Liu, Y., Zhang, M., & Ma, S. (2019, November). TianGong-ST: A New Dataset with Large-scale Refined Real-world Web Search Sessions. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (pp. 2485-2488).
[5] Liu, M., Liu, Y., Mao, J., Luo, C., & Ma, S. (2018, June). Towards Designing Better Session Search Evaluation Metrics. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (pp. 1121-1124).
