

The Session Search (SS) task is a new pilot task in NTCIR-16 that supports intensive investigation of session search, or task-oriented search. Nowadays, users increasingly depend on search engines to gain useful information or to complete tasks. In complex search scenarios, a single query may not fully cover a user's information need, so users submit further queries to the search system within a short time interval until they are satisfied or give up. Such a search process is called a search session or task. As users' search intents may evolve within a session, their actions and decisions are also greatly affected. Going beyond ad-hoc search and considering the contextual information within sessions has been proven effective for user intent modeling in the IR community. However, no existing NTCIR task has yet involved session-based retrieval. To this end, we propose this new Session Search task to provide practical datasets as well as an evaluation methodology to researchers in related domains.


To better assess the search effectiveness at both query-level and session-level, we aim to set up two subtasks as follows:

  • Fully Observed Session Search (FOSS): For a k-length session, we provide the full session contexts of the first (k-1) queries. Participants need to re-rank the candidate documents for the last query of each session. This setting follows the TREC Session Tracks to enable ad-hoc evaluation using metrics such as nDCG, AP, and RBP.
  • Partially Observed Session Search (POSS): In this subtask, we truncate sessions before the last query. For a session with k queries (k ≥ 2), we retain only the session contexts of the first n queries, where 1 ≤ n ≤ k-1; the value of n varies across sessions. Participants need to re-rank documents for each of the last (k-n) queries according to the partially observed contextual information from previous search rounds. Session-level metrics such as RS-DCG and RS-RBP will be adopted to evaluate system effectiveness.
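The difference between the two settings can be sketched in a few lines of Python. This is purely illustrative: the session representation below is a hypothetical stand-in (a list of query strings), not the official task data format.

```python
# Illustrative sketch of the FOSS and POSS settings.
# A "session" here is simply a hypothetical list of k query strings.

def foss_view(session):
    """FOSS: full context for the first k-1 queries; re-rank the last one."""
    context, target = session[:-1], session[-1]
    return context, [target]

def poss_view(session, n):
    """POSS: context truncated to the first n queries (1 <= n <= k-1);
    re-rank every one of the remaining k-n queries."""
    assert 1 <= n <= len(session) - 1, "n must lie in [1, k-1]"
    context, targets = session[:n], session[n:]
    return context, targets

# A toy 4-query session:
session = ["q1", "q2", "q3", "q4"]
print(foss_view(session))       # (['q1', 'q2', 'q3'], ['q4'])
print(poss_view(session, n=2))  # (['q1', 'q2'], ['q3', 'q4'])
```

In POSS the truncation point n differs per session, so a system must handle re-ranking for a variable number of unseen query rounds.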
|  | NTCIR-16 Session Search (SS) | TREC Session Tracks |
| --- | --- | --- |
| Number of sessions | Chinese: 147,155 (training) + 1,124 (validation) + 2,356 (testing) | English: 76–1,257 |
| Session datasets | TianGong-ST; TianGong-SS-FSD; an unreleased field study dataset | Session Track 2011-2014 |
| Document collection | A collection provided by TianGong-ST, with more than 297,597 pages; a collection containing the top 50 candidate documents for all queries in the validation and testing sets | — |
| Source/generation of session data | Refined from a real search log or extracted from two large-scale field studies | Generated by real search users based on manually designed topics |
| Supports log analysis for annotation? | ✓ | × |
| Supports session-level evaluation? | ✓ | × |

Expected Results

We plan to attract at least 15 active participating teams. As this is the first year, we construct only Chinese-centric session collections. Participants can leverage the training set for either single-task or multi-task learning. For the testing collection, the number of sessions in each subtask is expected to exceed 1,000. After collecting run results from all teams, we will recruit assessors for relevance annotation and then use both query-level and session-level metrics for evaluation. Through these efforts, we hope to achieve a technological breakthrough in optimizing session-level document ranking.

Important Dates

All deadlines are at 11:59pm in the Anywhere on Earth (AOE) timezone.
Session Search registration due: Aug 1, 2021
Dataset Release: Aug 15, 2021
Formal Run: Oct 1, 2021 - Nov 30, 2021
Relevance Assessment: Dec, 2021 - Jan, 2022
Evaluation Result Release: Feb 1, 2022
Draft Task Overview Paper Release: Feb 1, 2022
Draft Participant Paper Submission Due: Mar 1, 2022
All Camera-ready Paper Submission due: May 1, 2022
NTCIR-16 Conference in NII, Tokyo, Japan: Jun 2022

Evaluation Measures

For the FOSS subtask, we adopt query-level metrics such as nDCG, AP, and RBP.
For the POSS subtask, we use session-level metrics such as RS-DCG and RS-RBP.
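For reference, the query-level metrics are easy to sketch. The snippet below is an illustrative implementation of nDCG and RBP under common textbook definitions (log2 rank discount for DCG; persistence parameter p for RBP), not the task's official evaluation script; the session-level RS-DCG and RS-RBP metrics of Zhang et al. [3] are omitted here.

```python
import math

def dcg(gains, k=None):
    """Discounted cumulative gain with a log2 rank discount."""
    gains = gains[:k] if k is not None else gains
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains, k=None):
    """nDCG: DCG normalized by the ideal (descending-gain) ranking."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0

def rbp(rels, p=0.8):
    """Rank-biased precision over binary relevance, persistence p."""
    return (1 - p) * sum(r * p ** i for i, r in enumerate(rels))

# Toy graded relevance labels for a ranked list of five documents:
rels = [3, 2, 0, 1, 0]
print(round(ndcg(rels, k=5), 4))                    # ≈ 0.9854
print(round(rbp([min(r, 1) for r in rels]), 4))     # ≈ 0.4624
```

The official evaluation will use its own implementation and parameter settings; treat these functions only as a way to sanity-check run files locally.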

Data and File format


Submitting runs
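Pending the official specification, a common convention in retrieval shared tasks is the six-column TREC run format (qid, Q0, docid, rank, score, runtag). The sketch below writes that format as an assumption only; the query/document identifiers and run tag are hypothetical, and the actual SS submission format may differ.

```python
# Hedged sketch: emit a ranked list in the six-column TREC run format.
# All identifiers below are hypothetical examples, not official task IDs.

def write_run(ranked, runtag, path):
    """ranked: dict mapping query id -> list of (docid, score), best first."""
    with open(path, "w", encoding="utf-8") as f:
        for qid, docs in ranked.items():
            for rank, (docid, score) in enumerate(docs, start=1):
                f.write(f"{qid} Q0 {docid} {rank} {score:.4f} {runtag}\n")

# Example: one query with two re-ranked candidate documents.
write_run({"1-1": [("d07", 2.31), ("d42", 1.88)]}, "THUIR-run1", "run.txt")
```

Each line gives one document for one query, ordered by descending score; tools such as trec_eval consume this format directly.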



Organizers

Yiqun Liu (Tsinghua University)
Jia Chen (Tsinghua University)
Fan Zhang (Tsinghua University)
Jiaxin Mao (Renmin University of China)
Weizhi Ma (Tsinghua University)

Contact Email:
Please feel free to contact us! 😉


References

[1] Carterette, B., Kanoulas, E., Hall, M., & Clough, P. (2014). Overview of the TREC 2014 Session Track.
[2] Yang, G. H., & Soboroff, I. (2016). TREC 2016 Dynamic Domain Track Overview. In TREC.
[3] Zhang, F., Mao, J., Liu, Y., Ma, W., Zhang, M., & Ma, S. (2020). Cascade or Recency: Constructing Better Evaluation Metrics for Session Search. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 389-398).
[4] Chen, J., Mao, J., Liu, Y., Zhang, M., & Ma, S. (2019). TianGong-ST: A New Dataset with Large-scale Refined Real-world Web Search Sessions. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (pp. 2485-2488).
[5] Liu, M., Liu, Y., Mao, J., Luo, C., & Ma, S. (2018). Towards Designing Better Session Search Evaluation Metrics. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (pp. 1121-1124).

Supported by