Introduction
The Session Search (SS) task is a new pilot task in NTCIR-16 that supports intensive investigation of session search, or task-oriented search. Nowadays, users increasingly depend on search engines to gain useful information or to complete tasks. In complex search scenarios, a single query may not fully cover the user's information need, so users submit more queries to the search system within a short time interval until they are satisfied or give up. Such a search process is called a search session or task. As users' search intents evolve within a session, their actions and decisions are also greatly affected. Going beyond ad-hoc search by considering the contextual information within sessions has proven effective for user intent modeling in the IR community. However, existing NTCIR tasks have not yet involved session-based retrieval. To this end, we propose this new Session Search task to provide practical datasets as well as evaluation methodology to researchers in related domains.
Subtasks
To better assess the search effectiveness at both query-level and session-level, we aim to set up two subtasks as follows:
- Fully Observed Session Search (FOSS): For a session of length k, we provide the full session contexts of the first (k-1) queries. Participants need to re-rank the candidate documents for the last query of the session. This setting follows the TREC Session Tracks and enables ad-hoc evaluation with metrics such as nDCG, AP, and RBP.
- Partially Observed Session Search (POSS): In this subtask, we truncate each session before its last query. For a session with k queries (k ≥ 2), we only retain the session contexts of the first n queries, where 1 ≤ n ≤ k-1; the value of n varies across sessions. Participants need to re-rank documents for each of the last k-n queries according to the partially observed contextual information from previous search rounds. Session-level metrics such as RS-DCG and RS-RBP will be adopted to evaluate system effectiveness.
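The difference between the two settings can be sketched in a few lines of Python. This is an illustrative sketch only; the list-of-queries session representation and the function names are our own, not part of the official data format:

```python
# Sketch of how the two subtasks expose a session's context.
# A session is modeled here simply as an ordered list of queries.

def foss_view(session):
    """FOSS: all k-1 preceding queries are observed;
    participants re-rank documents for the last query only."""
    *context, target = session
    return context, [target]

def poss_view(session, n):
    """POSS: only the first n queries (1 <= n <= k-1) are observed;
    participants re-rank documents for each of the remaining k-n queries."""
    assert 1 <= n <= len(session) - 1
    return session[:n], session[n:]

session = ["q1", "q2", "q3", "q4"]   # a 4-query session
observed, to_rank = poss_view(session, n=2)
# observed: ["q1", "q2"]; to_rank: ["q3", "q4"]
```

In FOSS the target is always a single query, while in POSS every truncated query becomes a ranking target, which is why session-level rather than query-level metrics are needed there.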
Differences between SS and previous related work
 | NTCIR-16 Session Search (SS) | TREC Session Tracks |
---|---|---|
Number of Sessions | In Chinese. Training: 147,154 sessions, with human relevance labels for the last query of 2,000 sessions. FOSS testing subtask: 1,817. POSS testing subtask: 1,203. | In English: 76~1,257 |
Session Datasets | Extracted from a Sogou search log and two field study datasets | Session Tracks 2011-2014 |
Document Collection | About 1,000,000 documents; an average of 81.23 candidate documents per query in the testing set | ClueWeb09/ClueWeb12 |
Source/Generation of session data | Refined from a search log from Sogou.com or extracted from two large-scale field studies | Generated by real search users based on manually designed topics |
Support from log analysis for annotation? | √ | × |
Support session-level evaluation? | √ | × |
Expected Results
We plan to attract at least 15 active participants. As this is the first year, we only construct Chinese-centric session collections. Participants can leverage the training set for either single-task or multi-task learning. As for the testing collection, the number of sessions in each subtask is expected to exceed 1,000. After collecting run results from all teams, we will recruit assessors for relevance annotation and then use both query-level and session-level metrics for evaluation. Through these efforts, we hope to enable a technological breakthrough in optimizing session-level document ranking.
Important Dates
All deadlines are at 11:59pm in the Anywhere on Earth (AOE) timezone.
Session Search registration Due: Aug 25, 2021
Dataset Release: Aug 15, 2021
Formal Run: Oct 1, 2021 - Dec 31, 2021
Relevance Assessment: Jan, 2022 - Feb, 2022
Evaluation Result Release: Mar 1, 2022
Draft Task Overview Paper Release: Mar 1, 2022
Draft Participant Paper Submission Due: April 1, 2022
All Camera-ready Paper Submission Due: Jun 1, 2022
NTCIR-16 Conference in NII, Tokyo, Japan: Jun 2022
Evaluation Measures
For the FOSS subtask, we adopt query-level metrics such as nDCG, AP, and RBP.
For the POSS subtask, we use session-level metrics such as RS-DCG and RS-RBP.
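As a concrete reference for the query-level evaluation, the following is a minimal sketch of nDCG@k using the common exponential-gain formulation (2^rel - 1 gain with a log2(rank+1) discount). The exact variant used by the official evaluation tool may differ:

```python
import math

def ndcg_at_k(relevances, k):
    """nDCG@k for one ranked list of graded relevance labels.

    Uses the common 2^rel - 1 gain and log2(rank + 1) discount;
    the task's official tool may use a different variant."""
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2)
                   for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Graded labels of the top-5 ranked documents (hypothetical values)
score = ndcg_at_k([3, 2, 0, 1, 2], k=5)
```

A perfectly ordered list scores 1.0, and a list with no relevant documents scores 0.0 by convention.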
The official evaluation tool is coming soon!
Data and File format
We provide three directories:
NTCIR-16-SS
|
|----- ./document_collection
| |----- ./doc [A collection with about 1,000,000 pages. Each directory contains about 10,000 files.]
| |----- qid2docs.json [A file mapping testing query IDs into their corresponding candidate document IDs.]
|
|----- ./sessions
| |----- ./training |----- training_sessions.txt
| |
| |----- ./testing
| |----- ./FOSS |------ testing_sessions_foss.txt
| |
| |----- ./POSS |------ testing_sessions_poss.txt
|
|----- ./training_human_labels |----- human_labels.txt
|----- README.md
1) For all session files, sessions are separated by two line breaks (`\n\n`).
2) Each training session is formatted as follows:
SessionID 87
----------------------------
画杨桃 q198 1427848224.93
1 http://www.lbx777.com/yw06/x_hyt/kewen.htm d1882 404 0 -1
2 http://pic.sogou.com/pics?query=%BB%AD%D1%EE%CC%D2&p=40230500&st=255&mode=255 d1883 <unk> 0 -1
3 http://tv.sogou.com/v?query=%BB%AD%D1%EE%CC%D2&p=40230600&tn=0&st=255 d1884 画杨桃-搜索页 0 -1
4 http://baike.sogou.com/v8080089.htm d1885 画杨桃 0 -1
5 http://www.lspjy.com/thread-112497-1-1.html d1886 人教版小学三年级下册语文《画杨桃》教学设计优质课教案 0 -1
6 http://weixin.qq.com/ d5 微信,是一个生活方式 0 -1
7 http://wenku.baidu.com/view/fa4903205901020207409c89.html d1887 【图文】画杨桃_百度文库 0 -1
8 http://www.21cnjy.com/2/8135/ d1888 画杨桃课件_ 0 -1
9 http://wenwen.sogou.com/s/?sp=S%E7%94%BB%E6%9D%A8%E6%A1%83 d1889 搜狗搜索 0 -1
10 http://www.aoshu.com/e/20090604/4b8bcabd28495.shtml d1890 画杨桃_三年级语文下册课件_奥数网 0 -1
----------------------------
画杨桃ppt课件 q199 1427848230.2
1 http://wenku.baidu.com/view/bfe0c8edf8c75fbfc67db205.html d1894 【图文】画杨桃ppt课件精品_百度文库 1 1427848232.105
2 http://www.1ppt.com/kejian/8846.html d1895 《画杨桃》PPT课件 0 -1
3 http://www.1ppt.com/kejian/8851.html d1896 《画杨桃》PPT课件6 0 -1
4 http://www.xiexingcun.com/xy6/HTML/53403.html d1897 《画杨桃》公开课ppt课件(24页)-免费高速下载 0 -1
5 http://renjiaoban.21jiao.net/sanxia/huayangtao/ d1898 <unk> 0 -1
6 http://yuwen.chazidian.com/kejian108520/ d1899 画杨桃ppt课件下载 0 -1
7 http://www.docin.com/d-239045.html d1900 <unk> 0 -1
8 http://www.glzy8.com/show/162484.html d1901 11画杨桃PPT课件_管理资源吧 0 -1
9 http://www.xiexingcun.com/xy6/HTML/17470.html d1902 《画杨桃》ppt课件-免费高速下载 0 -1
10 http://www.xiexingcun.com/xy6/HTML/61592.html d1903 《画杨桃》ppt课件【13页】-免费高速下载 0 -1
----------------------------
画杨桃ppt q200 1427848257.0
1 http://wenku.baidu.com/view/300cd979f242336c1eb95e2b.html d1904 【图文】画杨桃PPT_百度文库 1 1427848258.188
2 http://www.1ppt.com/kejian/8846.html d1895 《画杨桃》PPT课件 0 -1
3 http://wenku.baidu.com/view/e7e42c25b4daa58da0114ad4.html d1905 【图文】画杨桃ppt课件_百度文库 0 -1
4 http://www.xiexingcun.com/xy6/HTML/53403.html d1897 《画杨桃》公开课ppt课件(24页)-免费高速下载 0 -1
5 http://www.docin.com/d-239045.html d1900 <unk> 0 -1
6 http://www.xiexingcun.com/xy6/HTML/53401.html d1906 《画杨桃》ppt课件(19页)-免费高速下载 0 -1
7 http://www.landong.com/x_yw_117_3640.htm d1907 <unk> 0 -1
8 http://www.xiexingcun.com/xy6/HTML/61592.html d1903 《画杨桃》ppt课件【13页】-免费高速下载 0 -1
9 http://www.1ppt.com/kejian/8851.html d1896 《画杨桃》PPT课件6 0 -1
10 http://www.doc88.com/p-891111956128.html d1908 【精品】:画杨桃PPT 0 -1
- The first line: `<SessionID><tab><session ID>`, such as `SessionID 87`.
- The first line of a query: `<query string><tab><query ID><tab><query start time>`, such as `画杨桃 q198 1427848224.93`.
- Each remaining line of a query: `<rank><tab><url><tab><document ID><tab><document title><tab><clicked><tab><click timestamp>`, such as `2 http://www.1ppt.com/kejian/8846.html d1895 《画杨桃》PPT课件 0 -1`.
- If the title of a document is unknown, the document title is represented by `<unk>`.
- If a document is not clicked, its click timestamp is -1.
3) Each testing session is formatted as follows:
SessionID 8
----------------------------
Tensorflow q64324 1596009033.208
1 https://tensorflow.google.cn/ d527264 TensorFlow 0 -1 2
2 http://c.biancheng.net/tensorflow/ d527265 TensorFlow教程:TensorFlow快速入门教程(非常详细) 0 -1 0
3 https://baike.baidu.com/item/Tensorflow/18828108 d527266 TensorFlow_百度百科 0 -1 0
4 https://www.oschina.net/p/tensorflow?hmsr=aladdin1e1 d527267 TensorFlow - 机器学习系统 0 -1 0
5 https://www.oschina.net/p/tensorflow?hmsr=aladdin1e1 d527267 TensorFlow - 机器学习系统 0 -1 0
6 http://playground.tensorflow.org/ d527268 tensorflow neural network playground - A Neural Network... 0 -1 2
7 https://github.com/tensorflow/tensorflow d527269 GitHub - tensorflow/tensorflow: An Open Source Machine... 0 -1 0
8 https://www.jianshu.com/p/4665d6803bcf d527270 TensorFlow入门极简教程(一) - 简书 0 -1 0
9 https://www.zhihu.com/question/49909565 d527271 TensorFlow 如何入门,如何快速学习? - 知乎 0 -1 0
10 https://blog.csdn.net/l7h9ja4/article/details/92857163 d527272 终于来了!TensorFlow 2.0入门指南(上篇)_机器学习算法..._CSDN博客 0 -1 0
----------------------------
Pytorch q64325 1596009037.5
- The first line: `<SessionID><tab><session ID>`, such as `SessionID 8`.
- The first line of an observed/unobserved query: `<query string><tab><query ID><tab><query start time>`, such as `Tensorflow q64324 1596009033.208`.
- Each remaining line of an observed query: `<rank><tab><url><tab><document ID><tab><document title><tab><clicked><tab><click timestamp><tab><usefulness>`, such as `1 https://tensorflow.google.cn/ d527264 TensorFlow 0 -1 2`.
- If the title of a document is unknown, the document title is represented by `<unk>`.
- If a document is not clicked, its click timestamp is -1.
- The usefulness ratings are on a 4-point scale (0-3), annotated by the first-party search users. We provide this information to explore to what extent search systems can be improved when true, instant user feedback is available.
4) Each line in `human_labels.txt` is formatted as: `<ID><tab><training session ID><tab><query ID><tab><document ID><tab><relevance><tab><valid>`.
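A label file in this format can be read line by line. This is a hedged sketch; the function name and dictionary keys are our own, and the `valid` field is kept as a raw string since its value set is not specified above:

```python
def parse_human_labels(lines):
    """Parse human_labels.txt lines of the form
    <ID>\t<training session ID>\t<query ID>\t<document ID>\t<relevance>\t<valid>."""
    labels = []
    for line in lines:
        line = line.strip()
        if not line:
            continue                     # skip blank lines
        rid, sid, qid, did, rel, valid = line.split("\t")
        labels.append({"id": rid, "session_id": sid, "query_id": qid,
                       "doc_id": did, "relevance": int(rel),
                       "valid": valid})  # semantics not specified; kept raw
    return labels
```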
Submission format
1) Each team can submit up to six NEW or REP runs for each subtask.
2) The submission file should be named `[TEAMNAME]-{FOSS, POSS}-{NEW, REP}-[1-5].txt`, such as `THUIR1-FOSS-NEW-1.txt`. Note that, for the organizers' convenience, the TEAMNAME must not contain any hyphen-minus (`-`). A NEW run uses a novel approach, while a REP run reproduces an existing model.
3) As for the content, the first line should be a short English description of the particular run, such as `BM25F with Pseudo-Relevance Feedback`. The remaining lines should be formatted as `[SessionID]<tab>[QueryID]<tab>[QueryPosInSession]<tab>[DocumentID]<tab>[Rank]<tab>[Score]<tab>[RunName]`, for example:
1<tab>q3<tab>2<tab>d587602<tab>1<tab>27.73<tab>THUIR-FOSS-NEW-1
1<tab>q3<tab>2<tab>d587603<tab>2<tab>25.15<tab>THUIR-FOSS-NEW-1
Please do not include more than 20 candidate documents per case!
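Putting the rules above together, a run file could be rendered as follows. This is a sketch under our own naming (`format_run` is not an official tool), with hypothetical scores in the example:

```python
def format_run(description, rows, run_name):
    """Render a run in the submission format described above.

    rows: iterable of (session_id, query_id, query_pos_in_session,
    doc_id, rank, score). Documents ranked below 20 are dropped,
    per the 20-candidates-per-case limit."""
    lines = [description]                 # first line: English description
    for sid, qid, qpos, did, rank, score in rows:
        if rank > 20:
            continue
        lines.append(f"{sid}\t{qid}\t{qpos}\t{did}\t{rank}\t{score}\t{run_name}")
    return "\n".join(lines) + "\n"

# Example usage with two hypothetical ranked documents:
run = format_run("BM25 baseline",
                 [(1, "q3", 2, "d587602", 1, 27.73),
                  (1, "q3", 2, "d587603", 2, 25.15)],
                 "THUIR-FOSS-NEW-1")
```

The `<tab>` placeholders in the specification denote literal tab characters, which is what the f-string above emits.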
Submitting runs
LeaderBoard
Scores presented in the leaderboard are preliminary evaluation results.
FOSS
For each FOSS run, we calculate three metrics: nDCG@3, nDCG@5, and nDCG@10. Runs are ranked by their nDCG@3 scores.
Rank | Runname | Teamname | nDCG@3 | nDCG@5 | nDCG@10 |
---|---|---|---|---|---|
1 | THUIR2-FOSS-NEW-5 | THUIR2 | 0.095888 | 0.118521 | 0.137952 |
2 | RUCIR-FOSS-REP-21 | RUCIR | 0.040584 | 0.066036 | 0.104313 |
3 | RUCIR-FOSS-REP-31 | RUCIR | 0.036234 | 0.067676 | 0.126424 |
4 | SCIR-FOSS-NEW-1 | SCIR | 0.035558 | 0.048421 | 0.060942 |
5 | RUCIR-FOSS-REP-3 | RUCIR | 0.033972 | 0.042868 | 0.084889 |
6 | RUCIR-FOSS-REP-22 | RUCIR | 0.029319 | 0.052199 | 0.086403 |
7 | THUIR2-FOSS-NEW-2 | THUIR2 | 0.022508 | 0.038771 | 0.083301 |
8 | RUCIR-FOSS-REP-2 | RUCIR | 0.020233 | 0.027601 | 0.081060 |
9 | MM6-FOSS-REP-1 | MM6 | 0.018319 | 0.027590 | 0.031082 |
10 | THUIR2-FOSS-NEW-6 | THUIR2 | 0.016912 | 0.028026 | 0.068344 |
11 | MM6-FOSS-REP-3 | MM6 | 0.016017 | 0.020287 | 0.024557 |
12 | THUIR3-FOSS-REP-1 | THUIR3 | 0.015559 | 0.032767 | 0.055631 |
13 | MM6-FOSS-NEW-15 | MM6 | 0.015232 | 0.025305 | 0.041431 |
14 | THUIR2-FOSS-NEW-3 | THUIR2 | 0.013015 | 0.020961 | 0.062601 |
15 | MM6-FOSS-NEW-1 | MM6 | 0.012841 | 0.023075 | 0.041941 |
16 | THUIR1-FOSS-REP-1 | THUIR1 | 0.012249 | 0.032705 | 0.089727 |
17 | MM6-FOSS-NEW-2 | MM6 | 0.011770 | 0.019961 | 0.047947 |
18 | MM6-FOSS-REP-2 | MM6 | 0.011487 | 0.013464 | 0.017026 |
19 | RUCIR-FOSS-REP-1 | RUCIR | 0.010817 | 0.024832 | 0.074639 |
20 | MM6-FOSS-NEW-21 | MM6 | 0.009210 | 0.017580 | 0.036537 |
21 | MM6-FOSS-REP-4 | MM6 | 0.008060 | 0.017646 | 0.042782 |
22 | MM6-FOSS-NEW-3 | MM6 | 0.007464 | 0.015314 | 0.031138 |
-- | TEST-FOSS-REP-3 | unknown | 0.005495 | 0.013620 | 0.023823 |
POSS
For each POSS run, we calculate two metrics: RS_DCG and RS_RBP. Runs are ranked by their RS_RBP scores.
Rank | Runname | Teamname | RS_RBP | RS_DCG |
---|---|---|---|---|
1 | THUIR2-POSS-NEW-1 | THUIR2 | 0.023478 | 0.013074 |
2 | RUCIR-POSS-REP-3 | RUCIR | 0.003193 | 0.005304 |
3 | RUCIR-POSS-REP-2 | RUCIR | 0.001637 | 0.002602 |
4 | RUCIR-POSS-REP-1 | RUCIR | 0.001615 | 0.002501 |
5 | MM6-POSS-REP-2 | MM6 | 0.000602 | 0.000731 |
6 | MM6-POSS-REP-1 | MM6 | 0.000550 | 0.000866 |
Organisers
Yiqun Liu [yiqunliu@tsinghua.edu.cn] (Tsinghua University)
Jia Chen [chenjia0831@gmail.com] (Tsinghua University)
Weihao Wu [wuwh19@mails.tsinghua.edu.cn] (Tsinghua University)
Beining Wang [wang-bn19@mails.tsinghua.edu.cn] (Tsinghua University)
Fan Zhang [franky94@gmail.com] (Wuhan University)
Jiaxin Mao [maojiaxin@gmail.com] (Renmin University of China)
Weizhi Ma [mawz12@hotmail.com] (Tsinghua University)
Xiaohui Xie [xiexh_thu@163.com] (Tsinghua University)
Contact Email: session-search-org@googlegroups.com
Please feel free to contact us! 😉
References
[1] Carterette, B., Kanoulas, E., Hall, M., & Clough, P. (2014). Overview of the TREC 2014 session track. pdf
[2] Yang, G. H., & Soboroff, I. (2016). TREC 2016 Dynamic Domain Track Overview. In TREC. pdf
[3] Zhang, F., Mao, J., Liu, Y., Ma, W., Zhang, M., & Ma, S. (2020, July). Cascade or Recency: Constructing Better Evaluation Metrics for Session Search. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 389-398). pdf
[4] Chen, J., Mao, J., Liu, Y., Zhang, M., & Ma, S. (2019, November). TianGong-ST: A New Dataset with Large-scale Refined Real-world Web Search Sessions. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (pp. 2485-2488). pdf
[5] Liu, M., Liu, Y., Mao, J., Luo, C., & Ma, S. (2018, June). Towards designing better session search evaluation metrics. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (pp. 1121-1124). pdf