Welcome to the NTCIR-13 WWW Task!
Note: A new round of the We Want Web task (WWW2) has already started. You may find more information on the NEW WEBSITE. If you are interested in the WWW1@NTCIR-13 data for research purposes, you may contact the organizers for more information.
NTCIR-13 WWW is an ad hoc Web search task, which we plan to continue for at least three rounds (NTCIR 13-15). Information access tasks have diversified: currently there are various novel tracks/tasks at NTCIR, TREC, CLEF etc. This is in sharp contrast to the early TRECs, which had only a few tracks with the ad hoc track at the core. But is the ad hoc task a solved problem? It seems to us that researchers have moved on to new tasks not because they have completely solved the problem, but because they have reached a plateau. Ad hoc Web search, in particular, is still of utmost practical importance. We believe that IR researchers should continue to study and understand the core problems of ranked retrieval and advance the state of the art.
The main task of WWW is a traditional ad hoc task. We will also consider a session-based subtask. Pooling and graded relevance assessments will be conducted as usual. After releasing the evaluation results, we will work with participants to conduct an organized failure analysis, following the approach of the Reliable Information Access workshop. We believe that progress cannot be achieved if we just keep looking at mean performance scores. More information is provided in the following figure.
Tentative Schedule
Participation
Participants can…
To participate in the NTCIR-13 WWW task, please read “How to participate NTCIR-13 task” and register via the NTCIR-13 online registration page.
Organizers
If you have any questions, feel free to contact the organizers at ntcirwww@list.waseda.jp.
Task Design
Data
Query Topics
Web Collection
Baseline Runs
User Behavior Collection
For the Chinese subtask, we provide participants with a user behavior collection. The collection includes two parts.
{anonymized User ID} {query} {a list of URLs presented to the users} {clicked url}{some rel labels}
These user behavior data were collected from Sogou’s search logs in 2016. Due to privacy concerns, the user IDs are anonymized.
If you want to use these data, please contact Mr. Cheng Luo (luochengleo AT gmail.com) for further details. At the moment, the data are available only to We Want Web participants.
Time to Obtain Data
Submission
These submission instructions apply to both the Chinese and English subtasks. Submissions must be compressed (zip). Each participating team can submit up to 5 runs for each subtask. In each run, please submit up to 100 documents per query topic. We may use a cutoff, e.g. 10, in evaluation.
All runs should be generated completely automatically. No manual run is allowed.
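As an illustration of the packaging rules above, the following is a minimal sketch that bundles run files into one zip archive and enforces the limit of 5 runs per subtask. The function name `package_runs` and the archive layout (files stored flat, without directories) are assumptions made for this example, not requirements stated by the organizers.

```python
import zipfile
from pathlib import Path

MAX_RUNS_PER_SUBTASK = 5  # limit stated in the submission instructions


def package_runs(run_files, archive_path):
    """Write the given run files into a single zip archive.

    Hypothetical helper: raises ValueError if more than
    MAX_RUNS_PER_SUBTASK files are passed in.
    """
    run_files = [Path(p) for p in run_files]
    if len(run_files) > MAX_RUNS_PER_SUBTASK:
        raise ValueError(
            f"{len(run_files)} runs given; at most "
            f"{MAX_RUNS_PER_SUBTASK} are allowed per subtask"
        )
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in run_files:
            # Assumption: store each run under its bare file name.
            archive.write(path, arcname=path.name)
    return archive_path
```

Calling `package_runs(["THUIR-C-CU-Base-1.txt"], "THUIR-C.zip")` would produce one archive for the Chinese subtask; the archive name here is only a placeholder.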
Run Names
Run files should be named “<teamID>-{C,E}-{PU,CU,NU}-{Base,Own}-[priority].txt”
<teamID> is exactly the Team ID you registered with for NTCIR-13. You can contact the organizers if you have forgotten your team ID.
{C,E}: C for Chinese runs, E for English runs
{PU,CU,NU}: run types
{Base,Own}: runs based on the provided baseline runs are Base runs; all others are Own runs, i.e. the documents are retrieved by a ranking system you built yourself.
[priority]: Due to limited resources, we may not include all submitted runs in the result pool. Therefore, it is important that you indicate the order in which we should consider your runs for result pool construction. The priority should be between 1 and 5, where 1 is the highest priority.
e.g.
THUIR-C-CU-Base-1
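The naming convention above can be checked mechanically. The sketch below is one possible validator; the pattern `RUN_NAME_RE` and the function `parse_run_name` are hypothetical names, and the assumption that a team ID contains only letters and digits is ours, not the organizers'.

```python
import re

# Assumption: team IDs consist of letters and digits only (no hyphens).
RUN_NAME_RE = re.compile(
    r"^(?P<team>[A-Za-z0-9]+)"
    r"-(?P<lang>C|E)"
    r"-(?P<runtype>PU|CU|NU)"
    r"-(?P<source>Base|Own)"
    r"-(?P<priority>[1-5])$"
)


def parse_run_name(name):
    """Split a run name (with or without '.txt') into its components.

    Raises ValueError if the name does not follow the convention.
    """
    if name.endswith(".txt"):
        name = name[: -len(".txt")]
    match = RUN_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"invalid run name: {name!r}")
    fields = match.groupdict()
    fields["priority"] = int(fields["priority"])
    return fields
```

For the example name above, `parse_run_name("THUIR-C-CU-Base-1")` yields the team `THUIR`, language `C`, run type `CU`, source `Base`, and priority 1.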
Run Submission Format
For all runs in both subtasks, Line 1 of the submission file must be of the form:
<SYSDESC>[insert a short description in English here]</SYSDESC>
The rest of the file should contain lines of the form:
[TopicID] 0 [DocumentID] [Rank] [Score] [RunName]\n
At most 100 documents should be returned for each query topic.
For example, a run should look like this:
<SYSDESC>[insert a short description in English here]</SYSDESC>
0101 0 clueweb12-0006-97-23810 1 27.73 THUIR-C-CU-Base-1
0101 0 clueweb12-0009-08-98321 2 25.15 THUIR-C-CU-Base-1
0101 0 clueweb12-0003-71-19833 3 21.89 THUIR-C-CU-Base-1
0101 0 clueweb12-0002-66-03897 4 13.57 THUIR-C-CU-Base-1
……
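Before submitting, a run file can be checked against the format described above. The following is a rough sketch, not an official validator: `validate_run` is a hypothetical helper, and it assumes (based on the example lines) that the rank is a non-negative integer and the score is a number. It does not check rank ordering or document ID syntax.

```python
def validate_run(lines, max_docs_per_topic=100):
    """Return a list of problems found in a run file given as a list of lines."""
    problems = []
    header = lines[0].strip() if lines else ""
    if not (header.startswith("<SYSDESC>") and header.endswith("</SYSDESC>")):
        problems.append("line 1: missing <SYSDESC>...</SYSDESC> header")

    docs_per_topic = {}
    for number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        fields = line.split()
        if len(fields) != 6:
            problems.append(f"line {number}: expected 6 fields, got {len(fields)}")
            continue
        topic, zero, _doc_id, rank, score, _run_name = fields
        if zero != "0":
            problems.append(f"line {number}: second field must be 0")
        if not rank.isdigit():  # assumption: ranks are non-negative integers
            problems.append(f"line {number}: rank is not an integer")
        try:
            float(score)  # assumption: scores are numeric
        except ValueError:
            problems.append(f"line {number}: score is not a number")
        docs_per_topic[topic] = docs_per_topic.get(topic, 0) + 1

    for topic, count in docs_per_topic.items():
        if count > max_docs_per_topic:
            problems.append(
                f"topic {topic}: {count} documents exceeds the limit of {max_docs_per_topic}"
            )
    return problems
```

A file can be passed in with `validate_run(open(path, encoding="utf-8").read().splitlines())`; an empty result means that none of these particular checks failed.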
The submission deadline is July 16th, Japan time.
For each subtask, put all your runs in a single zip file and submit it via Dropbox. The submission links are as follows:
© copyright 2016-2017. All rights reserved by NTCIR-WWW organizers.