Welcome to the NTCIR-14 We Want Web-2 Task!

We Want Web 2 is an ad hoc Web search task, which we plan to continue for at least three rounds (NTCIR 13-15). Information access tasks have diversified: currently there are various novel tracks/tasks at NTCIR, TREC, CLEF etc. This is in sharp contrast to the early TRECs where there were only a few tracks, where the ad hoc track was at the core. But is the ad hoc task a solved problem? It seems to us that researchers have moved on to new tasks not because they have completely solved the problem, but because they have reached a plateau. Ad hoc Web search, in particular, is still of utmost practical importance. We believe that IR researchers should continue to study and understand the core problems of ranked retrieval and advance the state of the art.

The main task of WWW2 is still a traditional ad hoc task. Pooling and graded relevance assessments will be conducted as usual. After releasing the evaluation results, we will work with participants to conduct an organized failure analysis, following the approach of the Reliable Information Access workshop. We believe that progress cannot be achieved if we just keep looking at mean performance scores.

In We Want Web 2, we are excited to announce several new features compared with We Want Web at NTCIR-13. For the Chinese subtask, we will release a brand-new training dataset, Sogou-QCL, which contains 0.54 million queries and more than 9 million corresponding documents. For each query-doc pair, we provide 5 kinds of click labels generated by different click models. For all the documents, the title and the content have already been extracted with the help of our friend, Sogou.com. All you need to do is design your own ranking model!

For more details of Sogou-QCL, you may refer to our resource paper at SIGIR 2018.

Schedule

  • July 1, 2018 Test topics and Sogou-QCL released
  • Aug 31, 2018 WWW2 task registrations due
  • Sep 30, 2018 Run submissions due
  • Oct-Dec 2018 Relevance assessments
  • Jan 2019 Failure analysis workshop in Beijing
  • Feb 1, 2019 Evaluation Results and draft overview released
  • Mar 15, 2019 Participant paper submissions due
  • May 1, 2019 Camera-ready participant papers due
  • Jun 2019 NTCIR-14 Conference & EVIA 2019 at NII, Tokyo

Participation

Participants can…

  • Leverage the new Sogou-QCL data and Sogou-T corpus for Chinese web search
  • Evaluate your latest web search algorithms on the English ClueWeb12-B13 corpus with a new topic set
  • Quantify within-site and cross-site improvements across multiple NTCIR rounds
  • Through a collaboration across organisers and participants, discover what cannot be discovered in a single-site failure analysis
  • Conduct cross-language web search experiments by leveraging the intersection between our Chinese and English topic sets

To participate in the NTCIR-14 WWW task, please read "How to participate NTCIR-14 task" and register via the NTCIR-14 online registration page.

Organizers

  • Jiaxin Mao (Tsinghua University)
  • Tetsuya Sakai (Waseda University)
  • Yiqun Liu (Tsinghua University)
  • Cheng Luo (Tsinghua University)
  • Zhicheng Dou (Renmin University of China)
  • Chenyan Xiong (Carnegie Mellon University)
  • Peng Xiao (Waseda University)

If you have any questions, feel free to contact the organizers at www2org@list.waseda.jp.

Task Design

  • Chinese and English ad hoc web search that spans at least three rounds of NTCIR (NTCIR-13 through -15)
  • For the Chinese subtask, some user behavior data will be provided to improve search quality.
  • Evaluation measures will also take into account user behavior, along with traditional ranked retrieval measures.
  • Failure analysis pre-NTCIR workshop
  • NEW at WWW2: While WWW1 conducted relevance assessments based on the queries, WWW2 will provide a one-sentence DESCRIPTION field for each query, based on which relevance assessments will be conducted.
  • NEW at WWW2: to counter the problem of relevance assessment incompleteness with the pooling approach and to clarify the limitations of state-of-the-art automatic runs, we allow manual runs.

Data

Query Topics

  • We have 80 queries for the Chinese/English subtasks. About 25 topics will be shared between the two languages for possible future cross-language research.
  • The Chinese queries are sampled from the median-frequency queries collected from Sogou search logs, while the English queries are sampled from another international search engine’s logs.
  • The queries are organized in XML and can be downloaded here (Chinese, English).

Chinese Training Set

For the Chinese subtask, in this round of We Want Web, we provide a new training set, Sogou-QCL. Sogou-QCL contains two kinds of training data:

(1) The first set consists of traditional relevance assessments. It is made up of 1000 Chinese queries, and for each query Sogou-QCL contains about 20 query-doc relevance judgments. Each pair is annotated by three trained assessors. Sogou-QCL also provides the title and content extracted from the raw HTML.

(2) The second set consists of query click labels. Raw click logs often contain private user information, so we instead provide relevance scores estimated from the behavior of groups of users. More specifically, for each query-doc pair, we provide five kinds of weak relevance labels output by five popular click models: UBM, DBN, TCM, PSCM, and TACM. These click models utilize rich user behavior such as clicks, skips, and dwell time. Sogou-QCL contains more than half a million queries and more than 9 million documents. To the best of our knowledge, this is so far the largest free training collection for Chinese ranking problems.

Handling raw HTML content is always difficult. Therefore we provide fine-grained extracted content produced with the professional tools of Sogou.com. We hope this will save our participants some effort and help them focus on ranking model design.

Web Collection

  • For the Chinese Subtask, we adopt the new SogouT-16 as the document collection. SogouT-16 contains about 1.17B webpages, which are sampled from the index of Sogou. Considering that the original SogouT-16 might be difficult for some research groups to handle (almost 80TB after decompression), we have prepared a “Category B” version of SogouT-16, denoted “SogouT-16 B”. This subset contains about 15% of the webpages of SogouT-16 and will be used as the Web Collection. This Web Collection is free for research purposes. You can apply online and then send an email to Dr. Jiaxin Mao (maojiaxin AT gmail.com) to get it. SogouT-16 also has a free online retrieval/page rendering service; you will get an account after applying for SogouT-16.
  • For the English Subtask, we adopt ClueWeb12-B13 as the document collection. To obtain the ClueWeb12 corpus, you will have to sign an agreement first. This corpus is also free for research purposes; you only need to pay for the disks and the shipment. More information can be found at the ClueWeb12 homepage. If you have any questions, please contact Mr. Chenyan Xiong (cx AT cs.cmu.edu). If you already have a ClueWeb12 license, you can use your own copy to start the task directly. ClueWeb12 also has a free online retrieval/page rendering service, which can be used after the agreement is signed.
  • Note: You may feel that applying for the document collections costs too much. Don’t worry! We have a much easier plan for you. For both subtasks, we have baseline runs (the top 1000 documents for each query). For SogouT-16, you only need to sign an application form online, and we can send you the original documents. For ClueWeb12, you just need to sign an agreement with CMU, and we can send you the original documents.

Baseline Runs

  • For the Chinese Subtask, we have the top 1000 results for each topic. These results were obtained by our baseline system.
  • For the English Subtask, we also have the top 1000 results for each topic. The results were retrieved by Indri, a well-known open-source search engine.
  • The baselines are organized in standard TREC format.
  • The dataset will be released soon.
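
Since the baselines are in TREC format, a Base run can start by loading the baseline ranking per topic and then re-ranking it. The following is a minimal sketch, assuming each result line has the same six whitespace-separated columns as the submission format (topic ID, an unused column, document ID, rank, score, run name). The baseline files have not been released yet, so the exact layout may differ; the document IDs and the run name "baseline" below are made up for illustration.

```python
from collections import defaultdict


def parse_trec_run(lines):
    """Group TREC-format result lines into {topic_id: [(doc_id, rank, score), ...]}."""
    run = defaultdict(list)
    for line in lines:
        if line.startswith("<SYSDESC>"):
            continue  # description header, not a result line
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        topic_id, _unused, doc_id, rank, score, _run_name = parts
        run[topic_id].append((doc_id, int(rank), float(score)))
    # Keep each topic's results ordered by rank.
    for results in run.values():
        results.sort(key=lambda r: r[1])
    return dict(run)


# Hypothetical sample lines (not real baseline data).
sample = [
    "0001 0 doc-a 2 11.5 baseline",
    "0001 0 doc-b 1 12.0 baseline",
    "0002 0 doc-c 1 9.25 baseline",
]
baseline = parse_trec_run(sample)
```

The resulting dictionary gives the candidate documents for each topic, which a re-ranking model can then score.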

User Behavior Collection

For the Chinese Subtask, we provide a user behavior collection for the participants. The behavior collection includes two parts.

  • Training set: we have 200 queries. We provide users’ clicks, the URLs of presented results in Sogou’s query logs, and some relevance annotations. More specifically, each item in the training set has the form:

      {anonymized User ID} {query} {a list of URLs presented to the users} {clicked url} {some rel labels}
    
  • Test set: for the 80 queries used in the Chinese Subtask, we provide users’ clicks and the URLs of presented results in Sogou’s query logs. The data is organized in a similar way to the training set. This user behavior data was collected from Sogou’s search logs in 2016. Due to privacy concerns, the user IDs are anonymized.

If you want to use this data, please contact Dr. Jiaxin Mao (maojiaxin AT gmail.com) for further details. At the moment, this data is available only to We Want Web 2 participants.

Time to Obtain Data

  • For the Chinese corpus (Sogou-T) and user behavior dataset, we will first sign an agreement with you. After we receive the agreement (soft copy), we will send you the data. The delivery of disks takes about one week. The user behavior data can be sent to you online separately.
  • For the English corpus (ClueWeb12), the application and delivery take about 2-3 weeks. You will find more details about the application procedure on the official website.
  • We suggest that you start the data application procedure as early as possible.

Submission

These submission instructions apply to both the Chinese and English Subtasks. Submissions must be compressed (zip). Each participating team can submit up to 5 runs for each subtask. In a run, please submit up to 100 documents per topic. We may use a cutoff, e.g. 10, in evaluation.
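
The limits above can be checked before packaging. The following is a minimal sketch, assuming runs are held in memory as a mapping from file name to file text and that the first whitespace-separated field of each result line is the topic ID (as in the run format described later on this page). The function name `package_runs` and the sample run are hypothetical.

```python
import io
import zipfile

MAX_RUNS_PER_SUBTASK = 5
MAX_DOCS_PER_TOPIC = 100


def package_runs(runs):
    """Check run limits and return the bytes of a zip archive.

    runs: {file_name: file_text} for ONE subtask.
    """
    if len(runs) > MAX_RUNS_PER_SUBTASK:
        raise ValueError(f"{len(runs)} runs given, at most {MAX_RUNS_PER_SUBTASK} allowed")
    for name, text in runs.items():
        counts = {}
        for line in text.splitlines():
            if line.startswith("<SYSDESC>") or not line.strip():
                continue  # description header or blank line
            topic_id = line.split()[0]
            counts[topic_id] = counts.get(topic_id, 0) + 1
        too_long = [t for t, n in counts.items() if n > MAX_DOCS_PER_TOPIC]
        if too_long:
            raise ValueError(f"{name}: more than {MAX_DOCS_PER_TOPIC} documents for topics {too_long}")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in runs.items():
            archive.writestr(name, text)
    return buffer.getvalue()


# Hypothetical single-run submission.
example_runs = {
    "TEAM-E-NU-Own-1.txt": "<SYSDESC>example</SYSDESC>\n0001 0 doc-a 1 1.00 TEAM-E-NU-Own-1\n",
}
archive_bytes = package_runs(example_runs)
```

Writing `archive_bytes` to a `.zip` file gives one archive per subtask that is ready to upload.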

Runs should be generated completely automatically unless they are submitted as manual (MAN) runs, which are allowed at WWW2 (see Task Design).

Run Names

Run files should be named as

<teamID>-{C,E}-{PU, CU, NU, MAN}-{Base, Own}-[priority].txt

<teamID> is exactly the Team ID you used when registering for NTCIR-14. You can contact the organizers if you have forgotten your Team ID.

{C,E}: C for Chinese runs, E for English runs

{PU, CU, NU, MAN}: Run types

  • PU (presented URLs, without click info, are utilized for ranking)
  • CU (clicked URLs are utilized for ranking)
  • NU (no URLs are utilized for ranking)
  • MAN (anything that involves manual intervention is called a manual run)

{Base, Own}: runs that are based on the provided baseline runs are Base runs; otherwise they are Own runs, i.e. the documents are retrieved by a ranking system that you built yourself.

[priority]: Due to limited resources, we may not include all submitted runs in the result pool. Therefore, it is important for you to indicate the order in which we should consider your runs for result pool construction. The priority should be between 1 and 5, where 1 is the highest priority.

e.g.

THUIR-C-CU-Base-1
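
The naming scheme above can be checked mechanically. The following is a minimal sketch, assuming that team IDs consist only of letters and digits (the page does not state the allowed characters) and that names are written without spaces, as in the example. The function name `parse_run_name` is hypothetical.

```python
import re

# <teamID>-{C,E}-{PU,CU,NU,MAN}-{Base,Own}-[priority], optionally followed by ".txt".
# Assumption: team IDs contain only letters and digits.
RUN_NAME = re.compile(
    r"^(?P<team>[A-Za-z0-9]+)"
    r"-(?P<language>[CE])"
    r"-(?P<run_type>PU|CU|NU|MAN)"
    r"-(?P<source>Base|Own)"
    r"-(?P<priority>[1-5])$"
)


def parse_run_name(name):
    """Return the run name's components as a dict, or None if the name is invalid."""
    stem = name[:-4] if name.endswith(".txt") else name
    match = RUN_NAME.match(stem)
    return match.groupdict() if match else None


parsed = parse_run_name("THUIR-C-CU-Base-1")
```

A name with a priority outside 1 to 5 or an unknown run type yields `None`, which makes it easy to catch naming mistakes before submission.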

Run Submission Format

For all runs in both subtasks, Line 1 of the submission file must be of the form:

<SYSDESC>[insert a short description in English here]</SYSDESC>

The rest of the file should contain lines of the form:

[TopicID] 0 [DocumentID] [Rank] [Score] [RunName]\n

At most 100 documents should be returned for each query topic.

For example, a run should look like this:

<SYSDESC>[insert a short description in English here]</SYSDESC>
0101 0 clueweb12-0006-97-23810 1 27.73 THUIR-C-CU-Base-1
0101 0 clueweb12-0009-08-98321 2 25.15 THUIR-C-CU-Base-1
0101 0 clueweb12-0003-71-19833 3 21.89 THUIR-C-CU-Base-1
0101 0 clueweb12-0002-66-03897 4 13.57 THUIR-C-CU-Base-1
……
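
A run file in this format can be generated from per-topic scores. The following is a minimal sketch, assuming documents are ranked by descending score and scores are printed with two decimals as in the example above (the page does not prescribe a score precision). The function name `format_run` and the description text are hypothetical.

```python
def format_run(run_name, description, scored_docs, max_docs=100):
    """Build the text of a run file.

    scored_docs: {topic_id: [(doc_id, score), ...]} in any order.
    """
    lines = [f"<SYSDESC>{description}</SYSDESC>"]
    for topic_id in sorted(scored_docs):
        # Rank by descending score and keep at most max_docs documents per topic.
        ranked = sorted(scored_docs[topic_id], key=lambda d: d[1], reverse=True)[:max_docs]
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            lines.append(f"{topic_id} 0 {doc_id} {rank} {score:.2f} {run_name}")
    return "\n".join(lines) + "\n"


scores = {
    "0101": [
        ("clueweb12-0009-08-98321", 25.15),
        ("clueweb12-0006-97-23810", 27.73),
    ],
}
run_text = format_run("THUIR-C-CU-Base-1", "example description", scores)
```

Topic IDs are kept as strings so that leading zeros such as in 0101 are preserved.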

The submission deadline is Sep 30, 2018, Japan time.

For each subtask, you may put all your runs in a zip file and submit it via Dropbox. The submission links are as follows:

Chinese Subtask Submission Link

English Subtask Submission Link

Results

TBA