Russian Information Retrieval Evaluation Seminar
The Initiative
The Russian information retrieval evaluation initiative was launched in 2002 to
increase communication within, and support for, the community of researchers
(from both academia and industry) working on text retrieval for Russian-language
collections, by providing the infrastructure needed to evaluate information
retrieval methodologies. In particular, a series of Russian Information Retrieval
Evaluation Seminars (ROMIP seminars) is planned to be held yearly.
In many respects, ROMIP seminars are similar to other international information
retrieval evaluation events such as TREC, CLEF, and NTCIR. The launch of a new
initiative was motivated by several reasons:
the absence of publicly available Russian test collections;
relatively low interest in creating Russian-language tracks and collections within the existing evaluation initiatives (as far as we know, only CLEF'2003 had a Russian document collection, and it was rather small);
the low rate of participation of Russian research groups in the existing evaluation initiatives.
Like TREC, ROMIP runs in cycles and is overseen by a program committee
consisting of representatives from academia and industry. Given a collection and
a set of tasks, participants run their own systems on the data and submit the
results to the organizing committee. The collected results are judged
independently, and the cycle ends with a workshop for sharing experience and
discussing future plans.
However, we do not copy TREC tasks and methodology exactly. Instead, we adapt
them to our circumstances and combine them with other recent approaches in the
information retrieval evaluation domain.
The First Seminar
The first seminar was organized in 2003, with the final workshop attached to the
Russian Conference on Digital Libraries (St. Petersburg, October 2003). We
received nine applications for participation, but only seven teams were able to
complete the tasks on schedule. The RIRES'03 participants included several
important industry representatives, among them two major players in the Russian
web search market. Participation from academia was lower, probably because
research prototypes were not ready for the scale of the tasks and the deadlines
were tight.
ROMIP'2003 had two tracks: "adhoc" retrieval and Web site classification, using
a 7 GB+ subset of the narod.ru domain.
Queries for the "adhoc" track were selected from the daily log of the popular
Russian Web search engine Yandex (www.yandex.ru). To prevent fine-tuning of
results, participants were asked to run 15,000 queries and, for each query,
submit the first 100 results to the organizing committee. The queries used for
evaluation (about 50) were selected only after all participants had submitted
their results.
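The idea of hiding the evaluated queries inside a much larger query set can be illustrated with a minimal sketch. It assumes that the evaluation subset is drawn at random once all runs are in; the text above does not say how the roughly 50 queries were actually chosen, and the function names, the seed, and the run layout (a mapping from query id to a ranked list of document ids) are hypothetical.

```python
import random


def select_evaluation_queries(all_query_ids, sample_size=50, seed=2003):
    """Pick the evaluation subset after all runs have been submitted.

    Random sampling is an assumption made for this sketch; the seed only
    makes the illustration reproducible.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(sorted(all_query_ids), sample_size))


def truncate_run(run, depth=100):
    """Keep only the first `depth` results for every query in a run."""
    return {query_id: docs[:depth] for query_id, docs in run.items()}


# Participants see all 15,000 queries and cannot tell which ones will be judged.
all_queries = range(1, 15001)
evaluated = select_evaluation_queries(all_queries)

# A hypothetical run with 150 results for one query is cut to the first 100.
run = {1: [f"doc{i}" for i in range(150)]}
submitted = truncate_run(run)
```

Because a participant cannot know which queries will be judged, tuning a system to the evaluated subset would require tuning it to all 15,000 queries.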
To evaluate the results, we used a TREC-like pooling mechanism. However, our
evaluation procedure differed in several significant ways:
We collected multiple assessment judgments per query/document pair (at least two) to improve the recall estimate and reduce the influence of subjectivity.
To minimize discrepancies between the information needs reconstructed by different assessors, we used an "extended" version of the search problem specification. The extended version includes a native-language description of the expected results and was prepared when the queries to be evaluated were selected. Its purpose was to clarify the query and minimize the number of possible interpretations.
Evaluation of each query pool was shared among three assessors, each of whom provided judgments for 70% of the query-document pairs. This gives us more information about the assessors, which lets us use more sophisticated approaches to deduce the final judgments.
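The pooling step and the 70% split can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the allocation scheme (out of every 10 pooled documents, 1 goes to all three assessors and 3 go to each of the three possible pairs, so each assessor sees 1 + 3 + 3 = 7 of 10 and every document gets at least two judgments) is only one way to satisfy the numbers given above, the majority vote stands in for the unspecified "more sophisticated approaches", and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_pool(runs, query_id, depth=100):
    """Union of the top-`depth` documents returned for a query by every run."""
    pool = set()
    for run in runs:
        pool.update(run.get(query_id, [])[:depth])
    return sorted(pool)


def assign_to_assessors(pool, assessors=("A", "B", "C")):
    """Map each pooled document to the assessors who will judge it.

    Assumed scheme: in every block of 10 documents, the first goes to all
    three assessors and the other nine are spread evenly over the three
    assessor pairs. The share is exactly 70% per assessor when the pool
    size is a multiple of 10.
    """
    pairs = list(combinations(assessors, 2))
    assignment = {}
    for index, doc in enumerate(pool):
        slot = index % 10
        if slot == 0:
            assignment[doc] = tuple(assessors)
        else:
            assignment[doc] = pairs[(slot - 1) % 3]
    return assignment


def majority_label(labels):
    """Most common label, or None when the two most common labels are tied."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Two hypothetical runs for one query.
runs = [{"q1": ["d1", "d2"]}, {"q1": ["d2", "d3"]}]
pool = build_pool(runs, "q1")
assignment = assign_to_assessors(pool)
```

A tie (for example, two assessors who disagree) returns `None` here, which marks the pair as one that needs a further decision rule.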
The training set for the classification track was based on the existing Web
catalog of narod.ru sites. We selected about 170 categories from the second
level of the hierarchy; each selected category had at least 5 sample sites.
Participants were asked to assign a list of at most 5 categories to each of the
22,000 web sites in the collection. At the evaluation stage, all assignments in
17 selected categories were judged by at least two assessors.
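The two constraints of the classification track (at most 5 categories per site, and judging restricted to a selected subset of categories) can be sketched as below. The data layout (a mapping from site to a list of category names), the function names, and the example sites and categories are all hypothetical; the actual submission format is not described here.

```python
MAX_CATEGORIES = 5


def find_invalid_sites(assignments, known_categories):
    """Return sites whose category list breaks the track rules.

    A list is invalid if it has more than MAX_CATEGORIES entries, repeats
    a category, or names a category outside the known set.
    """
    invalid = []
    for site, categories in assignments.items():
        too_many = len(categories) > MAX_CATEGORIES
        repeated = len(set(categories)) != len(categories)
        unknown = any(c not in known_categories for c in categories)
        if too_many or repeated or unknown:
            invalid.append(site)
    return sorted(invalid)


def pairs_to_judge(assignments_by_team, selected_categories):
    """Collect the distinct (site, category) pairs that fall into the
    categories selected for evaluation, across all teams."""
    pairs = set()
    for assignments in assignments_by_team.values():
        for site, categories in assignments.items():
            for category in categories:
                if category in selected_categories:
                    pairs.add((site, category))
    return sorted(pairs)


# Hypothetical submissions from two teams.
known = {"music", "sport", "travel"}
team_a = {"site1.narod.ru": ["music", "sport"], "site2.narod.ru": ["travel"]}
team_b = {"site1.narod.ru": ["music"], "site2.narod.ru": ["sport", "cars"]}

bad_sites = find_invalid_sites(team_b, known)
to_judge = pairs_to_judge({"a": team_a, "b": team_b}, {"music", "travel"})
```

Each pair in `to_judge` would then go to at least two assessors, as described above.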
Contact us: [email protected].