Content Based Retrieval of Speech Documents using
Information Retrieval Techniques
This PhD thesis was supervised by
Dr. Ross Wilkinson,
who is now the Research Leader of the
Mathematical and Information Sciences of CSIRO, and
Dr. Justin Zobel,
Senior Lecturer of the
Department of Computer Science, RMIT.
This is ALWAYS under construction.
A modified abstract of my thesis
Spoken documents occur in forms such as voice mail, radio
broadcasts, dictation and court transcripts. Speech is also an
element of multimedia documents such as news broadcasts,
speech-annotated images, home videos, video conference transcripts, and films. Techniques for managing and retrieving
written documents are well understood. This thesis addressed
the problem of retrieving spoken documents.
The most common approach to retrieval of spoken documents is
to perform speech recognition first, creating transcriptions
of the speech data; techniques currently used in textual
information retrieval can then be used to search these transcriptions.
Three assumptions are made regarding the speech recognition
process: that words useful in retrieving relevant documents can
be recognised accurately; that the documents to be recognised
use language that has been modelled by the recognition system,
which improves accuracy; and that the computational resources
required for recognition are readily available.
In this thesis, we considered the circumstances where the above
assumptions are not necessarily valid. First, it may not be
possible to use an accurate word recogniser because the resources
may not be available. Second, the document collection could
contain a high proportion of out-of-vocabulary (OOV) words,
which can adversely affect retrieval when mis-recognised.
In the first part of this thesis, we investigated the effectiveness of phoneme n-gram retrieval, where spoken
documents were either recognised directly as phoneme sequences
or as words and later translated to phoneme sequences using a
pronouncing dictionary. We explored the feasibility of retrieval
using phoneme n-grams as well as the effect of using IR techniques
such as stopping, word boundary information, and combination of
evidence. The standard document collection of TREC was used to
evaluate our experiments. We found that phoneme n-grams had
little impact on retrieval effectiveness because there was
sufficient evidence from other words to allow the retrieval
of relevant documents.
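The idea of phoneme n-gram retrieval described above can be illustrated with a minimal sketch. This is not the system used in the thesis: the tiny pronouncing dictionary, the function names, the choice of trigrams, and the simple overlap score are all assumptions made for illustration. Words are translated to phonemes through the dictionary, overlapping n-grams are extracted from the flat phoneme sequence, and documents are ranked by the number of n-grams they share with the query.

```python
from collections import Counter

# Hypothetical toy pronouncing dictionary (ARPAbet-style symbols). A real
# system would use a full dictionary or the output of a phoneme recogniser.
PRONUNCIATIONS = {
    "speech": ["S", "P", "IY", "CH"],
    "retrieval": ["R", "IH", "T", "R", "IY", "V", "AH", "L"],
    "radio": ["R", "EY", "D", "IY", "OW"],
    "news": ["N", "UW", "Z"],
    "broadcast": ["B", "R", "AO", "D", "K", "AE", "S", "T"],
}


def to_phonemes(text):
    """Translate words to one flat phoneme sequence, skipping unknown words."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(PRONUNCIATIONS.get(word, []))
    return phonemes


def phoneme_ngrams(phonemes, n=3):
    """Return the overlapping phoneme n-grams as a Counter of tuples."""
    return Counter(
        tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)
    )


def score(query, document, n=3):
    """Count n-gram occurrences shared by query and document."""
    q = phoneme_ngrams(to_phonemes(query), n)
    d = phoneme_ngrams(to_phonemes(document), n)
    return sum(min(count, d[gram]) for gram, count in q.items())


def rank(query, documents, n=3):
    """Return (score, doc_id) pairs sorted from best to worst match."""
    scored = [(score(query, text, n), doc_id)
              for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    docs = {"d1": "radio news", "d2": "speech retrieval"}
    print(rank("speech", docs))
```

Because the phoneme sequence is flat, n-grams in this sketch may span word boundaries; restricting them to single words would be one way to bring in word boundary information.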
- Experiments in Spoken Document Retrieval using Phoneme N-grams
C. Ng, R. Wilkinson and J. Zobel,
in Speech Communication, Vol 32, Issue 1-2, Sept 2000, Pg 61 -- 77.
- The RMIT/CSIRO Ad Hoc, Q & A, Web, Interactive, and Speech Experiments at TREC 8
M. Fuller, M. Kaszkiel, S. Kimberley, C. Ng, R. Wilkinson, M. Wu
and J. Zobel,
in Proceedings of the Eighth Text REtrieval Conference (TREC-8),
Gaithersburg, MD, USA, Nov 1999, Pg 549 -- 564.
- TREC 7 Ad Hoc, Speech, and Interactive tracks at MDS/CSIRO
M. Fuller, M. Kaszkiel, D. Kim, C. Ng, J. Robertson, R. Wilkinson, M. Wu and J. Zobel,
in Proceedings of the Seventh Text REtrieval Conference (TREC-7),
Gaithersburg, MD, USA, Nov 1998, Pg 465 -- 474.
- Factors affecting Speech Retrieval
C. Ng, R. Wilkinson and J. Zobel,
Student Day paper, in Proceedings of the Seventh Australian Speech Science and Technology Conference (SST-98) which has been incorporated into the Fifth International Conference on Spoken Language Processing (ICSLP'98), Sydney, Australia, 30th Nov - 4th Dec 1998, Pg 45 -- 50.
- Speech Retrieval using Phonemes with Error Correction
C. Ng and J. Zobel,
extended abstract, Proceedings of the Twenty-First International ACM-SIGIR Conference on Research and Development in Information Retrieval, Melbourne, Australia, Aug 1998, Pg 365 -- 366.
- MDS TREC 6 Report
M. Fuller, M. Kaszkiel, C.L. Ng, P. Vines, R. Wilkinson and J. Zobel,
in Proceedings of the Sixth Text REtrieval Conference (TREC-6),
Gaithersburg, MD, USA, Nov 1997, Pg 241 -- 258.
The views expressed on this page are solely my own. Do email me at
[email protected] if you would like to comment.
Corinna NG
[email protected]
last modified 19th April 2001