Precise Understanding of Language by Computers
The PULC project has the following characteristics:
The PULC project contrasts with projects that aim at a superficial level of language "understanding" using so-called "broad coverage" text processing techniques. For example, current information retrieval, information extraction, and so-called "question answering" techniques essentially pattern-match the structure of a query against the structure of a text. The structure and content of the two must therefore closely resemble each other, and the only questions that can be addressed are of the factoid who-did-what-to-whom kind.
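The limitation can be illustrated with a deliberately crude sketch. The word-overlap score below is only a stand-in for pattern-matching techniques in general, not a description of any particular system, and the example sentence and questions are invented for illustration. A factoid question that resembles the text scores high, while a question that requires inference shares almost nothing with it.

```python
import re

def tokens(s):
    """Return the set of lowercase word tokens in a string."""
    return set(re.findall(r"[a-z]+", s.lower()))

def overlap_score(query, sentence):
    """Fraction of the query's tokens that also occur in the sentence."""
    q = tokens(query)
    return len(q & tokens(sentence)) / len(q)

# Invented example text and questions (assumptions, not from any corpus).
text = "John gave Mary a book."
factoid = "Who gave Mary a book?"
inference = "Does Mary now have something to read?"

print(overlap_score(factoid, text))    # high: the question mirrors the text
print(overlap_score(inference, text))  # low: only "Mary" is shared
```

The second question is easy for a human reader, yet nothing on the surface of the text connects to it; answering it takes knowledge about giving, having, and books.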
Representing and utilizing world knowledge in a computer is an extremely difficult endeavor. Doing so for all knowledge and all topics is impossible now and in the foreseeable future. In fact, this is an AI-complete problem (if it were possible, then all Artificial Intelligence tasks would be solved, since the computer would possess knowledge describing how to do any ordinary or sophisticated task, and so would be able to do it). Therefore, it is impossible to precisely understand arbitrary texts on arbitrary topics.
It is possible, nevertheless, to achieve precise understanding of particular texts and topics if the necessary background knowledge is fed into the computer. This is feasible for certain kinds of texts.
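As a toy illustration of what feeding in background knowledge might look like, the sketch below hand-encodes the meaning of one invented sentence ("John gave Mary a book.") together with two invented rules, and then answers a question the sentence never states directly. The fact format, the rules, and all function names are hypothetical; this is a minimal sketch of the general idea, not the PULC project's representation or method.

```python
# Hypothetical hand-written meaning of "John gave Mary a book."
facts = {("gave", "john", "mary", "b1"), ("book", "b1")}

def apply_background_knowledge(facts):
    """Add consequences of two illustrative rules until nothing new appears."""
    derived = set(facts)
    while True:
        new = set()
        for f in derived:
            # Rule 1 (assumed): after X gives Y a Z, Y has Z.
            if f[0] == "gave":
                new.add(("has", f[2], f[3]))
            # Rule 2 (assumed): a book is something one can read.
            if f[0] == "book":
                new.add(("readable", f[1]))
        if new <= derived:
            return derived
        derived |= new

def has_something_to_read(person, facts):
    """True if the derived facts say the person has a readable object."""
    closure = apply_background_knowledge(facts)
    return any(
        f[0] == "has" and f[1] == person and ("readable", f[2]) in closure
        for f in closure
    )

print(has_something_to_read("mary", facts))
print(has_something_to_read("john", facts))
```

Even this tiny case needs explicit knowledge about giving and about books, which suggests why the approach is practical only for restricted texts and topics.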
Of course, the entire field of Linguistics aims at mapping out the human knowledge of English and all other natural languages, and there has been decent progress in the past several decades in discovering and documenting this knowledge. However, there are still many gaps in this knowledge. Moreover, the knowledge is usually stated only semi-rigorously, and is not immediately applicable to a computational system. The PULC project aims at stating the linguistic knowledge in a comprehensive, consistent, and coherent format, so precise that it could be used by a computer.
On the other hand, there has been a lot of work in natural language processing (NLP) aiming at producing computer applications that do useful things with language input. But just as linguistic theories for the most part ignore computational issues, NLP work by and large ignores linguistic theory, and moreover emphasizes formalisms and techniques while ignoring the study of language phenomena themselves. The PULC project is founded on the belief that no amount of computational sophistication can replace a meticulous investigation of the subject matter itself (i.e., language).
In short, the PULC project aims at both computational rigor and a principled, scrupulous study of language phenomena.