NAIST NLT: Description

NAIST Natural Language Tools



NAIST-NLT (Nara Institute of Science and Technology Natural Language Tools) aims to provide a flexible natural language processing environment. The system consists of morphological analysers for Japanese and English, a compiler that translates a DCG into a bottom-up Chart parser, a visual interface for showing the partial results of the parsing process, and supporting programs for implementing natural language grammars. Modularity and extensibility are important features of the tools, and users can customize them in various ways.

Introduction

Practical natural language processing requires a number of resources, such as dictionaries, grammars, parsers for implementing them, and other tools for coping with various linguistic problems. A usable system is indispensable for the development of dictionaries and/or grammars, for designing lexical representation and grammar formalisms, and for building application systems.

NAIST Natural Language Tools (NAIST-NLT) intends to provide such an environment for researchers and developers of natural language systems. NAIST-NLT consists of morphological and syntactic parsers for general-purpose natural language analysis, and a visual interface that encapsulates the implementation details of the parsing programs.

Overview of the Tools

This section gives a brief overview of the components of the Natural Language Tools. Although they are meant to be used in an integrated way, each of the components can also be used as a stand-alone system.

JUMAN: Morphological Analyzer

The Japanese morphological analyzer is called JUMAN. The system is implemented in C (compiled with gcc) and, given a Japanese sentence, produces a lattice-like structure of Japanese morphemes. It works as a UNIX filter. In addition, an interface to SICStus Prolog is provided, so that the system can be invoked from Prolog and return a lattice of morphemes to the Prolog program. The attached dictionary contains about 120,000 entries.

The most important feature of the system is that the basic definition of the Japanese morphological grammar, such as the set of parts of speech, the inflection rules, and the connection rules of morphemes, can be modified by the user. Since a number of Japanese grammars have been proposed, this feature is indispensable.
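To make the notions of a morpheme lattice and of connection rules concrete, the following Python sketch builds a lattice for a romanized toy input and enumerates the segmentations that the connection rules allow. The dictionary, the part-of-speech names, and the connection table are invented for illustration; they are not JUMAN's actual data, rule format, or output format.

```python
from collections import defaultdict

# Hypothetical toy dictionary: surface form -> possible parts of speech.
DICTIONARY = {
    "kuruma": ["noun"],
    "kuru": ["verb"],
    "ma": ["noun"],
    "de": ["particle", "verb"],
    "iku": ["verb"],
}

# Hypothetical connection rules: (left part of speech, right part of speech).
# "BOS" marks the beginning of the sentence.
CONNECTIONS = {
    ("BOS", "noun"), ("BOS", "verb"),
    ("noun", "particle"), ("verb", "noun"),
    ("particle", "verb"),
}

def build_lattice(text):
    """Return every dictionary match as an edge (start, end, surface, pos)."""
    edges = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            surface = text[start:end]
            for pos in DICTIONARY.get(surface, []):
                edges.append((start, end, surface, pos))
    return edges

def paths(text, edges):
    """Enumerate edge sequences that cover the text and obey CONNECTIONS."""
    by_start = defaultdict(list)
    for edge in edges:
        by_start[edge[0]].append(edge)

    def extend(position, previous_pos):
        if position == len(text):
            yield []
            return
        for edge in by_start[position]:
            if (previous_pos, edge[3]) in CONNECTIONS:
                for rest in extend(edge[1], edge[3]):
                    yield [edge] + rest

    return list(extend(0, "BOS"))

lattice = build_lattice("kurumadeiku")
analyses = paths("kurumadeiku", lattice)
for analysis in analyses:
    print(" ".join(f"{surface}/{pos}" for _, _, surface, pos in analysis))
```

Because the dictionary and the connection table are plain data, replacing them changes the grammar without touching the lattice-building code, which is the kind of user-level customization the paragraph above describes.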

The English morphological analyzer deals with the inflection of English nouns, verbs and adjectives. Since systems differ in how they treat the information carried by inflection, the detailed information is assumed to be written in grammar rules by the user.
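As a rough illustration of this division of labour, the Python sketch below strips inflectional suffixes and returns each reading as a base form with a part of speech and a feature label, leaving the interpretation of the label to the grammar. The suffix rules, lexicon, and feature names are hypothetical and much simpler than a real analyzer; they are not taken from NAIST-NLT.

```python
# Hypothetical suffix rules: (suffix, replacement, part of speech, feature).
SUFFIX_RULES = [
    ("ies", "y", "noun", "plural"),
    ("s", "", "noun", "plural"),
    ("s", "", "verb", "3sg"),
    ("ed", "", "verb", "past"),
    ("ing", "", "verb", "progressive"),
    ("er", "", "adjective", "comparative"),
    ("est", "", "adjective", "superlative"),
]

# Hypothetical base-form lexicon: base form -> parts of speech.
LEXICON = {
    "walk": {"noun", "verb"},
    "city": {"noun"},
    "tall": {"adjective"},
}

def analyse(word):
    """Return every (base form, part of speech, feature) reading of a word."""
    # Uninflected readings come first, in a stable order.
    readings = [(word, pos, "base") for pos in sorted(LEXICON.get(word, ()))]
    for suffix, replacement, pos, feature in SUFFIX_RULES:
        if word.endswith(suffix):
            base = word[: len(word) - len(suffix)] + replacement
            # Keep the reading only if the lexicon licenses the base form.
            if pos in LEXICON.get(base, ()):
                readings.append((base, pos, feature))
    return readings

for word in ["walks", "cities", "tallest"]:
    print(word, analyse(word))
```

An ambiguous form such as "walks" yields both a noun and a verb reading; choosing between them is left to the grammar rules.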

SAX: Concurrent Chart Parser

The SAX parsing system transforms a Definite Clause Grammar (DCG) [Pereira 80] into a Prolog program that realizes a bottom-up Chart parser. The system is implemented as a collection of Prolog clauses directly derived from the DCG rules. They are called SAX clauses. A set of SAX clauses with the same predicate name corresponds to a grammatical phrase and defines a concurrent process. Parsing is performed through data communication between those processes.
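The following Python sketch illustrates bottom-up chart parsing on a toy grammar. It is a plain sequential, agenda-based parser and not the SAX translation into concurrent Prolog processes; the grammar, the lexicon, and the edge representation are assumptions made for illustration only.

```python
# Hypothetical toy grammar, in the spirit of DCG rules such as
#   s --> np, vp.
GRAMMAR = [
    ("s", ("np", "vp")),
    ("np", ("det", "n")),
    ("np", ("n",)),
    ("vp", ("v", "np")),
    ("vp", ("v",)),
]
# Hypothetical lexicon: one category per word.
LEXICON = {"the": "det", "dog": "n", "cat": "n", "sees": "v"}

def parse(words):
    """Bottom-up chart parsing; return the complete edges (category, start, end).

    Internally an edge is (lhs, remaining, start, end). It is complete when
    `remaining` is empty and active (partially recognized) otherwise.
    """
    chart = set()
    agenda = [(LEXICON[w], (), i, i + 1) for i, w in enumerate(words)]
    while agenda:
        edge = agenda.pop()
        if edge in chart:
            continue
        chart.add(edge)
        lhs, rest, start, end = edge
        if not rest:
            # Complete edge: start every rule whose first symbol it matches.
            for rule_lhs, rule_rhs in GRAMMAR:
                if rule_rhs[0] == lhs:
                    agenda.append((rule_lhs, rule_rhs[1:], start, end))
            # Extend active edges that end where this edge starts.
            for a_lhs, a_rest, a_start, a_end in chart:
                if a_rest and a_rest[0] == lhs and a_end == start:
                    agenda.append((a_lhs, a_rest[1:], a_start, end))
        else:
            # Active edge: combine with complete edges that start at its end.
            for c_lhs, c_rest, c_start, c_end in chart:
                if not c_rest and c_lhs == rest[0] and c_start == end:
                    agenda.append((lhs, rest[1:], start, c_end))
    return {(lhs, s, e) for lhs, rest, s, e in chart if not rest}

edges = parse("the dog sees the cat".split())
print(("s", 0, 5) in edges)
```

Active edges play a role loosely comparable to partially recognized rules waiting for more input; in SAX that waiting is expressed through communicating processes rather than an explicit agenda.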

The system is implemented in two levels: the first consists of the transformed grammar rules, and the second serves as an interface to other programs as well as to the user.

A number of supporting programs are provided to make it easy for users to implement their own grammars in the system, e.g., interface programs to the morphological analyzers and to the visual interface described in the next section, and a unification program for feature structures.

VisIPS: Visual Interface for Parsing Systems

The VisIPS system is a visual interface to parsing systems that shows partial parse results of the parser in an intuitive way.

The SAX parsing system works as a black box for users in that the user-defined DCG is transformed into a bottom-up concurrent Chart parser. Looking directly into the Prolog code of the running system is quite complicated, since the original grammar rules are transformed in a nontrivial way. During the development of a grammar or a dictionary, however, it is indispensable to have some way of understanding the system's behaviour. It should be noted that users are usually not interested in the details of the transformation. The system should therefore report only the behaviour that relates to the user-defined grammar and dictionary.

The VisIPS system was originally developed to monitor the behaviour of the SAX processes. It is, however, applicable to any phrase-structure-based parsing system. It shows occurrences of phrases in a triangular table. Two versions of the system have been developed. One is a batch system, in which the phrase structure information is written to a file in a predetermined format and VisIPS shows the results after the parsing process terminates. The other is an interactive system, in which each newly constructed phrase structure is presented immediately. Both versions are implemented in C on the X11R5 system. The current system uses the socket I/O facility of SICStus Prolog for data communication.

The figure shows the system in operation. In the background is the VisIPS triangular table, which shows the partial parse results in a layout like the one used in the CKY parsing algorithm. When the user points at a phrase name in a box with the mouse and clicks the button, a small window ((3) in the figure) appears; if there is syntactic ambiguity, the phrase is listed there once for each of its occurrences. Pointing at and clicking one of the phrase names in that window produces another window (4), which contains detailed information. In (4), the first box shows the position and length of the phrase, the name of the phrase, and its identifying number if there is ambiguity. The second box shows the internal information of the phrase. The third and fourth boxes show the list of parent phrases and the list of child phrases in the parse tree. Clicking the words "Children List" in the box brings up a parse tree in another window (5). The nodes in the parse tree are linked to the phrase names in the triangular table, so that clicking a node in the tree makes the corresponding phrase in the table blink.
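The triangular layout can be sketched in a few lines of Python. The code below groups phrase occurrences by span length and start position, as in a CKY table, and renders them as text with the longest span in the top row. The edge format, the example phrases, and the text rendering are assumptions for illustration; they do not reproduce the VisIPS file format or its X11 display.

```python
def triangular_table(edges, n):
    """Group phrase names by (length, start) for a sentence of n words."""
    table = {(length, start): []
             for length in range(1, n + 1)
             for start in range(n - length + 1)}
    for category, start, end in sorted(edges):
        table[(end - start, start)].append(category)
    return table

def render(table, n, width=8):
    """Draw the table as text, longest spans first; '.' marks an empty cell."""
    lines = []
    for length in range(n, 0, -1):
        cells = [",".join(table[(length, start)]) or "."
                 for start in range(n - length + 1)]
        lines.append("".join(cell.ljust(width) for cell in cells).rstrip())
    return "\n".join(lines)

# Hypothetical phrase occurrences (category, start, end) for three words.
EDGES = {("n", 0, 1), ("np", 0, 1), ("v", 1, 2), ("n", 2, 3),
         ("np", 2, 3), ("vp", 1, 3), ("s", 0, 3)}

table = triangular_table(EDGES, 3)
print(render(table, 3))
```

A cell that holds more than one phrase name corresponds to the situation in which the interface offers several entries to choose from.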

Description of the Tools

The following list shows the information that is common to the presented systems.
[Manuals] Japanese manuals are obtainable as NAIST technical reports. English manuals will be provided shortly.
[Price] free
[Limitation] no limitation
[FTP site] cactus.aist-nara.ac.jp: ftp
[Media] available on 8mm tape
[Format] UNIX file format (EUC for Japanese)
[Contact person] Yuji Matsumoto (matsu@is.aist-nara.ac.jp)
[Platform] Sun OS 4.1.3
[Implementation Languages] SICStus Prolog, gcc, X11R5
[Size] 15M bytes (approximate size of the whole tools)

Bibliography

[Matsumoto 93a] Matsumoto, Y., Kurohashi, S., Utsuro, T., Myoki, Y., and Nagao, M., "Japanese Morphological Analysis System JUMAN Manual, version 1.0," Nara Institute of Science and Technology, 1993.
[Matsumoto 93b] Matsumoto, Y., Den, Y., and Yanagi, K., "A Flexible Natural Language System with Concurrency and Meta-level Processing," Fourth International Workshop on Natural Language Understanding and Logic Programming, pp.146-157, 1993.
[Pereira 80] Pereira, F.C.N. and Warren, D.H.D., "Definite Clause Grammars for Language Analysis - A Survey of the Formalism and a Comparison with Augmented Transition Networks," Artificial Intelligence, Vol.13, pp.231-278, 1980.
___________________________________________________________________________
Written by Yuji Matsumoto