An Automatic Vocabulary Switching System Converting from Keyword Phrases Assigned by Searchers to Descriptors
Many searchers of online information retrieval systems are unfamiliar with the structure and content of thesauri and often have difficulty selecting adequate descriptors for their searches. A system that converts a set of searcher-assigned keywords to the corresponding descriptors would therefore be useful to them.

This paper describes the results of an experiment in developing an automatic vocabulary switching system. The system is designed to switch keywords in the fields of “System & Control Theory”, “Control Technology”, “Computer Hardware”, “Computer Software”, and “Computer Application”. Switching is carried out by means of a special conversion table.

For this experiment, 1506 descriptors and 659 nondescriptors were extracted from the INSPEC thesaurus tape. Since each descriptor has broader, narrower, related, and/or used-for terms, information about these relations was used to compile the conversion table. In addition, 1531 single words were extracted from the descriptors and nondescriptors to serve as the components of searcher-assigned keywords. The mapping between single words and descriptors was also kept in the same table. Thus, the table includes four kinds of relations among descriptors as well as the whole-part relation between words and descriptors. Taken together, it shows how strongly each single word is related to the descriptors in the INSPEC thesaurus.
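As a rough illustration of how such a conversion table might be compiled, the following Python sketch maps each single word to a vector of descriptor weights. The thesaurus entries, the `build_conversion_table` function, and all numeric weights are assumptions made for this example; the abstract does not state how the relations were actually weighted.

```python
from collections import defaultdict

# Hypothetical thesaurus records (illustrative terms, not taken from INSPEC).
# BT = broader, NT = narrower, RT = related, UF = used-for (nondescriptors).
THESAURUS = {
    "computer software": {"BT": [], "NT": ["operating systems"],
                          "RT": ["programming"], "UF": ["software"]},
    "operating systems": {"BT": ["computer software"], "NT": [],
                          "RT": [], "UF": ["supervisory programs"]},
    "programming": {"BT": [], "NT": [],
                    "RT": ["computer software"], "UF": []},
}

# Assumed weights: a word that is part of a descriptor (or of one of its
# used-for terms) counts fully; descriptors reached through BT/NT/RT count less.
WEIGHTS = {"PART": 1.0, "BT": 0.5, "NT": 0.5, "RT": 0.25}


def build_conversion_table(thesaurus, weights=WEIGHTS):
    """Return {single word: {descriptor: relatedness weight}}."""
    table = defaultdict(lambda: defaultdict(float))
    for descriptor, relations in thesaurus.items():
        # Whole-part relation: single words of the descriptor and its UF terms.
        words = set(descriptor.split())
        for nondescriptor in relations["UF"]:
            words.update(nondescriptor.split())
        for word in words:
            table[word][descriptor] += weights["PART"]
            # Spread a smaller weight to broader, narrower, and related terms.
            for kind in ("BT", "NT", "RT"):
                for other in relations[kind]:
                    table[word][other] += weights[kind]
    return {word: dict(vector) for word, vector in table.items()}


table = build_conversion_table(THESAURUS)
print(table["software"])
```

Each row of the resulting table plays the role of one word's relation vector over the descriptors.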

Input keywords are decomposed into single words, which are matched against the words in the table. The relation vectors of the matched words are then taken from the table and summed. Finally, the elements of the resultant vector, each corresponding to a descriptor, are sorted in descending order of their values, and the descriptors whose rank falls within the range specified by the searcher are output.
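The switching procedure described above can be sketched in Python as follows. This is a minimal illustration only: the table contents, the `switch_keywords` function, and the `top_n` parameter (standing in for the searcher-specified range) are hypothetical, and whitespace splitting is assumed as the way keywords are decomposed.

```python
from collections import defaultdict

# Hypothetical conversion table: single word -> {descriptor: weight}.
CONVERSION_TABLE = {
    "software": {"computer software": 1.0, "operating systems": 0.5},
    "operating": {"operating systems": 1.0, "computer software": 0.5},
    "systems": {"operating systems": 1.0, "control systems": 1.0},
}


def switch_keywords(keywords, table, top_n):
    """Convert searcher-assigned keyword phrases to ranked descriptors."""
    totals = defaultdict(float)
    for phrase in keywords:
        # Decompose each keyword phrase into single words.
        for word in phrase.lower().split():
            # Words absent from the table contribute nothing.
            for descriptor, weight in table.get(word, {}).items():
                # Sum the relation vectors of all matched words.
                totals[descriptor] += weight
    # Sort by descending score; ties are broken alphabetically (an assumption).
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


print(switch_keywords(["Operating systems software"], CONVERSION_TABLE, top_n=2))
```

Representing each relation vector as a sparse dictionary keeps the summation proportional to the number of matched entries rather than to the full set of descriptors.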

Actual switching was carried out successfully, and the performance of the system seems promising, although several problems remain to be solved.

