- A Clustering Method of Scientific Literature Based on Averaged Citation Multiplicity
- No.17, p.93-102
A clustering method is proposed for forming groups of articles within a specific scientific discipline while expressing their inter-relationships. The technique is based on similarity of citation and introduces a topological space, called here the “citation space”: each cited article forms one axis of the citation space, and a source article is regarded as a point in this space. The clustering measure is then defined as the scalar product for an arbitrary pair of articles. Furthermore, any group of source articles can be represented by a single point in the space, rather than by a set of points, with each coordinate given by the citation probability, within the group, of the article corresponding to that axis. Hierarchical clustering, on the other hand, restricts the amount of data that can be handled, because of its memory requirements and the complexity of the resulting dendrogram. The group representation above solves this problem by taking groups of articles, instead of individual papers, as the initial clusters for the hierarchical connection.
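The construction described above can be illustrated with a minimal sketch. It assumes binary coordinates for a single source article (1 on every axis it cites) and an unnormalized scalar product; the abstract does not say whether the measure is normalized, so both choices are assumptions. The function names and the reference labels `R1` to `R4` are hypothetical.

```python
from collections import Counter


def article_vector(cited_ids):
    """A source article as a point in citation space.

    Assumption: the coordinate is 1 on every axis (cited article) it cites.
    Stored sparsely as {cited_id: coordinate}.
    """
    return {cid: 1.0 for cid in set(cited_ids)}


def group_vector(articles):
    """A group of source articles as a single point in citation space.

    Each coordinate is the citation probability of that axis within the
    group: the fraction of the group's articles that cite it.
    `articles` is a list of reference lists, one per source article.
    """
    counts = Counter(cid for cited in articles for cid in set(cited))
    n = len(articles)
    return {cid: c / n for cid, c in counts.items()}


def scalar_product(u, v):
    """Unnormalized scalar product of two sparse points (an assumption)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v[k] for k, w in u.items() if k in v)


# Two source articles sharing two references.
a = article_vector(["R1", "R2", "R3"])
b = article_vector(["R2", "R3", "R4"])
print(scalar_product(a, b))  # 2.0

# A group of two articles becomes one point: R1 -> 0.5, R2 -> 1.0, R3 -> 0.5.
g = group_vector([["R1", "R2"], ["R2", "R3"]])
print(scalar_product(g, a))  # 2.0
```

Because a group is itself a point, the same `scalar_product` compares article with article, article with group, or group with group.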
This method is applied to 3505 articles in instrumentation/control engineering extracted from the Science Citation Index (1977); 225 initial clusters are formed, each consisting of the source articles that cite a specific article (axis). On the resulting dendrogram, the groups formed above a specified similarity are summarized and named according to their contents, giving 25 reduced clusters that are hierarchically connected. The major part of the clusters shows the theoretical development of control engineering, including 4 distinct groups of research in the USSR.
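The procedure can be sketched on toy data as follows. The abstract does not state how the axis articles are chosen or how the point of a merged cluster is computed, so this sketch makes two assumptions: the axes are given as input, and a merged cluster's point is recomputed from the union of its member articles. All names and the four-article data set are hypothetical.

```python
from collections import Counter
from itertools import combinations


def initial_clusters(citations, axes):
    """One initial cluster per axis: the source articles citing that axis."""
    clusters = []
    for axis in axes:
        members = frozenset(s for s, cited in citations.items() if axis in cited)
        if members:  # skip axes that no source article cites
            clusters.append(members)
    return clusters


def group_point(members, citations):
    """Point of a cluster: per-axis citation probability among its members."""
    counts = Counter(c for s in members for c in set(citations[s]))
    return {c: k / len(members) for c, k in counts.items()}


def similarity(u, v):
    """Unnormalized scalar product of two sparse points (an assumption)."""
    return sum(w * v[c] for c, w in u.items() if c in v)


def agglomerate(clusters, citations, threshold):
    """Merge the most similar pair until no pair reaches the threshold."""
    clusters = list(clusters)
    history = []  # (cluster_a, cluster_b, similarity) for each merge
    while len(clusters) > 1:
        points = [group_point(m, citations) for m in clusters]
        best, i, j = max(
            (similarity(points[i], points[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if best < threshold:
            break
        history.append((clusters[i], clusters[j], best))
        merged = clusters[i] | clusters[j]  # assumption: union of members
        clusters = [m for k, m in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, history


# Toy data: source article -> list of cited articles.
citations = {
    "A1": ["X", "P"],
    "A2": ["X", "P"],
    "A3": ["Y", "P"],
    "A4": ["Z", "Q"],
}
start = initial_clusters(citations, axes=["X", "Y", "Z"])
final, history = agglomerate(start, citations, threshold=0.5)
print(len(start), len(final))  # 3 2
```

Starting from groups rather than individual papers keeps the pairwise comparison small: in the study above it would involve 225 points instead of 3505.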
The essential difference between Garfield's clustering and the present method is that the former is based on a set-theoretical definition of the similarity measure, while the latter uses the notion of a topological space.