VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
NGUYEN KIM ANH
VIETNAMESE WORD CLUSTERING
AND ANTONYM IDENTIFICATION
Major: Computer science
Code: 60 48 01
MASTER THESIS OF INFORMATION TECHNOLOGY
SUPERVISOR: PhD. Nguyen Phuong Thai
Hanoi - 2013
Table of Contents
Acknowledgements ......................................................................................................................... 4
Abstract ........................................................................................................................................... 5
Chapter I - Introduction .............................................................................................................. 10
1.1. Word Similarity .............................................................................................................................. 11
1.2. Hierarchical Clustering of Words .................................................................................. 11
1.3. Function tags ................................................................................................................................... 12
1.4. Objectives of the Thesis ................................................................................................................. 13
1.5. Our Contributions .......................................................................................................................... 13
1.6. Thesis structure .............................................................................................................................. 14
Chapter II - Related Works ........................................................................................................ 15
2.1. Word Clustering ............................................................................................................................. 15
2.1.1. The Brown algorithm................................................................................................................ 15
2.1.2. Sticky Pairs and Semantic Classes ............................................................................. 17
2.2. Word Similarity .............................................................................................................................. 18
2.2.1. Approach .................................................................................................................................. 18
2.2.2. Grammar Relationships ............................................................................................................ 19
2.2.3. Results ...................................................................................................................................... 20
2.3. Clustering By Committee .............................................................................................................. 20
2.3.1. Motivation ................................................................................................................................ 21
2.3.2. Algorithm.................................................................................................................................. 21
2.3.3. Results ...................................................................................................................................... 23
Chapter III - Our approach ........................................................................................................ 25
3.1. Word clustering in Vietnamese ..................................................................................................... 25
3.1.1. Brown's algorithm .................................................................................................................... 25
3.1.2. Word similarity ......................................................................................................................... 26
3.2. Evaluating Methodology ............................................................................................................. 28
3.3. Antonym classes .............................................................................................................................. 31
3.3.1. Ancillary antonym .................................................................................................................... 31
3.3.2. Coordinated antonym ............................................................................................................... 32
3.3.3. Minor classes ............................................................................................................................ 33
3.4. Vietnamese functional labeling ..................................................................................................... 34
Chapter IV - Experiment............................................................................................................. 37
4.1. Results and Comparison ................................................................................................................ 37
4.2. Antonym frames ............................................................................................................................. 40
4.3. Effectiveness of Word Cluster feature in Vietnamese Functional labeling .............................. 42
4.4. Error analyses ................................................................................................................................. 43
4.5. Summarization ............................................................................................................................... 44
Chapter V - Conclusion and Future works................................................................................ 45
5.1. Conclusion ....................................................................................................................................... 45
5.2. Future works ................................................................................................................................... 45
Bibliography .......................................................................................................................................... 46
List of Figures
Figure 1. An example of Brown's cluster algorithm .......................................................... 16
Figure 2. An example of Vietnamese word cluster ............................................................ 26
Figure 3. The syntax tree of a sentence .............................................................................. 26
Figure 4. An example about Vietnamese word similarity ................................................. 28
Figure 5. Select word clusters by dictionary ...................................................................... 30
Figure 6. An example about sentences parses.................................................................... 35
Figure 7. The correctness of k-clusters .............................................................................. 38
List of Tables
Table 1. Results of CBC with discovering word senses .................................................... 24
Table 2. Results of CBC with document clustering ........................................................... 24
Table 3. Ancillary antonym frames.................................................................................... 32
Table 4. Coordinated antonym frames ............................................................................... 33
Table 5. Transitional antonym frames ............................................................................... 34
Table 6. An unlabeled corpus in Vietnamese .................................................................... 37
Table 7. The result of five initial clusters .......................................................................... 39
Table 8. The comparison between Word clustering and Word similarity ......................... 40
Table 9. The result of antonym frames .............................................................................. 41
Table 10. The relation of w1 and w2 pairs .......................................................................... 42
Table 11. The effectiveness of word cluster feature .......................................................... 43
Chapter I
Introduction
In recent years, statistical learning methods have been highly successful in natural
language processing tasks. Most machine learning algorithms used in natural language
processing are supervised, and they require labeled data. Such labeled data are often
created by hand or in other ways that are time-consuming and expensive. By contrast,
while labeled data is difficult to create by hand, unlabeled data is essentially free on the
Internet in the form of raw text. This raw text can easily be preprocessed for use in an
unsupervised or semi-supervised learning algorithm. Previous work has shown that using
unlabeled data in addition to traditional labeled data can improve performance (Miller et al.,
2004; Abney, 2004; Collins and Singer, 1999) [19][2][7].
In this thesis, I focus on word clustering algorithms over unlabeled data, mainly
applying two methods: word clustering with Brown's algorithm [22] and word similarity
following Dekang Lin [10]. Both methods are used to cluster the words of a corpus.
While Brown's method clusters a word based on its relationships with the words standing
immediately before and after it, Dekang Lin's method uses the grammatical relationships
among words. To compare the advantages and disadvantages of these two methods, I ran
them on the same corpus, using the same evaluation method and the same main words in
the clusters. The result of word clustering is a set of clusters, each containing words that
appear in the same contexts. This result was used as a feature in an application,
Vietnamese functional labeling, and I evaluated the influence of the word clusters when
they were used as features in this application; for example, word clusters were used to
ease the data sparseness problem of the head-word feature. In addition, I used a statistical
method to extract 20 antonym frames, which can be used to identify antonym classes in
the clusters.
In this chapter, I describe word similarity, hierarchical clustering of words, and their
applications in natural language processing. I also introduce function tags, the word
segmentation task, the objectives of the thesis, and our contributions. Finally, I describe
the structure of the thesis.
1.1. Word Similarity
The meaning of an unknown word can often be inferred from its context. Consider the
following examples:
A bottle of Beer is on the table.
Everyone likes Beer.
Beer makes you drunk.
The contexts in which the word Beer is used suggest that Beer could be a kind of
alcoholic beverage. Other alcoholic beverages may therefore occur in the same contexts
as Beer, and they may be related. Consequently, two words are similar if they appear in
similar contexts or if they are exchangeable to some extent. For example, "Tổng_thống"
(President) and "Chủ_tịch" (Chairman) are similar under this definition. In contrast,
"Kéo" (Scissors) and "Cắt" (Cut) are not similar under this definition, although they are
semantically related. Intuitively, if I can generate a good clustering, the words in each
cluster should be similar.
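This intuition can be made concrete with simple context-count vectors: words that occur in the same contexts end up with similar vectors. The following sketch (with invented toy sentences, not the thesis corpus) measures similarity as the cosine of context vectors:

```python
from collections import Counter
from math import sqrt

def context_vectors(sentences, window=1):
    """Map each word to a Counter of the words seen within `window` positions."""
    vecs = {}
    for sent in sentences:
        toks = sent.lower().split()
        for i, w in enumerate(toks):
            ctx = vecs.setdefault(w, Counter())
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if j != i:
                    ctx[toks[j]] += 1
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[f] for f, v in a.items() if f in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

sents = ["a bottle of beer is on the table",
         "a bottle of wine is on the table",
         "everyone likes beer",
         "everyone likes wine",
         "the table is heavy"]
vecs = context_vectors(sents)
print(round(cosine(vecs["beer"], vecs["wine"]), 2))   # identical toy contexts -> 1.0
```

Here "beer" and "wine" score highly because every toy sentence containing one also fits the other; real corpora need larger windows and association weighting rather than raw counts.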
1.2. Hierarchical Clustering of Words
In recent years, several algorithms have been proposed to automatically cluster words
based on a large unlabeled corpus, such as (Brown et al., 1992; Lin, 1998) [22][10].
Consider a corpus of T words, a vocabulary of V words, and a partition π of the
vocabulary. The likelihood L(π) of a bigram class model generating the corpus is given
by:

L(π) = I − H

Here, H is the entropy of the 1-gram word distribution, and I is the average mutual
information of adjacent classes in the corpus:

I = Σ_{c1,c2} Pr(c1c2) log [ Pr(c1c2) / ( Pr(c1) Pr(c2) ) ]

where Pr(c1c2) is the probability that a word in class c1 is followed by a word in class c2.
Since H does not depend on π, the partition that maximizes the average mutual
information also maximizes the likelihood L(π) of the corpus. Thus, the average mutual
information can be used to construct word clusters by repeating a merging step until the
number of clusters is reduced to a predefined number C.
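The quantity I above can be computed directly from class-bigram counts. A minimal sketch, using an invented toy corpus and a hand-made class map:

```python
from collections import Counter
from math import log

def avg_mutual_info(tokens, cls):
    """I = sum over adjacent class pairs of Pr(c1 c2) log( Pr(c1 c2) / (Pr(c1) Pr(c2)) )."""
    pairs = Counter((cls[a], cls[b]) for a, b in zip(tokens, tokens[1:]))
    left = Counter(c1 for c1, _ in pairs.elements())    # marginal of the first class
    right = Counter(c2 for _, c2 in pairs.elements())   # marginal of the second class
    n = sum(pairs.values())
    return sum((k / n) * log((k / n) / ((left[c1] / n) * (right[c2] / n)))
               for (c1, c2), k in pairs.items())

# invented toy class map and corpus
cls = {"the": "DET", "a": "DET", "dog": "N", "cat": "N", "runs": "V", "sleeps": "V"}
corpus = "the dog runs the cat sleeps a dog sleeps a cat runs".split()
print(avg_mutual_info(corpus, cls) > 0)   # adjacent classes are strongly correlated
```

Putting every word in a single class makes I exactly zero, which matches the intuition that a one-cluster partition carries no information about adjacency.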
1.3. Function tags
Functional tag labeling is an important processing step for many natural language
processing applications such as question answering, information extraction, and
summarization. Accordingly, research has focused on the function tagging problem,
since function tags carry additional semantic information beyond what syntactic labels
provide.
There are two kinds of tags in linguistics: syntactic tags and functional tags. For
syntactic tags there are many theories and projects for English, Spanish, Chinese, and
other languages; the main task in that line of research is to find parts of speech and tag
the corresponding constituents. Functional tags, by contrast, are abstract labels, because
they are not like syntactic labels. Whereas a syntactic label uses one notation for the
same kind of phrase everywhere, a functional tag represents the relationship between a
phrase and its utterance, and so it may change from one context to another, depending on
the phrase's neighbors. For example, consider the phrase "baseball bat". Syntactically
this phrase is a noun phrase (in most research annotated as NP). But its functional tag
might be a subject, as in this sentence:
This baseball bat is very expensive
In another case, its functional tag might be a direct object:
I bought this baseball bat last month
Or an instrument in a passive sentence:
That man was attacked with this baseball bat
Functional tags were addressed directly by Blaheta (2003) [13], and since then much
research has focused on how to assign functional tags to the constituents of a sentence.
This research problem is called the functional tag labeling problem, a class of problems
that aims at finding the semantic information of a phrase. To sum up, functional tag
labeling is defined as the problem of finding the semantic information of a group of
words and tagging them with a given annotation in context.
1.4. Objectives of the Thesis
Most successful machine learning algorithms are supervised and usually use labeled
data. Such labeled data are often created by hand, which is time-consuming and
expensive. Unlabeled data, on the other hand, is free: it can be obtained from
newspapers, websites, and other sources, and it exists as raw text on the Internet.
In this thesis, I investigate methods of clustering words using unlabeled data, which
is easily extracted from online sources. Among automatic clustering methods, I focus on
two: hierarchical word clustering with Brown's algorithm, and word similarity following
Dekang Lin. In addition, I propose a common evaluation method for both approaches
when they are applied to the same Vietnamese corpus.
The output of word clustering is used as a feature in natural language processing
tasks such as Vietnamese functional labeling, and I evaluate the influence of the word
clusters when they are used as features in this task.
1.5. Our Contributions
As discussed above, the main aim of this thesis is to cluster unlabeled Vietnamese
words. The contributions of this thesis are as follows:
• Firstly, I performed automatic word clustering on unlabeled Vietnamese data, with
a corpus of about 700,000 sentences.
• Secondly, I proposed an evaluation method for the resulting clusters, based on a
thesaurus dictionary and five criteria.
• Thirdly, I compared two clustering methods for Vietnamese: word clustering
following Brown and word similarity following Dekang Lin. I used the resulting clusters
as features in the Vietnamese functional labeling task to increase its accuracy. In
addition, I used a statistical method to extract 20 antonym frames, which can be used to
identify antonym classes in the clusters.
In summary, I have clustered about 700,000 Vietnamese sentences with a
hierarchical word clustering algorithm, used a Vietnamese thesaurus dictionary and five
criteria to evaluate the correctness of the clusters, and used the clusters as features in
NLP tasks such as Vietnamese functional labeling. Finally, I extracted 20 antonym
frames to identify antonym pairs in antonym classes.
1.6. Thesis structure
This section gives a brief outline of the thesis, as an overview of the chapters that
follow.
Chapter 2 – Related works
This chapter introduces recent research on word clustering, functional labeling, and
word segmentation.
Chapter 3 – Our approach
This chapter presents the methods I applied to cluster Vietnamese words, how I
evaluate the quality of the clusters after the word clustering process, how the clusters
are used as features in the Vietnamese functional labeling task, and how antonym
frames are extracted from the corpus.
Chapter 4 – Experiment
This chapter discusses the corpus I used for clustering and the tools applied in this
thesis. It also points out and analyzes some errors in erroneous clusters produced by the
word clustering process. Finally, it evaluates the influence of the clusters when they are
applied to the Vietnamese functional labeling task.
Chapter 5 – Conclusions and Future works
The last chapter gives a general conclusion about the advantages and limitations of
our work, and proposes future work to improve our model.
Finally, the bibliography lists the related research our work refers to.
Chapter II
Related Works
In this chapter, I introduce research from recent years on word clustering and word
similarity, namely: class-based n-gram models using Brown's algorithm [22], word
similarity [10], and clustering by committee [23].
2.1. Word Clustering
2.1.1. The Brown algorithm
Word clustering is considered here as a method for estimating the probabilities of low-
frequency events; one of its aims is predicting a word from the previous words in a
sample of text. For this task, the authors used the bottom-up agglomerative word
clustering algorithm of Brown et al. (1992) [22] to derive a hierarchical clustering of
words. The input to the algorithm is a corpus of unlabeled data containing the
vocabulary of words to be clustered. The output is a binary tree, as shown in Figure 1,
in which the leaves of the tree are the words of the vocabulary and each internal node is
interpreted as the cluster containing the words in its subtree. Initially, each word in the
corpus is placed in its own distinct cluster. The algorithm then repeatedly merges the
pair of clusters that maximizes the quality of the clustering result, with each word
belonging to exactly one cluster, until the number of clusters is reduced to a predefined
number, as follows:
Initial mapping: put each word in its own cluster
Compute the initial AMI of the collection
repeat
    for each pair of clusters do
        Merge the pair of clusters temporarily
        Compute the AMI of the collection
    end for
    Select and merge the pair of clusters with the minimum decrement of AMI
    Compute the AMI of the new collection
until the predefined number of clusters is reached
repeat
    Move each word to the cluster for which the resulting
    partition has the greatest AMI
until no further increase in AMI
Figure 1. An example of Brown's cluster algorithm
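The merging loop of the algorithm in Figure 1 can be sketched in a naive brute-force form: recompute the AMI for every candidate merge and keep the best. This is only an illustration on an invented toy corpus; Brown et al.'s actual implementation uses incremental updates to remain tractable on real vocabularies.

```python
from collections import Counter
from math import log

def ami(assign, bigrams):
    """Average mutual information of adjacent clusters under the map `assign`."""
    pairs = Counter()
    for (a, b), k in bigrams.items():
        pairs[(assign[a], assign[b])] += k
    n = sum(pairs.values())
    left, right = Counter(), Counter()
    for (c1, c2), k in pairs.items():
        left[c1] += k
        right[c2] += k
    return sum((k / n) * log(k * n / (left[c1] * right[c2]))
               for (c1, c2), k in pairs.items())

def brown_cluster(tokens, n_clusters):
    """Greedy agglomerative clustering: merge the pair that loses the least AMI."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    assign = {w: w for w in set(tokens)}              # one cluster per word
    while len(set(assign.values())) > n_clusters:
        clusters = sorted(set(assign.values()))
        candidates = [(c1, c2) for i, c1 in enumerate(clusters)
                      for c2 in clusters[i + 1:]]
        best = max(candidates,
                   key=lambda p: ami({w: p[0] if c == p[1] else c
                                      for w, c in assign.items()}, bigrams))
        assign = {w: best[0] if c == best[1] else c for w, c in assign.items()}
    return assign

toks = "the dog runs the cat runs a dog sleeps a cat sleeps".split()
assign = brown_cluster(toks, 3)
print(assign["dog"] == assign["cat"])   # interchangeable words end up together
```

Because "dog" and "cat" have identical left and right context distributions in this toy corpus, merging them loses no mutual information, so the greedy loop merges them first.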
To identify the quality of a clustering, the algorithm considers a training text of T
words t1...tT, a vocabulary of V words, and a partition π of the vocabulary. The
maximum likelihood estimates of the parameters of a 1-gram class model generating the
corpus are given by:

Pr(w | c) = C(w) / C(c)   and   Pr(c) = C(c) / T

where C(w) is the number of occurrences of the word w and C(c) is the number of words
of the training text belonging to the class c. Since c = π(w):

Pr(w) = Pr(w | c) · Pr(c) = C(w) / T

For a 1-gram class model, the choice of the partition π of the vocabulary therefore has no
effect. For a 2-gram class model, the sequential maximum likelihood estimates of the
order-2 parameters, which maximize Pr(t2...tT | t1), are given by:

Pr(c2 | c1) = C(c1c2) / Σc C(c1c)

Pr(c1c2) = Pr(c1) Pr(c2 | c1) = ( C(c1) / T ) × ( C(c1c2) / Σc C(c1c) )

where C(c1) and Σc C(c1c) are the number of occurrences of the class c1 in t1...tT and in
t1...tT−1, respectively. Let L(π) = (T − 1)⁻¹ log Pr(t2...tT | t1); then:

L(π) = Σ_{w1w2} ( C(w1w2) / (T − 1) ) log Pr(c2 | c1) Pr(w2 | c2)
     = Σ_w Pr(w) log Pr(w) + Σ_{c1c2} Pr(c1c2) log ( Pr(c2 | c1) / Pr(c2) )
     = I(c1, c2) − H(w)

in which H represents the entropy of the 1-gram word distribution, and I represents the
average mutual information of the adjacent classes c1 and c2.
2.1.2. Sticky Pairs and Semantic Classes
One aim of word clustering is to group words together based on the statistical
similarity of their surroundings; the information in a word's context can also be viewed
as a feature for grouping words together. For two words wa and wb, the mutual
information of the two words as an adjacent pair is:

MI(wa wb) = log [ Pr(wa wb) / ( Pr(wa) Pr(wb) ) ]

If wb follows wa and the mutual information of the pair (wa, wb) is greater than a
threshold, then (wa, wb) is called a sticky pair. Furthermore, let Pr_near(wa wb) be the
probability that a word chosen at random from the corpus is wa and that a second word,
chosen at random from a window of 1,001 words centered on wa but excluding the words
in a window of 5 centered on wa, is wb. If Pr_near(wa wb) is much larger than
Pr(wa) Pr(wb), then wa and wb are semantically sticky. Using Pr_near(wa wb), the
algorithm finds interesting classes such as:

we our us ourselves ours
question questions asking answer answers answering
performance performed perform performs performing
tie jacket suit
write writes writing written wrote pen
morning noon evening night nights midnight bed
attorney counsel trial court judge
problems problem solution solve analyzed solved solving
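Sticky-pair detection is straightforward to sketch: estimate the unigram and adjacent-bigram probabilities and threshold the mutual information. The corpus and the 2-bit threshold below are invented for illustration:

```python
from collections import Counter
from math import log2

def adjacent_mi(tokens):
    """MI(wa, wb) = log2( Pr(wa wb) / (Pr(wa) Pr(wb)) ) for each adjacent pair."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    n, m = len(tokens), len(tokens) - 1
    return {p: log2((k / m) / ((uni[p[0]] / n) * (uni[p[1]] / n)))
            for p, k in bi.items()}

toks = ("new york " * 5 + "the cat sat on the mat the cat ran").split()
mi = adjacent_mi(toks)
sticky = [p for p, v in mi.items() if v > 2.0]   # 2-bit threshold, chosen arbitrarily
print(("new", "york") in sticky)
```

With counts this small, rare pairs also clear the threshold, since low-frequency events inflate MI; a frequency cutoff would also be needed in practice.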
2.2. Word Similarity
2.2.1. Approach
The meaning of a word in a corpus can often be inferred from the other words in its
context. Two words are called similar if they appear in similar contexts or are
exchangeable to some extent. From the contexts in which the word "vodka" occurs, it can
be inferred that "vodka" is similar to "beer", "wine", "whisky", and so on. Consider the
following examples:

A bottle of vodka is on the table
Everyone likes vodka
Vodka makes you drunk
We make vodka out of corn

The similarity between two objects is defined as the amount of information
contained in the commonality between the objects, divided by the amount of
information in the descriptions of the objects (Lin, 1997) [11]. For two words wa and
wb, the similarity of the two words is:

sim(wa, wb) = log P(common(wa, wb)) / log P(describe(wa, wb))
To compute the similarity of two words in context, Dekang Lin used dependency
triples (w, r, w′), each containing two words and the grammatical relationship between
them in the input sentence. Here w is the word under consideration, r is the grammatical
relationship between w and w′, and w′ is a context word of w; the notation ||w, r, w′||
denotes the frequency count of the dependency triple (w, r, w′) in the parsed corpus.
When w, r, or w′ is the wild card (*), the frequency counts of all dependency triples
matching the rest of the pattern are summed. For example, ||uống, obj, *||
(||drink, obj, *||) is the frequency count of "uống-obj" (drink-obj) relationships in the
parsed corpus.
The description of a word w consists of the frequency counts of all dependency
triples matching the pattern (w, *, *). Let I(w, r, w′) denote the amount of information
contained in ||w, r, w′||; this value is computed as follows:

I(w, r, w′) = log [ ( ||w, r, w′|| × ||*, r, *|| ) / ( ||w, r, *|| × ||*, r, w′|| ) ]

Let T(w) be the set of pairs (r, w′) such that I(w, r, w′) is positive. The similarity
sim(wa, wb) of two words wa and wb is then computed as:

sim(wa, wb) = Σ_{(r,w) ∈ T(wa) ∩ T(wb)} [ I(wa, r, w) + I(wb, r, w) ]
              / [ Σ_{(r,w) ∈ T(wa)} I(wa, r, w) + Σ_{(r,w) ∈ T(wb)} I(wb, r, w) ]
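The similarity measure above can be sketched directly from a table of dependency-triple counts. The counts below are invented for illustration; T(w) is taken to be the features of w with positive information value, as in Lin's definition:

```python
from collections import Counter
from math import log

def info(counts, w, r, wp):
    """I(w, r, w') = log( ||w,r,w'|| * ||*,r,*|| / (||w,r,*|| * ||*,r,w'||) )."""
    wrw = counts[(w, r, wp)]
    r_all = sum(k for (_, r2, _), k in counts.items() if r2 == r)
    w_r = sum(k for (w2, r2, _), k in counts.items() if w2 == w and r2 == r)
    r_wp = sum(k for (_, r2, w2), k in counts.items() if r2 == r and w2 == wp)
    return log(wrw * r_all / (w_r * r_wp)) if wrw else 0.0

def lin_sim(counts, wa, wb):
    """Lin similarity over shared positive-information features."""
    T = lambda w: {(r, wp) for (w2, r, wp) in counts
                   if w2 == w and info(counts, w, r, wp) > 0}
    shared = T(wa) & T(wb)
    num = sum(info(counts, wa, r, w) + info(counts, wb, r, w) for r, w in shared)
    den = (sum(info(counts, wa, r, w) for r, w in T(wa)) +
           sum(info(counts, wb, r, w) for r, w in T(wb)))
    return num / den if den else 0.0

# toy dependency-triple counts (invented for illustration)
counts = Counter({
    ("beer", "obj-of", "drink"): 5, ("wine", "obj-of", "drink"): 4,
    ("beer", "mod", "cold"): 2,     ("wine", "mod", "red"): 3,
    ("car", "obj-of", "drive"): 6,  ("car", "mod", "fast"): 2,
})
print(lin_sim(counts, "beer", "wine") > lin_sim(counts, "beer", "car"))
```

"beer" and "wine" share the (obj-of, drink) feature, so their similarity is positive, while "beer" and "car" share nothing and score zero.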
2.2.2. Grammar Relationships
Dekang Lin used 8-grammar relationships to construct the dependency triples as
follows:
a) Subject and Subject-of (subj and subj-of): In this relationship, the central word is
verb, context word is noun or pronoun (as subject). The signs on the syntax trees
are: parent node as S, central node as VP, context node as NP.
b) Object and Object-of (Obj and Obj-of): In this relationship, central word is verb,
context word is noun (as object). The signs are: parent node as VP, central node as
V, context node as NP.
19
c) Complement and Complement-of (Mod and Mod-of): In this relationship, central
word is noun, context word is modifiers for nouns (modifiers maybe N, A or V).
The signs on the tree are: parent node as NP, central node as N, context node as N,
A or V.
d) Prepositional object (Proj and Proj-of): In this relationship, central word is
preposition (E), context word is noun. The signs are: parent node as PP, central
node as E, context node as NP.
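Extracting dependency triples from a constituency tree, as described for the subject relation in (a), can be sketched as a pattern match over the tree. The tree encoding, the crude first-leaf head rule, and the example parse below are all simplifications invented for illustration:

```python
# A constituency tree is encoded as (label, children...); a leaf is (POS, word).
def leaves(t):
    if len(t) == 2 and isinstance(t[1], str):
        return [t]
    return [l for child in t[1:] for l in leaves(child)]

def head(t, pos):
    """First leaf under t whose POS tag starts with `pos` (a crude head rule)."""
    return next((w for p, w in leaves(t) if p.startswith(pos)), None)

def subj_triples(t, out=None):
    """Emit (verb, 'subj', noun) for each S node with an NP and a VP child."""
    out = [] if out is None else out
    if isinstance(t[1], str):          # a leaf: nothing to match
        return out
    if t[0] == "S":
        np = next((c for c in t[1:] if c[0] == "NP"), None)
        vp = next((c for c in t[1:] if c[0] == "VP"), None)
        if np and vp:
            out.append((head(vp, "V"), "subj", head(np, "N")))
    for child in t[1:]:
        subj_triples(child, out)
    return out

# "Tổng_thống phát_biểu" (the president gives a speech) -- toy parse
tree = ("S", ("NP", ("N", "Tổng_thống")), ("VP", ("V", "phát_biểu")))
print(subj_triples(tree))   # [('phát_biểu', 'subj', 'Tổng_thống')]
```

The other relation types in (b)-(d) would be matched the same way, with their own parent/central/context node patterns.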
2.2.3. Results
Dekang Lin extracted about 56.5 million dependency triples from a parsed
64-million-word corpus containing the Wall Street Journal (24 million words), the San
Jose Mercury (21 million words), and the AP Newswire (19 million words). The
experiment computed the pairwise similarity between all nouns, all verbs, and all
adjectives/adverbs. For each word, Dekang Lin created a thesaurus entry containing the
top N words most similar to it, in the form:

w (pos): w1, s1, w2, s2, ..., wN, sN

where pos is a part of speech, wi is a word, and si = sim(w, wi). The top-10 words in
the noun, verb, and adjective entries for the word "brief" are as follows:
brief (noun): affidavit 0.13, petition 0.05, memorandum 0.05, motion 0.05, lawsuit
0.05, deposition 0.05, slight 0.05, prospectus 0.04, document 0.04, paper 0.04, ...
brief (verb): tell 0.09, urge 0.07, ask 0.07, meet 0.06, appoint 0.06, elect 0.05, name
0.05, empower 0.05, summon 0.05, overrule 0.04, ...
brief (adjective): lengthy 0.13, short 0.12, recent 0.09, prolonged 0.09, long 0.09,
extended 0.09, daylong 0.08, scheduled 0.08, stormy 0.07, planned 0.06, ...
2.3. Clustering By Committee
Clustering by Committee (CBC) addresses the general goal of clustering: grouping
data elements so that intra-group similarities are high and inter-group similarities are
low. CBC comes in two versions: a hard clustering version, in which each element is
assigned to only one cluster, and a soft clustering version, in which elements can be
assigned to multiple clusters.
2.3.1. Motivation
Clustering By Committee is motivated by the desire to automatically extract concepts
and word senses from large unlabeled collections of text. In previous research, word
senses were usually defined using a manually constructed lexicon such as WordNet.
Such lexicons have several problems: manually created lexicons often contain rare
senses, and they miss many domain-specific senses. One way to address these problems
is to use a clustering algorithm to automatically induce semantic classes (Lin and
Pantel, 2001) [9].
Many clustering algorithms represent a cluster by the centroid of all its members (as
in K-means) or by a representative element. For example, when clustering words, the
contexts of the words can serve as features for grouping them. In CBC, the centroid of a
cluster is constructed by averaging the feature vectors of a subset of the cluster's
members; this subset is viewed as a committee that determines which other elements
belong to the cluster.
2.3.2. Algorithm
The CBC algorithm consists of three phases. In Phase I, the algorithm computes each
element's top-k similar elements, for some small value of k. In Phase II, it constructs a
collection of tight clusters using the top-k similar elements from Phase I; the elements of
each such cluster form a committee. In the final phase, each element is assigned to its
most similar clusters.
2.3.2.1. Phase I: Find top-similar elements
To compute the top similar elements of an element e, the algorithm first sorts e's
features by their point-wise mutual information values and considers only a subset of the
features with the highest mutual information. It then computes the pairwise similarity
between e and the elements that share a feature from this subset.
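Phase I can be sketched as follows: weight each element's features by point-wise mutual information, then rank the other elements by cosine similarity. The feature counts are invented, and for brevity this sketch compares full PMI vectors rather than restricting each element to its highest-PMI features first:

```python
from math import log, sqrt

def pmi_vector(freqs, feat_totals, total, elem_total):
    """Keep only positive point-wise mutual information weights for one element."""
    return {f: log(k * total / (elem_total * feat_totals[f]))
            for f, k in freqs.items() if k * total > elem_total * feat_totals[f]}

def cosine(a, b):
    dot = sum(v * b[f] for f, v in a.items() if f in b)
    return (dot / (sqrt(sum(v * v for v in a.values())) *
                   sqrt(sum(v * v for v in b.values())))) if a and b else 0.0

def top_k_similar(freq_by_elem, k):
    """For each element, rank the others by cosine over PMI-weighted features."""
    feat_totals = {}
    for freqs in freq_by_elem.values():
        for f, c in freqs.items():
            feat_totals[f] = feat_totals.get(f, 0) + c
    total = sum(feat_totals.values())
    vecs = {e: pmi_vector(fr, feat_totals, total, sum(fr.values()))
            for e, fr in freq_by_elem.items()}
    return {e: sorted((x for x in vecs if x != e),
                      key=lambda x: -cosine(vecs[e], vecs[x]))[:k]
            for e in vecs}

# invented feature counts (word -> context feature -> count)
freq = {"beer": {"drink": 4, "cold": 2},
        "wine": {"drink": 3, "red": 2},
        "car":  {"drive": 5, "fast": 2}}
top = top_k_similar(freq, 1)
print(top["beer"])   # ['wine']
```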
2.3.2.2. Phase II: Find committees
The second phase of the clustering algorithm recursively finds tight clusters
scattered in the similarity space. In each recursive step, the algorithm finds a set of tight
clusters called committees. A committee covers an element if the element's similarity to
the centroid of the committee exceeds some high similarity threshold. Phase II proceeds
as follows:
• Input: a list of elements E to be clustered, a similarity database S from Phase I,
and thresholds θ1 and θ2.
• Step 1: For each element e ∈ E, cluster the top similar elements of e from S
using average-link clustering. For each discovered cluster c, compute the score
|c| × avgsim(c), where |c| is the number of elements in c and avgsim(c) is the
average pairwise similarity between the elements of c. Store the highest-scoring
cluster in a list L.
• Step 2: Sort the clusters in L in descending order of their scores.
• Step 3: Let C be a list of committees, initially empty. For each cluster c ∈ L in
sorted order, compute the centroid of c by averaging the frequency vectors of
its elements, and compute the mutual information vector of the centroid in the
same way. If c's similarity to the centroid of every committee previously added
to C is below the threshold θ1, add c to C.
• Step 4: If C is empty, the algorithm is done; return C.
• Step 5: For each element e ∈ E, if e's similarity to every committee in C is
below the threshold θ2, add e to a list of residues R.
• Step 6: If R is empty, the algorithm is done; return C. Otherwise, return the
union of C and the output of a recursive call to Phase II with the same input,
except that E is replaced with R.
• Output: a list of committees.
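The committee-selection part of Steps 1-3 can be sketched as below. The |c| × avgsim(c) score and the θ1 filter follow the description above; the candidate clusters, the similarity table, and the centroid-free cluster similarity are simplifications invented for illustration:

```python
def cluster_sim(c1, c2, sim):
    """Crude stand-in for committee-centroid similarity: mean cross-pair similarity."""
    return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

def avgsim(cluster, sim):
    """Average pairwise similarity between the elements of one cluster."""
    pairs = [(a, b) for i, a in enumerate(cluster) for b in cluster[i + 1:]]
    return sum(sim(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0

def score(cluster, sim):
    return len(cluster) * avgsim(cluster, sim)   # the |c| x avgsim(c) score of Step 1

def pick_committees(candidates, sim, theta1):
    """Steps 2-3: scan candidates in score order; keep a candidate only if it is
    not too similar (>= theta1) to a committee already chosen."""
    committees = []
    for c in sorted(candidates, key=lambda c: -score(c, sim)):
        if all(cluster_sim(c, k, sim) < theta1 for k in committees):
            committees.append(c)
    return committees

# invented similarity table and candidate clusters
S = {frozenset(("beer", "wine")): 0.8, frozenset(("wine", "vodka")): 0.7,
     frozenset(("beer", "vodka")): 0.6, frozenset(("car", "truck")): 0.9}
sim = lambda a, b: 1.0 if a == b else S.get(frozenset((a, b)), 0.0)
candidates = [["beer", "wine"], ["wine", "vodka"], ["car", "truck"]]
committees = pick_committees(candidates, sim, 0.35)
print(committees)   # [['car', 'truck'], ['beer', 'wine']]
```

The overlapping candidate ["wine", "vodka"] is rejected because it is too close to the already chosen ["beer", "wine"] committee, which is exactly the redundancy the θ1 filter is meant to remove.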
2.3.2.3. Phase III: Assign elements to clusters
Phase III has two versions: a hard clustering version used for document clustering,
and a soft clustering version used for discovering word senses. In the first version, every
element is assigned to the cluster containing the committee to which it is most similar.
This version resembles K-means in that every element is assigned to its closest centroid;
unlike K-means, however, the number of clusters is not fixed and the centroids do not
change.
In the second version, each element e is assigned to its most similar clusters in the
following way:

let C be a list of clusters, initially empty
let S be the top-200 clusters most similar to e
while S is not empty {
    let c ∈ S be the most similar cluster to e
    if similarity(e, c) < σ
        exit the loop
    if c is not similar to any cluster in C {
        assign e to c
        remove from e its features that overlap with
        the features of c
        add c to C
    }
    remove c from S
}

When computing the similarity between a cluster and an element (or another
cluster), CBC uses the centroid of the committee members as the representation of the
cluster. The key to the soft clustering version for discovering word senses is that once
an element e is assigned to a cluster c, the features shared by e and c are removed from
e. This allows CBC to discover the less frequent senses of a word and to avoid
discovering duplicate senses.
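The soft assignment loop can be sketched as follows. Cosine over feature dictionaries stands in for the centroid similarity, the "not similar to any cluster in C" check is omitted for brevity, and the word "bass" with mixed fish/music features is an invented example:

```python
from math import sqrt

def cosine(a, b):
    dot = sum(v * b[f] for f, v in a.items() if f in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def soft_assign(features, centroids, sigma):
    """CBC Phase III (soft version): assign to the most similar cluster, strip
    the overlapping features, and repeat -- so rarer senses can surface."""
    feats = dict(features)
    assigned, remaining = [], dict(centroids)
    while remaining:
        name = max(remaining, key=lambda n: cosine(feats, remaining[n]))
        if cosine(feats, remaining[name]) < sigma:
            break
        assigned.append(name)
        for f in list(feats):
            if f in remaining[name]:
                del feats[f]          # remove the intersecting features
        del remaining[name]
    return assigned

# toy: "bass" has both fish-sense and music-sense features (invented)
centroids = {"fish": {"water": 1.0, "catch": 1.0},
             "music": {"guitar": 1.0, "play": 1.0}}
bass = {"water": 2.0, "catch": 1.0, "guitar": 1.0, "play": 1.0}
print(soft_assign(bass, centroids, 0.3))   # ['fish', 'music']
```

After the dominant fish sense is assigned and its features stripped, the remaining guitar/play features match the music centroid perfectly, so the second, less frequent sense is discovered rather than masked.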
2.3.3. Results
CBC discovers word senses by assigning words to more than one cluster; each
cluster to which a word is assigned represents one sense of that word. Using the soft
clustering version of Phase III allows CBC to assign words to multiple clusters, to
discover the less frequent senses of a word, and to avoid discovering duplicate senses.
CBC was evaluated with a precision/recall methodology on a test set consisting of
13,403 words, S13403.