VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY

NGUYEN KIM ANH

VIETNAMESE WORD CLUSTERING AND ANTONYM IDENTIFICATION

Major: Computer science
Code: 60 48 01

MASTER THESIS OF INFORMATION TECHNOLOGY

SUPERVISOR: PhD. Nguyen Phuong Thai

Hanoi - 2013

Table of Contents

Acknowledgements
Abstract
Chapter I - Introduction
  1.1. Word Similarity
  1.2. Hierarchical Clustering of Word
  1.3. Function tags
  1.4. Objectives of the Thesis
  1.5. Our Contributions
  1.6. Thesis structure
Chapter II - Related Works
  2.1. Word Clustering
    2.1.1. The Brown algorithm
    2.1.2. Sticky Pairs and Semantic Classes
  2.2. Word Similarity
    2.2.1. Approach
    2.2.2. Grammar Relationships
    2.2.3. Results
  2.3. Clustering By Committee
    2.3.1. Motivation
    2.3.2. Algorithm
    2.3.3. Results
Chapter III - Our approach
  3.1. Word clustering in Vietnamese
    3.1.1. Brown's algorithm
    3.1.2. Word similarity
  3.2. Evaluating Methodology
  3.3. Antonym classes
    3.3.1. Ancillary antonym
    3.3.2. Coordinated antonym
    3.3.3. Minor classes
  3.4. Vietnamese functional labeling
Chapter IV - Experiment
  4.1. Results and Comparison
  4.2. Antonym frames
  4.3. Effectiveness of Word Cluster feature in Vietnamese Functional labeling
  4.4. Error analyses
  4.5. Summarization
Chapter V - Conclusion and Future works
  5.1. Conclusion
  5.2. Future works
Bibliography

List of Figures

Figure 1. An example of Brown's cluster algorithm
Figure 2. An example of Vietnamese word cluster
Figure 3. The syntax tree of a sentence
Figure 4. An example of Vietnamese word similarity
Figure 5. Select word clusters by dictionary
Figure 6. An example of sentence parses
Figure 7. The correctness of k-clusters

List of Tables

Table 1. Results of CBC with discovering word senses
Table 2. Results of CBC with document clustering
Table 3. Ancillary antonym frames
Table 4. Coordinated antonym frames
Table 5. Transitional antonym frames
Table 6. An unlabeled corpus in Vietnamese
Table 7. The result of five initial clusters
Table 8. The comparison between Word clustering and Word similarity
Table 9. The result of antonym frames
Table 10. The relation of w1 and w2 pairs
Table 11. The effectiveness of word cluster feature

Chapter I
Introduction

In recent years, statistical learning methods have been very successful in natural language processing tasks. Most machine learning algorithms used in natural language processing are supervised, and they require labeled data. Such labeled data are usually created by hand or in other ways that are time consuming and expensive. In contrast, while labeled data is difficult to create by hand, unlabeled data is essentially free on the Internet in the form of raw text. This raw text can easily be preprocessed so that it is suitable for an unsupervised or semi-supervised learning algorithm. Previous work has shown that using unlabeled data in addition to, or in place of, traditional labeled data can improve performance (Miller et al., 2004; Abney, 2004; Collins and Singer, 1999) [19][2][7]. In this thesis, I focus on word clustering algorithms over unlabeled data, mainly applying two methods: word clustering by Brown's algorithm [22] and word similarity by Dekang Lin [10]. Both methods are used to cluster the words in the corpus. While Brown's method clusters a word based on its relationships with the words standing immediately before and after it, Dekang Lin's method uses the relationships among those three words. To compare the advantages and disadvantages of the two methods, I ran them on the same corpus, using the same evaluation method and the same main words in clusters. The result of word clustering is a set of clusters, each containing words that occur in the same contexts. This result was used as features for an application, Vietnamese functional labeling. I also evaluated the influence of the word clusters when they were used as features in this application; for example, word clusters were used to address the data sparseness problem of the head-word feature.
Besides, I use a statistical method to extract 20 antonym frames, which can be used to identify antonym classes within the clusters.

In this chapter, I describe word similarity, hierarchical clustering of words, and their applications in natural language processing. I also introduce function tags, the word segmentation task, the objectives of the thesis, and our contributions. Finally, I describe the structure of the thesis.

1.1. Word Similarity

The meaning of an unknown word can be inferred from its context. Consider the following examples:

A bottle of Beer is on the table.
Everyone likes Beer.
Beer makes you drunk.

The contexts in which the word Beer is used suggest that Beer could be a kind of alcoholic beverage. This means that other alcoholic beverages may occur in the same contexts as Beer, and that they may be related. Consequently, two words are similar if they appear in similar contexts or are exchangeable to some extent. For example, "Tổng_thống" (President) and "Chủ_tịch" (Chairman) are similar according to this definition. In contrast, the two words "Kéo" (Scissors) and "Cắt" (Cut) are not similar under this definition, although they are semantically related. Intuitively, if I can generate a good clustering, the words in each cluster should be similar.

1.2. Hierarchical Clustering of Word

In recent years, some algorithms have been proposed to automatically cluster words based on a large unlabeled corpus, such as (Brown et al., 1992; Lin et al., 1998) [22][10]. Consider a corpus of T words, a vocabulary of V words, and a partition π of the vocabulary. The likelihood L(π) of a bigram class model generating the corpus is given by:

L(\pi) = I - H

Here, H is the entropy of the 1-gram word distribution, and I is the average mutual information of adjacent classes in the corpus:

I = \sum_{c_1, c_2} \Pr(c_1 c_2) \log \frac{\Pr(c_1 c_2)}{\Pr(c_1)\Pr(c_2)}

where Pr(c1c2) is the probability that a word in class c1 is followed by a word in class c2. Since H does not depend on π, the partition that maximizes the average mutual information also maximizes the likelihood L(π) of the corpus. Thus, I can use the average mutual information to construct word clusters by repeating the merging step until the number of clusters is reduced to a predefined number C.
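To make the objective concrete, the following Python sketch scores a fixed partition with L(π) = I − H using simple count-based estimates, as defined above. This is a minimal illustration, not code from the thesis; the function and variable names (clustering_likelihood, corpus, partition) are mine.

```python
from collections import Counter
from math import log

def clustering_likelihood(corpus, partition):
    """Score a word partition with L(pi) = I - H (a minimal sketch).

    corpus:    a list of word tokens.
    partition: a dict mapping each word to its class id.
    """
    T = len(corpus)
    word_counts = Counter(corpus)
    class_counts = Counter(partition[w] for w in corpus)
    bigram_counts = Counter(
        (partition[w1], partition[w2]) for w1, w2 in zip(corpus, corpus[1:])
    )

    # H: entropy of the 1-gram word distribution.
    H = -sum((c / T) * log(c / T) for c in word_counts.values())

    # I: average mutual information of adjacent classes.
    I = 0.0
    for (c1, c2), n in bigram_counts.items():
        p12 = n / (T - 1)
        p1 = class_counts[c1] / T
        p2 = class_counts[c2] / T
        I += p12 * log(p12 / (p1 * p2))

    return I - H
```

Because H is the same for every partition of the same corpus, comparing two partitions with this score amounts to comparing their average mutual information, which is what the merging procedure in Chapter II exploits.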
1.3. Function tags

Functional tag labeling is an important processing step for many natural language processing applications such as question answering, information extraction, and summarization. Thus, some research has focused on the function tagging problem in order to capture additional semantic information, which is more useful than syntactic labels alone. There are two kinds of tags in linguistics: syntactic tags and functional tags. For syntactic tags there are many theories and projects in English, Spanish, Chinese, and other languages; the main task of this research is to find part-of-speech tags and assign them to the constituents. Functional tags can be understood as more abstract labels because they are not like syntactic labels. While a syntactic label uses one notation for a group of words in all contexts, a functional tag represents the relationship between a phrase and its utterance in each different context. So the functional tag of a phrase may change; it depends on the context of its neighbors. For example, consider the phrase "baseball bat": the syntax of this phrase is "noun phrase" (in most research annotated as NP). But its functional tag might be a subject in this sentence:

This baseball bat is very expensive.

In another case, its functional tag might be a direct object:

I bought this baseball bat last month.

Or an instrument agent in a passive voice:

That man was attacked by this baseball bat.

Functional tags were directly addressed by Blaheta (2003) [13]. Since then, a lot of research has focused on how to assign functional tags to a sentence. This kind of problem is called the functional tag labeling problem, a class of problems aiming to find the semantic information of a phrase. To sum up, functional tag labeling is defined as the problem of finding the semantic information of a group of words and then tagging them with a given annotation in context.

1.4. Objectives of the Thesis

Most successful machine learning algorithms are supervised, and they usually use labeled data. These labeled data are often created by hand, which is time consuming and expensive. Unlabeled data, in contrast, is free: it can be obtained from newspapers, websites, and other sources, and it exists as raw text on the Internet. In this thesis, I set out to investigate methods for clustering words from unlabeled data, which can easily be extracted from online sources. Among automatic clustering methods, I focused on two: hierarchical word clustering by Brown's algorithm and word similarity by Dekang Lin. Besides, I also suggested a common evaluation tool for both methods when they are applied to the same Vietnamese corpus. The output of the word clustering was used as features in natural language processing tasks such as Vietnamese functional labeling. I also evaluated the influence of the word clusters when they were used as features in this task.

1.5. Our Contributions

As discussed above, the main aim of this thesis is to cluster unlabeled Vietnamese words. The contributions of this thesis are as follows:

• Firstly, I performed automatic word clustering for unlabeled Vietnamese data on a corpus of about 700,000 sentences.
• Secondly, I proposed an evaluation method for the resulting clusters, using a thesaurus dictionary and five criteria.
• Thirdly, I compared two clustering methods for Vietnamese: word clustering by Brown and word similarity by Dekang Lin. I used the resulting clusters as features in the Vietnamese functional labeling task to increase its efficiency. Besides, I use a statistical method to extract 20 antonym frames, which can be used to identify antonym classes in the clusters.

In conclusion, our contribution is that I have clustered about 700,000 sentences of Vietnamese with a hierarchical word clustering algorithm, used a Vietnamese thesaurus dictionary and five criteria to evaluate the correctness of the clusters, and used the clusters as features in NLP tasks such as Vietnamese functional labeling. Finally, I extracted 20 antonym frames to identify antonym pairs in the antonym classes.

1.6. Thesis structure

This section gives a brief outline of the thesis, so that you can have an overview of the following chapters.

Chapter 2 - Related works. In this chapter I introduce some recent research on word clustering, functional labeling, and word segmentation.

Chapter 3 - Our approach. This chapter presents the method I applied to cluster Vietnamese words, how I evaluate the quality of the clusters after the word clustering process, how to use those clusters as features in the Vietnamese functional labeling task,
and how to extract antonym frames from the corpus.

Chapter 4 - Experiment. In this chapter, I discuss the corpus I used for clustering and the tools applied in this thesis. Besides, I point out and analyze some errors found in erroneous clusters produced by the word clustering process. Finally, I evaluate the influence of the clusters when they are applied to the Vietnamese functional labeling task.

Chapter 5 - Conclusions and Future works. In the last chapter, I draw general conclusions about the advantages and limitations of our work, and I propose some future work to improve our model. Finally, the references list the related research that our system referred to.

Chapter II
Related Works

In this chapter, I introduce some recent research on word clustering and word similarity, namely class-based n-gram models by Brown's algorithm [22], word similarity [10], and clustering by committee [23].

2.1. Word Clustering

2.1.1. The Brown algorithm

Word clustering is considered here as a method for estimating the probabilities of low-frequency events. One of its aims is the problem of predicting a word from the previous words in a sample of text. For this task, the authors used the bottom-up agglomerative word clustering algorithm (Brown et al., 1992) [20] to derive a hierarchical clustering of words. The input to the algorithm is a corpus of unlabeled data containing the vocabulary of words to be clustered. The output is a binary tree, as in Figure 1, in which the leaves are the words of the vocabulary and each internal node is interpreted as the cluster containing the words in its subtree. Initially, each word in the corpus is placed in its own distinct cluster. The algorithm then repeatedly merges the pair of clusters that maximizes the quality of the resulting clustering, with each word belonging to exactly one cluster, until the number of clusters is reduced to the predefined number, as follows:

Initial Mapping: put a single word in each cluster
Compute the initial AMI of the collection
repeat
    for each pair of clusters do
        Merge the pair of clusters temporarily
        Compute the AMI of the collection
    end for
    Select the pair of clusters with the minimum decrement of AMI
    Compute the AMI of the new collection
until the predefined number of clusters is reached
repeat
    Move each term to the cluster for which the resulting partition has the greatest AMI
until no more increment in AMI

Figure 1. An example of Brown's cluster algorithm

To identify the quality of a clustering, the algorithm considers a training text of T words t_1^T, a vocabulary of V words, and a partition π of the vocabulary. The maximum likelihood estimates of the parameters of a 1-gram class model generating the corpus are given by:

\Pr(w \mid c) = \frac{C(w)}{C(c)} \quad \text{and} \quad \Pr(c) = \frac{C(c)}{T}

where C(c) is the number of words of the training text whose class is c. Since c = π(w), we have:

\Pr(w) = \Pr(w \mid c)\Pr(c) = \frac{C(w)}{T}

So for a 1-gram class model, the choice of the partition π of the vocabulary has no effect. For a 2-gram class model, the sequential maximum likelihood estimates of the order-2 parameters, which maximize Pr(t_2^T | t_1), are given by:

\Pr(c_2 \mid c_1) = \frac{C(c_1 c_2)}{\sum_{c} C(c_1 c)}, \qquad \Pr(c_1 c_2) = \Pr(c_1)\Pr(c_2 \mid c_1) = \frac{C(c_1 c_2)}{\sum_{c} C(c_1 c)} \times \frac{C(c_1)}{T}

where C(c_1) and \sum_{c} C(c_1 c) are the numbers of words of class c_1 in t_1^T and t_1^{T-1}, respectively. Let L(π) = (T − 1)^{-1} log Pr(t_2^T | t_1); then:

L(\pi) = (T-1)^{-1} \sum_{w_1 w_2} C(w_1 w_2) \log \Pr(c_2 \mid c_1)\Pr(w_2 \mid c_2)
       = \sum_{w} \Pr(w) \log \Pr(w) + \sum_{c_1 c_2} \Pr(c_1 c_2) \log \frac{\Pr(c_2 \mid c_1)}{\Pr(c_2)}
       = I(c_1, c_2) - H(w)

in which H represents the entropy of the 1-gram word distribution, and I represents the average mutual information of adjacent classes c_1 and c_2.
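The greedy merging loop described above can be sketched as follows. This is a deliberately unoptimized illustration, not the thesis's implementation: it reuses the hypothetical clustering_likelihood scorer from the sketch in Section 1.2 and recomputes the objective from scratch for every candidate merge, whereas Brown et al. (1992) use incremental AMI updates and a restricted merge window for efficiency.

```python
def brown_clustering(corpus, num_clusters):
    """Greedy bottom-up merging guided by average mutual information.

    A minimal sketch of the loop described above; assumes the
    clustering_likelihood() function from the earlier sketch is in scope.
    """
    # Start with one cluster per vocabulary word.
    partition = {w: w for w in set(corpus)}
    clusters = set(partition.values())

    while len(clusters) > num_clusters:
        best_pair, best_score = None, float("-inf")
        for a in clusters:
            for b in clusters:
                if a >= b:
                    continue
                # Temporarily merge cluster b into cluster a and score the result.
                trial = {w: (a if c == b else c) for w, c in partition.items()}
                score = clustering_likelihood(corpus, trial)
                if score > best_score:
                    best_pair, best_score = (a, b), score
        a, b = best_pair
        partition = {w: (a if c == b else c) for w, c in partition.items()}
        clusters.remove(b)

    return partition
```

Selecting the merge with the highest resulting likelihood is the same as selecting the pair whose merge causes the minimum decrement of AMI, since H is constant for a fixed corpus.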
2.1.2. Sticky Pairs and Semantic Classes

One of the aims of word clustering is to group words together based on the statistical similarity of their surroundings. In addition, the information in the contexts of words can be viewed as features for grouping words together. For example, consider two words w_a and w_b occurring in the same contexts; the mutual information of the two words as an adjacent pair is:

MI(w_a w_b) = \log \frac{\Pr(w_a w_b)}{\Pr(w_a)\Pr(w_b)}

If w_b follows w_a and the mutual information of the pair (w_a, w_b) is greater than a threshold, then (w_a, w_b) is called a sticky pair. Furthermore, let Pr_near(w_a w_b) be the probability that a word chosen at random from the corpus is w_a and that a second word, chosen from a window of 1,001 words centered on w_a but excluding the words in a window of 5 centered on w_a, is w_b. If Pr_near(w_a w_b) is much larger than Pr(w_a)Pr(w_b), then w_a and w_b are semantically sticky. Using Pr_near(w_a w_b), this algorithm finds some interesting classes such as:

we our us ourselves ours
question questions asking answer answers answering
performance performed perform performs performing
tie jacket suit
write writes writing written wrote pen
morning noon evening night nights midnight bed
attorney counsel trial court judge
problems problem solution solve analyzed solved solving
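As a small illustration of the sticky-pair test, the sketch below computes the pointwise mutual information of adjacent word pairs from a token list and keeps those above a threshold. The function name and the threshold value are illustrative only; they are not taken from the thesis.

```python
from collections import Counter
from math import log

def sticky_pairs(tokens, threshold=5.0):
    """Find 'sticky' adjacent word pairs by pointwise mutual information."""
    T = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    pairs = []
    for (wa, wb), n in bigrams.items():
        p_ab = n / (T - 1)
        p_a = unigrams[wa] / T
        p_b = unigrams[wb] / T
        mi = log(p_ab / (p_a * p_b))   # MI(wa wb)
        if mi > threshold:
            pairs.append((wa, wb, mi))
    return sorted(pairs, key=lambda x: -x[2])
```

The semantically-sticky test works the same way, except that the pair probability is replaced by Pr_near, estimated over the wide window described above instead of over adjacent positions.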
2.2. Word Similarity

2.2.1. Approach

The meaning of a word in the corpus can often be inferred from the other words in its context. Two words are called similar if they appear in similar contexts or are exchangeable to some extent. From the contexts in which the word "vodka" appears, it can be inferred that "vodka" is similar to "beer", "wine", "whisky", and so on, as in the following examples:

A bottle of vodka is on the table.
Everyone likes vodka.
Vodka makes you drunk.
We make vodka out of corn.

The similarity between two objects is defined as the amount of information contained in the commonality between the objects divided by the amount of information in the descriptions of the objects (Lin, 1997) [11]. For two words w_a and w_b, the similarity of the two words in the same context is:

sim(w_a, w_b) = \frac{\log P(\mathrm{common}(w_a, w_b))}{\log P(\mathrm{describe}(w_a, w_b))}

To compute the similarity of two words in context, Dekang Lin used dependency triples (w, r, w'), each containing two words and the grammatical relationship between them in the input sentence. Here w is the word under consideration, r is the grammatical relationship between w and w', and w' is the context word of w. The notation ||w, r, w'|| denotes the frequency count of the dependency triple (w, r, w') in the parsed corpus. When w, r, or w' is the wild card (*), the frequency counts of all the dependency triples that match the rest of the pattern are summed up. For example, ||uống, obj, *|| (||drink, obj, *||) is the total frequency count of "uống-obj" (drink-obj) relationships in the parsed corpus. The description of a word w contains the frequency counts of all the dependency triples that match the pattern (w, *, *). Let I(w, r, w') denote the amount of information contained in ||w, r, w'||; this value is computed as follows:

I(w, r, w') = \log \frac{\lVert w, r, w' \rVert \times \lVert *, r, * \rVert}{\lVert w, r, * \rVert \times \lVert *, r, w' \rVert}

Let T(w) be the set of pairs (r, w'). The similarity of two words w_a and w_b is then computed as:

Sim(w_a, w_b) = \frac{\sum_{(r,w) \in T(w_a) \cap T(w_b)} \bigl( I(w_a, r, w) + I(w_b, r, w) \bigr)}{\sum_{(r,w) \in T(w_a)} I(w_a, r, w) + \sum_{(r,w) \in T(w_b)} I(w_b, r, w)}
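The sketch below shows how these quantities could be computed from a flat list of dependency triples. It is a simplified illustration under my own assumptions, not the thesis's code: triples is a list of (w, r, w') tuples extracted from a parsed corpus, and T(w) is restricted to the (r, w') pairs that carry positive information with w.

```python
from collections import Counter
from math import log

def lin_similarity(triples, wa, wb):
    """Dekang Lin's distributional similarity from dependency triples (a sketch)."""
    freq = Counter(triples)
    w_r = Counter((w, r) for w, r, _ in triples)      # ||w, r, *||
    r_w2 = Counter((r, w2) for _, r, w2 in triples)   # ||*, r, w'||
    r_any = Counter(r for _, r, _ in triples)         # ||*, r, *||

    def info(w, r, w2):
        # I(w, r, w') = log(||w,r,w'|| * ||*,r,*|| / (||w,r,*|| * ||*,r,w'||))
        if freq[(w, r, w2)] == 0:
            return 0.0
        return log(freq[(w, r, w2)] * r_any[r] / (w_r[(w, r)] * r_w2[(r, w2)]))

    def features(w):
        # T(w): the (r, w') pairs that occur with w and carry positive information.
        return {(r, w2) for (x, r, w2) in freq if x == w and info(w, r, w2) > 0}

    Ta, Tb = features(wa), features(wb)
    shared = sum(info(wa, r, w2) + info(wb, r, w2) for r, w2 in Ta & Tb)
    total = sum(info(wa, r, w2) for r, w2 in Ta) + sum(info(wb, r, w2) for r, w2 in Tb)
    return shared / total if total else 0.0
```

Intuitively, the numerator rewards features (grammatical contexts) shared by both words, while the denominator normalizes by everything that describes either word, so Sim ranges between 0 and 1.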
2.2.2. Grammar Relationships

Dekang Lin used eight grammatical relationships to construct the dependency triples, as follows:

a) Subject and Subject-of (subj and subj-of): the central word is a verb and the context word is a noun or pronoun (as subject). The signs on the syntax tree are: parent node S, central node VP, context node NP.

b) Object and Object-of (Obj and Obj-of): the central word is a verb and the context word is a noun (as object). The signs are: parent node VP, central node V, context node NP.

c) Complement and Complement-of (Mod and Mod-of): the central word is a noun and the context word is a modifier of the noun (the modifier may be N, A, or V). The signs on the tree are: parent node NP, central node N, context node N, A, or V.

d) Prepositional object (Proj and Proj-of): the central word is a preposition (E) and the context word is a noun. The signs are: parent node PP, central node E, context node NP.

2.2.3. Results

Dekang Lin extracted about 56.5 million dependency triples from a parsed 64-million-word corpus containing the Wall Street Journal (24 million words), San Jose Mercury (21 million words), and AP Newswire (19 million words). The experiment computed the pairwise similarity between all the nouns, all the verbs, and all the adjectives/adverbs. For each word, Dekang Lin creates a thesaurus entry containing the top-N words that are most similar to it, in the form:

w(pos): w1, s1, w2, s2, ..., wN, sN

in which pos is a part of speech, wi is a word, and si = sim(w, wi). The top-10 words in the noun, verb, and adjective entries for the word "brief" are as follows:

brief (noun): affidavit 0.13, petition 0.05, memorandum 0.05, motion 0.05, lawsuit 0.05, deposition 0.05, slight 0.05, prospectus 0.04, document 0.04, paper 0.04, ...
brief (verb): tell 0.09, urge 0.07, ask 0.07, meet 0.06, appoint 0.06, elect 0.05, name 0.05, empower 0.05, summon 0.05, overrule 0.04, ...
brief (adjective): lengthy 0.13, short 0.12, recent 0.09, prolonged 0.09, long 0.09, extended 0.09, daylong 0.08, scheduled 0.08, stormy 0.07, planned 0.06, ...

2.3. Clustering By Committee

Clustering By Committee (CBC) addresses the general goal of clustering: to group data elements so that intra-group similarities are high and inter-group similarities are low. CBC comes in two versions: a hard clustering version, in which each element is assigned to only one cluster, and a soft clustering version, in which elements can be assigned to multiple clusters.

2.3.1. Motivation

Clustering By Committee is motivated by the desire to automatically extract concepts and word senses from large unlabeled collections of text. In previous research, word senses were usually defined using a manually constructed lexicon such as WordNet. However, such lexicons have several problems: manually created lexicons often contain rare senses, and they miss many domain-specific senses. One way to address these problems is to use a clustering algorithm to automatically induce semantic classes (Lin and Pantel, 2001) [9]. Many clustering algorithms, such as K-means, represent a cluster by the centroid of all its members, or by representative elements. For example, when clustering words, the contexts of the words can be used as features to group the words together. In CBC, the centroid of a cluster is constructed by averaging the feature vectors of a subset of the cluster members; this subset is viewed as a committee that determines which other elements belong to the cluster.

2.3.2. Algorithm

The CBC algorithm consists of three phases. In Phase I, the algorithm computes each element's top-k similar elements, for some small value of k. In Phase II, the algorithm constructs a collection of tight clusters using the top-k similar elements from Phase I, in which the elements of each cluster form a committee. In the final phase, each element is assigned to its most similar clusters.

2.3.2.1. Phase I: Find top-similar elements

To compute the top-similar elements of an element e, the algorithm first sorts the features according to their pointwise mutual information values and then considers only a subset of the features with the highest mutual information. Finally, Phase I computes the pairwise similarity between e and the elements that share a feature from this subset.

2.3.2.2. Phase II: Find committees

The second phase of the clustering algorithm recursively finds tight clusters scattered in the similarity space. In each recursive step, the algorithm finds a set of tight clusters, called committees. A committee covers an element if the element's similarity to the centroid of the committee exceeds some high similarity threshold. Phase II proceeds as follows (a sketch of this committee-selection loop is given after the list):

• Input: a list of elements E to be clustered, a similarity database S from Phase I, and thresholds θ1 and θ2.
• Step 1: For each element e ∈ E, cluster the top-similar elements of e from S using average-link clustering. For each discovered cluster c, compute the score |c| × avgsim(c), where |c| is the number of elements in c and avgsim(c) is the average pairwise similarity between elements in c. Store the highest-scoring cluster in a list L.
• Step 2: Sort the clusters in L in descending order of their scores.
• Step 3: Let C be a list of committees, initially empty. For each cluster c ∈ L in sorted order, compute the centroid of c by averaging the frequency vectors of its elements and computing the mutual information vector of the centroid in the same way. If c's similarity to the centroid of every committee previously added to C is below a threshold θ1, add c to C.
• Step 4: If C is empty, the algorithm is done and returns C.
• Step 5: For each element e ∈ E, if e's similarity to every committee in C is below the threshold θ2, add e to a list of residues R.
• Step 6: If R is empty, the algorithm is done and returns C. Otherwise, it returns the union of C and the output of a recursive call to Phase II with the same input, except that E is replaced by R.
• Output: a list of committees.
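The following sketch condenses the Phase II loop above. It is a loose illustration rather than the CBC reference implementation: similarity(x, y) is assumed to handle both element-element and element/committee comparisons, top_similar(e) stands for the Phase I lookup, average-link clustering is stubbed out, the threshold values are made up, and a small guard is added to stop the recursion in this simplified setting.

```python
def find_committees(E, similarity, top_similar, theta1=0.35, theta2=0.25):
    """Recursive committee discovery, loosely following Phase II above."""
    def avgsim(c):
        c = list(c)
        pairs = [(a, b) for i, a in enumerate(c) for b in c[i + 1:]]
        return sum(similarity(a, b) for a, b in pairs) / max(len(pairs), 1)

    def cluster(elems):
        # Placeholder for average-link clustering of e's top-similar elements.
        return [frozenset(elems)]

    # Steps 1-2: score candidate clusters and sort them.
    L = []
    for e in E:
        candidates = cluster(top_similar(e))
        L.append(max(candidates, key=lambda c: len(c) * avgsim(c)))
    L.sort(key=lambda c: len(c) * avgsim(c), reverse=True)

    # Step 3: keep a cluster only if it is not too similar to an existing committee.
    C = []
    for c in L:
        if all(similarity(c, committee) < theta1 for committee in C):
            C.append(c)
    if not C:
        return C

    # Steps 5-6: recurse on the residue elements not covered by any committee.
    R = [e for e in E if all(similarity(e, committee) < theta2 for committee in C)]
    if not R or len(R) == len(E):   # guard against non-termination in this sketch
        return C
    return C + find_committees(R, similarity, top_similar, theta1, theta2)
```

The two thresholds play different roles: θ1 keeps the committees themselves well separated, while θ2 decides which elements are still "uncovered" and therefore passed to the next recursive round.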
2.3.2.3. Phase III: Assign elements to clusters

Phase III has two versions: a hard clustering version for document clustering and a soft clustering version for discovering word senses. In the first version, every element is assigned to the cluster containing the committee to which it is most similar. This version resembles K-means in that every element is assigned to its closest centroid; unlike K-means, however, the number of clusters is not fixed and the centroids do not change. In the second version, each element e is assigned to its most similar clusters in the following way:

let C be a list of clusters, initially empty
let S be the top-200 similar clusters to e
while S is not empty {
    let c ∈ S be the most similar cluster to e
    if similarity(e, c) < σ
        exit the loop
    if c is not similar to any cluster in C {
        assign e to c
        remove from e its features that overlap with the features of c
        add c to C
    }
    remove c from S
}

When computing the similarity between a cluster and an element (or another cluster), CBC uses the centroid of the committee members as the representation of the cluster. The key to the soft-clustering version of the algorithm for discovering word senses is that once an element e is assigned to a cluster c, the features shared by e and c are removed from e. This allows CBC to discover the less frequent senses of a word and to avoid discovering duplicate senses.

2.3.3. Results

CBC discovers word senses by assigning words to more than one cluster; each cluster to which a word is assigned represents a sense of that word. Using the soft-clustering version of Phase III allows CBC to assign words to multiple clusters, to discover the less frequent senses of a word, and to avoid discovering duplicate senses. CBC applied the precision/recall evaluation methodology and used a test set consisting of 13,403 words, S13403.