Title: A MFoM Learning Approach to Robust Multiclass Multi-Label Text Categorization
Speaker: Dr. Gao Sheng
From: Institute for Infocomm Research (I2R), Singapore
Time: 2:30 pm, Jul. 2, 2007
Place: Room 218, Dezhao Building
About the speaker:
Dr. Gao Sheng received his Ph.D. from the Institute of Automation in April 2001. His thesis was on large-vocabulary continuous speech recognition, mainly acoustic modeling, search, and real-time system implementation. He is now a Senior Research Fellow at the Institute for Infocomm Research (I2R), Singapore. His current research interests are multimedia information retrieval, machine learning, and graphical models. Before joining I2R, he was a Research Fellow at the National University of Singapore from Jan. 2002 to Dec. 2002, and an Invited Researcher at ATR, Japan. He has published more than 30 papers in international conferences and journals such as ICASSP, ICSLP, ICIP, ACM SIGIR, ICML, ACM Trans. on Information Systems, IEEE Trans. on Multimedia, etc.
Abstract:
In this talk, we introduce a multiclass (MC) classification approach to text categorization (TC). To take full advantage of both positive and negative training examples, a maximal figure-of-merit (MFoM) learning algorithm is introduced to train high-performance MC classifiers. In contrast to conventional binary classification, the proposed MC scheme assigns a uniform score function to each category for each test sample, so that the classical Bayes decision rules can be applied. Since all the MC MFoM classifiers are trained simultaneously, we expect them to be more robust and to outperform binary MFoM classifiers, which are trained separately and are known to give top TC performance. Experimental results on the Reuters-21578 TC task show that, for categories with fewer than 4 training samples, the MC MFoM classifiers achieve a micro-averaged F1 of 0.377, significantly better than the 0.138 obtained with binary MFoM classifiers. Furthermore, over all 90 categories, most with large training sets, the MC MFoM classifiers give a micro-averaged F1 of 0.888, slightly better than the 0.884 obtained with binary MFoM classifiers.
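The core MFoM idea, optimizing a smoothed version of the evaluation metric itself rather than a surrogate loss, can be sketched as follows. This is an illustrative simplification, not the speaker's exact formulation: the one-vs-rest misclassification measure and the sigmoid parameters alpha and beta here are placeholder assumptions, and real MFoM training would differentiate this smoothed F1 with respect to classifier parameters.

```python
import math

def class_loss(d, alpha=1.0, beta=0.0):
    """Sigmoid loss that smoothly approximates the 0/1 error from a
    misclassification measure d: close to 1 when misclassified (d > 0),
    close to 0 when correctly classified (d < 0)."""
    return 1.0 / (1.0 + math.exp(-alpha * (d + beta)))

def smoothed_micro_f1(scores, labels, alpha=1.0, beta=0.0):
    """Smoothed micro-averaged F1 over all samples and categories.

    scores[i][j] -- discriminant score g_j(X_i) for category j on sample i
    labels[i][j] -- 1 if sample i belongs to category j, else 0
    """
    tp = fp = fn = 0.0
    for g_row, y_row in zip(scores, labels):
        for g, y in zip(g_row, y_row):
            # Simplified one-vs-rest misclassification measure (an
            # assumption for this sketch): a positive sample is
            # misclassified when g < 0, a negative one when g > 0.
            d = -g if y == 1 else g
            l = class_loss(d, alpha, beta)  # soft error indicator
            if y == 1:
                tp += 1.0 - l   # soft true positive
                fn += l         # soft false negative
            else:
                fp += l         # soft false positive
    return 2.0 * tp / (2.0 * tp + fp + fn)
```

Because every term is differentiable in the scores, gradient ascent on this quantity trains all category classifiers jointly, which is what lets rare categories borrow strength from the shared objective.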