Disambiguating Hindi Named Entities: Ensemble learning from multiple learners

Post date: Nov 27, 2011 7:56:28 PM

Shashank Srivastava

Advisor: Dr. RMK Sinha, IIT Kanpur

We address the problem of disambiguating Hindi names using knowledge infusion from multiple sources of evidence. Parsargs (Hindi postpositions) and POS tags are seen to be critical in Hindi NER. Other sources, such as semantic word classes and co-occurrence data, are also studied as possible discriminatory evidence. Furthermore, an ensemble approach that combines votes from two statistical methods (decision trees and SVMs) and uses cost-sensitive learning is seen to significantly improve both accuracy and recall. A classification accuracy of 77% and a best F-measure of 0.56 are achieved on the provided dataset using the proposed hybrid approach.
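
As a rough illustration of how votes from two learners might be combined under cost-sensitive learning, the sketch below averages per-label probabilities (soft voting) and then picks the label with the lowest expected misclassification cost. This is a minimal sketch under stated assumptions, not the method used in this work: the abstract does not say how votes are combined or how costs are set, so the labels `NAME` and `OTHER`, the probability values, the cost matrix, and the helper names `average_votes` and `min_cost_label` are all hypothetical.

```python
from typing import Dict, List

Probs = Dict[str, float]  # label -> probability estimated by one learner


def average_votes(votes: List[Probs]) -> Probs:
    """Soft voting: average each label's probability across the learners."""
    labels = votes[0].keys()
    return {label: sum(v[label] for v in votes) / len(votes) for label in labels}


def min_cost_label(probs: Probs, cost: Dict[str, Dict[str, float]]) -> str:
    """Return the prediction with the lowest expected misclassification cost.

    cost[true][pred] is the penalty for predicting `pred` when the
    correct label is `true`.
    """
    def expected_cost(pred: str) -> float:
        return sum(probs[true] * cost[true][pred] for true in probs)

    return min(probs, key=expected_cost)


# Hypothetical outputs of the two learners for a single token.
tree_vote = {"NAME": 0.30, "OTHER": 0.70}  # stand-in for a decision tree
svm_vote = {"NAME": 0.40, "OTHER": 0.60}   # stand-in for an SVM

# Hypothetical costs: missing a name is three times as costly as a false alarm.
COST = {
    "NAME": {"NAME": 0.0, "OTHER": 3.0},
    "OTHER": {"NAME": 1.0, "OTHER": 0.0},
}

combined = average_votes([tree_vote, svm_vote])
label = min_cost_label(combined, COST)
print(combined, label)
```

With these made-up numbers, both learners lean toward `OTHER`, yet the asymmetric costs tip the final decision to `NAME`. This is the mechanism by which cost-sensitive decisions can trade some precision for recall on the rarer class.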