Advisor: Dr. RMK Sinha, IIT Kanpur
We address the problem of disambiguating Hindi names using knowledge infusion from multiple sources of evidence. Parsargs (postpositions) and POS tags are seen to play a critical role in Hindi NER. Other sources, such as semantic word classes and co-occurrence data, are also studied as possible discriminatory evidence. Furthermore, an ensemble approach that combines votes from two statistical methods (decision trees and SVMs) and uses cost-sensitive learning is seen to significantly improve both accuracy and recall. A classification accuracy of 77% and a best F-measure of 0.56 are achieved on the provided dataset using the proposed hybrid approach.
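As a rough illustration of how votes from two classifiers might be combined under cost-sensitive learning, and how an F-measure is computed, consider the minimal sketch below. It is not the method used in this work: the abstract does not state how votes are combined or which costs are used, so the soft-vote averaging, the function names `combine_votes` and `f_measure`, and the cost values are all assumptions made for illustration.

```python
from typing import Sequence


def combine_votes(p_tree: float, p_svm: float,
                  cost_fn: float = 2.0, cost_fp: float = 1.0) -> bool:
    """Decide whether a token is a name from two classifier scores.

    p_tree and p_svm are each classifier's estimated probability that the
    token is a name. cost_fn and cost_fp are hypothetical misclassification
    costs (missing a name vs. raising a false alarm).
    """
    # Soft vote: average the two scores (an assumed combination rule).
    p_name = (p_tree + p_svm) / 2.0
    # Expected cost of each decision under the assumed cost values.
    cost_if_rejected = p_name * cost_fn
    cost_if_accepted = (1.0 - p_name) * cost_fp
    # Predict "name" when rejecting would be the costlier mistake.
    return cost_if_rejected > cost_if_accepted


def f_measure(gold: Sequence[bool], pred: Sequence[bool]) -> float:
    """Balanced F-measure (harmonic mean of precision and recall)."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With a higher cost on missed names, the decision threshold on the averaged score drops below 0.5, which tends to raise recall at some expense of precision. This is one plausible reason accuracy and F-measure can differ noticeably on a dataset where names are the minority class.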