Disambiguating Hindi Named Entities: Ensemble learning from multiple learners

posted Nov 27, 2011, 11:56 AM by Shashank Srivastava

Shashank Srivastava
Advisor: Dr. RMK Sinha, IIT Kanpur

We address the problem of disambiguating Hindi names by infusing knowledge from multiple sources of evidence. Parsargs (postpositions) and POS tags are seen to play a critical role in Hindi NER. Other sources, such as semantic word classes and co-occurrence data, are also studied as possible discriminatory evidence. Furthermore, an ensemble approach that combines votes from two statistical methods (decision trees and SVMs) and uses cost-sensitive learning is seen to significantly improve both accuracy and recall. The proposed hybrid approach achieves a classification accuracy of 77% and a best F-measure of 0.56 on the provided dataset.
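
As a rough illustration of combining votes under unequal error costs, the sketch below averages per-class scores from two learners and then picks the label with the lowest expected cost. The abstract does not say how the votes are combined or which costs are used, so the averaging rule, the labels "NE" and "O", the scores, and the cost values are all assumptions made for this example.

```python
def combine_votes(score_sets):
    """Average per-class scores from several learners (assumed combination rule)."""
    labels = score_sets[0].keys()
    return {lab: sum(s[lab] for s in score_sets) / len(score_sets) for lab in labels}


def min_cost_label(probs, cost):
    """Return the prediction with the lowest expected misclassification cost.

    cost[true_label][predicted_label] is the penalty for that outcome.
    """
    def expected_cost(pred):
        return sum(probs[true] * cost[true][pred] for true in probs)
    return min(probs, key=expected_cost)


# Hypothetical scores: "NE" = named entity, "O" = other.
tree_scores = {"NE": 0.4, "O": 0.6}
svm_scores = {"NE": 0.5, "O": 0.5}
combined = combine_votes([tree_scores, svm_scores])

# Assumed costs: missing a named entity is three times worse than a false alarm.
skewed_cost = {"NE": {"NE": 0, "O": 3}, "O": {"NE": 1, "O": 0}}
uniform_cost = {"NE": {"NE": 0, "O": 1}, "O": {"NE": 1, "O": 0}}

print(min_cost_label(combined, skewed_cost))
print(min_cost_label(combined, uniform_cost))
```

With uniform costs the combined scores favour "O", while the skewed costs tip the same scores towards "NE", which is how cost-sensitive decisions can trade some precision for recall.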

A Reinforcement Learning Autoguider for Astrophotography

posted Nov 27, 2011, 8:14 AM by Shashank Srivastava   [ updated Nov 27, 2011, 11:58 AM ]

Shashank Srivastava, Michael Hirsch
Advisors: Dr. Jan Peters, and Dr. Bernhard Scholkopf, Max Planck Institute for Biological Cybernetics

Contemporary autoguiders for telescope mounts are essentially rule-based: they depend on corrections computed from object positions at different time instances. We propose a two-variable Q-learning optimization algorithm for autoguiding a German Equatorial Mount for star-tracking and astrophotography. We formulate the problem as a continuous Markov Decision Process with agent ‘actions’ corresponding to motor movement durations, using the GPUSB controller from Shoestring Astronomy. The new approach uses robust star-tracking heuristics and reinforcement learning to learn favourable guiding policies, while incorporating mount-specific behavior and the current motion of the mount in deciding new movements. The method is extensively tested on artificial data in simulation, as well as in real tracking. The learning algorithm is seen to converge quickly to an optimal strategy, and error rates with the proposed autoguider are seen to be lower than those of rule-based approaches.
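
The Q-learning idea can be sketched on a toy problem. The code below is a discretized, one-dimensional simplification and not the continuous two-variable formulation described above: the state is a hypothetical star offset in pixels, the actions are unit corrections, the mount drifts by a constant amount each step, and the reward is the negative absolute offset. All names and parameter values are assumptions for illustration.

```python
import random
from collections import defaultdict

ACTIONS = (-1, 0, 1)  # hypothetical corrections, in pixels per step


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update; returns the new Q(state, action)."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


def step(offset, action, drift=1, limit=3):
    """Toy mount model: constant drift plus correction, clamped to +/- limit."""
    new_offset = max(-limit, min(limit, offset + drift + action))
    return new_offset, -abs(new_offset)


def greedy_action(q, state):
    """Action with the highest learned value in the given state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def train(episodes=200, steps=20, epsilon=0.2, seed=0):
    """Epsilon-greedy training loop on the toy drift model."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        offset = rng.randint(-3, 3)
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q, offset)
            next_offset, reward = step(offset, action)
            q_update(q, offset, action, reward, next_offset)
            offset = next_offset
    return q


q_table = train()
print(greedy_action(q_table, 0))
```

In this toy model the correction that cancels the drift keeps a centred star centred, so the learned policy at offset 0 is the action -1.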

Using structure of Gene Ontology for Prognosis and Inference from Microarray data

posted Nov 27, 2011, 8:11 AM by Shashank Srivastava   [ updated Nov 27, 2011, 11:59 AM ]

Shashank Srivastava and Snigdha Chaturvedi
Advisor: Dr. Arnab Bhattacharya, IIT Kanpur

Recently, there has been much interest in the study and analysis of microarrays for mining valuable biological information, building predictive models for pathological conditions, and unraveling latent correlations signifying biological pathways. Several techniques have focused on identifying differentially expressed genes, and have proposed representations of the microarrays through dimensionality reduction techniques to overcome the 'curse of dimensionality'. Statistical tests such as ANOVA and Fisher's test, and methods such as clustering and SVD, have been useful in determining differentially regulated genes and in building statistical models of medical conditions from gene-expression values while ignoring noise and normal variations. Recently, the gene-set approach has been proposed to evaluate expression patterns of gene groups instead of individual genes. These methods, however, do not allow direct inference of relations between gene-expression values and genetic concepts. This study provides an approach that extends the statistical method by infusing additional knowledge from the structure of the Gene Ontology database. We propose a hierarchical approach to data representation, which bypasses several limitations of existing methods and can directly yield biological understanding and interpretability. The proposed method is tested on two standard datasets. Prognostic predictions from our method are seen to be validated precisely by existing biological literature and highly specific control studies. The proposed representation also shows predictive potential, and classification accuracies from our novel representation scheme using decision trees compare favorably with statistical methods.
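
One simple way to picture a hierarchy-aware representation is to roll gene-level expression values up an ontology graph, so that each term is scored by the genes annotated to it or to any of its descendants. The abstract does not specify the actual representation, so the mean-aggregation rule below is an assumption, and the term names, gene names, and values are hypothetical.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_scores(expression, annotations, parents):
    """Mean expression per ontology term, with genes inherited from descendants."""
    genes_by_term = {}
    for gene, terms in annotations.items():
        for term in terms:
            for t in {term} | ancestors(term, parents):
                genes_by_term.setdefault(t, set()).add(gene)
    return {
        t: sum(expression[g] for g in genes) / len(genes)
        for t, genes in genes_by_term.items()
    }


# Hypothetical ontology fragment: term_b and term_c are children of term_a.
parents = {"term_b": ["term_a"], "term_c": ["term_a"]}
annotations = {"g1": ["term_b"], "g2": ["term_c"], "g3": ["term_c"]}
expression = {"g1": 2.0, "g2": 4.0, "g3": 6.0}

scores = term_scores(expression, annotations, parents)
print(scores["term_a"], scores["term_b"], scores["term_c"])
```

Term-level scores of this kind could then serve as features for a classifier such as a decision tree, with each split referring to a named biological concept rather than to a single gene.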

A Content-based Similarity Search for Monophonic Melodies

posted Nov 27, 2011, 8:08 AM by Shashank Srivastava   [ updated Nov 27, 2011, 11:59 AM ]

Shashank Srivastava and Snigdha Chaturvedi
Advisor: Dr. Arnab Bhattacharya, IIT Kanpur

Feature-extraction-based methods have been used for identifying genres in musical pieces, and there have also been attempts to predict the popularity of songs with statistical methods such as clustering. Transportation distances have been used extensively, especially in image searches; their advantages include the incorporation of notions of continuity and partial matching. We propose to combine both methods: we test different kinds of feature representations to cluster songs in a database through global-level parameters. At search time, the input MIDI sequence is assigned to one of the clusters by an SVM, and the songs in that cluster with the minimum Levenshtein distances to the input are returned. The sequential approach is seen to yield encouraging results on two standard datasets, for MIDI inputs by amateur players.
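
The final ranking step can be sketched as follows, assuming melodies are stored as sequences of MIDI note numbers and that the cluster has already been chosen by the classifier. The song names, the note sequences, and the `nearest` helper are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def nearest(query, cluster, k=2):
    """Return the k songs in the cluster closest to the query, as (name, distance)."""
    ranked = sorted((levenshtein(query, notes), name) for name, notes in cluster.items())
    return [(name, dist) for dist, name in ranked[:k]]


# Hypothetical cluster of monophonic melodies as MIDI note numbers.
cluster = {
    "song_a": [60, 62, 64, 65, 67],
    "song_b": [60, 60, 67, 67],
    "song_c": [62, 64, 65],
}
query = [60, 62, 64, 65]

print(nearest(query, cluster))
```

Comparing raw note numbers makes the match sensitive to transposition; comparing intervals between successive notes instead would be one way around that, though the abstract does not say which encoding is used.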

A Color-based Indexing for Image Similarity Searching

posted Nov 27, 2011, 8:04 AM by Shashank Srivastava   [ updated Nov 27, 2011, 11:59 AM ]

Shashank Srivastava and Sahil Suneja
Advisor: Dr. Arnab Bhattacharya, IIT Kanpur

Color is an important attribute of visual information, and hence can be a very useful attribute for image matching and retrieval, as an alternative to the dominant shape-matching approaches. Color is largely independent of view and resolution, and also serves as a local identifying feature. Color-based indexed searches have previously relied primarily on histogram-based approaches or color clustering. We propose a new indexing approach based on a large hierarchical colormap, where similarity can be computed simply as an inner product. A feature vector represents the color content of an image and serves as a preliminary index to prune most image searches. In the second step, the images that survive pruning (those with a color match above a threshold) are matched with the input object on individual features using the L1 norm. The two-step approach is seen to improve search time, with high selectivity in the first step, and to produce efficient results on an image dataset compiled from the Alamy database.
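
The two-step search can be sketched as a prune-then-rank routine. In this minimal example the color vectors are short, flat lists rather than entries of a hierarchical colormap, and the image names, vectors, and threshold are hypothetical values chosen for illustration.

```python
def inner_product(u, v):
    """Color similarity between two color-content vectors."""
    return sum(x * y for x, y in zip(u, v))


def l1_distance(u, v):
    """L1 norm of the difference between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(u, v))


def two_step_search(query_color, query_features, database, threshold):
    """Step 1: keep images whose color match exceeds the threshold.
    Step 2: rank the survivors by L1 distance on their individual features."""
    candidates = [
        item for item in database
        if inner_product(query_color, item["color"]) > threshold
    ]
    return sorted(candidates, key=lambda item: l1_distance(query_features, item["features"]))


# Hypothetical database entries.
database = [
    {"name": "sunset", "color": [0.8, 0.2, 0.0], "features": [0.5, 0.5]},
    {"name": "forest", "color": [0.1, 0.9, 0.0], "features": [0.2, 0.8]},
    {"name": "rose", "color": [0.9, 0.0, 0.1], "features": [0.1, 0.9]},
]

results = two_step_search([1.0, 0.0, 0.0], [0.2, 0.8], database, threshold=0.5)
print([item["name"] for item in results])
```

Note that "forest" has the closest feature vector to the query but is never compared in the second step, because the cheap color test has already pruned it.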

Evolution of compositional languages in multiple agent social communities

posted Nov 27, 2011, 8:01 AM by Shashank Srivastava   [ updated Nov 27, 2011, 12:00 PM ]

Thesis by: Shashank Srivastava
Advisor: Dr. Harish Karnick, IIT Kanpur

While computational modeling has yielded several plausible models for language emergence in a set of uniformly endowed agents, most of these treatments do not address the emergence of syntax. They also ignore population turnover and do not incorporate dynamical and structural aspects of populations. Most earlier simulations for realistic populations have ignored the syntactic and compositional nature of human language, and have focused on the evolution of a coherent lexicon. While a coherent vocabulary is a necessity for any language, it is in fact syntax which allows humans to express seemingly infinite meanings using a finite set of phonetic elements.

In this thesis, we have extended a well-known inductive model of language learning to large populations, heterogeneous interactions, and realistic social communities. The model induces grammatical rules on the basis of phonetic resemblances between lexical entities and similarities in the meanings they correspond to. We have developed a framework where multiple agents can interact in an iterated learning setting, and each agent can receive its primary linguistic input from a set of speakers according to distributions specified by the existing social topology. We also try to extend the deterministic production model to a probabilistic one, and investigate possible biases which can expedite the emergence of compositional syntax.

In particular, we study the effect of population size and the structure of the social topology on linguistic coherence and language emergence for this model. Our investigation of the extended model on different social graphs leads to several insights, and indicates that social topology can have significant effects on the acquisition and evolution of language.
