Research – Rafael Ehren

My dissertation project is concerned with developing a method that automatically distinguishes semantically idiomatic uses of multiword expressions (MWEs) from their literal counterparts. To accomplish this, the first step is to build a corpus of annotated instances of such MWEs. This corpus will subsequently be used to train and evaluate a classifier that distinguishes between the readings of a target expression using information from its surrounding context. Besides choosing appropriate machine learning algorithms (supervised vs. unsupervised), the main focus of this work will be on identifying the features best suited to this task.
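To illustrate the general idea of deciding a reading from the surrounding context, here is a minimal sketch. It assumes two things chosen only for illustration: bag-of-words features drawn from a fixed window around the target expression, and a supervised multinomial Naive Bayes classifier. Neither is claimed to be the method of this project. The example expression, the training sentences, and all function and class names (`locate`, `context_features`, `NaiveBayesReading`, `bag_for`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def locate(tokens, mwe_tokens):
    """Return (start, end) of the first occurrence of mwe_tokens in tokens."""
    n = len(mwe_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == mwe_tokens:
            return i, i + n
    raise ValueError("target expression not found")


def context_features(tokens, start, end, window=3):
    """Bag of words from up to `window` tokens on each side of tokens[start:end]."""
    left = tokens[max(0, start - window):start]
    right = tokens[end:end + window]
    return Counter(t.lower() for t in left + right)


class NaiveBayesReading:
    """Hypothetical multinomial Naive Bayes with add-alpha smoothing over context bags."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, bags, labels):
        for bag, label in zip(bags, labels):
            self.label_counts[label] += 1
            self.word_counts[label].update(bag)
            self.vocab.update(bag)
        return self

    def predict(self, bag):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior of the reading
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for word, freq in bag.items():
                if word not in self.vocab:
                    continue  # context words never seen in training are skipped
                score += freq * math.log((self.word_counts[label][word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: one target expression with hand-labelled readings.
MWE = "break the ice".split()
TRAIN = [
    ("a joke helped to break the ice at the party", "idiomatic"),
    ("her smile helped break the ice with new colleagues", "idiomatic"),
    ("the ship could break the ice in the frozen harbor", "literal"),
    ("use a hammer to break the ice on the frozen lake", "literal"),
]


def bag_for(sentence):
    """Tokenize on whitespace, find the target expression, and extract its context bag."""
    tokens = sentence.lower().split()
    start, end = locate(tokens, MWE)
    return context_features(tokens, start, end)


model = NaiveBayesReading().fit([bag_for(s) for s, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict(bag_for("a funny joke helped break the ice with colleagues")))  # idiomatic
print(model.predict(bag_for("the boat tried to break the ice on the frozen river")))  # literal
```

In a sketch like this, the feature function is the component one would vary when comparing feature sets, while the classifier stays fixed; a real corpus would of course need far more annotated instances per expression.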