Published in Volume XV, 2005, pages 137-152

Authors: Raluca URICARU, Liviu CIORTUZ


This paper describes the system we designed for the 2005 “Learning Language in Logic” Challenge task (LLL’05), which concerns the extraction of directed genic interactions from sentences in MEDLINE abstracts. We treat this task as a classification problem: the system must separate interacting pairs of genes/proteins from non-interacting ones, and it does so by learning a model from training data given as positive and negative examples. Each example of a gene/protein interaction found in the MEDLINE abstracts is described linguistically by a set (chain) of syntactic relations, which make up most of the annotations accompanying the abstracts.
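Because the interactions are directed, the candidate pair (A, B) differs from (B, A). The challenge data already came as labeled examples, so the following is only an illustration of that framing, not the authors' code. It is a minimal sketch that assumes each sentence's gene/protein mentions are given as a list of distinct names and the annotated interactions as a set of (agent, target) tuples. The function name `candidate_pairs` is hypothetical, and the gene names are toy inputs.

```python
from itertools import permutations
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (agent, target)


def candidate_pairs(genes: List[str], interactions: Set[Pair]) -> List[Tuple[Pair, bool]]:
    """Label every ordered (agent, target) pair of distinct gene/protein mentions
    in one sentence: positive iff the pair is an annotated directed interaction.
    Assumes mention names within a sentence are distinct."""
    return [((a, t), (a, t) in interactions) for a, t in permutations(genes, 2)]


# Toy sentence with three mentions and one annotated interaction GerE -> sigK.
examples = candidate_pairs(["GerE", "sigK", "cotD"], {("GerE", "sigK")})
for pair, label in examples:
    print(pair, label)
```

With three mentions there are six ordered candidates. Only one of them is positive, which shows how quickly negative examples outnumber positive ones.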

The key task in the present work was selecting the features that best describe the training and test examples. Such features include the ‘root’ and the ‘head’ word of the syntactic chain/tree associated with the example, the positive and negative examples (or their number) “close to” a given instance according to a suitably defined distance measure, etc.
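The abstract does not specify the distance measure. The sketch below therefore makes two assumptions: an example is represented by the set of syntactic relation labels on its chain, and Jaccard distance is used as a stand-in metric. It counts the positive and negative training examples that lie within a radius of a query example. The names `jaccard_distance`, `neighbour_counts`, the relation labels, and the radius value are all hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation: an example is the set of syntactic relation labels
# on its chain, paired with its label (True = interacting pair).
Example = Tuple[FrozenSet[str], bool]


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard distance between two relation sets (an assumed stand-in metric)."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def neighbour_counts(query: FrozenSet[str], training: List[Example], radius: float) -> Tuple[int, int]:
    """Return (#positive, #negative) training examples within `radius` of `query`."""
    pos = neg = 0
    for relations, is_positive in training:
        if jaccard_distance(query, relations) <= radius:
            if is_positive:
                pos += 1
            else:
                neg += 1
    return pos, neg


training = [
    (frozenset({"subj", "obj"}), True),
    (frozenset({"subj", "obj", "mod"}), True),
    (frozenset({"appos", "mod"}), False),
]
print(neighbour_counts(frozenset({"subj", "obj"}), training, radius=0.5))  # (2, 0)
```

When this feature is computed for a training example, that example would presumably be left out of `training`, so that it does not count itself as a neighbour.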

To learn the classification model from the training data, we used several classifiers implemented in the WEKA library (Waikato Environment for Knowledge Analysis): the naive Bayes classifier, Bayesian belief networks, radial basis function networks, support vector machines (SVMs), and AdaBoost.
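To show what the naive Bayes classifier computes on discrete features such as a root or head word, here is a small from-scratch sketch. It picks the class c that maximises log P(c) + Σ_j log P(x_j | c), and it uses add-one smoothing. This is not WEKA's implementation. The class name `CategoricalNaiveBayes` and the toy feature tuples (a root word plus one relation label) are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


class CategoricalNaiveBayes:
    """Naive Bayes over discrete features, with Laplace (add-one) smoothing."""

    def fit(self, X: List[Sequence[Hashable]], y: List[Hashable]) -> "CategoricalNaiveBayes":
        self.class_counts = Counter(y)
        self.n = len(y)
        n_features = len(X[0])
        # counts[c][j][v]: number of class-c training examples whose feature j equals v
        self.counts = {c: [Counter() for _ in range(n_features)] for c in self.class_counts}
        # distinct values observed for each feature (used for smoothing)
        self.values = [set() for _ in range(n_features)]
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def log_score(self, row: Sequence[Hashable], c: Hashable) -> float:
        """log P(c) + sum_j log P(x_j | c), the unnormalised log posterior."""
        score = math.log(self.class_counts[c] / self.n)
        for j, v in enumerate(row):
            k = len(self.values[j]) + 1  # +1 reserves probability mass for unseen values
            score += math.log((self.counts[c][j][v] + 1) / (self.class_counts[c] + k))
        return score

    def predict(self, row: Sequence[Hashable]) -> Hashable:
        return max(self.class_counts, key=lambda c: self.log_score(row, c))


# Toy data: (root word, relation label) -> 1 = interacting, 0 = not interacting.
X = [("activate", "subj"), ("activate", "subj"), ("regulate", "obj"),
     ("contain", "mod"), ("contain", "obj")]
y = [1, 1, 1, 0, 0]
model = CategoricalNaiveBayes().fit(X, y)
print(model.predict(("activate", "subj")))  # 1
print(model.predict(("contain", "mod")))    # 0
```

The code works in log space so that many small per-feature probabilities do not underflow when multiplied.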

Our best result, a 58.7% F-measure on the entire dataset, was obtained with the naive Bayes classifier. This result ranks second when compared with the results reported by the participants in the LLL’05 Challenge task.
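For reference, the sketch below computes precision, recall, and the F-measure from raw counts. It assumes the balanced F-measure (F1), the harmonic mean of precision and recall. The counts are toy values and have no connection to the results reported above.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and balanced F-measure (F1) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy counts: 8 interactions found correctly, 2 spurious, 6 missed.
p, r, f = precision_recall_f1(tp=8, fp=2, fn=6)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.800 R=0.571 F=0.667
```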


  title={Genic Interaction Extraction from MEDLINE Abstracts - A Case Study.},
  author={Raluca URICARU and Liviu CIORTUZ},
  journal={Scientific Annals of Computer Science},
  organization={``A.I. Cuza'' University, Iasi, Romania},
  publisher={``A.I. Cuza'' University Press}