Data Driven Computational Class Discovery in Microarray Data

Raffaele Giancarlo

Dipartimento di Matematica ed Applicazioni, Università di Palermo

Microarrays, along with ChIP-Seq technologies, are a de facto standard for large-scale genomic and proteomic studies. However, much of their success depends on the effectiveness of the computational techniques used to infer meaningful structure in a microarray dataset. Some of the Information Sciences areas connected to microarray data analysis are classic: Clustering, and Statistical Validation Measures for assessing cluster quality. Unfortunately, microarrays pose special challenges, e.g., high data dimensionality and noise. The aim of this tutorial is to present some of the basic ideas and techniques designed over the past ten years for Clustering and Statistical Validation Measures that try to address the challenges posed by microarrays. The focus will be on paradigms rather than single techniques, with particular attention to the experimental and algorithm engineering aspects of this area, which, although important, are largely neglected in the specialized literature.

Main collaborators on this line of research: Davide Scaturro and Giosue Lo Bosco (Unipa), Filippo Utro (IBM T.J. Watson Research Center), Luca Pinello (Harvard Medical School)