High Performance, Cloud and Symbolic Computing in Big-Data problems applied to mathematical modelling of Comparative Genomics

Financed by: European Commission

Grant Agreement ID: 324554

Program: FP7-PEOPLE-2012-IAPP – Marie Curie Action: “Industry-Academia Partnerships and Pathways”

Duration: February 2013 to March 2017

Beneficiary: Perkins, James

Supervisor: Blanca Gómez, Miguel

Implementation centre: Hospital Regional Universitario de Málaga

Research group of IBIMA involved: Molecular Basis of Cellular Proliferation and Rare Diseases


Large-scale genomics projects that exploit leading high-throughput technologies have produced, and continue to produce, massive data sets at exponentially growing rates. So far, only a small part of this data can be abstracted, managed and processed, which gives an incomplete understanding of the biological processes being observed. The lack of processing power is a bottleneck in obtaining results. Comparative genomics is a good example, since it includes all the ingredients: huge and ever-growing data sets, complex applications that demand large computational resources, and new mathematical and statistical models for analysing and synthesising genomic information. A promising approach to handling such massive data sets is the creation of new computer software that makes effective use of parallel processing.

This proposal pursues the linking of different research domains to arrive at a coordinated, multidisciplinary approach to the development of tools targeting Big-Data and computationally intensive scientific applications. Generic solutions for Big-Data storage, management, distribution, processing and final analysis will be developed. These solutions will target a broad range of scientific applications; specifically, as a proof of concept, they will be implemented in the ‘Comparative Genomics’ field of the bioinformatics and biomedical domains. The applications include the detection of main evolutionary events; new comparative genomics models, which can be evaluated experimentally, for inter-species evolutionary distance; the composition of the k-mer dictionaries for each species; and the customisation of symbolic computing methods to determine the consensus tree from a sequence of trees, with application in multiple sequence alignment, phylogenetic studies, clustering algorithms and other tasks. These applications are present in diverse fields of bioinformatics, from NGS DNA assembly to gene expression; all of them are well suited to HPC-CC approaches and have high and attractive potential for commercialization.

Calle Severo Ochoa, 35
Parque Tecnológico de Andalucía (PTA) Campanillas, Málaga 29590

(+34) 951 440 260
Fax: (+34) 951 440 263