Title: Lecture Notes in Electrical Engineering (No. 25, 2008). Data Mining and Applications in Genomics
Author: Ao S.I.
Abstract:
With the results of many different genome-sequencing projects, hundreds of genomes from all branches of species have become available. Currently, one important task is to search for ways that can explain the organization and function of each genome. Data mining algorithms have become very useful for extracting patterns from the data and presenting them in a way that can better our understanding of the structure, relation, and function of the subjects. The purpose of this book is to illustrate the data mining algorithms and their applications in genomics, with frontier case studies based on the recent and current works of the author and colleagues at the University of Hong Kong and the Oxford University Computing Laboratory, University of Oxford.
It is estimated that there exist about 10 million single-nucleotide polymorphisms (SNPs) in the human genome. The complete screening of all the SNPs in a genomic region is therefore an expensive undertaking. Chapter 4 illustrates how the problem of selecting a subset of informative SNPs (tag SNPs) can be formulated as a hierarchical clustering problem, with the development of a suitable similarity function for measuring the distances between the clusters. The proposed algorithm takes account of both functional and linkage disequilibrium information, with asymmetric thresholds for different SNPs, and does not have the difficulties of the block-detecting methods, which can result in different block boundaries. Experimental results indicate that the algorithm is cost-effective for tag-SNP selection. More compact clusters can be produced with the algorithm to improve the efficiency of association studies.
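To make the clustering formulation concrete, the following is a minimal illustrative sketch, not the algorithm from Chapter 4. It assumes squared Pearson correlation (r^2) between genotype vectors as a stand-in similarity measure, a single symmetric threshold, and complete-linkage merging; the book's similarity function additionally uses functional information and per-SNP asymmetric thresholds, which are not modeled here. The function names (`r_squared`, `cluster_snps`, `pick_tags`) and the toy genotype data are hypothetical.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation of two genotype vectors (assumed LD proxy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (vx * vy)


def cluster_snps(genotypes, threshold=0.8):
    """Complete-linkage agglomerative clustering of SNPs.

    Two clusters are merged only while every cross-cluster pair of SNPs
    has r^2 >= threshold (the threshold value is an assumption).
    """
    names = list(genotypes)
    sim = {
        frozenset((a, b)): r_squared(genotypes[a], genotypes[b])
        for a, b in combinations(names, 2)
    }

    def linkage(c1, c2):
        # Complete linkage: the weakest pairwise similarity between clusters.
        return min(sim[frozenset((a, b))] for a in c1 for b in c2)

    clusters = [[name] for name in names]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) < threshold:
            break  # no remaining pair of clusters is similar enough
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: i < j, so index i is unaffected
    return clusters


def pick_tags(clusters, genotypes):
    """Pick one tag SNP per cluster: the member with the highest mean r^2."""
    tags = []
    for members in clusters:
        def mean_ld(snp):
            others = [o for o in members if o != snp]
            if not others:
                return 1.0  # a singleton cluster tags itself
            total = sum(r_squared(genotypes[snp], genotypes[o]) for o in others)
            return total / len(others)
        tags.append(max(members, key=mean_ld))
    return tags


# Toy data (hypothetical): one 0/1 allele code per sample for each SNP.
genotypes = {
    "snp1": [0, 0, 1, 1, 0, 1],
    "snp2": [0, 0, 1, 1, 0, 1],  # identical to snp1
    "snp3": [1, 1, 0, 0, 1, 0],  # perfectly anti-correlated with snp1
    "snp4": [0, 1, 0, 1, 0, 1],  # only weakly correlated with the others
}
clusters = cluster_snps(genotypes, threshold=0.8)
tags = pick_tags(clusters, genotypes)
```

In this toy case the three mutually correlated SNPs fall into one cluster and the fourth stays alone, so two tag SNPs would be genotyped instead of four. Unlike block-based approaches, a clustering of this kind does not require the grouped SNPs to be contiguous.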