Validating clustering for gene expression data

Posted by / 16-Oct-2017 04:46


Because biological processes vary over time [1], they may be better described by time-series gene expression data than by a static gene expression analysis.

Accounting for the dynamic behavior of genes involved in time-varying biological processes (e.g., developmental processes and cell cycle regulation) has the potential to provide insight into the complex associations among those genes.

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Using techniques and theory from time-frequency analysis, periodic gene expression profiles are clustered dynamically, under the assumption that different spectral frequencies characterize different biological processes.
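As a rough illustration of grouping profiles by their spectral content, the Python sketch below computes a discrete Fourier transform (DFT) of each centered profile and groups genes by the frequency bin with the largest magnitude. This is a simplifying assumption, not the method described here: a full time-frequency analysis (for example, a short-time Fourier or wavelet transform) tracks how frequency content changes over time, whereas a single DFT summarizes the whole series. The function names `dominant_frequency` and `group_by_frequency` and the toy profiles are hypothetical.

```python
import cmath
import math
from collections import defaultdict


def dominant_frequency(profile):
    """Return the DFT bin k (1..n//2) with the largest magnitude.

    k counts whole cycles over the sampled time span. The profile is
    centered and the search starts at k = 1, so the constant (k = 0)
    term is ignored.
    """
    n = len(profile)
    mean = sum(profile) / n
    centered = [x - mean for x in profile]
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k


def group_by_frequency(profiles):
    """Group gene names by the dominant frequency of their profiles."""
    groups = defaultdict(list)
    for gene, profile in profiles.items():
        groups[dominant_frequency(profile)].append(gene)
    return dict(groups)


# Toy data (hypothetical): 12 equally spaced time points, two one-cycle
# profiles that differ only in phase, and one three-cycle profile.
n = 12
toy = {
    "geneA": [math.sin(2 * math.pi * t / n) for t in range(n)],
    "geneB": [math.cos(2 * math.pi * t / n) for t in range(n)],
    "geneC": [math.sin(2 * math.pi * 3 * t / n) for t in range(n)],
}
print(group_by_frequency(toy))  # {1: ['geneA', 'geneB'], 3: ['geneC']}
```

Because only the magnitude of each DFT coefficient is compared, profiles with the same period but different phase (the sine and cosine above) fall into the same group.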

For example, Madeira and Oliveira [8] discretized real-valued gene expression data into three states (upregulated, downregulated, and unchanged) according to the slope of the expression change from one time point to the next.
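The three-state discretization can be sketched in a few lines of Python. The `threshold` parameter, which decides how steep a slope must be before a change counts as up- or downregulation, is an assumption for illustration; the exact criterion used by Madeira and Oliveira [8] may differ. The function name `discretize` is hypothetical.

```python
def discretize(profile, times=None, threshold=0.0):
    """Encode each transition between consecutive time points as a symbol.

    'U' = upregulated, 'D' = downregulated, 'N' = unchanged.
    times defaults to equally spaced points 0, 1, 2, ...
    threshold is an assumed tolerance: slopes within [-threshold, threshold]
    count as unchanged.
    """
    if times is None:
        times = list(range(len(profile)))
    if len(times) != len(profile):
        raise ValueError("profile and times must have the same length")
    symbols = []
    for i in range(1, len(profile)):
        # Slope of the expression change between time points i-1 and i.
        slope = (profile[i] - profile[i - 1]) / (times[i] - times[i - 1])
        if slope > threshold:
            symbols.append("U")
        elif slope < -threshold:
            symbols.append("D")
        else:
            symbols.append("N")
    return "".join(symbols)


print(discretize([1.0, 2.0, 2.0, 0.5], threshold=0.1))  # UND
```

Dividing by the time difference keeps the encoding meaningful when samples are unevenly spaced; with equal spacing it reduces to the sign of the raw difference, subject to the threshold.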

To our knowledge, no existing subspace clustering method produces biclusters that vary in their duration over time.

A two-step cluster validation approach is proposed to statistically estimate the optimal number of clusters and to distinguish significant clusters from noise.
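The second step (separating significant clusters from noise) could be approached with a permutation test, sketched below under stated assumptions. Here a cluster's coherence is taken to be the mean pairwise Pearson correlation of its member profiles, and the null distribution is built by shuffling each gene's time points independently, which destroys any shared temporal pattern. Both choices, and the names `coherence` and `cluster_p_value`, are hypothetical and not necessarily the statistics used by the proposed approach. The first step, estimating the number of clusters, is not shown; widely used criteria for that include the gap statistic and the average silhouette width.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coherence(profiles):
    """Assumed coherence score: mean pairwise Pearson correlation."""
    if len(profiles) < 2:
        raise ValueError("a cluster needs at least two profiles")
    pairs = list(itertools.combinations(profiles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def cluster_p_value(profiles, n_perm=199, seed=0):
    """Permutation p-value for the hypothesis that the cluster is noise.

    Each permutation shuffles every gene's time points independently,
    so any coordinated time course across genes is broken.
    """
    rng = random.Random(seed)
    observed = coherence(profiles)
    exceed = 0
    for _ in range(n_perm):
        shuffled = []
        for profile in profiles:
            permuted = list(profile)
            rng.shuffle(permuted)
            shuffled.append(permuted)
        if coherence(shuffled) >= observed:
            exceed += 1
    # The +1 terms keep the estimate away from exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)


# Toy cluster (hypothetical): three perfectly correlated profiles.
cluster = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [2, 4, 6, 8, 10, 12, 14, 16],
    [8, 9, 10, 11, 12, 13, 14, 15],
]
obs, p = cluster_p_value(cluster)
print(round(obs, 6), p)
```

With 199 permutations the smallest attainable p-value is 1/200 = 0.005. When many clusters are tested at once, a multiple-testing correction would also be needed before calling any of them significant.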

The resulting clusters reveal groups of coordinately coexpressed genes.

Unfortunately, clustering methods that group genes for novel gene discovery fail to account for the dynamic nature of biological processes and produce static clusters, even when gene expression is measured across time points or developmental stages.

It is well accepted that genes participate simultaneously in multiple biological processes and that their activity is coordinated over the duration of those processes.


Although biclustering approaches are popular, they have limitations. (From this point forward, clusters obtained by any subspace clustering method are referred to as biclusters.)
