Waters | Nonlinear Dynamics

Progenesis QI for proteomics



Correlation Analysis

The correlation analysis is performed on arcsinh-normalised protein abundance levels. Proteins can then be clustered according to how closely correlated they are. Proteins with a high correlation value (i.e. close to 1) show similar abundance profiles, while proteins with a high negative correlation value (i.e. close to -1) show opposing abundance profiles.
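As a rough illustration of the idea, the sketch below applies an arcsinh transform to some made-up abundance values and then correlates the transformed profiles. It assumes the Pearson correlation coefficient, which this page does not confirm, and the protein names and values are hypothetical.

```python
import math

def arcsinh_transform(abundances):
    """Apply the inverse hyperbolic sine to each raw abundance value."""
    return [math.asinh(x) for x in abundances]

def pearson(xs, ys):
    """Pearson correlation between two equal-length profiles (assumed measure)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical abundances for three proteins across four runs.
protein_a = [100.0, 200.0, 400.0, 800.0]
protein_b = [150.0, 300.0, 600.0, 1200.0]  # rises together with protein_a
protein_c = [800.0, 400.0, 200.0, 100.0]   # falls as protein_a rises

a, b, c = (arcsinh_transform(p) for p in (protein_a, protein_b, protein_c))
corr_ab = pearson(a, b)  # close to 1: similar abundance profiles
corr_ac = pearson(a, c)  # close to -1: opposing abundance profiles
print(round(corr_ab, 3), round(corr_ac, 3))
```

Here protein_a and protein_b would be expected to cluster together, while protein_c shows the opposite behaviour.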

What can we do with this information?

Draw a dendrogram showing clusters of proteins according to how strongly correlated the proteins are. This correlation can be seen in the abundance profiles of proteins from the same cluster.

Example dendrogram

What is a Dendrogram?

The dendrogram is a visual representation of the protein correlation data. The individual proteins are arranged along the bottom of the dendrogram and are referred to as leaf nodes. Protein clusters are formed by joining individual proteins or existing protein clusters; each join point is referred to as a node. This can be seen in the diagram above. At each dendrogram node there is a left and a right sub-branch of clustered proteins. In the following discussion, a protein cluster can refer to a single protein or a group of proteins.

The vertical axis is labelled distance and shows a distance measure between proteins or protein clusters. The height of a node can be thought of as the distance between its left and right sub-branch clusters. The distance measure between two clusters is calculated as follows:

D = 1 - C

where D = distance and C = correlation between protein clusters.

If proteins are highly correlated, they will have a correlation value close to 1, and so D = 1 - C will have a value close to zero. Therefore, highly correlated clusters are nearer the bottom of the dendrogram. Protein clusters that are not correlated have a correlation value of zero and a corresponding distance value of 1. Proteins that are perfectly negatively correlated, i.e. showing opposite abundance behaviour, will have a correlation value of -1 and D = 1 - (-1) = 2.
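The conversion from correlation to distance can be sketched in a few lines. The function name, protein labels and correlation values below are hypothetical and only serve to illustrate the D = 1 - C relationship described above.

```python
def correlation_to_distance(c):
    """Convert a correlation in [-1, 1] to a dendrogram distance in [0, 2]."""
    if not -1.0 <= c <= 1.0:
        raise ValueError("correlation must lie between -1 and 1")
    return 1.0 - c

# Hypothetical pairwise correlations between three proteins.
correlations = {
    ("P1", "P2"): 0.9,   # similar profiles
    ("P1", "P3"): 0.0,   # uncorrelated
    ("P2", "P3"): -1.0,  # opposing profiles
}

distances = {pair: correlation_to_distance(c) for pair, c in correlations.items()}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

The highly correlated pair ends up with the smallest distance, so it would be joined lowest in the dendrogram.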

As we move up the dendrogram, the protein clusters get bigger and the distance between protein clusters increases. As clusters grow, the distance between them becomes more difficult to interpret. One way to think about the abundance profile behaviour of two proteins is to see how far up the dendrogram you need to go in order to move between them. In the dendrogram below, you can see that to get from the protein on the left to the protein in the middle, you need to move up to a distance of 0.6 (just follow the branches).
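The "how far up do you need to go" idea can be sketched as a small clustering routine that records the height of each join. This is only an illustration: it assumes average linkage between clusters, which this page does not state is the rule used by the software, and the protein names and distances are made up.

```python
from itertools import combinations

def cluster(dist, labels):
    """Agglomerative clustering with average linkage (an assumed rule).

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    Returns the merges as (left_members, right_members, height) tuples.
    """
    clusters = [frozenset([name]) for name in labels]
    merges = []

    def between(c1, c2):
        # Average leaf-to-leaf distance between the two clusters.
        total = sum(dist[frozenset([a, b])] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2),
                          key=lambda pair: between(*pair))
        merges.append((left, right, between(left, right)))
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left | right)
    return merges

def height_to_join(merges, p, q):
    """Height of the node whose two sub-branches first bring p and q together."""
    for left, right, height in merges:
        if (p in left and q in right) or (p in right and q in left):
            return height
    raise ValueError("proteins are never joined")

# Hypothetical D = 1 - C distances between four proteins.
labels = ["P1", "P2", "P3", "P4"]
dist = {
    frozenset(["P1", "P2"]): 0.1,
    frozenset(["P1", "P3"]): 0.6,
    frozenset(["P2", "P3"]): 0.6,
    frozenset(["P1", "P4"]): 1.5,
    frozenset(["P2", "P4"]): 1.5,
    frozenset(["P3", "P4"]): 1.5,
}

merges = cluster(dist, labels)
print(height_to_join(merges, "P1", "P3"))
```

In this toy data, P1 and P2 join near the bottom, P3 joins them higher up, and P4 only joins at the top, so moving between P1 and P4 means climbing much further than moving between P1 and P3.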

Example dendrogram

Therefore, you would expect the same general behaviour for these proteins. This can be seen in the following abundance profile graph:

Abundance profiles graph

Now compare the following protein clusters: cluster 1 (left side, in red), cluster 2 (middle left, in brown) and cluster 3 (middle right, in blue). This illustrates the degree to which you can comment on the distance between protein clusters.

Example dendrogram

The abundance profiles for proteins in those clusters are shown below.

Abundance profiles graph

Finally, looking at all the abundance profiles on the main right-hand branch, we see that while the abundance profiles are generally quite similar, there is still considerable variety in individual abundance behaviour. In other words, as clusters increase in size, their abundance profiles become more general.

Abundance profiles graph