# Progenesis QI

The next generation in LC-MS data analysis software.
Discover the significantly changing compounds in your samples.

## How does database fragmentation scoring work?

To score database fragmentation matches, we use an algorithm based on the widely adopted cosine similarity method. A similar method is implemented by, for example, MassBank.

### Cosine similarity method

The dot product of two 2-dimensional vectors, ${\bf x} = x_1 {\bf i} + x_2 {\bf j}$ and ${\bf y} = y_1 {\bf i} + y_2 {\bf j}$, is:

$${\bf x} \cdot {\bf y} = x_1 y_1 + x_2 y_2$$

It can also be expressed as:

$${\bf x} \cdot {\bf y} = |{\bf x}| \, |{\bf y}| \cos\theta$$

Where $\theta$ is the angle between the two vectors, and $|{\bf x}| = \sqrt{x_1^2 + x_2^2}$.

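By equating these two formulae, the "similarity" between the two vectors is given by the cosine of the angle between them, which has the nice property that it ranges from 0 to 1 when all coefficients are non-negative:

$$\cos\theta = \frac{{\bf x} \cdot {\bf y}}{|{\bf x}| \, |{\bf y}|} = \frac{x_1 y_1 + x_2 y_2}{\sqrt{x_1^2 + x_2^2} \, \sqrt{y_1^2 + y_2^2}}$$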

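This method can also be extended to n-dimensional vectors:

$$\cos\theta = \frac{\sum_{k=1}^{n} x_k y_k}{\sqrt{\sum_{k=1}^{n} x_k^2} \, \sqrt{\sum_{k=1}^{n} y_k^2}}$$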

A similarity of 1 means the two vectors point in exactly the same direction (they are identical up to a positive scale factor), and a similarity of 0 means they are orthogonal, with no components in common.
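As a minimal sketch of the n-dimensional formula, the similarity can be computed in a few lines of Python. The function name `cosine_similarity` and the choice to return 0 for an all-zero vector are assumptions made for illustration, not something this page specifies.

```python
import math


def cosine_similarity(x, y):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # Assumption: an all-zero vector has no defined angle, so report 0.
        return 0.0
    return dot / (norm_x * norm_y)
```

For instance, `cosine_similarity([3, 4], [4, 3])` evaluates to 24/25 = 0.96, while `cosine_similarity([1, 0], [0, 1])` is 0 because the vectors are orthogonal.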

### Cosine similarity method applied to MS/MS scoring

We apply this method to the scoring of MS/MS database matches as follows.

We create two vectors ${\bf E}$ and ${\bf D}$, where each element of the vector is a weighted peak intensity given by:

$$W(m, i) = m^2 \sqrt{i}$$

where $m$ is the m/z of the peak and $i$ is its intensity.

We combine the m/z values of all peaks from the experimental and database spectra, and go through them in ascending m/z order. For each m/z, there are three possibilities:

1. There is an experimental peak at the given m/z, but no matching database peak.
2. There is a database peak at the given m/z, but no matching experimental peak.
3. There is an experimental peak at the given m/z, and a database peak at the same m/z (to within a threshold).

For each of these scenarios, we add elements to the vectors ${\bf E}$ and ${\bf D}$ as follows:

1. We add the weighted experimental peak intensity to ${\bf E}$ and a 0 to ${\bf D}$.
2. We add a 0 to ${\bf E}$ and the weighted database peak intensity to ${\bf D}$.
3. We add the weighted experimental peak intensity to ${\bf E}$ and the weighted database peak intensity to ${\bf D}$.

Finally, we calculate the similarity metric on ${\bf E}$ and ${\bf D}$ as defined above. To obtain a score between 0 and 100, we multiply this result by 100.
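The procedure above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the software's actual implementation: the names `weight` and `fragmentation_score`, the `(m/z, intensity)` tuple representation, the default `tolerance` of 0.01 m/z units, and the greedy one-to-one matching of peaks are all hypothetical choices. The weighting $W(m,i) = m^2 \sqrt{i}$ is the one quoted in the example on this page.

```python
import math


def weight(mz, intensity):
    # W(m, i) = m^2 * sqrt(i), the weighting function quoted on this page.
    return mz ** 2 * math.sqrt(intensity)


def fragmentation_score(experimental, database, tolerance=0.01):
    """Score two spectra, each a list of (m/z, intensity) pairs, from 0 to 100.

    `tolerance` is an assumed m/z matching threshold (hypothetical default).
    """
    exp = sorted(experimental)
    db = sorted(database)
    e_vec, d_vec = [], []
    i = j = 0
    # Walk both peak lists together in ascending m/z order.
    while i < len(exp) or j < len(db):
        if j == len(db) or (i < len(exp) and exp[i][0] < db[j][0] - tolerance):
            # Case 1: experimental peak with no matching database peak.
            e_vec.append(weight(*exp[i]))
            d_vec.append(0.0)
            i += 1
        elif i == len(exp) or db[j][0] < exp[i][0] - tolerance:
            # Case 2: database peak with no matching experimental peak.
            e_vec.append(0.0)
            d_vec.append(weight(*db[j]))
            j += 1
        else:
            # Case 3: the two peaks agree to within the tolerance.
            e_vec.append(weight(*exp[i]))
            d_vec.append(weight(*db[j]))
            i += 1
            j += 1
    dot = sum(e * d for e, d in zip(e_vec, d_vec))
    norm = math.sqrt(sum(e * e for e in e_vec)) * math.sqrt(sum(d * d for d in d_vec))
    # Assumption: an empty or all-zero spectrum scores 0.
    return 100.0 * dot / norm if norm else 0.0
```

With this sketch, two identical spectra score 100 and two spectra with no peaks in common score 0. Any unmatched peak adds a zero to one of the vectors and pulls the score below 100.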

### Example

To illustrate this method, suppose we have the following experimental and database spectra:

In this case, the two vectors produced are as follows (where $W(m,i) = m^2 \sqrt{i}$ is the weighted intensity function):

The similarity metric is then:

So these two spectra will be given a fragmentation score of ~93. They are fairly well matched, but a few peaks are either unmatched or not expected to be present, which lowers the score.