Explanation: Interpreting Gene Set Scores

Gene set scores

During analysis, each gene set is given a score. These scores appear in the output file as well as in the tooltips for the table view.

This score is used to compute a p-value for each gene set. Because the size of the gene set must also be taken into account, there is no simple relationship between a score and a p-value, though for a given gene set size a higher score corresponds to a better (smaller) p-value. This has several important implications:

  1. The gene set score should not be used in isolation to evaluate the significance of a gene set. It is displayed for your information.
  2. Two gene sets of different sizes will generally have different p-values for the same score.
  3. The ranking of gene sets by score will generally not be the same as their ranking by p-value.
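
To illustrate the second point, here is a minimal sketch that assumes an ORA-style score (a count of selected genes in the set) is converted to a p-value with an upper-tail hypergeometric test. That choice of test, the function name ora_pvalue and the total of 10000 genes are assumptions made for illustration, not a description of what the software does internally.

```python
from math import comb

def ora_pvalue(overlap, set_size, n_selected, n_total):
    """Probability of at least `overlap` selected genes in a set of `set_size`
    genes, under a hypergeometric model (an assumed test for this sketch)."""
    upper = min(set_size, n_selected)
    favourable = sum(
        comb(set_size, k) * comb(n_total - set_size, n_selected - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(n_total, n_selected)

# Same score (3 selected genes in the set), two different gene set sizes.
# n_total = 10000 is a hypothetical number of genes in the analysis.
p_small = ora_pvalue(overlap=3, set_size=30, n_selected=50, n_total=10000)
p_large = ora_pvalue(overlap=3, set_size=300, n_selected=50, n_total=10000)

print(f"set of 30 genes:  p = {p_small:.3g}")
print(f"set of 300 genes: p = {p_large:.3g}")
```

Under these assumptions the smaller gene set gets the smaller p-value, even though both sets have the same score.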

The meaning of the score depends on the type of analysis (see details of each analysis type for more information). The scores are:

  1. ORA: The number of genes in the gene set above the threshold you set.
  2. GSR (Resampling): Either the mean or median of the gene scores for the genes in the gene set, depending on the settings.
  3. Precision-recall: The average precision for the genes in the set, given the ranking of genes implied by the gene scores.
  4. ROC: The area under the ROC curve for the genes in the set, given the ranking of genes implied by the gene scores.
  5. Correlation: The mean value of the absolute value of the correlation between all pairs of genes in the gene set.
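
The two ranking-based scores in the list above can be sketched as follows. This uses the textbook definitions of average precision and area under the ROC curve; the function names and the five-gene ranking are hypothetical, and the software may handle details such as tied gene scores differently.

```python
def average_precision(ranking, gene_set):
    """Mean of the precision values measured at the rank of each set member.
    `ranking` lists genes from best to worst gene score; assumes at least
    one member of `gene_set` appears in it."""
    hits = 0
    precisions = []
    for rank, gene in enumerate(ranking, start=1):
        if gene in gene_set:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)

def roc_auc(ranking, gene_set):
    """Fraction of (member, non-member) pairs in which the member is ranked
    higher, which equals the area under the ROC curve when there are no ties.
    Assumes the ranking contains both members and non-members."""
    members_seen = 0
    wins = 0
    for gene in ranking:
        if gene in gene_set:
            members_seen += 1
        else:
            # Every member seen so far outranks this non-member.
            wins += members_seen
    n_non_members = len(ranking) - members_seen
    return wins / (members_seen * n_non_members)

ranking = ["a", "b", "c", "d", "e"]   # hypothetical genes, best score first
gene_set = {"a", "c"}

print(average_precision(ranking, gene_set))  # (1/1 + 2/3) / 2 = 5/6
print(roc_auc(ranking, gene_set))            # 5 of 6 pairs = 5/6
```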


ORA: Say your gene score threshold is 0.001 and that selects 50 genes. Say some gene set has 30 genes, of which 3 are in the 50 genes you selected. The score displayed will be 3.
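
The ORA example can be reproduced with a few lines of code. The gene names are made up for illustration; only the counts (50 selected genes, a gene set of 30 genes, 3 in common) come from the example.

```python
# 50 hypothetical genes that passed the gene score threshold.
selected_genes = {f"sel{i}" for i in range(50)}

# A hypothetical gene set of 30 genes, 3 of which were selected.
gene_set = {"sel0", "sel1", "sel2"} | {f"other{i}" for i in range(27)}

# The ORA gene set score is the number of gene set members among the selected genes.
ora_score = len(gene_set & selected_genes)
print(ora_score)  # 3
```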

Resampling: For the same gene set of 30 genes, say the mean of the gene scores (negated, log10-transformed p-values) is 2. That corresponds to a geometric mean p-value of 0.01. The gene set score is 2.
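
The relationship between the mean score and the geometric mean p-value can be checked with a short sketch. It assumes base-10 logarithms, which is what the numbers in the example imply, and uses three made-up p-values rather than a full gene set.

```python
import math

# Hypothetical per-gene p-values for the genes in a gene set.
p_values = [0.1, 0.01, 0.001]

# Gene scores: negated log10 of each p-value (1, 2 and 3 here).
gene_scores = [-math.log10(p) for p in p_values]

# Gene set score when the "mean" setting is used.
gene_set_score = sum(gene_scores) / len(gene_scores)

# Undoing the transform on the mean gives the geometric mean p-value.
geometric_mean_p = 10 ** (-gene_set_score)

print(gene_set_score)    # approximately 2
print(geometric_mean_p)  # approximately 0.01
```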

Correlation: For that same gene set, we measure the correlation between each pair of genes, a value that can vary from -1 to 1. We use the absolute value, yielding values from 0 to 1. The average of these values over all pairs is the gene set score. (Comparisons of a gene to itself are not included.)
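
A minimal sketch of the correlation score follows. It assumes Pearson correlation and uses three made-up expression profiles; the gene names, the data and the helper functions are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def correlation_score(profiles):
    """Mean absolute correlation over all distinct pairs of genes.
    combinations() never pairs a gene with itself."""
    values = [abs(pearson(profiles[a], profiles[b]))
              for a, b in combinations(profiles, 2)]
    return sum(values) / len(values)

# Hypothetical expression profiles for the genes in one gene set.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with geneA
}

print(correlation_score(profiles))  # approximately 1.0
```

Because the absolute value is taken, the anti-correlated gene contributes as much to the score as the positively correlated one.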