Output Format


How ermineJ stores your results

When you save your results to a file from ermineJ, what you get is a plain text file that can be opened in Excel or a similar program. Here is an example.

The same format is used for “projects”, but with potentially multiple results stored in the same file.

Note that the file format changed slightly in version 3.0 and is not backwards compatible.

If you intend to load the results back into ermineJ at some future point, we recommend not editing this file. Instead, make a copy after you load it into Excel.

The file is tab-delimited and can contain zero, one, or more result sets. Files with zero result sets can just contain settings. Each result set has its own settings stored in the file.

Each result set is stored in two main sections:

  1. A header, which contains settings (information about the configuration used during analysis). This information is used by ermineJ to recreate the conditions that existed when the analysis was run. Most of it is human-readable. In versions of ermineJ prior to 3.0, some settings were encoded as numbers (e.g., “rawScoreMethod = 0”). This format is still usable by ermineJ but might be removed at a later date. If you are planning downstream analysis of the output, be aware that the number of lines in the header is not fixed and may change in future versions of ermineJ.
  2. The analysis results themselves, one row for each gene set. This section has a row of column headers, which are preceded by “#!”. For explanations of these values, see below. Note that when you reload an analysis into ermineJ you may lose small amounts of numerical precision. For example, small but non-zero p-values might be read back in as zero.

In addition to these sections, the file may contain comments preceded by “#”.
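
For downstream analysis, the layout described above can be read with a few lines of Python. This is only a minimal sketch, not ermineJ's own loader. It assumes that settings appear as “key = value” lines, that the column header row starts with “#!” followed by tab-separated names, and that each result row starts with a “!” field. The function name parse_results and the sample lines are made up for illustration. Because the header length is not fixed, the sketch classifies each line by its prefix rather than counting lines.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def parse_results(lines: List[str]) -> Tuple[Dict[str, str], List[List[Row]]]:
    """Split an ermineJ-style results file into settings and result tables.

    Assumptions (not confirmed by the documentation): settings are
    "key = value" lines, and every "#!" line starts a new result table.
    If a file holds several result sets, settings with the same key
    overwrite one another in this simple sketch.
    """
    settings: Dict[str, str] = {}
    tables: List[List[Row]] = []
    columns: List[str] = []

    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        if line.startswith("#!"):
            # Column header row: drop the leading "#!" field.
            columns = line.split("\t")[1:]
            tables.append([])
        elif line.startswith("#"):
            # Ordinary comment line.
            continue
        elif line.startswith("!"):
            if not tables:
                continue  # result row without a header row; skip it
            # Result row: drop the leading "!" field.
            fields = line.split("\t")[1:]
            tables[-1].append(dict(zip(columns, fields)))
        elif "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()

    return settings, tables


# Made-up sample content, for illustration only.
SAMPLE_LINES = [
    "# a comment",
    "rawScoreMethod = 0",
    "#!\tName\tID\tNumProbes\tNumGenes\tRawScore\tPval",
    "!\texample set\tGS:1\t12\t10\t0.5\t0.01",
]

settings, tables = parse_results(SAMPLE_LINES)
```

All values are kept as strings here; convert them to numbers only for the columns you need.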

The columns in the output file are:

  1. ! – a column of !’s. These are here for boring reasons explained below.
  2. Name – the name of the gene set
  3. ID – the id of the gene set
  4. NumProbes – the number of elements (e.g. probes) in the gene set (“size” in earlier versions).
  5. NumGenes – the number of genes in the gene set (“effective_size” in earlier versions)
  6. RawScore – the raw statistic for the gene set. For explanations see this page.
  7. Pval – the p-value for the gene set.
  8. CorrectedPvalue – the corrected p-value. See this page for more information.
  9. MFPvalue – the p-value after multifunctionality correction. Might be missing if correction was not performed.
  10. CorrectedMFPvalue – Like CorrectedPvalue, but for the multifunctionality “corrected” p-value.
  11. Multifunctionality – How biased the genes in the set are towards multifunctional genes.
  12. Same as – a list of gene sets which have the exact same members as this one. Such gene sets are not listed anywhere else.
  13. GeneMembers – If you selected the “Include genes” option when saving, this will contain a list of the genes that are in the gene set, separated by “|”.
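
When post-processing a result row, two of the columns above need a little care: GeneMembers is a “|”-separated list that is only present if “Include genes” was selected, and MFPvalue may be missing. The sketch below shows one way to handle both. It assumes a row has already been read into a dict keyed by column name, and that a missing value shows up as an absent key or an empty string (the documentation does not say how missing values are written). The helper names are hypothetical.

```python
from typing import Dict, List, Optional


def gene_members(row: Dict[str, str]) -> List[str]:
    """Return the genes in the GeneMembers column, or [] if it is absent."""
    raw = row.get("GeneMembers", "")
    # Split on "|" and drop empty pieces (e.g. from a trailing separator).
    return [gene.strip() for gene in raw.split("|") if gene.strip()]


def optional_float(row: Dict[str, str], column: str) -> Optional[float]:
    """Return a numeric column as a float, or None if missing or unparsable."""
    raw = row.get(column, "").strip()
    if not raw:
        return None
    try:
        return float(raw)
    except ValueError:
        return None


# Made-up row, for illustration only.
example_row = {"Name": "example set", "Pval": "0.01", "GeneMembers": "GENE1|GENE2|GENE3"}
members = gene_members(example_row)
mf_pvalue = optional_float(example_row, "MFPvalue")  # column absent in this row
```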

Note to programmers: ErmineJ uses Apache Commons Configuration to manage properties files. The results file is treated as a properties file when it is reloaded, but we’re only interested in the configuration lines, not the analysis results. Therefore lines that are not properties must be commented out. This is why there are lines that start with # and !, both of which mark a comment line in the properties format. We use ! instead of # to help parsers distinguish between data lines and header lines. One of these days we’ll fix this oddity (well, probably not; it works pretty well).