Supplement to Ballouz et al.

This web page provides details about the analysis and results in “Using predictive specificity to determine when gene set analysis is biologically meaningful” by Ballouz, Pavlidis and Gillis.

Analyses were conducted in Matlab and R, and with ErmineJ 3.0.

Files used in simulations and MolSigDB analysis

MolSigDB: Originally from c2.cgp.v3.1.symbols.gmt.txt. Please obtain this file from the Broad Institute. Note that we did not analyze all of the sets; the data we used are in MolSigDB_human_data.

Neurocarta: neurocarta.genesets.txt. Disease Ontology (DO) to gene associations based on a dump of the Neurocarta database from April 2013.

KEGG: c2.cp.kegg.v3.1.symbols.gmt.txt. Please obtain this file from the Broad Institute.
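For readers who want to inspect the .gmt files outside of ErmineJ, the following is a minimal Python sketch of a reader. It is not part of the original analysis, which was done in Matlab and R. It assumes the usual tab-delimited GMT layout (set name, description, then one gene symbol per remaining field). The function name parse_gmt and the two example sets are made up for illustration.

```python
import io


def parse_gmt(handle):
    """Read GMT-style lines into a dict: set name -> (description, gene list).

    Assumes tab-delimited lines of the form: name, description, gene1, gene2, ...
    Lines with fewer than three fields are skipped.
    """
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # blank or malformed line
        name, description = fields[0], fields[1]
        genes = [g for g in fields[2:] if g]  # drop empty trailing fields
        gene_sets[name] = (description, genes)
    return gene_sets


# Made-up two-line example; not taken from the Broad Institute files.
example = io.StringIO(
    "SET_A\thttp://example.org/a\tTP53\tBRCA1\n"
    "SET_B\tna\tEGFR\n"
)
gene_sets = parse_gmt(example)
```

To read a downloaded file instead, pass an open file handle, for example open("c2.cp.kegg.v3.1.symbols.gmt.txt"), in place of the in-memory example.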

NCBO Annotator for associating MolSigDB lists with Disease Ontology terms:

MolSigDB2DO.txt – inferred DO mappings based on the NCBO Annotator.

MolSigDB.DO.erminej.results.txt – Aggregated results of analyzing MolSigDB hit lists for enrichment in DO groups as provided by Neurocarta.

pubmed2do.groovy. This script (written in Groovy) depends on Apache commons-httpclient and Gemma.

Citation counts (supplementary data)

MolSigDB_Pub_ids_cites_set, obtained with the help of fetchCites.R, which uses the rentrez package from CRAN.

ErmineJ

The following files were used for the case studies. Put them in your ErmineJ data directory ($HOME/ermineJ.data).

Case study data files

Along with the above ErmineJ files, you should be able to load these files into ErmineJ 3.0 and reproduce the results from the case studies. The “QuickList” files are the score files; put them in your ErmineJ data directory so that the software finds them automatically. Otherwise you may be prompted to locate them, or you will have to edit the results file to refer to the correct path.
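The file placement described above can also be scripted. The following is a small Python sketch, not something distributed with the supplement. It assumes the default data directory location $HOME/ermineJ.data mentioned earlier. The function name install_data_files is hypothetical, and the file name in the usage note is a placeholder.

```python
import shutil
from pathlib import Path


def install_data_files(files, data_dir=None):
    """Copy the given files into the ErmineJ data directory.

    data_dir defaults to $HOME/ermineJ.data (an assumption based on the
    location named on this page); the directory is created if missing.
    Returns the list of destination paths.
    """
    target = Path(data_dir) if data_dir is not None else Path.home() / "ermineJ.data"
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in files:
        src = Path(f)
        dest = target / src.name
        shutil.copy2(src, dest)  # copies contents and timestamps
        copied.append(dest)
    return copied
```

For example, install_data_files(["example.QuickList.txt"]) would copy a score file (placeholder name) from the current directory into $HOME/ermineJ.data.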