ermineJ API

The ermineJ API is complex internally, but third parties need very little of that complexity to use it.

Here we document two ways to use ermineJ programmatically (as opposed to using the command line). The first is a “simple API” for Java. The second is to use Groovy.

The ermineJ Javadoc is here. For more information and examples, download the source code.

Using the ermineJ “Simple” API

See ClassScoreSimple for the formal documentation.

For the simple API, the minimal requirements for an analysis are:

  • The ermineJ jar file and its dependencies are in your classpath.
  • A java.util.List of probe ids. This should contain all the probes on the platform (e.g. a microarray design),
    or at least not only the ones that met some selection criterion.
  • A List of gene symbols (in the same order as the probe ids).
  • A List of Collections of GO terms, one Collection per probe (in the same order as the probe ids).
  • A List of scores for the probe ids. Typically these are p-values, but they can be any value you like, including a dummy variable indicating cluster membership.

The use of java.util.Lists was intended to make it very easy for third parties to create data structures that ermineJ can handle. It is the programmer’s responsibility to make sure the Lists are in the correct order. While ermineJ will detect some types of problems with the input data structures, it cannot tell that you put the probe IDs in a different order than the gene symbols.
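One way to reduce the risk of misaligned lists is to append to all of them in a single step, so that index i always refers to the same probe. The following is a minimal sketch under that assumption. The class ParallelInputs and its methods are hypothetical helpers, not part of ermineJ, and the identifiers and scores are placeholder values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Hypothetical helper (not part of ermineJ) that builds the parallel input lists. */
class ParallelInputs {
    final List<String> probes = new ArrayList<>();
    final List<String> genes = new ArrayList<>();
    final List<Collection<String>> goAssociations = new ArrayList<>();
    final List<Double> geneScores = new ArrayList<>();

    /** Appends one probe to all four lists at once, keeping the indexes aligned. */
    void add(String probe, String gene, Collection<String> goTerms, double score) {
        probes.add(probe);
        genes.add(gene);
        goAssociations.add(new ArrayList<>(goTerms));
        geneScores.add(score);
    }

    /** Checks only what can be checked mechanically: equal lengths and no null entries. */
    boolean isConsistent() {
        int n = probes.size();
        return genes.size() == n && goAssociations.size() == n && geneScores.size() == n
                && !probes.contains(null) && !genes.contains(null)
                && !goAssociations.contains(null) && !geneScores.contains(null);
    }

    public static void main(String[] args) {
        ParallelInputs inputs = new ParallelInputs();
        // Placeholder identifiers and scores, for illustration only.
        inputs.add("probe_1", "GENE_A", Arrays.asList("GO:0000001", "GO:0000002"), 0.0004);
        inputs.add("probe_2", "GENE_A", Arrays.asList("GO:0000001"), 0.2); // second probe for the same gene
        inputs.add("probe_3", "GENE_B", Arrays.asList("GO:0000003"), 0.03);
        System.out.println(inputs.probes.size() + " probes, consistent = " + inputs.isConsistent());
    }
}
```

A length check like this catches truncated lists but, as noted above, it cannot detect lists that have the right length and the wrong order.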

Once the above are assembled, the analysis proceeds in three phases:

  1. Create a ClassScoreSimple object with the above lists as arguments to the constructor
  2. Configure settings.
  3. Run the analysis by calling the ‘run’ method.

The results can then be obtained with a simple method call.

The following code snippets demonstrate how to implement these steps.


List probes = null; // List of identifiers to be analyzed.
// List of genes corresponding to the probes, in the same order.
// Several probes may map to the same gene (many-to-one).
List genes = null;
List goAssociations = null; // List of Collections of GO terms for the probes.
List geneScores = null; // List of Doubles

/* code to initialize data structures omitted */

ClassScoreSimple css = new ClassScoreSimple( probes, genes, goAssociations );

// in our raw data, smaller values are better (like pvalues, unlike fold
// change)
css.setBigGeneScoreIsBetter( false );

// set range of sizes of gene sets to consider.
css.setMaxGeneSetSize( 100 );
css.setMinGeneSetSize( 5 );

// use this pvalue threshold for selecting genes. (before taking logs)
css.setGeneScoreThreshold( 0.001 );

// use over-representation analysis.
css.setClassScoreMethod( Settings.ORA );
/* ... etc. Reasonable defaults (?) are set for all parameters if you don't set them. */

css.run( geneScores ); // might want to run in a separate thread.

// You should iterate over your tested gene sets.
double fooPvalue = css.getGeneSetPvalue( "foo" );
double barPvalue = css.getGeneSetPvalue( "bar" );
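The two comments in the snippet above (running in a separate thread, and iterating over the tested gene sets) can be combined in one pattern. The sketch below is an illustration only: BackgroundRunSketch and runAsync are hypothetical names, and a plain Runnable and lookup function stand in for the ermineJ calls so that the example runs without the ermineJ jar. With ermineJ on the classpath, the Runnable would be () -> css.run( geneScores ) and the lookup would be css::getGeneSetPvalue.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch (not part of ermineJ): run an analysis off the calling thread, then collect p-values. */
class BackgroundRunSketch {

    /** Runs the analysis on the executor, then looks up one p-value per gene set id, in order. */
    static CompletableFuture<Map<String, Double>> runAsync(Runnable analysis, List<String> geneSetIds,
            ToDoubleFunction<String> pvalueLookup, Executor executor) {
        return CompletableFuture.supplyAsync(() -> {
            analysis.run();
            Map<String, Double> pvalues = new LinkedHashMap<>();
            for (String id : geneSetIds) {
                pvalues.put(id, pvalueLookup.applyAsDouble(id));
            }
            return pvalues;
        }, executor);
    }

    public static void main(String[] args) {
        // Stand-in results with made-up values; these replace a real ClassScoreSimple here.
        Map<String, Double> fakeResults = new LinkedHashMap<>();
        fakeResults.put("foo", 0.01);
        fakeResults.put("bar", 0.5);

        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<Map<String, Double>> pending = runAsync(
                    () -> { /* css.run( geneScores ) would go here */ },
                    Arrays.asList("foo", "bar"), fakeResults::get, executor);
            // The calling thread is free to do other work until join() is called.
            pending.join().forEach((id, p) -> System.out.println(id + "\t" + p));
        } finally {
            executor.shutdown();
        }
    }
}
```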

Using ermineJ with Groovy

Groovy is a scripting language based on Java. It’s a good way to access the functionality of ermineJ. Configuring Groovy to run the scripts below can be as simple as copying the ermineJ jar file and its dependencies to your ~/.groovy/lib directory. When writing your own scripts, the important packages are ubic.erminej.data and ubic.erminej.analysis.

Here is an example script that runs an analysis on multiple “hit lists” in one run, saving each result to a file (scripts similar to these are part of the ermineJ source distribution).

#!/usr/bin/groovy
/*
 * Demo script showing how to run an analysis of hit lists, save results.
 */
package ubic.erminej.script
import ubic.erminej.*
import ubic.erminej.analysis.*
import ubic.erminej.data.*

/*
 * Set things up.
 */
config = new Settings(true)
config.useMolecularFunction = false
config.useBiologicalProcess = true
config.useCellularComponent = false
config.useUserDefined = true
config.classScoreMethod = "ORA"
config.geneScoreThreshold = 1.0  // this is a peculiarity of the hitlist style
config.doLog = true
config.bigIsBetter = false

goData = new GeneSetTerms( "go_daily-termdb.rdf-xml.gz" )
parser = new GeneAnnotationParser( goData, null )
geneData = parser.read( "Generic_human.an.txt", "DEFAULT", config )

// Assume file contains a tab-delimited list of lists, one per line, with list name in first field.
// The list name is just ignored by the gene parser so I didn't bother removing it.
file = new File("mysets.txt")
file.eachLine{ line ->
    f = line.tokenize('\t')
    name = f[0]
    println(name + " " + (f.size() - 1) + " genes ...")

    try {
        gs = new GeneScores(f, config, null, geneData)
        results = new GeneSetPvalRun(config, gs)
        ResultsPrinter.write(name + ".erminej.txt", results, false)
    } catch (e) {
        println(name + " FAILED: " + e)
    }
}
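The script above assumes an input file with one hit list per line, tab-delimited, with the list name in the first field. For readers preparing such files from Java, the following sketch shows how a single line splits into a name and its genes. HitListLine is a hypothetical class, not part of ermineJ, and trimming whitespace around fields is an added assumption. Like Groovy's tokenize, it skips empty fields.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical value class (not part of ermineJ): one named hit list read from one line. */
class HitListLine {
    final String name;
    final List<String> genes;

    private HitListLine(String name, List<String> genes) {
        this.name = name;
        this.genes = Collections.unmodifiableList(new ArrayList<>(genes));
    }

    /** Splits a tab-delimited line: the first non-empty field is the name, the rest are genes. */
    static HitListLine parse(String line) {
        List<String> fields = new ArrayList<>();
        for (String field : line.split("\t")) {
            String trimmed = field.trim();
            if (!trimmed.isEmpty()) {
                fields.add(trimmed); // empty fields are skipped
            }
        }
        if (fields.isEmpty()) {
            throw new IllegalArgumentException("Blank line: no list name found");
        }
        return new HitListLine(fields.get(0), fields.subList(1, fields.size()));
    }

    public static void main(String[] args) {
        // Made-up list name and gene symbols, for illustration only.
        HitListLine hits = parse("cluster1\tGENE_A\tGENE_B\tGENE_C");
        System.out.println(hits.name + " " + hits.genes.size() + " genes ...");
    }
}
```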

Groovy example 2: Reading results and writing images

This example shows how results can be read in, and then used to generate images of the top 10 gene sets. Note that reading results requires that the settings stored in the file are valid (in particular, paths to files).


#!/usr/bin/groovy
/*
 * Demo script showing how to load an analysis and save pngs of the top 10 results.
 */
package ubic.erminej.script
import ubic.basecode.graphics.ColorMap;
import ubic.erminej.*
import ubic.erminej.analysis.*
import ubic.erminej.data.*
import ubic.erminej.util.*

loadFile = args[0]
assert loadFile != null

settings = new Settings( loadFile );
assert settings != null

assert settings.classFile != null

goData = new GeneSetTerms( settings.classFile + "", settings )
parser = new GeneAnnotationParser( goData, null )
geneData = parser.read( settings.annotFile, settings.annotFormat, settings )

gs = new GeneScores(settings.scoreFile, settings, null, geneData)

athread = new Analyzer( settings, null, geneData, loadFile );
results = athread.loadAnalysis();

assert results != null && results.size() > 0, "No results were loaded from " + loadFile

result = results.asList()[0]

// values() is not a List, so sort() returns a new sorted List rather than sorting in place; keep it.
rlist = result.getResults().values().sort()

rlist.eachWithIndex {  it, i ->
  if (i >= 10) {
        return // skip everything after the first 10 results
  }

  details = new GeneSetDetails(it.geneSetTerm, it, result.geneData, settings, gs, null)
  GeneSetDetailsImageWriter.writePng( details, result.name + "." + (1+i) + ".png", ColorMap.BLACKBODY_COLORMAP, true, false, true )
}