This file contains a brief description of the input and output of the
Evidence Ranked Motif Identification (cERMIT) algorithm. Basic instructions
on how to run the code are also included. Example input files are provided
with the code (ACE2_YPD yeast ChIP-chip dataset, using conservation with the
remaining 4 species in the sensu stricto clade).

Input files:
------------
1) Evidence file (e.g. evidence_ACE2_YPD)
2) Regulatory sequence files (e.g. ACE2_YPD_cerv_seq, ACE2_YPD_bay_seq, etc.)
3) literature PSSM (e.g. pssm_ACE2_YPD.)
4) Evidence parameters file (e.g. evidence_file_list)
5) Regulatory sequence parameters file (e.g. sequence_file_list)

Output files:
-------------
1) _summary.txt - ranked list of the motif predictions according to the
   score assigned by (c)ERMIT
2) _prediction.pdf - top predicted motif PSSM logo
3) _based_on_literature_consensus.pdf - literature PSSM logo
4) _best_pssm - PSSM representation for top prediction

The file formats are described below:
*************************************

'Evidence file' (assume given 'p' regulatory regions)
---------------
...

'Evidence parameters file' (assume given 'p' regulatory regions)
--------------------------
...

*NOTE: The 'literature consensus' is used only when generating the output
summary. The actual motif prediction score is calculated based on the
provided 'literature PSSM' using the similarity metric described in the text.

'Regulatory sequence file'
--------------------------
gene1_regulatory_region$rev_compl(gene1_regulatory_region)$gene2_regulatory_region$rev_compl(gene2_regulatory_region)$...

*NOTE: Regulatory sequences for all regions must be the same length; shorter
sequences need to be padded at the end with 'X' symbols. The DNA bases must
be upper case.

'Regulatory sequence parameters file' (assume given 'k' (k >= 0) orthologous species)
-------------------------------------
....
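
The 'Regulatory sequence file' layout described above can be sketched in
Python as follows. This is only an illustration, not code shipped with
cERMIT: the function names are hypothetical, and two details are assumptions
that the format description does not settle: (a) the reverse complement is
taken from the unpadded sequence and then padded at the end like the forward
strand, and (b) every entry, including the last one, is followed by '$'.

```python
# Hypothetical helpers; not part of the cERMIT distribution.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def pad(seq: str, length: int) -> str:
    """Pad a sequence at the end with 'X' symbols up to the given length."""
    return seq + "X" * (length - len(seq))


def build_sequence_record(regions: list[str]) -> str:
    """Join each region and its reverse complement with '$' separators."""
    upper = [region.upper() for region in regions]  # bases must be upper case
    length = max(len(region) for region in upper)   # common region length
    parts = []
    for seq in upper:
        parts.append(pad(seq, length))
        # Assumption (a): the reverse strand is padded at the end as well.
        parts.append(pad(reverse_complement(seq), length))
    # Assumption (b): every entry is terminated by '$'.
    return "".join(part + "$" for part in parts)


if __name__ == "__main__":
    print(build_sequence_record(["acgtt", "gga"]))
```

Check the result against the provided example files (e.g. ACE2_YPD_cerv_seq)
before using such a script to prepare real input.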
'literature PSSM'
-----------------

'_summary.txt' (main output file)
-----------------------
Contains the detailed description of all "evolved" 5-mer seeds which
constitute cERMIT's motif predictions. The predictions are ranked by the
enrichment score assigned by the Objective Function (defined in the text).
For each cluster cERMIT outputs:
(number of predicted target genes)literature:

'_best_pssm '
----------------------
Same format as 'literature PSSM'.

Running Instructions:
*********************
Before running cERMIT with the provided sample input, make sure that the
'input' subdirectory exists and contains all provided input files. The
'Evidence_file_list' and 'Sequence_file_list' files should be in the same
directory as the executable 'c_ERMIT'.

run command:
./c_ERMIT 'Evidence_file_list' 'Sequence_file_list' 'output_directory'

example run script:
./sample_run

*NOTE: The output is written to the 'output_directory' specified by the user
(it will be created if it doesn't exist).

cERMIT default parameters for ChIP-seq data:
--------------------------------------------
max motif set size = 30% of sample space of regions
min motif set size = 20 regions

cERMIT default parameters for ChIP-chip data:
---------------------------------------------
max motif set size = 15% of sample space of regions
min motif set size = 20 regions
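
To make the default bounds above concrete, here is a small Python sketch
that computes them for a given number of regions. It is an illustration
only: the function and constant names are hypothetical, it assumes that the
"sample space of regions" is simply the number of input regions, and it
assumes the percentage is rounded down. cERMIT itself may handle these
details differently.

```python
# Hypothetical names; the values are the defaults listed above.
MAX_PERCENT = {"chip-seq": 30, "chip-chip": 15}
MIN_REGIONS = 20


def motif_set_size_bounds(n_regions: int, data_type: str) -> tuple[int, int]:
    """Return (min, max) motif set sizes for the given number of regions."""
    # Assumption: the percentage of the region count is rounded down.
    max_size = n_regions * MAX_PERCENT[data_type] // 100
    return MIN_REGIONS, max_size


if __name__ == "__main__":
    for data_type in ("chip-seq", "chip-chip"):
        print(data_type, motif_set_size_bounds(1000, data_type))
```

Note that for small datasets the computed maximum can fall below the minimum
of 20 regions (for example, 100 ChIP-chip regions give a maximum of 15); this
sketch does not say how cERMIT treats that case.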