This file contains a brief description of the input/output of the Evidence Ranked Motif Identification (cERMIT) algorithm (version 1.01).
Basic instructions on how to run the code are also included.

Example input files are provided with the code:
- ACE2_YPD yeast ChIP-chip dataset, using conservation with 4 closely related species
- STAT1 human ChIP-seq dataset, using DNaseI data to define the set of putative regulatory regions
- miR1 micro-RNA overexpression assay

Input files:
------------
1) Evidence file (e.g. evidence_ACE2_YPD)
2) Regulatory sequence files (e.g. ACE2_YPD_cerv_seq, ACE2_YPD_bay_seq, etc.)
3) literature PSSM (if available) (e.g. pssm_ACE2_YPD.)
4) Evidence parameters file (e.g. evidence_file_list_ACE2_YPD)
5) Regulatory sequence parameters file (e.g. sequence_file_list_ACE2_YPD)

Output files:
-------------
1. _summary.txt - ranked list of the motif predictions according to the score assigned by cERMIT
2. _web_logo_top_cluster[1-10].pdf - top 10 predicted motif cluster PSSM logos
3. _web_logo_literature.pdf - literature PSSM logo (if a literature PSSM has been provided as input)

The file formats are described below:
*************************************

'Evidence file'
---------------
...
(e.g. 'score' could be a probe p-value from a ChIP-chip experiment)

'Evidence parameters file'
--------------------------
...

'Regulatory sequence file'
--------------------------
regulatory_region1$rev_compl(regulatory_region1)$regulatory_region2$rev_compl(regulatory_region2)$...

*NOTE: All regulatory regions must be the same length. Shorter sequences should be padded at the end with 'X' symbols (at the beginning for the reverse complement of the region). The DNA bases should be capitalized.

'Regulatory sequence parameters file' (assume given 'k-1' (k >=1) orthologous species)
-------------------------------------
...

'literature PSSM'
-----------------
e.g.
4 6
0.012 0.012 0.012 0.012 0.012 0.012
0.964 0.012 0.012 0.964 0.964 0.012
0.012 0.964 0.012 0.012 0.012 0.012
0.012 0.012 0.964 0.012 0.012 0.964

'_summary.txt' (main output file)
-----------------------
Contains the detailed description of all "evolved" 5-mer seeds which constitute cERMIT's motif predictions. The predictions are ordered according to the enrichment score assigned by the Objective Function (defined in the text).

For each cluster cERMIT outputs:
(number of predicted target genes)literature:

Running Instructions:
*********************
Before running cERMIT with the provided sample input, make sure that the 'input' subdirectory exists and contains all provided input files. The 'Evidence_file_list' and 'Sequence_file_list' files should be in the same directory as the executable 'c_ERMIT'.

run command:
./c_ERMIT

possible values for : chip_chip, chip_seq, micro_RNA

example run scripts
- ChIP-chip: ./sample_run_ACE1_YPD
- ChIP-seq: ./sample_run_STAT1
- micro-RNA: ./sample_run_miR1

*NOTE: The output will be directed to the location specified by the user (it will be created if it doesn't exist).
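
Input preparation sketch (not part of cERMIT)
---------------------------------------------
The 'Regulatory sequence file' and 'literature PSSM' formats described above can be illustrated with a short Python sketch. It is not shipped with cERMIT. The function names are hypothetical, and the following points are assumptions rather than documented behaviour: fields are joined by '$' with no trailing separator, all regions are padded to the length of the longest one, and the first two numbers of a PSSM file give the number of rows and columns of the matrix that follows.

```python
# Sketch only: helper names, the "no trailing '$'" choice, and the
# "rows cols" reading of the PSSM header are assumptions, not cERMIT API.

# 'X' padding is mapped to itself so it survives complementing.
_COMPLEMENT = str.maketrans("ACGTX", "TGCAX")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_sequence_line(regions):
    """Join regions as region1$rev_compl(region1)$region2$..."""
    regions = [r.upper() for r in regions]      # bases must be capitalized
    width = max(len(r) for r in regions)        # common length for all regions
    fields = []
    for region in regions:
        pad = "X" * (width - len(region))
        fields.append(region + pad)                      # forward: pad at the end
        fields.append(pad + reverse_complement(region))  # reverse: pad at the start
    return "$".join(fields)


def parse_pssm(text):
    """Read 'rows cols' followed by rows*cols values into a list of rows."""
    tokens = text.split()
    n_rows, n_cols = int(tokens[0]), int(tokens[1])
    values = [float(t) for t in tokens[2:]]
    if len(values) != n_rows * n_cols:
        raise ValueError("PSSM size does not match its header")
    return [values[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
```

For example, build_sequence_line(["ACGT", "AC"]) pads the second region to four characters on both strands, and parse_pssm applied to the example matrix above returns 4 rows of 6 values whose columns each sum to 1.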