Transcription start site prediction.
The McPromoter system for identifying vertebrate and arthropod transcription start sites was developed by Uwe, starting at the University of Erlangen. The fly version was improved in 2006 and includes predictions for different configurations of core promoter elements.
Here are batch fly predictions in GFF format for release 4 and release 5. Both were run with a threshold of 0.03. Release 5 has more predictions for two reasons: it includes extra predictions in unassembled contigs, and the minimal distance between predicted start sites was reduced from 1,000 nt to 100 nt, which yields more predictions for possible alternative TSSs.
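To see how the score threshold and the minimal distance interact, here is a minimal Python sketch for post-processing a GFF file of predictions. It is not part of McPromoter: the names `parse_gff` and `select_sites` are hypothetical, and the greedy, per-strand, best-score-first selection rule is an assumption made for illustration. It only relies on the standard tab-separated GFF column order (seqid, source, type, start, end, score, strand, ...).

```python
from typing import Iterable, List, NamedTuple


class Prediction(NamedTuple):
    seqid: str
    pos: int
    strand: str
    score: float


def parse_gff(lines: Iterable[str]) -> List[Prediction]:
    """Read predictions from tab-separated GFF lines, skipping comments."""
    preds = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8 or fields[5] == ".":
            continue  # malformed line or missing score
        seqid, _source, _type, start, _end, score, strand = fields[:7]
        preds.append(Prediction(seqid, int(start), strand, float(score)))
    return preds


def select_sites(preds: List[Prediction],
                 threshold: float = 0.03,
                 min_distance: int = 100) -> List[Prediction]:
    """Keep predictions scoring at least `threshold`, best first, and drop
    any site closer than `min_distance` nt to an already kept site on the
    same sequence and strand (assumed rule, for illustration only)."""
    kept: List[Prediction] = []
    for p in sorted(preds, key=lambda q: -q.score):
        if p.score < threshold:
            break  # remaining predictions score even lower
        far_enough = all(
            k.seqid != p.seqid
            or k.strand != p.strand
            or abs(k.pos - p.pos) >= min_distance
            for k in kept
        )
        if far_enough:
            kept.append(p)
    return sorted(kept, key=lambda q: (q.seqid, q.strand, q.pos))
```

Calling `select_sites(parse_gff(open(path)))` once with `min_distance=1000` and once with `min_distance=100` on the same file shows the effect described above: the smaller distance keeps nearby sites that the larger one would merge into a single prediction.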
The vertebrate McPromoter version is NOT being developed further at this point and is provided "for your convenience" only, as its performance is still competitive and it has often been used as a benchmark to evaluate newly developed systems. However, we DO encourage you to use our new approach, S-Peaker (2009), to predict mammalian transcription start sites. It was trained on recent high-throughput data reflecting precise TSS locations, and it strongly outperforms McPromoter, especially in terms of resolution. S-Peaker was developed by Molly Megraw, in collaboration with Artemis Hatzigeorgiou and others at U Penn. The code is available, but we have not yet set up an interface. Please make use of this system, especially when benchmarking predictions.
Alignment & motif modeling.
Weichun Huang has developed a program called ACANA (ACcurate ANchoring Alignment) for fast heuristic pairwise alignments of biological sequences at both the local and global level. OMiMa, also developed by Weichun, is a motif model with flexible higher-order dependencies not restricted to neighboring nucleotides.
NEW: Our most recent work describes a simulator of cis-regulatory sequence evolution, allowing for turnover events of transcription factor binding sites.
Motif identification.
With Alex Hartemink's group, we have worked on motif finding in a Gibbs sampler setting. The Priority system makes use of location-based priors to improve the efficiency of motif finding.
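As a rough illustration of how a location-based prior can enter a Gibbs sampling step, the Python sketch below combines per-position log-likelihood ratios (motif versus background) with a prior over start positions and samples one site from the resulting posterior. This is not the Priority code: the function names `site_posterior` and `sample_site` are hypothetical, and the inputs are assumed to be precomputed for a single sequence.

```python
import math
import random
from typing import List, Sequence


def site_posterior(log_likelihood_ratios: Sequence[float],
                   prior: Sequence[float]) -> List[float]:
    """Posterior over motif start positions in one sequence.

    posterior[i] is proportional to prior[i] * exp(log_likelihood_ratios[i]).
    """
    if len(log_likelihood_ratios) != len(prior):
        raise ValueError("need one prior value per candidate position")
    if not any(p > 0 for p in prior):
        raise ValueError("prior must be positive for at least one position")
    logs = [
        llr + math.log(p) if p > 0 else float("-inf")
        for llr, p in zip(log_likelihood_ratios, prior)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(x - top) for x in logs]
    total = sum(weights)
    return [w / total for w in weights]


def sample_site(posterior: Sequence[float], rng: random.Random) -> int:
    """Draw one start position according to the posterior."""
    return rng.choices(range(len(posterior)), weights=posterior, k=1)[0]
```

With a uniform prior this reduces to the usual likelihood-only sampling step. A prior concentrated on a few positions shifts the sampler toward those positions, so it spends fewer iterations exploring unlikely locations.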
Gene finding.
Bill Majoros has been developing a number of open source programs to identify genes in eukaryotic genomes. Check out his page.