Drosophila Melanogaster Major Position Matrix Motifs (DMMPMM) MAY-26-2009

This site contains supplementary materials for the paper 'Motif discovery and motif finding from genome-mapped DNase footprint data'.
All genome-mapped data use the Drosophila melanogaster v.4 (Apr. 2004, UCSC dm2) genome assembly, referred to here as dmel40. All coordinates are one-based.
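
For readers who work with zero-based tools, here is a minimal sketch of the coordinate conversion. It assumes that the one-based coordinates on this site are inclusive on both ends (the page states only that they are one-based) and that the target is the zero-based, half-open convention used by BED-like formats. The function names are hypothetical and are not part of DMMPMM or Ytilib.

```python
def one_based_to_half_open(start, end):
    """Convert a one-based inclusive interval to zero-based half-open."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end


def half_open_to_one_based(start0, end0):
    """Convert a zero-based half-open interval to one-based inclusive."""
    if start0 < 0 or end0 <= start0:
        raise ValueError("expected 0 <= start0 < end0")
    return start0 + 1, end0


def extract_site(chromosome_sequence, start, end):
    """Cut a one-based inclusive interval out of a sequence string."""
    start0, end0 = one_based_to_half_open(start, end)
    return chromosome_sequence[start0:end0]
```

With these assumptions, extract_site("ACGTACGT", 2, 4) returns the three letters at one-based positions 2 to 4.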
Sequence logos are generated by the pmflogo tool. It uses a green grid for DIC-based scaling (when a PCM exists) and a red grid for WebLogo-like classic information content scaling (for PPM-only cases).
See the 'downloads' section at the bottom of the page. This is a static page, so to see the latest version, refresh it in your browser (F5). Some miscellaneous details can be found here.
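
As an illustration of the classic information content scaling mentioned above for PPM-only logos, here is a minimal sketch. It assumes a uniform background and the usual per-column formula IC = 2 + sum(p * log2(p)) in bits, with no small-sample correction. It is not the pmflogo code and does not cover DIC-based scaling; the function names are hypothetical.

```python
import math


def column_information(probabilities):
    """Classic information content (bits) of one PPM column, uniform background."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return 2.0 - entropy


def letter_heights(probabilities):
    """Height of each letter in the logo column: probability times column IC."""
    information = column_information(probabilities)
    return [p * information for p in probabilities]
```

A column with equal probabilities for all four letters gets zero height, and a column with a single certain letter gets the full 2 bits.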


[UPDATE] improved and integrated Drosophila Melanogaster Major Position Matrix Motifs (iDMMPMM)

The newest motifs, constructed by integrating different experimental data, are available here, as described in [Kulakovskiy, I.V. and Makeev, V.J. (2009) Discovery of DNA motifs recognized by transcription factors through integration of different experimental sources. Biophysics, 54(6), 667-674].


Please use the blocks for navigation. Within tables, factor IDs lead to the detailed data (except in the features and similarity tables).

There you can see a comparison of motif models from different sources. Click the corresponding motif ID to see the PWM ROC curves and the motif similarity table. The 'winner' is the motif with better sensitivity in most cases within the range of 10-90% of the total footprint count.

Color codes: yellow means the motif wins in some cases; green means it wins in most cases.


There you can see a comparison of motif models built by Bigfoot and SeSiMCMC with lengths fixed according to the Pollard motif set. Click the corresponding motif ID to see the PWM ROC curves and the motif similarity table. The 'winner' is the motif with better sensitivity in most cases within the range of 10-90% of the total footprint count.


There you can see the whole list of motifs built by Bigfoot. The 'ready' xml contains the motif built with the dmel40 background distribution and a pseudocount of 1. The 'map' xml gives site positions within footprints. The 'sites' xml gives genome-mapped site positions.
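
To make the role of the background distribution and the pseudocount more concrete, here is a minimal sketch of one common way to turn a position count matrix into a log-odds PWM. The exact formula used for the 'ready' motifs is not given on this page, so the formula below is an assumption. The background frequencies are placeholders and not the actual dmel40 values, and the dict-based matrix layout is hypothetical rather than the Ytilib format.

```python
import math

ALPHABET = "ACGT"

# Placeholder background frequencies; NOT the real dmel40 values.
PLACEHOLDER_BACKGROUND = {"A": 0.29, "C": 0.21, "G": 0.21, "T": 0.29}


def pcm_to_pwm(pcm, background=PLACEHOLDER_BACKGROUND, pseudocount=1.0):
    """Convert per-position count dicts to log-odds weights (assumed formula).

    weight = ln((count + pseudocount * q) / ((total + pseudocount) * q)),
    where q is the background frequency of the letter.
    """
    pwm = []
    for column in pcm:
        total = sum(column[letter] for letter in ALPHABET)
        pwm.append({
            letter: math.log(
                (column[letter] + pseudocount * background[letter])
                / ((total + pseudocount) * background[letter])
            )
            for letter in ALPHABET
        })
    return pwm


def score(pwm, word):
    """Sum the weights of a word of the same length as the PWM."""
    return sum(column[letter] for column, letter in zip(pwm, word))
```

The pseudocount keeps the weight finite for letters that never occur in a column, which matters for motifs built from a small number of footprints.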

You can check motifs of different lengths by clicking the corresponding motif ID. Click a small motif logo to open a larger one.


Motif features

Bigfoot

Down <link>

Pollard <link>

SeSiMCMC <link>

Bergman <link>

Papatsenko <link>

Noyes <link>

Noyes_hd <link>

Each link leads to the motif features table for the corresponding motif set. By 'site' we mean the best PWM hit within a footprint (with or without flanking sequences, depending on the case). Column descriptions:
average, the best, the worst score are defined over the whole set of possible words. For motifs longer than 10 we calculate these values over a set of 4^10 randomly selected words (with uniform background distribution).

average+sigma, +2sigma, +3sigma correspond to the confidence intervals defined by the mean and standard deviation of the overall score distribution (assuming it is Normal).
sequence count is the total footprint count for the selected factor; green if more than 16, yellow if more than 8.
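
The score columns described above can be sketched as follows. This is a minimal illustration, not the DMMPMM code. It assumes a PWM stored as a list of per-position dicts (a hypothetical layout), uses the population standard deviation, and draws random words with replacement from a uniform background when the motif is longer than 10.

```python
import itertools
import random
import statistics

ALPHABET = "ACGT"


def score(pwm, word):
    """Sum the per-position weights of a word of the same length as the PWM."""
    return sum(column[letter] for column, letter in zip(pwm, word))


def word_scores(pwm, max_exact_length=10, sample_size=4 ** 10, seed=0):
    """Score every possible word, or a uniform random sample for long motifs."""
    length = len(pwm)
    if length <= max_exact_length:
        words = itertools.product(ALPHABET, repeat=length)
    else:
        rng = random.Random(seed)
        words = (rng.choices(ALPHABET, k=length) for _ in range(sample_size))
    return [score(pwm, word) for word in words]


def score_summary(scores):
    """Average, best, worst and the mean + k*sigma levels used as thresholds."""
    mean = statistics.fmean(scores)
    sigma = statistics.pstdev(scores)
    return {
        "average": mean,
        "best": max(scores),
        "worst": min(scores),
        "average+sigma": mean + sigma,
        "average+2sigma": mean + 2 * sigma,
        "average+3sigma": mean + 3 * sigma,
    }
```

Exhaustive enumeration is cheap up to length 10 (4^10 is about one million words), which is presumably why longer motifs are handled by sampling the same number of words.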

We define the site absence threshold as average+sigma. Some footprints contain no hits scoring better than average+sigma. In such cases, check the additional columns.
sequences w/o score is the number of sequences without a 'ready' PWM hit better than the corresponding value (so sequences w/o score > average+sigma is the number of sequences without a site).

worst site, best site are the worst and best scores in the set of sites (taking the site to be the best PWM hit in the footprint).
flank length is the length of the flanks (set equal to the motif length) added to reveal possible motif occurrences when some footprints had no site.
% w/o sites is the percentage of sequences containing no hits scoring better than the site absence threshold (red if > 10%).
sites appearing is the percentage of sites that appeared (green if > 10%) after adding flanking sequences to the initial footprints (the site appearance threshold was set to average+2sigma).
%weak sites is the percentage of sequences whose best site falls within the [average+sigma, average+3sigma] interval (for footprints without any flanking sequences).

weak words in the 'ready' motif is the number of sites in the alignment scoring lower than the threshold for the 'ready' motif (available for Bigfoot and Noyes; uses the word list from the 'ready' motif file).
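
The per-footprint columns above can be illustrated with a minimal sketch. It assumes uppercase ACGT footprints, scans only the given strand, and takes a PWM stored as a list of per-position dicts; these are simplifications for illustration and not the DMMPMM implementation. A best hit exactly equal to average+sigma is counted here as "no site", since the page requires a hit scoring better than the threshold.

```python
def best_hit(pwm, sequence):
    """Best PWM score over all windows, or None if the sequence is too short."""
    width = len(pwm)
    scores = [
        sum(column[letter] for column, letter in zip(pwm, sequence[i:i + width]))
        for i in range(len(sequence) - width + 1)
    ]
    return max(scores) if scores else None


def footprint_report(pwm, footprints, average, sigma):
    """Summarize best hits per footprint against the sigma-based thresholds."""
    absence_threshold = average + sigma
    weak_upper = average + 3 * sigma
    hits = [best_hit(pwm, footprint) for footprint in footprints]
    scored = [hit for hit in hits if hit is not None]
    without_site = sum(1 for hit in hits if hit is None or hit <= absence_threshold)
    weak = sum(1 for hit in scored if absence_threshold < hit <= weak_upper)
    count = len(footprints)
    return {
        "sequence count": count,
        "sequences w/o site": without_site,
        "% w/o sites": 100.0 * without_site / count if count else 0.0,
        "% weak sites": 100.0 * weak / count if count else 0.0,
        "best site": max(scored) if scored else None,
        "worst site": min(scored) if scored else None,
    }
```

Re-running the same report on footprints extended by flanks of motif length, with the threshold raised to average+2sigma, would give a rough analogue of the "sites appearing" column.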

Note that for each motif set we include only the motifs present in our main Bigfoot footprint-based collection. If you are interested in the whole motif set, you can download it from the 'downloads' section or follow the corresponding link to the original data.

Links leading outside this site may no longer work. All links were checked on March 01 2009.


This table shows AhoPro-based similarities between Bigfoot motifs for different factors.

Color codes for similarity coefficient scale: ...0.4...0.3...0.2...0.1...0.05...0.01


Funding

The work was supported by Russian Fund of Basic Research projects [07-04-01623 and 07-04-01584]; INTAS Project [05-1000008-8028]; Russian Federation Agency in Science and Innovation State Contract [02.531.11.9003]; Russian Academy of Sciences Program in Molecular and Cellular Biology, Project #10; and French INRIA Équipe associé MIGEC.


Downloads

DMMPMM and Ytilib source archives include minimal installation and usage instructions.

*_motif files contain the initial motifs (without threshold estimation); *_ready files contain the motifs (threshold included) normalized over the dmel40 background. Sites within footprints are mapped using the Bigfoot motif collection.

Footprint-mapped motifs and motif collections (small-BiSMark and plain text format): dmmpmm_motifs.zip
Modified DMMPMM scripts used for control motif creation and comparison: dmmpmm_ctrl.zip
Actual DMMPMM release (including DMMPMM source and motif collections): dmmpmm_release.zip
Ytilib ruby library and some useful tools (including Bigfoot software): ruby_ytilib.zip
Bismark-adapted MySQL database dump: mysql_bismark.zip
small-BiSMark DTD, samples and details: bismark.zip

Updates

February 2011 - Updated link to the Bergman Lab motifs.

November 10 2009 - Updated link to the Bergman Lab motifs.

May 26 2009 - Fixed PWM comparison graph titles; some other corrections.

March 18 2009 - Renewed motifs sets. Separated control motif sets comparison.

March 17 2009 - Renewed Bigfoot_ctrl set.

March 11 2009 - Bigfoot_ctrl set added to the main comparison; renewed downloads section.

March 03 2009 - SeSiMCMC set renewed.

March 02 2009 - Fixed incorrect coloring of the main comparison table; Noyes homeodomain data added to the main comparison.

March 01 2009 - Noyes data added to the main comparison.

January 01 2009 - Initial public release.


Please send all questions regarding this site and software to ivan-dot-kulakovskiy-wow-gmail-dot-com.
The fruit fly logo was kindly provided by Irina Eliseeva.