We present the Homo sapiens Comprehensive Model Collection (HOCOMOCO) of transcription factor (TF)
binding models obtained by careful integration of data from different sources. HOCOMOCO contains
426 non-redundant curated binding models for 401 human TFs.
DNA sequences of TF binding regions obtained by both pregenomic and high-throughput methods were
collected from existing databases and other public data. The ChIPMunk software was used to construct
positional weight matrices. Four motif discovery strategies were tested, based on different motif shape
priors, including flat priors and periodic priors associated with the DNA helix pitch. A quality rating was manually
assigned to each model based on known binding preferences. An appropriate TF binding site (TFBS) model was selected for each
TF, with similar models selected for related TFs.
In every case, only one model per TF was selected unless there was additional evidence for two distinct binding
models or different stable modes of dimerization. All TFBS models and the initial binding segment data used for
motif discovery were mapped to UniProt IDs.
More information is available in the Details section.
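
As an illustration of what a positional weight matrix represents, the following minimal Python sketch builds a log-odds matrix from a set of aligned binding sites and scans a sequence for the best-scoring window. It is not the ChIPMunk algorithm and does not use any shape prior; the function names (`build_pwm`, `score`, `best_hit`), the pseudocount of 1, the uniform background, and the example sites are all assumptions made for this illustration only.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=None):
    """Build a log-odds matrix from aligned, equal-length binding sites.

    Hypothetical helper for illustration; a uniform background is assumed
    when none is given.
    """
    if not sites:
        raise ValueError("at least one site is required")
    background = background or {base: 0.25 for base in ALPHABET}
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")

    pwm = []
    for pos in range(width):
        # Start every count at the pseudocount so no frequency is zero.
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log-odds of the observed frequency against the background.
        pwm.append(
            {base: math.log2((counts[base] / total) / background[base])
             for base in ALPHABET}
        )
    return pwm


def score(pwm, word):
    """Sum the per-position weights for a word of the same width as the matrix."""
    return sum(column[base] for column, base in zip(pwm, word))


def best_hit(pwm, sequence):
    """Return (score, offset) of the highest-scoring window on the given strand."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the matrix")
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


if __name__ == "__main__":
    # Made-up example sites, not taken from HOCOMOCO.
    example_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA"]
    matrix = build_pwm(example_sites)
    print(best_hit(matrix, "CCCTGACTCACC"))
```

The sketch scans one strand only; a practical scanner would also score the reverse complement and compare hits against a score threshold.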