The sequence of the human genome has been known for over a decade, but well-defined functional annotations exist mainly for the small portion of the genome that encodes proteins. In sharp contrast, the 98% of the genome that is non-coding remains incompletely annotated. Gene-centric studies have provided strong evidence that the non-coding portion of the genome harbors distant-acting transcriptional enhancers and have shown that these regulatory sequences are critical for normal embryonic development. Human genetic studies have also identified examples of human disease caused by perturbed enhancer sequences. However, the systematic identification and functional characterization of distant-acting enhancers in the human genome continue to present a formidable challenge for several reasons:
- In contrast to the genetic code of proteins, the sequence features of enhancers are poorly understood, impeding their reliable computational identification in the human genome.
- Transcriptional enhancers can act over long distances (hundreds of thousands of base pairs) and can be located upstream, downstream or in introns of protein-coding genes. Their genomic position therefore cannot be easily inferred from the known position and structure of a protein-coding gene.
- Many genes are surrounded by complex arrays of regulatory elements with distinct and/or partially redundant activities.
- The in vivo activity patterns of regulatory elements are difficult to predict by computational methods, and transgenic reporter assays are generally required to establish with certainty when and where a particular regulatory sequence is active during development or in adult organs.
- Various epigenomic features and molecular marks of regulatory sequences, including enhancers, have been identified through chromatin studies, yet it remains unclear how these insights can be leveraged to provide a complete functional annotation of the in vivo activities of regulatory sequences genome-wide.
We are interested in combining comparative genomics, sequencing-based chromatin studies (ChIP-seq), genome engineering methods (CRISPR/Cas9), and transgenic reporter assays to identify and characterize distant-acting transcriptional enhancers at a genomic scale. These studies provide genome-wide predictions of sequences that are likely to be enhancers, as well as experimental evidence for the in vivo function of subsets of these enhancer candidates. Our efforts focus in particular on generating experimental datasets that will help elucidate fundamental mechanisms of development (e.g., forebrain, heart and craniofacial development) and reveal regulatory sequences that are relevant to related forms of human disease.