The functional annotation of eukaryotic DNA sequences represents a great challenge in
post-genomic biological research. A great aid to the functional annotation of genomes is
provided by comparative genomics methods which, in recent years, have been extended to
non-coding DNA regions.
However, comparison of non-coding sequences requires new algorithms and strategies that take
into account the different evolutionary mechanisms affecting regulatory sequences. Here, we
present a novel large-scale alignment strategy which aims at drawing a precise map of
conserved non-coding regions between genomes, even when these regions have undergone
small-scale rearrangement events. Our procedure is optimized to take into account the features
of non-coding DNA, such as shuffling and sequence variability of binding sites and
modules, small-scale translocations, inversions and duplications.
The recent availability of sequence and annotation data for 12 Drosophila species offers a
complete and reliable genomic dataset for developing and testing methods for comparative
genomics of non-coding DNA. We used a `gene-centric` approach, in that it starts with a list of
orthologous genes between two species, and applies a local alignment algorithm to the
corresponding flanking intergenic regions and intronic regions of these orthologous genes.
For each Drosophila species examined, we compiled a list of orthologous
genes with D. melanogaster, according to the `12 genomes project` data.
For each locus in our list, according to the mentioned genome annotation, we extracted
the whole repeat-masked sequence containing the transcriptional unit and the
corresponding flanking intergenic regions up to the preceding and the following genes. In this
way, our selection is not constrained by the syntenic order of the orthologous genes, which
is relevant when we compare species that are very distant in the evolutionary tree.
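
The locus selection described above can be sketched as follows. This is only an illustrative sketch, not the procedure's actual code: the `Gene` record, the `locus_windows` function, the `chrom_lengths` mapping and the coordinate convention (0-based, end-exclusive) are all assumptions introduced here, and nested or overlapping genes are handled in a simplified way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record (0-based start, exclusive end)."""
    name: str
    chrom: str
    start: int
    end: int


def locus_windows(genes, chrom_lengths):
    """Map each gene name to (chrom, start, end): the transcriptional unit
    plus its flanking intergenic regions, up to the neighbouring genes."""
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    windows = {}
    for chrom, items in by_chrom.items():
        items.sort(key=lambda g: (g.start, g.end))
        for i, gene in enumerate(items):
            # Left flank starts where the preceding gene ends (or at 0).
            left = items[i - 1].end if i > 0 else 0
            # Right flank stops where the following gene starts
            # (or at the end of the chromosome).
            if i + 1 < len(items):
                right = items[i + 1].start
            else:
                right = chrom_lengths[chrom]
            # Simplification: with overlapping neighbours, never shrink
            # the window below the gene's own span.
            windows[gene.name] = (chrom, min(left, gene.start), max(right, gene.end))
    return windows


example_genes = [
    Gene("A", "2L", 100, 200),
    Gene("B", "2L", 300, 400),
    Gene("C", "2L", 600, 700),
]
example_windows = locus_windows(example_genes, {"2L": 1000})
```

Because each window is defined only by the neighbours of a gene within its own genome, the windows of two orthologous genes can be compared regardless of where those genes sit relative to each other in the two species.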
Local pairwise alignments between related sequences were performed using CHAOS, which is
a heuristic alignment algorithm with some peculiar features optimized for large non-coding
sequences. CHAOS works by chaining small words which match between the two
input sequences. Unlike BLAST, it is a double-seed technique and it allows
degeneracy in seeds. It chains together seeds that are closer than a maximum distance
and it returns the highest-scoring chains, according to a standard Needleman-Wunsch
metric. Hence, it is able to identify conserved blocks rearranged in non-colinear
order or in a reverse order with a very high resolution. On the other hand, it is able to
rapidly align large sequences with a better specificity than purely local aligners, thanks to
the chaining technique. We chose two different sets of parameters in CHAOS and built a
conservative and a more sensitive version of our alignments.
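
To illustrate the general idea of chaining matching words, here is a deliberately simplified sketch. It is not the CHAOS implementation: it uses exact k-mer seeds (no degeneracy), scores a chain by its number of seeds rather than by a Needleman-Wunsch style rescoring, and handles only the forward strand. The function names `find_seeds` and `best_chain` and the parameters `k` and `max_dist` are hypothetical.

```python
from collections import defaultdict


def find_seeds(a, b, k):
    """Return sorted (i, j) pairs such that a[i:i+k] == b[j:j+k]."""
    index = defaultdict(list)
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)
    return sorted(
        (i, j)
        for i in range(len(a) - k + 1)
        for j in index.get(a[i:i + k], ())
    )


def best_chain(seeds, k, max_dist):
    """Return the chain with the most seeds. A seed may follow another
    only if it starts after the previous seed ends in both sequences and
    the gap in each sequence is at most max_dist."""
    n = len(seeds)
    if n == 0:
        return []
    score = [1] * n   # best chain length ending at each seed
    prev = [-1] * n   # back-pointer to the previous seed in that chain
    for t, (i, j) in enumerate(seeds):
        for s in range(t):
            pi, pj = seeds[s]
            gap_a = i - (pi + k)
            gap_b = j - (pj + k)
            if 0 <= gap_a <= max_dist and 0 <= gap_b <= max_dist:
                if score[s] + 1 > score[t]:
                    score[t] = score[s] + 1
                    prev[t] = s
    # Backtrack from the first seed that reaches the best score.
    t = max(range(n), key=score.__getitem__)
    chain = []
    while t != -1:
        chain.append(seeds[t])
        t = prev[t]
    return chain[::-1]


seq_a = "AAACCCGGG"
seq_b = "AAATCCCTGGG"
seeds = find_seeds(seq_a, seq_b, 3)
chain = best_chain(seeds, 3, 5)
```

In a sketch like this, blocks on the opposite strand could be searched for by running the same steps against the reverse complement of one sequence, and the `max_dist` threshold is the kind of parameter that would separate a conservative setting from a more sensitive one.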
We applied the described procedure to each list of orthologous genes between D. melanogaster and
seven other Drosophila species, providing a provisional data browser at:
We obtained a genome-wide high-resolution map of D. melanogaster compared to seven other
Drosophila species.
According to this map, we can estimate conservation features of the Drosophila genome at
large scale: the percentage of conservation of intronic and intergenic sequence and the rate of
small-scale rearrangement events, such as inversions and reshuffling. Interestingly, we observe
numerous small-scale rearrangement events, such as local inversions, duplications and
translocations, which are not observable in the whole-genome alignments
available. For example, about 15% of the conserved blocks were obtained by aligning
orthologous regions on opposite strands, indicating small-scale inversions. Moreover, since we
allow 1-n relationships between blocks in both aligned species, we immediately spot duplication
events, like the duplication of the tRNA gene K5:84ABd of D. melanogaster in D.
This catalog of conserved non-coding blocks will constitute the starting point for further
investigations, related to the evolution of conserved non-coding regions in
Drosophila or to the discovery of cis-regulatory regions.
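
Statistics such as the fraction of blocks aligned on opposite strands, or the list of blocks with 1-n relationships, could be tallied from a block table along the following lines. The record layout (reference block id, target block id, strand) and the function name `summarize_blocks` are assumptions made for illustration only, and the example data are invented, not results.

```python
from collections import defaultdict


def summarize_blocks(blocks):
    """blocks: iterable of (ref_block_id, target_block_id, strand),
    where strand is '+' (same strand) or '-' (opposite strand).
    Returns (fraction of opposite-strand blocks, sorted list of
    reference blocks matched to more than one target block)."""
    blocks = list(blocks)
    inverted = sum(1 for _, _, strand in blocks if strand == "-")
    targets = defaultdict(set)
    for ref, tgt, _ in blocks:
        targets[ref].add(tgt)
    one_to_many = sorted(ref for ref, tgts in targets.items() if len(tgts) > 1)
    fraction = inverted / len(blocks) if blocks else 0.0
    return fraction, one_to_many


# Invented toy data, for illustration only.
toy_blocks = [
    ("r1", "t1", "+"),
    ("r2", "t2", "-"),
    ("r3", "t3", "+"),
    ("r3", "t4", "+"),
]
inverted_fraction, duplicated = summarize_blocks(toy_blocks)
```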