DNA Structure

1. Nucleotides

Deoxyribonucleic acid, or DNA, is, like proteins, a linear macromolecule found in all living cells. In contrast to proteins, however, it is built up of only 4 different types of building blocks, called nucleotides. Nucleotides are composed of a base, either a purine or a pyrimidine, and a 2'-deoxyribosyl triphosphate. The four types of bases composing the sequence of DNA are:

Purines: adenine (A) and guanine (G)
Pyrimidines: thymine (T) and cytosine (C)

(Note: more structures can be found at KEGG database)

The sugar is a 2'-deoxyribose which is phosphorylated at its 5' hydroxyl group. Free nucleotides contain one, two, or three phosphates, representing the mono-, di-, or triphosphate form of the nucleotide; the triphosphates are known as dATP, dGTP, dTTP, and dCTP.

Fig. dATP

Nucleotides are covalently linked in DNA through their 5'-phosphate to
the 3'-hydroxyl of the next nucleotide. The linear polymer is synthesized
in the 5' to 3' direction (the 3'-OH nucleophilically attacks the α-phosphate, Pα,
of the incoming nucleotide) and the deoxyribose phosphate backbone is
a polyanion at physiological pH, i.e., it carries multiple negative
charges because the pKa value of the phosphodiester phosphate group is
close to 1.

2. B-DNA or Watson and Crick double helix

Single DNA strands are not stable, but associate with a second strand
to form a double helix structure, where both strands intertwine around
each other. At the center of the helix the four bases are H-bonded to
each other and they do so in a very specific way. The bases point toward
the helix center and H-bond in the following manner only:

G with C: 3 H-bonds
A with T: 2 H-bonds

The B-DNA helix has 10 bp per turn.
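The pairing rule can be sketched in a few lines of Python. This is a minimal illustration and not part of the original notes: the names `reverse_complement` and `hydrogen_bonds` are hypothetical, and the sketch assumes an unmodified strand written 5' to 3' that contains only A, C, G, and T (A pairing with T through 2 H-bonds, G with C through 3).

```python
# Watson-Crick partners in DNA: G pairs with C, A pairs with T.
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds contributed by each base in its pair: 3 for G-C, 2 for A-T.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    # The two strands are antiparallel, so the input is read backwards.
    return "".join(PARTNER[base] for base in reversed(strand.upper()))


def hydrogen_bonds(strand: str) -> int:
    """Count the H-bonds formed when the strand pairs with its complement."""
    return sum(H_BONDS[base] for base in strand.upper())


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(hydrogen_bonds("ATGC"))
```

Because the mapping is symmetric, applying `reverse_complement` twice returns the original strand, which is a convenient sanity check.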
Fig. B-DNA side view and projection along helical axis

Incidentally, it was the structure of the α-helix in proteins that led most structural biologists to look for an analogous structure in nucleic acids, i.e., helical or double helical structures with the bases pointing away from the helix center, as the residues do in protein helices. The truly revolutionary insight in the discovery of the DNA structure was to model the base pairing inside the double helix. (See the reprints of the two original papers by Watson and Crick at the end of this section.)

Besides the sequence per se, a measurable quantity of DNA is its content of G+C and A+T base pairs, which differs from organism to organism (Chargaff's rule) and is characteristic of an organism's genome. The G+C content in mammals varies from 39 to 46% and in bacteria from 25 to 75%. The G+C content correlates with the stability of the double helical conformation of DNA because GC pairs form 3 H-bonds whereas AT pairs form only 2. This has been shown through heat denaturation of the double helix into the single strand conformation. So-called melting curves can be measured by increasing the temperature of the solution. At a certain temperature, called the melting temperature or Tm, 50% of the DNA is found in double strand (ds) and 50% in single strand (ss) form. The melting temperature increases approximately linearly with the GC content.

Fig. Denaturation behavior of DNA
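The relation between G+C content and melting temperature can be illustrated with a short Python sketch. The G+C calculation follows directly from the definition in the text. The linear Tm estimate is an assumption that does not come from these notes: the default constants (intercept 69.3, slope 0.41) follow an empirical relation commonly attributed to Marmur and Doty for long DNA in standard saline citrate, and real values depend on salt concentration and sequence length. The function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of a DNA sequence in percent."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def estimate_tm(gc_percent: float, intercept: float = 69.3, slope: float = 0.41) -> float:
    """Rough linear Tm estimate in degrees Celsius.

    The default constants are an assumed empirical fit, not values
    taken from the notes; adjust them for other buffer conditions.
    """
    return intercept + slope * gc_percent


if __name__ == "__main__":
    for name, seq in [("AT-rich", "ATATATGCAT"), ("GC-rich", "GCGCGCATGC")]:
        gc = gc_content(seq)
        print(name, gc, estimate_tm(gc))
```

The only point of the sketch is the trend: a higher G+C percentage gives a higher estimated Tm.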
A conformational change can be induced in DNA when the relative humidity
of the sample is lowered below 75%. The double helix then adopts what has
been described as the A-form. The A-form is a wider and flatter helix in which the
base pair plane is tilted with respect to the helix axis. The helix also
exhibits a minor and a major groove and has the following structural parameters:
11 bp per turn
5. Z-DNA

This helix has a left-handed conformation with the following structural
parameters: 12 bp per turn
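The base pairs per turn quoted for the three helix forms (10 for B-DNA, 11 for A-DNA, 12 for Z-DNA) translate directly into an average rotation per base pair and into a number of helical turns for a duplex of given length. The following is a minimal sketch; the function names are hypothetical, and the calculation ignores handedness and supercoiling.

```python
# Base pairs per helical turn, as quoted in the text.
BP_PER_TURN = {"A": 11, "B": 10, "Z": 12}


def twist_per_bp(form: str = "B") -> float:
    """Average rotation per base pair in degrees (magnitude only)."""
    return 360.0 / BP_PER_TURN[form]


def helical_turns(n_bp: int, form: str = "B") -> float:
    """Number of helical turns in a duplex of n_bp base pairs."""
    return n_bp / BP_PER_TURN[form]


if __name__ == "__main__":
    for form in ("A", "B", "Z"):
        print(form, round(twist_per_bp(form), 1), helical_turns(1_800_000, form))
```

For a genome of 1.8 million base pairs in the B-form, this gives 180,000 helical turns.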
Fig. Z-DNA hexamer with 5' overhangs that form a reverse Watson-Crick
base pair; projection along helical axis

Note: To view the DNA structures above from the Brookhaven
Protein Database, search for keywords or use the following accession numbers:
355d or 311d for B-DNA; 104d for a DNA/RNA duplex; 309d for A-DNA; and
312d for Z-DNA.

6. Genomic size of DNA

Consider the size of DNA and its structure as a long, linear polymer. In prokaryotes, the entire genome consists of one single molecule, mostly in circular form. As mentioned in the Introduction, the complete genome of Haemophilus influenzae, which contains more than 1.8 million base pairs, has been sequenced.

DNA is often found to be methylated, the result of an enzymatic modification that serves as chemical protection. Methylated bases often cannot be recognized by DNA binding proteins such as nucleases that hydrolyze the phosphate ester bond between nucleotides. In this way, bacteria protect their own DNA from degradation, whereas the DNA of intruders (like viruses), which is not methylated, can be degraded. Methylation produces:

N6-methyl-dA
5-methyl-dC

Because of the large size of circular DNA, the helix is often wound up
in a super-coiled form. The combination of supercoiling and helix
conformation has an effect on the recognition of DNA by DNA binding proteins
(it changes the local conformation of the B-DNA double helix).

1. Ribonucleic acid

RNA is very similar to DNA in that it is made of 4 different building blocks, the ribonucleotides. The pyrimidine base thymine is replaced by uracil, which lacks thymine's methyl group and takes its place in base pairing. The ribose comes in its fully hydroxylated form. Together, the presence of uracil in place of thymine and the 2'-OH in the ribose constitute the two chemical differences between RNA and DNA. RNA is composed of the four bases:

Purines: adenine (A) and guanine (G)
Pyrimidines: uracil (U) and cytosine (C)

(Note: more structures can be found at KEGG database)

2. RNA structure

RNA differs, however, from DNA because it does not form an analogous
double helical structure. RNA does, however, form base pairs with DNA
resulting in a heteromeric double helix consisting of one DNA and one
RNA strand. This annealing of an RNA strand to its complementary DNA strand
is called hybridization and plays a crucial role in the transcription
and translation of genetic sequences into protein sequences. In contrast
to DNA, RNA does form short double-stranded regions by folding back on itself,
giving so-called stem and loop structures. Both DNA/RNA double helices
and RNA/RNA double strands have an A-DNA-like conformation, also called
A-RNA or RNA-II. The helix has 11 bp per turn
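How a single RNA strand can pair with itself to give a stem and loop can be illustrated with a toy Python check. This is a sketch under simplifying assumptions that are not in the notes: only Watson-Crick pairs (A-U, G-C) are considered, the stem is assumed to start at the two ends of the sequence, and the loop must keep at least `min_loop` unpaired bases (the default of 3 is an assumption). The function name `stem_length` is hypothetical.

```python
# Watson-Crick pairs allowed within an RNA strand.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def stem_length(rna: str, min_loop: int = 3) -> int:
    """Count consecutive base pairs formed between the 5' and 3' ends.

    Pairing proceeds inward from both ends and stops at the first
    mismatch or when fewer than min_loop bases would remain in the loop.
    """
    rna = rna.upper()
    i, j = 0, len(rna) - 1
    pairs = 0
    # j - i - 1 is the number of bases strictly between positions i and j.
    while j - i - 1 >= min_loop and (rna[i], rna[j]) in PAIRS:
        pairs += 1
        i += 1
        j -= 1
    return pairs


if __name__ == "__main__":
    # GGG pairs with CCC, leaving AAAU as the loop.
    print(stem_length("GGGAAAUCCC"))
```

Real secondary structure prediction also allows G-U pairs, bulges, and internal loops, and scores candidate structures by their estimated free energy.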
messenger RNA or mRNA
Before discussing the individual types of RNA, let's have a look at the cellular machinery involved in protein synthesis. This diagrammatic overview is helpful in understanding the distinction between those RNA types. In both prokaryotes and eukaryotes, protein synthesis is very similar in its hierarchy, but differently organized in its spatial arrangement and intracellular location.

Prokaryotes, having no nucleus, transcribe the DNA sequence into mRNA, and the mRNA sequence is translated into the protein sequence by way of the intermediate tRNA molecules, which carry the anti-codon and are covalently linked to the corresponding amino acid. The ribosome is the place where mRNA and aa-tRNA come together and protein elongation takes place. As the diagram shows, these can be simultaneous processes, i.e., the emerging mRNA strand is immediately translated into the polypeptide.

In eukaryotes, transcription, or synthesis of mRNA, and translation,
the synthesis of proteins, are separated in space and time. Transcription
occurs in the nucleus, where the mRNA is synthesized and processed (splicing).
The resulting mRNA is transported to the cytoplasm and translated into
a polypeptide, in a process identical to that found in prokaryotes.

3. Transfer or tRNA

This section will only deal with the structure of the smallest of all RNA species, the transfer RNA. The tRNA molecules are key to the translation of the mRNA sequence into the amino acid sequence of proteins (there is at least one type of tRNA for every amino acid). To be precise, the aminoacyl-tRNA synthetases are the 'true' translators of the genetic code into an amino acid sequence. These synthetases acylate tRNA molecules with the amino acid that corresponds to the anti-codon in the structure of the tRNA molecule. The anti-codon later recognizes the codon, the triplet base sequence that 'codes' for the amino acid along the mRNA strand. A failure to acylate the tRNA with the right amino acid results in an amino acid substitution in the protein even though the DNA sequence has not been changed.

tRNA molecules are small nucleic acids of 60-95 nucleotides, mostly 76,
with a molecular weight of 18-20 kD and a secondary structure resembling
a clover leaf. Here are a few common features shared by all tRNA molecules
found in various organisms.

(1) 5' terminus always phosphorylated
Fig. Rasmol structure of tRNA from yeast
Note: Acceptor stem loop upper left; anticodon loop bottom. To view RNA structures from the Brookhaven Protein Database, search with keywords or use accession number 1SLO (first stem loop of S11 ribosomal RNA of C.elegans, NMR structure).

The structural complexity of tRNA is reminiscent of that of a protein,
with 71 out of 76 bases participating in stacking interactions (of which
42 are in double helical stem structures). Nine base-pair interactions cross-link
the tertiary structure, i.e., they involve bases from different
stem and loop regions. All of these 9 base pairs are non-Watson-Crick associations
and are highly conserved, which makes it reasonable to predict similar structures
for all tRNA molecules (in fact, only a few tRNA molecules have been crystallized
and their structure determined).

4. Codon - anti-codon interaction

The genetic code is organized such that three bases in a row, a triplet, always code for a specific amino acid. Thus a sequence of triplets in the DNA can be transcribed into a sequence of triplets in the mRNA strand. Since the DNA is a double helix formed by two complementary strands, the anti-sense strand is transcribed into mRNA, resulting in the +sense sequence on the mRNA level (with U instead of T).

As a rule, every tRNA can be covalently linked to only one type of amino acid, through the specificity of the aminoacyl-tRNA synthetase. A specific amino acid can, however, be linked to different tRNAs. These tRNA molecules are called iso-accepting tRNAs. There need not be as many tRNA species as codons used for translation, because many tRNAs bind to two or three of the codons specifying their amino acid. This happens through non-Watson-Crick base pairing at the third codon-anti-codon position and is responsible for the degeneracy of the genetic code. This not-so-precise base pairing is described by the wobble hypothesis and is due to the presence of a modified base in the anti-codon structure, namely a methylated guanosine, Gm, or inosine, I. For example, the tRNA for phenylalanine (Phe-tRNA) binds two codons in the mRNA:

mRNA codon         5' U U C 3'     5' U U U 3'
tRNA anti-codon    3' A A Gm 5'    3' A A Gm 5'
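The wobble idea can be sketched as a small lookup in Python. This is an illustration only and not part of the original notes: the pairing table follows the commonly cited wobble rules (G reads C or U, U reads A or G, I reads U, C, or A), the methylation of Gm is ignored so that Gm is treated as plain G, and the function name `codons_read` is hypothetical. The anti-codon is written 5' to 3', so the Phe anti-codon 3' A A Gm 5' becomes "GAA".

```python
# Strict Watson-Crick partners used at codon positions 1 and 2.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Bases that the first (5') anti-codon base can read at codon position 3.
# Assumed from the commonly cited wobble rules; I stands for inosine.
WOBBLE = {"G": "CU", "U": "AG", "I": "UCA", "C": "G", "A": "U"}


def codons_read(anticodon: str) -> list[str]:
    """List the mRNA codons (5' to 3') read by an anti-codon given 5' to 3'."""
    first, middle, last = anticodon.upper()
    # The strands are antiparallel: the 3' anti-codon base pairs with codon base 1.
    c1 = WATSON_CRICK[last]
    c2 = WATSON_CRICK[middle]
    return sorted(c1 + c2 + c3 for c3 in WOBBLE[first])


if __name__ == "__main__":
    print(codons_read("GAA"))  # phenylalanine anti-codon, Gm treated as G
    print(codons_read("IGC"))  # an inosine-containing anti-codon
```

With these rules a single anti-codon accounts for two or three codons that differ only in their third base.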
The Standard Code (follow the link to the genetic code tables at NCBI)

By default all transl_table in GenBank flatfiles are equal to id 1, and
this is not shown. When transl_table is

TTT F Phe    TCT S Ser    TAT Y Tyr    TGT C Cys
CTT L Leu    CCT P Pro    CAT H His    CGT R Arg
ATT I Ile    ACT T Thr    AAT N Asn    AGT S Ser
GTT V Val    GCT A Ala    GAT D Asp    GGT G Gly
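Reading a coding sequence triplet by triplet with the standard code can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the 64-letter amino acid string lists the standard code (transl_table 1) with the bases in the order T, C, A, G at each position, of which only 16 codons appear in the excerpt above; `*` marks a stop codon; and the function name `translate` is hypothetical. The input is the +sense (coding) DNA sequence, read from its first base.

```python
from itertools import product

BASES = "TCAG"
# Standard code, one letter per codon, third base varying fastest.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence until the first stop codon."""
    dna = dna.upper().replace("U", "T")  # also accept mRNA input
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGTTTTCTTAA"))
```

Building the table from the ordered string keeps the sketch short; an explicit dictionary of all 64 codons would be equivalent.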