Molecular interaction in biological systems

An Introduction to Molecular Interaction in Biological Systems

by Lukas K. Buehler

Introduction	Complexity of Cellular Systems
Principles of Molecular Recognition	Similarity of Molecular Surfaces; The Ligand-Receptor Complex and its Pharmacophore; Water As Solvent and Binding Energetics; The Structural Properties of Biological Macromolecules
Recognition of Proteins	Molecular Motion and Protein Stability; Self-assembly Systems and Protein Complexes; Allosteric Properties of Proteins; Receptors signaling; Enzymes and Their Inhibitors
Recognition of DNA & RNA	DNA Binding Proteins; Nuclear Receptors; Antisense & Small Molecule Ligands

This page contains short summaries of each chapter.

Introduction

The last fifty years have completely changed the way biological and medical researchers can study and understand life, its development from conception to death, susceptibility to infectious and inherited diseases, in short, the molecular mechanisms of metabolic processes. One reason that brought about this understanding lies in the ability to access the information contained in biological macromolecules. Information stored in the structure of molecules is a function of their physical and chemical properties. A second and more important reason is the ability to manipulate this information by virtue of changing the structures of macromolecules - proteins, nucleic acids, or polysaccharides. The advancement of molecular biology has been the driving force behind these changes in the biomedical sciences. But the functional manipulation of biological material could not generate much of what is done today by the pharmaceutical industry, if it were not for the preceding developments in physics and chemistry during the 19th and 20th centuries including thermodynamics, statistical mechanics, and the nature of the chemical bond. The reductionist's approach - the study of chemistry and physics of life - created an enormous wealth of biochemical and genetic data available for the rational design of drugs and the manipulation of the genome. The ensuing atomistic view shall be presented and discussed within the context of molecular interactions in biological systems. While DNA is the storage of hereditary information, proteins and RNA are its agents, accessing and executing the genetic programs. The mechanism of protein function is simple; proteins accelerate chemical reactions (as enzymes) by providing optimal binding to substrates, or drastically improve solubility and target specific binding (as receptors) of small ligand molecules. All catalytic activity and ligand binding occurs on solvent accessible protein surfaces provided by preferential molecular interactions. 
These molecular interactions are electrostatic in nature. The strength of these interactions, the forces among atoms, can be categorized according to their thermodynamic and kinetic behavior and is defined as affinity. The conformational precision of interaction leads to selective binding and is defined as specificity. Both properties are directly dependent on the physico-chemical properties of the solvent of life - water.

PART I

Principles of Molecular Recognition

The function of most proteins is controlled by small molecule ligands that reversibly bind to proteins and either stimulate or inhibit their activity. Because different areas of research have studied different kinds of proteins, more than one nomenclature is used for these small ligand molecules. The nomenclature is summarized in the table below.

TABLE Classification of Ligands

Target structure	Positive effector	Negative effector
Enzyme	Substrate	Competitive inhibitor
Receptor	Agonist	Antagonist

For enzymes, which catalyze chemical reactions, the natural ligands are the substrates, which have to bind before they are chemically processed into products. Catalytic reactions can be suppressed by competitive inhibitors, which bind to the same location on the surface of an enzyme as the substrate. This location is called the active site. Receptors (e.g. cell surface proteins, nuclear DNA-binding proteins) bind ligands without chemically modifying them. Instead, binding induces a conformational change in the receptor protein that can trigger a chemical reaction of a substrate bound somewhere else on the same protein or affect the binding affinity of a second molecule that interacts with the receptor. This process is known as allosteric regulation. Ligands activating receptors are called agonists, while competitive inhibitors of these ligands are called antagonists.

Binding events are characterized as reversible chemical equilibria, and the binding (and thus the effect) of both agonists and antagonists is concentration dependent. The affinity of the ligand for the binding site can be quantified by the equilibrium constant of binding (association constant KA) or unbinding (dissociation constant KD) over an effective concentration range, i.e., where an agonist induces an effect, or an antagonist can block agonist activity. Affinity tells us how strongly a ligand binds to its receptor and is related to the Gibbs free energy of binding. Affinity is a macroscopic property of binding, representing the averaged behavior of a very large number of binding events, each of which is the result of an often complex series of molecular interactions. The latter are microscopic properties of molecular structures and are described by non-covalent bond structures. Bridging the qualitative difference between macroscopic properties (thermodynamic quantities and kinetic data) and microscopic structural information (chemical bonds, electron density maps) is the biggest obstacle to predicting functional aspects of ligand-receptor interaction.
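The concentration dependence described above can be made concrete with a small numerical sketch. It assumes the simplest case, one ligand per binding site and no cooperativity, which the text does not state explicitly. The function names and the 1 nM example value are illustrative assumptions, not taken from the source.

```python
import math

R_GAS = 8.314  # gas constant in J/(mol*K)


def association_constant(kd):
    """Return KA (1/M) for a dissociation constant KD given in M."""
    return 1.0 / kd


def fraction_bound(ligand_conc, kd):
    """Fraction of binding sites occupied at a free ligand concentration.

    Assumes one ligand per site and no cooperativity; ligand_conc and kd
    must share the same concentration unit.
    """
    return ligand_conc / (kd + ligand_conc)


def binding_free_energy(kd, temperature=298.15):
    """Standard Gibbs free energy of binding in J/mol (KD in M, 1 M standard state)."""
    return R_GAS * temperature * math.log(kd)


KD_EXAMPLE = 1e-9  # hypothetical 1 nM ligand, for illustration only

for conc in (1e-10, 1e-9, 1e-8):
    occupancy = fraction_bound(conc, KD_EXAMPLE)
    print(f"[L] = {conc:.0e} M -> fraction bound = {occupancy:.2f}")
print(f"dG = {binding_free_energy(KD_EXAMPLE) / 1000:.1f} kJ/mol")
```

When the free ligand concentration equals KD, half of the sites are occupied, which is why KD marks the middle of the effective concentration range.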

Molecular similarity space and the structure of a ligand/receptor complex

The following discussion refers to the structure of a stable complex between a ligand and its receptor which is a microscopic description. The strength of an interaction (its affinity, which is a macroscopic description) depends on the complementarity of the physico-chemical properties of atoms that bind, i.e., protein surface and ligand structure. Excluding catalytic mechanisms from the discussion, two classes of molecular properties important for binding can readily be distinguished:

- shape or volume
- surface potential

Talking about these properties, chemists refer to them as the molecular similarity space which can now more precisely be described as:

atom pair matching function	(shape or volume)	weak interactions
charge matching function	(surface potential)	strong interactions

It is intuitive to think that simple binding has to do with similarity (or complementarity) of properties and structures, such that the higher the similarity, the more specific the recognition will be. The atom pair matching function mostly refers to Van der Waals surfaces of molecules and hydrogen bonds. Both interactions are weak and are effective only over very short distances in terms of their potential energy functions, i.e., they are short-range interactions that can be easily broken, but help define the conformational specificity of an interaction. This is particularly true for the hydrogen bond, which has a directional quality related to its strength of interaction.

Recognition by a ligand of its receptor binding site can be envisaged as a result of orientational and translational movement of the ligand within the electric surface potential field of the receptor. A specific interaction is encountered when the orientation of the ligand fits complementary physical properties on the receptor surface.

TABLE Ligand - Receptor Complementarity

Physical property	Receptor	Ligand	Interaction	Distance relation
shape or volume	convex / concave	concave / convex	induced dipole	short range (1/r^6)
surface potential	positive / negative	negative / positive	charge	long range (1/r)
hydrogen bonds	acceptor / donor	donor / acceptor	dipole	short range (1/r^3)
solubility	apolar / polar	apolar / polar	induced dipole / dipole	short range (1/r^6) / short range (1/r^3)

A combination of any of the four physical properties summarized in the table above defines multivariate surfaces. This can lead to complex surface structures or binding motifs, especially for large contact surfaces such as those found between proteins, where one protein is the 'ligand' and the other the 'receptor'. Protein-protein interaction is relevant for any enzyme or receptor complex, cytoskeletal structures, or chromatin structures. In the present context, peptide ligands (e.g. neuropeptides, insulin, etc.) are among the agonists providing the largest variability in similarity space. Since protein surfaces are determined by their amino acid residues, binding surfaces can be mathematically described as sequence space. An artificial peptide sequence space for the development of novel antibiotics has recently been achieved using cyclic peptides that self-assemble into peptide nanotubes.

It is the complementarity of these motifs between receptor and ligand that determines the specificity of the interaction. The electrostatic force between ligand and receptor helps define the affinity of the interaction. There are long-range and short-range interactions. Hydrophobic interactions and hydrogen bonds are short-range interactions based on induced-dipole and molecular dipole moments, respectively. Electrostatic interactions between charges are long range, meaning that electric fields can be sensed several angstroms away from the point charge. The strength (and effective distance) of these interactions is a function of the dielectric property of the environment. Water molecules are able to shield local charges and dipoles, reducing the range of their electric field forces. In a hydrophobic environment like a cell membrane, charges are not shielded by the non-polar molecules and can have an effect over distances covering entire proteins.
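The difference between long-range and short-range interactions follows directly from the distance relations 1/r, 1/r^3 and 1/r^6 listed in the complementarity table. The sketch below compares how quickly each falls off relative to a reference distance. The 3 Å reference and the sampled distances are arbitrary illustrative choices, and absolute energies are deliberately left out.

```python
# Distance exponents from the Ligand - Receptor Complementarity table.
EXPONENTS = {
    "charge-charge": 1,   # long range, 1/r
    "dipole-dipole": 3,   # short range, 1/r^3
    "induced dipole": 6,  # short range, 1/r^6
}


def relative_energy(r, r0, exponent):
    """Interaction energy at distance r relative to its value at r0."""
    return (r0 / r) ** exponent


def decay_table(r0=3.0, distances=(3.0, 6.0, 12.0)):
    """Relative energies for each interaction type (distances in angstroms)."""
    return {
        name: [relative_energy(r, r0, n) for r in distances]
        for name, n in EXPONENTS.items()
    }


for name, values in decay_table().items():
    formatted = ", ".join(f"{v:.4f}" for v in values)
    print(f"{name:15s} {formatted}")
```

Doubling the distance halves a charge-charge interaction but reduces an induced-dipole interaction to less than 2% of its original value, which is why only charges are felt across several angstroms.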

To find a good (drug-) ligand for a protein or DNA surface (a receptor), one has to study the structure and function of natural agonists and antagonists, or the surface topology of the binding site on the macromolecule. Using structural information for drug discovery is referred to as rational drug design and makes use of the concepts of chemical similarity and complementarity. Chemical similarity is measured by identifying distances between atoms on a receptor and a ligand. Depending on the chemical properties of the interacting atoms (or groups of atoms, i.e., functional groups), small differences in distance have a great influence on the 'reactivity' of a ligand. Since proteins are fluid-like entities (albeit highly viscous ones), their structures are very sensitive to disturbances at their surface. For ligands that are similar but not identical, the disturbance results in different molecular properties (antagonist or agonist; inhibitor or activator).

Thus, a local conformational change initiated by agonist binding, but not by antagonist binding, results in a destabilization of the protein structure. This destabilization is not strong enough to denature the protein, but results in a long-range effect across the protein, affecting its active site several angstroms away from the ligand binding site. This is known as an allosteric mechanism. As a rule, agonists induce structure destabilization, while antagonists merely bind, but do not affect the protein structure (or trigger a conformational change that locks a protein in its inactive position). One way to visualize the action of ligands on receptors is to realize that proteins constantly undergo conformational changes, which are best described as an equilibrium between an active and an inactive state, or even among multiple states, including desensitized states (different types of inactive states). Agonists and antagonists shift this equilibrium towards an active or inactive conformation, respectively.
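The equilibrium picture in the preceding paragraph can be sketched with a generic two-state model. This is a standard textbook simplification and not necessarily the model the author had in mind. It assumes a single binding site, an intrinsic active/inactive ratio `l0`, and separate dissociation constants for the two conformations; all numerical values are hypothetical.

```python
def active_fraction(ligand_conc, l0, kd_active, kd_inactive):
    """Fraction of receptors in the active conformation.

    l0          -- [active]/[inactive] ratio in the absence of ligand
    kd_active   -- ligand dissociation constant for the active state
    kd_inactive -- ligand dissociation constant for the inactive state
    Concentrations and both constants must share the same unit.
    """
    active = l0 * (1.0 + ligand_conc / kd_active)
    inactive = 1.0 + ligand_conc / kd_inactive
    return active / (active + inactive)


L0 = 0.01      # hypothetical: 1 active receptor per 100 inactive ones
CONC = 1e-6    # hypothetical ligand concentration in M

baseline = active_fraction(0.0, L0, 1e-9, 1e-6)
agonist = active_fraction(CONC, L0, kd_active=1e-9, kd_inactive=1e-6)
neutral = active_fraction(CONC, L0, kd_active=1e-7, kd_inactive=1e-7)
locking = active_fraction(CONC, L0, kd_active=1e-6, kd_inactive=1e-9)

print(f"no ligand:          {baseline:.4f}")
print(f"agonist:            {agonist:.4f}")
print(f"binds both equally: {neutral:.4f}")
print(f"locks inactive:     {locking:.6f}")
```

In this sketch a ligand that prefers the active state raises the active fraction, a ligand that binds both states equally leaves it unchanged, and a ligand that prefers the inactive state lowers it. These correspond to the agonist and the two antagonist behaviors described above.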

The Pharmacophore

In general, the surface topologies of a group of ligands that all exhibit the same effector quality (agonist or antagonist) can be overlapped and the contours of all molecules averaged into a union surface. This union surface is expected to be complementary to the surface mold of the corresponding binding site on the receptor or enzyme. In complex structures, the number of distributions and combinations of physical properties used to search for similarity (complementarity) is large. Modeling structures of ligands in different ways and superimposing different structures with similar affinity exposes the critical fragment or the overall similarity of these fragments. The critical fragment of an antagonist/agonist structure is called the pharmacophore. The classical ball-and-stick model of chemical structures allows one to overlap the bond structures and identify these critical segments, often single atoms, in ligands. In many cases, ligand-receptor interaction is not mediated by the entire ligand structure, but by ligand points or critical fragments, the pharmacophore. Thus, when analyzing existing data on antagonist and agonist structures, it becomes clear why compounds belonging to very different classes of chemicals so often act on the same target proteins.

Select Ligand Systems

Chlorpromazine is an antagonist of the dopamine receptor and contains a superimposable fragment.

The neurotoxins saxitoxin (STX) and tetrodotoxin (TTX) block voltage-gated sodium channels. A solvent accessible surface area match shows that the dissimilar structures have identical surface topology.

Sigma ligands (steroidal hormone receptor antagonists) show common points or critical fragments, triangle representation of pharmacophore.

Benzodiazepine (a modulator of the GABA-A receptor) and beta-carboline fit the same surface mold based on the modeling of the solvent accessible binding site topology, embedded ligand points and hydrophobic core.

Monooxygenase P450 substrate (camphor) and inhibitor (phenylimidazole).

Fragments can be designed for enzyme inhibitors that mimic the structure of the substrate, but are not hydrolysable. Examples are the HIV protease inhibitors, in which commonly employed bioisosteres (non-hydrolysable) replace the functional amide group (hydrolysable).

Dihydrofolate reductase inhibitors

Ligands binding to cell surface receptors are amphipathic, hydrophilic, or charged, and are often too large to cross the cell membrane. Intracellular receptors are targets for small hydrophobic ligands, which easily diffuse across membranes. A well studied model system includes the inhibitors of bacterial dihydrofolate reductase (DHFR). This enzyme catalyzes the reduction of dihydrofolate (DHF) to tetrahydrofolate (THF), an important step in DNA synthesis. Comparing the sequence and structure of DHFR from different organisms shows many similarities but also explains why some inhibitors are selective against bacteria, while having little effect on the enzyme of the host organism. Inhibitors of human DHFR are also studied for their effectiveness as anticancer drugs, because the need of tumors to synthesize DNA is much greater than that of surrounding healthy cells. DHFR inhibitors therefore may act as chemotherapeutic as well as antibacterial agents.

Pharmacophore of serotonin receptor agonists and antagonists

Serotonin (5-hydroxytryptamine or 5-HT) acts on any one of nine known serotonin receptors, which play important roles in neuronal signaling in the central and peripheral nervous system. The cytotoxic agents used in cancer chemotherapy provoke the release of 5-HT from enterochromaffin cells of the gastrointestinal tract, which activates peripheral vagal afferent fibers and initiates the vomiting reflex (emesis). 5-HT3 receptor specific antagonists block this action and thereby greatly reduce the number of emetic episodes that occur during cancer chemotherapy.

5-HT3 receptor antagonists have been shown to produce beneficial effects in animal models of cognitive and psychiatric disorders. Whether 5-HT3 receptor antagonists may have similar profound effects in the treatment of anxiety, depression or psychosis will be determined by the outcome of ongoing clinical trials. However, it is in the treatment of cancer chemotherapy induced emesis that 5-HT3 receptor antagonists have had their greatest impact. The marked clinical efficacy of 5-HT3 receptor antagonists such as ondansetron, granisetron and tropisetron together with their lack of adverse side effects has greatly improved the treatment of cancer chemotherapy induced emesis.

5-HT3 receptors belong to the family of ligand-gated ion channels. When activated, ions flowing through these channels depolarize the cell membrane triggering action potentials and thus nerve conduction events. 5-HT3 receptor-mediated ion currents evoked by the full agonists 5-hydroxytryptamine (5-HT), quaternary 5-HT (5-HTQ), meta-chlorophenylbiguanide (mCPBG) and the partial agonists dopamine and tryptamine in whole-cell voltage clamp experiments can be used to characterize binding properties of these ligands such as affinity and specificity. Ligand-gated receptors typically switch into an inactive (desensitized) state within seconds of activation. Both serotonin and its synthetic analogues desensitize the 5-HT3 receptor completely with a steep concentration dependence and a potency order of: mCPBG > 5-HTQ >> 5-HT >> tryptamine > dopamine. The time course of recovery from desensitization depends on the agonist used.


A quantitative molecular pharmacophore model was derived to predict drug affinities for 5-hydroxytryptamine (5-HT3) receptors. The model was based on the molecular characteristics of a learning set of 40 pharmacological agents that had been analyzed previously in radioligand binding studies. Molecules were analyzed for various structural features, i.e., the presence of a benzenoid ring and nitrogen atom, substitutions on the benzenoid ring, the location of the substitutions on the nitrogen, and the molecular characteristics of the most direct pathway from the benzenoid ring to the nitrogen. Weighting factors, based on published 5-HT3 receptor affinity data, were then assigned to each of 10 molecular characteristics.

The following nine rules have been established for the 5HT-3A receptor pharmacophore structure (from Schmidt et al., 1989, Molecular Pharmacology 36:505-511):

1.	Contains aromatic ring structure (lower half of molecule; consistent with the hypothesis of Lloyd and Andrews, which states that all central nervous system active drugs contain an aromatic ring; J.Med. Chem, 1986, 29:453-1093).
2.	A tropane ring embedded nitrogen is present (see upper half of molecule shown above) and is located not more than seven atoms from the aromatic ring, counted along the shortest path
3.	When aligning the tropane ring nitrogen in the same plane as the aromatic ring (torsion angle flexibility) the distance between the nitrogen and the aromatic ring center is 6.0 to 7.8 Å.
4.	Chemical substitutions of no more than 3 atoms are allowed at the nitrogen
5.	The tropane ring structure itself does not tolerate substitutions larger than methyl groups (-CH3); larger groups significantly reduce affinity
6.	The linker structure between aromatic ring and ring nitrogen contains steric similarities that reduce flexibility (carbonyls or C=O bonds)
7.	Neither the first nor the second atom from the aromatic ring in the linker is ever a tetrahedral carbon (no torsion angle flexibility; see point 6)
8.	The third atom from the aromatic ring may be a tetrahedral carbon
9.	Substitutions on the aromatic ring must be able to adopt a co-planar conformation
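The nine rules above lend themselves to a simple checklist. The sketch below encodes them as tests on a hypothetical descriptor record. The field names, the `violated_rules` helper and all example values are assumptions made for illustration and are not taken from Schmidt et al. Rule 8 is permissive (the third atom may be tetrahedral) and therefore needs no check.

```python
from dataclasses import dataclass, replace


@dataclass
class LigandDescriptors:
    """Hypothetical per-ligand descriptors; not a published data format."""
    has_aromatic_ring: bool
    has_ring_nitrogen: bool
    atoms_ring_to_nitrogen: int        # shortest path, in atoms
    nitrogen_ring_distance: float      # angstroms, planar alignment
    nitrogen_substituent_atoms: int
    largest_ring_substituent_atoms: int  # heavy atoms; methyl counts as 1
    linker_is_rigid: bool
    linker_atom1_tetrahedral: bool
    linker_atom2_tetrahedral: bool
    aromatic_substituents_coplanar: bool


def violated_rules(lig):
    """Return the numbers of the pharmacophore rules a ligand violates."""
    checks = {
        1: lig.has_aromatic_ring,
        2: lig.has_ring_nitrogen and lig.atoms_ring_to_nitrogen <= 7,
        3: 6.0 <= lig.nitrogen_ring_distance <= 7.8,
        4: lig.nitrogen_substituent_atoms <= 3,
        5: lig.largest_ring_substituent_atoms <= 1,
        6: lig.linker_is_rigid,
        7: not (lig.linker_atom1_tetrahedral or lig.linker_atom2_tetrahedral),
        # Rule 8 allows a tetrahedral third atom, so nothing to test.
        9: lig.aromatic_substituents_coplanar,
    }
    return [rule for rule, passed in checks.items() if not passed]


# Placeholder values for a ligand that satisfies every rule.
compliant = LigandDescriptors(
    has_aromatic_ring=True, has_ring_nitrogen=True,
    atoms_ring_to_nitrogen=5, nitrogen_ring_distance=6.8,
    nitrogen_substituent_atoms=1, largest_ring_substituent_atoms=1,
    linker_is_rigid=True, linker_atom1_tetrahedral=False,
    linker_atom2_tetrahedral=False, aromatic_substituents_coplanar=True,
)

# Same placeholder ligand, but with a flexible linker whose first atom
# is a tetrahedral carbon.
flexible = replace(compliant, linker_is_rigid=False,
                   linker_atom1_tetrahedral=True)

print(violated_rules(compliant))
print(violated_rules(flexible))
```

A rule-based filter of this kind only flags violations. It does not by itself reproduce the weighting factors of the quantitative model.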

A naturally occurring drug, atropine, demonstrates how a 'small' but significant violation of the 'rules' in its structure explains its very low affinity for the serotonin receptor 3A subtype. The first atom from the aromatic ring is a tetrahedral carbon and thus allows torsion angle flexibility (see rules 6 and 7), explaining why the structure can deviate from being co-planar with the aromatic ring. These two atoms are the main difference as compared to the highly specific antagonist ICS 205-930. The molecular weight and chemical formula are nearly identical, although the affinity of atropine for the 5HT-3A receptor is 1000-fold lower than that of ICS 205-930. Atropine is a natural antagonist of cholinergic receptors (acetylcholine receptors) and is used as an antidote to nerve gas (sarin), which inhibits acetylcholinesterase. The toxic effect of prolonged acetylcholine stimulation is thus reversed by blocking the acetylcholine receptor.

The case of atropine and the 5HT-3A pharmacophore structure requirements demonstrates the influence of molecular motion on binding. The flexibility of the atropine molecule between its aromatic and tropane ring structures reduces the chance of superimposing the tropane ring and aromatic ring, a gross-structure requirement for 5HT-3A antagonist binding. Further structure-activity relationship (SAR) studies show that all known receptor antagonists exhibit at least one degree of freedom. This hinders potential screening of ligand-receptor topology, which is best achieved with a rigid molecule as template for rational drug design. The importance of molecular motion in ligands (and receptors) is further demonstrated in the thermodynamic analysis of drug binding.

Water as Solvent

Dipole structure of water. Water is composed of one oxygen and two hydrogen atoms forming two O-H single bonds of 0.96 angstrom (Å) in length and a bond angle of 104.5° between them. Based on the asymmetric distribution of electrons in this triatomic molecule, with the electrons attracted to the oxygen nucleus, the water molecule exhibits a molecular dipole moment of 1.84 Debye. A dipole moment μ is defined by two opposite point charges of magnitude q separated by a distance r; μ = qr, in units of coulomb-meters (C·m). The value of the dipole moment depends on the difference in electronegativity of the atoms sharing a covalent bond. The electronegativity series of biologically important atoms (with increasing affinity for electrons) is: H < C << N < O.
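To get a feel for the size of the water dipole, the definition of the dipole moment as charge times distance can be evaluated numerically. The conversion factor from Debye to C·m and the elementary charge are standard physical constants. The idea of expressing the dipole as a full elementary charge displaced over an equivalent distance is only an illustrative device, not a statement about the real charge distribution in water.

```python
DEBYE_IN_COULOMB_METER = 3.33564e-30   # 1 D expressed in C*m
ELEMENTARY_CHARGE = 1.602176634e-19    # C


def dipole_moment(charge, distance):
    """Dipole moment (C*m) of charges +q and -q separated by a distance in m."""
    return charge * distance


def debye_to_coulomb_meter(mu_debye):
    """Convert a dipole moment from Debye to C*m."""
    return mu_debye * DEBYE_IN_COULOMB_METER


def equivalent_separation(mu_debye, charge=ELEMENTARY_CHARGE):
    """Distance (m) over which `charge` must be separated to give mu_debye."""
    return debye_to_coulomb_meter(mu_debye) / charge


MU_WATER = 1.84  # Debye, value quoted in the text

print(f"{debye_to_coulomb_meter(MU_WATER):.3e} C*m")
print(f"{equivalent_separation(MU_WATER) * 1e10:.2f} angstrom")
```

The equivalent separation comes out well below the O-H bond length, consistent with only partial charges residing on the atoms.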

Dielectric constant: Molecular dipoles experience either an attractive or repulsive force and react to external electric fields. This property is known as polarizability of the medium and expressed as dielectric constant D (or ε) of the solvent. The dielectric constant determines the polarity of a solvent and thus the solubility of molecules. Polarizable solvents (solutes) are polar or hydrophilic (liking water; water is a polar solvent), while non-polarizable solvents (solutes) are non-polar or hydrophobic. As a general rule, hydrophilic solvents mix well with hydrophilic solvents (solutes), and hydrophobic solvents with hydrophobic solvents (solutes).

Hydrogen bond: The dominant electrostatic interaction in water, based on its permanent molecular dipole moment, is the hydrogen bond (H-bond). The hydrogen bond is stronger than an induced dipole-dipole interaction. The latter is known as Van der Waals interaction, a small electrostatic attraction. The hydrogen bond, however, is weaker than a covalent bond. The relative strength of these three types of bonds can be directly assessed by comparing the length of each bond; O-H covalent bond = 0.96 Å (strong), O-H hydrogen bond = 1.8 Å (medium), and O-H Van der Waals bond = 2.6 Å (weak). Based on the tetrahedral bond architecture and the orientation of the two lone electron pairs on the oxygen atom, water molecules can form as many as four (4) hydrogen bonds with each other. This maximal extent of hydrogen bonding, or saturated hydrogen bond network, is achieved in water's solid state - ice crystals. Liquid water has an average of 2.3 hydrogen bonds per molecule. The system is highly dynamic, the lifetime of a hydrogen bond is very short, and as a consequence there is no discernible structure in liquid water. Hydrogen bonds can also be formed by amine groups containing N-H single bonds or by carbonyl bonds (C=O). The ability of water molecules to form hydrogen bonds with themselves and with biological (macro-) molecules is the single most important parameter for understanding the structure, function, and regulation of enzymes, genes, and biological membranes.

Ions and mobile charges: Water is rarely a pure solvent, but contains a multitude of salts, which all exist in the form of dissociated, charged ions. Table salt (NaCl), for example, quickly and spontaneously dissociates into Na+ and Cl- ions. This process is driven by a change in heat capacity of the system, an enthalpic reaction, and at the molecular level is stabilized by the formation of hydration shells. Hydration shells are semi-stable structures of water molecules that interact through their dipole moments with the central point charge more strongly than they do with each other. The positive and negative point charges function as an external field, with the dipole moments reorienting against the charge field to minimize the free energy of the system. The strength of electrostatic interactions between ion pairs like NaCl is described by Coulomb's law, which says that the force holding two equal but oppositely charged ions together is a function of the charge itself, the inverse square of the distance between the charges, and the dielectric constant of the medium. The kinetic energy of the molecules prevents the durable formation of hydration shells. The ease of solubilization depends on the polarizability of the solvent molecule, a parameter that is a function of the dipole moment as well as the mass and the rotational and lateral mobility of the molecules.
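Coulomb's law as described above can be written out as a short calculation. The sketch below uses the standard form with the relative dielectric constant in the denominator. The ion separation of 2.8 Å and the dielectric constants of about 80 for water and about 2 for a non-polar environment are assumed illustrative values, not figures from the text.

```python
import math

VACUUM_PERMITTIVITY = 8.8541878128e-12  # F/m
ELEMENTARY_CHARGE = 1.602176634e-19     # C
AVOGADRO = 6.02214076e23                # 1/mol


def coulomb_force(z1, z2, distance, dielectric=1.0):
    """Force in N between two ions with charge numbers z1, z2 (distance in m).

    Negative values mean attraction.
    """
    q1q2 = z1 * z2 * ELEMENTARY_CHARGE ** 2
    return q1q2 / (4.0 * math.pi * VACUUM_PERMITTIVITY * dielectric * distance ** 2)


def coulomb_energy_kj_per_mol(z1, z2, distance, dielectric=1.0):
    """Interaction energy in kJ/mol for one mole of such ion pairs."""
    q1q2 = z1 * z2 * ELEMENTARY_CHARGE ** 2
    energy = q1q2 / (4.0 * math.pi * VACUUM_PERMITTIVITY * dielectric * distance)
    return energy * AVOGADRO / 1000.0


SEPARATION = 2.8e-10  # assumed Na+ / Cl- contact distance in m

for medium, eps in (("vacuum", 1.0), ("non-polar (assumed 2)", 2.0), ("water (assumed 80)", 80.0)):
    energy = coulomb_energy_kj_per_mol(+1, -1, SEPARATION, eps)
    print(f"{medium:22s} {energy:8.1f} kJ/mol")
```

Under these assumptions the attraction between the ion pair is weakened by a factor equal to the dielectric constant, which illustrates why a salt that is held together strongly in a non-polar medium dissociates readily in water.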

Hydrophobic effect: Many biologically relevant molecules are partially hydrophobic, meaning that they are not easily soluble in water, because they lack the ability to form strong electrostatic interactions or hydrogen bonds with the solvent. Hydrophobic interactions are typically based on Van der Waals forces (induced dipoles). Their inability to form energetically favorable interactions with water molecules (hydrogen bonds) induces phase separation. Water molecules preferentially interact with each other through hydrogen bonds. Since hydrophobic molecules must form contact with water molecules, but can do so only through Van der Waals forces (weak forces), every water-solute interaction is thermodynamically less stable than corresponding hydrogen bonds (strong forces) among water molecules themselves. The reduced number of potential hydrogen bonds found on hydrophobic surfaces reduces the degree of freedom of water molecules at these interfaces. Rotational and lateral movements are restricted and stable water structures are formed at interface boundaries. Reducing the total area of the hydrophobic surface is energetically favorable. Such a reduction is achieved by clustering hydrophobic solutes into large aggregates. The large number of water molecules no longer needed to form less favorable Van der Waals bonds with the hydrophobic solutes, increases the entropy of the system. The entropy of the liquid water phase (rotational, vibrational, translational degrees of freedom) dominates the thermodynamics of the system. The increase in entropy of the water phase is much larger than the loss in entropy of the aggregated (structured) hydrophobic particles. This entropy driven aggregation of hydrophobic molecules in aqueous solutions is called the hydrophobic effect. It is the major stabilizing force in biological systems determining such wide ranging processes as protein folding, ligand binding, and cell membrane formation.

Binding Energetics

For the search of new and effective agonists and antagonists, computer modeling has become an invaluable tool because powerful processors readily calculate the properties necessary to define a chemical similarity space. Not only can they be used to design new structures, or modify known structures of agonists/antagonists, they are also useful to screen existing compound libraries for structural and chemical similarity. As it turns out, molecular modeling tools are better at simulating specificity (conformation) than affinity (energy) of interactions. There is a simple reason for it and it has to do with the solvent. Molecular modeling (static or dynamic) is usually performed 'in vacuum' drastically reducing the number of calculations by excluding solvent-solvent and solvent-solute interaction. Missing from theoretical analysis of molecular interaction is the role that surface bound water molecules play during the formation of a ligand-receptor complex. The role of these water molecules helps explain experimentally observed affinities for naturally occurring ligand-receptor systems that can not be explained by analyzing the non-covalent interaction between ligand and receptor surface in the complex alone. The challenge here is to understand the binding energy components enthalpy and entropy during complex formation for both ligand-receptor binding and solvent displacement.

R + L ⇌ RL

Ka = [RL] / ([R][L])

Both ligand (L) and receptor (R) come in hydrated form and ligand-receptor (RL) complex formation requires replacement of surface bound water molecules from both the ligand and receptor binding site (partial dehydration). Often, surface bound water is more structured than liquid bulk water. Thus, the release of many water molecules upon complex formation into the bulk phase increases the entropy of the entire system (protein-ligand solution). Such an entropy driven process is well described for hydrophobic and amphipathic solutes and is known as the hydrophobic effect.

The Enthalpy-Entropy Compensation Challenge

While it has been found that there is little correlation between the change in Gibbs free energy (ΔG) of binding and the change in solvent accessible surface area (Bogan and Thorn, 1998, J.Mol.Biol. 280:1-9), experimental observations show that despite the overall small change in Gibbs free energy of binding, both its enthalpic and entropic components can be large, yet opposite in sign. Unfavorable enthalpic components of dehydration and ligand-receptor binding can be offset by favorable entropic components stabilizing the ligand-receptor complex.

ΔG = ΔH - TΔS = -RT ln Ka

Generally, increased bonding in a bimolecular interaction will produce a more negative enthalpy change, ΔH, but this will come at the expense of increased order associated with a more negative entropy change, ΔS < 0. The inverse relationship observed between enthalpy and entropy changes in binding interactions is known as enthalpy-entropy compensation. Overall, the favorable entropy terms of partial dehydration of the ligand and the receptor binding site offset the unfavorable entropic term of the more ordered ligand-receptor complex. This generally explains how living organisms can form and maintain ordered structures - create order out of chaos - at the expense of environmental energy.
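Enthalpy-entropy compensation can be illustrated with the relation between Gibbs free energy, enthalpy, entropy and the association constant Ka. In the sketch below, two hypothetical ligands have very different enthalpy and entropy changes but the same free energy of binding. The numbers are invented for illustration, and a temperature of 300 K is assumed to keep the arithmetic simple.

```python
import math

R_GAS = 8.314  # gas constant in J/(mol*K)


def gibbs_free_energy(delta_h, delta_s, temperature):
    """Free energy change in J/mol from enthalpy (J/mol) and entropy (J/(mol*K))."""
    return delta_h - temperature * delta_s


def association_constant(delta_g, temperature):
    """Association constant Ka from the free energy change (J/mol)."""
    return math.exp(-delta_g / (R_GAS * temperature))


T = 300.0  # K, assumed

# Hypothetical ligands: (enthalpy change in J/mol, entropy change in J/(mol*K))
ligands = {
    "enthalpy driven": (-60000.0, -50.0),
    "entropy driven": (15000.0, 200.0),
}

for name, (dh, ds) in ligands.items():
    dg = gibbs_free_energy(dh, ds, T)
    ka = association_constant(dg, T)
    print(f"{name:16s} dG = {dg / 1000:.1f} kJ/mol, Ka = {ka:.2e} 1/M")
```

Both hypothetical ligands bind with the same affinity, even though one pays an entropic penalty for a favorable enthalpy and the other binds against an unfavorable enthalpy. This is why affinity alone says little about the underlying interactions.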

In terms of rational drug design, enthalpy-entropy compensation is a difficult challenge that must be overcome to significantly improve the prediction of the binding affinity of novel drugs for novel targets (for an excellent review see Holdgate, G. A., 2001, BioTechniques 31:164-186). Nevertheless, successful design of drugs for enzymes with deep binding pockets occluding bulk water (i.e. ordered water structure in the binding pocket, a favorable condition for high affinity binding) has been achieved. Examples of such drugs are Nelfinavir (Agouron Pharmaceuticals' Viracept or AG-1343, an inhibitor of HIV-1 protease) and Ro 46-6240 (Hoffmann-La Roche's inhibitor of thrombin). Non-peptidic, small-molecule mimics as inhibitors of protein-protein interaction have proven more difficult to design. Much of the solvent occlusion of peptide inhibitors is provided by the main-chain and Cβ atoms (the first side-chain carbon of an amino acid) adjacent to binding hot spots, which explains why side-chain modifications have little effect on affinity.

Molecular crowding

The equilibrium constant Ka used in thermodynamic analysis is a ratio of product over substrate concentration. However, this is an approximation valid only in ideal solutions, in which molecular interactions are assumed to be absent (sic!), which is equivalent to extrapolating experimental measurements to zero concentration (for review see R.J. Ellis, Trends in Biochem. Sci., 2001, 26:597-604). Concentrations of proteins in cells, however, can be as high as 300 to 400 mg/ml. The correct treatment of thermodynamic equilibrium therefore uses a correction factor called the activity coefficient, γ. Multiplying the concentration c by the activity coefficient, to account for molecules of real size with real interactions, gives the effective concentration or thermodynamic activity, a = γc. For example, the association constant for protein dimerization in bacterial cytoplasm is increased 8- to 40-fold compared to an ideal solution, while the association constant, i.e., the affinity, for a tetrameric protein is increased 10^3- to 10^5-fold.
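To see how activity coefficients change an association constant, consider a dimerization 2A <-> A2. Writing the equilibrium in activities (a = γc) and solving for the concentration ratio gives an apparent constant K_app = K_ideal x γ(A)^2 / γ(A2). The sketch below applies this relation; the function names and the activity coefficients are hypothetical, chosen only so that the result falls in the 8- to 40-fold range quoted above.

```python
def activity(gamma, concentration):
    """Thermodynamic activity a = gamma * c."""
    return gamma * concentration


def apparent_dimerization_constant(k_ideal, gamma_monomer, gamma_dimer):
    """Concentration-based association constant for 2A <-> A2 in a non-ideal solution.

    Derived from K_ideal = a(A2) / a(A)**2 with a = gamma * c.
    """
    return k_ideal * gamma_monomer ** 2 / gamma_dimer


# Assumed activity coefficients for a crowded cytoplasm (illustrative only).
GAMMA_MONOMER = 10.0
GAMMA_DIMER = 5.0

enhancement = apparent_dimerization_constant(1.0, GAMMA_MONOMER, GAMMA_DIMER)
print(f"dimerization is favored {enhancement:.0f}-fold over the ideal solution")
```

Because the monomer coefficient enters squared (and to the fourth power for a tetramer), the effect of crowding grows steeply with the number of subunits that associate.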

Molecular crowding favors association, protein folding, and the formation of ligand-receptor (or enzyme-substrate) complexes. Binding, however, is also a function of diffusion in solution. Small molecules retain favorable diffusion rates even in crowded solutions, while macromolecules experience a drastic drop in diffusion (consider the excluded volume, as in a gel matrix), which reduces the positive effect of crowding on affinity.

The Structural Properties of Biological Macromolecules

Three major types of macromolecules are found in biological systems: proteins, nucleic acids, and carbohydrates (polysaccharides). All play important roles in the physiology and structure of organisms. Catalytic and regulatory functions are mainly performed by proteins, although some ribozymes (RNA based catalytic units) act as enzymes. All three types can function as receptors. Examples are cell surface receptors (proteins) such as hormone or neurotransmitter receptors, transcription factors as regulatory elements of gene expression, and glycolipids and glycoproteins forming a cell surface matrix that is usually specific to the cell type or organism (e.g. pathogenic microorganisms).

Proteins

The role of proteins in cells is threefold: catalyzing chemical reactions (enzymes); promoting structural stability and mobility (structural proteins and molecular motors); and transporting molecules and signals across biological membranes and along filamentous protein structures (e.g. the cytoskeleton). Most drug targets are proteins because of their functional importance.

Protein structure: Proteins are linear polymers of amino acids. There are 20 different amino acids, distinguished by the chemical and physical properties of their side chains. Besides the side chain, every amino acid contains an amino group (NH2), a carboxyl group (COOH), and a hydrogen atom, all linked to the central alpha carbon (Cα). In a protein, the acid-base properties of the backbone amino and carboxyl groups are not important except at the N- and C-terminal ends, which are always charged at physiological pH values (pH = 7 to 7.5). In the linear polymer, the amino and carboxyl groups are covalently linked to form peptide bonds. Every amino acid residue (except the terminal units) lies at the center of two peptide bond structures (amide planes) linked by two single covalent bonds. These covalent bonds have rotational flexibility (degrees of freedom), and the rotations about them are described by torsion angles. The amino acid sequence of a protein is referred to as primary structure (1°D) and largely determines the three dimensional structure (tertiary structure or 3°D) of a protein. The tertiary structure contains repetitive elements dubbed secondary structures (2°D). These secondary structures are recurrent elements in proteins and can be classified according to the particular polypeptide backbone fold and measured by their torsion angle values. The two most widely found secondary structures are the right handed alpha helix (α-helix) and the beta strand (β-strand). Most active proteins are found in complex with other proteins. The structure of these multi-subunit protein complexes is referred to as quaternary structure (4°D). Protein complexes give the cell an extraordinary functional variability and control over catalytic processes. The complexity of living organisms is achieved not only by the number of different proteins or molecules in general, but by their use as small multi-subunit complexes. Thus the expression of one gene is in many ways dependent on the expression of other genes.
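A torsion angle is defined by four consecutive atoms and measures the rotation about the bond between the two middle atoms. The following sketch computes it from Cartesian coordinates using the standard atan2 formula. The function name and the test coordinates are illustrative assumptions, and the sign convention (clockwise rotation of the far bond, viewed along the central bond, counts as positive) is the commonly used one but should be checked against whatever software the values are compared with.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def torsion_angle(p1, p2, p3, p4):
    """Torsion (dihedral) angle in degrees for four points; rotation is about p2-p3."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal of the plane p1-p2-p3
    n2 = _cross(b2, b3)  # normal of the plane p2-p3-p4
    b2_length = math.sqrt(_dot(b2, b2))
    y = b2_length * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: first and last atom on the same side (cis, 0 degrees)
# and on opposite sides (trans, 180 degrees) of the central bond.
cis = torsion_angle((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
trans = torsion_angle((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
print(cis, abs(trans))
```

For a protein backbone the four points would be the atoms defining phi (C, N, Cα, C) or psi (N, Cα, C, N) of a residue, read from a coordinate file.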

Structure Determination and Visualization of Proteins

Structure determination techniques: To obtain information about the structure of molecules at a resolution where the position of individual atoms can be seen, two techniques are currently used: X-ray diffraction from protein or DNA crystals, and nuclear magnetic resonance (NMR) of proteins or nucleic acids in solution. Although both techniques are used to obtain the same high resolution structure information, they measure completely different physical properties of molecules. X-ray diffraction yields information about the electron distribution in a crystal lattice, while NMR measures the magnetic spin resonance of selected isotopes (¹H, ¹³C, ¹⁵N). High resolution structures contain detailed information about atoms separated by 1.5 to 3 Å, roughly the length of a hydrogen bond, or twice the length of a covalent bond.

NMR: Because protons are so widely distributed in biological macromolecules, proton-NMR (¹H-NMR) is used to determine the atomic neighborhood of protons in macromolecules. Currently, NMR solution structures are limited to molecules with a molecular weight smaller than 20 to 25 kDa (for a protein this is about 200 amino acids). NMR is a technique of choice if the dynamic aspects of a structure have to be determined, because the time scale of obtaining the data is very short, on the order of milliseconds.

X-ray crystallography: X-ray diffraction measures the electron distribution in crystals that contain millions of units in an ordered structure (crystal lattice and unit cell). The regularity of the crystal lattice determines the level of resolution (the more ordered, the higher the resolution). The sampling of the diffraction data requires much longer times than NMR measurements. Crystal structures are therefore also known as frozen structures. Highly flexible protein domains (disordered regions) often create a 'blank stretch' in the elucidated structure.

The many ways to look at a protein structure: There is no unique way to represent the structure of a protein. The 'structure' we choose to describe a molecule depends on the quality or property of the molecule to be studied. For the same reasons that we have physical, chemical, or biological sciences, we have reasons to focus on different structural aspects of proteins. To mention a few, we can choose the simple space-filling model representing the 'shape' or form (Van der Waals surface) of a protein including every atom in the structure, or we can strip the protein down to its polypeptide backbone conformation, the most often used representation of protein structures. We can mark regions of different secondary structures by symbols (cartoon representation) to quickly allow an overview of motifs and domain organization. To obtain an understanding of the structure-function relationship of a protein, we need to include the physical and chemical properties of its surface and represent the protein in a way that 'makes sense from the point of view of another molecular entity', in other words its electrostatic behavior. This includes clusters of fixed charges, the distribution of permanent and induced dipoles, molecular orbitals of aromatic amino acids, the distribution of polar and non-polar amino acids, and the flexibility of surface structures. There are four major representations of molecular surfaces: the Van der Waals surface, surface potentials, the solvent accessible surface, and the union surface.

Internet Information on protein structures:	Protein Data Bank PDB
	Visualization Software Protein Explorer, Rasmol, and Chime (browser plug in)

Nucleic acids

DNA, as carrier of genetic information, may be a target for drug interaction because drugs that bind to it can interfere with transcription (gene expression preceding protein synthesis) and DNA replication, a major process in cell growth and division (see DHFR inhibitors above). DNA replication is central to tumorigenesis and pathogenesis. Nucleic acids are not commonly used as drug targets, except by antisense drugs (RNA) and by antimicrobial or anticancer drugs that readily damage DNA strands or prevent regulatory proteins from binding.

DNA structure: DNA is a linear polymer made of four different types of nucleotides. Nucleotides are complex structures of a cyclic aromatic base, a sugar unit (deoxyribose in DNA, ribose in RNA), and one, two, or three phosphate groups. They are named after their different bases, the variable components of nucleic acids, which come in two basic versions - single ring forms called pyrimidines, and double ring forms called purines. The pyrimidines include cytosine (C) and thymine (T; uracil (U) in RNA), the purines include adenine (A) and guanine (G). The stable form of DNA is a dimer and its tertiary structure is the right handed double helix, called B-DNA, or Watson-Crick double helix, named after its co-discoverers. The stability of B-DNA is provided by the base stacking of flat aromatic ring structures, as well as the hydrogen bonding in base pairs. In B-DNA only two base pair combinations are found - AT pairs with two hydrogen bonds and GC pairs with three hydrogen bonds. The number of hydrogen bonds and thus the thermodynamic stability of a DNA double helix is directly related to its GC content, i.e., the percentage of GC pairs in DNA. Chromosomal regions with high GC content correlate with the presence of functional genes.
There are three principally different ways of drug binding. First, through control of transcription factors and polymerases. Here, the drugs interact with the proteins that bind to DNA. Second, through RNA binding to DNA double helices to form nucleic acid triple helical structures, or RNA hybridization (sequence specific binding) to exposed DNA single strand regions, forming DNA-RNA hybrids that may interfere with transcriptional activity. Third, through small aromatic ligand molecules that bind to DNA double helical structures by (i) intercalating between stacked base pairs, thereby distorting the DNA backbone conformation and interfering with DNA-protein interaction, or (ii) binding in the minor groove, which causes little distortion of the DNA backbone. Both work through non-covalent interaction.
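The link between GC content and the number of base-pair hydrogen bonds described for B-DNA is easy to check numerically. This is a minimal sketch; the function names and the example sequence are made up for illustration, and the hydrogen bond count uses only the two-per-AT, three-per-GC rule stated above.

```python
def gc_content(sequence):
    """Fraction of G and C bases in one strand of a DNA sequence."""
    seq = sequence.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return (seq.count("G") + seq.count("C")) / len(seq)


def base_pair_hydrogen_bonds(sequence):
    """Hydrogen bonds in the duplex formed by this strand and its complement."""
    seq = sequence.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    return sum(3 if base in "GC" else 2 for base in seq)


example = "ATGCGC"  # hypothetical six-base sequence
print(gc_content(example), base_pair_hydrogen_bonds(example))
```

Base stacking also contributes to duplex stability, so the hydrogen bond count should be read as a rough indicator rather than a stability prediction.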

Carbohydrates

Although carbohydrates or polysaccharides play a major role in cell surface recognition, they are not commonly drug targets because they have no enzymatic function but serve as structural components between cells of multicellular organisms and between pathogenic organisms and their hosts. As is the case for nucleic acids, antimicrobial activity of potential novel drugs may include those interfering with binding of pathogenic microorganisms to host cell surfaces. Both host and microorganism have carbohydrate coated surfaces. The potential of polysaccharide targets for novel drugs is supported by the observation that both microbial DNA and polysaccharides are immunogenic. Thus novel drugs may be developed that mimic an immune response to develop vaccines or produce competitive inhibitors that can interfere with binding.

PART II

Recognition of Proteins

Molecular Motion and Protein Stability

Proteins would not function if their structures were not flexible. The flexibility originates from the thermal motion of their atoms, the result of their kinetic energy. In living cells and organisms, macromolecular structures are not rigid entities, but resemble highly viscous fluids. As a result, protein activity is temperature sensitive, with temperatures that are too low or too high causing inactivation. While low temperatures inactivate proteins because their structure becomes frozen or adopts a crystalline state, elevated temperatures cause proteins to unfold or denature. Both conditions compromise the structural integrity of active sites and binding sites and thus reduce activity.

Molecular dynamics: Protein flexibility is extremely difficult to study experimentally. The time scale of molecular motions ranges from femto- (10^-15) to microseconds (10^-6), with 'longer' time scales correlating with larger structures and longer distances involved (e.g. protein folding). Modern computational power enables simulations of the thermal flexibility of atoms in larger and larger molecular structures. These computer based simulations are known as molecular dynamics simulations.
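At its core, a molecular dynamics simulation repeatedly computes the forces on all atoms and advances their positions and velocities by a small time step. The toy sketch below does this for a single particle on a harmonic spring, a crude stand-in for one vibrating bond, using the velocity Verlet scheme that is widely used for this purpose. Everything here (reduced units, spring constant, time step, function names) is an assumption for illustration and bears no relation to a real protein force field.

```python
def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate Newton's equation of motion in one dimension with velocity Verlet."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory


# Reduced, dimensionless units (assumed values).
SPRING_CONSTANT = 1.0
MASS = 1.0


def harmonic_force(x):
    return -SPRING_CONSTANT * x


def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * SPRING_CONSTANT * x * x


trajectory = velocity_verlet(x=1.0, v=0.0, force=harmonic_force,
                             mass=MASS, dt=0.01, steps=1000)
start_energy = total_energy(*trajectory[0])
end_energy = total_energy(*trajectory[-1])
print(f"energy drift after {len(trajectory) - 1} steps: {end_energy - start_energy:.2e}")
```

The near-constant total energy is the usual sanity check for such an integrator. Real simulations apply the same loop to thousands of atoms with femtosecond time steps, which is why reaching microsecond time scales is computationally demanding.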

Gramicidin peptide molecular dynamic simulation

Protein folding: Proteins are synthesized in cells in a linear fashion (on ribosomes) and have to fold into a native, active conformation (tertiary or quaternary structure). This conformation is largely determined by the amino acid sequence and particularly by the distribution pattern of hydrophilic and hydrophobic amino acid residues. As a general rule, hydrophobic residues are buried inside the protein core during the folding process, driven by the hydrophobic effect. The folding process is temperature sensitive and promoted by the molecular crowding conditions inside cells.

Induced fit binding: It has long been established that enzymatic reactions proceed through a series of steps called reaction intermediates. An early intermediate in every reaction is the stabilized enzyme-substrate complex, called the transition state. The transition state mechanistically explains the ability of enzymes to lower the activation energy of a reaction, thus greatly increasing its catalytic rate. For non-enzymatic processes, i.e., ligand binding events on receptors, an analogous enhanced interaction between protein and ligand is found. Here, the initial binding event induces a small conformational change, which increases the molecular closeness between protein surface and ligand, and thus strengthens the interaction. This mechanism is called the induced fit model of ligand binding. At low temperature, i.e. with rigid protein structures, ligands lose their affinity for the receptor.

The study of protein folding can be extended to the functional changes in protein structure that bring a protein from an inactive to an active state (see also allosteric regulation below). Sometimes these conformational changes can be substantial, as is the case for the calcium sensing protein calmodulin (troponin C in muscle). Calmodulin undergoes a reorientation of its two domains upon binding of four calcium ions, which results in the exposure of a hydrophobic cavity that allows calmodulin to bind hydrophobic target peptides. These target peptides are surface loops on proteins that are inactivated by these loops (self-inhibition). Calmodulin binding releases the inhibition, thus activating the targeted protein/enzyme, often kinases and the calcium pump, a plasma membrane protein responsible for the excretion of cytoplasmic calcium after a physiological signaling event has occurred.

Self-assembly Systems and Protein Complex Formation

Structural proteins and molecular motors are typically found as large protein complexes and the conformation of the supra-molecular structure is a function of the strength of interaction between protein surfaces. A typical example is the cell-cell adhesion mediated by protein interaction. The neuronal cell adhesion molecules (NCAM) mediate cell contact in a supramolecular complex called cadherin zippers. The protein-protein interaction is mediated by immunoglobulin like domains and depends on calcium as stabilizing co-factor. These protein-protein interactions are sensitive to environmental conditions like salt concentration, pH, hydrophobicity, temperature or pressure. Molecular motors are protein complexes undergoing controlled changes in their supra-molecular (quaternary) organization (rotation, lateral movement, contraction) due to local changes in cellular electrochemical conditions.

Membrane proteins mediate substrate transport or signaling across cell membranes. Transport is mediated by ion channels, transporters, and pumps. The latter are distinguished kinetically (transport rate) from ion channels, which promote a fast, diffusion controlled flux, while pumps control an energy dependent 'uphill' transport. Pumps regenerate the chemical potential stored by biological membranes and dissipated by ion channels. In the process of dissipation, chemical energy is converted into chemical (ATP synthase) or mechanical energy (molecular motors). Although metabolic and membrane transport processes occur under non-equilibrium conditions, they are studied experimentally at chemical equilibrium. The first step in catalysis, complex formation, or a transport process is a binding event, which is quantified by an equilibrium constant known as the dissociation constant, KD. It measures the affinity of a substrate for an enzyme, a ligand for a receptor, or a permeant for a transporter.
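What a dissociation constant means in practice can be shown with the fraction of binding sites occupied at a given ligand concentration. The sketch below assumes the simplest case, a single independent site with the free ligand in large excess, where occupancy = [L] / (KD + [L]). The function name and the KD value are hypothetical.

```python
def fractional_occupancy(ligand_concentration, kd):
    """Fraction of sites occupied for simple 1:1 binding: [L] / (KD + [L])."""
    if ligand_concentration < 0 or kd <= 0:
        raise ValueError("concentration must be >= 0 and KD must be > 0")
    return ligand_concentration / (kd + ligand_concentration)


KD = 1e-6  # assumed dissociation constant of 1 micromolar

for concentration in (1e-8, 1e-7, 1e-6, 1e-5, 1e-4):
    occupancy = fractional_occupancy(concentration, KD)
    print(f"[L] = {concentration:.0e} M -> {occupancy:.1%} occupied")
```

At [L] = KD exactly half of the sites are occupied, so a smaller KD means higher affinity. The dissociation constant is the reciprocal of the association constant Ka.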

Membrane proteins cannot be understood without an understanding of the structure and function of biological membranes, also known as phospholipid bilayers. Membranes are an example of complex self-assembly systems. Complexity and self-assembly have become important paradigms in modern biology and are thus discussed here in some more detail. The complexity of cellular structures is obtained by arranging molecular components in regular, repetitive arrays. The determining factor of this assembly is the hydrophobic effect, or more generally, phase separation behavior. The surface structure and shape of the unit molecule define the overall architecture of the supramolecular structure: its size, shape, and number of units. Unlike macromolecules, which are true polymers linked by covalent bonds among units, supra-molecular structures are stabilized by non-covalent bonds.

Examples are manifold. A very important and intriguing biological supra-molecular structure is the cell membrane - a double layer (bilayer) of phospholipids. Phospholipids, the building blocks of cell membranes, are amphipathic molecules, i.e., they are not entirely hydrophobic. They contain a hydrophilic and/or charged headgroup linked to two fatty acid tails. Membranes are stable in water because the hydrophobic fatty acids are protected inside the membrane bilayer by the hydrophilic surface of the tightly packed headgroups. Cell membranes are perfectly water soluble, and they provide a hydrophobic barrier for small polar/charged molecules. Cell membranes allow compartmentalization of cellular processes. The hydrophobic barrier, which is essentially an electrical insulator (capacitor) is regulated by membrane proteins that promote transport processes; ion channels for passive diffusion of small ionic species, facilitators and transporters for specific, passive transport, which may be coupled to symport or antiport of a second molecular species (flux coupling), and finally pumps, active transporters that utilize chemical energy in form of ATP hydrolysis or light (photosynthesis). Thus biological membranes not only form specialized cellular compartments for various metabolic purposes, but also function as storage devices for electrochemical potentials (ion gradients).

...read about how to study ion channels in synthetic membranes

Other examples of biological self-assembly complexes include ribosomes and chromosomes, large multi-subunit particles of proteins and nucleic acids, the cytoskeletal fibers - microfilaments made of actin and microtubules made of tubulin. These elongated fibers have two functions. They determine the shape of mammalian cells and they are dynamic systems providing a way of intracellular transport by means of subunit shuffling between fiber ends. Cells are not a homogeneous solution of molecules, but highly organized compartments. These properties are particularly apparent during embryogenesis, where cells or cell ensembles gain precise polarity, a functional asymmetric distribution of cellular components necessary for proper cell growth and differentiation.

The self-assembly properties of small, amphipathic molecules are utilized to design novel, supra-molecular structures with defined functional properties. The goal is to produce molecular scale structures, molecular motors, fibers, conducting elements etc. in the nanometer range. The technology of producing these tiny assemblies is commonly referred to as nanotechnology. Nanotechnology is an attempt to control the crystallization process of small molecules of varying size, shape and solubility properties. Cylindrical structures called nanotubes form microscopic channels and molecular sieves that control transport processes across biological membranes. These nanotubes may prove useful for the design of drug delivery devices. Their specificity for what is transported, and across the membranes of which cell types, could be used to deliver molecules to tumor cells only and not to healthy tissue.

Internet Information:

The fluid-mosaic membrane model as a scientific fact and history of cell membrane research.

Allosteric Properties of Proteins

Allosteric properties are the result of conformational changes induced in macromolecules by molecular interactions, both with small ligands and with other proteins. The conformational changes induced by binding are the essence of regulating the activity of proteins by shifting these macromolecules between functional and non-functional states.

Allostery and cooperativity. Often proteins (or protein complexes) contain more than one binding site for more than one type of ligand or substrate. Proteins have the ability to coordinate what is going on at those different binding sites in such a way that the binding of one ligand alters the affinity for another ligand on the same protein (or complex). This is an allosteric mechanism. The 'interacting' ligands are not identical. For identical ligands, the allosteric mechanism is called a cooperative effect. Examples of allosteric mechanisms are the ligand binding induced changes in conformation of cell surface receptors (signal transduction) or ion channels (action potential), or the ligand binding induced dimerization of transcription factors (nuclear receptors). The latter are nuclear proteins, which undergo a change in DNA binding affinity as a function of dimer formation. The DNA binding event activates or suppresses gene expression or replication activity. A well known cooperative effect is the binding induced increase in oxygen affinity at the four binding sites of hemoglobin. While hemoglobin is completely devoid of ligand, its affinity for molecular oxygen is very low. Hemoglobin, which is a tetrameric protein complex with four identical heme binding sites, undergoes a substantial conformational change after binding one molecule of O2. The conformational change drastically increases the oxygen affinity of the remaining three binding sites, a positive cooperativity between the four binding sites.
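Positive cooperativity is often summarized with the empirical Hill equation, occupancy = [L]^n / (K^n + [L]^n), where n > 1 indicates cooperative binding. The Hill equation is a standard description that the text above does not itself introduce, and the half-saturation pressure (26 mmHg) and Hill coefficient (2.8) used below are typical textbook figures for hemoglobin, included here only as illustrative assumptions.

```python
def hill_occupancy(ligand, half_saturation, hill_coefficient):
    """Fractional saturation from the Hill equation."""
    if ligand < 0 or half_saturation <= 0:
        raise ValueError("ligand must be >= 0 and half_saturation must be > 0")
    ratio = (ligand / half_saturation) ** hill_coefficient
    return ratio / (1.0 + ratio)


P50 = 26.0     # assumed half-saturation oxygen pressure, mmHg
N_HILL = 2.8   # assumed Hill coefficient for cooperative binding

for pressure in (10.0, 26.0, 40.0, 100.0):
    cooperative = hill_occupancy(pressure, P50, N_HILL)
    independent = hill_occupancy(pressure, P50, 1.0)
    print(f"pO2 = {pressure:5.1f} mmHg: cooperative {cooperative:.2f}, "
          f"independent sites {independent:.2f}")
```

Compared with independent sites, the cooperative curve is lower at low oxygen pressure and higher at high pressure. This steep, sigmoidal response is what allows efficient loading and unloading of oxygen over a narrow pressure range.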

Receptor Signaling

The scope of drug targets is as large and broad as the proteins and nucleic acids found in cellular metabolism. There are, however, preferred targets and they are mostly located at the extracellular surface of cells. Similar to hormones, neurotransmitters, growth factors and natural toxins, many drugs bind to membrane proteins that belong to two major classes - G-protein coupled receptors (GPCR; metabotropic) and ion channels (ionotropic). The latter are sensitive to local anesthetics, which directly bind to ion channel proteins, while both iono- and metabotropic receptors are sensitive to general anesthetics, which are believed to function through modification of the physical properties of cell membranes. A special group of membrane interacting antibiotics are pore forming peptides like alamethicin, gramicidin A, and melittin. They kill cells by perforating membranes, which dissipates the electrochemical gradients and depletes the energy stored in them.

An estimated 30% of all currently approved drugs bind to G-protein coupled receptors or GPCRs. Although no high resolution structure for this class of receptors is yet available, structural models based on their homology to bacteriorhodopsin, a bacterial proton pump, and bovine rhodopsin, a light sensitive G-protein coupled protein, were used for rational drug design of GPCR ligands. GPCRs have a simple generic structure with a large N-terminal domain facing the extracellular side of the cell membrane, followed by seven transmembrane spanning (TM) domains (alpha helices) and a C-terminal, cytoplasmic domain of variable length. The C-terminal domain and the loop connecting TM5 and TM6 interact with G-proteins, which are activated by GTP binding and receptor-ligand interaction in what is known as a ternary complex. P2Y, a purine receptor, has recently been modeled for its structure-activity relationship of ligand docking. Both the receptor binding site and the ligand pharmacophore have been characterized. P2Y1-ATP complex models describe three binding sites: a meta-binding site I, a meta-binding site II, and the principal transmembrane spanning segment (TM) binding site. The meta-binding sites are provided by extracellular loop structures. Thermodynamic modeling of the binding energy indicates that meta-binding site I is almost as strong as the principal TM binding site. This ligand-receptor docking model includes neither G-protein binding nor molecular motion. In fact, both agonist and antagonist binding are modeled to the same receptor structure.

Recent studies confirm an increasing complexity in receptor activation by dimerization, which can lead to changes in ligand specificity and affinity as compared to monomeric receptors. Ligands span a variety of chemical structures, ranging from small amino acid derivatives (e.g. dopamine) to larger peptides (e.g. opioid peptides). Over 90% of ligands belong to the peptide class.

The following receptor-assisted G-protein activation cycle has been proposed as a two step model. First, the active receptor ternary complex consists of a ligand-bound receptor with a GDP bound G-protein, denoted HRG(GDP). GDP interaction with the G-protein is the weakest interaction, leading to GDP dissociation. Second, GTP replaces GDP on the ligand-receptor-G-protein complex. The HRG(GTP) ternary complex is the transition state of the active cycle (consider the ligand-receptor complex as an enzyme that catalyzes GTP loading on G-proteins). G(GTP) then dissociates from the ligand-receptor complex, leaving behind an HR complex that can bind a new G(GDP) unit and stimulate nucleotide exchange, while the newly released G(GTP) functions as an effector module on kinases, lipases, ion channels, or adenylyl cyclase. The GPCR system essentially recharges inactive GTPases by accelerating replacement of non-hydrolysable GDP with hydrolysable GTP on Gα subunits.

Ligand-gated Ion Channels

Unlike the G-protein coupled receptors, ligand gated ion channels open an ion selective pore allowing the flow of ions into or out of the cell, depending on the actual membrane potential and ion gradient. These channels serve as receptors for neurotransmitters like glutamate, GABA, glycine, serotonin, histamine and acetylcholine. The receptor for the latter, the nicotinic acetylcholine receptor (nAChR), has been studied pharmacologically, electrophysiologically, and biochemically since the late 1960s. The kinetics of channel activation and inactivation are well understood and have served as one of the model systems to study allosteric regulation. In this channel, two acetylcholine molecules bind, one to each of the two alpha subunits, causing the opening of a gate within the membrane spanning portion of the receptor.
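Whether ions flow into or out of the cell through an open channel can be estimated by comparing the membrane potential with the ion's equilibrium potential, which follows from the Nernst equation, E = (RT / zF) ln([out] / [in]). The Nernst equation is a standard relation brought in here for illustration. The potassium concentrations and the resting potential below are typical textbook values, used as assumptions rather than data from this text.

```python
import math

R = 8.314      # gas constant, J/(mol*K)
F = 96485.0    # Faraday constant, C/mol


def nernst_potential(conc_out, conc_in, charge, temperature=310.0):
    """Equilibrium potential in volts (inside relative to outside) for one ion species."""
    return (R * temperature) / (charge * F) * math.log(conc_out / conc_in)


def driving_force(membrane_potential, conc_out, conc_in, charge, temperature=310.0):
    """Membrane potential minus equilibrium potential, in volts.

    For a cation, a positive value means net outward flow through an open channel.
    """
    return membrane_potential - nernst_potential(conc_out, conc_in, charge, temperature)


# Assumed values: 5 mM potassium outside, 140 mM inside, resting potential -70 mV.
e_potassium = nernst_potential(5.0, 140.0, charge=1)
force = driving_force(-0.070, 5.0, 140.0, charge=1)
print(f"E(K+) = {e_potassium * 1000:.0f} mV, driving force = {force * 1000:.0f} mV")
```

With these numbers the driving force on potassium is positive, so potassium would leave the cell through an open potassium-permeable channel. For an ion whose equilibrium potential lies above the membrane potential, the flow would be inward.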

The ligand binding sites and the channel gate are separated by about 25 Å, and recent high resolution structural analysis has helped to explain this gating or allosteric mechanism (Miyazawa et al., Structure and gating mechanism of the acetylcholine receptor pore, 2003, Nature 423:949-955). The gating mechanism of this receptor complex is a nice demonstration of internal structural changes (conformational changes) in response to ligand binding. Here, acetylcholine binding changes hydrogen bond networks within the alpha subunits, relaxing some conformational stress within the binding site. Upon breaking internal hydrogen bonds, the alpha subunit can undergo a conformational relaxation which can affect subunit contact sites tens of Å away, i.e., the membrane embedded gate.

The fact that the channel is a pentameric protein complex underlines the observation that protein-protein interactions within enzymatic complexes allow fine tuning and precise control over the activity pattern of proteins. Observations from the nAChR also show that protein complexes spontaneously switch between conformational states - active and inactive - even in the absence of ligand. Thus, the ability of proteins to occasionally adopt active structures even in the absence of the agonist (positive stimulus) corroborates the idea that proteins are essentially fluid entities and that ligand binding and covalent modifications (e.g. phosphorylation and other post-translational modifications) simply stabilize proteins in one of at least two thermodynamically stable conformations.

The change between an active and inactive state can thus be described by a chemical equilibrium that switches between a state T (tense) and state R (relaxed). Both states are equally stable (under the right circumstances) and this similarity in stability can easily be explained by the number of subunit contact sites in each state. The transition from one to the other state requires the breaking of some of these contact sites, but an equal number of similar non-covalent bonds are formed 'trapping' the protein complex in either one of two conformations.
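The two-state picture can be expressed as a simple equilibrium, K = [R]/[T] = exp(-ΔG/RT), where ΔG is the free energy of state R minus that of state T. The sketch below shows how a modest stabilization of one state shifts the population. The function name and the 5 kJ/mol stabilization are assumed for illustration.

```python
import math

R_GAS = 8.314  # gas constant, J/(mol*K)


def fraction_relaxed(delta_g, temperature=298.15):
    """Fraction of complexes in state R, given delta_g = G(R) - G(T) in J/mol."""
    equilibrium_constant = math.exp(-delta_g / (R_GAS * temperature))  # K = [R]/[T]
    return equilibrium_constant / (1.0 + equilibrium_constant)


# Equally stable states: the complex spends half its time in each conformation.
print(f"no ligand, equal stability: {fraction_relaxed(0.0):.2f}")

# Hypothetical ligand that stabilizes state R by 5 kJ/mol.
print(f"R stabilized by 5 kJ/mol:   {fraction_relaxed(-5000.0):.2f}")
```

A stabilization of only a few kJ/mol, comparable to a single hydrogen bond, is enough to move most of the population into one state. This fits the view that ligands select and stabilize a conformation that the protein already visits spontaneously.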

Enzymes and Their Inhibitors

Enzymes catalyze the conversion of substrate(s) into product(s). This process can be measured kinetically, how fast a product is formed, and thermodynamically, in which direction the catalysis proceeds. All metabolic reactions are reversible and are defined by their chemical equilibrium where the net formation of a product is zero. Enzyme catalyzed reactions in vivo are usually not at their chemical equilibrium. They have a preferred direction determined by substrate availability as well as being coupled to large energy releasing reactions (exergonic reactions). The latter makes a catalytic reaction de facto irreversible, a common feature of metabolic pathways. Although enzymes are often involved in the chemical reaction mechanism (covalent bond formation between substrate and enzyme), they are not chemically modified at the end of the process. Enzyme catalyzed rates are several orders of magnitude higher than the corresponding spontaneous reaction in aqueous solution.

Proteases are enzymes that cleave, cut, or degrade other proteins by hydrolyzing peptide bonds. This may sound like an uninteresting topic, but protein degradation plays a major role in cellular processes. Proteases are involved in cellular control mechanisms such as the removal of old and unused proteins (an essential part of the turnover pathway of all proteins in the cell, which affects cell growth), the degradation of proteins and peptides for nutritional purposes, defense mechanisms against intrusive proteins and peptides, and the control of protein activity.

Being enzymes, proteases can be characterized by their substrate affinity and the catalytic rate of the reaction. Using proteases to study the effects of single amino acid substitutions (mutations) on catalytic rate and substrate affinity demonstrated that these two properties are linked and that this linkage can be explained by analyzing the conformation of the catalytic or active site of the enzyme. This analysis showed that four major functional groups are found in the catalytic site of proteases. Chymotrypsin, trypsin, and elastase are three members of the family of chymotrypsin proteases. They all are serine proteases because the amino acid residue at the catalytic site responsible for the transition state stabilization is a serine. The active site of a serine protease can be divided into four essential structural features required for the catalytic action of serine proteases:

TABLE Active site structure of a protease

Structural element of active site	Function
Main chain substrate binding site	Nonspecific binding of the polypeptide segment
Specificity pocket	Specific binding of side chains, sequence specificity
Oxyanion hole	Stabilizes transition state S* over S in enzyme
Catalytic triad (Asp-His-Ser)	Forms tetrahedral intermediate (transition state; stabilizes S* over S); hydrolyzes peptide bond
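The two quantities used above to characterize a protease, substrate affinity and catalytic rate, are commonly combined in the Michaelis-Menten rate law. The following is a minimal sketch under that assumption; Km is treated only as a rough stand-in for substrate affinity, and the wild-type and mutant parameters are invented for illustration.

```python
def michaelis_menten_rate(substrate, enzyme_total, k_cat, k_m):
    """Initial rate v = k_cat * [E]total * [S] / (K_m + [S]); concentrations in M, k_cat in 1/s."""
    return k_cat * enzyme_total * substrate / (k_m + substrate)

# Hypothetical wild-type and mutant protease (values chosen for illustration only).
variants = {
    "wild type": {"k_cat": 100.0, "k_m": 1e-4},
    "mutant": {"k_cat": 1.0, "k_m": 5e-4},
}

for name, params in variants.items():
    efficiency = params["k_cat"] / params["k_m"]  # catalytic efficiency, 1/(M*s)
    rate = michaelis_menten_rate(1e-4, 1e-8, **params)
    print(f"{name}: kcat/Km = {efficiency:.1e} 1/(M*s), v = {rate:.2e} M/s at 0.1 mM substrate")
```

Comparing kcat/Km rather than either parameter alone is one way to express that a single substitution in the active site can change binding and turnover together.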

Proteases have preferential cleavage sites in the sequence of a protein substrate. The specificity pocket is a small binding pocket lined by three amino acid residues that determine the local polarity and electrostatic potential profile for the interaction with residue n-1 of the substrate, on the N-terminal side of the scissile bond. For the chymotrypsin family of serine proteases we find the following sequence specificities: chymotrypsin binds bulky, aromatic residues; trypsin binds positively charged residues; and elastase, the extracellular matrix protease, binds small, uncharged residues.
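The sequence specificities listed above can be turned into a simple cleavage-site scan. This is a sketch only: the residue sets are a simplified textbook assignment (an assumption, not a full specificity profile), the peptide is made up, and effects of neighboring residues are ignored.

```python
# Residues (one-letter code) accepted by each specificity pocket; simplified assumption.
SPECIFICITY = {
    "chymotrypsin": set("FWY"),  # bulky, aromatic
    "trypsin": set("KR"),        # positively charged
    "elastase": set("AGSV"),     # small, uncharged
}

def cleavage_sites(sequence, protease):
    """Return 1-based positions of residues after which the protease is predicted to cut."""
    pocket = SPECIFICITY[protease]
    # The last residue has no peptide bond on its C-terminal side, so it is skipped.
    return [i + 1 for i, aa in enumerate(sequence[:-1]) if aa in pocket]

peptide = "GAKFLSRWVE"  # made-up test peptide
for name in SPECIFICITY:
    print(name, cleavage_sites(peptide, name))
```

Each reported position is the residue that would occupy the specificity pocket, that is, the residue on the N-terminal side of the scissile bond.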

The 3-D folds of these three proteases are very similar; their sequences are not identical, but they are evolutionarily related. Another family of serine proteases, the subtilisin family, is produced by Bacillus species. Their overall native fold is very different from that of the chymotrypsin-type proteases, but the conformation of the catalytic triad is identical. The 3-D fold of subtilisin has an α/β motif (instead of the β-barrel motif of chymotrypsin domains) with five parallel β-strands surrounded by four α-helices. The comparison of the two families of serine proteases tells us two different things. First, it has been reasoned to be an example of convergent evolution, in which the catalytic site has evolved twice, with each serine protease family exhibiting a different overall 3-D structure. Second, the differences in 3-D structure give us an idea of the different cellular locations of the corresponding protein families: the catalytic site of an enzyme is conserved over evolutionary time, while the overall structure is shaped to provide structural stability for optimal activity of the protein in a given environment. We need only understand that very different sequences can provide similar 3-D structures because water solubility depends only on the distribution of hydrophilic and hydrophobic residues, not on other chemical properties. Overall structural features thus reflect the location of the protein: whether it is intracellular, extracellular, or a cell membrane protein, and whether it is resistant to temperature changes or sensitive to proton or calcium concentrations.

Trypsin (protease) Inhibitor

Cellular control of proteases is carried out by protease inhibitors. These are small peptides or proteins that bind to the active site of the protease (competitive inhibitors) but are not hydrolyzed, thereby blocking the access of substrates and, for example, protecting tissue proteins. One example is the bovine pancreatic trypsin inhibitor (BPTI), a small protein of 58 amino acids. Its structure has been determined by X-ray crystallography, and the protein has been widely used for protein folding studies. BPTI binds to trypsin through hydrogen bonding, forming a tightly packed interface between inhibitor and enzyme. Binding is extremely tight, with a dissociation constant of about 10^-13 M. The lysine at position 15, which is followed by an alanine, binds to the specificity pocket. The reaction is blocked at the formation of the transition state intermediate.
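To put the constant of 10^-13 M quoted above into energetic terms, it can be read as a dissociation constant and converted into a standard binding free energy with dG = RT ln(Kd). The sketch assumes 25 °C and a 1 M standard state; the function name is illustrative.

```python
import math

R = 8.314   # gas constant, J/(mol*K)
T = 298.15  # temperature, K (assumed)

def binding_free_energy(k_d):
    """Standard binding free energy in kJ/mol from a dissociation constant in mol/L."""
    return R * T * math.log(k_d) / 1000.0

dg = binding_free_energy(1e-13)
print(f"dG = {dg:.1f} kJ/mol ({dg / 4.184:.1f} kcal/mol)")
```

A value of roughly -74 kJ/mol is very large for a non-covalent protein-protein complex, which is consistent with the tightly packed interface described above.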

HIV protease

Understanding the structure-function relationship of proteins can give vital information for the development of drugs that interact with proteins in a host-pathogen environment. A recent example of rational drug design has been the development of HIV protease inhibitors as anti-HIV drugs. Two things were important: the structure of a protein that is essential for the life cycle of the virus was elucidated by X-ray crystallography, and functional studies on related proteases, the aspartate family of proteases, provided insight into the ligand-enzyme interaction. Thus, an HIV protease inhibitor has been designed by predicting a structure that binds to the viral protease with an affinity several orders of magnitude higher than to related host proteases. Consequently, virus replication can be inhibited without interfering with the host metabolism.

The human immunodeficiency virus encodes an aspartate protease (HIV PR). This protease is essential for proper virion assembly and maturation. Inactivation of this protease has therefore been identified as a therapeutic approach to suppress virus replication, complementing existing drugs that interfere with HIV reverse transcriptase.

Essential for the rational design of a protease inhibitor was the successful crystallization of the protease with and without bound substrate. The HIV protease is a member of the family of aspartate proteases and is related to the pepsin family of proteases. It is inhibited by pepstatin, the natural inhibitor of pepsin. The structure of pepsin and its binding of pepstatin are known, and this information formed the basis for the successful design of an HIV protease inhibitor, using computer models to identify the best possible inhibitor structure.

The active site of aspartate proteases contains a pair of aspartate residues in close proximity, with a hydrogen-bonded water molecule oriented optimally to attack the scissile bond of the substrate. The aspartate pair is located at the domain interface in pepsin, a monomeric protein, and at the subunit interface in HIV protease, a homodimer. The catalytic sites of viral and cellular aspartic proteases are very similar, but the importance lies in minute differences in symmetry relations at the interface of the domains in pepsin and of the subunits in HIV protease.

HIV protease inhibitor

On the basis of the difference in symmetry at the active sites of pepsin and HIV protease, inhibitors have been designed that show a much higher affinity for the viral protein than for the host protease.

Ki(HIV) << Ki(Pepsin)
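Because Ki is a dissociation constant, a higher affinity corresponds to a smaller Ki. The sketch below shows what such a difference means for enzyme occupancy at a single inhibitor concentration. The Ki values and the dose are hypothetical, and the occupancy formula assumes a simple competitive inhibitor in the absence of substrate.

```python
def fraction_inhibited(inhibitor_conc, k_i):
    """Fraction of enzyme occupied by a competitive inhibitor in the absence of substrate."""
    return inhibitor_conc / (inhibitor_conc + k_i)

# Hypothetical inhibition constants (mol/L); not measured values.
K_I = {"HIV protease": 1e-9, "pepsin": 1e-5}
dose = 1e-7  # free inhibitor concentration, mol/L

for enzyme, k_i in K_I.items():
    print(f"{enzyme}: {100 * fraction_inhibited(dose, k_i):.1f}% inhibited")
print(f"selectivity Ki(pepsin)/Ki(HIV) = {K_I['pepsin'] / K_I['HIV protease']:.0e}")
```

With a selectivity of four orders of magnitude, a concentration that nearly saturates the viral enzyme leaves the host enzyme almost untouched.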

What happens when an asymmetric substrate (peptide) interacts with a symmetric enzyme? The subunits in the homodimer of the HIV protease are able to distinguish whether they interact with the N- or C-terminal end of the substrate/inhibitor. Like serine proteases, HIV PR contains a specificity pocket for the substrate sequence -P2-P1-P1'-P2'-, with residue P1 being either Q or E and residue P2 any hydrophobic amino acid. A good inhibitor exhibits a high affinity for the specificity pocket and contains a non-hydrolysable 'scissile' bond P1-P1'. Substrate analog inhibitors have therefore been designed that function as peptidomimetics. The scissile amide bond of a peptide substrate is replaced by a non-hydrolysable isostere with tetrahedral geometry (mimicking the tetrahedral intermediate formed from a peptide substrate). The binding of a hydroxyethylene peptide mimetic is stabilized by hydrogen bonds between the backbone hydroxyl and the aspartates in the active site of the protease.

The development of protease inhibitors has been accelerated by successfully using the concept that the best inhibitors are those that mimic the transition state structure of the substrate of proteases.
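Why transition state mimics bind so well can be estimated from a standard thermodynamic-cycle argument: the enzyme binds the transition state more tightly than the substrate by the same factor by which it accelerates the reaction, K_TS = K_S * (k_uncat / k_cat). The sketch below applies that relation with hypothetical numbers; none of the values come from this text.

```python
def transition_state_dissociation_constant(k_s, k_cat, k_uncat):
    """Estimate K_TS = K_S * (k_uncat / k_cat) from a thermodynamic cycle."""
    return k_s * k_uncat / k_cat

# Hypothetical protease: substrate K_S = 1e-4 M, rate enhancement k_cat/k_uncat = 1e9.
k_ts = transition_state_dissociation_constant(k_s=1e-4, k_cat=10.0, k_uncat=1e-8)
print(f"estimated transition-state dissociation constant: {k_ts:.0e} M")
```

A stable compound that captures even part of this transition state complementarity can therefore bind orders of magnitude more tightly than the substrate itself.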

PART III

Recognition of DNA and RNA

DNA binding molecules

Drugs affecting gene expression inhibit the action of hormone-regulated nuclear receptors. These are DNA binding proteins which either activate or suppress transcriptional activity. These so-called transcription factors are regulated by dimerization induced by ligand binding. Transcription factor inhibitors either block agonist binding or prevent dimerization. Examples include the nuclear receptors for estrogen, thyroxine, glucocorticoids, and the morphogen retinoic acid. In plants, the ripening of harvested fruits can be delayed by inhibiting the expression of a gene involved in the production of ethylene, the causative agent of ripening. Other drugs affecting gene expression either bind directly to DNA (non-covalently) or chemically modify DNA by cross-linking or strand cleavage. Non-covalent interaction by small non-peptidic molecules is mediated by base intercalation: flat aromatic molecules insert themselves between the stacked base pairs, changing the conformation of the double helix. Examples include the antibiotics actinomycin D and proflavin. Cross-linking agents form covalent bonds, mostly to nitrogen groups on guanine bases, changing the surface structure of DNA and thus blocking protein binding. Examples include aflatoxin and cisplatin. Other antifungal and antibacterial agents induce DNA strand cleavage, such as bleomycin, anthramycin, and tomaymycin, all of which are antibacterial and antitumor drugs.

Nuclear Receptors

Nuclear receptors are transcription factors that control gene expression as activators or repressors. They form a superfamily of currently 69 members grouped into seven families. Based on their ligand specificity, they are split into two groups: type I receptors that bind sterol-based ligands (e.g. estrogen, glucocorticoid), and type II receptors that bind non-sterol-based ligands (e.g. thyroxine, 9-cis-retinoic acid, vitamin D). Type I receptors form homodimers, while type II receptors form heterodimers (usually involving one retinoid X receptor (RXR) subunit), homodimers, or monomers (steroidogenic factor, nerve growth factor induced gene B). Many newly discovered nuclear receptors are orphan receptors of type II, meaning that their natural ligand has not yet been identified, although it must be a non-steroid structure based on the receptor family classification. Type I and II receptors are activated in different ways. In the absence of ligand, steroid hormone receptors are found in the cytoplasmic compartment complexed with heat shock protein subunits such as hsp90, hsp70, or hsp56. Ligand binding causes dissociation of the heat shock subunits, dimerization of the receptor, and transport of the ligand-receptor complex into the nucleus. Type II receptors are localized exclusively in the nuclear compartment and often function as silencers in the absence of ligand by recruiting corepressors. Ligand binding releases the corepressor, activating transcription.

Thyroxine

Thyroxine is a thyroid hormone that affects metabolic rate, temperature adaptation in warm-blooded vertebrates, the regulation of water and ion transport across membranes, and the regulation of cholesterol metabolism and nitrogen secretion. It controls the growth rate of mammalian and amphibian cells, is involved in the maturation of the central nervous system, controls amphibian metamorphosis, and regulates some mitochondrial enzymes important in energy metabolism.

Thyroxine is synthesized in the thyroid gland, secreted, and transported in the blood by plasma proteins such as thyroxine-binding globulin (TBG), albumin, or transthyretin (TTR). Inside cells, thyroxine is bound to cytoplasmic binding proteins (CBPs) such as myocardial myoglobin or thyroxine peroxidases which catabolize the hormone once it is no longer used. The nuclear thyroid hormone receptors (TR) are encoded by two genes (alpha and beta) and differ in ligand recognition and in the effects of ligand on the binding of coactivators and corepressors. The difference in ligand binding is caused by a single amino acid substitution in the binding pocket (Asn in alpha, Ser in beta) of each receptor subtype.

Structure and function (thyroxine binding) of TTR are well characterized. The protein forms a tetrameric complex and binds one thyroxine molecule in a central channel formed of beta sheets. High resolution structures have allowed the elucidation of the ligand-protein interaction. The most likely interaction involves two isosteric conformations. Antagonists of TTR can modulate abnormal growth conditions controlled by the thyroid gland. TTR is also an amyloidogenic protein. The human amyloid disorders familial amyloid polyneuropathy, familial amyloid cardiomyopathy, and senile systemic amyloidosis are caused by insoluble TTR fibrils which deposit in peripheral nerves and heart tissue. Nonsteroidal anti-inflammatory drugs have been found to strongly inhibit fibril formation in vitro. The protein-drug interaction stabilizes the native tetrameric TTR conformation.

The availability of at least three different natural receptors for thyroxine (T4) allows for a comparative study of agonist and antagonist binding. Usually, several different ligand structures are available, but only one receptor structure.

Antisense drugs

A class of drugs not involving any protein interactions are short synthetic oligonucleotides called antisense DNA or RNA strands. They bind to complementary stretches of chromosomal DNA or of RNA, blocking gene expression and/or translation. Interestingly, antisense drugs have been improved by combining short oligonucleotides with polycyclic intercalating residues on each end (3' and 5'), drastically increasing the affinity of the intercalating binding mechanism while at the same time targeting these intercalating agents to short, gene-specific sequences.
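An antisense strand binds its target by Watson-Crick base pairing, so its sequence is the reverse complement of the target stretch. A minimal sketch follows, assuming an RNA target and an unmodified DNA oligonucleotide; the target sequence is made up.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}  # RNA target base -> DNA antisense base

def antisense_dna(target_rna):
    """Return the antisense DNA oligonucleotide (5'->3') for an RNA target given 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(target_rna.upper()))

target = "AUGGCCAUUGUA"  # made-up mRNA stretch
print(antisense_dna(target))
```

Real antisense design also has to consider target accessibility, oligonucleotide length, and chemical modifications for stability, none of which are modeled here.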

Small molecule ligand-DNA interaction

The small ligand drug approach offers a simple solution: the synthesis and screening of synthetic compounds that do not exist in nature. Such compounds work much like pharmacological ligands for cell surface receptors in excitable tissue and appear to be more readily delivered to cellular targets than large RNA or protein ligands. The lack of sequence specificity of intercalating molecules, however, does not allow specific genes to be targeted. Instead, these molecules target certain cellular states or physiological and pathological conditions, such as rapid cell growth and division, which can be selectively suppressed relative to non-growing or slowly growing healthy tissue.

The following properties have been identified as important for the successful modeling of ligand-DNA interaction:

- degrees of freedom
- role of base pair sequence
- counter ion effects
- role of solvent

This problem is analogous to that of protein-ligand interaction. The major requirement for intercalating agents is a planar aromatic ring structure. This structure fits between two adjacent base pair planes and can have some, although much restricted, rotational freedom within the plane of the ring. The ligand itself may have flexible structural parts outside the DNA binding site and may contain more than one intercalating side chain:

The structure of the antibiotic triostin A shows two quinoxaline units (double aromatic rings) linked through a cyclic peptide structure which is stabilized at its center by a cysteine pair (disulfide bond).

Triostin A belongs to a family of antibiotics characterized by cross-linked octapeptide rings bearing two quinoxaline chromophores. Since the spacing between the chromophores is 3.5 Å, the intercalation process sandwiches two base pairs between the two quinoxalines. This phenomenon is called bis-intercalation and was first described for echinomycin by showing that bis-intercalating drugs cause twice the DNA helix extension and unwinding seen with a single intercalating molecule like ethidium. The latter is a chromophore which is activated by UV light and is used by molecular biologists to label nucleic acids in gel electrophoresis or density gradient centrifugation.

Role of base pair sequence

Experimental evidence suggests that base pair sequence does not play a large role in the specificity of most intercalating complexes. As the structure of triostin A suggests, however, the linker peptide structure may well promote specific interaction with the DNA surface. The major groove specific readout sequence of H-bond donors and acceptors could be involved in triostin A binding. The table below shows the direct readout of the DNA base sequence on a double helical structure. The following characteristics of non-covalent bond formation are associated with the binding sites (readout sequence of the minor (S) and major groove (W) sides as they are available for protein interaction):

binding site	GC base pair	AT base pair
W1	H-bond acceptor	H-bond acceptor
W2	blank	blank
W3	H-bond acceptor	H-bond donor
W3'	blank	blank
W2'	H-bond donor	H-bond acceptor
W1'	C-H weak hydrophobic	CH3, strong hydrophobic

While the interaction on the major groove side is distinct for the direction of the base pair (e.g. AT vs. TA), there is no directionality at the minor groove side. Minor groove interaction can, however, distinguish GC content (e.g. TATA box binding protein recognizing AT rich sequences for RNA polymerase initiation complex).
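The directionality argument can be checked by writing each groove as a string of functional groups. In the sketch below, the major groove patterns follow the table above with the blank positions left out; the minor groove patterns are the standard textbook assignment and are an assumption here, since the table does not list them.

```python
# A = H-bond acceptor, D = H-bond donor, M = methyl group, H = nonpolar hydrogen.
# Patterns are read across the groove from the first-named base to its partner.
MAJOR = {"AT": "ADAM", "TA": "MADA", "GC": "AADH", "CG": "HDAA"}
MINOR = {"AT": "AHA", "TA": "AHA", "GC": "ADA", "CG": "ADA"}

def distinguishable(pair_a, pair_b, groove):
    """True if the two base pairs present different patterns in the given groove."""
    return groove[pair_a] != groove[pair_b]

for a, b in [("AT", "TA"), ("GC", "CG"), ("AT", "GC")]:
    print(f"{a} vs {b}: major groove {distinguishable(a, b, MAJOR)}, "
          f"minor groove {distinguishable(a, b, MINOR)}")
```

The minor groove patterns are palindromic, so flipping a base pair changes nothing there, while the donor group of guanine still separates GC from AT.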

The molecular basis of specific recognition between echinomycin and DNA is hydrogen bonding between the alanine carbonyl groups of the ligand and the 2-amino group of guanine. This is consistent with the observation that the preferred binding site is the sequence CG.

Counter ion effect

DNA is a negatively charged polyanion that attracts counter ions (positively charged Na+, Ca++, and Mg++ ions) as well as basic residues of proteins. The presence of small counter ions affects drug binding, since the counter ions can screen and shield the negative backbone surface, allowing non-electrolytes as well as positively charged ligands to interact more strongly with the DNA target. High ionic strength, however, reduces non-covalent interactions mediated by hydrogen bonds and electrostatic interactions.
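How quickly counter ions screen the backbone charge can be estimated with the Debye screening length. The text does not name a model, so the sketch below is an assumption: it uses the common Debye-Hückel approximation for water at 25 °C, in which the screening length is about 0.304 nm divided by the square root of the ionic strength in mol/L.

```python
import math

def debye_length_nm(ionic_strength_molar):
    """Approximate Debye screening length in water at 25 C (0.304 nm at 1 M ionic strength)."""
    return 0.304 / math.sqrt(ionic_strength_molar)

for ionic_strength in (0.001, 0.01, 0.15, 1.0):
    print(f"I = {ionic_strength:>5} M -> screening length ~ {debye_length_nm(ionic_strength):.2f} nm")
```

At physiological ionic strength the screening length falls below the roughly 2 nm diameter of the double helix, so electrostatic contributions to ligand binding become short ranged.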

Role of solvent in ligand-receptor binding

There are three general classes of interactions that must be considered in solvated ligand-receptor binding: (a) ligand-solvent interaction (e.g. the hydration shell), (b) receptor-solvent interaction, and (c) interaction of the ligand-DNA complex with solvent. The three classes basically describe the sequence of events as a free ligand interacts with its receptor, and the change in overall solvent interaction before and after binding. We have seen that the hydrophobic effect is completely described by this system and that the entropy of free bulk water is the major driving force of hydrophobic ligand-receptor interaction. This type of interaction is found with intercalating substrates because the hydrophobic, aromatic side chains interact favorably with the aromatic environment of the base pair stack. The total amount of surface-bound water is reduced after complex formation.

Rationale for drug design

When a compound intercalates into nucleic acids, changes occur in both the DNA and the compound during complex formation that can be used to study the ligand-DNA interaction. The binding is an equilibrium process because no covalent bond formation is involved. The binding constant can be determined by measuring the free and DNA-bound forms of the ligand. Since many of the intercalating substrates are aromatic chromophores, this can be done spectroscopically. Also, DNA double helix structures are found to be more stable with intercalating agents present and show reduced heat denaturation. Correlating these biophysical parameters with cytotoxicity is used to support the view that the antitumor activity of these drugs is based on their ability to intercalate into DNA double helical structures.
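The spectroscopic determination of the binding constant can be sketched as follows. The example assumes that the observed absorbance is a linear mixture of the free and bound ligand signals and that all binding sites are identical and independent; the absorbance readings and concentrations are hypothetical.

```python
def fraction_bound(a_obs, a_free, a_bound):
    """Fraction of ligand bound, assuming absorbance is a linear mix of free and bound forms."""
    return (a_free - a_obs) / (a_free - a_bound)

def association_constant(ligand_total, sites_total, bound_fraction):
    """K = [bound] / ([free ligand] * [free sites]) for independent, identical sites."""
    bound = bound_fraction * ligand_total
    return bound / ((ligand_total - bound) * (sites_total - bound))

# Hypothetical titration point: 10 uM ligand, 50 uM binding sites.
f = fraction_bound(a_obs=0.44, a_free=0.60, a_bound=0.40)
k = association_constant(10e-6, 50e-6, f)
print(f"fraction bound = {f:.2f}, K = {k:.2e} 1/M")
```

In practice the constant would be fitted to a whole titration series rather than computed from one point, but the single-point calculation shows which quantities have to be measured.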

Improvement of anticancer drugs based on intercalating activity focuses not only on DNA-ligand interaction, but also on tissue distribution and on toxic side effects on the heart (cardiac toxicity) caused by reduction of the aromatic rings and subsequent free radical formation. Free radical species are thought to induce destructive cellular events such as enzyme inactivation, DNA strand cleavage, and membrane lipid peroxidation.

Modeling DNA-ligand interaction of minor groove binders

Hairpin minor groove binding molecules have been identified and synthesized that bind to GC rich nucleotide sequences. Hairpin polyamides are linked systems that exploit a set of simple recognition rules for DNA base pairs through specific orientation of imidazole (Im) and pyrrole (Py) rings. The hairpin polyamides originated from the discovery that the three-ring Im-Py-Py molecule binds in the DNA minor groove as an antiparallel side-by-side dimer.
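The recognition rules are not spelled out in this summary. The sketch below encodes the commonly cited pairing rules for side-by-side Im and Py rings, which is an assumption relative to the text, and applies them to a simplified, unstaggered antiparallel dimer of Im-Py-Py.

```python
# Pairing rules for side-by-side rings in the minor groove (top ring, bottom ring).
PAIR_RULES = {
    ("Im", "Py"): {"GC"},        # Im/Py reads G.C
    ("Py", "Im"): {"CG"},        # Py/Im reads C.G
    ("Py", "Py"): {"AT", "TA"},  # Py/Py reads A.T or T.A (degenerate)
}

def recognized_sites(top_strand_rings, bottom_strand_rings):
    """List the base pairs each stacked ring pair is expected to recognize."""
    return [sorted(PAIR_RULES[pair]) for pair in zip(top_strand_rings, bottom_strand_rings)]

# Hypothetical antiparallel dimer of Im-Py-Py: the second copy runs in the opposite direction.
top = ["Im", "Py", "Py"]
bottom = list(reversed(top))
print(recognized_sites(top, bottom))
```

Encoding the rules as a lookup table makes the design logic explicit: choosing the order of rings on each strand selects the sequence that is read in the minor groove.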

The ultimate goal of polyamide ligand design, finding structures able to recognize DNA sequences of specific genes, has been reached. The structure shown above inhibits the expression of 5S RNA in fibroblast cells (skin cancer cells) by interfering with the transcription factor IIIA-binding site.

A new strategy of rational drug design exploits the combination of polyamides with bis-intercalating structures. WP631 is a dimeric analog of the clinically proven anthracycline antibiotic daunorubicin.

This new synthetic compound shows an affinity of 10 pM and has also been shown to overcome the multidrug resistance mechanisms often encountered in antitumor therapy. Multidrug resistance is a phenomenon in which small aromatic compounds are efficiently expelled from the cell by cell membrane transport proteins commonly referred to as ABC transporters (ATP Binding Cassette proteins).

Drugs that form covalent bonds with DNA targets

Drugs that interfere with DNA function by chemically modifying specific nucleotides are Mitomycin C, Cisplatin, and Anthramycin.

Mitomycin C is a well characterized antitumor antibiotic which binds covalently to DNA after reductive activation. The activated antibiotic forms a cross-link between guanine bases on opposite strands of DNA, thereby inhibiting strand separation (which is essential for mRNA transcription and DNA replication).

Anthramycin is an antitumor antibiotic which binds covalently to N-2 of guanine located in the minor groove of DNA. Anthramycin has a preference for purine-G-purine sequences (purines are adenine and guanine), bonding to the middle G.

Cisplatin is a transition metal complex, cis-diamminedichloroplatinum(II), used clinically as an anticancer drug. The effect of the drug is due to its ability to platinate the N-7 of guanine on the major groove side of the DNA double helix. The platinum atom cross-links two adjacent guanines on the same DNA strand, interfering with the mobility of DNA polymerases.
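The sequence preferences described for anthramycin (purine-G-purine) and cisplatin (two adjacent guanines on one strand) can be located in a sequence with a simple motif scan. This sketch uses a made-up strand and only finds candidate sites; it says nothing about actual reactivity.

```python
import re

def motif_positions(sequence, pattern):
    """Return 1-based start positions of all (overlapping) matches of pattern."""
    # A lookahead keeps matches from consuming characters, so overlaps are reported.
    return [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", sequence.upper())]

strand = "TTAGGCAGATGGGA"  # made-up single strand, 5'->3'
print("purine-G-purine (anthramycin):", motif_positions(strand, "[AG]G[AG]"))
print("adjacent guanines (cisplatin):", motif_positions(strand, "GG"))
```

Overlapping matches matter here: a run of three guanines contains two possible intrastrand cross-link sites, and both are reported.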
