Saturday, October 11, 2014

Variation evaluation using 3D modeling - ACADM with CN3D

In a previous post, the impact of a single amino acid substitution was evaluated using the DNAstar software tools.  By loading a reference protein sequence, looking at the predicted structure, then simulating a substitution and evaluating the predicted structural changes, we had hoped to observe a tangible change in the predicted secondary structure.  Unfortunately, introducing the substitution LYS304GLU (LYS329GLU) into the ACADM sequence did not appear to cause a significant change in the predicted structure.  In this post we will look at the 3-dimensional structure of the protein as determined by crystallography, compare it with the DNAstar secondary structure prediction to gauge the accuracy of the algorithmic prediction, and philosophize about the possible impact of the variant.

To begin, an appropriate crystal structure needed to be selected.  Searching the NCBI "structure" tool with 'ACADM' as the search term returned 8 structures.  After limiting to Taxonomy: Homo sapiens, the first structure was selected, as it was the only option which did not contain a variant.  The PDB ID for the structure is 1T9G, and it is available in both Cn3D and PDB formats.  The Cn3D file was downloaded, as was the Cn3D software, version 4.3.1.

An image of the protein (space-filling model) can be seen here:

Because the protein is large and takes up a great deal of space, a tube model can be seen here:


The protein appears to have 7 subunits, each with its own sequence.  Assuming the sequence is correct, this would help explain the difficulty in predicting a folding change caused by a single amino acid substitution.  Because many subunits interact to catalyze the reaction, a small change that does not appear to affect a single subunit could still have a subtle impact on how the subunits fold into each other, resulting in the larger failure of the enzyme and the clinical phenotype.

Seven subunits come together to form the larger enzyme complex.  Of these 7, 4 are identical and have a sequence matching the protein sequence of ACADM.  A space-filling model of a single subunit can be seen here:

To locate our amino acid substitution, the original sequence, without the substitution, was used to find the identical region.  Here we see the original sequence on top and the same sequence below with the amino acid substitution added; the substitution which was evaluated previously can be seen below in red.


The reference sequence in yellow was used to identify the same location in the protein structure and it can be seen here highlighted in yellow:

To view the secondary structure of the protein, the tube model can be seen below.  On the left is an overall view of the protein as a tube model and on the right is a zoomed-in view of the region of interest.

To refresh our memory from the last post, here is the image of the DNAstar prediction of the secondary structure, with the amino acid of interest highlighted in black:

When comparing the structural prediction algorithms of DNAstar to the crystal structure, we can see that the algorithms are, for the most part, correct.  The Garnier-Robson and Chou-Fasman algorithms correctly predicted the alpha-helix structure at the location of interest, while also correctly predicting the absence of a Beta-sheet or flex region.  The Eisenberg algorithm, however, failed to predict an Alpha-helix and instead predicted a Beta-sheet in the area of the variation.

The prediction of the protein secondary structure by DNAstar appeared to be correct when compared to the crystal structure; however, no change in the secondary structure was observed when the variation was substituted in and the structure was reevaluated.  As stated earlier, the active protein is actually composed of 7 subunits, and 4 of these subunits are our protein of interest, ACADM.  By highlighting the region where our variation would occur across the same 4 subunits, we can see it here in yellow (yellow arrows are used to point to the region of interest).

These regions appear to be in close proximity to the other proteins in the final form of the active protein complex.  Given the proximity of these residues and the change in charge caused by the substitution, it is possible that this substitution actually interferes with the formation of the complex itself.  It would be interesting to design an assay which would label the subunits while preserving their interaction with each other.  A gel could be run to check for size separation, with the reference sequence appearing in two positions: a smaller signal from single subunits which have yet to be incorporated into a protein complex, and a second signal from the larger protein complex.  The assay could then be repeated with cells containing the genetic mutation which results in the amino acid substitution of interest.  The smaller signal should still be detected, but if the substitution interferes with complex formation there should be no signal from the larger protein complex.

There are many ways the amino acid change could affect the protein: it could block binding of the substrate itself, interfere in some complex way with the catalytic site of the larger protein complex, or there may even be some other, more abstract interaction which could cause a problem.  Based on the DNAstar results and the observations of the structural view, there do not appear to be structural changes in the single subunit.




Monday, October 6, 2014

ACADM gene and its disease

My pet gene for this semester is ACADM. It is located at 1:76,190,042-76,229,354 (39,313 base pairs (bp)) and contains 12 exons (1263 bp), which translate to a 421 amino acid (45 kilodalton) protein. Because only 1263 bp are exonic, and adding 6 bp per exon (3 on each side) for splicing bases gives a total of 1335 (= 1263 + (6*12)), only a sparse 3.4% of the bases (a fraction of about 0.034) inside this genomic region actually 'do something' to make the protein.
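The arithmetic behind that percentage is easy to check. This is only a small sketch using the coordinates and counts quoted in this post; the constant and function names are made up for illustration.

```python
# All numbers below are the ones quoted in the post; nothing is looked up.
GENE_START = 76_190_042
GENE_END = 76_229_354
EXON_COUNT = 12
EXONIC_BP = 1263           # equals 421 amino acids * 3 bases
SPLICE_BP_PER_EXON = 6     # 3 bp on each side of every exon


def functional_fraction():
    """Return (region length, 'functional' bases, fraction of the region)."""
    span = GENE_END - GENE_START + 1  # inclusive coordinates
    functional = EXONIC_BP + SPLICE_BP_PER_EXON * EXON_COUNT
    return span, functional, functional / span


span, functional, fraction = functional_fraction()
print(f"{functional} of {span} bp = {fraction:.1%}")
```

Run as written, this should report 1335 of 39313 bp, or roughly 3.4% of the region.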

The gene encodes a dehydrogenase enzyme that degrades medium-chain fatty acids. Mutations resulting in a deficiency of the enzyme cause the cleverly named disorder Medium-chain acyl-coenzyme A dehydrogenase deficiency. Interestingly, I could find no reports of overproduction of the enzyme. The deficiency results in an "intolerance to prolonged fasting, recurrent episodes of hypoglycemic coma with medium-chain dicarboxylic aciduria, impaired ketogenesis, and low plasma and tissue carnitine levels. The disorder may be severe, and even fatal, in young patients" (Matsubara et al., 1986). For the assignment we will be modeling the variation for allele .0001 MCAD DEFICIENCY LYS304GLU, created by the genomic variation dbSNP:rs77931234. This mutation may also be known as LYS329GLU (K329E) because the reference protein sequence is that of the precursor protein; in fact, it is the K at position 329 that we will need to change to an E in the sequence we use.

We will start by modeling the secondary structure of the reference protein sequence NP_000007.1; then the amino acid change LYS304GLU will be inserted where appropriate, the secondary structure modeled again, and the differences compared. The rs77931234 mutation occurs towards the beginning of exon 11, as seen in these UCSC screen caps (with rs77931234 highlighted in black).

The protein sequence was loaded into DNASTAR's EditSeq application and the Lysine (K) at position 304 of the mature protein (position 329 of the precursor sequence, highlighted in black below) was changed to a Glutamic Acid (E).
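The same edit can be sketched outside of EditSeq. The snippet below is a minimal illustration, not the DNASTAR workflow: the `substitute` function and the toy sequence are hypothetical, and the 25-residue offset is simply the difference between the two names of the variant (329 - 304). Checking the reference residue before changing it guards against applying the mature-protein numbering to the precursor sequence by mistake.

```python
# Offset between mature and precursor numbering, inferred from K304E vs K329E.
PRECURSOR_OFFSET = 329 - 304  # 25 residues


def substitute(sequence, mature_pos, ref, alt, offset=PRECURSOR_OFFSET):
    """Return a copy of a precursor sequence with ref -> alt applied.

    mature_pos is 1-based and uses mature-protein numbering.
    """
    index = mature_pos + offset - 1  # 1-based precursor position -> 0-based index
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {mature_pos} is outside the sequence")
    if sequence[index] != ref:
        raise ValueError(
            f"expected {ref} at precursor position {index + 1}, "
            f"found {sequence[index]}"
        )
    return sequence[:index] + alt + sequence[index + 1:]


# Toy sequence (NOT the real ACADM sequence): a single K at precursor position 329.
toy = "A" * 328 + "K" + "A" * 92
mutant = substitute(toy, 304, "K", "E")
print(mutant[328])
```

With the real NP_000007.1 sequence in place of the toy string, the same call would fail loudly if the residue at precursor position 329 were not a K.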

Here is a quick screen shot of the sequence alignment following the sequence change.

It isn't particularly surprising that this single amino acid change has a significant impact on the protein function: Lysine is a strongly basic amino acid and Glutamic acid, the substitution, is a strongly acidic one.  The first step in comparing the two sequences is to load them into SeqBuilder and generate some statistics. To do this, the protein sequence was loaded into SeqBuilder, the entire protein sequence was selected, and then, by opening the "sequence" menu and clicking on "Statistics", we can determine the Isoelectric Point and the Charge of the protein at a pH of 7.0. For the reference sequence the Isoelectric Point is 8.369 and the Charge is 5.546; following the substitution of the Lysine with the Glutamic Acid, the Isoelectric Point changes to 8.055 and the Charge at pH 7.0 to 3.550. The drop of roughly 2 charge units is what one would expect from replacing a positively charged side chain with a negatively charged one. Using these simple stats we can see that the mutation would have an impact on the enzyme at a basic biochemical level. Unfortunately, these high-level statistics were the only metric I found that differentiated between the sequences.  Below are the sequences as they appear when loaded into the Protean tool in DNASTAR.  The black bar highlights the location of the mutation in each view.  The first is the reference sequence, the second is the mutated sequence.

NORMAL

MUTATED(LYS329GLU)
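As a rough cross-check on the SeqBuilder charge values quoted above, the expected shift from swapping one Lysine for one Glutamic acid can be estimated with the Henderson-Hasselbalch equation. This is a sketch under stated assumptions: the side-chain pKa values are generic textbook figures, not necessarily the ones SeqBuilder uses, and the function names are made up for illustration.

```python
# Assumed textbook side-chain pKa values; SeqBuilder may use different ones.
PKA_LYS = 10.5
PKA_GLU = 4.25


def side_chain_charge(pka, ph, basic):
    """Fractional charge of one ionizable side chain (Henderson-Hasselbalch)."""
    if basic:
        return 1.0 / (1.0 + 10 ** (ph - pka))    # protonated form carries +1
    return -1.0 / (1.0 + 10 ** (pka - ph))       # deprotonated form carries -1


def charge_shift_k_to_e(ph=7.0):
    """Change in net charge when one Lys is replaced by one Glu."""
    lys = side_chain_charge(PKA_LYS, ph, basic=True)
    glu = side_chain_charge(PKA_GLU, ph, basic=False)
    return glu - lys


estimated = charge_shift_k_to_e()
reported = 3.550 - 5.546  # Charge values from the SeqBuilder statistics
print(f"estimated shift: {estimated:.3f}, reported shift: {reported:.3f}")
```

Both numbers come out close to -2, which suggests the change in the overall charge is fully accounted for by the single side-chain swap.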

Using the algorithms available in the Protean tool suite, no detectable impact on the protein's structure was observed.  The high probability that the region of the mutation is part of an Alpha structure in the reference was not affected by the presence of the mutation.  The Kyte-Doolittle hydrophobicity plot (and probability) was unchanged by the mutation as well.  Because of the change in the overall charge of the protein, a further inspection of the hydrophobicity using the Kyte-Doolittle algorithm and plots was carried out, with the following plots observed, again with the region of change highlighted in black, reference on top and variation on the bottom.

Again, no appreciable change was observed.  The Chou-Fasman algorithm was used to inspect the region as well, with the results seen here:
While subtle probability shifts appear at adjacent locations, no major structural change can be detected using these algorithms.
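One likely reason the Kyte-Doolittle plot barely moves is that Lysine and Glutamic acid sit very close together on that scale (-3.9 and -3.5), so a sliding-window average changes by only a fraction of that 0.4 difference. The sketch below illustrates this with a windowed hydropathy profile; the window size of 9 and the toy peptide are assumptions for illustration and are not the Protean settings or the ACADM sequence.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean Kyte-Doolittle score for every full window along the sequence."""
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Toy peptide (hypothetical, not ACADM) with a single K, and its K -> E variant.
toy_ref = "GASTLKVIDGASTLA"
toy_var = toy_ref.replace("K", "E")

ref_profile = hydropathy_profile(toy_ref)
var_profile = hydropathy_profile(toy_var)
max_shift = max(abs(v - r) for r, v in zip(ref_profile, var_profile))
print(f"largest change in any window: {max_shift:.3f}")
```

With a window of 9 the largest possible change in any window is 0.4 / 9, far too small to be visible on a plot, even though the same substitution flips the sign of the side-chain charge.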

While clinical evidence has repeatedly observed this mutation in the presence of low enzymatic activity and pathologic phenotype, none of the tools or algorithms were able to detect a major change in the protein structure caused by this variant. 

There are many other possibilities as to how this mutation could impact the function of this protein. As mentioned previously, this protein sequence is actually a precursor protein and requires further editing and manipulation; this amino acid change could impact that processing by blocking the catalytic site, meaning the protein is never able to take on its fully functional form.  It is also possible that the way the protein binds to the fat molecule itself is altered just enough to prevent the catalytic activity of the enzyme.

As I learn to use more of the DNAstar tools, I will continue to evaluate this mutation and attempt to untangle its impact.