Thursday, January 26, 2012

Carver AP Biology, Friday 1/27 Homework

Homework:
1. Do the Lab Bench activity "Population Genetics and Evolution" at the link below.  Print and turn in for credit next class.
http://www.phschool.com/science/biology_place/labbench/lab8/intro.html
2. Pop Gen Fishbowl simulation:
http://faculty.etsu.edu/jonestc/Virtualecology/VE_Window_PopGen.htm  (click on “Pop Gen Fishbowl” link)  Do the assignment "Testing the assumptions in the Hardy-Weinberg Law" below.

Additional Resources :
A PowerPoint on evolution will be used in class; you may access an unedited version at the link below.  If you download it, ignore the first couple of slides, since they are not entirely appropriate.


Note: Chapter 22 Reading Guide (given out in class Wed 1/18) will be collected Tues 1/30.

Testing the assumptions in the Hardy Weinberg Law
INTRODUCTION
The Hardy-Weinberg Law states that the allele frequencies for any gene in a population will not change over time if five conditions are met.  You will use the Pop Gen Fishbowl simulation to test the effect of each of these conditions. 
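For anyone who wants to check the arithmetic behind the law, here is a minimal Python sketch. It assumes one gene with two alleles, A (dominant, frequency p) and a (recessive, frequency q = 1 - p); the function names are made up for this illustration and have nothing to do with the Fishbowl software.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies (AA, Aa, aa) when allele A has frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


def allele_freq_from_genotypes(freq_AA, freq_Aa):
    """Frequency of allele A: all of the AA alleles plus half of the Aa alleles."""
    return freq_AA + 0.5 * freq_Aa


freq_AA, freq_Aa, freq_aa = hardy_weinberg(0.6)
print("AA, Aa, aa:", freq_AA, freq_Aa, freq_aa)
print("p in the next generation:", allele_freq_from_genotypes(freq_AA, freq_Aa))
```

If all five conditions hold, the allele frequency that comes out is the same one that went in, which is the "no change over time" that the law describes.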

Begin by reading the “More Information” link on the original window, so you will know a little bit about how the simulation works.  Then play with it a bit before doing the activity below.

EXPERIMENTS
For the purposes of these experiments, the independent (experimental) variable will be set using the sliders.  To avoid confusion, always begin with the controls set as near to valid Hardy-Weinberg conditions as possible, and from this, vary only one factor at a time.  The dependent variables will be the allele frequencies and/or genotype frequencies over time.  As the Hardy-Weinberg situation is approached, these frequencies should stabilize, changing little over time.  The graph will show you intuitively how stable these are.

Population size: you can control the population of your fishbowl with the Init-N (initial number) slider, and the Carrying Capacity slider.  Do runs with a low carrying capacity and initial number, and a run at maximum carrying capacity and initial number.  (If you begin with a low initial number of fish and a big carrying capacity, you will see the stability change as the population grows.)  Note that the maximum capacity of the tank is not very high.
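To see why population size matters, here is a rough Python sketch of random sampling from one generation to the next (genetic drift). It is an assumption-laden toy, not the Fishbowl's actual algorithm: the population size is held constant, each of the 2N alleles in the next generation is drawn at random from the current allele frequency, and the function name and seed are invented for this example.

```python
import random


def simulate_drift(pop_size, p0, generations, seed=0):
    """Track the frequency of one allele under random sampling alone."""
    rng = random.Random(seed)      # fixed seed so a run can be repeated
    n_alleles = 2 * pop_size       # each diploid fish carries two alleles
    p = p0
    history = [p]
    for _ in range(generations):
        # Draw every allele of the next generation at random.
        count = sum(1 for _ in range(n_alleles) if rng.random() < p)
        p = count / n_alleles
        history.append(p)
    return history


small = simulate_drift(pop_size=10, p0=0.5, generations=50)
large = simulate_drift(pop_size=1000, p0=0.5, generations=50)
print("small tank, range of p:", min(small), "to", max(small))
print("large tank, range of p:", min(large), "to", max(large))
```

Try several seeds: the small population usually wanders much further from 0.5 than the large one, and sometimes loses an allele entirely.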

Random Mating: The Non-Random Mating slider allows you to adjust from zero preference (random mating), left to prefer a different colored mate or right to prefer a same colored mate.
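One common textbook way to describe non-random mating is with an inbreeding coefficient F. The Python sketch below uses that as a stand-in for "prefer a same colored mate"; this is an assumption for illustration only, since the Fishbowl's own mate-choice rule is not described here, and the function names are hypothetical.

```python
def genotype_freqs_with_F(p, F):
    """Genotype frequencies (AA, Aa, aa) for allele frequency p and coefficient F.

    F = 0 gives the ordinary Hardy-Weinberg proportions;
    F > 0 shifts heterozygotes into the two homozygous classes.
    """
    q = 1.0 - p
    return p * p + F * p * q, 2 * p * q * (1.0 - F), q * q + F * p * q


def allele_freq(freq_AA, freq_Aa):
    """Frequency of allele A recovered from the genotype frequencies."""
    return freq_AA + 0.5 * freq_Aa


for F in (0.0, 0.5):
    freq_AA, freq_Aa, freq_aa = genotype_freqs_with_F(0.5, F)
    print("F =", F, "genotypes:", freq_AA, freq_Aa, freq_aa,
          "allele A:", allele_freq(freq_AA, freq_Aa))
```

In this simplified model the genotype frequencies change while the allele frequency does not, so watch both graphs when you run this experiment.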

No Mutations: The mutation default settings are zero mutations.  You can allow mutations in either direction (dominant to recessive, or the reverse).  In real life, it is more common to have mutation of a dominant allele into a recessive one, since most recessive alleles are simply inactive versions of a gene, often because of SNPs.
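The effect of mutation on allele frequency can be sketched in a few lines of Python. The rates u (dominant to recessive) and v (recessive to dominant) are per allele per generation; the names and the example rates are assumptions chosen for illustration and are not the slider values in the simulation.

```python
def mutate(p, u, v):
    """One generation of mutation: A becomes a at rate u, a becomes A at rate v."""
    return p * (1.0 - u) + (1.0 - p) * v


def run_mutation(p0, u, v, generations):
    """Apply mutation repeatedly and return the final frequency of allele A."""
    p = p0
    for _ in range(generations):
        p = mutate(p, u, v)
    return p


print("no mutation:        ", run_mutation(0.5, 0.0, 0.0, 100))
print("dominant->recessive:", run_mutation(0.5, 0.01, 0.0, 100))
print("both directions:    ", run_mutation(0.5, 0.01, 0.01, 100))
```

With the rates at zero the frequency never moves, which is the Hardy-Weinberg case. A one-way rate steadily erodes the dominant allele.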

Migration: Migration is set by two sliders, one that gives the likelihood that a member of the population will be replaced by a migrant, and the other giving the likelihood of the migrant having the dominant allele.
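Here is a minimal Python sketch of what two settings like these do to allele frequency. It assumes a fraction m of the population is replaced by migrants each generation and that migrants carry the dominant allele with frequency p_migrant; the names and numbers are invented for this example and are not taken from the simulation.

```python
def migrate(p, m, p_migrant):
    """One generation of migration: a fraction m of alleles comes from migrants."""
    return (1.0 - m) * p + m * p_migrant


def run_migration(p0, m, p_migrant, generations):
    """Apply migration repeatedly and return the final allele frequency."""
    p = p0
    for _ in range(generations):
        p = migrate(p, m, p_migrant)
    return p


print("no migration: ", run_migration(0.2, 0.0, 0.9, 50))
print("10% migrants: ", run_migration(0.2, 0.1, 0.9, 50))
```

Without migration the frequency stays where it started. With migration the tank drifts toward whatever the migrants carry, and faster when the replacement fraction is larger.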

No Natural Selection:  The selection sliders allow you to control the fitness of each genotype independently.  What matters are not the absolute numbers, but how they compare to each other.  Because each genotype can be set separately, you can decide whether the inheritance pattern is Mendelian simple dominance—recessiveness, or incomplete dominance/codominance.
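The point that only relative fitness matters can be checked with a short Python sketch of the standard one-gene selection model. The fitness values below are hypothetical examples, and the function name is made up; the simulation may compute things differently.

```python
def select(p, w_AA, w_Aa, w_aa):
    """Frequency of allele A after one generation of selection."""
    q = 1.0 - p
    # Average fitness of the population, weighted by genotype frequency.
    mean_w = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    # A alleles come from AA survivors and half of the Aa survivors.
    return (p * p * w_AA + p * q * w_Aa) / mean_w


# Simple dominance: the recessive homozygote survives half as well.
print("fitness 1, 1, 0.5:", select(0.5, 1.0, 1.0, 0.5))
# Doubling every fitness value changes nothing.
print("fitness 2, 2, 1:  ", select(0.5, 2.0, 2.0, 1.0))
# Equal fitness for all three genotypes: no selection, no change.
print("fitness 1, 1, 1:  ", select(0.5, 1.0, 1.0, 1.0))
```

Setting the heterozygote's fitness between the two homozygotes, instead of equal to the dominant one, is how this model would represent incomplete dominance.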

Describe the setting of each of your five experiments and discuss the results in at least five paragraphs.  Be sure to head each section with the condition being tested.

Sunday, January 22, 2012

Monday, January 16, 2012

Carver AP Bio 7: Issues and Emailed Questions

I am posting emailed questions and my answers here with the idea others may benefit. 

Issues
In trying out my own instructions to recreate a student's problem, I found I didn't get the same search results I did the first time--not sure why, but here's the fix: if searching for "rna polymerase" doesn't get the results I describe, search "dna-directed rna polymerase" instead.  When I did so, "dna-directed rna polymerase II A" appeared third in the list.  Proceed from there.

Questions and Answers
"I cannot seem to zoom in to see the single nucleotides. Either that or I just don't know what they look like."
Note that there are two sets of zoom buttons: one for zooming in, one out.  When I called up dna-directed rna polymerase II A, I found that clicking 10X twice got me the amino acid sequence, but not the base sequence.  An additional push of 1.5X got me bases.  They are caps right under the position number line at the top of the window.  Alternatively, push the "Bases" button to get directly there, with the center of the sequence shown.  I like using the zoom buttons better since they give me a more intuitive sense of the scale of things, and how big the total sequence must be.

When doing the assignment is it normal to get a lower score from the BLAT comparing to the base pairs to the chicken genome than comparing them to the lancelet genome?
I don't know about "normal"--it just is what it is!  I confess to being surprised myself.  (Think about why this might be surprising!)  One thing I learned poking around the "Homology" entry in Wikipedia last night (and no, I don't turn up my nose at Wikipedia) is that some level of matching between genes in different species is not "proof" that these genes descended from a common ancestor.  In other words, the bigger score for the lancelet may be a coincidence.  I just don't know.  Of course, I'm just a teacher and not a scientist, but one of the cool things about this software is that we might just be doing REAL SCIENCE.  It's a big universe in there, and NO ONE knows all the answers!



Saturday, January 14, 2012

Carver AP Biology 6: the exercise

Copy and paste this into a Word document and then proceed.  Write if you have any problems!!

An Exercise in Comparative Genomics

You will get DNA for an exon of an important gene and compare its sequence to those in species more and less closely-related to us.

1. Go to the UCSC Genome Browser (link on 2nd post on blog).  On the gateway page (Home, if you're already in the browser), press "Click here to reset" in order to clear what you have already done.  Now specify Human in the genome box, and in the gene box enter: RNA polymerase, and hit "Submit." 

2. In the UCSC Genes list at the top, find the last item: "DNA-directed RNA polymerase II A," and click it.

3. Find the RefSeq (reference sequence) track and click on its label on the left end.  From the beginning of the summary, what do you think the protein encoded by this gene does within a cell? _______________________________________________________________

4. Use the back-arrow to return to the browser window.  Find the RefSeq track again: it shows both ends of the gene, since you can see the untranslated parts of the beginning and ending exons, which are shown with a half-height bar.  How many exons does this gene contain?  _________   About what percentage of the whole gene would you say consists of exons, and so actually encodes a protein?  _________  Notice the "Mammal Cons" graph: it shows with a histogram how similar portions of this gene are to those in other mammals that have been sequenced.  Why do you suppose the exons show very strong similarity among species? ______________________________________________________________

Why do you suppose the introns (skinny line) are shown to vary much more (much less strongly-conserved) among species?
___________________________________________________________________________________

We will choose one exon that has been strongly conserved through time and see how similar it is in species closely-related to us, and some more distant.

5. At the top of the window, click on the scale at base position 7,405,000.  This will re-center the sequence at that position and zoom in 3X.  Re-center again at about 7,405,500.  You will see an exon at about 7,405,400 that has been almost entirely strongly conserved in mammals: the Mammal Cons histogram shows a nearly-continuous bar at that position. 

6. Now zoom-in using the 3X and 1.5X buttons and re-center, continuing until this exon almost fills the entire window.  You will notice that the POLR2A top track now shows the one-letter abbreviations for the amino acid sequence encoded by this exon.  Using the table on the 2nd entry of this blog, write out the first half-dozen amino acids, going in the direction of transcription shown by the track arrows.  ______________________________ 
7. We want the actual DNA sequence for this exon, and we will get it using position numbers.  Find the base positions of the ends of this exon.  One way to do this is to drag the track back and forth in the window and note the beginning and ending positions where each end of the exon hits the edge of the display window.  (You can write these numbers down, or simply write down one, Ctrl-c the position box, and correct the copy when you paste it.)  Write the position of this exon here: chr17:_________-__________. 

8. Click the DNA item in the blue menu bar at the top.  Enter the beginning and ending positions of the exon in the position box and click "Get DNA."  You will see the entire base sequence of this exon.  (If you did this right, you will have four and a half lines of nucleotide bases--about 250 bases in all.)  Now Ctrl-c just these four and a half lines, hit the back-arrow, and open the BLAT software from the blue menu bar.  Paste the base sequence into the window.

Time to use BLAT to look for sequences in other species that are similar to the human exon we have been looking at. 

9. Change the genome in the window to "Chimp" and hit submit.  The chimpanzee is the animal long recognized as the living species most similar to us.  The "score" is the number of bases in the matching regions.  Look at the choice with the highest score.  How many bases in the human and chimp's exons match?  ________   See the matches by clicking "details" for that match region: the matches are shown in blue capitals.  The back-arrow will get you back to the previous screen.

10. Now hit the back arrow twice and change the genome to Mouse--still a mammal, but more distant in evolutionary relationship.  This may find more than one similar sequence; what is the highest score (base match) between humans and mice?  ______

11. Now try Chicken--separated from us by perhaps 300 million years, but still a tetrapod (descendant of the first land vertebrates).  Highest score: ______

12. A lancelet is a tiny marine creature so distantly related to us that it barely has the rudiments of a backbone.  Score: ______

13. C. elegans (Caenorhabditis elegans), a nematode worm, is from an entirely different phylum of the animal kingdom.  The score of this exon match is: ______

14. Finally, Saccharomyces cerevisiae (S. cerevisiae) is baker's yeast (which causes bread to rise by producing CO2 gas by cellular respiration).  This fungus, typically single-celled, is from a different kingdom, but is still a Eukaryote.  Its score for this exon is: ____ 

(Of course, a fungus like this yeast needs to transcribe its genes just as any other organism does, but it presumably does the job of the protein domain coded by this exon with a quite different sequence.)
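To make the idea of a "score" as a count of matching bases concrete, here is a small Python sketch. It is only an illustration of the simplified description in step 9: the two sequences are made-up toy strings that are assumed to be already lined up, and real BLAT scoring also takes mismatches and gaps into account.

```python
def count_matches(seq_a, seq_b):
    """Count positions where two aligned sequences carry the same base.

    Both sequences must already be aligned and the same length;
    a '-' marks a gap and never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must be the same length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper())
               if a == b and a != "-")


# Hypothetical toy sequences, NOT real human or chimp DNA.
human_like = "ACGTACGGTCA"
other_like = "ACGTTCGG-CA"
matches = count_matches(human_like, other_like)
print(matches, "of", len(human_like), "positions match")
```

Comparing the count to the length of the exon gives a rough percent identity, which is an easy way to line up your answers for the chimp, mouse, chicken, lancelet, worm, and yeast.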

This use of the Genome Browser and BLAT search tool is a good segue into evolution!


Carver AP Biology 5

By dinner time today I will be able to tell you what I want you to do for a grade.  (I'm making up an exercise that will be more meaningful for this unit.)  Meanwhile, feel free to play with the browser, attempt the other exercises, and so on.  I have now fiddled with the thing for hours and have learned a bit by trial-and-error.  Here are some things I have picked up from the tutorial and my experiments.

Understanding the display:

The track is wide at exons, half-width at the untranslated regions (UTRs) of the exons at the gene ends (that's how you know you're looking at a whole gene), and a narrow line at introns (with arrowheads showing the direction in which the gene is transcribed).

A BLACK track means there exists a corresponding PDB (Protein Data Bank structure) entry for that transcript.  DARK BLUE indicates a reviewed or validated sequence (check info page to see which), while LIGHT BLUE is a non-RefSeq sequence.

The RefSeq track is the original sequence determined, to which others can be compared.  The other tracks are presumably those of other individuals of the same species, or related partial sequences and RNA transcripts or amino acid sequences.

At the top of the screen is a colored cartoon of the entire chromosome, with a red bar showing where you are in it.  You can go zipping all over the chromosome by clicking where you want to go on this image.  I'm guessing the constriction in the image marks the centromere.

STS sequence tags and SNPs are a single vertical line marking a single site; a wider line is simply closely-spaced STS or SNPs.  (I haven't Googled STS yet.)

Conservation tracks show likely evolutionary relationships as tall bars, while single lines represent gaps where bases don't match up, and double lines indicate more complex situations.

A glance near the bottom of the window shows how the base sequence aligns with a variety of other species, one species at a time.

Friday, January 13, 2012

Carver AP Bio 4: Making sense of the first exercise


exercise 1: mouse BRCA1

Unpacking the exercise: "Find out if the mouse BRCA1 gene has non-synonymous SNPs, color them blue, and get external data about a codon-changing SNP."

 
What are synonymous and non-synonymous SNPs?  When a nucleotide differs between two different gene DNA sequences, sometimes the different codons code for the same amino acid.  For example, the amino acid threonine (thr) is encoded by four different RNA codons, each ending in a different nucleotide (see the genetic code in an earlier post).  So the DNA that is transcribed to make the RNA can also differ.  A non-synonymous SNP, on the other hand, DOES result in a difference in the amino acids in the protein.  In other words, synonymous SNPs don't "matter" to organisms, but non-synonymous SNPs might very well matter.  In the exercise, I assume "codon-changing SNP" means non-synonymous, since that's what we're after.
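The threonine example can be checked with a short Python sketch. It builds the standard genetic code as a lookup table, written with DNA letters (T in place of U) for the coding strand, and then tests whether a single-base change alters the amino acid. The function name and the example codon are chosen just for this illustration.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(codon, position, new_base):
    """Say whether changing one base of a codon changes the amino acid.

    position is 0, 1, or 2 (first, second, or third base of the codon).
    """
    codon = codon.upper()
    variant = codon[:position] + new_base.upper() + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[variant]
    kind = "synonymous" if before == after else "non-synonymous"
    return kind, before, after


# ACT codes for threonine (T). Changing the third base keeps threonine...
print(classify_snp("ACT", 2, "G"))
# ...but changing the first base gives alanine (A) instead.
print(classify_snp("ACT", 0, "G"))
```

The one-letter results use the same abbreviations as the amino acid list in the earlier post.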

How am I supposed to know what "appears to be the real BRCA1?"
The tutorial chooses a choice I don't have listed: the nearest name ends in "pe.2" rather than "pe.1".  The two are alike in where they begin in the genome (chr11:101350078), but differ in their ending positions.  Also, the "pe.2" is called "breast cancer type 1 susceptibility PROTEIN," whereas I am looking for a gene.  I suspect that the choices have changed since the tutorial was made, but I can't say why one would disappear.  I note from the descriptions in my browser window that most of the list is proteins or subunits or other things I don't recognize as genes.  The first thing in the list, though, simply says "Full=Breast cancer 1;" so I will choose this.

Nope.  I'm not in the same place in the chromosome as the tutorial, the same region isn't visible; so I will manually input the coordinates from the tutorial.  There!

After you go to the Variations and Repeats section at the bottom of the page, and click the "SNPs (128)" link, note that this is about Simple Nucleotide Polymorphisms rather than Single Nucleotide Polymorphisms.  Single-nucleotide polymorphisms are single-base differences between individuals of the same or different species that originally resulted from a mutation, while simple nucleotide polymorphisms are a broader category that includes single-nucleotide polymorphisms and also single-base insertions or deletions (which would change the reading frame if in a gene), as well as changes a few bases in length.

The packed SNP display in my browser doesn't look exactly like that in the tutorial, but I can locate the same SNP the tutorial selects: it is the only blue one in the 2nd full-length column: rs28273098. 
At the end of the exercise I wanted to look at the SNP in question.   Near the top of the SNP page you find that the reference base in this position is G, while the observed variant is A.  [If you go back and use the "Get DNA" button on the previous page, you can call up a text window with as much of the sequence around that SNP as you like (use the windows for adding bases "upstream" and "downstream").  You will find that the reference strand is the one shown, since it shows the SNP as G.]  Another interesting piece of information is the Function: the variant is a missense mutation, which is defined here.  By the way, Wikipedia has very helpful definitions of many terms you may run into here.

If you've gotten this far, you're ready to try the next exercise!  Good luck!

Carver AP Biology 3

If, on the UCSC Genome Bioinformatics home page here, you click on BLAT and then look down to the info at the bottom of the page, you will find an explanation of how the software works.  I had been confused by the terms 11-mer (for searching DNA) and 3-mer (for searching proteins).  These refer to the number of subunits, so BLAT indexes non-overlapping segments of DNA 11 bases in length, and non-overlapping segments of proteins 3 amino acids in length.  These indices are searched when we try to match a DNA or protein sequence to an existing genome (proteome).
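Here is a toy Python sketch of the indexing idea, to show why short fixed-length pieces make searching fast. It is a simplification under stated assumptions, not BLAT's real code: the "genome" is a made-up 33-base string, and the function names are invented for this example. The genome is cut into non-overlapping 11-mers, and every overlapping 11-mer of the query is looked up in that index.

```python
from collections import defaultdict


def build_index(genome, k=11):
    """Map each non-overlapping k-mer of the genome to its start positions."""
    index = defaultdict(list)
    for start in range(0, len(genome) - k + 1, k):
        index[genome[start:start + k]].append(start)
    return index


def find_hits(query, index, k=11):
    """Look up every overlapping k-mer of the query; return (query, genome) starts."""
    hits = []
    for q_start in range(len(query) - k + 1):
        for g_start in index.get(query[q_start:q_start + k], []):
            hits.append((q_start, g_start))
    return hits


# Hypothetical toy genome made of three 11-base pieces.
genome = "ACGTACGTTAG" + "GGCATCCGATT" + "TTTTTCCCCCA"
query = "AA" + "GGCATCCGATT" + "C"
print(find_hits(query, build_index(genome)))
```

Each hit marks a spot where the query and the genome share an exact 11-base stretch. A real search tool then extends and scores the alignment around such hits.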

I am still working my way through the guided example, and have not yet tackled the two unguided ones that follow in the downloadable Hands-on Exercises pdf found here.  On my first try at the guided exercise, the screen I got did not exactly match the tutorial's, and I want to figure out why.  I am still trying to decide exactly how far I want you to go on this, and what you will turn in for credit.  I will let you know the moment I do.  In the meantime, email with any questions you have--  (No--strike that: I have questions and confusions of my own!)  Email me as soon as you get stuck: that is, unable to figure out what to do next, totally lost.  My email is jmichalsbr@aol.com

Thursday, January 12, 2012

Carver AP Bio 2

A couple of things I forgot above. 
First, the links.
List of free tutorials
Open Helix tutorials
The UCSC Genome Browser introduction
The Genome Browser itself!
Next, after you enter TP53 in the "gene" box of the browser, you press the "submit" button.  (There is no "jump" button here.)

Once you have zoomed in to see the actual base sequence, you will see what the mysterious lines below it are: the amino acid sequence.  Each of the capital letters stands for an amino acid that would result if the three bases above were transcribed and then translated.  Here is the genetic code, for convenience, and then a list of amino acid abbreviations.  NOTE THAT YOU MUST READ IN THE DIRECTION OF THE ARROWS.  In this stretch of DNA, that is right-to-left.




A   alanine                    L   leucine               W   tryptophan
B   aspartate or asparagine    M   methionine (start)    Y   tyrosine
C   cysteine                   N   asparagine            Z   glutamine or glutamate
D   aspartate                  P   proline               X   any
E   glutamate                  Q   glutamine             *   translation stop
F   phenylalanine              R   arginine              -   gap of indeterminate length
G   glycine                    S   serine
H   histidine                  T   threonine
I   isoleucine                 U   selenocysteine
K   lysine                     V   valine
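Reading "in the direction of the arrows" can be sketched in Python. When the arrows point right-to-left, the gene is on the opposite strand, so the displayed bases are reverse-complemented before being read three at a time. The sequence below is a made-up toy example, not real TP53 DNA, and the function names are chosen for this illustration.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Swap each base for its partner, then reverse the order."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Read a coding sequence three bases at a time into one-letter amino acids."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3   # ignore any leftover bases at the end
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


displayed = "TTAGGCCAT"                  # hypothetical bases as shown left-to-right
coding = reverse_complement(displayed)   # what you get reading right-to-left
print(coding, "->", translate(coding))
```

The letters that come out are the same one-letter abbreviations listed above, with * marking a stop.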

A couple more things you may wonder about.  EST stands for Expressed Sequence Tag, and is a short stretch of DNA that comes from cDNA.  cDNA is not DNA that was sequenced directly, but inferred by reverse-engineering a bit of mRNA (that is, writing out its DNA complement).

Carver AP Biology 1

I haven't gotten that far myself, but I have one strong recommendation: reduce the tutorial window to a little less than half size, and open the "UCSC Genome Browser" (link is at the beginning of the Open Helix window from which you launch the tutorial).  The Browser is the actual software you are learning to use.  Reduce this to a window that will fit conveniently next to the tutorial window.  Now you can pause the tutorial whenever you like and try out the controls, etc. that the tutorial is talking about.

Now go into the "gene" box near the top of the screen and type in "TP53" and hit the "Jump" button.  This will put you in the same place in the genome that the tutorial is using for an example.

When you try out the arrow buttons for moving along the genome, notice how far each arrow button moves you.  (I keep track by choosing an easy-to-recognize pattern and seeing where it is against the top buttons.)  Also notice the position numbers at the top of the track.  Notice how many bases are represented between the numbered points.  Now use the single arrow buttons to put a recognizable pattern into the center of the viewer.  Play with the zoom buttons to see how well the pattern stays centered.  Notice that pushing the 10X button twice gets you down to single nucleotides.  See, they really are there!

More later.