| | | 
                | Frequently Asked Questions: Genome Browser Tracks |  
 | 
 | 
 
 
        | 
	    | 
	        | 
		    | List of tracks available for a specific assembly |   |  |  
	        |  | 
|---|
 |  | 
			Question: "How can I find out which tracks have been released for
			the assembly in which I'm interested?"
 
			Response:The Release Log
			contains lists of the published tracks and release dates 
			for the current set of genome assemblies available on our
			site. It also shows version 
			information for the assemblies of other species used in 
			comparative genomics tracks.
 |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Database/browser start coordinates differ by 1 base |   |  |  
	        |  | 
|---|
 |  | 
			Question: "I am confused about the start coordinates for items in the refGene table. 
			It looks like you need to add "1" to the starting point in order to get 
			the same start coordinate as is shown by the Genome Browser. Why is this the case?"
 
			Response:Our internal database representations of coordinates always have a
			zero-based start and a one-based end.  We add 1 to the start before 
			displaying coordinates in the Genome Browser. Therefore, they appear as 
			one-based start, one-based end in the graphical display.  The refGene.txt file
			is a database file, and consequently is based on the internal representation.
 
			We use this particular internal representation because it simplifies 
			coordinate arithmetic, i.e. it eliminates the need to add or subtract 1 at every step. 
			Unfortunately, it does create some confusion when the internal 
			representation is exposed or when we forget to add 1 before 
			displaying a start coordinate. However, it saves us from much trickier 
			bugs. If you use a database dump file but would prefer to 
			see the one-based start coordinates, you will always need to add 1 
			to each start coordinate.
			 
			If you submit data to the browser in position format (chr#:##-##), the browser assumes 
			this information is 1-based. If you submit data in any other format (BED (chr# ## ##) or 
			otherwise), the browser will assume it is 0-based. You can see this both in our liftOver 
			utility and in our search bar, by entering the same numbers in position or BED format 
			and observing the results. Similarly, any data returned by the browser in 
			position format is 1-based, while data returned in BED, wiggle, etc is 0-based.  
		     |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | mRNA associated results |   |  |  
	        |  | 
|---|
 |  | 
			Question: "Someties when I type in the name of a gene -- e.g. DAO (D aminoacid oxidase) -- 
			the Genome Browser returns a list that includes the gene entry on the assembly, 
			but also contains links to several other genes and aligned mRNAs. What is the 
			relationship between 
			my gene of interest and these results?"
 
			Response:The gene search results are obtained from scanning the RefSeq and Known
			Genes tracks, which are typically based on non-redundant relatively high quality 
			mRNAs. A small fraction of RefSeqs are based on DNA level annotations. In most
			cases, there is a HUGO Gene Nomenclature Committee symbol or other biological name associated with the 
			gene. In the case of the RefSeq track, the association between these names and the 
			accession is maintained at NCBI and is also present in the refLink table.
 
			The mRNA search results are obtained by scanning data associated with the 
			GenBank record for mRNAs.  These 
			are often redundant, but occasionally contain something useful that has 
			not yet made it into RefSeq. The mRNA information is often useful 
			because the people who deposited the mRNA into GenBank are listed in the 
			record. Frequently these same people have written interesting articles on the 
			gene or may serve as a source of information on the gene. 
		     |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Correspondence of Genome Browser mRNA positions to those of OMIM genes |   |  |  
	        |  | 
|---|
 |  | 
			Question: "If I do a Genome Browser search for an mRNA sequence using its GenBank accession 
			number, will I always get the same cytogenetic location as that given by 
			OMIM for the gene?"
 
			Response:Not always. Sometimes the Genome Browser will return more than one location 
			when there are recent duplication or assembly problems in the human genome. 
			In these cases, usually one of the locations will agree with OMIM. In a few 
			rare instances involving not-quite-so-recent duplications in the genome, 
			UCSC will attempt to assign it uniquely, but OMIM will think it belongs 
			someplace just under our threshhold. A Blat search of the cDNA is very 
			informative in these cases. In rare cases, UCSC or NCBI may have made a data 
			processing error. For the vast majority of cases, however, the two sites do match.
 |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Position changes of features |   |  |  
	        |  | 
|---|
 |  | 
			Question: "Yesterday I was looking at a contig in a specific location, and today the 
			location has changed. What happened?
 
			Response:Check that you are using the same assembly version that you were using yesterday.
			Features may change positions within a genome between releases, particularly
			if they are located in an area of the genome that is still in draft form.
			See Coordinate changes between assemblies for 
			more information.
 |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Missing ESTs |   |  |  
	        |  | 
|---|
 |  | 
			Question: "The EST track in human Build 33 seems to be rather different from that of
			the previous assembly. Many EST accessions that were previously present on the track 
			are now missing. If I look at the missing ESTs in Blat, the sequences align exactly 
			as they did on the build where they do appear on the track."
 
			Response:Starting with Build 33, the EST track is being filtered a little more stringently 
			both with respect to percentage identity and repeat content than in the past.
 |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Evaluating possible alternative splices |   |  |  
	        |  | 
|---|
 |  | 
			Question: "When you view results from the Genome Browser, how do you determine whether 
			the alternative splice is real or if it is a sequencing artifact?"
 
			Response:It's a very good idea to click into the alignment and check that it 
			looks clean at a detailed level and that the splice sites are reasonable. 
			If you have an alternate exon, it is also good to Blat just that exon. 
			Occasionally you may encounter a recent tandem duplication event that encompasses a 
			single exon, which can masquerade as alternative splicing on the graphical 
			display. If it's an EST, check to see if it is from a 
			RAGE library. If so, alternative promoters are likely to be an artifact of 
			the RAGE process rather than biological. If the alternative splice still looks good 
			after these checks, the next step is to do some RT-PCR in the lab.
 |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Matching exons and protein sequence |   |  |  
	        |  | 
|---|
 |  | 
			Question: "I am working with alternatively spliced forms of an enzyme. How 
			can I use the database to identify exons and exactly match them to 
			protein databases (i.e. identify the exons based on a protein sequence and 
			vice versa)?"
 
			Response:If you have a protein sequence, you can use 
			Blat 
			to align your sequence to the desired 
			genome. In the ACTIONS column on the Blat search results page, click the details link 
			to view details of exons blocks. Alternatively, click the browser link to display
			the search results in the Genome Brower. Look for instances in which a gene from 
			the Blat query track aligns exactly or very similarly to an entry in the Known
			Genes track. Click on the entry to display details about the gene. The SWISS-PROT link
			on the details page will lead you to more details about this protein.
 
			Follow a similar procedure with an mRNA sequence.  If there is no 
			corresponding entry in the Known Genes or RefSeq track, then congratulations, 
			you may have found an unreported new gene. You may want 
			to doublecheck the results using NCBI BLAST. 
		     |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Cause of duplicated gene |   |  |  
	        |  | 
|---|
 |  | 
			Question: "I have found a gene that has two identical copies on different chromosomes within
			the Genome Browser. Is this possible?"
 
			Response:One of the copies may be an artifactual duplication resulting from 
			unavoidable compromises in the assembly process.  However, there do exist 
			very recent authentic duplication events. Frequently these are pericentromeric or 
			subtelomeric.
 
			There are several checks you can make to determine whether you are viewing an actual 
			duplication or an assembly process artifact. Create a Blat track from the gene's mRNA 
			and examine the details page for a match that is too perfect. Then, open the 
			Genome Browser with the duplication and gap tracks set to dense mode. Look for 
			problems in the flanking sequence in the duplication track. Also look for suspicious 
			placement of the gene, for example inside the intron of another gene.  You may also 
			want to follow the OMIM link to look for hand-curated experimental literature 
			summaries. BLASTing the mRNA against a more recent assembly may provide another line 
			of evidence. 
		     |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Protein doesn't begin with methionine |   |  |  
	        |  | 
|---|
 |  | 
			Question: "I am looking at a human protein that the Genome Browser associates with a particular
			gene. According to the Genome Browser, its amino acid sequence doesn't start with 
			M (methionine). I thought nuclear-encoded human proteins always began with methionine?"
 
			Response:The UCSC genome browser uses translated mRNA data exactly as supplied to 
			GenBank by the original sequencing authors. Any errors at GenBank propagate 
			through many other databases and tools. To work effectively in a bioinformatic 
			area subject to errors, it is a good idea to seek supporting data for any 
			unusual finding.
 
			To further investigate this example, you may want to use Blat or BLAST to recover 
			other close members of this gene family. By using comparative alignment, you may
			discover that the 5' UTR in the mRNA for this protein was likely misintepreted as 
			coding sequence and that the protein begins with methionine as expected. The
			error may also be caused by an underlying mRNA in GenBank that stops short of 
			the initiator methionine. In this case, you could use ESTs, other mRNAs, and Blat 
			or BLAST of paralogs against unfinished genome sequence to extend the mRNA to a more 
			plausible full-length sequence.
		     |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Doing an orthology track analysis of a protein |   |  |  
	        |  | 
|---|
 |  | 
			Question: "I am working on a lipase called hormone-sensitve lipase (HSL) gene ID 
			NM_010719. I am trying to see if there is any protein that has the same 
			domain organization as HSL. Will doing an orthology track 
			of the protein help me to get an answer? How do I do the orthology 
			track analysis?"
 
			Response:You can accomplish this by using Blat and the Genome Browser Superfamily track. 
			Blat the protein sequence from the NCBI RefSeq record, then choose the choose the 
			Browser display option to view your search results in the Genome Browser window.
			Set the RefSeq and Superfamily tracks to full display mode. The RefSeq track will 
			contain the entry LIPE, and you will find the corresponding entry ENSP00000244289 in 
			the Superfamily track. Click the Superfamily entry, and then click the Superfamily link
			on the details page that displays. This will open a browser for the Superfamily 
			site. Click "alpha/beta-Hydrolases" to open the Structural Classification
			of Proteins (SCOP) page. There you will 
			find multiple families listed under this Superfamily, including the lipase in which 
			you're interested.
 |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Overlap SNPs vs. random SNPs |   |  |  
	        |  | 
|---|
 |  | 
			Question: "Some assemblies have tracks for both overlap SNPs (snpNIH) and random SNPs (snpTsc). Where are these defined? Where do you get your SNP data?"
 
			Response:You can obtain detailed information about any track from
		        its associated 
			description page (opened by clicking on the mini-button to the left of
			the displayed track). Click the "View table
		        schema" link on the description page to display
			the schema and other information for the primary 
		       	table underlying the annotation track.
 
			The SNP tracks show single nucleotide polymorphisms, which are single nucleotide
			positions in the genomic sequence for which two or more alternative alleles are present 
			at appreciable frequency (traditionally  at least 1%) in the human population.  
			The Overlap SNPs occur only where two clones of different haplotypes overlap. 
			These SNPs can be useful in confirming the 
			assembly of putative overlaps. Random SNPs are the results of a large number of 
			reads taken from random positions in the genome. If you're trying to estimate the 
			rate of variation in a region, you'd use the TSC reads and perhaps normalize them 
			further by the number of reads actually read in that region. If you just want a 
			SNP to use as a marker, both sets are valuable.
			 
			The SNP tracks are third party tracks. The snpTsc data are obtained from the 
			SNP Consortium 
			and the random SNP data are obtained from 
			dbSNP. 
			Both  accession types start with rs, e.g. rs792507. The dispersal
			of the two types of SNPs may be viewed anywhere in the Genome Browser. They are
			disjoint as sets, but otherwise fully interdigitated. 
		     |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Quality benchmarks for predicted genes |   |  |  
	        |  | 
|---|
 |  | 
			Question: "Do you offer any benchmarks of quality and quantity of known and predicted genes 
			shown in the Acembly, Ensembl, Genscan, 
			Fgenesh++, Twinscan, and TIGR Gene Index gene prediction tracks?"
 
			Response:These tracks are contributed by institutional programs outside of UCSC. You can
			access links to their home pages and relevant publications from the description
			pages associated with the tracks (which can be viewed by clicking on the grey mini-button to the left of
			the track). You may also obtain supplemental information from the Users
			Guide and the Credits page.  Methods and quality checks are often described in 
			greater detail there. No uniform benchmarking system exists. Finished chromosomes 
			are commonly used, but even here the experimental work continues today on delineating 
			genes.
 
			UCSC does not provide summary statistics for these tracks. However, these may be 
			easily compiled from the appropriate tables in the 
			Table Browser. 
			The number of predicted genes 
			and exons are easily compared. Some quality checks can also easily be run, such as 
			how many of the predicted gene models are incomplete (e.g. the transcription start 
			coordinate is the same as the CDS start).
			 
			Looking at almost any coordinate position within the Genome Browser, you can see that 
			there are discrepancies between the predicted gene tracks, as well as further 
			inconsistencies with respect to experimental data tracks such as spliced ESTs. 
			The RefSeq track also contains genes of uncertain status, e.g. lack of initiator 
			methionine. Thus, it is not clear where one can obtain a gold standard for 
			measuring gene prediction quality. A reference set might be hand-curated out of 
			recent journal articles of exceptional thoroughness. UCSC does not currently maintain 
			such a resource.
		     |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Display conventions for gene prediction tracks |   |  |  
	        |  | 
|---|
 |  | 
			Question: "What is the significance of the thinner blocks displayed at the 
			beginning and end of a gene in the browser?"
 
			Response:The varying thickness of features in the Genome Browser gene tracks denotes the 
			various structural
			features of a gene, such as exons, introns, and untranslated regions (UTRs). 
			The thickest parts of the track indicate the coding exon regions within the gene. 
			The slightly thinner portions at the leading and trailing ends of the gene track show 
			the 5' and 3' UTRs. Introns are depicted as lines with arrows indicating the 
			direction of transcription.
 
			Some aspects of the graphical representation are inevitably lost upon rescaling.
			For example, coding exons are given preference at coarse scales. For single exon genes,
			there is no place to put the strand orientation wedges, and therefore the feature's 
			detail page must be consulted.
			 
			For more information about annotation track display conventions within the Genome
			Browser, consult the 
			User's Guide.
		     |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Viewing detailed displays in conservation tracks |   |  |  
	        |  | 
|---|
 |  | 
			Question: "When I click on a region in the Human/Mouse Evolutionary Conservation Score track, 
			it doesn't give me detailed information."
 
			Response:The track is defaulting to dense display mode because the size of the track's 
			displayed region is too large.  Unfortunately, this particular 
			track doesn't have good visual cues to show you when it's defaulting to dense 
			mode. If you zoom in on the region in which you're having the problem, you should be 
			able to display the details page.
 |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Negative strand coordinates in PSL files |   |  |  
	        |  | 
|---|
 |  | 
			Question: "I've noticed that the blatFugu table has two characters representing the strand. 
			Also, I've noticed that the starting/ending positions of the blocks don't fall within 
			the start/end positions of the chromosome target."
 
			Response:When the second character in the strand is "-", the coordinates of 
			the comma-separated list of tStarts are reverse-complemented relative to tStart, 
			much as qStarts behave when the first letter in the strand is '-'.
 |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Inconsistency in stop codon treatment in GTF tracks |   |  |  
	        |  | 
|---|
 |  | 
			Question: "I've been doing some comparative gene set analysis using the gene 
			annotation tracks and I believe I have run into an inconsistency in the way 
			that stop codons are treated in the annotations.  Looking at the Human 
			June 2002 assembly, the annotations for Ensembl, Twinscan, SGP, and 
			Geneid appear to exclude the stop codon in the coding region coordinates.  
			All of the other gene annotation sets include the stop codon as part of 
			the coding region. My guess is that this inconsistency is the result of 
			the gene sets being imported from different file formats. The 
			GTF2 format
			does not include the stop codon in the terminal exon, while the GenBank format 
			does, and the GFF format does not specify what to do.
 
			Response:Your guess is correct. We haven't gotten around to fixing this situation. 
			A while ago, the Twinscan folks made a GTF validator. It interpreted the 
			stop codon as not part of the coding region. Prior to that, all 
			GFF and GTF annnotations that we received did include the stop codon as 
			part of the coding region; therefore, we didn't have special code in our 
			database to enforce it. In response to the validator, Ensembl, SGP and 
			Geneid switched their handling of stop codons to the way that Twinscan 
			does it, hence the discrepancy.
 |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Obtaining clones referenced in Genome Browser |   |  |  
	        |  | 
|---|
 |  | 
			Question: "Is it possible to purchase the chromosome clones 
			referenced in the Genome Browser?"
 
			Response:You can find further information about a specific clone 
			by clicking on the clone name link on the details page 
			for the item. This links to the NCBI Clone Registry website, 
			which lists extensive details about the clone, 
			including distributor information.
 |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Locating centromeres and telomeres |   |  |  
	        |  | 
|---|
 |  | 
			Question: "How do I find the positions of the centromeres and 
			telomeres in a particular assembly?"
 
			Response:This information can be found in the "gap" 
			database table.  Use the Table Browser to extract it.
			To do this, select your assembly and the gap table, 
			then click the "filter Create" button. Set 
			the "type" field to 
			centromere telomere (separated by a space).
			For help using the Table Browser, visit the 
			User's Guide.
 |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Determining the table name for an annotation track |   |  |  
	        |  | 
|---|
 |  | 
			Question: "How do I find the name of the database table that 
                        contains the data for a particular annotation track?"
 
			Response:Each annotation track in the Genome Browser has one or
                        more database tables associated with it. To find the 
                        name of the primary table, navigate to the schema page.
                        You will find the schema page by pressing the 
                        'mini-button' to the left of the annotation track 
                        display, or clicking the hyper-linked track name in the 
                        track controls (below the display). From the resulting
                        description page, follow the "View table
                        schema" link.  Finally, on the schema page, you
                        will find the name of the database table near the top
                        of the page listed after the Primary Table label.
 |  |  
 |  |  |  |