| | | 
                | Frequently Asked Questions: Assembly Releases and Versions |  
 | 
 | 
 
 
        | 
	    | 
	        | 
		    | List of UCSC genome releases |   |  |  
	        |  | 
|---|
 |  | 
			Question: "How do UCSC's release numbers correspond to those of other organizations, such as NCBI?"
 
			Response:
 
             
		
| SPECIES | UCSC VERSION | RELEASE DATE | RELEASE NAME | STATUS |
|---|---|---|---|---|
| VERTEBRATES |  |  |  |  |
| Human | hg19 | Feb. 2009 | Genome Reference Consortium GRCh37 | Available |
|  | hg18 | Mar. 2006 | NCBI Build 36.1 | Available |
|  | hg17 | May 2004 | NCBI Build 35 | Available |
|  | hg16 | Jul. 2003 | NCBI Build 34 | Available |
|  | hg15 | Apr. 2003 | NCBI Build 33 | Archived |
|  | hg13 | Nov. 2002 | NCBI Build 31 | Archived |
|  | hg12 | Jun. 2002 | NCBI Build 30 | Archived |
|  | hg11 | Apr. 2002 | NCBI Build 29 | Archived |
|  | hg10 | Dec. 2001 | NCBI Build 28 | Archived |
|  | hg8 | Aug. 2001 | UCSC-assembled | Archived |
|  | hg7 | Apr. 2001 | UCSC-assembled | Archived |
|  | hg6 | Dec. 2000 | UCSC-assembled | Archived |
|  | hg5 | Oct. 2000 | UCSC-assembled | Archived |
|  | hg4 | Sep. 2000 | UCSC-assembled | Archived |
|  | hg3 | Jul. 2000 | UCSC-assembled | Archived |
|  | hg2 | Jun. 2000 | UCSC-assembled | Archived (data set only) |
|  | hg1 | May 2000 | UCSC-assembled | Archived (data set only) |
| Cat | felCat4 | Dec. 2008 | NHGRI catChrV17e | Available |
|  | felCat3 | Mar. 2006 | Broad Institute Release 3 | Available |
| Chicken | galGal3 | May 2006 | WUSTL Gallus-gallus-2.1 | Available |
|  | galGal2 | Feb. 2004 | WUSTL Gallus-gallus-1.0 | Available |
| Chimp | panTro3 | Oct. 2010 | CGSC Build 2.1.3 | Available |
|  | panTro2 | Mar. 2006 | CGSC Build 2.1 | Available |
|  | panTro1 | Nov. 2003 | CGSC Build 1.1 | Available |
| Cow | bosTau6 | Nov. 2009 | University of Maryland v3.1 | Available |
|  | bosTau4 | Oct. 2007 | Baylor College of Medicine HGSC Btau_4.0 | Available |
|  | bosTau3 | Aug. 2006 | Baylor College of Medicine HGSC Btau_3.1 | Available |
|  | bosTau2 | Mar. 2005 | Baylor College of Medicine HGSC Btau_2.0 | Available |
|  | bosTau1 | Sep. 2004 | Baylor College of Medicine HGSC Btau_1.0 | Archived |
| Dog | canFam2 | May 2005 | Broad Institute v2.0 | Available |
|  | canFam1 | Jul. 2004 | Broad Institute v1.0 | Available |
| Elephant | loxAfr3 | Jul. 2009 | Broad loxAfr3 | Available |
| Fugu | fr2 | Oct. 2004 | JGI v4.0 | Available |
|  | fr1 | Aug. 2002 | JGI v3.0 | Available |
| Gibbon | nomLeu1 | Jan. 2010 | Gibbon Genome Sequencing Consortium Nleu1.0 | Available |
| Gorilla | gorGor3 | May 2011 | Wellcome Trust Sanger Institute gorGor3.1 | Available |
| Guinea pig | cavPor3 | Feb. 2008 | Broad cavPor3 | Available |
| Horse | equCab2 | Sep. 2007 | Broad EquCab2 | Available |
|  | equCab1 | Jan. 2007 | Broad EquCab1 | Available |
| Lamprey | petMar1 | Mar. 2007 | WUSTL v3.0 | Available |
| Lizard | anoCar2 | May 2010 | Broad AnoCar2 | Available |
|  | anoCar1 | Feb. 2007 | Broad AnoCar1 | Available |
| Marmoset | calJac3 | Mar. 2009 | WUSTL Callithrix_jacchus-v3.2 | Available |
|  | calJac1 | Jun. 2007 | WUSTL Callithrix_jacchus-v2.0.2 | Available |
| Medaka | oryLat2 | Oct. 2005 | NIG v1.0 | Available |
| Microbat | myoLuc2 | Jul. 2010 | Broad myoLuc2.0 | Available |
| Mouse | mm9 | Jul. 2007 | NCBI Build 37 | Available |
|  | mm8 | Feb. 2006 | NCBI Build 36 | Available |
|  | mm7 | Aug. 2005 | NCBI Build 35 | Available |
|  | mm6 | Mar. 2005 | NCBI Build 34 | Archived |
|  | mm5 | May 2004 | NCBI Build 33 | Archived |
|  | mm4 | Oct. 2003 | NCBI Build 32 | Archived |
|  | mm3 | Feb. 2003 | NCBI Build 30 | Archived |
|  | mm2 | Feb. 2002 | MGSCv3 | Archived |
|  | mm1 | Nov. 2001 | MGSCv2 | Archived |
| Opossum | monDom5 | Oct. 2006 | Broad Institute release MonDom5 | Available |
|  | monDom4 | Jan. 2006 | Broad Institute release MonDom4 | Available |
|  | monDom1 | Oct. 2004 | Broad Institute release MonDom1 | Available |
| Orangutan | ponAbe2 | Jul. 2007 | WUSTL Pongo_albelii-2.0.2 | Available |
| Panda | ailMel1 | Dec. 2009 | BGI-Shenzhen AilMel 1.0 | Available |
| Pig | susScr2 | Nov. 2009 | SGSC Sscrofa9.2 | Available |
| Platypus | ornAna1 | Mar. 2007 | WUSTL v5.0.1 | Available |
| Rabbit | oryCun2 | Apr. 2009 | Broad Institute release oryCun2 | Available |
| Rat | rn4 | Nov. 2004 | Baylor College of Medicine HGSC v3.4 | Available |
|  | rn3 | Jun. 2003 | Baylor College of Medicine HGSC v3.1 | Available |
|  | rn2 | Jan. 2003 | Baylor College of Medicine HGSC v2.1 | Archived |
|  | rn1 | Nov. 2002 | Baylor College of Medicine HGSC v1.0 | Archived |
| Rhesus | rheMac2 | Jan. 2006 | Baylor College of Medicine HGSC v1.0 Mmul_051212 | Available |
|  | rheMac1 | Jan. 2005 | Baylor College of Medicine HGSC Mmul_0.1 | Archived |
| Sheep | oviAri1 | Feb. 2010 | ISGC Ovis aries 1.0 | Available |
| Stickleback | gasAcu1 | Feb. 2006 | Broad Release 1.0 | Available |
| Tetraodon | tetNig2 | Mar. 2007 | Genoscope v7 | Available |
|  | tetNig1 | Feb. 2004 | Genoscope v7 | Available |
| Turkey | melGal1 | Dec. 2009 | Turkey Genome Consortium v2.01 | Available |
| X. tropicalis | xenTro3 | Nov. 2009 | JGI v.4.2 | Available |
|  | xenTro2 | Aug. 2005 | JGI v.4.1 | Available |
|  | xenTro1 | Oct. 2004 | JGI v.3.0 | Available |
| Zebra finch | taeGut1 | Jul. 2008 | WUSTL v3.2.4 | Available |
| Zebrafish | danRer7 | Jul. 2010 | Sanger Institute Zv9 | Available |
|  | danRer6 | Dec. 2008 | Sanger Institute Zv8 | Available |
|  | danRer5 | Jul. 2007 | Sanger Institute Zv7 | Available |
|  | danRer4 | Mar. 2006 | Sanger Institute Zv6 | Available |
|  | danRer3 | May 2005 | Sanger Institute Zv5 | Available |
|  | danRer2 | Jun. 2004 | Sanger Institute Zv4 | Archived |
|  | danRer1 | Nov. 2003 | Sanger Institute Zv3 | Archived |
| DEUTEROSTOMES |  |  |  |  |
| C. intestinalis | ci2 | Mar. 2005 | JGI v2.0 | Available |
|  | ci1 | Dec. 2002 | JGI v1.0 | Available |
| Lancelet | braFlo1 | Mar. 2006 | JGI v1.0 | Available |
| S. purpuratus | strPur2 | Sep. 2006 | Baylor College of Medicine HGSC v. Spur 2.1 | Available |
|  | strPur1 | Apr. 2005 | Baylor College of Medicine HGSC v. Spur_0.5 | Available |
| INSECTS |  |  |  |  |
| A. mellifera | apiMel2 | Jan. 2005 | Baylor College of Medicine HGSC v.Amel_2.0 | Available |
|  | apiMel1 | Jul. 2004 | Baylor College of Medicine HGSC v.Amel_1.2 | Available |
| A. gambiae | anoGam1 | Feb. 2003 | IAGP v.MOZ2 | Available |
| D. ananassae | droAna2 | Aug. 2005 | Agencourt Arachne release | Available |
|  | droAna1 | Jul. 2004 | TIGR Celera release | Available |
| D. erecta | droEre1 | Aug. 2005 | Agencourt Arachne release | Available |
| D. grimshawi | droGri1 | Aug. 2005 | Agencourt Arachne release | Available |
| D. melanogaster | dm3 | Apr. 2006 | BDGP Release 5 | Available |
|  | dm2 | Apr. 2004 | BDGP Release 4 | Available |
|  | dm1 | Jan. 2003 | BDGP Release 3 | Available |
| D. mojavensis | droMoj2 | Aug. 2005 | Agencourt Arachne release | Available |
|  | droMoj1 | Aug. 2004 | Agencourt Arachne release | Available |
| D. persimilis | droPer1 | Oct. 2005 | Broad Institute release | Available |
| D. pseudoobscura | dp3 | Nov. 2004 | Flybase Release 1.0 | Available |
|  | dp2 | Aug. 2003 | Baylor College of Medicine HGSC Freeze 1 | Available |
| D. sechellia | droSec1 | Oct. 2005 | Broad Release 1.0 | Available |
| D. simulans | droSim1 | Apr. 2005 | WUSTL Release 1.0 | Available |
| D. virilis | droVir2 | Aug. 2005 | Agencourt Arachne release | Available |
|  | droVir1 | Jul. 2004 | Agencourt Arachne release | Available |
| D. yakuba | droYak2 | Nov. 2005 | WUSTL Release 2.0 | Available |
|  | droYak1 | Apr. 2004 | WUSTL Release 1.0 | Available |
| NEMATODES |  |  |  |  |
| C. brenneri | caePb2 | Feb. 2008 | WUSTL 6.0.1 | Available |
|  | caePb1 | Jan. 2007 | WUSTL 4.0 | Available |
| C. briggsae | cb3 | Jan. 2007 | WUSTL Cb3 | Available |
|  | cb1 | Jul. 2002 | WormBase v. cb25.agp8 | Available |
| C. elegans | ce6 | May 2008 | WormBase v. WS190 | Available |
|  | ce4 | Jan. 2007 | WormBase v. WS170 | Available |
|  | ce2 | Mar. 2004 | WormBase v. WS120 | Available |
|  | ce1 | May 2003 | WormBase v. WS100 | Archived |
| C. japonica | caeJap1 | Mar. 2008 | WUSTL 3.0.2 | Available |
| C. remanei | caeRem3 | May 2007 | WUSTL 15.0.1 | Available |
|  | caeRem2 | Mar. 2006 | WUSTL 1.0 | Available |
| P. pacificus | priPac1 | Feb. 2007 | WUSTL 5.0 | Available |
| OTHER |  |  |  |  |
| Sea Hare | aplCal1 | Sep. 2008 | Broad Release Aplcal2.0 | Available |
| Yeast | sacCer3 | Apr. 2011 | SGD April 2011 sequence | Available |
|  | sacCer2 | Jun. 2008 | SGD June 2008 sequence | Available |
|  | sacCer1 | Oct. 2003 | SGD 1 Oct 2003 sequence | Available |
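
To look up a correspondence programmatically, a small lookup table is enough. The sketch below copies five rows from the table above into a C array. The struct name, the array, and the function releaseNameForDb are hypothetical and are not part of any UCSC tool; a real program would load the full table rather than hard-code a few rows.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the release table: UCSC database name and source release name. */
struct ucscRelease {
    const char *db;
    const char *releaseName;
};

/* A few rows copied from the release table; deliberately incomplete. */
static const struct ucscRelease releases[] = {
    {"hg19", "Genome Reference Consortium GRCh37"},
    {"hg18", "NCBI Build 36.1"},
    {"hg17", "NCBI Build 35"},
    {"mm9",  "NCBI Build 37"},
    {"mm8",  "NCBI Build 36"},
};

/* Return the release name for a UCSC database name, or NULL if the
 * name is not in the (partial) table. Hypothetical helper. */
const char *releaseNameForDb(const char *db)
{
    assert(db != NULL);
    for (size_t i = 0; i < sizeof releases / sizeof releases[0]; i++) {
        if (strcmp(releases[i].db, db) == 0)
            return releases[i].releaseName;
    }
    return NULL;
}
```

Calling releaseNameForDb("hg18") returns "NCBI Build 36.1"; a name that is not in the array returns NULL, so callers should check the result before using it.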
 |  |  
 
        | 
	    | 
	        | 
		    | Initial assembly release dates |   |  |  
	        |  | 
|---|
 |  | 
			Question: "When will the next assembly be out?"
 
			Response: UCSC does not produce its own genome assemblies; it obtains them from 
			standard sources. For example, the human assembly is obtained
			from NCBI. As a result, you can expect us to release a new version of a 
			genome soon after the assembling organization releases it. 
			A new assembly release initially consists of the genome sequence and a
			small set of aligned annotation tracks. Additional annotation tracks are added 
			as they are obtained or generated. Bulk downloads of the data are typically
			available in the first week after the assembly is released in the browser.
 |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Data sources - UCSC assemblies |   |  |  
	        |  | 
|---|
 |  | 
			Question: "Where does UCSC obtain the assembly and annotation data
			 displayed in the Genome Browser?"
 
			Response: All the assembly data displayed in the UCSC Genome
			Browser are obtained from external sequencing centers.
			To determine the data source and version for a given 
			assembly, see the assembly's description on the Genome
			Browser Gateway page or
			the List of UCSC Genome Releases.
 
			The annotations accompanying an assembly are obtained
			from a variety of sources. The UCSC Genome Bioinformatics 
			Group generates several of the tracks; the remainder are
			contributed by collaborators at other sites. Each track
			has an associated description page that credits the
			authors of the annotation. 
			 
			For detailed information about the individuals and 
			organizations who contributed to a specific assembly,
			see the Credits
			page.
		     |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Comparison of UCSC and NCBI human assemblies |   |  |  
	        |  | 
|---|
 |  | 
			Question: "How do the human assemblies displayed in the UCSC 
			 Genome Browser differ from the NCBI human assemblies?"
 
			Response: Recent human assemblies displayed in the Genome
		  	Browser (hg10 and higher) are identical to the NCBI
		  	assemblies.
 |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Differences between UCSC and NCBI mouse assemblies |   |  |  
	        |  | 
|---|
 |  | 
			Question: "Is the mouse genome assembly displayed in the UCSC
			Genome Browser the same as the one on the NCBI website?"
 
			Response: The mouse genome assemblies featured in the UCSC 
			Genome Browser are the same as those on the NCBI web site,
			with one difference: the UCSC versions contain
			only the reference strain data (C57BL/6J). NCBI provides
			data for several additional strains in its builds.
 |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Accessing older assembly versions |   |  |  
	        |  | 
|---|
 |  | 
			Question: "I need to access an older version of a genome assembly that's no
			longer listed in the Genome Browser menu. What should I do?"
 
			Response: In addition to the assembly versions currently available in the Genome Browser, 
			you can access older versions of the browser through our archives. To view an
			older version, click the Archives link on the Genome Browser home page.
 |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Frequency of GenBank data updates |   |  |  
	        |  | 
|---|
 |  | 
			Question: "How frequently does UCSC update its databases with new 
			data from GenBank?"
 
			Response: Daily and weekly incremental updates of mRNA, RefSeq, 
			and EST data are in place for several of the 
			more recent Genome Browser assemblies. 
			Assemblies that are not on an incremental update 
			schedule are updated whenever we load a new assembly or
			make a major revision to a table.
 
			Data are updated on the following schedule:

			Native and xeno mRNA and RefSeq tracks: updated daily
			EST data: updated weekly on Saturday morning
			Downloadable data files: updated weekly on Saturday morning
			Outdated sequences: removed once per quarter

			Mirror sites are not required to use an 
			incremental update process, and should not experience 
			problems as a result of these updates. 
		     |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Coordinate changes between assemblies |   |  |  
	        |  | 
|---|
 |  | 
			Question: "I noticed that the chromosomal coordinates for a particular gene that I'm
			looking at have changed since the last time I used your browser. What happened?"
 
			Response: A common source of confusion for users is 
			mixing up different assemblies. It is very 
			important to be aware of which assembly you are looking at. Within the Genome
			Browser display, assemblies are labeled by organism and date. To look up the 
			corresponding UCSC database name or NCBI build number, use the 
			release table.
 
			UCSC database labels are of the form hgN, 
			panTroN, etc., where N is a number. The letters designate the organism,
			e.g. hg for human genome or panTro for
			Pan troglodytes. The number denotes the UCSC 
			assembly version for that organism. For example, ce1 
			refers to the first UCSC assembly of the 
			C. elegans genome.
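
The naming convention can be sketched in C as follows. splitDbName is a hypothetical helper, not a function in the UCSC source tree. It assumes that every database name is a non-empty prefix followed by one or more decimal digits, as in the examples above, and it does not guard against version numbers too large for an int.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Split a UCSC database name such as "hg19" or "panTro3" into its
 * organism prefix and assembly version number.
 * Hypothetical helper for illustration only.
 * Returns 1 on success; returns 0 if the name has no prefix, has no
 * trailing digits, or the prefix does not fit in the caller's buffer. */
int splitDbName(const char *db, char *prefix, size_t prefixSize, int *version)
{
    assert(db != NULL && prefix != NULL && version != NULL);
    size_t len = strlen(db);
    size_t split = len;

    /* Walk backwards over the trailing digits. */
    while (split > 0 && isdigit((unsigned char)db[split - 1]))
        split--;

    if (split == 0 || split == len || split >= prefixSize)
        return 0;

    memcpy(prefix, db, split);
    prefix[split] = '\0';
    *version = atoi(db + split);
    return 1;
}
```

With a 16-byte prefix buffer, splitDbName("panTro3", ...) yields the prefix "panTro" and version 3, and splitDbName("ce1", ...) yields "ce" and 1.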
			 
			The coordinates of your favorite gene in one assembly may 
			not be the same as those in the next release of the 
			assembly unless the gene happens to lie on a completely 
			sequenced and unrevised chromosome. For information on 
			integrating data from one assembly into another, see the 
			Converting positions between
			assembly versions section. 
		     |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Converting positions between assembly versions |   |  |  
	        |  | 
|---|
 |  | 
			Question: "I've been researching a specific area of the human genome
			on the current assembly, and now you've just released a 
			new version. Is there an easy way to locate 
			my area of interest on the new assembly?"
 
			Response: See the section on converting coordinates for information on assembly migration tools.
 |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Missing annotation tracks |   |  |  
	        |  | 
|---|
 |  | 
			Question: "Why is my favorite annotation track missing from your latest release?"
 
			Response: The initial release of a new genome assembly typically contains a small subset
			of core annotation tracks. New tracks are added as they are generated. In many
			cases, our annotation tracks are contributed by scientists not affiliated with
			UCSC who must first obtain the sequence, repeat-masked data, etc. before they
			can produce their tracks. If you need an annotation that has not appeared
			on an assembly within a month or so of its release, feel free to send an inquiry
			to genome@soe.ucsc.edu.
			Messages sent to this address will be posted to the 
			moderated genome mailing list, which is archived on a public 
			Web-accessible pipermail archive.  This archive may be 
			indexed by non-UCSC sites such as Google.
 |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | What next with the human genome? |   |  |  
	        |  | 
|---|
 |  | 
			Question: "Now that the human genome is 'finished', will there be any more releases?"
 
			Response: Rest assured that work will continue. There will be updates to the assembly over the next 
			several years. This has been the case for all other finished (i.e. essentially complete) genome
			assemblies as gaps are closed. For example, the C. elegans genome has been 
			"finished" for several years, but small bits of sequence are still being 
			added and corrections are being made. NCBI will continue to coordinate the human
			genome assemblies in collaboration with the individual chromosome coordinators, and 
			UCSC will continue to QC the assembly in conjunction with NCBI (and, to a lesser extent,
			Ensembl). UCSC, NCBI, Ensembl, and others will display the new releases on their
			sites as they become available.
 |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | Mouse strain used for  mouse genome sequence |   |  |  
	        |  | 
|---|
 |  | 
			Question: "What strain of mouse was used for the Mus musculus genome?"
 
			Response: C57BL/6J.
 |  |  
 |  |  
 
        | 
	    | 
	        | 
		    | UniProt (Swiss-Prot/TrEMBL) display changes |   |  |  
	        |  | 
|---|
 |  | 
			Question: "What has UCSC done to accommodate the changes to 
			 display IDs recently introduced by UniProt (aka
		 	 Swiss-Prot/TrEMBL)?"
 
			Response: Here is a detailed description of the database changes
			we have made to accommodate the UniProt changes. If 
			you are using the proteinID field in our 
			knownGene 
			table or the Swiss-Prot/TrEMBL display ID for indexing
			or cross-referencing other data, we strongly suggest 
			you transition to the UniProt accession number. 
		  	These changes will also affect anyone who is 
			mirroring our site.
 
			
			The latest UniProt Knowledgebase (Release 46.0, 
			Feb. 1st, 2005) was parsed and the results were 
			stored in a newly created database sp050201.
			
			A corresponding database, proteins050201, 
			was constructed based on data in sp050201 
			and other protein data sources.
			
			Two new symbolic database pointers, uniProt 
			and proteome, have been created to point to 
			the two new databases mentioned above. Some parts of 
			our programs use the data in these two 
			DBs.

			   uniProt  ---> sp050201
			   proteome ---> proteins050201

			The existing protein symbolic database pointers, 
			swissProt and proteins, remain 
			unchanged. Some parts of our programs still use these 
			two pointers and the data in their associated protein 
			databases.

			   swissProt ---> sp041115
			   proteins  ---> proteins041115

			Two new tables, spOldNew and 
			uniProtAlias, have been added to the proteome
			database.
 The spOldNew table contains three columns:
 
			acc  -- primary accession number
    			oldDisplayId -- old display ID
    			newDisplayId -- new display ID
			 The uniProtAlias table contains four columns:
 
    			acc -- UniProt accession number
    			alias -- alias (could be acc, old and new
			display IDs, etc.)
    			aliasSrc -- source of the alias type
    			aliasSrcDate -- date of the source data
			 
    			The aliases include primary accessions, secondary 
			accessions, new display IDs, old display IDs, and old 
			display IDs corresponding to new secondary accessions.
			
			Three new functions have been added to 
			kent/src/hg/spDb.c:

			   char *oldSpDisplayId(char *newSpDisplayId);
			   /* Convert from new Swiss-Prot display ID to old display ID */

			   char *newSpDisplayId(char *oldSpDisplayId);
			   /* Convert from old Swiss-Prot display ID to new display ID */

			   char *uniProtFindPrimAcc(char *id);
			   /* Return primary accession given an alias. */

			The uniProtFindPrimAcc() function is enabled 
			by the new uniProtAlias table. 

			We anticipate additional changes down the road and 
			may eventually merge the two sets of protein DB 
			pointers into one set.
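
As a rough illustration of the kind of lookup the uniProtAlias table makes possible, the sketch below searches an in-memory array laid out with the four columns described above. Everything in it is an assumption made for illustration: the function name findPrimAccSketch, the sample rows, the accession and display ID values, and the aliasSrc labels are all made up. The real uniProtFindPrimAcc() works against the database and may differ in how it allocates and returns its result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of a uniProtAlias-style table (four columns, as described above). */
struct aliasRow {
    const char *acc;          /* UniProt accession number */
    const char *alias;        /* alias: accession, old or new display ID, etc. */
    const char *aliasSrc;     /* source of the alias type (labels made up here) */
    const char *aliasSrcDate; /* date of the source data */
};

/* Placeholder rows; none of these values are real UniProt data. */
static const struct aliasRow aliasTable[] = {
    {"X00001", "X00001",      "primaryAcc",   "2005-02-01"},
    {"X00001", "X00000",      "secondaryAcc", "2005-02-01"},
    {"X00001", "OLDID_HUMAN", "oldDisplayId", "2005-02-01"},
    {"X00001", "NEWID_HUMAN", "newDisplayId", "2005-02-01"},
};

/* Return the primary accession for an alias, or NULL if no row matches.
 * Hypothetical stand-in for a database query on the alias column. */
const char *findPrimAccSketch(const char *id)
{
    assert(id != NULL);
    for (size_t i = 0; i < sizeof aliasTable / sizeof aliasTable[0]; i++) {
        if (strcmp(aliasTable[i].alias, id) == 0)
            return aliasTable[i].acc;
    }
    return NULL;
}
```

Because every kind of alias maps back to the same accession column, code that indexes on the accession keeps working whichever display ID a data source happens to use.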
			 
			Currently, the proteinID field of the 
			knownGene table 
			for existing genome releases (hg15, hg16, 
			hg17, mm3, mm4, mm5, rn2, and rn3) uses old 
			Swiss-Prot/TrEMBL display IDs (pre-1 Feb. '05). In 
			the future, we may change this field to show the 
			UniProt accession number. Should we choose not to 
			change the content of the proteinID field, 
			we may consider adding a new field, 
			uniProtAcc.
			 
			If you have any questions about these changes and 
			their impact on your work, please email us at 
			genome@soe.ucsc.edu. 
			Mirror sites may send questions to 
			genome-mirror@soe.ucsc.edu.
			Messages sent to these addresses will be posted to the 
			moderated mailing lists, which are archived on a public 
			Web-accessible pipermail archive.  This archive may be 
			indexed by non-UCSC sites such as Google.
		     |  |  
 |  |  |  |