Following the release of the completed human 
		  	  genome sequence in April 2003, the scientific 
			  community intensified its  efforts to mine the data 
			  for clues about how the body works in health and in 
			  disease. A basic requirement for this understanding of
			  human biology is the ability to identify and 
			  characterize sequence-based functional elements 
			  through experimentation and computational analysis. 
			  In September 2003, the  NHGRI introduced the ENCODE 
			  project to facilitate the identification and analysis 
			  of the complete set of functional elements in the 
			  human genome sequence. During the initial pilot and 
			  technology development phases of the project, 44 
			  regions—approximately 1% of the human 
			  genome—were targeted for analysis using a 
			  variety of experimental and computational methods with
		 	  the aim of assembling a comprehensive encyclopedia of 
			  the functional elements in these regions, showing 
			  their identity and precise location. The pilot project
			  established protocols for scaling up to full-genome 
			  coverage and  produced a wealth of data, elucidating 
			  elements such as protein-coding genes, transcription 
			  units, protein binding sites, conserved DNA elements, 
			  features of chromatin assembly and modification, and 
			  single nucleotide polymorphisms.

			  During the pilot phase, UCSC collected, processed, 
			  and released more than 500 ENCODE data sets 
			  representing a broad range of experimental methods and
			  diverse tissues and cell lines. In addition to the two
			  designated ENCODE cell lines, HeLa cervical carcinoma 
			  and GM06990 lymphoblastoid, more than 40 cell types 
			  are represented. A substantial proportion of the data 
			  is the product of chromatin immunoprecipitation 
			  (ChIP-chip) experiments used to determine binding 
			  sites for transcription factors. Eight groups 
			  have produced ChIP-chip data from four microarray 
			  platforms, investigating more than two dozen 
			  transcription factors and histone modifications. 
			  Several experimental groups have provided time-course 
			  data and data from varied cell treatments. Other notable 
			  experimental data include localization of RNA 
			  transcription starts, identification of regions of 
			  DNaseI hypersensitivity, and temporal profiling of DNA
			  replication.

			  Accompanying the ENCODE experimental data, UCSC also 
			  hosts the ENCODE high-quality gene set, provided by 
			  the Gencode project, and a  variety of computationally
			  derived annotations, including gene predictions from  
			  the ENCODE Gene Annotation Assessment Project (EGASP),
			  pseudogene annotations from four projects, and RNA 
			  secondary structure predictions from two contributors.
			  The comparative 
			  genomics tracks include multiple alignments of 28 
			  vertebrate species in the ENCODE regions, produced 
			  with three sequence alignment methods and scored for 
			  conservation with four different algorithms. The 
			  Genome Browser provides a full set of genome-wide 
			  comparative genomics tracks that complement the 
			  ENCODE tracks, including a genome-wide multiple 
			  alignment covering nearly 30 vertebrate species.

			  You can find more information about the ENCODE pilot 
			  phase at UCSC in the news 
			  archives.