| 
The data for the working draft are organized hierarchically by
chromosome and by the sequenced-clone contigs within each chromosome.
At the top level there are 25 folders; 22 of these are for the
numbered chromosomes (autosomes), folders X and Y are for the sex
chromosomes, and Un is for clone contigs that cannot be placed
confidently on a chromosome.  Each of the 25 chromosomal folders
contains a separate clone contig folder for each of the clone contigs
for that chromosome.  
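
As a rough illustration of the layout just described, the following Python sketch lists the 25 top-level folder names and collects the .fa and .agp files found under a local copy of the hierarchy. The function names and the idea of a local root directory are assumptions made for this example; the text does not say how the files inside a clone contig folder are named beyond their suffixes, so the sketch matches on suffix only.

```python
from pathlib import Path


def chromosome_folders():
    """Return the 25 top-level folder names: 1..22, X, Y and Un."""
    return [str(n) for n in range(1, 23)] + ["X", "Y", "Un"]


def find_contig_files(root):
    """Map 'chromosome/contig' to its {suffix: path} files under root.

    Assumes the layout root/<chromosome>/<clone contig>/<files>.
    Chromosome folders that are missing locally are skipped.
    """
    found = {}
    for chrom in chromosome_folders():
        chrom_dir = Path(root) / chrom
        if not chrom_dir.is_dir():
            continue
        contig_dirs = sorted(p for p in chrom_dir.iterdir() if p.is_dir())
        for ctg_dir in contig_dirs:
            files = {p.suffix: p for p in ctg_dir.iterdir()
                     if p.suffix in (".fa", ".agp")}
            found[f"{chrom}/{ctg_dir.name}"] = files
    return found
```

Calling find_contig_files on the directory where the full data set was unpacked would give one entry per clone contig, keyed in the same "chromosome/contig" style that appears in the first column of the .agp files.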
There are two primary files in each clone contig folder; these have
suffixes .fa and .agp respectively. The .fa file gives the working
draft sequence for the clone contig. The format is FASTA format,
e.g.
>NT_077768
GAATTCTCTGTAACACTAAGCTCTCTTCCTCAAAACCAGAGGTAGATAGA
ATGTGTAATAATTTACAGAATTTCTAGACTTCAACGATCTGATTTTTTAA
ATTTATTTTTATTTTTTCAGGTTGAGACTGAGCTAAAGTTAATCTGTGGC
... 
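
For readers who want to load a .fa file programmatically, here is a minimal sketch of a FASTA reader in Python. The function name read_fasta is hypothetical, and the sketch assumes the common convention that the record name is the first whitespace-delimited word after ">" and that sequence lines are simply concatenated.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


example = [
    ">NT_077768",
    "GAATTCTCTGTAACACTAAGCTCTCTTCCTCAAAACCAGAGGTAGATAGA",
    "ATGTGTAATAATTTACAGAATTTCTAGACTTCAACGATCTGATTTTTTAA",
]
for name, seq in read_fasta(example):
    print(name, len(seq))
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, which avoids holding the raw text and the joined sequence in memory at the same time.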
The .agp file is a kind of index that tells how the .fa file is
built. It looks like 
17/NT_077768    1       6538    1       D       AC021317.18     122280  128817  -
17/NT_077768    6539    56206   2       D       AC021317.18     128918  178585  -
17/NT_077768    56207   56306   3       N       100     fragment        yes
17/NT_077768    56307   117971  4       D       AC021317.18     47188   108852  -
17/NT_077768    117972  170563  5       F       AC115992.13     23659   76250   +
17/NT_077768    170564  274979  6       D       AC124789.11     1       104416  -
...
Each line represents either an actual sequence record or a gap
(unless it begins with "#", in which case it is a comment).
If the line represents an actual sequence record then it has the form
<chromosome/ctg> <start-in-ctg> <end-in-ctg> <number> <type> <accession>.<version> <start> <end> <orientation>
and if it represents a gap it has the form
<chromosome/ctg> <start-in-ctg> <end-in-ctg> <number> N <number-of-Ns> <kind> <bridged?>
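
The two record layouts above can be turned into a small parser. The following Python sketch is one way to do it, not a reference implementation: the class and function names are invented for this example, and it assumes that fields are separated by runs of whitespace, as they appear to be in the sample lines.

```python
from dataclasses import dataclass


@dataclass
class SequenceRecord:
    contig: str        # <chromosome/ctg>
    ctg_start: int     # <start-in-ctg>
    ctg_end: int       # <end-in-ctg>
    number: int        # <number>
    seq_type: str      # <type>
    accession: str     # <accession>
    version: str       # <version>
    start: int         # <start>
    end: int           # <end>
    orientation: str   # <orientation>


@dataclass
class GapRecord:
    contig: str        # <chromosome/ctg>
    ctg_start: int     # <start-in-ctg>
    ctg_end: int       # <end-in-ctg>
    number: int        # <number>
    n_count: int       # <number-of-Ns>
    kind: str          # <kind>
    bridged: str       # <bridged?>


def parse_agp_line(line):
    """Return a SequenceRecord or GapRecord, or None for comments and blanks."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    f = line.split()  # assumption: whitespace-separated fields
    head = (f[0], int(f[1]), int(f[2]), int(f[3]))
    if f[4] == "N":
        return GapRecord(*head, int(f[5]), f[6], f[7])
    accession, _, version = f[5].partition(".")
    return SequenceRecord(*head, f[4], accession, version,
                          int(f[6]), int(f[7]), f[8])


rec = parse_agp_line("17/NT_077768 56207 56306 3 N 100 fragment yes")
print(rec.kind, rec.n_count)
```

Keeping gaps and sequence records as separate types means later code can branch on the record type instead of re-checking the fifth column.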
The positions <start-in-ctg> and <end-in-ctg> are the
start and end positions for where the sequence is to be put in the .fa
file. For a sequence record, the positions <start> and
<end> are the start and end positions of where the sequence came
from in the GenBank record <accession>.<version>. The
field <orientation> tells whether or not the sequence must be
reverse complemented before it is inserted into its place in the .fa
file. For example, the records above mean that to build the .fa file
for clone contig NT_077768 from chromosome 17 you take 
AC021317 version 18, residues 122280 to 128817, reverse complemented, followed by 
AC021317 version 18, residues 128918 to 178585, reverse complemented, followed by 
a gap of 100 Ns, followed by 
AC021317 version 18, residues 47188 to 108852, reverse complemented, followed by 
AC115992 version 13, residues 23659 to 76250, followed by 
AC124789 version 11, residues 1 to 104416, reverse complemented, followed by 
... 
The joins abut perfectly: each record starts at the contig position
immediately after the one where the previous record ends.  In a
sequence record, <type> can be
    F - Finished, 
    A - in Active finishing, 
    D - Draft, 
    P - PreDraft, 
    O - Other sequence
and in a gap record it is always N.  
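
The assembly procedure described above (take each clone range, reverse complement it where the orientation calls for it, and insert Ns for gaps) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the pipeline actually used to build the .fa files: positions are treated as 1-based and inclusive, which matches the lengths in the sample records; "-" is taken to mean reverse complement; fields are assumed to be whitespace-separated; and clone_sequences is a hypothetical dictionary mapping "accession.version" to the full GenBank sequence.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_contig(agp_lines, clone_sequences):
    """Assemble the contig sequence described by the given .agp lines.

    clone_sequences is a hypothetical dict: 'accession.version' -> sequence.
    Raises ValueError if records do not abut or a length does not match.
    """
    pieces = []
    next_pos = None
    for line in agp_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split()
        ctg_start, ctg_end = int(f[1]), int(f[2])
        if next_pos is not None and ctg_start != next_pos:
            raise ValueError(f"record {f[3]} does not abut the previous one")
        if f[4] == "N":
            piece = "N" * int(f[5])
        else:
            start, end = int(f[6]), int(f[7])
            # Convert 1-based inclusive positions to a Python slice.
            piece = clone_sequences[f[5]][start - 1:end]
            if f[8] == "-":
                piece = reverse_complement(piece)
        if len(piece) != ctg_end - ctg_start + 1:
            raise ValueError(f"record {f[3]} has an inconsistent length")
        pieces.append(piece)
        next_pos = ctg_end + 1
    return "".join(pieces)


toy_clones = {"AC1.1": "AAACCCGGGT", "AC2.1": "ACGTACGT"}
toy_agp = [
    "1/ctg 1 4 1 D AC1.1 2 5 -",
    "1/ctg 5 7 2 N 3 fragment yes",
    "1/ctg 8 10 3 F AC2.1 1 3 +",
]
print(build_contig(toy_agp, toy_clones))
```

The toy accessions and sequences are made up so that the example runs without any downloaded data. The two ValueError checks mirror properties visible in the sample records: each span in the contig has the same length as its source range, and consecutive records leave no holes.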
The <number> field numbers the records sequentially within the clone
contig. In a gap record, <number-of-Ns> is the size of the gap and
<kind> is one of
    fragment - a gap between two sequence contigs (also called a 
        "sequence gap") 
    split_finished - a special sized gap between two finished sequence 
        contigs 
    clone - a gap between two clones that do not overlap
    contig - a gap between clone contigs in the genome layout (also called 
        a "layout gap")
    centromere - a gap inserted for the centromere 
    short_arm - a gap inserted at the start of an acrocentric chromosome 
    heterochromatin - a gap inserted for an especially large region of 
        heterochromatin (may include the centromere as well.) 
    telomere - a gap inserted for a telomere
<bridged?> is "yes" if there is a cDNA, a BAC end pair, or a plasmid
end pair that spans the gap; otherwise it is "no".

We provide three ways you can download these .fa and .agp files:
 
 full data set: the entire hierarchy in a zipped format.
 by chromosome: one zipped file for each chromosome containing all
     the sequence ordered along that chromosome.
 by individual clone contig: separate files, not zipped, for each 
     clone contig.
 
 |