GenBank/RefSeq Data Processing Step
    The data processing step extracts data from the downloaded GenBank files
    into a format that is ready for import into the database. 
    Algorithm
    
      - Run the gbProcessStep script, which:
          - Examines each full and daily download file to determine which
          files need to be created. For each set of source files, it checks
          whether mrna.md5 and est.*.md5 files exist in the appropriate
          processed/ directory.
          - For each missing *.md5 file, runs the gbProcessSeqs script, which:
              - Parses flat-files with gbToFaRa into data files that
              are used to update the browser databases. An index file
              (*.gbidx) is created to locate each sequence and
              version. All species remain grouped together; splitting by
              species at this step would generate a very large number of
              small files.
              - Checksums (md5) the data files. The checksum file serves as
              an indicator that the task completed successfully.
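
The existence check described above can be sketched as follows. This is only an illustration, not the gbProcessStep implementation: the function name and the `partitions` argument (for example "mrna", "est.aa") are hypothetical, and how the real script enumerates the partitions is not described here.

```python
from pathlib import Path


def missing_checksums(processed_dir, partitions):
    """Return the partitions whose <partition>.md5 file is absent.

    `partitions` is a hypothetical list such as ["mrna", "est.aa"];
    a missing checksum file is taken to mean the partition still
    needs to be processed.
    """
    base = Path(processed_dir)
    return [p for p in partitions if not (base / f"{p}.md5").is_file()]
```

Only the partitions returned by such a check would then be passed on for processing; partitions that already have a checksum file are treated as complete.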
 
      - $gbRoot/data/processed/ - data extracted from the NCBI
        flat-files
          - genbank.${ver}/
              - full/
                  - mrna.ra.gz - meta-data for mRNAs
                  - mrna.fa - fasta sequence data
                  - mrna.gbidx - index file
                  - mrna.md5 - checksums of all mRNA files
                  - est.aa.ra.gz - files for EST accessions
                  starting with AA (case insensitive).
                  - est.aa.fa, est.aa.gbidx, est.aa.cksum
                  - est.ab.ra.gz, est.ab.fa, est.ab.gbidx, est.ab.cksum
                  - ...
              - daily.${date}/
                  - mrna.ra.gz, ...
                  - est.aa.ra, ...
                  - ...
          - refseq.${ver}/
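
Because the mrna.md5 file doubles as a completion marker, it makes sense to write it last, after all data files exist. The sketch below shows one way to do that. It is not the pipeline's own code: the function name, the default suffix list, and the md5sum-style line format ("digest, two spaces, file name") are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def write_checksum_file(directory, partition,
                        suffixes=(".ra.gz", ".fa", ".gbidx")):
    """Checksum the data files of one partition and write <partition>.md5.

    The suffix list and the output line format are assumptions.
    """
    base = Path(directory)
    lines = []
    for suffix in suffixes:
        data_file = base / f"{partition}{suffix}"
        digest = hashlib.md5()
        with open(data_file, "rb") as fh:
            # Read in 1 MiB chunks so large sequence files are not
            # loaded into memory at once.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        lines.append(f"{digest.hexdigest()}  {data_file.name}\n")
    out = base / f"{partition}.md5"
    # Write to a temporary name, then rename, so the marker file only
    # appears once it is complete.
    tmp = out.with_name(out.name + ".tmp")
    tmp.write_text("".join(lines))
    tmp.replace(out)
    return out
```

With this ordering, an interrupted run leaves no *.md5 file behind, so the partition is picked up again on the next pass.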
 
GenBank index file
    A GenBank index file is a tab-separated file in the format: 
    acc version moddate organism
    The name of the file is either mrna.gbidx or
    est.*.gbidx, and it is associated with the *.ra and
    *.fa files of the same name. The columns are: 
    
      - acc - GenBank or RefSeq accession
      - version - Version number, not including the
      accession
      - moddate - Modification date, in 2002-22-08 format
      - organism - Organism name
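
As an illustration of the four-column layout, the following sketch reads index lines into records. The names `GbIndexEntry` and `parse_gbidx` are hypothetical, and skipping blank lines and lines starting with "#" is a defensive assumption rather than documented behavior. The modification date is kept as a string instead of being parsed.

```python
from typing import Iterable, Iterator, NamedTuple


class GbIndexEntry(NamedTuple):
    acc: str        # GenBank or RefSeq accession
    version: int    # version number, without the accession
    moddate: str    # modification date, kept as text
    organism: str   # organism name


def parse_gbidx(lines: Iterable[str]) -> Iterator[GbIndexEntry]:
    """Yield one entry per tab-separated index line."""
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: blank lines and '#' comment lines carry no data.
        if not line or line.startswith("#"):
            continue
        acc, version, moddate, organism = line.split("\t")
        yield GbIndexEntry(acc, int(version), moddate, organism)
```

It could be used on an open file, for example `for entry in parse_gbidx(open("mrna.gbidx")): ...`, where the path is a placeholder.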