GenBank/RefSeq Update Deployment
    This page describes how the GenBank/RefSeq update process is deployed. 
    
    This is a proposed setup, not currently implemented. The following
    system setup is required: 
    
      - Create a user genbank on the cluster, the round-robin
      servers, and the GBDB server (hgnfs1).
      Enable sudo to genbank for markd.
- There is currently sufficient disk space on
      /cluster/store5/ for GenBank files and alignments for human,
      mouse and rat. However, disk space should be monitored and may need to be
      increased.
- Set up an rsync server on eieio, accessible from
      the GBDB server.
- Set up the /somewhere/genbank/ directory on the
      GBDB server, owned by genbank, preferably on the same
      file system as /gbdb/ (but not under /gbdb/).
      NFS export it and mount it on the round-robin servers as /genbank/. It should also be available as /genbank/ on the GBDB server.
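
The placement rule for the /genbank/ directory (same file system as /gbdb/, but not under it) can be checked mechanically. The following is a minimal sketch, not part of the proposed setup itself; the function names are hypothetical, and it assumes it runs on the GBDB server, where both directories are local.

```python
import os


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True if both paths live on the same mounted file system."""
    # st_dev identifies the device holding the file; hard links cannot
    # cross devices, so equal values are a precondition for linking.
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def is_under(path: str, parent: str) -> bool:
    """Return True if path is parent itself or lies inside it."""
    path = os.path.realpath(path)
    parent = os.path.realpath(parent)
    return os.path.commonpath([path, parent]) == parent


def check_layout(genbank_dir: str, gbdb_dir: str) -> list:
    """Collect warnings about the directory layout (hypothetical helper)."""
    warnings = []
    if is_under(genbank_dir, gbdb_dir):
        warnings.append(f"{genbank_dir} must not be under {gbdb_dir}")
    if not same_filesystem(genbank_dir, gbdb_dir):
        warnings.append(
            "different file systems: FASTA files would be copied, not hard-linked"
        )
    return warnings
```

An empty list from check_layout means the layout permits hard links between the two hierarchies.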
Download/Processing/Alignment (build)
    
      - These three steps are collectively known as the build
      phase.
- The GenBank root directory is currently at:
     /cluster/store5/genbank/
- 
        Estimates of disk space requirements: 
        
          - download/ -
          50-75 GB, depending on how many previous releases are maintained. Once
          a new release is downloaded and processed (quarterly), old downloaded
          files can be archived.
- processed/ - 25-50 GB
          - Processed files must be maintained as long as some database is
          using sequences from them.
- aligned/ -
          ~3 GB per release per genome assembly
- Cluster-accessible temporary work space - ~2 GB
 Note that these replace data currently kept in other locations; however,
        the downloads now include the HTG sequences, which add several
        gigabytes of data.
- 
        The download, processing, and alignment steps run on the GenBank
        build server, which should have the following attributes: 
        
          - Should have the GenBank root directories as local
          filesystem.
- Should have at least two CPUs.
- Must be able to rsh to kkr1u00 and kk.
 kkstore is probably the best candidate.
- A dedicated user, genbank, allows multiple people to
      manage the jobs.
- A cron job will start the process daily at 1am.
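
To make the disk space estimates for the GenBank root concrete, here is a small worked calculation. It is only a sketch: the function name is hypothetical, and the example assumes one genome assembly each for human, mouse and rat with a single release kept, which the estimates above do not state.

```python
def estimate_disk_gb(assemblies, releases=1):
    """Return (low, high) total disk estimate in GB for the GenBank root."""
    download = (50, 75)    # download/: depends on how many releases are kept
    processed = (25, 50)   # processed/
    aligned_per = 3        # aligned/: ~3 GB per release per genome assembly
    work = 2               # cluster-accessible temporary work space

    aligned = aligned_per * releases * assemblies
    low = download[0] + processed[0] + aligned + work
    high = download[1] + processed[1] + aligned + work
    return low, high


if __name__ == "__main__":
    # Assumption: one assembly each for human, mouse and rat, one release.
    low, high = estimate_disk_gb(assemblies=3)
    print(f"Estimated total: {low}-{high} GB")
```

Under those assumptions the total comes to roughly 86-136 GB; each additional assembly or retained release adds about 3 GB of alignments.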
Round-Robin Database Update
    
      - 
        In order to update the databases on the round-robin servers, each
        server must have access to the processed/ and aligned/ directories.
        FASTA files under the processed/ directory must be copied into the
        /gbdb/genbank/ directory. Since these directories are
        large, they are maintained on the GBDB server for access by
        the round-robin servers.
          - The GBDB server exports a /genbank/ directory, which contains the processed/ and aligned/ directories, to the round-robin servers.
- If possible, the /gbdb/ and /genbank/ directories should be on the same physical file system on the GBDB
          server. This way, the FASTA files under the /gbdb/ directory can be hard links to the ones under the processed/ directory, saving significant disk space. If
          this is not possible, the FASTA files will be copied.
 
- A process running on the GBDB server must be able to rsync files from the GenBank root on the cluster.
- 
        A cron job on the GBDB server polls (with rsync) the GenBank build server
        to determine if new alignments are ready. If so, it performs the following steps:
        
          - Copy new processed/ and aligned/ files
          to the /genbank/ hierarchy, in two passes: one to get the
          data files, and a second to get the index files.
- Update the /gbdb/genbank/ hierarchy with the new
          FASTA files.  If /genbank/ and /gbdb/ are on the same file system, these will be hard links.
- Flag copy as complete.
 
- 
        Each round-robin server periodically examines the
        /genbank/ directory to see if a copy has completed.
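
The link-or-copy step and the completion flag could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: the flag-file name and all function names are hypothetical, since this page does not say how completion is recorded.

```python
import os
import shutil

# Hypothetical flag-file name; the page does not specify one.
COMPLETE_FLAG = "copy.complete"


def link_or_copy(src: str, dst: str) -> str:
    """Hard-link src to dst, falling back to a copy. Returns 'link' or 'copy'."""
    parent = os.path.dirname(dst)
    if parent:
        os.makedirs(parent, exist_ok=True)
    if os.path.exists(dst):
        os.remove(dst)
    try:
        os.link(src, dst)
        return "link"
    except OSError:
        # Typically raised when src and dst are on different file systems.
        shutil.copy2(src, dst)
        return "copy"


def flag_complete(genbank_dir: str) -> None:
    """Write the flag last, once all data and index files are in place."""
    tmp = os.path.join(genbank_dir, COMPLETE_FLAG + ".tmp")
    with open(tmp, "w") as fh:
        fh.write("done\n")
    # Renaming makes the flag appear in a single step for local readers.
    os.replace(tmp, os.path.join(genbank_dir, COMPLETE_FLAG))


def copy_completed(genbank_dir: str) -> bool:
    """The kind of check a round-robin server might run periodically."""
    return os.path.exists(os.path.join(genbank_dir, COMPLETE_FLAG))
```

Writing the flag only after every file has been copied means a round-robin server that sees the flag can assume the data and index files are complete.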