Procedure for Creating a Mirror Site for the UCSC Genome Browser

			    The following procedure provides you with step-by-step
			    instructions to incrementally create a mirror of the
			    UCSC Genome Browser. You may choose to set up either a
			    full mirror browser or a partial one, depending on your disk
			    space and needs.  See also the Minimal Browser Installation
			    instructions on genomewiki.
			     
                             
                            A license is required for commercial download 
                            and/or installation of the Genome Browser binaries 
                            and source code.
                            No license is needed for academic, nonprofit, and 
                            personal use.  To purchase a license, see our
                            License Instructions.

                            Space required:
			    The amount of data available in the Genome Browser is
			    growing constantly. To determine the size of any of
			    the download directories mentioned in these instructions,
			    use the rsync "-n" option on the directory prior
			    to actually transferring the data.  For instance, to find
			    the size of the /gbdb directory, run:
 
rsync -navP rsync://hgdownload.cse.ucsc.edu/gbdb/
The rsync options used in this command are: 
-n, --dry-run         show what would have been transferred
-a, --archive         archive mode, equivalent to -rlptgoD
-v, --verbose         increase verbosity
-P                    equivalent to --partial --progress
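
To pull just the byte count out of that dry run, a small filter along these lines may help. It is a sketch that assumes rsync ends its verbose output with its usual "total size is N" summary line; the function name rsync_total_bytes is made up for this example.

```shell
# Hypothetical helper: read rsync's verbose output on stdin and print the
# number from its "total size is N" summary line, with commas removed.
rsync_total_bytes() {
    sed -n 's/^total size is \([0-9,]*\).*/\1/p' | tr -d ','
}

# Example use (needs network access, so it is left commented out):
# rsync -navP rsync://hgdownload.cse.ucsc.edu/gbdb/ | rsync_total_bytes
```

The result is in bytes, so divide by 1024 three times for an approximate figure in gigabytes.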
 
			    Mirror site questions may be directed to the 
			    mailing list genome-mirror@soe.ucsc.edu.
			    
			    Messages sent to this address will be posted to the 
			    moderated genome-mirror mailing list, which is 
			    archived on a public Web-accessible pipermail 
			    archive.  This archive may be indexed by non-UCSC 
			    sites such as Google. 
			     
			    Subscribe to the genome-mirror 
		 	    mailing list.  
 
1. Install Apache server and MySQL server
2. Get all the html files and most of the text files
3. Get the data for each individual genome assembly and install databases
4. Obtain the /gbdb data file area
5. Set up database
6. Get executable files
7. Set up Hgcentral session preference table
8. Create a $WEBROOT/trash apache user writable directory for temp files
9. Set up HgFixed database
10. Set up the protein databases
11. Set up the UniProt database
12. Set up the visiGene database
 
 
 
 
1. Install Apache server and MySQL server
 
We will not provide consulting on the operation or configuration
of Apache or MySQL. If you are not familiar with the setup of
Apache or MySQL, you will have to find someone at your site who
is. We use a reference platform of CentOS 5.5 for all the steps
described here.
Since early March 2005, the static html pages for the browser expect
Apache to be configured with the XBitHack option enabled:
#
# Executable files will be processed for SSI
#
XBitHack on
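
One way to confirm the directive is present in a given configuration file is a quick grep. This is only a sketch: the function name has_xbithack is hypothetical, the path of your Apache configuration file depends on your installation, and the check looks at a single file without following any included files.

```shell
# Hypothetical helper: succeed if the given Apache config file contains an
# uncommented "XBitHack on" line (directive names are case-insensitive).
has_xbithack() {
    grep -Eiq '^[[:space:]]*XBitHack[[:space:]]+on([[:space:]]|$)' "$1"
}

# Example use; the httpd.conf path below is an assumption:
# has_xbithack /etc/httpd/conf/httpd.conf && echo "XBitHack is on"
```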
Find the location of your web pages. For example, on a CentOS Linux
system using a stock apache RPM, the web pages are stored in
/var/www/html. We will refer to this directory as
WEBROOT in this document and you should substitute your real
path for WEBROOT whenever you see that in our write-up. But, before
doing this, you MUST be aware of the following:
 
THE FOLLOWING ASSUMES YOU ARE NOT ALREADY SERVING UP
DATA USING YOUR APACHE SERVER. BEAR IN MIND THE SELECTION OF
THE WEBROOT WILL OVERWRITE ANY EXISTING DATA IN THAT AREA.
Find the location of your cgi-bin directory. It should be under
the parent directory of the WEBROOT directory. For example, on
a CentOS Linux system using a stock apache RPM, the path of
the cgi-bin directory is /var/www/cgi-bin. We will refer to
this directory as CGI_BIN in this document and you should substitute your
real path for CGI_BIN whenever you see that.
Next, find the location of your MySQL data. For example, on a CentOS Linux system using stock RPMs, this is located in
/var/lib/mysql. We will refer to this directory as
MYSQLDATA in this document and you should substitute
your real path for MYSQLDATA whenever you see that.
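
If you work in a Bourne-style shell, it can be convenient to define the three placeholders once as shell variables. The values below are the stock CentOS example paths quoted in this step, used as defaults; they are assumptions about your system, so substitute your own.

```shell
# Defaults are the CentOS example paths from this step; override as needed.
WEBROOT=${WEBROOT:-/var/www/html}
CGI_BIN=${CGI_BIN:-/var/www/cgi-bin}
MYSQLDATA=${MYSQLDATA:-/var/lib/mysql}

# Warn (without stopping) about any directory that does not exist yet.
for dir in "$WEBROOT" "$CGI_BIN" "$MYSQLDATA"; do
    [ -d "$dir" ] || echo "warning: $dir does not exist" >&2
done
```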
 
 
2. Get all the html files and most of the text files
 
Obtain the software package rsync. If it isn't already
installed on your system, obtain it from
http://rsync.samba.org
 Test the rsync connection:
 
rsync -nav --progress rsync://hgdownload.cse.ucsc.edu
This should respond with: 
genome          UCSC Human Genome Downloads
htdocs          UCSC Human Genome Web Site Htdocs
goldenPath      UCSC Human Golden Path Downloads
cgi-bin         UCSC Human Genome Web Site CGI Binaries x86_64
cgi-bin-i386    UCSC Human Genome Web Site CGI Binaries i386
gbdb            UCSC Human Genome Browser Gbdb Config Files
archives        UCSC Human Genome Browser Archived Config Files
mysql           UCSC Human Genome Raw Mysql Tables
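
A scripted version of this check might confirm that the modules used later in this procedure appear in the listing. The sketch below reads the listing on stdin; the function name check_modules is hypothetical.

```shell
# Hypothetical helper: read an rsync module listing on stdin and report any
# of the modules used in this procedure that are missing from it.
check_modules() {
    listing=$(awk '{print $1}')
    status=0
    for module in htdocs genome gbdb cgi-bin; do
        if ! printf '%s\n' "$listing" | grep -Fqx -- "$module"; then
            echo "missing module: $module" >&2
            status=1
        fi
    done
    return $status
}

# Example use (needs network access):
# rsync rsync://hgdownload.cse.ucsc.edu | check_modules
```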
Determine the destination of the copy ($WEBROOT) and fire off the production copy (270 MB). The trailing slash is important!
 
 
rsync -avP rsync://hgdownload.cse.ucsc.edu/htdocs/ $WEBROOT/
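
The slash matters because rsync treats a source path ending in "/" as "copy the contents of this directory", whereas without it the directory itself is created inside the destination. In a script, one way to guard against a missing slash is to normalize the path first; the helper name with_slash is invented for this sketch.

```shell
# Hypothetical helper: print a path so that it ends in a slash, adding one
# only if it is missing.
with_slash() {
    printf '%s/\n' "${1%/}"
}

# Example use (needs network access):
# rsync -avP "$(with_slash rsync://hgdownload.cse.ucsc.edu/htdocs)" "$(with_slash "$WEBROOT")"
```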
 
3. Get the data for each individual genome assembly and install databases
 
Determine which of the databases you are going to mirror.  To see all
available databases, use the "SHOW DATABASES;" command on the public MySQL server.
Get the data for each of the desired databases.  For instance, to get the
Human March 2006 full data set, do:
 
mkdir -p $WEBROOT/goldenPath/hg18/database/
rsync -avP --delete --max-delete=20 \
  rsync://hgdownload.cse.ucsc.edu/genome/goldenPath/hg18/database/ \
  $WEBROOT/goldenPath/hg18/database/
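
To mirror several assemblies, the same two commands can be generated in a loop. The sketch below only prints the commands so they can be reviewed before being run; the function name assembly_cmds and the choice of hg18 and mm8 are illustrative, and WEBROOT defaults to the CentOS example path.

```shell
WEBROOT=${WEBROOT:-/var/www/html}   # assumed default; use your own path

# Hypothetical helper: print (not run) the mkdir and rsync commands that
# fetch the database dumps for one assembly.
assembly_cmds() {
    db=$1
    dest="$WEBROOT/goldenPath/$db/database/"
    printf 'mkdir -p %s\n' "$dest"
    printf 'rsync -avP --delete --max-delete=20 rsync://hgdownload.cse.ucsc.edu/genome/goldenPath/%s/database/ %s\n' "$db" "$dest"
}

for db in hg18 mm8; do
    assembly_cmds "$db"
done
```

Once the printed commands look right, the loop output can be piped to sh to run them.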
 
4. Obtain the /gbdb data file area
You will need the portions of /gbdb used by the browser:
 
rsync -avP --delete --max-delete=20 \
  rsync://hgdownload.cse.ucsc.edu/gbdb/ /gbdb/
 
5. Set up database
 Use the following table to identify the freeze
date ($FREEZEDATE) and the database version ($DBVERSION) for each genome
assembly:
 
 
| $FREEZEDATE | $DBVERSION |
|---|---|
| hg18 | hg18 |
| hg17 | hg17 |
| hg16 | hg16 |
| 10april2003 | hg15 |
| panTro2 | panTro2 |
| panTro1 | panTro1 |
| canFam2 | canFam2 |
| canFam1 | canFam1 |
| bosTau2 | bosTau2 |
| bosTau1 | bosTau1 |
| mm8 | mm8 |
| mm7 | mm7 |
| mm6 | mm6 |
| mm5 | mm5 |
| rn4 | rn4 |
| rnJun2003 | rn3 |
| rnJan2003 | rn2 |
| galGal3 | galGal3 |
| galGal2 | galGal2 |
| monDom4 | monDom4 |
| monDom1 | monDom1 |
| xenTro2 | xenTro2 |
| xenTro1 | xenTro1 |
| danRer4 | danRer4 |
| danRer3 | danRer3 |
| danRer2 | danRer2 |
| danRer1 | danRer1 |
| tetNig1 | tetNig1 |
| fr1 | fr1 |
| ci1 | ci1 |
| dm2 | dm2 |
| dm1 | dm1 |
| droYak1 | droYak1 |
| droAna1 | droAna1 |
| dp2 | dp2 |
| droVir1 | droVir1 |
| droMoj1 | droMoj1 |
| apiMel2 | apiMel2 |
| apiMel1 | apiMel1 |
| anoGam1 | anoGam1 |
| ce2 | ce2 |
| ceMay2003 | ce1 |
| cbJul2002 | cb1 |
| sacCer1 | sacCer1 |
| scApr2003 | sc1 |

After connecting to the MySQL server, create a database called "$DBVERSION"
corresponding to $FREEZEDATE in the table above.
mysql> create database $DBVERSION;
Create the tables for the "$DBVERSION" database from the data downloaded in step 3.
Grant permissions on the "$DBVERSION" database, for example:
 
 
mysql> grant SELECT, CREATE TEMPORARY TABLES on $DBVERSION.*
   to $USERNAME@$HOSTNAME identified by "$PASSWORD";
 To make this command work, replace $USERNAME, $HOSTNAME and
$PASSWORD with real values. In this example, public access to the "$DBVERSION"
database is restricted to read-only.
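
For the table-creation step, a common approach is to run each table's .sql file and then bulk-load the matching .txt.gz dump. The sketch below assumes the directory fetched in step 3 holds such table.sql / table.txt.gz pairs and that your MySQL server permits LOAD DATA LOCAL INFILE; it only prints the commands, without any login options, and the function name load_cmds is hypothetical.

```shell
# Hypothetical helper: for every TABLE.sql in a dump directory, print the
# command that creates the table and, if TABLE.txt.gz exists, the command
# that loads its rows. Nothing is executed.
load_cmds() {
    db=$1
    dir=$2
    for sql in "$dir"/*.sql; do
        [ -e "$sql" ] || continue          # no .sql files at all
        tbl=$(basename "$sql" .sql)
        printf 'mysql %s < %s\n' "$db" "$sql"
        if [ -e "$dir/$tbl.txt.gz" ]; then
            printf "zcat %s | mysql --local-infile=1 %s -e 'LOAD DATA LOCAL INFILE \"/dev/stdin\" INTO TABLE %s'\n" \
                "$dir/$tbl.txt.gz" "$db" "$tbl"
        fi
    done
}

# Example use; the path assumes the hg18 download from step 3:
# load_cmds hg18 "$WEBROOT/goldenPath/hg18/database"
```

Printing first makes it easy to add your own mysql account options before running anything against the server.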
6. Get executable files
Pre-compiled (by gcc 4.1.2 (Red Hat 4.1.2-50)) x86_64 64-bit
binaries can be fetched with the rsync command:
 rsync -avP rsync://hgdownload.cse.ucsc.edu/cgi-bin/ $CGI_BIN/
 
 There are a number of data files that are also used in this directory.
This rsync will fetch them all.  If you need i386 (x86) 32-bit binaries, run
the following rsync in addition to, and after, the above rsync
to replace the 64-bit binaries:
 
 rsync -avP rsync://hgdownload.cse.ucsc.edu/cgi-bin-i386/ $CGI_BIN/
 
 
In the CGI_BIN directory, create an hg.conf file that specifies the
$USERNAME, $HOSTNAME and $PASSWORD used for the "$DBVERSION" database.
Remember to write the actual values: variables should not be used in this file.
 
Set up environment variables needed by the Makefile to install the binaries for the website in the correct place.
setenv GLOBAL_CONFIG_FILE $CGI_BIN/hg.conf
setenv HGCGI $CGI_BIN
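
The setenv lines above are csh syntax. In a Bourne-style shell such as bash, the equivalent would be the following, where the CGI_BIN default is the CentOS example path from step 1 (an assumption; use your own).

```shell
CGI_BIN=${CGI_BIN:-/var/www/cgi-bin}   # assumed default; use your own path

# Same two variables as the setenv lines, in sh/bash syntax.
export GLOBAL_CONFIG_FILE="$CGI_BIN/hg.conf"
export HGCGI="$CGI_BIN"
```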
Download the released zipped version of the source files from 
here.
Follow the instructions for how to compile
the files.
 The source tree can also be obtained via Git.
 
Existing Mirror Sites: Please note the change history for the
hgcentral database.  Table structure changes have corresponding
structure changes in the source code.  The browser version can
be seen in the title window decoration of the genome browser display
page or, since v62, in the source tree file src/hg/inc/versionInfo.h.
  Change history for hgcentral database tables: 
 2005-03-28 table liftOverChain structure change, several fields added.
	(browser version v102)
 2004-12-21 added two new tables: clade and genomeClade to support
	new pulldown menus in the gateway page.  (browser version v93)
 2004-08-02 table blatServers has canPcr field added
	(browser version v74)
 2004-06-01 table dbDb has hgNearOk, hgPbOk and sourceName fields added
	(browser version v66)
 2004-04-16 table liftOverChain added (browser version v60)
 
7. Set up the "hgcentral" tables 
Download the schema for the hgcentral database  here.
Create a hgcentral database 
 
   mysql> create database hgcentral;
 
Add the hgcentral tables 
 
   mysql -youraccountoptions hgcentral < hgcentral.sql
Create a user/password with the ability to update and insert. Many folks
 start out with the same user that they use for reading browser databases
 and then later create a separate user with higher privilege. However, to
 get started, consider trying the same user.
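
When you do create a separate, more privileged user, the grant could look like the one printed by this sketch. The privilege list is taken from the comments in the sample hg.conf in this step; the function name central_grant_sql and its example arguments are hypothetical, and the "identified by" form follows the grant shown in step 5 (newer MySQL versions require a separate CREATE USER statement instead).

```shell
# Hypothetical helper: print a grant statement giving a user the hgcentral
# privileges listed in the sample hg.conf. Nothing is sent to MySQL.
central_grant_sql() {
    user=$1
    host=$2
    password=$3
    printf 'grant SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER on hgcentral.*\n'
    printf '   to %s@%s identified by "%s";\n' "$user" "$host" "$password"
}

# Example use; review the output, then pipe it to mysql as an admin user:
# central_grant_sql myhguser localhost myhguserpassword | mysql -youraccountoptions
```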
Add that user to the hg.conf. A sample hg.conf is below:
###########################################################
# Config file for the UCSC Human Genome server.
#	This file specifies the host and user for MySQL
#	database access.
#
# the format is in the form of name/value pairs
# written 'name=value' (note that there is no space between
# the name and its value.)
###########################################################
# db.host is the name of the MySQL host to connect to
db.host=localhost
# db.user is the username to use when connecting to the host
#	This user only needs SELECT permissions for read-only access
db.user=myhguser
# this is the password to use with the above username
db.password=myhguserpassword
db.trackDb=trackDb
###########################################################
# central.host is the name of the host of the central MySQL
# database where items common to all versions of the genome
# and the user database is stored.  central.db is the name of
# the database to access on that host for this information.
central.host=localhost
central.db=hgcentral
#
# The central.user needs SELECT, INSERT, UPDATE, DELETE,
#	CREATE, DROP and ALTER permissions for hgcentral
#	to allow maintenance of the session and user Db's
#
central.user=myhguser
central.password=myhguserpassword
central.domain=.mydomain.edu
###########################################################
# backupcentral is used when the primary central DB fails.
#	This can be identical to the central entries as above.
backupcentral.host=localhost
backupcentral.db=hgcentral
backupcentral.user=myhguser
backupcentral.password=myhguserpassword
backupcentral.domain=.mydomain.edu
# Change this default documentRoot if different in your installation,
#       to allow some of the browser cgi binaries to find help text
#       files
browser.documentRoot=/usr/local/apache/htdocs
#  New browser function as of March 2007, allowing saved genome browser
#       sessions into genomewiki
wiki.host=genomewiki.ucsc.edu
wiki.userNameCookie=wikidb_mw1_UserName
wiki.loggedInCookie=wikidb_mw1_UserID
# New browser function as of March 2007.  Future browser code will
#       have this on by default, and can be turned off with =off
#   Initial release of this function requires it to be turned on here.
browser.indelOptions=on
#       personalize the background of the browser with a specified jpg
#       floret.jpg is the standard UCSC default
browser.background=/images/floret.jpg
#  new option for track reordering functions, August 2006
hgTracks.trackReordering=on
#  New browser function as of April 2007, custom track data is kept
#       in a database instead of in trash files.  This function requires
#       several other factors to be in place before it will work.
#  In this first implementation, this is an optional feature, but
#       approximately by the end of the year 2007, this will be
#       required.
#
#       See also:
#       http://genomewiki.ucsc.edu/index.php?title=Using_custom_track_database
#  Uncomment these settings and provide host, user, and password
#  settings
# customTracks.host=<your specific host name>
# customTracks.user=<your specific MySQL user for this function>
# customTracks.password=<MySQL password for specified user>
# customTracks.useAll=yes
# customTracks.tmpdir=/data/tmp
#       tmpdir of /data/tmp is the default location if not specified
#       here
#       Set this to a directory as recommended in the genomewiki
#       discussion mentioned above.
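
Because hg.conf must contain literal values rather than variables, one option is to let the shell expand your settings while writing the file. The sketch below writes only the database and central entries from the sample above; the function name write_hg_conf is hypothetical, and the remaining settings would still need to be added by hand.

```shell
# Hypothetical helper: write a minimal hg.conf with literal values.
# The shell expands $user and $password while writing, so the resulting
# file contains no variables.
write_hg_conf() {
    conf=$1
    user=$2
    password=$3
    cat > "$conf" <<EOF
db.host=localhost
db.user=$user
db.password=$password
db.trackDb=trackDb
central.host=localhost
central.db=hgcentral
central.user=$user
central.password=$password
EOF
}

# Example use; the values are the placeholders from the sample above:
# write_hg_conf "$CGI_BIN/hg.conf" myhguser myhguserpassword
```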
 
8. Create a "trash" directory
The cgi programs use a temporary area to create and store images
used by the browser. This directory is by default looked for in
$WEBROOT/trash. You should make this directory and allow the user
that runs the web server write access to it.
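
A minimal sketch of this step follows. The function name make_trash is hypothetical, and the web server account name ("apache" in the commented example) is an assumption that varies between systems.

```shell
# Hypothetical helper: create the trash directory under the given web root.
# Ownership is handled separately because chown normally requires root.
make_trash() {
    mkdir -p "$1/trash" && chmod 755 "$1/trash"
}

# Example use, run as root; "apache" is an assumed account name:
# make_trash "$WEBROOT" && chown apache "$WEBROOT/trash"
```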
9. Create the hgFixed database
Download a copy of the dumped hgFixed tables.
rsync -avP --delete --max-delete=20 \
   rsync://hgdownload.cse.ucsc.edu/genome/goldenPath/hgFixed/ \
   $WEBROOT/goldenPath/hgFixed/
Create a hgFixed database
mysql> create database hgFixed;
 Import the data in a similar fashion as listed above. 
 
10. Create the protein databases
Download a copy of the dumped protein tables.
rsync -avP --delete --max-delete=20 \
    rsync://hgdownload.cse.ucsc.edu/genome/goldenPath/proteinDB/ \
    $WEBROOT/goldenPath/proteinDB/
Get a list of all of the protein databases like so:
 
mysql> SELECT DISTINCT(proteomeDb) FROM hgcentral.gdbPdb ORDER BY proteomeDb;
Create a database for each one, e.g.:
mysql> create database proteins080707;
mysql> create database proteins090821;
and so on
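
With many protein databases, the create statements can be generated from the list of names instead of typed by hand. This sketch reads one database name per line on stdin; the function name create_db_sql is hypothetical.

```shell
# Hypothetical helper: turn a list of database names (one per line on
# stdin) into "create database" statements, skipping blank lines.
create_db_sql() {
    while read -r db; do
        if [ -n "$db" ]; then
            printf 'create database %s;\n' "$db"
        fi
    done
}

# Example use; -N suppresses the column-name header in mysql's output:
# mysql -N -e 'SELECT DISTINCT(proteomeDb) FROM hgcentral.gdbPdb ORDER BY proteomeDb' | create_db_sql
```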
Import the data from proteinDB/proteins*/database in a similar
	fashion as listed above.
Create a symlink to the most recent proteins database, e.g.:
	/var/lib/mysql/proteome -> proteins090821
 
11. Create the UniProt databases
Please note usage rights for the UniProt database.
Download a copy of the dumped UniProt tables.
rsync -avP --delete --max-delete=20 \
  rsync://hgdownload.cse.ucsc.edu/genome/goldenPath/uniProt/ \
  $WEBROOT/goldenPath/uniProt/
Get a list of all of the UniProt databases like so:
 
mysql> SHOW DATABASES LIKE 'sp%';
Create a database for each one, e.g.:
mysql> create database sp080707;
mysql> create database sp090821;
and so on
Import the data from sp*/database in a similar fashion as listed above.
Create a symlink to the most recent uniProt database, e.g.:
	/var/lib/mysql/uniProt  -> sp090821
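
The symlinks in steps 10 and 11 both point at the newest dated database. A sketch of that selection follows; it assumes the names end in a YYMMDD date as in the examples, so that sorting by name also sorts by date, and the function name link_latest is hypothetical.

```shell
# Hypothetical helper: inside DATADIR, find the last directory (by name)
# matching PREFIX followed by digits and point the symlink LINK at it.
link_latest() {
    datadir=$1
    prefix=$2
    link=$3
    latest=$(cd "$datadir" && ls -d "$prefix"[0-9]* 2>/dev/null | sort | tail -n 1)
    [ -n "$latest" ] || return 1
    ln -sfn "$latest" "$datadir/$link"
}

# Example use, with the MySQL data directory from step 1:
# link_latest /var/lib/mysql proteins proteome
# link_latest /var/lib/mysql sp uniProt
```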
 
12. Create the visiGene database
Download a copy of the dumped visiGene tables.
rsync -avP --delete --max-delete=20 \
   rsync://hgdownload.cse.ucsc.edu/genome/goldenPath/visiGene/ \
   $WEBROOT/goldenPath/visiGene/
Create a visiGene database
mysql> create database visiGene;
 Import the data in a similar fashion as listed above. 
 Should you have any comments or questions, please contact
genome-mirror@cse.ucsc.edu.