| 
      | 
          | 
	      | Genome Graphs User's Guide |  
 |  |  
     
      | 
          | 
	      |  | 
|---|
 |  | 
		Genome Graphs is a tool for displaying genome-wide data sets 
		such as the results of genome-wide SNP association studies, 
		linkage studies and homozygosity mapping. 
		Using the Genome Graphs tool, you can:
                 
                 - upload several sets of genome-wide data and display them simultaneously
                 - click on an area of interest and go directly to the genome browser at that position
                 - set a significance threshold for your data and view only regions that meet that threshold
                 - view the genes that exist in areas where your data meet your significance threshold

		To return to Genome Graphs from any other location on the 
		Genome Browser website, use your browser's Back button, or 
		press Home on the blue navigation bar, then 
		press the Genome Graphs link.
		 
		Note that only the "standard" chromosomes are 
		displayed in the Genome Graphs display; haplotype and 
		mitochondrial chromosomes are not displayed.
		 
		This User's Guide is aimed at both novice and advanced Genome Graphs 
		users. If you are new to the Genome Graphs 
		tool, read the Quick Start section to 
		learn about the basics using some sample data. Advanced users 
		may want to proceed directly to the section that addresses 
		a particular area of functionality in detail. |  |  
 |  |  
  
 | 
  | 
    | Formatting, Uploading & Importing Data |  
     |  | 
|---|
 |  | 
     Formatting Data
     Genome Graphs allows you to upload data from files that reside 
  on your computer. Several file formats are accepted by the 
  program.  For all formats there is a single line for each marker. 
  Each line starts with information on the marker, and ends with 
  the numerical values associated with that marker. The markers 
  can be of one of the following types: 
  - chromosome base: e.g. chr1 130000 
    (Note that the first base in a chromosome is considered position 0.)
  - STS Marker: e.g. RH75228
  - dbSNP rsID: e.g. rs12345
  - Affymetrix 500k Gene Chip: e.g. SNP_A-1780270
  - Affymetrix Genome-Wide SNP Array 6: e.g. SNP_A-8575125
  - Affymetrix SNP Array 6 Structural-Variation: e.g. CN_47396
  - Illumina HumanHap300 Bead Chip: e.g. rs3934834
  - Illumina HumanHap550 Bead Chip: e.g. rs3094315
  - Illumina HumanHap650 Bead Chip: e.g. rs3094315
  - Agilent CGH 244A: e.g. A_14_P112718

  The marker-value pairs in each line of the file can be separated 
  with a single space, a tab, or a comma. The file can 
  contain multiple values for each marker. In that case, a 
  separate graph will be created for each value column in the input file.

  For example, chromosome base markers with only one 
  value associated with the marker would be entered like this:

  chrX 100000 1.23

  dbSNP rsID markers with two values associated 
  with the marker would be entered like this:

  rs10218492 0.384 0.882

  The Genome Graphs program will map the marker IDs to the genome. 
  In cases where the marker maps to more than one location in 
  the genome, the value(s) in your input file will be associated 
  with each location.

  If the value associated with your marker is positive, do not 
  include a sign (e.g. '+'). Include a sign ('-') only if the value is
  negative.

  Note that markers can only be mapped to assemblies for which 
  there already exists a track of the type that contains your 
  marker type. You cannot, for example, use dbSNP rsID markers 
  for the cow genome, as it does not have a SNP track. 
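
  The line format described above can be sketched as a small parser. This is only an illustration of the rules stated in this section, not the code Genome Graphs itself uses; the function name parse_marker_line and its markers_are_positions argument are hypothetical.

```python
import re


def parse_marker_line(line, markers_are_positions):
    """Split one upload line into (marker, values).

    markers_are_positions=True means the line starts with a chromosome
    name and a base position; otherwise it starts with a single marker ID.
    (Hypothetical helper; both names are assumptions for illustration.)
    """
    # Fields may be separated by a single space, a tab, or a comma.
    fields = re.split(r"[ \t,]", line.strip())
    n_marker_fields = 2 if markers_are_positions else 1
    if len(fields) <= n_marker_fields:
        raise ValueError(f"no values found in line: {line!r}")
    if markers_are_positions:
        # The first base in a chromosome is position 0.
        marker = (fields[0], int(fields[1]))
    else:
        marker = fields[0]
    # Every remaining column is a numerical value; each column would
    # become its own graph.
    values = [float(v) for v in fields[n_marker_fields:]]
    return marker, values


print(parse_marker_line("chrX 100000 1.23", markers_are_positions=True))
print(parse_marker_line("rs10218492 0.384 0.882", markers_are_positions=False))
```

  Running a file through a check like this before uploading can catch lines with missing values or non-numeric columns early.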
   Uploading Data
 Once you have created your input file, you must upload it to
  Genome Graphs. From the main Genome Graphs page, choose the
  clade, genome, and assembly to which your data pertain.
  If you are unsure of the UCSC assembly name, you can check
  this page.
  Now, press the upload button to go to the upload page.

  To upload a file in any of the supported formats, locate the 
  file on your computer using the controls next to file 
  name, and then submit. The other controls on this form are 
  optional, though filling them out will sometimes enhance the 
  display. In general, the controls that default to "best 
  guess" can be left alone, since the guess is almost always 
  correct. The controls for display min and max values and connecting 
  lines can also be set later via the configuration page. 

  Here is a description of each control:

  - name of data set: Displayed in the graph drop-down in 
    Genome Graphs and as the track name in the Genome Browser. Only the
    first 16 characters are visible in some contexts. For data 
    sets with multiple graphs, this is the first part of the 
    name, shared with all members of the data set.
  - description: A short sentence describing the 
    data set. Displayed in the Genome Graphs and Genome Browser 
    configuration pages, and as the center label in the Genome Browser.
  - file format: Controls whether the upload file is 
    a tab-separated, comma-separated, or space-separated table.
  - markers are: Describes how to map the data to 
    chromosomes. Either the first column of the 
    file is an ID of some sort, or the first column is a chromosome 
    and the next a base. The IDs can be SNP rs numbers, STS marker 
    names, or IDs from any of the supported genotyping platforms.
  - column labels: Controls whether the first row of the 
    upload file is interpreted as labels or data. If the 
    first row contains text in the numerical fields, or if the 
    mapping fields are empty, it is interpreted by "best 
    guess" as labels. This is generally correct, but you can 
    override this interpretation by explicitly setting the control.
  - display min value/max value: Set the range of the 
    data set that will be plotted. If left blank, the range will 
    be taken from the min/max values in the data set itself. For 
    all data sets to share the same scale, you will usually need 
    to set this.
  - label values: A comma-separated list of numbers 
    for the vertical axis. If left blank, the axis will be 
    labeled at the 1/3 and 2/3 points of your data range.
  - draw connecting lines: Lines are drawn connecting data points 
    that are separated by this number of bases or fewer.
  - file name, or Paste URLs or data: Specify the data to upload: 
    either a file on your local computer, a URL at which the data file can be
    found, or the data itself pasted in. If entries are made in both fields, the file name will take
    precedence.

 Importing Data
 In addition to supplying your own genome-wide data files, you can also
import existing database tables from an assembly into the 
Genome Graphs tool. Any table containing positional information can be 
imported. This includes tables of the following types: BED, PSL, wiggle, 
MAF, and bedGraph. Custom track tables can be imported 
as well. The tables made by Genome Graphs (chromGraph) cannot be imported: 
they are already in the format used by the tool, so no conversion is 
necessary. All tables imported into Genome Graphs will be converted into 
a custom track of type chromGraph using a window size of 10,000 bases.

To import a table or custom track, choose the group, track, and table 
from the lists, then press the submit button. The other controls are optional, 
though completing them will enhance the display. The controls for display 
min and max values and connecting lines can also be set later via the configuration 
page.

Here is a description of each control:

  - name of data set: This will be displayed in the graph list 
    in the Genome Graphs tool and as the track name in the Genome Browser. 
    Only the first 16 characters are visible in some contexts. For data 
    sets with multiple graphs, this is the first part of the name, shared 
    with all members of the data set.
  - description: Enter a short sentence describing the data set. 
    It will be displayed in the Genome Graphs tool and in the Genome Browser.
  - display min value/max value: Set the range of the data set to 
    be plotted. If left blank, the range will be taken from the min and max 
    values in the data set itself. If you would like all of your data sets to 
    share the same scale, you will need to set this.
  - label values: A comma-separated list of numbers for the vertical 
    axis. If left blank, the axis will be labeled at the 1/3 and 2/3 points 
    of the data range.
  - draw connecting lines: Lines connecting data points separated 
    by no more than this number of bases are drawn.
  - depth or coverage: When importing positional tables, you 
    can choose to convert those tables to the chromGraph format by using 
    either the depth or coverage conversion method. Both 
    conversion methods use a non-overlapping window size of 10,000 bases 
    when converting to the chromGraph format. In the depth method, 
    the weighted average for each 10,000 base window is assigned to a single
    point in the center of this window. The coverage method, by contrast, 
    is binary: if there is even one point in the input table in that 
    10,000 base window, the resulting graph will have a value of 1 for that 
    range. 
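
  The difference between the two conversion methods can be illustrated with a rough sketch. It is not the actual conversion code: the guide does not spell out how the "weighted average" is computed, so the sketch assumes it means the mean per-base depth of the items falling in each window. The function name to_windows and the 0-based, half-open coordinates are also assumptions.

```python
WINDOW = 10_000  # non-overlapping window size stated in this guide


def to_windows(items, method):
    """Convert (start, end) items on one chromosome to {window_center: value}.

    Hypothetical helper. Coordinates are assumed 0-based and half-open.
    """
    covered = {}  # window index -> number of item bases inside that window
    for start, end in items:
        for w in range(start // WINDOW, (end - 1) // WINDOW + 1):
            w_start, w_end = w * WINDOW, (w + 1) * WINDOW
            overlap = min(end, w_end) - max(start, w_start)
            covered[w] = covered.get(w, 0) + overlap

    result = {}
    for w, bases in sorted(covered.items()):
        center = w * WINDOW + WINDOW // 2
        if method == "coverage":
            # Binary: any item at all in the window gives a value of 1.
            result[center] = 1.0
        elif method == "depth":
            # Assumed interpretation: mean per-base depth over the window.
            result[center] = bases / WINDOW
        else:
            raise ValueError(f"unknown method: {method!r}")
    return result


items = [(0, 5_000), (2_500, 7_500), (25_000, 25_001)]
print(to_windows(items, "depth"))
print(to_windows(items, "coverage"))
```

  In this sketch a single-base item still yields a coverage value of 1 for its window, while its depth value is very small, which is the practical distinction between the two methods.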
   |  |  
 |  |  
 
| 
 | 
  | Quick Start |  
   |  | 
|---|
 |  | 
   Use the examples in this section of the User's Guide to get a feel for how 
   the tool works. Refer to other sections 
   in this User's Guide for details and instructions for more advanced features.
    
   The Genome Graphs tool comes pre-loaded with sample data.  These sample 
   data sets are from real-world genome-wide studies.  Use these data sets to 
   quickly see what the tool looks like when data is displayed.  To
   view the sample data, choose a data set from the 
   graph drop-down list, then choose your desired display color from 
   the in drop-down list.  The tool will display the data set 
   directly above the chromosomes in Genome Graphs. 
   Read on to learn how to 
   customize the display.
    Example #1 — SNPs on chr22
 
   Follow these steps to display in Genome Graphs all of the highest quality 
   SNPs on chromosome 22 for the hg18 assembly whose predicted functional 
   role is "coding non-synonymous" 
   (where there is a change in the peptide for the allele with respect to 
   the reference assembly). Note that there are no SNPs on the p-arm of 
   chromosome 22. 
    
   This data set is formatted in the "marker value" 
   style. The markers are dbSNP rsIDs. The associated 
   value is +1 if the SNP is on the positive strand, and 
   -1 if the SNP is on the negative strand. Here are the first 
   ten rows of the data file:
 
 
rs1007298       +1
rs1007863       +1
rs10154509      +1
rs10154678      +1
rs10154785      +1
rs1018448       +1
rs10212022      +1
rs1022478       +1
rs1042311       +1
rs1042435       +1
 Step 1. Upload the data into the Genome Graphs tool
   Copy the entire sample data set 
   into a text editor and save the file to your computer. This data 
   set is associated with the human assembly: hg18 
   (Mar. 2006). Be sure to 
   configure the Genome Graphs tool to use the hg18 assembly 
   like so:
 
 
 
clade:		Vertebrate
genome:		Human
assembly:	Mar. 2006

   Upload the file into the Genome Graphs tool. 
   You can configure each control on the upload 
   page, or just leave them set to their default values.

   The upload process may take some time, as the program is 
   mapping each rsID in the input file to its location(s) in the genome.
 Step 2. Display the graph in Genome Graphs
   Now that your input file has been uploaded to the server, you will want 
   to display it in the Genome Graphs tool. To display your uploaded data, 
   simply choose the graph name from the 
   graph drop-down list, then choose your desired display color 
   from the in drop-down list. 
   Your graph will be displayed directly above the chromosomes 
   in Genome Graphs. You should see the data plotted directly 
   above chromosome 22.
 Step 3. View the graph in the Genome Browser
   From the Genome Graphs display, press anywhere on the graph 
   or on chromosome 22 to open the
   Genome Browser for hg18 centered at that location on chr22. 
   The graph will be drawn as a track near the top of the Genome
   Browser display.
 
    
    |  |  
 |  |  
 
| 
 | 
  | Displaying Data in Genome Graphs |  
   |  | 
|---|
 |  | 
   Once you have uploaded your data, you will want to display it in the 
   Genome Graphs tool. To display your uploaded data, simply choose the 
   graph name from the 
   graph drop-down list, then choose the color in which you 
   would like it to be displayed from the in drop-down list. 
   Your graph will be displayed directly above the chromosomes 
   in Genome Graphs. Read on to learn how to 
   customize the display.
    
                 Configuring the Display
 
		Configuring the graphs display

		To go to the configuration page, press the configure button
		on the main Genome Graphs page.  This is the page from which you can
		configure many overall aspects of the Genome Graphs display.
		Individual graphs can also be configured (see the next section
		for help on that).
 
		On this page you will find the following controls:
		 
		- image width: controls the overall width of the graphs
		  display on the main Genome Graphs page. The default is
		  620 pixels.
		- graph height: controls the height of the graph(s) in
		  the space above each chromosome. The default is
		  27 pixels.
		- graphs per line: controls how many graphs are displayed
		  on each line in the space above each chromosome. For example,
		  if you set this value to two, the display will superimpose
		  two graphs on top of each other on one line. The axis label
		  for the first graph will appear on the left side of the display
		  and the axis for the second graph on the right side.
		- lines of graphs: controls how many sets of graphs will
		  appear above each chromosome. For example, if you set this
		  value to 2, the display will make room for two lines of
		  graphs (each at the graph height above) in the space
		  above each chromosome.
		- chromosome layout: controls how the chromosomes are laid
		  out in the Genome Graphs display. You can choose to view one
		  or two chromosomes on each horizontal line in the display. 
		  Alternatively, you can set up the display such that all of the
		  chromosomes appear in one long line. If you choose this layout,
		  you may want to adjust the width of the image (image width 
		  above).
		- numerical labels: check this box if you would like to see
		  axis labels to the right/left of the display. If you did not specify
		  label values when you uploaded your file, the numerical
		  labels will default to the 1/3 and 2/3 points of the range
		  between the min and max values in your data input file.
		- highlight missing: check this box if you would like to
		  see the areas in your graph where there is no data. Note that if
		  you are displaying more than one graph, this attribute only
		  pertains to the first graph.
		- region padding: controls the size of the data 
		  regions. The data points in your graphs that exceed 
		  the significance threshold are padded by this number
		  of bases on either side. 
		  The default places 25,000 bases on each side. 

		When you have completed configuring the display, press the 
		submit
		button to return to the Genome Graphs display.
		 
		Configuring individual graphs
 
		Near the bottom of the Configuration page, you will see a list of
		the graphs that you have uploaded. Click on the hyperlinked graph
		name to configure that graph. This configuration pertains to 
		the Genome Graphs view. 
		 
		You can set the range of the display 
		by editing the display min value and max value settings. This will
		restrict the Genome Graphs display for this graph to that
		data range. The axis will be labeled at 1/3 and 2/3 of the 
		data range that you set.
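
		As a rough illustration of where those default labels fall, the sketch below computes the 1/3 and 2/3 points of a display range. It assumes the labels sit at min + span/3 and min + 2*span/3; the function name default_axis_labels is hypothetical and is not part of the tool.

```python
def default_axis_labels(display_min, display_max):
    """Return the assumed default label positions for a display range.

    Hypothetical helper: places labels one third and two thirds of the
    way from display_min to display_max.
    """
    span = display_max - display_min
    return (display_min + span / 3, display_min + 2 * span / 3)


print(default_axis_labels(0, 9))
print(default_axis_labels(-3, 3))
```

		If those positions are awkward for your data, the label values control lets you supply your own comma-separated list instead.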
		 
		If your data is sparse, you may want to draw lines between
		your data points. You can configure that by editing the
		draw connecting lines between markers separated by
		up to ... bases value.  The default value is 25,000,000
		bases.
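
		The connecting-lines rule can be sketched as follows. This is an illustration only: it assumes lines are drawn between neighbouring data points on the same chromosome, and the names segments_to_draw and max_gap are hypothetical.

```python
DEFAULT_MAX_GAP = 25_000_000  # default value stated in this guide


def segments_to_draw(positions, max_gap=DEFAULT_MAX_GAP):
    """Return the pairs of neighbouring positions that would be connected.

    Hypothetical helper: positions are base coordinates of data points
    on a single chromosome.
    """
    ordered = sorted(positions)
    # Connect neighbours separated by max_gap bases or fewer.
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a <= max_gap]


points = [0, 10_000_000, 50_000_000]
print(segments_to_draw(points))                       # default gap
print(segments_to_draw(points, max_gap=40_000_000))   # wider gap
```

		With the default setting only the first two points are joined; raising the value also joins the second and third.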
		 
		When you have completed configuring the display, press the
		submit
		button twice to return to the Genome Graphs display.
		 
                 Setting a Significance Threshold
 
		Most genome-wide data has some amount of noise and is only interesting when 
		the data values are above a certain value.  You can set this value using the 
		significance threshold input box.  Enter a decimal number in this input box
		and press Enter.  The display will now have a light gray line across the 
		graph at this data value.  If you have more than one graph displayed, the 
		significance threshold only pertains to the graphs that contain the significance
		threshold in the displayed data range.
		 
		The significance threshold works in concert with the browse
		regions and sort genes buttons; it will affect the regions 
		that are displayed once you press either of these two buttons.
		 
		To open the Genome Browser with a view of all of the regions 
		in your graph that include data points that pass the significance 
		threshold, press the browse regions button. This will open the 
		Genome Browser with a navigation pane on the left side of the 
		screen. This pane will contain links to all regions which pass 
		your significance threshold.
		Note that if you are displaying more than one graph, the 
		significant regions are based only on the first graph in the 
		display list.
		 
		To view a list of genes which are in regions that pass the 
		significance threshold, press the sort genes button.  This will 
		open the Gene Sorter with only the genes that are in significant 
		locations with respect to your data.
		 
		If you would rather view all of your regions without restricting the 
		output to only those regions that pass the significance threshold, simply 
		delete any values from the significance threshold input box and press 
		Enter before pressing browse regions.
		 
                 Setting a Data Region
 
		The data region is the span of bases that will be added to either side
		of the data points in your graphs which exceed the significance 
		threshold. Set the data region by editing the region 
		padding value on the configuration page. 
		The combination of setting the data region and the significance 
		threshold will affect two things:
		 
		- the regions displayed in the Genome Browser 
		  after you press the browse regions button
		- the genes displayed in the Gene Sorter 
		  after you press the sort genes button

		For example, take a data set that contains the following data:
		 
		chr2 100100000 2.3
		chr2 100100500 4.5
		chr2 100101000 1.2

		If you set the significance threshold at 4.0, one data 
		point in the data set passes that threshold. If you then set the 
		region padding to 200, the one significant data point will be 
		padded on each side by 200 base pairs.  In that case, the 
		only resulting significant data region will be 
		chr2:100,100,300-100,100,700.

		If instead you set the region padding
		to 2,000, the one significant data point will be
		padded on each side by 2,000 base pairs.  In that case, the 
		resulting significant data region will be 
		chr2:100,098,500-100,102,500.
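
		The arithmetic in this example can be reproduced with a short sketch. It is not the tool's own code: the function name significant_regions is hypothetical, a point is assumed to pass when its value is at or above the threshold, region starts are assumed not to go below 0, and any merging of overlapping regions is left out.

```python
def significant_regions(points, threshold, padding=25_000):
    """Pad each data point that passes the threshold on both sides.

    Hypothetical helper. points is a list of (chrom, position, value);
    the default padding matches the region padding default in this guide.
    """
    regions = []
    for chrom, pos, value in points:
        if value >= threshold:  # assumption: "passes" means >= threshold
            regions.append((chrom, max(0, pos - padding), pos + padding))
    return regions


data = [
    ("chr2", 100100000, 2.3),
    ("chr2", 100100500, 4.5),
    ("chr2", 100101000, 1.2),
]
for pad in (200, 2_000):
    for chrom, start, end in significant_regions(data, 4.0, padding=pad):
        print(f"padding {pad}: {chrom}:{start:,}-{end:,}")
```

		The two printed regions match the ones given in the example above for a padding of 200 and 2,000 bases.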
 
	       |  |  
 |  |  
     
      | 
          | 
	      | Viewing Data in the Genome Browser |  
	      |  | 
|---|
 |  | 
		To view your graphs in the Genome Browser, press the 
		browse regions button. This will open the Genome
		Browser with your graph(s) displayed as track(s). You can
		configure and edit your track 
		as you can any other track in the Genome Browser. 
		In addition to the Genome Browser, you will also see a pane 
		on the left-hand side, which contains links to all of the 
		significant regions in your data.
		Please note that if you are displaying more than one graph in 
		Genome Graphs, the significant regions are based only on the 
		first graph in the display list. 
		 
		You can also navigate to the Genome Browser by clicking directly 
		on a graph or chromosome in Genome Graphs. The Genome Browser 
		will open with a 1,000,000 bp window centered on the location 
		on which you clicked.
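
		The window that opens after a click can be sketched like this. The clamping to the chromosome ends is an assumption, as are the function name browser_window and the chrom_size argument; the chromosome size used in the example is made up for illustration.

```python
WINDOW_BP = 1_000_000  # window size stated in this guide


def browser_window(clicked_pos, chrom_size):
    """Return (start, end) of a window centered on the clicked position.

    Hypothetical helper: the window is clipped so that it never extends
    past either end of the chromosome (an assumption).
    """
    half = WINDOW_BP // 2
    start = max(0, clicked_pos - half)
    end = min(chrom_size, clicked_pos + half)
    return start, end


print(browser_window(30_000_000, chrom_size=50_000_000))
print(browser_window(100_000, chrom_size=50_000_000))
```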
		 
	       |  |  
 |  |  
     
      | 
          | 
	      | Viewing Data in the Gene Sorter |  
	      |  | 
|---|
 |  | 
		To view the set of genes that are in 
		significant regions in your data, 
		press the sort genes button. This will open the Gene
		Sorter with a filter to include only genes that are located 
		in regions in your input data that are above the significance 
		threshold. Please note that if you are displaying more than 
		one graph in Genome Graphs, the significant genes are based 
		only on the first graph in the display list.
		 
		If the graph was uploaded using markers, then a custom 
		Gene Sorter column with the same name as the graph 
		will be created.  This column will list all markers for each
                gene that have values above the significance threshold.
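
		The gene filter can be pictured with the sketch below. It assumes a gene is kept when it overlaps any significant region at all, which the guide does not state explicitly; the function name genes_in_regions and the gene names and coordinates are hypothetical.

```python
def genes_in_regions(genes, regions):
    """Return names of genes that overlap at least one significant region.

    Hypothetical helper. genes is a list of (name, chrom, start, end);
    regions is a list of (chrom, start, end). Intervals are half-open.
    """
    hits = []
    for name, g_chrom, g_start, g_end in genes:
        for r_chrom, r_start, r_end in regions:
            same_chrom = g_chrom == r_chrom
            if same_chrom and g_start < r_end and r_start < g_end:
                hits.append(name)
                break  # one overlapping region is enough
    return hits


regions = [("chr2", 100098500, 100102500)]
genes = [
    ("geneA", "chr2", 100090000, 100099000),  # made-up gene, overlaps
    ("geneB", "chr2", 100200000, 100250000),  # made-up gene, outside
    ("geneC", "chr3", 100098500, 100102500),  # made-up gene, other chromosome
]
print(genes_in_regions(genes, regions))
```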
		 
	       |  |  
 |  |  
     
      | 
          | 
	      |  | 
|---|
 |  | 
		There are several ways to delete your data once it has 
		been uploaded. If you are viewing your data as a track in 
		the Genome Browser, you can click on the mini-button 
		or track control for the track and delete the track 
		using the Remove custom track button. You can also
		choose to reset your cart, which will reset the browser 
		interface settings to their defaults, as well as delete 
		all custom tracks and data.  Do this by visiting the 
		gateway page and clicking the hyperlink "Click 
		here to reset".
		 
		Your data will be saved on our server for at least 48 hours 
		from the time you last access it, unless it is saved in a
                Session.
		 
	       |  |  
 |  |  
     
      | 
          | 
	      |  | 
|---|
 |  | 
		To calculate how well correlated your data 
		sets are with one another, press the correlate button. This will 
		calculate and display the correlation coefficient (R) 
		between each pair of your data sets. R, also known as Pearson's 
		correlation coefficient, is a measure of the extent to which two 
		graphs move together. The value of R ranges between 
		-1 and 1. A positive R indicates that the graphs 
		tend to move in the same direction, while a negative R 
		indicates that they tend to move in opposite directions.
		R-Squared (which is simply R*R) measures 
		how much of the variation in one graph can be explained by 
		a linear dependence on the other graph. R-Squared 
		ranges from 0, when the two graphs have no linear relationship, to 1, 
		when one graph is an exact linear function of the other.
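
		For reference, Pearson's R and R-Squared can be computed as in the sketch below. It uses the standard textbook formula and assumes the two graphs have already been reduced to paired values at the same positions; how Genome Graphs pairs up points from different data sets is not described in this guide. The function name pearson_r is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("R is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


r = pearson_r([1, 2, 3], [6, 4, 2])
print(r, r * r)  # the graphs move in opposite directions
```

		A constant graph has no variation to explain, so the sketch raises an error in that case rather than returning a number.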
		 
		To return to the Genome Graphs, press the return to 
		graphs button.
		 
	       |  |  
 |  |  
   |