---
title: "Using ggpicrust2"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using ggpicrust2}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

## Introduction

`ggpicrust2` provides a practical workflow for PICRUSt2 downstream analysis:

- convert KO profiles to pathway-level abundance when needed
- run differential abundance analysis with multiple methods
- annotate and visualize pathway-level results
- inspect taxa-level contribution using PICRUSt2 per-sequence outputs
- perform GSEA when pathway-set analysis is more appropriate than single-feature testing

This vignette focuses on the general package workflow. For a deeper GSEA walkthrough, see the dedicated `gsea_analysis` vignette.

## Installation

```{r installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("cafferychen777/ggpicrust2")

library(ggpicrust2)
library(tibble)
```

## One-command workflow

If you want a fast end-to-end run from abundance data to annotated differential abundance output, start with `ggpicrust2()`:

```{r one-command, eval = FALSE}
data("ko_abundance")
data("metadata")

results <- ggpicrust2(
  data = ko_abundance,
  metadata = metadata,
  group = "Environment",
  pathway = "KO",
  daa_method = "LinDA",
  ko_to_kegg = TRUE,
  order = "pathway_class",
  p_values_bar = TRUE,
  x_lab = "pathway_name"
)

# Access the main outputs
results[[1]]$plot
head(results[[1]]$results)
```

Use this route when you want a fast default analysis. Use the stepwise workflow below when you need more control over data preparation or visualization.

## Stepwise pathway workflow

### Convert KO abundance to KEGG pathway abundance

```{r ko-to-kegg, eval = FALSE}
kegg_pathway_abundance <- ko2kegg_abundance(data = ko_abundance)
head(kegg_pathway_abundance[, 1:3])
```

### Run differential abundance analysis

```{r daa, eval = FALSE}
daa_results <- pathway_daa(
  abundance = kegg_pathway_abundance,
  metadata = metadata,
  group = "Environment",
  daa_method = "ALDEx2"
)

head(daa_results)
```

### Annotate pathway results

```{r annotation, eval = FALSE}
annotated_daa <- pathway_annotation(
  pathway = "KO",
  daa_results_df = daa_results,
  ko_to_kegg = TRUE
)

head(annotated_daa)
```

### Visualize pathway-level results

```{r errorbar, eval = FALSE}
pathway_errorbar(
  abundance = kegg_pathway_abundance,
  daa_results_df = annotated_daa,
  Group = "Environment"
)
```

```{r heatmap-pca, eval = FALSE}
sig_pathways <- annotated_daa$feature[annotated_daa$p_adjust < 0.05]

if (length(sig_pathways) > 0) {
  pathway_heatmap(
    abundance = kegg_pathway_abundance[sig_pathways, , drop = FALSE],
    metadata = metadata,
    group = "Environment"
  )
}

pathway_pca(
  abundance = kegg_pathway_abundance,
  metadata = metadata,
  group = "Environment"
)
```

## Taxa contribution workflow

PICRUSt2 can output per-sequence contribution files that explain which taxa are driving predicted functional shifts. `ggpicrust2` now supports a full contribution workflow.

### Read PICRUSt2 contribution files

```{r contrib-readers, eval = FALSE}
# For pred_metagenome_contrib.tsv
contrib_data <- read_contrib_file("pred_metagenome_contrib.tsv")

# For pred_metagenome_strat.tsv
strat_data <- read_strat_file("pred_metagenome_strat.tsv")
```

### Aggregate to a taxonomic level

```{r contrib-aggregate, eval = FALSE}
taxa_contrib <- aggregate_taxa_contributions(
  contrib_data = contrib_data,
  taxonomy = your_taxonomy_table,
  tax_level = "Genus",
  top_n = 10,
  daa_results_df = daa_results
)

head(taxa_contrib)
```

`aggregate_taxa_contributions()` accepts either:

- normalized contribution data from `read_contrib_file()`
- stratified abundance data from `read_strat_file()`

Use `daa_results_df` or `pathway_ids` when you want to focus only on pathways that were significant in your pathway-level analysis.

### Visualize taxa-level drivers

```{r contrib-plots, eval = FALSE}
taxa_contribution_bar(
  contrib_agg = taxa_contrib,
  metadata = metadata,
  group = "Environment",
  facet_by = "function"
)

taxa_contribution_heatmap(
  contrib_agg = taxa_contrib,
  n_functions = 20
)
```

This step is useful when pathway-level significance is not enough and you need to identify which taxa are contributing to the change.

## GSEA workflow

Use GSEA when you want pathway-set level inference from KO or EC abundance rather than testing each pathway independently.

```{r gsea, eval = FALSE}
gsea_results <- pathway_gsea(
  abundance = ko_abundance %>% column_to_rownames("#NAME"),
  metadata = metadata,
  group = "Environment",
  pathway_type = "KEGG",
  method = "camera"
)

annotated_gsea <- gsea_pathway_annotation(
  gsea_results = gsea_results,
  pathway_type = "KEGG"
)

visualize_gsea(
  gsea_results = annotated_gsea,
  plot_type = "barplot",
  n_pathways = 15
)
```

For a method-by-method GSEA explanation, covariate adjustment, and comparison with DAA, see the `gsea_analysis` vignette.

## Summary

The package is easiest to use when you choose the shortest path that matches your question:

- use `ggpicrust2()` for a fast default pathway workflow
- use the stepwise DAA functions when you need more control
- use the taxa contribution workflow when you need taxon-level attribution
- use `pathway_gsea()` when pathway-set enrichment is the primary question
