---
title: "scMetaTraj workflow"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{scMetaTraj workflow}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 4.5
)
```

## Overview

`scMetaTraj` models metabolism as a continuous state space derived from
pathway-level scores, rather than as a secondary annotation layered onto
transcriptomic clustering. The package supports:

- pathway/module scoring from a Seurat object or expression matrix
- metabolic state space embedding
- metabolic subclustering
- metabolic pseudotime (mPT) inference
- trend and switchpoint analysis along metabolic pseudotime

This vignette uses a small simulated example so that it remains portable and
does not depend on local files or large external datasets.

## Simulate a small expression matrix

```{r}
library(scMetaTraj)

set.seed(2026)

# 14 genes (rows) by 100 cells (columns) of exponential noise
expr <- matrix(
  rexp(14 * 100, rate = 1),
  nrow = 14,
  ncol = 100,
  dimnames = list(
    c(
      "HK1", "PFKP", "LDHA", "GPI", "CS", "ACO2", "IDH3A",
      "NDUFA1", "COX4I1", "ATP5F1A", "G6PD", "PGD", "ACLY", "FASN"
    ),
    paste0("Cell", seq_len(100))
  )
)

gene_sets <- list(
  Glycolysis = c("HK1", "PFKP", "LDHA", "GPI"),
  TCA = c("CS", "ACO2", "IDH3A"),
  OXPHOS = c("NDUFA1", "COX4I1", "ATP5F1A"),
  PPP = c("G6PD", "PGD"),
  Lipid = c("ACLY", "FASN")
)
```

## Score metabolic modules

```{r}
scores <- scMetaTraj_score(
  x = expr,
  gene_sets = gene_sets,
  method = "mean",
  min_genes = 2,
  scale = FALSE
)

dim(scores)
colnames(scores)
```
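
For intuition, `method = "mean"` can be read as averaging the expression of
each module's genes within every cell. The base R sketch below illustrates that
idea on a tiny hand-made matrix. It is an assumption about what mean scoring
amounts to, not the package's implementation; `toy_expr`, `toy_sets`, and
`mean_score()` are hypothetical names used only here.

```{r}
# Toy genes-by-cells matrix (hypothetical values, not the simulated data above).
toy_expr <- matrix(
  c(1, 2, 3,
    3, 4, 5,
    0, 2, 4,
    2, 2, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("HK1", "LDHA", "CS", "ACO2"), paste0("Cell", 1:3))
)

toy_sets <- list(
  Glycolysis = c("HK1", "LDHA"),
  TCA = c("CS", "ACO2", "IDH3A")  # IDH3A is absent from toy_expr on purpose
)

# Hypothetical helper: per-cell mean over the module genes found in the matrix.
# Returns NA for every cell when fewer than `min_genes` genes are present.
mean_score <- function(expr, genes, min_genes = 2) {
  present <- intersect(genes, rownames(expr))
  if (length(present) < min_genes) {
    return(setNames(rep(NA_real_, ncol(expr)), colnames(expr)))
  }
  colMeans(expr[present, , drop = FALSE])
}

# Cells in rows, modules in columns.
toy_scores <- sapply(toy_sets, function(g) mean_score(toy_expr, g))
toy_scores
```

Genes missing from the matrix are simply dropped before averaging in this
sketch, which is one plausible reading of a `min_genes` threshold.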

## Embed cells in metabolic space

Depending on `method`, `scMetaTraj_embed()` returns PCA coordinates for
downstream analysis or UMAP coordinates for visualization.

```{r}
emb_pca <- scMetaTraj_embed(scores, method = "PCA", n_pcs = 4)
emb_umap <- scMetaTraj_embed(scores, method = "UMAP", n_pcs = 4)

head(emb_pca)
head(emb_umap)
```
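
As a rough illustration of what a PCA embedding of module scores looks like,
the sketch below runs base R's `prcomp()` on a small cells-by-modules matrix.
Whether `scMetaTraj_embed()` centers, scales, or uses `prcomp()` internally is
not stated in this vignette, so treat those choices as assumptions; the `toy_*`
objects are hypothetical.

```{r}
# Deterministic toy score matrix: 6 cells by 3 modules (hypothetical values).
toy_mod <- cbind(
  Glycolysis = c(1.0, 1.2, 1.1, 3.0, 3.2, 3.1),
  TCA        = c(2.0, 2.1, 1.9, 0.5, 0.4, 0.6),
  OXPHOS     = c(1.5, 1.4, 1.6, 0.9, 1.0, 0.8)
)
rownames(toy_mod) <- paste0("Cell", 1:6)

# Center and scale each module, then rotate onto principal components.
toy_pca <- prcomp(toy_mod, center = TRUE, scale. = TRUE)

# Keep the first two components as per-cell coordinates.
toy_coords <- toy_pca$x[, 1:2]
toy_coords

# Proportion of variance carried by each component.
round(toy_pca$sdev^2 / sum(toy_pca$sdev^2), 3)
```

In this toy example the first three cells and the last three cells fall on
opposite sides of PC1, because the two groups differ in all three modules.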

## Identify metabolic subclusters

```{r}
clusters <- scMetaTraj_cluster(
  embedding = emb_pca,
  k = 12,
  method = "louvain"
)

table(clusters)
```

Cluster-level summaries can be generated with `scMetaTraj_cluster_profile()`.

```{r}
profile_df <- scMetaTraj_cluster_profile(scores, clusters, stat = "mean")
head(profile_df)
```
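
Conceptually, a cluster profile with `stat = "mean"` is a per-cluster average
of each module score. The following base R sketch shows that computation on
toy data; the exact layout returned by `scMetaTraj_cluster_profile()` may
differ, and `toy_cl_scores`, `toy_labels`, and `toy_profile` are hypothetical
names.

```{r}
# Toy cells-by-modules scores and cluster labels (hypothetical values).
toy_cl_scores <- cbind(
  Glycolysis = c(1, 3, 2, 6),
  TCA        = c(4, 2, 0, 2)
)
rownames(toy_cl_scores) <- paste0("Cell", 1:4)
toy_labels <- factor(c("A", "A", "B", "B"))

# rowsum() adds up rows within each group; dividing by the group sizes
# turns the sums into means (one row per cluster, one column per module).
toy_profile <- rowsum(toy_cl_scores, group = toy_labels) /
  as.vector(table(toy_labels))
toy_profile
```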

## Infer metabolic pseudotime

```{r}
traj <- scMetaTraj_infer(
  embedding = emb_pca,
  k = 12,
  root_mode = "pc1_min"
)

summary(traj$mPT)
traj$root
```
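
To make the idea of graph-based pseudotime concrete, here is a minimal base R
sketch: build a k-nearest-neighbor graph on the embedding, pick a root, and use
shortest-path distance from the root as pseudotime. The argument name
`root_mode = "pc1_min"` suggests the root is the cell with the smallest PC1
value, and the sketch assumes that reading. The graph construction, the
Dijkstra search, and the rescaling are illustrative choices, not a description
of what `scMetaTraj_infer()` does internally; all `toy_*` names are
hypothetical.

```{r}
# Toy 2-D embedding: 8 cells along a gentle arc (hypothetical coordinates).
toy_emb <- cbind(
  PC1 = c(0, 1, 2, 3, 4, 5, 6, 7),
  PC2 = c(0, 0.5, 0.8, 1.0, 1.0, 0.8, 0.5, 0)
)
rownames(toy_emb) <- paste0("Cell", 1:8)

toy_k <- 2
toy_d <- as.matrix(dist(toy_emb))          # pairwise Euclidean distances
toy_n <- nrow(toy_d)

# Symmetric kNN graph stored as a weight matrix; Inf marks "no edge".
toy_w <- matrix(Inf, toy_n, toy_n)
for (i in seq_len(toy_n)) {
  nn <- order(toy_d[i, ])[2:(toy_k + 1)]   # skip position 1, the cell itself
  toy_w[i, nn] <- toy_d[i, nn]
  toy_w[nn, i] <- toy_d[nn, i]
}

# Assumed root rule: the cell with the smallest PC1 value.
toy_root <- which.min(toy_emb[, "PC1"])

# Dijkstra shortest paths from the root.
toy_geo <- rep(Inf, toy_n)
toy_geo[toy_root] <- 0
toy_done <- rep(FALSE, toy_n)
for (step in seq_len(toy_n)) {
  cand <- which(!toy_done)
  u <- cand[which.min(toy_geo[cand])]
  if (is.infinite(toy_geo[u])) break       # remaining cells are unreachable
  toy_done[u] <- TRUE
  toy_geo <- pmin(toy_geo, toy_geo[u] + toy_w[u, ])
}

# Rescale to [0, 1] so the root sits at pseudotime 0.
toy_pt <- setNames(
  toy_geo / max(toy_geo[is.finite(toy_geo)]),
  rownames(toy_emb)
)
round(toy_pt, 3)
```

Because distances are measured along graph edges rather than straight lines,
cells that are far apart along a curved path receive well-separated pseudotime
values even when their Euclidean distance is modest.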

`scMetaTraj_mPT_distribution()` prepares cluster labels ordered along the
trajectory:

```{r}
dist_df <- scMetaTraj_mPT_distribution(traj$mPT, clusters)
head(dist_df)
```

## Track module trends along mPT

```{r}
gly_trend <- scMetaTraj_trend(
  scores = scores[, "Glycolysis"],
  mPT = traj$mPT,
  n_bins = 20
)

head(gly_trend)
```
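
A binned trend can be pictured as cutting pseudotime into intervals and
averaging the module score within each interval. The sketch below does this in
base R with four equal-width bins; the binning rule and summary statistic used
by `scMetaTraj_trend()` are assumptions here, and `toy_mpt`, `toy_gly`, and
`toy_trend` are hypothetical.

```{r}
# Toy pseudotime and module score for 12 cells (hypothetical values).
toy_mpt <- seq(0, 1, length.out = 12)
toy_gly <- c(1, 1, 1, 2, 2, 2, 4, 4, 4, 5, 5, 5)

# Assign each cell to one of 4 equal-width pseudotime bins.
toy_bin <- cut(
  toy_mpt,
  breaks = seq(0, 1, length.out = 5),
  include.lowest = TRUE,
  labels = FALSE
)

# Mean pseudotime and mean score per bin.
toy_trend <- data.frame(
  bin = sort(unique(toy_bin)),
  mPT_mean = as.vector(tapply(toy_mpt, toy_bin, mean)),
  score_mean = as.vector(tapply(toy_gly, toy_bin, mean))
)
toy_trend
```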

To compare several modules at once:

```{r}
multi_res <- scMetaTraj_trend_multi(
  score_mat = scores,
  mPT = traj$mPT,
  modules = c("Glycolysis", "TCA", "OXPHOS"),
  n_bins = 20
)

head(multi_res$trend_long)
multi_res$switchpoints
```

```{r fig.cap="Example trend plot for several metabolic modules."}
scMetaTraj_plot_trend_multi(
  multi_res$trend_long,
  multi_res$switchpoints
)
```
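
One simple way to think about a switchpoint is the position along pseudotime
where a binned trend changes most sharply. The sketch below locates the largest
absolute jump between consecutive bin means. This definition is an assumption
made for illustration; the package may define and estimate switchpoints
differently, and the `toy_*` objects are hypothetical.

```{r}
# Toy binned trend: bin midpoints and mean score per bin (hypothetical values).
toy_bin_mid <- c(0.125, 0.375, 0.625, 0.875)
toy_bin_mean <- c(1, 2, 4, 5)

# Change in mean score between consecutive bins.
toy_jump <- diff(toy_bin_mean)

# Index of the largest absolute change, and the pseudotime halfway
# between the two bins on either side of it.
toy_idx <- which.max(abs(toy_jump))
toy_switch <- mean(toy_bin_mid[toy_idx + 0:1])
toy_switch
```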

## Interpret results

The workflow above illustrates the intended package logic:

1. summarize gene expression into curated metabolic modules
2. analyze cells in module-defined space rather than transcriptome-wide space
3. reconstruct graph-based metabolic pseudotime
4. quantify where module activity changes along the inferred trajectory

In real analyses, the same workflow can be applied to Seurat objects and
larger curated metabolic gene set collections. This vignette uses simulated
data only so that it stays lightweight and fully reproducible.