% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/createClusterMST.R
\name{createClusterMST}
\alias{createClusterMST}
\alias{connectClusterMST}
\alias{orderClusterMST}
\title{Minimum spanning trees on cluster centroids}
\usage{
createClusterMST(centers, outgroup = FALSE, outscale = 3)

connectClusterMST(centers, mst, combined = TRUE)

orderClusterMST(x, ids, centers, mst, start = NULL)
}
\arguments{
\item{centers}{A numeric matrix of cluster centroids where each \emph{row} represents a cluster 
and each column represents a dimension (usually a PC or another low-dimensional embedding).
Each row should be named with the cluster name.}

\item{outgroup}{A logical scalar indicating whether an outgroup should be inserted to split unrelated trajectories.
Alternatively, a numeric scalar specifying the distance threshold to use for this splitting.}

\item{outscale}{A numeric scalar specifying the scaling to apply to the median edge length of the MST
(computed without an outgroup) to define the threshold for outgroup splitting.
Only used if \code{outgroup=TRUE}.}

\item{mst}{A \link{graph} object containing an MST, typically the output of \code{createClusterMST(centers)}.
For \code{connectClusterMST}, the MST may be computed from a different \code{centers}.}

\item{combined}{Logical scalar indicating whether a single data.frame of edge coordinates should be returned.}

\item{x}{A numeric matrix of per-cell coordinates where each \emph{row} represents a cell
and each column represents a dimension (again, usually a low-dimensional embedding).}

\item{ids}{A character vector of length equal to the number of cells,
specifying the cluster to which each cell is assigned.}

\item{start}{A character vector specifying the starting node from which to compute pseudotimes in each component of \code{mst}.
Defaults to an arbitrarily chosen node of degree 1 or lower in each component.}
}
\value{
\code{createClusterMST} returns a \link{graph} object containing an MST computed on \code{centers}.

\code{connectClusterMST} returns, by default, a data.frame containing the start and end coordinates of segments representing all the edges in \code{mst}.
If \code{combined=FALSE}, a list of two data.frames is returned where corresponding rows represent the start and end coordinates of the same edge.

\code{orderClusterMST} returns a numeric matrix containing the pseudotimes of all cells (rows) across all paths (columns) through \code{mst}.
}
\description{
Perform basic trajectory analyses with minimum spanning trees (MST) computed on cluster centroids,
based on the methodology in the \pkg{TSCAN} package.
These functions are now deprecated as they have been moved to the \pkg{TSCAN} package itself.
}
\details{
These functions represent some basic utilities for a simple trajectory analysis 
based on the algorithm in the \pkg{TSCAN} package.

\code{createClusterMST} builds an MST where each node is a cluster centroid and 
each edge is weighted by the Euclidean distance between centroids.
This represents the most parsimonious explanation for a particular trajectory
and has the advantage of being directly interpretable with respect to any pre-existing clusters.

\code{connectClusterMST} provides the coordinates of the start and end of every edge.
This is mostly useful for plotting with \code{\link{segments}} or the equivalent \pkg{ggplot2} functionality.
We suggest using \code{\link{aggregateAcrossCells}} to obtain \code{centers} for multiple low-dimensional results at once.

\code{orderClusterMST} will map each cell to the closest edge involving the cluster to which it is assigned.
(Here, edges are segments terminated by their nodes, so some cells may simply be mapped to the edge terminus.)
It will then calculate the distance of that cell along the MST from the starting node specified by \code{start}.
This distance represents the pseudotime for that cell and can be used in further quantitative analyses.
}
\section{Introducing an outgroup}{

If \code{outgroup=TRUE}, we add an outgroup to avoid constructing a trajectory between \dQuote{unrelated} clusters.
This is done by adding an extra row/column to the distance matrix corresponding to an artificial outgroup cluster,
where the distance to all of the other real clusters is set to \eqn{\omega/2}.
Large jumps in the MST between real clusters that are more distant than \eqn{\omega} will then be rerouted through the outgroup,
allowing us to break up the MST into multiple subcomponents by removing the outgroup.

The default \eqn{\omega} value is computed by constructing the MST from the original distance matrix,
computing the median edge length in that MST, and then scaling it by \code{outscale}.
This adapts to the magnitude of the distances and the internal structure of the dataset
while also providing some margin for variation across cluster pairs.
Alternatively, \code{outgroup} can be set to a numeric scalar in which case it is used directly as \eqn{\omega}.
}

\section{Confidence on the edges}{

For the MST, we obtain a measure of the confidence in each edge by computing the distance gained if that edge were not present.
Ambiguous parts of the tree will be penalized less by the deletion of an edge, manifesting as a small distance gain.
In contrast, parts of the tree with clear structure will receive a large distance gain upon deletion of an obvious edge.

For each edge, we divide the distance gain by the length of the edge to normalize for cluster resolution.
This avoids overly penalizing edges in parts of the tree involving broad clusters
while still retaining sensitivity to detect distance gain in overclustered regions.
As an example, a normalized gain of unity for a particular edge means that its removal
requires an alternative path that increases the distance travelled by that edge's length.

The normalized gain is reported as the \code{"gain"} attribute in the edges of the MST from \code{\link{createClusterMST}}.
Note that the \code{"weight"} attribute represents the edge length.
}

\section{Interpreting the pseudotime matrix}{

The pseudotimes are returned as a matrix where each row corresponds to a cell in \code{x} 
and each column corresponds to a path through the MST from \code{start} to all nodes of degree 1.
(If \code{start} is itself a node of degree 1, then paths are only considered to all other such nodes.)
This format is inspired by that from the \pkg{slingshot} package and provides a compact representation of branching events.

Each branching event in the MST results in a new path and thus a new column in the pseudotime matrix.
For any given row in this matrix, all entries that are not \code{NA} are identical.
This reflects the fact that multiple paths will share a section of the MST for which the pseudotimes are the same.

If \code{start} is not specified, the starting node is \emph{chosen completely arbitrarily} by \code{orderClusterMST},
as directionality is impossible to infer from the expression matrix alone.
However, it is often possible to use prior biological knowledge to pick an appropriate cluster as the starting node.
}

\examples{
# Mocking up a Y-shaped trajectory.
centers <- rbind(c(0,0), c(0, -1), c(1, 1), c(-1, 1))
rownames(centers) <- seq_len(nrow(centers))
clusters <- sample(nrow(centers), 1000, replace=TRUE)
cells <- centers[clusters,]
cells <- cells + rnorm(length(cells), sd=0.5)

# Creating the MST first:
mst <- createClusterMST(centers)
plot(mst)
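
# A minimal sketch of inspecting the per-edge statistics described in
# "Confidence on the edges". This assumes that 'mst' is an igraph object
# and that the igraph package is installed; 'edge.stats' is a name
# introduced here purely for illustration.
edge.stats <- igraph::as_data_frame(mst, what="edges")
edge.stats[,c("from", "to", "weight", "gain")]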

# Also plotting the MST on top of existing visualizations:
edges <- connectClusterMST(centers, mst, combined=FALSE)
plot(cells[,1], cells[,2], col=clusters)
segments(edges$start$dim1, edges$start$dim2, edges$end$dim1, 
     edges$end$dim2, lwd=5)
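
# A hedged illustration of outgroup splitting, assuming igraph is installed.
# 'far.centers' and 'split.mst' are hypothetical names; two distant centroids
# are appended and a numeric 'outgroup' is used directly as the threshold,
# so the long jump between the two groups should be broken.
far.centers <- rbind(centers, c(20, 20), c(20, 21))
rownames(far.centers) <- seq_len(nrow(far.centers))
split.mst <- createClusterMST(far.centers, outgroup=5)
igraph::components(split.mst)$no # number of disconnected subcomponents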

# Finally, obtaining pseudotime orderings.
ordering <- orderClusterMST(cells, clusters, centers, mst)
unified <- rowMeans(ordering, na.rm=TRUE)
plot(cells[,1], cells[,2], col=topo.colors(21)[cut(unified, 21)], pch=16)
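
# A sketch of rooting the pseudotimes at a chosen cluster rather than the
# arbitrary default; here we assume that cluster "2" (the stem of the Y)
# is a sensible origin. 'rooted' and 'spread' are illustrative names only.
rooted <- orderClusterMST(cells, clusters, centers, mst, start="2")
head(rooted)

# Entries within a row that are not NA should agree across paths,
# up to numerical precision, which is why rowMeans() is used above.
spread <- apply(rooted, 1, function(p) diff(range(p, na.rm=TRUE)))
summary(spread)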

}
\references{
Ji Z and Ji H (2016).
TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis.
\emph{Nucleic Acids Res.} 44, e117
}
\seealso{
\code{\link{quickPseudotime}}, a wrapper to quickly perform these calculations.
}
\author{
Aaron Lun
}
