Package 'CIDER' reference manual

Title:	Meta-Clustering for scRNA-Seq Integration and Evaluation
Description:	A workflow of (a) meta-clustering based on inter-group similarity measures and (b) a ground-truth-free test metric to assess the biological correctness of integration in real datasets. See Hu Z, Ahmed A, Yau C (2021) <doi:10.1101/2021.03.29.437525> for more details.
Authors:	Zhiyuan Hu [aut, cre], Christopher Yau [aut], Ahmed Ahmed [aut]
Maintainer:	Zhiyuan Hu <[email protected]>
License:	MIT + file LICENSE
Version:	0.99.4
Built:	2025-03-10 03:37:33 UTC
Source:	https://github.com/zhiyuan-hu-lab/cider

Calculate Distance Matrix Using a Single Model

Description

This function computes a similarity matrix by utilising a single linear model for differential expression analysis.

Usage

calculateDistMatOneModel(
  matrix,
  metadata,
  verbose = TRUE,
  method = "voom",
  additional.variate = NULL
)

Arguments

`matrix`	A count matrix with rows representing genes or features and columns representing samples or cells.
`metadata`	A data frame containing metadata corresponding to the samples or cells. Each row should match a column in `matrix`.
`verbose`	Logical. If `TRUE`, the function displays progress messages and a progress bar. The default is `TRUE`.
`method`	A character string specifying the method for differential expression analysis. Options are "voom" or "trend", with "voom" as the default.
`additional.variate`	A character vector of additional variates to include in the linear model for regression.

Value

A similarity matrix.

Downsampling Cells

Description

Downsamples cells from each group for IDER-based similarity calculation.

Usage

downsampling(
  metadata,
  n.size = 35,
  seed = NULL,
  include = FALSE,
  replace = FALSE,
  lower.cutoff = 3
)

Arguments

`metadata`	A data frame containing at least two columns: one for group labels and one for batch information. Each row corresponds to a single cell. Required.
`n.size`	Numeric value specifying the number of cells to use in each group. Default is `35`.
`seed`	Numeric value to set the random seed for sampling. Default is `NULL`.
`include`	Logical value indicating whether to include groups that have fewer cells than `n.size`. Default is `FALSE`.
`replace`	Logical value specifying whether to sample with replacement if a group is smaller than `n.size`. Default is `FALSE`.
`lower.cutoff`	Numeric value indicating the minimum group size required for inclusion. Default is `3`.

Value

A list of numeric indices (or cell names) for cells to be kept for downstream computation.

Examples

  # 'meta' is a data frame with columns 'label' and 'batch'
  meta <- data.frame(
    label = c(rep("A", 40), rep("A", 35), rep("B", 20)),
    batch = c(rep("X", 40), rep("Y", 35), rep("X", 20))
  )
  keep_cells <- downsampling(meta, n.size = 35, seed = 12345)
  
  # Display the selected indices
  print(keep_cells)
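
How `n.size`, `include`, `replace` and `lower.cutoff` interact can be illustrated with a base-R sketch. This is one reading of the argument descriptions above, not the package's implementation; the helper `sketch_downsample` and the choice to group cells by a single label vector are assumptions made for illustration.

```r
# Hypothetical helper: one interpretation of the documented arguments,
# not the CIDER implementation.
sketch_downsample <- function(labels, n.size = 35, seed = NULL,
                              include = FALSE, replace = FALSE,
                              lower.cutoff = 3) {
  if (!is.null(seed)) set.seed(seed)
  idx_by_group <- split(seq_along(labels), labels)
  keep <- lapply(idx_by_group, function(idx) {
    n <- length(idx)
    # Groups below the cutoff are always dropped.
    if (n < lower.cutoff) return(integer(0))
    # Large groups are subsampled to n.size cells.
    if (n >= n.size) return(idx[sample.int(n, n.size)])
    # Small groups are dropped unless include = TRUE.
    if (!include) return(integer(0))
    # Small groups are either resampled up to n.size or kept whole.
    if (replace) return(idx[sample.int(n, n.size, replace = TRUE)])
    idx
  })
  sort(unlist(keep, use.names = FALSE))
}

labels <- c(rep("A", 40), rep("B", 20), rep("C", 2))
kept_default <- sketch_downsample(labels, n.size = 35, seed = 1)
kept_include <- sketch_downsample(labels, n.size = 35, seed = 1, include = TRUE)
table(labels[kept_default])
table(labels[kept_include])
```

Under this reading, the default settings keep 35 cells from group A only, whereas `include = TRUE` additionally keeps all 20 cells of group B; group C falls below `lower.cutoff` in both cases.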


Estimate the Empirical Probability of Whether Two Sets of Cells from Distinct Batches Belong to the Same Population

Description

This function computes the empirical probability that two sets of cells from distinct batches belong to the same population, based on the output of getIDEr.

Usage

estimateProb(
  seu,
  ider,
  batch.var = "Batch",
  n_size = 40,
  n.perm = 5,
  verbose = FALSE
)

Arguments

`seu`	A Seurat object.
`ider`	A list returned by the `getIDEr` function.
`batch.var`	Character string specifying the metadata column that contains batch information. Default is "Batch".
`n_size`	Numeric value indicating the number of cells per group used to compute the similarity. Default is 40.
`n.perm`	Numeric value specifying the number of permutations to perform. Default is 5.
`verbose`	Logical. If `TRUE`, progress messages are printed. Default is `FALSE`.

Value

A Seurat object with additional columns for the IDER-based similarity and the empirical probability of rejection.

Final Clustering Step for Meta-Clustering

Description

This function merges initial clusters into final clusters based on the IDER-based similarity matrix.

Usage

finalClustering(
  seu,
  dist,
  cutree.by = "h",
  cutree.h = 0.45,
  cutree.k = 3,
  hc.method = "complete"
)

Arguments

`seu`	A Seurat object that has undergone the `getIDEr` step. Required.
`dist`	A list output from the `getIDEr` function. Required.
`cutree.by`	Character string specifying whether to cut the dendrogram by height ("h") or by a fixed number of clusters ("k"). Default is "h".
`cutree.h`	Numeric value between 0 and 1 indicating the height at which to cut the dendrogram. This parameter is ignored if `cutree.by = "k"`. Default is 0.45.
`cutree.k`	Numeric value specifying the number of clusters to generate if `cutree.by = "k"`. This parameter is ignored if `cutree.by = "h"`. Default is 3.
`hc.method`	Character string specifying the method to be used in hierarchical clustering (passed to `hclust`). Default is "complete".

Value

A Seurat object with the final clustering results stored in the CIDER_clusters column of its meta.data.
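
The two cutting modes selected by `cutree.by` correspond to the `h` and `k` arguments of base R's `cutree()`. The sketch below uses an invented 4 x 4 similarity matrix and assumes that similarity is converted to a dissimilarity as `1 - similarity`; that conversion and all object names are illustrative assumptions rather than the package's internals.

```r
# Toy similarity matrix for four initial clusters (values are invented).
sim <- matrix(c(1.0, 0.9, 0.2, 0.1,
                0.9, 1.0, 0.3, 0.2,
                0.2, 0.3, 1.0, 0.8,
                0.1, 0.2, 0.8, 1.0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("c", 1:4), paste0("c", 1:4)))

# Assumption: turn similarity into a dissimilarity before clustering.
d <- as.dist(1 - sim)
hc <- hclust(d, method = "complete")

# Equivalent of cutree.by = "h": cut the dendrogram at a fixed height.
by_height <- cutree(hc, h = 0.45)

# Equivalent of cutree.by = "k": request a fixed number of clusters.
by_k <- cutree(hc, k = 3)

print(by_height)
print(by_k)
```

With these toy values, cutting at height 0.45 groups `c1` with `c2` and `c3` with `c4`, while requesting three clusters keeps `c1` and `c2` together and leaves `c3` and `c4` separate.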

Gather Initial Cluster Names

Description

Merge initial clustering results from a list of Seurat objects into a single Seurat object.

Usage

gatherInitialClusters(seu_list, seu)

Arguments

`seu_list`	A list containing Seurat objects with initial clustering results. Required.
`seu`	A Seurat object to which the merged initial cluster information will be added.

Value

A Seurat object containing the initial clustering results in the initial_cluster column of its meta.data.

Calculate the Similarity Matrix

Description

Compute the IDER-based similarity matrix for a list of Seurat objects. This function does not regress out batch effects and is designed for use during the initial clustering step.

Usage

getDistMat(
  seu_list,
  verbose = TRUE,
  tmp.initial.clusters = "seurat_clusters",
  method = "trend",
  batch.var = "Batch",
  additional.variate = NULL,
  downsampling.size = 35,
  downsampling.include = TRUE,
  downsampling.replace = TRUE
)

Arguments

`seu_list`	A list containing Seurat objects. Required.
`verbose`	Logical. If `TRUE`, progress messages and a progress bar are displayed. Default is `TRUE`.
`tmp.initial.clusters`	Character string specifying one of the column names in the `meta.data` of each Seurat object that denotes groups, e.g., initial clusters. Default is "seurat_clusters".
`method`	Character string specifying the method for differential expression analysis. Options are "voom" or "trend" (default is "trend").
`batch.var`	Character string specifying the metadata column containing batch information. Default is "Batch".
`additional.variate`	Character vector of additional variates to include in the linear model for regression.
`downsampling.size`	Numeric value indicating the number of cells to use per group. Default is 35.
`downsampling.include`	Logical. Whether to include groups with fewer cells than `downsampling.size`. Default is `TRUE`.
`downsampling.replace`	Logical. Whether to sample with replacement for groups smaller than `downsampling.size`. Default is `TRUE`.

Value

A list of similarity matrices.

Calculate IDER-Based Similarity Between Two Groups

Description

This function calculates the IDER-based similarity between two groups using a linear model.

Usage

getGroupFit(logCPM, design, contrast_m)

Arguments

`logCPM`	A numeric matrix of log-transformed counts per million.
`design`	A design matrix for the differential expression analysis.
`contrast_m`	A contrast matrix specifying the comparison between the two groups.

Value

A numeric value representing the IDER-based similarity between the two groups.
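
The `design` and `contrast_m` arguments follow the usual linear-model conventions. The sketch below shows what such inputs might look like for a toy comparison of two groups, built with base R only. The group names, the no-intercept parameterisation and the random `logCPM` values are assumptions for illustration; it does not reproduce how the IDER-based similarity itself is derived from the fit.

```r
set.seed(1)

# Toy data: 5 genes x 8 cells, four cells in each of two groups.
group <- factor(rep(c("g1", "g2"), each = 4))
logCPM <- matrix(rnorm(5 * 8), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:8)))

# Design matrix without an intercept: one column per group.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Contrast matrix for the comparison g1 - g2.
contrast_m <- matrix(c(1, -1), ncol = 1,
                     dimnames = list(colnames(design), "g1_vs_g2"))

# Least-squares coefficients per gene (here: the group means),
# followed by the contrast between the two groups.
beta <- t(solve(crossprod(design), t(design) %*% t(logCPM)))
effect <- beta %*% contrast_m
print(effect)
```

For this design, each entry of `effect` is simply the difference between the mean log-expression of a gene in `g1` and in `g2`.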

Compute IDER-Based Similarity

Description

Calculate the similarity matrix based on Inter-group Differential Expression (IDER) metrics with the selected batch effects regressed out.

Usage

getIDEr(
  seu,
  group.by.var = "initial_cluster",
  batch.by.var = "Batch",
  verbose = TRUE,
  use.parallel = FALSE,
  n.cores = 1,
  downsampling.size = 40,
  downsampling.include = TRUE,
  downsampling.replace = TRUE
)

Arguments

`seu`	A Seurat S4 object that includes an `initial_cluster` column in its `meta.data`. Required.
`group.by.var`	Character string specifying the column in the `meta.data` of `seu` that defines initial clusters (batch-specific groups). Default is "initial_cluster".
`batch.by.var`	Character string specifying the metadata column that indicates batch information. Default is "Batch".
`verbose`	Logical. If `TRUE`, progress messages and a progress bar are displayed. Default is `TRUE`.
`use.parallel`	Logical. If `TRUE`, parallel computation is used (requires `doParallel`); in this case, no progress bar will be shown. Default is `FALSE`.
`n.cores`	Numeric. The number of cores to use for parallel computing. Default is 1.
`downsampling.size`	Numeric. The number of cells representing each group. Default is 40.
`downsampling.include`	Logical. Whether to include groups with fewer cells than `downsampling.size`. Default is `TRUE`.
`downsampling.replace`	Logical. Whether to sample with replacement if a group is smaller than `downsampling.size`. Default is `TRUE`.

Value

A list of objects: a similarity matrix, a numeric vector recording the cells used, and a data frame of the group combinations included.

Initial Clustering for Evaluating Integration

Description

This function applies HDBSCAN, a density-based clustering algorithm, to the corrected dimension reduction of a Seurat object.

Usage

hdbscan.seurat(
  seu,
  batch.var = "Batch",
  reduction = "pca",
  dims = seq_len(15),
  minPts = 25
)

Arguments

`seu`	A Seurat object containing integrated or batch-corrected data (e.g. PCA results).
`batch.var`	Character string specifying the metadata column that contains batch information. Default is "Batch".
`reduction`	Character string specifying the name of the dimension reduction to use (e.g. "pca"). Default is "pca".
`dims`	Numeric vector indicating the dimensions to be used for initial clustering. Default is 1:15.
`minPts`	Integer specifying the minimum number of points required to form a cluster. This value is passed to the `hdbscan` function. Default is 25.

Value

A Seurat object with two additional columns in its meta.data: dbscan_cluster and initial_cluster.

Initial Clustering

Description

Perform batch-specific initial clustering on a Seurat object.

Usage

initialClustering(
  seu,
  batch.var = "Batch",
  cut.height = 0.4,
  nfeatures = 2000,
  additional.vars.to.regress = NULL,
  dims = seq_len(14),
  resolution = 0.6,
  downsampling.size = 50,
  verbose = FALSE
)

Arguments

`seu`	A Seurat object. Required.
`batch.var`	Character string specifying one of the column names in the `meta.data` of `seu` used to partition the object into subsets. Default is "Batch".
`cut.height`	Numeric value specifying the height at which to cut hierarchical trees. Default is 0.4.
`nfeatures`	Numeric value indicating the number of high-variance genes to use. Default is 2000.
`additional.vars.to.regress`	Character vector of additional variable names from the `meta.data` of `seu` to regress out. Optional. Default is `NULL`.
`dims`	Numeric vector specifying the dimensions to be used for clustering (passed to Seurat). Default is 1:14.
`resolution`	Numeric value for clustering resolution (passed to Seurat). Default is 0.6.
`downsampling.size`	Numeric value indicating the number of cells representing each group. Default is 50.
`verbose`	Logical. If `TRUE`, a progress bar is displayed. Default is `FALSE`.

Value

A Seurat S4 object with initial cluster assignments stored in the initial_cluster column of its meta.data.

Merge Initial Clusters

Description

Merge initial clusters based on a provided similarity matrix and hierarchical clustering.

Usage

mergeInitialClusters(
  seu_list,
  dist_list,
  use = "coef",
  method = "hc",
  hc.method = "average",
  cutree.by = "h",
  cutree.h = 0.6,
  cutree.k = 3,
  batch.var = "Batch"
)

Arguments

`seu_list`	A list of Seurat objects containing the single-cell data. This parameter is required.
`dist_list`	A list of similarity matrices as returned by `getDistMat()`. The order of matrices should correspond to that of the Seurat objects in `seu_list`.
`use`	A string specifying the similarity measure to use. Currently, only "coef" is supported. Default is "coef".
`method`	A string specifying the clustering method to employ. The default is "hc" for hierarchical clustering.
`hc.method`	A string passed to the `method` parameter of `hclust()`. Default is "average".
`cutree.by`	A character indicating whether to cut the dendrogram by height ("h", default) or by a set number of clusters ("k").
`cutree.h`	A numeric value defining the height at which to cut the tree if `cutree.by = "h"`. Default is 0.6.
`cutree.k`	A numeric value specifying the number of clusters to generate if `cutree.by = "k"`. Default is 3.
`batch.var`	A character string representing the metadata column name that contains batch information. Default is "Batch".

Details

This function accepts a list of Seurat objects and a corresponding list of similarity matrices, and then merges the initial clusters using a hierarchical clustering approach. The updated cluster assignments are stored within each Seurat object.

Value

A list of Seurat objects in which the initial clustering has been updated. The new cluster assignments are stored in the inicluster field of each Seurat object, whilst the original assignments are preserved in the inicluster_tmp field.

Pancreas Metadata

Description

This dataset provides cell-level metadata for the human and mouse pancreatic data used in the study.

Usage

data(pancreas_meta)

Format

A data frame with 10127 rows and 3 columns:

Batch: Species information (human or mouse).
Group: Cell type annotation.
Sample: Donor information.

Details

Cell-level metadata for cross-species pancreatic data.

Source

The metadata were downloaded alongside the count matrix from NCBI GEO accession GSE84133. Reference: Baron M, Veres A, Wolock SL, Faust AL, Gaujoux R, Vetere A, et al. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Syst. 2016;3:346–360.e4.

Plot Similarity Matrix with pheatmap

Description

This function creates a heatmap of the similarity matrix computed by getDistMat().

Usage

plotDistMat(dist.list, use = "coef")

Arguments

`dist.list`	A list representing the similarity matrix output by `getDistMat()`. Required.
`use`	Character string specifying the similarity measure to use. Default is "coef". No other option is currently available.

Value

A pheatmap object displaying the similarity matrix.

Plot Heatmap for the IDER-Based Similarity Matrix

Description

This function generates a heatmap that visualises the similarity between shared groups across batches, as computed by getIDEr.

Usage

plotHeatmap(seu, ider, batch.var = "Batch")

Arguments

`seu`	A Seurat object.
`ider`	The output list from the `getIDEr` function.
`batch.var`	Character string specifying the metadata column that contains batch information. Default is "Batch".

Value

A heatmap displaying the similarity between shared groups across batches.

Plot Network Graph

Description

Visualise the network based on an IDER-based similarity matrix. The vertices are initial clusters, and the edge width denotes the similarity between two initial clusters.

Usage

plotNetwork(
  seu,
  ider,
  batch.var = "Batch",
  colour.by = NULL,
  weight.factor = 6.5,
  col.vector = NULL,
  vertex.size = 1
)

Arguments

`seu`	Seurat S4 object after the `getIDEr` step, containing `initial_cluster` and `Batch` in its meta.data. Required.
`ider`	A list. Output of `getIDEr`. Required.
`batch.var`	Character. Metadata column name containing batch information. (Default: `Batch`)
`colour.by`	Character. One of the column names of the Seurat object's meta.data, used to colour the vertices of the network graph. (Default: `NULL`)
`weight.factor`	Numerical. Adjusts the thickness of the edges. (Default: 6.5)
`col.vector`	A vector of hex colour codes. If no value is given (default), a vector of 74 colours will be used.
`vertex.size`	Numerical. Adjusts the size of the vertices. (Default: 1)

Value

An igraph object

Scatterplot by a Selected Feature

Description

Scatterplot of a Seurat object based on dimension reduction.

Usage

scatterPlot(
  seu,
  reduction,
  colour.by,
  colvec = NULL,
  title = NULL,
  sort.by.numbers = TRUE,
  viridis_option = "B"
)

Arguments

`seu`	Seurat S4 object after the `getIDEr` step. Required.
`reduction`	Character. The dimension reduction used to plot. Common options: `"pca"`, `"tsne"`, `"umap"`. The availability of dimension reduction can be checked by `Reductions(seu)`.
`colour.by`	Character. One of the column names of the `meta.data` of `seu`. Can be either a discrete or a continuous variable.
`colvec`	A vector of hex colour codes. If no value is given (default), a vector of 74 colours will be used.
`title`	Character. Title of the figure.
`sort.by.numbers`	Logical. Whether to sort the groups by the number of cells. (Default: `TRUE`)
`viridis_option`	Character. The viridis colour map option. (Default: `B`)

Value

A ggplot2 scatter plot
