Package 'Spectrum' reference manual

Title:	Fast Adaptive Spectral Clustering for Single and Multi-View Data
Description:	A self-tuning spectral clustering method for single or multi-view data. 'Spectrum' uses a new type of adaptive density aware kernel that strengthens connections in the graph based on common nearest neighbours. It uses a tensor product graph data integration and diffusion procedure to integrate different data sources and reduce noise. 'Spectrum' uses either the eigengap or multimodality gap heuristics to determine the number of clusters. The method is sufficiently flexible so that a wide range of Gaussian and non-Gaussian structures can be clustered with automatic selection of K.
Authors:	Christopher R John, David Watson
Maintainer:	Christopher R John <[email protected]>
License:	AGPL-3
Version:	1.1
Built:	2025-02-18 04:57:10 UTC
Source:	https://github.com/crj32/spectrum

8 blob like structures

Description

A simulated dataset of 8 Gaussian blobs. Simulated using the 'clusterlab' CRAN package.

Usage

blobs

Format

A data frame with 10 rows and 800 variables

A brain cancer dataset

Description

A dataset containing The Cancer Genome Atlas expression data, taken from the publication at https://tcga-data.nci.nih.gov/docs/publications/lgggbm_2016/. The first data frame is a 5133 x 150 RNA-seq data matrix, the second is a 262 x 150 miRNA-seq data matrix, and the third is a 45 x 150 protein array data matrix. All data were pre-normalised and then log-transformed.

Usage

brain

Format

A list of data frames

Source

https://gdac.broadinstitute.org/

Three concentric circles

Description

Simulated data using the 'clusterSim' CRAN package.

Usage

circles

Format

A data frame with 2 rows and 540 variables

cluster_similarity: cluster a similarity matrix using the Ng method

Description

This function clusters a similarity matrix following the method of Ng or of Meila. We recommend the Ng method with GMM, rather than k-means, to cluster the eigenvectors.

Usage

cluster_similarity(A2, k = k, clusteralg = "GMM", specalg = "Ng")

Arguments

`A2`	Data frame or matrix: a similarity matrix
`k`	Numerical value: the number of clusters
`clusteralg`	Character value: GMM or km clustering algorithm (suggested=GMM)
`specalg`	Character value: Ng or Melia variant of spectral clustering (default=Ng)

Value

A numeric vector of cluster assignments

References

Ng, Andrew Y., Michael I. Jordan, and Yair Weiss. "On spectral clustering: Analysis and an algorithm." Advances in neural information processing systems. 2002.

Meila, Marina, et al. "Spectral Clustering: a Tutorial for the 2010’s." Handbook of Cluster Analysis. CRC Press, 2016. 1-23.

Examples

ng_similarity <- cluster_similarity(missl[[1]],k=8)
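
The steps of Ng-style spectral clustering can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: `ng_spectral_sketch` is a hypothetical helper, the toy data are made up, and k-means is used in place of the recommended GMM so that the sketch needs no extra packages.

```r
# Minimal base-R sketch of Ng-style spectral clustering on a similarity matrix.
# ng_spectral_sketch is a hypothetical helper, not part of 'Spectrum'.
ng_spectral_sketch <- function(A, k) {
  A <- as.matrix(A)
  diag(A) <- 0                              # Ng et al. set self-affinity to zero
  d <- rowSums(A)                           # node degrees
  L <- A / sqrt(outer(d, d))                # D^{-1/2} A D^{-1/2}
  ev <- eigen(L, symmetric = TRUE)          # eigenvalues in decreasing order
  X <- ev$vectors[, seq_len(k), drop = FALSE]
  Y <- X / sqrt(rowSums(X^2))               # row-normalise to unit length
  kmeans(Y, centers = k, nstart = 20)$cluster
}

# Toy data (assumed): two well-separated groups of 20 points in 2 dimensions
set.seed(1)
pts <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2))
A <- exp(-as.matrix(dist(pts))^2 / 2)       # Gaussian affinity between points
cl <- ng_spectral_sketch(A, k = 2)
table(cl)
```

The returned vector has one cluster label per row of the similarity matrix, which mirrors the numeric vector of assignments described under Value.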

CNN_kernel: fast adaptive density-aware kernel

Description

CNN_kernel: fast adaptive density-aware kernel

Usage

CNN_kernel(mat, NN = 3, NN2 = 7)

Arguments

`mat`	Matrix: matrix should have samples as columns and rows as features
`NN`	Numerical value: the number of nearest neighbours to use when calculating local sigma
`NN2`	Numerical value: the number of nearest neighbours to use when calculating common nearest neighbours

Value

A kernel matrix

Examples

CNN_kern <- CNN_kernel(blobs[,1:50])

estimate_k: estimate K using the eigengap or multimodality gap heuristics

Description

This function will try to estimate K given a similarity matrix. Generally the maximum eigengap is preferred, but on some data examining the distribution of the eigenvectors as in the multimodality gap heuristic may be beneficial.

Usage

estimate_k(A2, maxk = 10, showplots = TRUE)

Arguments

`A2`	Data frame or matrix: a similarity matrix
`maxk`	Numerical value: maximum number of K to be considered
`showplots`	Logical flag: whether to show the plot on the screen

Value

A data frame containing the eigenvalues and dip-test statistics of the eigenvectors of the graph Laplacian

Examples

k_test <- estimate_k(missl[[1]])
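
To make the maximum eigengap idea concrete, here is a minimal base-R sketch. It assumes a normalised affinity matrix of the form D^{-1/2} A D^{-1/2}; `eigengap_k_sketch` and the toy data are hypothetical, and the package's own `estimate_k` may differ in its details.

```r
# Hypothetical helper illustrating the maximum-eigengap heuristic;
# not the package's estimate_k implementation.
eigengap_k_sketch <- function(A, maxk = 10) {
  A <- as.matrix(A)
  d <- rowSums(A)
  L <- A / sqrt(outer(d, d))                       # normalised affinity
  vals <- eigen(L, symmetric = TRUE, only.values = TRUE)$values
  vals <- vals[seq_len(min(maxk + 1, length(vals)))]
  gaps <- -diff(vals)                              # gap between eigenvalue k and k+1
  which.max(gaps)                                  # K with the largest gap
}

# Toy data (assumed): three tight groups of 15 points each
set.seed(1)
centres <- rbind(c(0, 0), c(3, 3), c(6, 0))
pts <- do.call(rbind, lapply(1:3, function(i)
  cbind(rnorm(15, centres[i, 1], 0.3), rnorm(15, centres[i, 2], 0.3))))
A <- exp(-as.matrix(dist(pts))^2 / 2)
k_hat <- eigengap_k_sketch(A, maxk = 10)
k_hat
```

With well-separated groups the leading eigenvalues sit close to 1 and then drop sharply, so the largest gap marks the number of groups.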

harmonise_ids: works on a list of similarity matrices to add entries of NA where there are missing observations between views

Description

Adds a row and column of NA for each missing ID so that the data can later be imputed. Each similarity matrix must have row and column IDs for this to work.

Usage

harmonise_ids(l)

Arguments

`l`	A list of similarity matrices: those to be harmonised.

Value

A list of harmonised similarity matrices.

Examples

h_test <- harmonise_ids(missl)
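
The padding step can be pictured with a small base-R sketch. `harmonise_sketch` is a hypothetical helper, and ordering the IDs alphabetically is an assumption made for illustration; the package may order them differently.

```r
# Hypothetical helper: pad each similarity matrix with NA rows/columns
# so that every view covers the union of all IDs.
harmonise_sketch <- function(l) {
  all_ids <- sort(unique(unlist(lapply(l, colnames))))
  lapply(l, function(m) {
    m <- as.matrix(m)
    out <- matrix(NA_real_, length(all_ids), length(all_ids),
                  dimnames = list(all_ids, all_ids))
    out[rownames(m), colnames(m)] <- m     # copy the known similarities
    out
  })
}

# Toy views (assumed): the second view lacks observation "b"
ids <- c("a", "b", "c")
s1 <- matrix(c(1, 0.5, 0.2,
               0.5, 1, 0.3,
               0.2, 0.3, 1), 3, 3, dimnames = list(ids, ids))
s2 <- s1[c("a", "c"), c("a", "c")]
h <- harmonise_sketch(list(s1, s2))
h[[2]]
```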

integrate_similarity_matrices: integrate similarity matrices using a tensor product graph linear combination and diffusion technique

Description

Given a list of similarity matrices, this function integrates them using the Shu algorithm. It can also reduce noise when the input is a list containing a single matrix.

Usage

integrate_similarity_matrices(kernellist, KNNs_p = 10,
  diffusion_iters = 4, method = "TPG")

Arguments

`kernellist`	A list of similarity matrices: those to be integrated
`KNNs_p`	Numerical value: number of nearest neighbours for KNN graph (default=10, suggested=10-20)
`diffusion_iters`	Numerical value: number of iterations for graph diffusion (default=4, suggested=2-6)
`method`	Character: either TPG (see reference below) or mean (default=TPG)

Value

An integrated similarity matrix

References

Shu, Le, and Longin Jan Latecki. "Integration of single-view graphs with diffusion of tensor product graphs for multi-view spectral clustering." Asian Conference on Machine Learning. 2016.

Examples

i_test <- integrate_similarity_matrices(misslfilled,method='mean')
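
As a rough illustration of the simpler `mean` option, the sketch below averages the views element by element. This reading of `method = "mean"` is an assumption, `mean_integrate_sketch` is a hypothetical helper, and the TPG diffusion procedure is not reproduced here.

```r
# Hypothetical helper: element-wise average of same-sized similarity matrices.
mean_integrate_sketch <- function(kernellist) {
  mats <- lapply(kernellist, as.matrix)
  Reduce(`+`, mats) / length(mats)
}

# Toy views (assumed): two 2 x 2 similarity matrices
v1 <- matrix(c(1, 0.2, 0.2, 1), 2, 2)
v2 <- matrix(c(1, 0.6, 0.6, 1), 2, 2)
avg <- mean_integrate_sketch(list(v1, v2))
avg
```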

kernel_pca: A kernel pca function

Description

kernel_pca: A kernel pca function

Usage

kernel_pca(datam, labels = FALSE, axistextsize = 18,
  legendtextsize = 18, dotsize = 3, similarity = TRUE)

Arguments

`datam`	Dataframe or matrix: a data frame with samples as columns, rows as features, or a kernel matrix
`labels`	Factor: to label the plot with colours
`axistextsize`	Numerical value: axis text size
`legendtextsize`	Numerical value: legend text size
`dotsize`	Numerical value: dot size
`similarity`	Logical flag: whether the input is a similarity matrix or not

Value

A kernel PCA plot

Examples

ex_kernel_pca <- kernel_pca(blobs[,1:50], similarity=FALSE)

mean_imputation: mean imputation function for multi-view spectral clustering with missing data

Description

Works on a list of similarity matrices to impute missing values using the mean from the other views.

Usage

mean_imputation(l)

Arguments

`l`	A list of data frames: all those to be included in the imputation.

Value

A list of completed data frames.

Examples

m_test <- mean_imputation(misslfilled)
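
One way to picture imputation from the other views is the base-R sketch below. It assumes all matrices already share the same dimensions and ID order (for example after harmonisation); `mean_impute_sketch` and the toy views are hypothetical and the package's routine may differ.

```r
# Hypothetical helper: fill NA entries with the mean of the same entry
# across the views where it is observed.
mean_impute_sketch <- function(l) {
  arr <- simplify2array(lapply(l, as.matrix))        # n x n x views array
  view_mean <- apply(arr, c(1, 2), mean, na.rm = TRUE)
  lapply(l, function(m) {
    m <- as.matrix(m)
    m[is.na(m)] <- view_mean[is.na(m)]               # only NA entries change
    m
  })
}

# Toy views (assumed): observation "b" is missing from the second view
ids <- c("a", "b", "c")
v1 <- matrix(c(1, 0.5, 0.2,
               0.5, 1, 0.3,
               0.2, 0.3, 1), 3, 3, dimnames = list(ids, ids))
v2 <- v1
v2["b", ] <- NA
v2[, "b"] <- NA
filled <- mean_impute_sketch(list(v1, v2))
filled[[2]]
```

An entry that is missing in every view has no observed values to average, so in this sketch it would come back as NaN.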

A list of the blob data as similarity matrices with a missing entry in one

Description

A list holding two copies of a simulated dataset of 8 Gaussian blobs, each converted to a similarity matrix; one copy has a missing observation.

Usage

missl

Format

A list of two data frames

A list of the blob data as similarity matrices with a missing entry in one filled with NAs

Description

A list holding two copies of a simulated dataset of 8 Gaussian blobs, each converted to a similarity matrix; one copy has a missing observation that has been filled with NAs.

Usage

misslfilled

Format

A list of two data frames

ng_kernel: Kernel from the Ng spectral clustering algorithm

Description

This is the kernel from the Ng spectral clustering algorithm. It takes a global sigma which requires tuning for new datasets in most cases. It is possible to use the sigma_finder function to find a sigma for a dataset. Sigma is assumed to be squared already.

Usage

ng_kernel(data, sigma = 0.1)

Arguments

`data`	Data frame or matrix: with points as columns, features as rows
`sigma`	Numerical value: a global sigma that controls the drop off in affinity

Value

A similarity matrix of the input data

References

Ng, Andrew Y., Michael I. Jordan, and Yair Weiss. "On spectral clustering: Analysis and an algorithm." Advances in neural information processing systems. 2002.

Examples

ng_similarity <- ng_kernel(brain[[1]])
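
For orientation, a Gaussian kernel with a single global sigma can be sketched in base R as below. The formula follows Ng et al. (affinity exp(-d^2 / (2 sigma^2)) with a zero diagonal), with `sigma` standing in for the already-squared value; `ng_kernel_sketch` is a hypothetical helper and the package's exact scaling may differ.

```r
# Hypothetical helper: Gaussian affinity with one global, pre-squared sigma.
ng_kernel_sketch <- function(data, sigma = 0.1) {
  d2 <- as.matrix(dist(t(as.matrix(data))))^2   # squared distances between columns
  A <- exp(-d2 / (2 * sigma))                   # sigma plays the role of sigma^2
  diag(A) <- 0                                  # Ng et al. define A_ii = 0
  A
}

# Toy data (assumed): three points as columns, two features as rows
x <- cbind(p1 = c(0, 0), p2 = c(0, 1), p3 = c(3, 4))
A <- ng_kernel_sketch(x, sigma = 0.5)
A
```

A small sigma makes affinity fall off quickly with distance, which is why the value usually needs tuning for each new dataset.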

pca: A pca function

Description

pca: A pca function

Usage

pca(mydata, labels = FALSE, dotsize = 3, axistextsize = 18,
  legendtextsize = 18)

Arguments

`mydata`	Data frame or matrix: matrix or data frame with samples as columns, features as rows
`labels`	Factor: to label the plot with colours
`dotsize`	Numerical value: dot size
`axistextsize`	Numerical value: axis text size
`legendtextsize`	Numerical value: legend text size

Value

A pca plot object

Examples

ex_pca <- pca(blobs[,1:50])

rbfkernel_b: fast self-tuning kernel

Description

rbfkernel_b: fast self-tuning kernel

Usage

rbfkernel_b(mat, K = 3, sigma = 1)

Arguments

`mat`	Matrix: matrix should have samples as columns and rows as features
`K`	Numerical value: the number of nearest neighbours to use when calculating local sigma
`sigma`	Numerical value: a global sigma, usually left at 1, which has no effect

Value

A kernel matrix

Examples

stsc_kern <- rbfkernel_b(blobs[,1:50])
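
The idea of a local sigma can be sketched in base R along the lines of the Zelnik-Manor self-tuning kernel, where each sample's scale is the distance to its K-th nearest neighbour. `stsc_kernel_sketch` is a hypothetical helper, the global `sigma` argument is left out, and the package's fast implementation may differ.

```r
# Hypothetical helper: self-tuning affinity exp(-d_ij^2 / (s_i * s_j)),
# where s_i is the distance from sample i to its K-th nearest neighbour.
stsc_kernel_sketch <- function(mat, K = 3) {
  d <- as.matrix(dist(t(as.matrix(mat))))
  # position 1 of each sorted row is the self-distance (0), so take K + 1
  local_sigma <- apply(d, 1, function(row) sort(row)[K + 1])
  exp(-d^2 / outer(local_sigma, local_sigma))
}

# Toy data (assumed): one feature, five samples as columns
x <- rbind(c(0, 1, 2, 3, 10))
S <- stsc_kernel_sketch(x, K = 1)
round(S, 4)
```

Because the scale adapts per sample, an isolated point such as the last one keeps a usable affinity to its nearest neighbour. Duplicate samples would give a local sigma of zero, which this sketch does not guard against.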

sigma_finder: heuristic to find sigma for the Ng kernel

Description

This is a heuristic to find the sigma for the kernel from the Ng spectral clustering algorithm. It returns a global sigma. It uses the mean K nearest neighbour distances of all samples to determine sigma.

Usage

sigma_finder(mat, NN = 3)

Arguments

`mat`	Data frame or matrix: with points as columns, features as rows
`NN`	Numerical value: the number of nearest neighbours to use (default=3)

Value

A global sigma

Examples

sig <- sigma_finder(blobs)
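
One plausible reading of this heuristic is to average, over all samples, the distance to the NN-th nearest neighbour. The base-R sketch below shows that reading only; `sigma_sketch` is a hypothetical helper, and whether the package squares or otherwise rescales the result is not stated here.

```r
# Hypothetical helper: mean distance to the NN-th nearest neighbour.
sigma_sketch <- function(mat, NN = 3) {
  d <- as.matrix(dist(t(as.matrix(mat))))
  # position 1 of each sorted row is the self-distance (0), so take NN + 1
  kth <- apply(d, 1, function(row) sort(row)[NN + 1])
  mean(kth)
}

# Toy data (assumed): one feature, five samples as columns
x <- rbind(c(0, 1, 2, 3, 10))
s <- sigma_sketch(x, NN = 1)
s
```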

Spectrum: Fast Adaptive Spectral Clustering for Single and Multi-view Data

Description

Spectrum is a self-tuning spectral clustering method for single or multi-view data. Spectrum uses a new type of adaptive density-aware kernel that strengthens connections between points that share common nearest neighbours in the graph. For integrating multi-view data and reducing noise a tensor product graph data integration and diffusion procedure is used. Spectrum analyses eigenvector variance or distribution to determine the number of clusters. Spectrum is well suited for a wide range of data, including both Gaussian and non-Gaussian structures.

Usage

Spectrum(data, method = 1, silent = FALSE, showres = TRUE,
  diffusion = TRUE, kerneltype = c("density", "stsc"), maxk = 10,
  NN = 3, NN2 = 7, showpca = FALSE, frac = 2, thresh = 7,
  fontsize = 18, dotsize = 3, tunekernel = FALSE,
  clusteralg = "GMM", FASP = FALSE, FASPk = NULL, fixk = NULL,
  krangemax = 10, runrange = FALSE, diffusion_iters = 4,
  KNNs_p = 10, missing = FALSE)

Arguments

`data`	Data frame or list of data frames: contains the data with points to cluster as columns and rows as features. For multi-view data a list of dataframes is to be supplied with the samples in the same order.
`method`	Numerical value: 1 = default eigengap method (Gaussian clusters), 2 = multimodality gap method (Gaussian/non-Gaussian clusters), 3 = no automatic method (see fixk param)
`silent`	Logical flag: whether to turn off messages
`showres`	Logical flag: whether to show the results on the screen
`diffusion`	Logical flag: whether to perform graph diffusion to reduce noise (default=TRUE)
`kerneltype`	Character string: 'density' (default) = adaptive density aware kernel, 'stsc' = Zelnik-Manor self-tuning kernel
`maxk`	Numerical value: the maximum number of expected clusters (default=10). This is data dependent; do not set it excessively high.
`NN`	Numerical value: kernel param, the number of nearest neighbours to use for the sigma parameters (default=3)
`NN2`	Numerical value: kernel param, the number of nearest neighbours to use for the common nearest neighbours (default=7)
`showpca`	Logical flag: whether to show pca when running on one view
`frac`	Numerical value: optk search param, fraction to find the last substantial drop (multimodality gap method param)
`thresh`	Numerical value: optk search param, how many points ahead to keep searching (multimodality gap method param)
`fontsize`	Numerical value: controls font size of the ggplot2 plots
`dotsize`	Numerical value: controls the dot size of the ggplot2 plots
`tunekernel`	Logical flag: whether to tune the kernel, only applies for method 2 (default=FALSE)
`clusteralg`	Character string: clustering algorithm for eigenvector matrix (GMM or km)
`FASP`	Logical flag: whether to use Fast Approximate Spectral Clustering (for very high sample numbers)
`FASPk`	Numerical value: the number of centroids to compute when doing FASP
`fixk`	Numerical value: if we are just performing spectral clustering without automatic selection of K, set this parameter and method to 3
`krangemax`	Numerical value: the maximum K value to iterate towards when running a range of K
`runrange`	Logical flag: whether to run a range of K or not (default=FALSE), puts Kth results into Kth element of list
`diffusion_iters`	Numerical value: number of diffusion iterations for the graph (default=4)
`KNNs_p`	Numerical value: number of KNNs when making KNN graph (default=10, suggested=10-20)
`missing`	Logical flag: whether to impute missing data in multi-view analysis (default=FALSE)

Value

A list, containing: 1) cluster assignments, in the same order as input data columns 2) eigenvector analysis results (either eigenvalues or dip test statistics) 3) optimal K 4) final similarity matrix 5) eigenvectors and eigenvalues of graph Laplacian

Examples

res <- Spectrum(brain[[1]][,1:50])

Two spirals wrapped around one another

Description

Simulated data using the 'mlbench' CRAN package.

Usage

spirals

Format

A data frame with 2 rows and 180 variables
