scmorph API#

Import scmorph as:

import scmorph as sm

Reading and writing data: io#

scmorph can read data from a variety of sources, including data exported by CellProfiler. Once loaded, all data is treated as an AnnData object, a fast, standard format that works with many existing single-cell tools, such as scanpy.

Note

If you would like to learn more about the h5ad file format, please see anndata, which is used to read and write these files.

Note

scmorph only processes continuous, non-radial features. Features such as the number of nearest neighbors (discrete), X/Y coordinates (discrete and uninformative), and BoundingBox (rotation-sensitive) are discarded. You may see a warning message about this; treat it as information rather than an error.

read(filename, **kwargs)

Read csv, h5ad or sql files.

read_cellprofiler_csv(filename[, n_headers, ...])

Read a matrix from a .csv file created with CellProfiler

read_cellprofiler_batches(path, output_file)

Read CellProfiler data from directories

read_sql(filename[, backup_url])

Read sql files.

make_AnnData(df[, meta_cols, feature_delim])

Make annotated data matrix from DataFrame

split_feature_names(features[, feature_delim])

Split feature names into a DataFrame
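To illustrate what splitting feature names means, here is a minimal standalone sketch that does not call scmorph. It assumes CellProfiler-style column names such as `Cells_AreaShape_Area` and splits them into three fields. The helper name `split_feature_name` and the three-field layout are assumptions made for illustration; the columns scmorph actually returns may differ.

```python
def split_feature_name(name, feature_delim="_"):
    """Split a CellProfiler-style column name into (object, module, feature).

    Hypothetical helper for illustration only; not part of scmorph.
    """
    # Split at most twice so the feature field keeps any further delimiters.
    parts = name.split(feature_delim, 2)
    # Pad short names so the result always has three fields.
    parts += [""] * (3 - len(parts))
    return tuple(parts)


columns = ["Cells_AreaShape_Area", "Nuclei_Intensity_MeanIntensity_DNA"]
for col in columns:
    print(col, "->", split_feature_name(col))
```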

Preprocessing: pp#

Preprocessing tools that do not produce output, but modify the data to prepare it for downstream analysis.

Basic Preprocessing#

drop_na(adata[, feature_threshold, ...])

Drop features with many NAs, then drop cells with any NAs (or infinite values)

scale(adata[, treatment_key, control, chunked])

Scale data to unit variance per feature while maintaining a low memory footprint (operates in-place).

scale_by_batch(adata, batch_key[, ...])

Scale data to zero-center and unit variance per batch in-place.
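The idea behind per-batch scaling can be sketched in plain Python for a single feature. This is a conceptual illustration, not scmorph's implementation: the function name `zscore_per_batch` is hypothetical, and the use of the population standard deviation is an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_per_batch(values, batches):
    """Zero-center and scale one feature to unit variance within each batch.

    Hypothetical helper; assumes population standard deviation.
    """
    # Collect the values belonging to each batch.
    groups = defaultdict(list)
    for value, batch in zip(values, batches):
        groups[batch].append(value)

    # Compute mean and standard deviation per batch.
    stats = {b: (mean(vs), pstdev(vs)) for b, vs in groups.items()}

    scaled = []
    for value, batch in zip(values, batches):
        mu, sd = stats[batch]
        # Constant features within a batch are mapped to 0 to avoid division by zero.
        scaled.append((value - mu) / sd if sd > 0 else 0.0)
    return scaled


values = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
batches = ["plate1"] * 3 + ["plate2"] * 3
scaled = zscore_per_batch(values, batches)
print(scaled)
```

In this toy input the second batch is ten times brighter than the first, yet both batches end up with the same scaled values, which is the effect per-batch scaling aims for.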

Batch Effects#

Tools to remove batch effects from single-cell morphological data.

remove_batch_effects(adata[, bio_key, ...])

Remove batch effects

Feature Selection#

Tools to reduce number of features based on correlations.

select_features(adata[, method, cor_cutoff, ...])

Feature selection

corr(X[, Y, method, M])

Compute pairwise correlations
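As a rough illustration of correlation-based feature reduction, the sketch below greedily keeps a feature only if its absolute Pearson correlation with every already kept feature is below a cutoff. It does not call scmorph. The greedy strategy, the helper names, and the default cutoff of 0.9 are assumptions; `select_features` may use a different method.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def select_uncorrelated(features, cor_cutoff=0.9):
    """Greedily keep features that are not highly correlated with kept ones.

    Hypothetical helper; the cutoff default is an assumption.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < cor_cutoff for k in kept):
            kept.append(name)
    return kept


features = {
    "area": [1.0, 2.0, 3.0, 4.0, 5.0],
    "perimeter": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly proportional to area
    "intensity": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = select_uncorrelated(features)
print(kept)
```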

Aggregation#

Tools to aggregate profiles and compare them using different distance metrics. For a simple aggregation, use aggregate. For a statistically robust distance metric, use aggregate_mahalanobis.

aggregate(adata[, well_key, group_keys, ...])

Aggregate single-cell measurements into well-level profiles
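Conceptually, well-level aggregation collapses all cells from one well into a single profile. The standalone sketch below uses the per-feature median as the summary statistic. That choice, the helper name `aggregate_by_well`, and the dictionary-based data layout are assumptions for illustration; scmorph operates on AnnData objects and may aggregate differently.

```python
from collections import defaultdict
from statistics import median


def aggregate_by_well(cells, well_key="well"):
    """Collapse single-cell records into one median profile per well.

    Hypothetical helper; each cell is a dict of feature values plus a well label.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for cell in cells:
        for feature, value in cell.items():
            if feature != well_key:
                grouped[cell[well_key]][feature].append(value)
    # Reduce each list of single-cell values to its median.
    return {
        well: {feature: median(vals) for feature, vals in feats.items()}
        for well, feats in grouped.items()
    }


cells = [
    {"well": "A01", "area": 10.0, "intensity": 0.5},
    {"well": "A01", "area": 14.0, "intensity": 0.7},
    {"well": "A01", "area": 12.0, "intensity": 0.9},
    {"well": "B02", "area": 30.0, "intensity": 0.2},
    {"well": "B02", "area": 34.0, "intensity": 0.4},
]
profiles = aggregate_by_well(cells)
print(profiles)
```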

aggregate_ttest(adata[, treatment_key, ...])

Measure per-feature distance between groups using t-statistics.

tstat_distance(tstats)

Summarize t-statistics per group.

aggregate_pc(adata[, treatment_key, ...])

Measure distance between groups using principal components weighted by variance explained

aggregate_mahalanobis(adata[, ...])

Measure distance between groups using Mahalanobis distance
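For readers unfamiliar with the Mahalanobis distance, the following two-feature sketch shows the underlying computation: the offset of a point from a reference group's mean, scaled by the inverse of that group's covariance. It is a textbook illustration in plain Python, not scmorph's implementation. The helper names and the choice of control cells as the reference group are assumptions.

```python
from math import sqrt


def mean_vec(rows):
    """Column means of a list of 2-element rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def cov_2x2(rows):
    """Sample covariance matrix (n - 1 denominator) for two features."""
    m = mean_vec(rows)
    n = len(rows)
    c = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                c[i][j] += d[i] * d[j] / (n - 1)
    return c


def mahalanobis_2d(point, reference_rows):
    """Mahalanobis distance of `point` from the reference group's mean."""
    m = mean_vec(reference_rows)
    (a, b), (c, d) = cov_2x2(reference_rows)
    det = a * d - b * c  # must be non-zero for the inverse to exist
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [point[0] - m[0], point[1] - m[1]]
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return sqrt(q)


# Hypothetical control cells (two features each) and a treated-group mean.
control = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
treated_mean = [5.0, 2.5]
distance = mahalanobis_2d(treated_mean, control)
print(round(distance, 4))
```

Unlike a plain Euclidean distance, this measure accounts for the spread and correlation of the reference features, so a shift along a low-variance direction counts for more than the same shift along a high-variance one.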

Dimensionality-reduction#

Tools to perform dimensionality-reduction.

pca(adata[, n_comps, whiten, copy, ...])

Principal component analysis [Pedregosa et al., 2011].

neighbors(adata[, n_neighbors, n_pcs, ...])

Compute a neighborhood graph of observations using the PCA representation.

umap(adata, **kwargs)

Embed the neighborhood graph using UMAP [McInnes et al., 2018].

Quality Control: qc#

Tools to filter cells and images based on quality control metrics and morphological profiles. For cells, unsupervised filtering is done using pyod through filter_outliers. For images, semi-supervised filtering is done using machine-learning methods trained on image-level data and a subset of labels with qc_images.

While the unsupervised approach can be applied to any dataset, it is likely less accurate and may remove underrepresented cell types.

filter_outliers(adata[, outliers, fraction, ...])

Filter outlier observations from an AnnData object.

read_image_qc(filename[, meta_cols, ...])

Read image metrics from csv file

qc_images(adata, qc[, classifier, ...])

Perform cell-QC based on image metrics, if needed using a classifier and a subset of labeled images.

Visualization: pl#

Tools to plot data, often from dimensionality-reduction techniques. Most of these functions are wrappers around scanpy functions.

pca(adata[, annotate_var_explained])

Principal component analysis [Pedregosa et al., 2011].

umap(adata, **kwargs)

Uniform Manifold Approximation and Projection [McInnes et al., 2018].

cumulative_density(adata, x[, layer, color, ...])

Plot cumulative densities of variables in AnnData

ridge_plot(adata, x, y[, layer, n_col])

Plot features as ridge plot.

Datasets: datasets#

Datasets that are included with scmorph for testing and demonstration purposes. Currently, this includes various versions of the data in [Rohban et al., 2017].

rohban2017([backed])

Load a large multi-plate experiment by Rohban et al. [2017] with ~1.2M cells

rohban2017_minimal([backed])

Load a subset of a multi-plate experiment by Rohban et al. [2017] with ~12,000 cells

rohban2017_imageQC([backed])

Load image-level data for a multi-plate experiment by Rohban et al. [2017]

rohban2017_minimal_csv()

Provides a minimal csv file in CellProfiler format, data from Rohban et al. [2017]