scmorph API#
Import scmorph as:
import scmorph as sm
Reading and writing data: io#
scmorph can read data from a variety of sources, including data exported by CellProfiler.
Once loaded in, all data is treated as an AnnData object.
This has the advantage of being a fast, standard format that can be used with many
existing single-cell tools, such as scanpy.
Note
If you would like to learn more about the h5ad file format, please see
anndata, which is used to read and write these files.
Note
scmorph only processes continuous, non-radial features. Features such as the number of nearest neighbors (discrete), X/Y coordinates (discrete and uninformative) and BoundingBox (rotation-sensitive) are therefore discarded. You may see a warning message about this: treat it as information rather than as an error.
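To illustrate the kind of filtering described in the note above, here is a minimal sketch that separates feature names by pattern. It does not use scmorph itself: the patterns in EXCLUDE_PATTERNS and the helper split_features are hypothetical, chosen only to match typical CellProfiler column names, and scmorph's actual rules may differ.

```python
import re

# Hypothetical name patterns for illustration; scmorph's actual rules may differ.
EXCLUDE_PATTERNS = [
    r"NumberOfNeighbors",      # discrete neighbor counts
    r"Location_Center_[XY]$",  # X/Y coordinates
    r"BoundingBox",            # rotation-sensitive measurements
]


def split_features(names):
    """Return (kept, dropped) feature names based on EXCLUDE_PATTERNS."""
    pattern = re.compile("|".join(EXCLUDE_PATTERNS))
    kept = [n for n in names if not pattern.search(n)]
    dropped = [n for n in names if pattern.search(n)]
    return kept, dropped


features = [
    "Cells_AreaShape_Area",
    "Cells_AreaShape_BoundingBoxArea",
    "Cells_Location_Center_X",
    "Cells_Neighbors_NumberOfNeighbors_Adjacent",
    "Nuclei_Intensity_MeanIntensity_DNA",
]
kept, dropped = split_features(features)
print("kept:", kept)
print("dropped:", dropped)
```

Reporting the dropped names, as done here with print, mirrors the informational warning mentioned in the note.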
|
Read csv, h5ad or sql files. |
|
Read a matrix from a .csv file created with CellProfiler |
|
Read CellProfiler data from directories |
|
Read sql files. |
|
Make annotated data matrix from |
|
Split feature names into a |
Preprocessing: pp#
Preprocessing tools that do not produce output, but modify the data to prepare it for downstream analysis.
Basic Preprocessing#
|
Drop features with many NAs, then drop cells with any NAs (or infinite values) |
|
Scale data to unit variance per feature while maintaining a low memory footprint (operates in-place). |
|
Scale data to zero-center and unit variance per batch in-place. |
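The two basic steps above (dropping missing values, then scaling per feature) can be sketched in plain NumPy. This is an illustration of the idea only, not scmorph's implementation: the function names and the feature_threshold parameter are assumptions made for this example.

```python
import numpy as np


def drop_na(X, feature_threshold=0.9):
    """Drop mostly non-finite features, then cells with any non-finite value.

    feature_threshold is a hypothetical parameter: the minimum fraction of
    finite values a feature needs in order to be kept.
    """
    finite = np.isfinite(X)
    keep_features = finite.mean(axis=0) >= feature_threshold
    X = X[:, keep_features]
    keep_cells = np.isfinite(X).all(axis=1)
    return X[keep_cells], keep_features, keep_cells


def scale_unit_variance(X):
    """Divide each feature by its standard deviation, modifying X in place."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unchanged
    X /= std
    return X


X = np.array([
    [1.0, np.nan, 10.0],
    [2.0, np.nan, 20.0],
    [3.0, 5.0, np.inf],
    [4.0, np.nan, 40.0],
])
X_clean, keep_features, keep_cells = drop_na(X, feature_threshold=0.5)
X_scaled = scale_unit_variance(X_clean)
```

Dropping sparse features first matters: if cells were filtered first, a single mostly-empty feature would remove nearly every cell.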
Batch Effects#
Tools to remove batch effects from single-cell morphological data.
|
Remove batch effects |
Feature Selection#
Tools to reduce the number of features based on correlations.
|
Feature selection |
|
Compute pairwise correlations |
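As a rough illustration of correlation-based feature selection, the sketch below computes pairwise Pearson correlations and greedily keeps features that are not highly correlated with any feature already kept. The greedy rule, the function name select_uncorrelated and the 0.9 threshold are assumptions for this example; scmorph's own selection procedure may work differently.

```python
import numpy as np


def select_uncorrelated(X, threshold=0.9):
    """Greedily keep features whose absolute Pearson correlation with every
    previously kept feature is below `threshold` (a hypothetical default)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept, corr


rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
# Column 1 is a noisy multiple of column 0, so it is nearly redundant.
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=200), b])
kept, corr = select_uncorrelated(X)
X_selected = X[:, kept]
```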
Aggregation#
Tools to compare aggregate profiles.
Additionally, different distance metrics are available.
For a simple aggregation, use aggregate. For a statistically robust distance
metric, use aggregate_mahalanobis.
|
Aggregate single-cell measurements into well-level profiles |
|
Measure per-feature distance between groups using t-statistics. |
|
Summarize t-statistics per group. |
|
Measure distance between groups using principal components weighted by variance explained |
|
Measure distance between groups using Mahalanobis distance |
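To make the difference between simple aggregation and a covariance-aware distance concrete, here is a NumPy sketch. It is not scmorph's code: the helper names are hypothetical, the median is one possible choice of aggregate, and the pooled within-group covariance is one common way to define a Mahalanobis distance between two groups.

```python
import numpy as np


def aggregate_median(X, groups):
    """Collapse single-cell rows into one median profile per group label."""
    labels = np.unique(groups)
    profiles = np.vstack([np.median(X[groups == g], axis=0) for g in labels])
    return labels, profiles


def mahalanobis_between(X, groups, a, b):
    """Mahalanobis distance between the centroids of groups a and b,
    using the pooled within-group covariance of the two groups."""
    Xa, Xb = X[groups == a], X[groups == b]
    diff = Xa.mean(axis=0) - Xb.mean(axis=0)
    pooled = (
        (len(Xa) - 1) * np.cov(Xa, rowvar=False)
        + (len(Xb) - 1) * np.cov(Xb, rowvar=False)
    ) / (len(Xa) + len(Xb) - 2)
    # Solve instead of inverting the covariance matrix explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(pooled, diff)))


rng = np.random.default_rng(0)
control = rng.normal(size=(100, 3))
treated = rng.normal(loc=[2.0, 0.0, 0.0], size=(100, 3))  # shifted in feature 0
X = np.vstack([control, treated])
groups = np.array(["control"] * 100 + ["treated"] * 100)

labels, profiles = aggregate_median(X, groups)
d = mahalanobis_between(X, groups, "control", "treated")
```

Unlike a plain difference of aggregate profiles, the Mahalanobis distance rescales each direction by the variability of the cells, so noisy features contribute less.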
Dimensionality-reduction#
Tools to perform dimensionality-reduction.
|
Principal component analysis [Pedregosa et al., 2011]. |
|
Compute a neighborhood graph of observations using the PCA representation. |
|
Embed the neighborhood graph using UMAP [McInnes et al., 2018]. |
Quality Control: qc#
Tools to filter cells and images based on quality control metrics and morphological profiles.
For cells, unsupervised filtering is done using pyod through filter_outliers.
For images, semi-supervised filtering is done using machine-learning methods trained on
image-level data and a subset of labels with qc_images.
The former can be performed on any dataset, but it is likely less accurate and may remove underrepresented cell types.
|
Filter outlier observations from an AnnData object. |
|
Read image metrics from csv file |
|
Perform cell-QC based on image metrics, if needed using a classifier and a subset of labeled images. |
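For intuition about unsupervised outlier filtering, the sketch below flags cells with extreme feature values using a robust z-score (median and median absolute deviation). This is a deliberately simple stand-in: it does not use pyod, it is not what filter_outliers does internally, and the function name and cutoff are assumptions for this example.

```python
import numpy as np


def robust_outlier_mask(X, cutoff=5.0):
    """Flag rows where any feature lies more than `cutoff` robust z-scores
    (median/MAD based) from the feature median."""
    median = np.median(X, axis=0)
    mad = np.median(np.abs(X - median), axis=0)
    mad[mad == 0] = 1.0  # avoid division by zero for constant features
    z = 0.6745 * (X - median) / mad
    return (np.abs(z) > cutoff).any(axis=1)


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X[0] = [50.0, 0.0, 0.0, 0.0]  # plant one obvious outlier
is_outlier = robust_outlier_mask(X)
X_filtered = X[~is_outlier]
```

A rule like this also shows the caveat stated above: a small but genuine cell population with unusual morphology would be flagged in the same way as a segmentation artifact.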
Visualization: pl#
Tools to plot data, often from dimensionality-reduction techniques. Most of these functions are wrappers around scanpy functions.
|
Principal component analysis [Pedregosa et al., 2011]. |
|
Uniform Manifold Approximation and Projection [McInnes et al., 2018]. |
|
Plot cumulative densities of variables in AnnData |
|
Plot features as ridge plot. |
Datasets: datasets#
Datasets that are included with scmorph for testing and demonstration purposes.
Currently, this includes various versions of the data in [Rohban et al., 2017].
|
Load a large multi-plate experiment by Rohban et al. [2017] with ~1.2M cells |
|
Load a subset of a multi-plate experiment by Rohban et al. [2017] with ~12,000 cells |
|
Load image-level data for a multi-plate experiment by Rohban et al. [2017] |
|
Provides a minimal csv file in CellProfiler format, data from Rohban et al. [2017] |