Preprocessing: pp#

Preprocessing tools that do not produce output, but modify the data to prepare it for downstream analysis.

Basic Preprocessing#

drop_na(adata[, feature_threshold, ...])

Drop features with many NAs, then drop cells that still contain any NA or infinite value.
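
The two-step logic described above can be sketched on a plain NumPy array. This is only an illustration, not the library's implementation: the helper name `drop_na_sketch` and its `max_na_fraction` parameter are hypothetical, and the real `feature_threshold` argument may be defined differently.

```python
import numpy as np

def drop_na_sketch(X, max_na_fraction=0.5):
    """Hypothetical helper: drop mostly-missing features, then incomplete cells."""
    X = np.asarray(X, dtype=float)
    # Step 1: keep features (columns) whose fraction of NaNs is small enough.
    keep_features = np.isnan(X).mean(axis=0) <= max_na_fraction
    X = X[:, keep_features]
    # Step 2: keep cells (rows) with no remaining NaN or infinite value.
    keep_cells = np.isfinite(X).all(axis=1)
    return X[keep_cells], keep_cells, keep_features

X = np.array([
    [1.0, np.nan, 3.0],
    [4.0, np.nan, 6.0],
    [np.nan, np.nan, 9.0],
    [7.0, 8.0, np.inf],
])
clean, kept_cells, kept_features = drop_na_sketch(X, max_na_fraction=0.5)
print(clean)
```

Dropping sparse features first matters: a single mostly-empty feature would otherwise cause nearly every cell to be discarded in the second step.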

scale(adata[, treatment_key, control, chunked])

Scale data to unit variance per feature while maintaining a low memory footprint (operates in-place).
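
One way to keep memory use low while scaling in place is to process rows in chunks. The sketch below assumes a dense float array and ignores the `treatment_key` and `control` arguments listed in the signature; the helper name and `chunk_size` parameter are hypothetical and do not describe the library's internals.

```python
import numpy as np

def scale_inplace_sketch(X, chunk_size=1000):
    """Hypothetical helper: divide each feature by its standard deviation, in place."""
    n = X.shape[0]
    chunks = [slice(i, min(i + chunk_size, n)) for i in range(0, n, chunk_size)]
    # Pass 1: per-feature mean, accumulated chunk by chunk.
    mean = sum(X[c].sum(axis=0) for c in chunks) / n
    # Pass 2: per-feature sum of squared deviations from the mean.
    ss = sum(((X[c] - mean) ** 2).sum(axis=0) for c in chunks)
    std = np.sqrt(ss / (n - 1))
    std[std == 0] = 1.0  # leave constant features unchanged
    # Pass 3: rescale each chunk in place, so no full-size copy is created.
    for c in chunks:
        X[c] /= std
    return X

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=[1.0, 10.0, 100.0], size=(2500, 3))
scale_inplace_sketch(X, chunk_size=1000)
print(X.std(axis=0, ddof=1))
```

Only one chunk-sized temporary exists at any time, at the cost of three passes over the data.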

scale_by_batch(adata, batch_key[, ...])

Zero-center and scale data to unit variance per batch, in place.
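
Per-batch standardization can be sketched as a z-score computed separately within each batch. The helper below is hypothetical and works on a NumPy array with a parallel array of batch labels, rather than on an AnnData object with a `batch_key`.

```python
import numpy as np

def scale_by_batch_sketch(X, batches):
    """Hypothetical helper: z-score every feature within each batch, in place."""
    batches = np.asarray(batches)
    for b in np.unique(batches):
        rows = batches == b
        mu = X[rows].mean(axis=0)
        sd = X[rows].std(axis=0, ddof=1)
        sd[sd == 0] = 1.0  # avoid dividing by zero for constant features
        X[rows] = (X[rows] - mu) / sd
    return X

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0],
              [101.0, 5.0], [102.0, 6.0], [103.0, 7.0]])
batches = np.array(["plate1"] * 3 + ["plate2"] * 3)
scale_by_batch_sketch(X, batches)
print(X)
```

After scaling, both plates share the same per-feature mean and variance, so a constant offset between plates no longer dominates downstream distances.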

Batch Effects#

Tools to remove batch effects from single-cell morphological data.

remove_batch_effects(adata, batch_key[, ...])

Remove batch effects between the batches identified by batch_key.

Feature Selection#

Tools to reduce the number of features based on correlation or confounder association.

select_features(adata[, method, cor_cutoff, ...])

Feature selection based on correlation metrics.
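
A common form of correlation-based selection is a greedy filter that keeps a feature only if it is not too strongly correlated with any feature already kept. The sketch below uses absolute Pearson correlation; the helper name, the greedy order, and the 0.9 cutoff are assumptions for illustration and may differ from what `select_features` does.

```python
import numpy as np

def select_uncorrelated_sketch(X, cor_cutoff=0.9):
    """Hypothetical helper: greedily keep features with |r| <= cor_cutoff."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(corr.shape[0]):
        # Keep feature j only if it is not redundant with a kept feature.
        if all(corr[j, k] <= cor_cutoff for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = rng.normal(size=200)
# Feature 1 is a noisy rescaling of feature 0; feature 2 is independent.
X = np.column_stack([a, 2.0 * a + 0.01 * rng.normal(size=200), b])
kept = select_uncorrelated_sketch(X, cor_cutoff=0.9)
print(kept)
```

The greedy order means the first feature of each correlated group survives, so results depend on column order.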

kruskal_test(adata[, test_column, progress])

Perform Kruskal-Wallis H-test for each feature across batches.
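
For reference, the Kruskal-Wallis H statistic for one feature can be computed from ranks as sketched below. This NumPy version is a minimal illustration with hypothetical helper names; it returns only the tie-corrected H statistic and is not taken from the library.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1; tied values share their average rank."""
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)
    return (last_rank - (counts - 1) / 2.0)[inverse]

def kruskal_h_sketch(values, groups):
    """Hypothetical helper: tie-corrected H statistic for one feature."""
    values = np.asarray(values, dtype=float).ravel()
    groups = np.asarray(groups).ravel()
    n = values.size
    ranks = average_ranks(values)
    # Sum over groups of (rank sum)^2 / group size.
    total = sum(ranks[groups == g].sum() ** 2 / (groups == g).sum()
                for g in np.unique(groups))
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over tied value counts t.
    _, counts = np.unique(values, return_counts=True)
    correction = 1.0 - (counts ** 3 - counts).sum() / (n ** 3 - n)
    return h / correction

batches = np.array(["A"] * 3 + ["B"] * 3 + ["C"] * 3)
X = np.array([[1, 1], [2, 4], [3, 7], [4, 2], [5, 5],
              [6, 8], [7, 3], [8, 6], [9, 9]], dtype=float)
h_stats = [kruskal_h_sketch(X[:, j], batches) for j in range(X.shape[1])]
print(h_stats)
```

The first feature increases steadily from batch A to batch C and yields a large H; the second is interleaved across batches and yields a small one.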

kruskal_filter(adata[, test_column, sigma, ...])

Filter features based on Kruskal-Wallis H-test statistics.

Aggregation#

Tools to aggregate single-cell data into profiles and to compare those profiles using different distance metrics. For a simple aggregation, use aggregate. For a statistically robust distance metric, use aggregate_mahalanobis.

aggregate(adata, group_keys[, method, progress])

Aggregate single-cell measurements into well-level profiles.
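
Aggregation to well-level profiles amounts to a grouped reduction over cells. The sketch below uses the per-feature median on a NumPy array; the helper name and the `reducer` parameter are hypothetical, and the choice of median is an assumption rather than the library's documented default.

```python
import numpy as np

def aggregate_sketch(X, wells, reducer=np.median):
    """Hypothetical helper: one profile per well via a per-feature reduction."""
    wells = np.asarray(wells)
    labels = np.unique(wells)
    profiles = np.vstack([reducer(X[wells == w], axis=0) for w in labels])
    return labels, profiles

X = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 20.0],
              [2.0, 4.0], [4.0, 8.0]])
wells = np.array(["A01", "A01", "A01", "B02", "B02"])
labels, profiles = aggregate_sketch(X, wells)
print(labels, profiles)
```

A median profile is less sensitive to a few outlying cells than a mean profile, which is one reason it is often preferred for morphological data.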

aggregate_ttest(adata, treatment_key[, ...])

Measure per-feature distance between groups using t-statistics.
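
A per-feature t-statistic between a treated group and a control group can be sketched as follows. The example uses Welch's unequal-variance form, which is an assumption; the source does not state which variant `aggregate_ttest` uses, and the helper name is hypothetical.

```python
import numpy as np

def welch_t_sketch(X_treated, X_control):
    """Hypothetical helper: Welch t-statistic for every feature (column)."""
    m1, m0 = X_treated.mean(axis=0), X_control.mean(axis=0)
    # Squared standard error of each group mean.
    se1 = X_treated.var(axis=0, ddof=1) / X_treated.shape[0]
    se0 = X_control.var(axis=0, ddof=1) / X_control.shape[0]
    return (m1 - m0) / np.sqrt(se1 + se0)

control = np.array([[1.0, 5.0], [2.0, 6.0], [3.0, 7.0]])
treated = np.array([[4.0, 5.0], [5.0, 6.0], [6.0, 7.0]])
tstats = welch_t_sketch(treated, control)
print(tstats)
```

The first feature is shifted by three units under treatment and gets a large t-statistic; the second is identical in both groups and gets zero.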

tstat_distance(tstats)

Summarize t-statistics per group.

aggregate_pc(adata, treatment_key[, ...])

Measure distance between groups using principal components weighted by variance explained.

aggregate_mahalanobis(adata, treatment_key)

Measure distance between groups using Mahalanobis distance.
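
The Mahalanobis distance between two group means rescales the mean difference by a covariance matrix, so correlated or high-variance features do not dominate. The sketch below estimates the covariance from the control group, which is an assumption; the helper name is hypothetical and the library may estimate covariance differently.

```python
import numpy as np

def mahalanobis_sketch(X_treated, X_control):
    """Hypothetical helper: distance between group means under control covariance."""
    diff = X_treated.mean(axis=0) - X_control.mean(axis=0)
    cov = np.cov(X_control, rowvar=False)
    cov_inv = np.linalg.pinv(cov)  # pseudo-inverse tolerates singular covariance
    return float(np.sqrt(diff @ cov_inv @ diff))

control = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
treated = control + np.array([3.0, 1.0])
d = mahalanobis_sketch(treated, control)
print(d)
```

Here the control covariance is diagonal with variance 4/3 per feature, so the squared distance is (3² + 1²) × 3/4 = 7.5.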

Dimensionality reduction#

Tools to perform dimensionality reduction.

pca(adata[, n_comps, scale_by_var, copy, ...])

Principal component analysis [Pedregosa et al., 2011].
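
For intuition, PCA can be written as a singular value decomposition of the centered data. The NumPy sketch below is a stand-in for illustration only; the helper name is hypothetical, and it does not reproduce options such as `scale_by_var` or `copy`.

```python
import numpy as np

def pca_sketch(X, n_comps=2):
    """Hypothetical helper: PCA scores, components, and explained variance ratio."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_comps] * S[:n_comps]        # projection onto the top PCs
    explained_var = S ** 2 / (X.shape[0] - 1)
    ratio = explained_var / explained_var.sum()
    return scores, Vt[:n_comps], ratio[:n_comps]

rng = np.random.default_rng(2)
latent = rng.normal(size=(300, 1))
# Three features driven by one latent factor plus a little noise.
X = latent @ np.array([[1.0, 2.0, -1.0]]) + 0.05 * rng.normal(size=(300, 3))
scores, components, ratio = pca_sketch(X, n_comps=2)
print(ratio)
```

Because the three features share a single latent factor, nearly all variance falls on the first component.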

neighbors(adata[, n_neighbors, n_pcs, ...])

Compute a neighborhood graph of observations using the PCA representation.
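
A neighborhood graph starts from each observation's nearest neighbors in PCA space. The brute-force sketch below shows that step only; it is a hypothetical helper using Euclidean distance and quadratic memory, whereas real implementations typically use approximate search and also compute connectivity weights.

```python
import numpy as np

def knn_sketch(pcs, n_neighbors=2):
    """Hypothetical helper: indices and distances of each row's nearest neighbors."""
    diff = pcs[:, None, :] - pcs[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))     # all pairwise distances
    np.fill_diagonal(dist, np.inf)               # exclude self-matches
    idx = np.argsort(dist, axis=1)[:, :n_neighbors]
    return idx, np.take_along_axis(dist, idx, axis=1)

pcs = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2],
                [5.0, 5.0], [5.1, 5.0]])
idx, dist = knn_sketch(pcs, n_neighbors=1)
print(idx.ravel(), dist.ravel())
```

The first three points form one tight cluster and the last two another, so every nearest neighbor stays within its own cluster.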

umap(adata, **kwargs)

Embed the neighborhood graph using UMAP [McInnes et al., 2018].