Differential Abundance Analysis of the Human Tumor Microbiome

2025| DESeq2 ALDEx2 ANCOM-BC2 MaAsLin2

Benchmarking of four differential abundance tools applied to SMURF-normalised 16S abundance data from Nejman et al. (Science, 2020), comparing normal adjacent tissue (NAT) vs. tumour across three cancer types using paired patient samples. Data from Table S2 were rounded to integers and analysed at genus level across three data conditions (raw, mbImpute, BMDD) with 5% and 10% prevalence filters.

1 Data

Source and study context

The data originate from Nejman et al. (Science, 2020), a comprehensive characterisation of the intratumoral microbiome across 1,526 tumour samples and normal adjacent tissues from seven cancer types. The original study demonstrated that each tumour type harbours a distinct microbial community and that breast tumours have the richest and most diverse microbiome. From this dataset, three cancer types were selected: breast, lung, and ovary, the only types with both NAT and tumour samples from the same patients.

308

Breast samples

456

Lung samples

Ovary samples

Analysis conditions

Paired patient design

Each sample comes from a paired patient; the same individual contributed both a NAT sample and a tumour sample. Only patients with both sample types were retained; unpaired patients were excluded. This within-patient design controls for inter-individual variation in baseline microbiome composition. The paired structure was incorporated explicitly into all four tools.

NAT vs. Tumour

NAT (Normal Adjacent Tissue) is tissue sampled from the region immediately surrounding the tumour in the same patient, not from a healthy individual. It is the within-patient reference throughout this analysis. A genus labelled Tumour-enriched is more abundant in tumour relative to that same patient's NAT; NAT-enriched means the reverse.

Data preparation

Abundance data were extracted from Supplementary Table S2 of Nejman et al., which contains SMURF-normalised read counts. SMURF (Short MUltiple Regions Framework) is a multiplexed 5-amplicon 16S sequencing method developed by the original authors to improve resolution from low-biomass formalin-fixed paraffin-embedded samples. Values were rounded to the nearest integer to satisfy the integer-count requirements of count-based DA tools. Taxonomy was aggregated to genus level for all analyses. 16S rRNA sequencing cannot reliably discriminate species within the same genus in low-biomass samples, and genus-level aggregation produces more stable count distributions for modelling.

Note

The input data are SMURF-normalised, not raw sequencing counts. DESeq2 and MaAsLin2 assume raw integer counts and apply their own normalisation internally; running them on pre-normalised data violates this assumption and likely inflates false positive rates, particularly for DESeq2. Results from compositionally-aware tools (ALDEx2, ANCOM-BC2) are more appropriate for this data type. Findings consistent across multiple tools carry the most confidence.

Analysis pipeline

Source data extraction

Genus-level integer counts from Table S2; paired clinical metadata (Group: NAT / Tumour; Patient_ID)

Paired patient restriction

Retain only patients with both NAT and Tumour samples; align count matrix columns to metadata rows

Imputation (optional)

mbImpute (frequentist regression) and BMDD (Bayesian) applied independently to impute probable sampling zeros; raw data also analysed without imputation

Prevalence filtering

External filter before all tools: 5% and 10% thresholds tested separately; internal filters within all tools disabled

DA analysis: four tools in parallel

DESeq2, ALDEx2, ANCOM-BC2, MaAsLin2 run independently; significance threshold q < 0.05 (Benjamini-Hochberg)

Cross-tool comparison

Genera significant in more than one tool treated as consensus findings; single-tool findings noted separately

2 Methods

A key motivation for using multiple tools is the well-documented discordance between DA methods. Nearing et al. (Nature Communications, 2022) benchmarked DA methods across 38 microbiome datasets and found average inter-tool overlap as low as 20–30%, with DESeq2 consistently the most liberal and ALDEx2 / ANCOM-BC among the most conservative. This analysis applies the same multi-tool framework to observe how discordance manifests in tumour microbiome data specifically.

DA tools

Tool	Statistical framework	Paired design	Sensitivity
DESeq2	Negative binomial GLM; empirical Bayes dispersion shrinkage; median-of-ratios normalisation; LFC shrinkage via apeglm	`~ Patient_ID + Group`	Most liberal
ALDEx2	Compositionally aware; CLR transformation over Monte Carlo Dirichlet instances; Welch's t-test + Wilcoxon rank-sum test; effect size = between-group / within-group CLR difference	`paired.test = TRUE`	Most conservative
ANCOM-BC2	Corrects for unknown sampling fraction bias; standard (`diff`) and robust sensitivity (`diff_robust`) flags; random intercept for pairing	`(1 \| Patient_ID)`	Conservative
MaAsLin2	Per-taxon GLM; TSS normalisation + log transform; BH correction; positive coefficient = Tumour-enriched	`random_effects = Patient_ID`	Moderate

Significance threshold

All tools applied a uniform threshold of q < 0.05 (Benjamini-Hochberg FDR). For ALDEx2, significance was called when the minimum of the Welch BH (we.eBH) and Wilcoxon BH (wi.eBH) q-values fell below 0.05. For ANCOM-BC2, both standard and robust flags were recorded; all findings in this analysis were standard-significant but not robust-significant (see Discussion).

Imputation

mbImpute uses a regularised regression framework to predict whether zeros are sampling artefacts and imputes those values. BMDD uses a Bayesian posterior framework for the same purpose. BMDD requires sufficient sample size for stable estimation and could not be applied to the ovary dataset (n = 24). Each imputed dataset was analysed in parallel with raw data to assess whether imputation changed the DA results.

3 Code

All scripts follow the same structure across cancer types and data conditions; only input file paths and the prevalence cutoff differ between runs. Internal filters within each tool are disabled; filtering is applied externally and uniformly before all tools.

Setup, data loading, and paired patient restriction

Rshared across all scripts

# Parameters
alpha    <- 0.05    # FDR threshold (q < 0.05)
prev_cut <- 0.05    # 5% prevalence (set to 0.10 for 10% filter)

# Load abundance matrix (taxa x samples)
otu <- read.csv("abundanceint.csv", check.names = FALSE)
rownames(otu) <- otu[["genus"]]
otu[["genus"]] <- NULL
counts_mat <- as.matrix(otu)

# Load clinical metadata
meta <- read.csv("clinical.csv", check.names = FALSE)

# Restrict to paired patients (both NAT and Tumour)
tab_pair   <- table(meta[["Patient_ID"]], meta[["Group"]])
paired_ids <- names(which(rowSums(tab_pair > 0) == 2))

meta_paired   <- meta[meta[["Patient_ID"]] %in% paired_ids, ]
counts_paired <- counts_mat[, meta_paired[["Sample_ID"]], drop = FALSE]

# External prevalence filter (applied before all tools)
min_nonzero <- ceiling(prev_cut * ncol(counts_paired))
keep_taxa   <- rowSums(counts_paired > 0) >= min_nonzero
counts_filt <- counts_paired[keep_taxa, , drop = FALSE]

DESeq2

Rdesign: ~ Patient_ID + Group

## Paired blocking: Patient_ID absorbs between-patient variance
dds <- DESeqDataSetFromMatrix(
  countData = round(counts_filt),    # integer counts required
  colData   = meta_paired,
  design    = ~ Patient_ID + Group
)
dds <- estimateSizeFactors(dds, type = "poscounts")  # sparse-data safe
dds <- DESeq(dds)

## Extract Tumour vs NAT contrast with LFC shrinkage
res_shrunk <- lfcShrink(dds, coef = "Group_Tumor_vs_NAT", type = "apeglm")

res_deseq <- as.data.frame(res_shrunk) %>%
  mutate(
    significant = !is.na(padj) & padj < alpha,
    direction   = case_when(
      !significant        ~ "ns",
      log2FoldChange > 0  ~ "Tumor_enriched",
      log2FoldChange < 0  ~ "NAT_enriched"
    )
  )

ALDEx2

Rpaired.test = TRUE, mc.samples = 128

## Samples ordered NAT then Tumour per patient for paired test
meta_ord     <- meta_paired %>% arrange(Patient_ID, Group)
counts_aldex <- counts_filt[, meta_ord[["Sample_ID"]], drop = FALSE]
conds        <- as.character(meta_ord[["Group"]])

set.seed(123)
x_clr <- aldex.clr(counts_aldex, conds, mc.samples = 128, denom = "all")
x_tt  <- aldex.ttest(x_clr, paired.test = TRUE)   # within-patient pairing
x_eff <- aldex.effect(x_clr)

res_aldex <- data.frame(taxon = rownames(x_tt), x_tt, x_eff) %>%
  mutate(
    p_min       = pmin(we.eBH, wi.eBH, na.rm = TRUE),
    significant = p_min < alpha,
    direction   = case_when(
      !significant ~ "ns",
      diff.btw > 0 ~ "Tumor_enriched",
      diff.btw < 0 ~ "NAT_enriched"
    )
  )

ANCOM-BC2

Rrand_formula = "(1 | Patient_ID)", pseudo_sens = TRUE

set.seed(123)
out_ancom <- ancombc2(
  data          = counts_filt,
  taxa_are_rows = TRUE,
  meta_data     = meta_paired,
  fix_formula   = "Group",
  rand_formula  = "(1 | Patient_ID)",    # random intercept for pairing
  p_adj_method  = "BH",
  pseudo_sens   = TRUE,                   # enables robust sensitivity flag
  prv_cut = 0, lib_cut = 0,             # no internal filters
  alpha = alpha
)

res_ancom_tbl <- out_ancom$res %>%
  mutate(
    significant        = !is.na(q_GroupTumor) & q_GroupTumor < alpha,
    significant_robust = diff_robust_GroupTumor   # sensitivity analysis flag
  )

MaAsLin2

Rrandom_effects = Patient_ID, normalization = TSS

fit_maaslin <- Maaslin2(
  input_data      = counts_filt,
  input_metadata  = meta_paired,
  output          = "maaslin2_output",
  fixed_effects   = "Group",          # NAT vs Tumour
  random_effects  = "Patient_ID",     # random intercept per patient
  normalization   = "TSS",
  transform       = "LOG",
  analysis_method = "LM",
  correction      = "BH",
  min_prevalence  = 0               # no internal filter
)

res_maaslin <- read_tsv("maaslin2_output/all_results.tsv") %>%
  filter(metadata == "Group") %>%
  mutate(
    significant = !is.na(qval) & qval < alpha,
    direction   = case_when(
      !significant ~ "ns",
      coef > 0     ~ "Tumor_enriched",
      coef < 0     ~ "NAT_enriched"
    )
  )

4 Results

Results show genera significant in more than one tool across any data condition, meeting the multi-tool consensus threshold. Single-tool findings (predominantly DESeq2) are described in the Discussion. DESeq2 identified up to 71 significant genera (lung, raw 5% filter) while ALDEx2 and ANCOM-BC2 found zero in the same condition, consistent with Nearing et al. (2022).

Breast cancer

Breast n = 308 | 154 paired patients

Genus	Direction	DESeq2	ALDEx2	MaAsLin2	ANCOM-BC2	Conditions
Staphylococcus	Tumor-enriched	✓	✓	✓	·	Raw, mbImpute, BMDD; 5% & 10%
Prevotella	Tumor-enriched	·	✓	✓	·	BMDD & mbImpute; 5% & 10%
Afipia	NAT-enriched	✓	·	✓	·	Raw, mbImpute; 5% & 10%
Kocuria	NAT-enriched	✓	·	✓	·	Raw, mbImpute; multiple filters
Sphaerobacter	NAT-enriched	✓	·	✓	·	Raw, mbImpute; multiple filters
Aureimonas	NAT-enriched	✓	·	✓	·	mbImpute; 5% & 10%
Actinobacillus	NAT-enriched	✓	·	✓	·	mbImpute; 5% & 10%

Lung cancer

Lung n = 456 | 228 paired patients

Genus	Direction	DESeq2	ALDEx2	MaAsLin2	ANCOM-BC2	Conditions
Bacillus	NAT-enriched	✓	·	✓	·	Raw 10%; mbImpute 5% & 10%
Devosia	NAT-enriched	✓	·	✓	·	Raw 10%; mbImpute 5% & 10%
Friedmanniella	Tumor-enriched	·	✓	✓	·	BMDD 5% only (imputation-dependent)
Actinobacillus	Tumor-enriched	✓	·	✓	·	mbImpute 5% & 10%

Ovarian cancer

Ovary n = 24 | 12 paired patients | severely underpowered

Genus	Direction	DESeq2	ALDEx2	MaAsLin2	ANCOM-BC2	Note
Staphylococcus	Tumor-enriched	✓	·	·	✓	Raw + mbImpute; prevalence essentially flat in Nejman Table S7
Escherichia / Shigella	Tumor-enriched	✓	·	·	✓	Not supported by Nejman prevalence data; directional inconsistency across conditions

Genera identified by more than one tool

Genus	Cancer type	Direction	Tools
Staphylococcus	Breast	Tumor-enriched	DESeq2, ALDEx2, MaAsLin2
Prevotella	Breast	Tumor-enriched	ALDEx2, MaAsLin2
Afipia	Breast	NAT-enriched	DESeq2, MaAsLin2
Kocuria	Breast	NAT-enriched	DESeq2, MaAsLin2
Sphaerobacter	Breast	NAT-enriched	DESeq2, MaAsLin2
Bacillus	Lung	NAT-enriched	DESeq2, MaAsLin2
Devosia	Lung	NAT-enriched	DESeq2, MaAsLin2
Friedmanniella	Lung	Tumor-enriched	ALDEx2, MaAsLin2
Staphylococcus	Ovary	Tumor-enriched	DESeq2, ANCOM-BC2
Escherichia / Shigella	Ovary	Tumor-enriched	DESeq2, ANCOM-BC2

5 Discussion

Staphylococcus in breast: most robust finding

Staphylococcus tumour enrichment in breast is the most reliable result of this analysis. It is significant across all three tools that detected it (DESeq2, ALDEx2, MaAsLin2), across all data conditions (raw, mbImpute, BMDD), and at both prevalence thresholds. ALDEx2, the most conservative tool, identified very few genera overall, making its consistent detection of Staphylococcus particularly meaningful.

This corroborates Nejman et al. (Table S7), which showed an 11.4% prevalence enrichment of the genus in breast tumours (binomial FDR < 0.0001) using an independent prevalence-based test. At the species level, the signal is driven primarily by S. haemolyticus (+8.8%, binomial FDR ≈ 0) with a secondary contribution from S. pasteuri; S. aureus is essentially flat. Independent corroboration exists in Urbaniak et al. (2016) and Hieken et al. (2016), both of which reported Staphylococcus enrichment in breast tumour tissue from separate cohorts.

Tool sensitivity: confirming Nearing et al. (2022)

The disparity in sensitivity between tools directly reproduces the central finding of Nearing et al., who found average inter-tool overlap as low as 20–30% across 38 datasets. DESeq2 finding 51 significant genera versus ALDEx2 finding 1 in the same condition (breast, raw 5% filter) is a stark illustration. This disparity is likely amplified here by the pre-normalised input, as DESeq2's internal normalisation operates on values that have already been normalised. The practical implication is that single-tool analyses of microbiome data can be highly misleading; multi-tool consensus is necessary to distinguish robust from fragile findings.

ANCOM-BC2: standard significant, robust not significant

Across all cancer types and conditions, ANCOM-BC2 findings were significant under the standard test (diff = TRUE) but not the robust test (diff_robust = FALSE). This pattern reflects sensitivity to the reference frame assumption, that most taxa are unchanged between conditions, which ANCOM-BC2 uses for bias correction. In tumour microbiome data where the entire microbial community may shift, this assumption is commonly violated. The result is an informative property of the data, not a methodological failure.

Effect of imputation

Friedmanniella in lung emerged as significant (ALDEx2 + MaAsLin2) only after BMDD imputation, not in any raw data condition. Cross-referencing Nejman Table S7 shows that Friedmanniella does have a lung prevalence signal (binomial FDR = 0.017, +5.4% tumour enrichment), suggesting BMDD recovered a genuine signal masked by sampling zeros. Findings unique to imputed data should generally be treated as hypothesis-generating, but this case has supporting prevalence evidence from the original study.

Ovary: underpowered

With only 24 samples (12 paired patients), the ovary dataset is substantially below the recommended minimum for reliable DA analysis. ALDEx2 and MaAsLin2 found zero significant genera in all conditions. The Escherichia / Shigella tumour enrichment called by DESeq2 and ANCOM-BC2 is not supported by Nejman's prevalence data (0.4% effect size, p = 0.93) and is likely a false positive driven by the small sample size. BMDD could not be applied to the ovary data due to insufficient sample size for stable Bayesian estimation.

Limitations

Pre-normalised input. SMURF-normalised counts were rounded to integers, violating the raw-count assumptions of DESeq2 and MaAsLin2. Compositionally-aware methods (ALDEx2, ANCOM-BC2) are more appropriate for this data type. Genus-level resolution. Aggregating to genus level is standard for low-biomass 16S data but loses species-level specificity. For example, S. haemolyticus rather than S. aureus is the primary driver of the breast signal. Sample size. The ovary cohort is severely underpowered, so the ovary-specific findings should not be treated as reliable. Low biomass. Tumour microbiome studies are susceptible to contamination; Nejman et al. implemented extensive controls, but this caveat applies to all downstream analyses.

References

Nejman D, et al. The human tumor microbiome is composed of tumor type-specific intracellular bacteria. Science. 2020;368(6494):973–980. doi:10.1126/science.aay9189

Nearing JT, et al. Microbiome differential abundance methods produce different results across 38 datasets. Nature Communications. 2022;13:342. doi:10.1038/s41467-022-28034-z

Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction. Nature Communications. 2020;11:3514.

Fernandes AD, et al. Unifying the analysis of high-throughput sequencing datasets. Microbiome. 2014;2:15.

Mallick H, et al. Multivariable association discovery in population-scale meta-omics studies. PLOS Computational Biology. 2021;17(11):e1009442.

Jiang R, Li WV, Li JJ. mbImpute: an accurate and robust imputation method for microbiome data. Genome Biology. 2021;22:192.

Urbaniak C, et al. The microbiota of breast tissue and its association with breast cancer. Applied and Environmental Microbiology. 2016;82(16):5039–5048.