Differential Abundance Analysis of the Human Tumor Microbiome
Benchmarking of four differential abundance tools applied to SMURF-normalised 16S abundance data from Nejman et al. (Science, 2020), comparing normal adjacent tissue (NAT) vs. tumour across three cancer types using paired patient samples. Data from Table S2 were rounded to integers and analysed at genus level across three data conditions (raw, mbImpute, BMDD) with 5% and 10% prevalence filters.
1 Data
Source and study context
The data originate from Nejman et al. (Science, 2020), a comprehensive characterisation of the intratumoral microbiome across 1,526 tumour samples and normal adjacent tissues from seven cancer types. The original study demonstrated that each tumour type harbours a distinct microbial community and that breast tumours have the richest and most diverse microbiome. From this dataset, three cancer types were selected: breast, lung, and ovary, the only types with both NAT and tumour samples from the same patients.
Paired patient design
Each sample comes from a paired patient; the same individual contributed both a NAT sample and a tumour sample. Only patients with both sample types were retained; unpaired patients were excluded. This within-patient design controls for inter-individual variation in baseline microbiome composition. The paired structure was incorporated explicitly into all four tools.
NAT (Normal Adjacent Tissue) is tissue sampled from the region immediately surrounding the tumour in the same patient, not from a healthy individual. It is the within-patient reference throughout this analysis. A genus labelled Tumour-enriched is more abundant in tumour relative to that same patient's NAT; NAT-enriched means the reverse.
Data preparation
Abundance data were extracted from Supplementary Table S2 of Nejman et al., which contains SMURF-normalised read counts. SMURF (Short MUltiple Regions Framework) is a multiplexed 5-amplicon 16S sequencing method developed by the original authors to improve resolution from low-biomass formalin-fixed paraffin-embedded samples. Values were rounded to the nearest integer to satisfy the integer-count requirements of count-based DA tools. Taxonomy was aggregated to genus level for all analyses. 16S rRNA sequencing cannot reliably discriminate species within the same genus in low-biomass samples, and genus-level aggregation produces more stable count distributions for modelling.
The input data are SMURF-normalised, not raw sequencing counts. DESeq2 and MaAsLin2 assume raw integer counts and apply their own normalisation internally; running them on pre-normalised data violates this assumption and likely inflates false positive rates, particularly for DESeq2. Results from compositionally-aware tools (ALDEx2, ANCOM-BC2) are more appropriate for this data type. Findings consistent across multiple tools carry the most confidence.
Analysis pipeline
2 Methods
A key motivation for using multiple tools is the well-documented discordance between DA methods. Nearing et al. (Nature Communications, 2022) benchmarked DA methods across 38 microbiome datasets and found average inter-tool overlap as low as 20–30%, with DESeq2 consistently the most liberal and ALDEx2 / ANCOM-BC among the most conservative. This analysis applies the same multi-tool framework to observe how discordance manifests in tumour microbiome data specifically.
DA tools
| Tool | Statistical framework | Paired design | Sensitivity |
|---|---|---|---|
| DESeq2 | Negative binomial GLM; empirical Bayes dispersion shrinkage; median-of-ratios normalisation; LFC shrinkage via apeglm | ~ Patient_ID + Group | Most liberal |
| ALDEx2 | Compositionally aware; CLR transformation over Monte Carlo Dirichlet instances; Welch's t-test + Wilcoxon rank-sum test; effect size = between-group / within-group CLR difference | paired.test = TRUE | Most conservative |
| ANCOM-BC2 | Corrects for unknown sampling fraction bias; standard (diff) and robust sensitivity (diff_robust) flags; random intercept for pairing | (1 | Patient_ID) | Conservative |
| MaAsLin2 | Per-taxon GLM; TSS normalisation + log transform; BH correction; positive coefficient = Tumour-enriched | random_effects = Patient_ID | Moderate |
Significance threshold
All tools applied a uniform threshold of q < 0.05 (Benjamini-Hochberg FDR). For ALDEx2,
significance was called when the minimum of the Welch BH (we.eBH) and Wilcoxon BH
(wi.eBH) q-values fell below 0.05. For ANCOM-BC2, both standard and robust flags were
recorded; all findings in this analysis were standard-significant but not robust-significant
(see Discussion).
Imputation
mbImpute uses a regularised regression framework to predict whether zeros are sampling artefacts and imputes those values. BMDD uses a Bayesian posterior framework for the same purpose. BMDD requires sufficient sample size for stable estimation and could not be applied to the ovary dataset (n = 24). Each imputed dataset was analysed in parallel with raw data to assess whether imputation changed the DA results.
3 Code
All scripts follow the same structure across cancer types and data conditions; only input file paths and the prevalence cutoff differ between runs. Internal filters within each tool are disabled; filtering is applied externally and uniformly before all tools.
Setup, data loading, and paired patient restriction
# Parameters alpha <- 0.05 # FDR threshold (q < 0.05) prev_cut <- 0.05 # 5% prevalence (set to 0.10 for 10% filter) # Load abundance matrix (taxa x samples) otu <- read.csv("abundanceint.csv", check.names = FALSE) rownames(otu) <- otu[["genus"]] otu[["genus"]] <- NULL counts_mat <- as.matrix(otu) # Load clinical metadata meta <- read.csv("clinical.csv", check.names = FALSE) # Restrict to paired patients (both NAT and Tumour) tab_pair <- table(meta[["Patient_ID"]], meta[["Group"]]) paired_ids <- names(which(rowSums(tab_pair > 0) == 2)) meta_paired <- meta[meta[["Patient_ID"]] %in% paired_ids, ] counts_paired <- counts_mat[, meta_paired[["Sample_ID"]], drop = FALSE] # External prevalence filter (applied before all tools) min_nonzero <- ceiling(prev_cut * ncol(counts_paired)) keep_taxa <- rowSums(counts_paired > 0) >= min_nonzero counts_filt <- counts_paired[keep_taxa, , drop = FALSE]
DESeq2
## Paired blocking: Patient_ID absorbs between-patient variance dds <- DESeqDataSetFromMatrix( countData = round(counts_filt), # integer counts required colData = meta_paired, design = ~ Patient_ID + Group ) dds <- estimateSizeFactors(dds, type = "poscounts") # sparse-data safe dds <- DESeq(dds) ## Extract Tumour vs NAT contrast with LFC shrinkage res_shrunk <- lfcShrink(dds, coef = "Group_Tumor_vs_NAT", type = "apeglm") res_deseq <- as.data.frame(res_shrunk) %>% mutate( significant = !is.na(padj) & padj < alpha, direction = case_when( !significant ~ "ns", log2FoldChange > 0 ~ "Tumor_enriched", log2FoldChange < 0 ~ "NAT_enriched" ) )
ALDEx2
## Samples ordered NAT then Tumour per patient for paired test meta_ord <- meta_paired %>% arrange(Patient_ID, Group) counts_aldex <- counts_filt[, meta_ord[["Sample_ID"]], drop = FALSE] conds <- as.character(meta_ord[["Group"]]) set.seed(123) x_clr <- aldex.clr(counts_aldex, conds, mc.samples = 128, denom = "all") x_tt <- aldex.ttest(x_clr, paired.test = TRUE) # within-patient pairing x_eff <- aldex.effect(x_clr) res_aldex <- data.frame(taxon = rownames(x_tt), x_tt, x_eff) %>% mutate( p_min = pmin(we.eBH, wi.eBH, na.rm = TRUE), significant = p_min < alpha, direction = case_when( !significant ~ "ns", diff.btw > 0 ~ "Tumor_enriched", diff.btw < 0 ~ "NAT_enriched" ) )
ANCOM-BC2
set.seed(123) out_ancom <- ancombc2( data = counts_filt, taxa_are_rows = TRUE, meta_data = meta_paired, fix_formula = "Group", rand_formula = "(1 | Patient_ID)", # random intercept for pairing p_adj_method = "BH", pseudo_sens = TRUE, # enables robust sensitivity flag prv_cut = 0, lib_cut = 0, # no internal filters alpha = alpha ) res_ancom_tbl <- out_ancom$res %>% mutate( significant = !is.na(q_GroupTumor) & q_GroupTumor < alpha, significant_robust = diff_robust_GroupTumor # sensitivity analysis flag )
MaAsLin2
fit_maaslin <- Maaslin2( input_data = counts_filt, input_metadata = meta_paired, output = "maaslin2_output", fixed_effects = "Group", # NAT vs Tumour random_effects = "Patient_ID", # random intercept per patient normalization = "TSS", transform = "LOG", analysis_method = "LM", correction = "BH", min_prevalence = 0 # no internal filter ) res_maaslin <- read_tsv("maaslin2_output/all_results.tsv") %>% filter(metadata == "Group") %>% mutate( significant = !is.na(qval) & qval < alpha, direction = case_when( !significant ~ "ns", coef > 0 ~ "Tumor_enriched", coef < 0 ~ "NAT_enriched" ) )
4 Results
Results show genera significant in more than one tool across any data condition, meeting the multi-tool consensus threshold. Single-tool findings (predominantly DESeq2) are described in the Discussion. DESeq2 identified up to 71 significant genera (lung, raw 5% filter) while ALDEx2 and ANCOM-BC2 found zero in the same condition, consistent with Nearing et al. (2022).
Breast cancer
| Genus | Direction | DESeq2 | ALDEx2 | MaAsLin2 | ANCOM-BC2 | Conditions |
|---|---|---|---|---|---|---|
| Staphylococcus | Tumor-enriched | ✓ | ✓ | ✓ | · | Raw, mbImpute, BMDD; 5% & 10% |
| Prevotella | Tumor-enriched | · | ✓ | ✓ | · | BMDD & mbImpute; 5% & 10% |
| Afipia | NAT-enriched | ✓ | · | ✓ | · | Raw, mbImpute; 5% & 10% |
| Kocuria | NAT-enriched | ✓ | · | ✓ | · | Raw, mbImpute; multiple filters |
| Sphaerobacter | NAT-enriched | ✓ | · | ✓ | · | Raw, mbImpute; multiple filters |
| Aureimonas | NAT-enriched | ✓ | · | ✓ | · | mbImpute; 5% & 10% |
| Actinobacillus | NAT-enriched | ✓ | · | ✓ | · | mbImpute; 5% & 10% |
Lung cancer
| Genus | Direction | DESeq2 | ALDEx2 | MaAsLin2 | ANCOM-BC2 | Conditions |
|---|---|---|---|---|---|---|
| Bacillus | NAT-enriched | ✓ | · | ✓ | · | Raw 10%; mbImpute 5% & 10% |
| Devosia | NAT-enriched | ✓ | · | ✓ | · | Raw 10%; mbImpute 5% & 10% |
| Friedmanniella | Tumor-enriched | · | ✓ | ✓ | · | BMDD 5% only (imputation-dependent) |
| Actinobacillus | Tumor-enriched | ✓ | · | ✓ | · | mbImpute 5% & 10% |
Ovarian cancer
| Genus | Direction | DESeq2 | ALDEx2 | MaAsLin2 | ANCOM-BC2 | Note |
|---|---|---|---|---|---|---|
| Staphylococcus | Tumor-enriched | ✓ | · | · | ✓ | Raw + mbImpute; prevalence essentially flat in Nejman Table S7 |
| Escherichia / Shigella | Tumor-enriched | ✓ | · | · | ✓ | Not supported by Nejman prevalence data; directional inconsistency across conditions |
Genera identified by more than one tool
| Genus | Cancer type | Direction | Tools |
|---|---|---|---|
| Staphylococcus | Breast | Tumor-enriched | DESeq2, ALDEx2, MaAsLin2 |
| Prevotella | Breast | Tumor-enriched | ALDEx2, MaAsLin2 |
| Afipia | Breast | NAT-enriched | DESeq2, MaAsLin2 |
| Kocuria | Breast | NAT-enriched | DESeq2, MaAsLin2 |
| Sphaerobacter | Breast | NAT-enriched | DESeq2, MaAsLin2 |
| Bacillus | Lung | NAT-enriched | DESeq2, MaAsLin2 |
| Devosia | Lung | NAT-enriched | DESeq2, MaAsLin2 |
| Friedmanniella | Lung | Tumor-enriched | ALDEx2, MaAsLin2 |
| Staphylococcus | Ovary | Tumor-enriched | DESeq2, ANCOM-BC2 |
| Escherichia / Shigella | Ovary | Tumor-enriched | DESeq2, ANCOM-BC2 |
5 Discussion
Staphylococcus in breast: most robust finding
Staphylococcus tumour enrichment in breast is the most reliable result of this analysis. It is significant across all three tools that detected it (DESeq2, ALDEx2, MaAsLin2), across all data conditions (raw, mbImpute, BMDD), and at both prevalence thresholds. ALDEx2, the most conservative tool, identified very few genera overall, making its consistent detection of Staphylococcus particularly meaningful.
This corroborates Nejman et al. (Table S7), which showed an 11.4% prevalence enrichment of the genus in breast tumours (binomial FDR < 0.0001) using an independent prevalence-based test. At the species level, the signal is driven primarily by S. haemolyticus (+8.8%, binomial FDR ≈ 0) with a secondary contribution from S. pasteuri; S. aureus is essentially flat. Independent corroboration exists in Urbaniak et al. (2016) and Hieken et al. (2016), both of which reported Staphylococcus enrichment in breast tumour tissue from separate cohorts.
Tool sensitivity: confirming Nearing et al. (2022)
The disparity in sensitivity between tools directly reproduces the central finding of Nearing et al., who found average inter-tool overlap as low as 20–30% across 38 datasets. DESeq2 finding 51 significant genera versus ALDEx2 finding 1 in the same condition (breast, raw 5% filter) is a stark illustration. This disparity is likely amplified here by the pre-normalised input, as DESeq2's internal normalisation operates on values that have already been normalised. The practical implication is that single-tool analyses of microbiome data can be highly misleading; multi-tool consensus is necessary to distinguish robust from fragile findings.
ANCOM-BC2: standard significant, robust not significant
Across all cancer types and conditions, ANCOM-BC2 findings were significant under the standard test
(diff = TRUE) but not the robust test (diff_robust = FALSE). This pattern
reflects sensitivity to the reference frame assumption, that most taxa are unchanged between conditions, which ANCOM-BC2 uses for bias correction. In tumour microbiome data where the entire microbial community
may shift, this assumption is commonly violated. The result is an informative property of the data, not a
methodological failure.
Effect of imputation
Friedmanniella in lung emerged as significant (ALDEx2 + MaAsLin2) only after BMDD imputation, not in any raw data condition. Cross-referencing Nejman Table S7 shows that Friedmanniella does have a lung prevalence signal (binomial FDR = 0.017, +5.4% tumour enrichment), suggesting BMDD recovered a genuine signal masked by sampling zeros. Findings unique to imputed data should generally be treated as hypothesis-generating, but this case has supporting prevalence evidence from the original study.
Ovary: underpowered
With only 24 samples (12 paired patients), the ovary dataset is substantially below the recommended minimum for reliable DA analysis. ALDEx2 and MaAsLin2 found zero significant genera in all conditions. The Escherichia / Shigella tumour enrichment called by DESeq2 and ANCOM-BC2 is not supported by Nejman's prevalence data (0.4% effect size, p = 0.93) and is likely a false positive driven by the small sample size. BMDD could not be applied to the ovary data due to insufficient sample size for stable Bayesian estimation.
Limitations
Pre-normalised input. SMURF-normalised counts were rounded to integers, violating the raw-count assumptions of DESeq2 and MaAsLin2. Compositionally-aware methods (ALDEx2, ANCOM-BC2) are more appropriate for this data type. Genus-level resolution. Aggregating to genus level is standard for low-biomass 16S data but loses species-level specificity. For example, S. haemolyticus rather than S. aureus is the primary driver of the breast signal. Sample size. The ovary cohort is severely underpowered, so the ovary-specific findings should not be treated as reliable. Low biomass. Tumour microbiome studies are susceptible to contamination; Nejman et al. implemented extensive controls, but this caveat applies to all downstream analyses.
References
Nejman D, et al. The human tumor microbiome is composed of tumor type-specific intracellular bacteria. Science. 2020;368(6494):973–980. doi:10.1126/science.aay9189
Nearing JT, et al. Microbiome differential abundance methods produce different results across 38 datasets. Nature Communications. 2022;13:342. doi:10.1038/s41467-022-28034-z
Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction. Nature Communications. 2020;11:3514.
Fernandes AD, et al. Unifying the analysis of high-throughput sequencing datasets. Microbiome. 2014;2:15.
Mallick H, et al. Multivariable association discovery in population-scale meta-omics studies. PLOS Computational Biology. 2021;17(11):e1009442.
Jiang R, Li WV, Li JJ. mbImpute: an accurate and robust imputation method for microbiome data. Genome Biology. 2021;22:192.
Urbaniak C, et al. The microbiota of breast tissue and its association with breast cancer. Applied and Environmental Microbiology. 2016;82(16):5039–5048.