What ML Papers Actually Show About Microplastic Detection in Water (2019–2025)

A systematic map of 228 studies—why lab spectroscopy is strong, river AI sensors are thin, and Colombia field validation is missing from open evidence.

Microplastics are plastic particles typically smaller than 5 mm. They show up in rivers, coasts, and wastewater worldwide. Finding and identifying them with traditional microscopy and spectroscopy is slow, expensive, and hard to scale. Over the last few years, hundreds of papers have proposed machine learning and computer vision as a fix—cameras on rivers, drones, satellites, Raman and FTIR “AI” classifiers, YOLO detectors, and more.

We ran a systematic map (not a hype blog post) to answer a practical question: what does the literature actually support if you need monitoring in real water bodies—especially where budgets, labs, and open data are limited?

This article explains the topic, the purpose of the research, what we found, and how you can use or share the work. Use the button above the article body to download the full report (a Markdown file of roughly 1 MB or more) with the complete systematic map, Colombia/LATAM annexes, all 71 claims, extraction tables, and protocol.


What we studied (the topic)

Scope: Publications from 2019 through 2025 on ML, deep learning, or computer vision applied to microplastic or nanoplastic detection in aquatic settings—freshwater, marine, coastal, wastewater, and related sediments when a clear detection pipeline is described.

Out of scope: Toxicity-only papers, policy without a model, food-web studies with no detection method, and pure polymer chemistry without an identification pipeline.

Differentiation: This is not “another review saying microplastics are bad.” Pollution reviews already exist. We focused on which modalities and model families appear in the literature, what metrics authors report, and what blocks field deployment—with explicit attention to resource-limited and Latin American contexts.

| Lens | Question |
| --- | --- |
| Modalities | Microscopy, Raman/FTIR, hyperspectral, RGB/drone, satellite, microfluidics? |
| Models | CNNs, YOLO-style detectors, classical ML on spectra, segmentation? |
| Metrics | Accuracy, mAP, precision, κ: reported completely or selectively? |
| Deployment | Lab-only vs field; capex; “low-cost edge” claims vs evidence |
| Geography | Global South field sites; Colombia; Magdalena/Caribbean relevance |

Why we did it (purpose)

Four goals drove the factory run:

  1. Map the field: Build a reproducible corpus (OpenAlex harvest, screening logs, structured extraction) so practitioners can navigate methods without reading 228 abstracts ad hoc.

  2. Cut through hype: Headline figures such as “98% accuracy” often refer to macro litter, satellite floating debris, or lab spectroscopy on prepared samples, not to polymer-level microplastics in turbid river water.

  3. Support decisions: Give procurement teams, utilities, and researchers a defensible picture of what to pilot, what to replicate locally, and what not to buy from abstracts alone.

  4. Colombia / LATAM honesty: Document whether local field validation exists in open full texts. In our corpus, no obtained primary study reports an in-country field programme in Colombia for aquatic ML/CV microplastic monitoring.

The output is a research factory: bibliography, PRISMA counts, 71 evidence-linked claims, a long systematic map manuscript, and a static evidence explorer—not a single vendor product pitch.


How we built the map (methods in brief)

| Stage | Count |
| --- | --- |
| Unique papers identified (deduped) | 228 |
| Title/abstract screened | 228 |
| Forwarded to full text | 116 |
| Open-access full texts obtained | 31 |
| Structured extraction rows | 37 |
| Evidence-linked claims | 71 |
| Search lock date | 2026-05-18 |

Source: OpenAlex, seven keyword queries, English-centric metadata (Spanish/Portuguese regional databases were not searched in this pilot—documented as a limitation).

Type of synthesis: Systematic map—we chart evidence and gaps; we do not pool incompatible metrics into one effect size.
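
The stage counts above imply a few ratios that are worth having in mind when reading the rest of the map, in particular how small the full-text base is. A minimal sketch of that arithmetic follows; the counts are copied from the table, while the dictionary keys and the `share` helper are illustrative names of ours.

```python
# Counts copied from the stage table; key names are illustrative.
counts = {
    "identified": 228,
    "screened": 228,
    "forwarded_to_full_text": 116,
    "full_texts_obtained": 31,
}


def share(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage, rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


forward_rate = share(counts["forwarded_to_full_text"], counts["screened"])
retrieval_rate = share(counts["full_texts_obtained"], counts["forwarded_to_full_text"])
corpus_coverage = share(counts["full_texts_obtained"], counts["identified"])

print(f"Forwarded after screening: {forward_rate}%")           # 50.9%
print(f"Full texts obtained of those forwarded: {retrieval_rate}%")  # 26.7%
print(f"Full texts obtained of all identified: {corpus_coverage}%")  # 13.6%
```

Roughly one in four forwarded papers was read in full, which is why the findings below are phrased as what open full texts support rather than what the whole field shows.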


What the evidence actually supports (pre-conclusion)

One sentence

Transfer to resource-limited and Latin American monitoring requires separating macro-litter computer vision from microplastic-specific identification, reporting detection metrics completely (e.g. precision and mAP), and prioritising tiered architectures that pair affordable imaging with spectroscopic confirmation—not headline accuracy alone.

Six findings you can defend from open full texts

  1. Lab spectroscopy + ML delivers high polymer-class accuracy on defined environmental fractions when samples reach a hub (µ-FTIR, µ-Raman, bench Raman), at the price of high capital cost and heavy sample prep.

  2. Microscopy and flow imaging can automate counting or classification on controlled matrices; matrix limits must be stated.

  3. Field MP vision is emerging. One obtained full-text coastal study reports high precision (~85–88%) but low mAP (~34–36%) on small particles: usable for targeted audits, weak as a sole compliance sensor.

  4. Macro litter CV in Global South rivers (e.g. an urban-river YOLO study in India reporting 89% mAP) is real but cannot substitute for microplastic polymer monitoring.

  5. Satellite and debris CV can reach high scenario accuracies for floating debris at resolutions that cannot resolve sub-millimetre microplastics.

  6. In-situ aquatic MP sensing remains review-forward and prototype-heavy compared to ex-situ lab spectroscopy.

The metric trap (precision vs mAP)

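Many stakeholders read “87% precision” and assume the system finds most particles. In one Thailand field study (UV + Faster R-CNN), precision was ~85–88% while mAP was ~34–36%. Precision only describes the detections the model actually makes (what fraction of them are correct); it says nothing about particles the model never flags. mAP also accounts for recall, so missed small particles pull it down. Ask for both when evaluating procurement options.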
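
To make the gap concrete, here is a toy calculation, not a reconstruction of the Thailand study. It assumes a single class, detections already matched to annotations at some IoU threshold, and all-point interpolated average precision (AP); mAP would then average AP over classes, and in some protocols over IoU thresholds too. The function name and the numbers are hypothetical.

```python
def precision_recall_ap(is_true_positive, n_ground_truth):
    """Toy single-class metrics for detections sorted by descending confidence.

    is_true_positive: one bool per detection, already matched to ground truth
    (the IoU matching step is assumed to have been done elsewhere).
    n_ground_truth: number of annotated particles in the evaluated images.
    """
    tp_cum = 0
    precisions, recalls = [], []
    for rank, hit in enumerate(is_true_positive, start=1):
        tp_cum += int(hit)
        precisions.append(tp_cum / rank)
        recalls.append(tp_cum / n_ground_truth)

    # Final precision and recall use every detection.
    precision = precisions[-1] if precisions else 0.0
    recall = recalls[-1] if recalls else 0.0

    # All-point interpolation: make precision non-increasing from the right,
    # then sum precision over each step in recall.
    envelope = precisions[:]
    for i in range(len(envelope) - 2, -1, -1):
        envelope[i] = max(envelope[i], envelope[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(envelope, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return precision, recall, ap


# Hypothetical numbers, not taken from any paper in the corpus:
# 20 annotated particles, 7 detections, 6 of them correct.
hits = [True, True, True, False, True, True, True]
precision, recall, ap = precision_recall_ap(hits, n_ground_truth=20)
print(f"precision={precision:.2f} recall={recall:.2f} AP={ap:.2f}")
# precision=0.86 recall=0.30 AP=0.28
```

In this sketch AP can never exceed the highest recall reached, so a detector that misses 14 of 20 particles stays below 30% AP however clean its few detections are.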

The task trap (litter ≠ microplastics)

Riverine YOLO papers often target floating solid waste, not polymer-level MPs in water. Satellite workflows may report 98% accuracy for debris patches at 10 m pixels. These results are valuable for litter-management KPIs, but it is unsafe to present them as microplastic compliance evidence without re-annotating the data for microplastics, stating the particle size class, and confirming polymer identity.
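
A quick scale check shows why a 10 m pixel and a microplastic particle belong to different tasks. The sketch below uses the 5 mm upper size bound from the introduction and, as a simplifying assumption of ours, treats the particle as a 5 mm square; variable names are illustrative.

```python
# Sizes in millimetres so the arithmetic stays exact.
pixel_edge_mm = 10_000   # 10 m satellite pixel, as quoted above
mp_max_size_mm = 5       # upper size bound for microplastics (5 mm)

# Number of largest-size particles that fit side by side along one pixel edge.
particles_per_pixel_edge = pixel_edge_mm // mp_max_size_mm

# Fraction of one pixel's area covered by a single 5 mm x 5 mm particle.
area_fraction = mp_max_size_mm**2 / pixel_edge_mm**2

print(particles_per_pixel_edge)   # 2000
print(f"{area_fraction:.1e}")     # 2.5e-07
```

Even the largest particle that still counts as a microplastic covers well under a millionth of such a pixel, so a debris-patch accuracy figure says nothing about individual particles.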


Tiered monitoring (what we recommend as a hypothesis)

No obtained paper validates a single “AI microplastic box” for Colombia or LATAM. Evidence instead supports a layered design:

| Tier | Role | Readiness in corpus |
| --- | --- | --- |
| 0 | Prevention / source (textiles, discharge) | Policy; not CV |
| 1 | Macro litter / debris surveillance (drone, satellite) | Moderate for litter; wrong task for MPs |
| 2 | Field screening (UV/RGB alerts) | Low–moderate; report mAP + precision |
| 3 | Laboratory confirmation (µ-Raman / µ-FTIR on subsets) | High lab performance; not edge |
| 4 | Research pilots (microfluidics, holographic flow) | Lab demos; small field n |

For Colombia, Magdalena, or Caribbean programmes: do not cite Colombia-validated ML/CV MP performance from this map. Do run local pilots with dual metrics (detection rate + polymer confirmation) before regulation or large procurement.
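
One way to keep a pilot honest about dual metrics is to record them in a fixed structure and flag anything left blank. The following is a minimal sketch under our own assumptions: the `PilotReport` fields, the `missing_evidence` helper, and the example site are hypothetical and do not come from the report or from any paper in the corpus.

```python
from dataclasses import dataclass
from typing import List, Optional

# Tier roles copied from the table above; keys follow its first column.
TIERS = {
    0: "Prevention / source",
    1: "Macro litter / debris surveillance",
    2: "Field screening",
    3: "Laboratory confirmation",
    4: "Research pilots",
}


@dataclass
class PilotReport:
    """Hypothetical record of what a Tier 2 field pilot reports."""
    site: str
    precision: Optional[float] = None        # fraction of detections that are correct
    mean_ap: Optional[float] = None          # mAP under the pilot's stated protocol
    polymer_confirmed: Optional[int] = None  # detections checked by Tier 3 spectroscopy


def missing_evidence(report: PilotReport) -> List[str]:
    """List the dual-metric fields that a report has left empty."""
    required = {
        "precision": report.precision,
        "mean_ap": report.mean_ap,
        "polymer_confirmed": report.polymer_confirmed,
    }
    return [name for name, value in required.items() if value is None]


# A report that quotes precision alone is flagged as incomplete.
report = PilotReport(site="example-site", precision=0.87)
print(missing_evidence(report))  # ['mean_ap', 'polymer_confirmed']
```

The point of the design is that a headline precision figure cannot pass review on its own; the missing detection and confirmation evidence is listed explicitly.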


Colombia and LATAM (honest negative)

| Question | Answer in this corpus |
| --- | --- |
| Magdalena River + ML/CV MPs in obtained full text? | No |
| Colombian Caribbean coast + ML/CV MPs? | No |
| Any Colombia-linked paper? | One Raman ML study forwarded to full text, linked by author affiliation only; no field site in available text |
| Closest LATAM field MP+ML signal | Brazil beach RS (abstract; full text not obtained in this run) |

Transferable patterns (India river litter YOLO, Thailand UV screening, shared spectroscopy hubs) are documented in the downloadable report as method cards with explicit transfer risk—not as proven national solutions.


What we already created (artifacts)

| Artifact | Role |
| --- | --- |
| Full report (download) | Markdown bundle behind this page’s button: map + LATAM + Colombia + claims + extraction + protocol |
| Systematic map manuscript | Long-form slr.md in the notebook research repo |
| Evidence explorer | Filterable HTML table (modality, matrix, search) |
| LATAM gap analysis | Regional synthesis |
| Colombia transferable / non-transferable methods | Actionable and “do not transfer” lists |
| Build-in-public thread outline | Social sharing with traceable paper_ids |

Journal paper? The map manuscript exists as a draft. It has not been submitted for peer review, and it should not be until a human screening audit is added, the methods are tightened, and the text is edited for a target venue. Faster paths: this article plus the download, a preprint (Zenodo/OSF), or a build-in-public thread.


Limitations (read before you cite us)

  • OpenAlex only — retrieval and language bias
  • 31 / 116 full texts obtained — paywall bias toward open-access synthesis
  • Single-reviewer screening in the factory run — fine for a map; weak for formal SLR without audit
  • Agent-assisted harvest and synthesis — citation audit and validation script are the corrective controls
  • Not pooled statistics — do not compare “accuracy” across papers naively

What you can do next

  • Researchers: Spot-check 10–15% of screening decisions; add a SciELO/Redalyc pass; extend the search after the lock date.
  • Programmes: Use the tiered architecture as a pilot design, not a procurement spec.
  • Communicators: Use the download as the “show your work” appendix for funders or policy briefings.
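
For the spot-check suggested to researchers, a seeded random draw keeps the audit sample reproducible. This is a minimal sketch: the `audit_sample` helper and the placeholder IDs are ours, and rounding the sample size up is one possible choice rather than a rule from the protocol.

```python
import math
import random


def audit_sample(paper_ids, fraction=0.10, seed=0):
    """Draw a reproducible random subset of screening decisions to re-check."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = math.ceil(fraction * len(paper_ids))  # round up so the floor is met
    rng = random.Random(seed)                 # fixed seed: same sample every run
    return sorted(rng.sample(list(paper_ids), k))


# Placeholder IDs standing in for the 228 screened records.
ids = [f"paper_{i:03d}" for i in range(1, 229)]
low = audit_sample(ids, fraction=0.10)
high = audit_sample(ids, fraction=0.15)
print(len(low), len(high))  # 23 35
```

At 228 screened records, the 10–15% range works out to between 23 and 35 decisions for a second reviewer.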

The science is moving fast; the map is locked to 2026-05-18. The honest story is that lab ID is ahead of field MP vision, and local validation—especially in Colombia—still needs to be built, not assumed from global headlines.

2026

Author

Marcus Chen