What ML papers actually show about microplastic detection in water
Microplastics are plastic particles typically smaller than 5 mm. They show up in rivers, coasts, and wastewater worldwide. Finding and identifying them with traditional microscopy and spectroscopy is slow, expensive, and hard to scale. Over the last few years, hundreds of papers have proposed machine learning and computer vision as a fix—cameras on rivers, drones, satellites, Raman and FTIR “AI” classifiers, YOLO detectors, and more.
We ran a systematic map (not a hype blog post) to answer a practical question: what does the literature actually support if you need monitoring in real water bodies—especially where budgets, labs, and open data are limited?
This article explains the topic, the purpose of the research, what we found, and how you can use or share the work. Use the button above the article body to download the full report (~1 MB+ Markdown) with the complete systematic map, Colombia/LATAM annexes, all 71 claims, extraction tables, and protocol.
What we studied (the topic)
Scope: Publications from 2019 through 2025 on ML, deep learning, or computer vision applied to microplastic or nanoplastic detection in aquatic settings—freshwater, marine, coastal, wastewater, and related sediments when a clear detection pipeline is described.
Out of scope: Toxicity-only papers, policy without a model, food-web studies with no detection method, and pure polymer chemistry without an identification pipeline.
Differentiation: This is not “another review saying microplastics are bad.” Pollution reviews already exist. We focused on which modalities and model families appear in the literature, what metrics authors report, and what blocks field deployment—with explicit attention to resource-limited and Latin American contexts.
| Lens | Question |
|---|---|
| Modalities | Microscopy, Raman/FTIR, hyperspectral, RGB/drone, satellite, microfluidics? |
| Models | CNNs, YOLO-style detectors, classical ML on spectra, segmentation? |
| Metrics | Accuracy, mAP, precision, κ—reported completely or selectively? |
| Deployment | Lab-only vs field; capex; “low-cost edge” claims vs evidence |
| Geography | Global South field sites; Colombia; Magdalena/Caribbean relevance |
Why we did it (purpose)
Four goals drove the factory run:
- Map the field: Build a reproducible corpus (OpenAlex harvest, screening logs, structured extraction) so practitioners can navigate methods without reading 228 abstracts ad hoc.
- Cut through hype: Headline “98% accuracy” often refers to macro litter, satellite floating debris, or lab spectroscopy on prepared samples, not polymer-level microplastics in turbid river water.
- Support decisions: Give procurement teams, utilities, and researchers a defensible picture: what to pilot, what to replicate locally, what not to buy from abstracts alone.
- Colombia / LATAM honesty: Document whether local field validation exists in open full texts. In our corpus, no obtained primary study reports a Colombia in-country field programme for aquatic ML/CV microplastic monitoring.
The output is a research factory: bibliography, PRISMA counts, 71 evidence-linked claims, a long systematic map manuscript, and a static evidence explorer—not a single vendor product pitch.
How we built the map (methods in brief)
| Stage | Value |
|---|---|
| Unique papers identified (deduped) | 228 |
| Title/abstract screened | 228 |
| Forwarded to full text | 116 |
| Open-access full texts obtained | 31 |
| Structured extraction rows | 37 |
| Evidence-linked claims | 71 |
| Search lock date | 2026-05-18 |
Source: OpenAlex, seven keyword queries, English-centric metadata (Spanish/Portuguese regional databases were not searched in this pilot—documented as a limitation).
Type of synthesis: Systematic map—we chart evidence and gaps; we do not pool incompatible metrics into one effect size.
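The counts in the table above imply how much of the corpus survives each stage. The short sketch below only redoes that arithmetic; the function name `retention` and the stage labels are illustrative choices, not part of the published protocol.

```python
# Counts copied from the methods table; labels are shortened for display.
flow = [
    ("identified (deduped)", 228),
    ("title/abstract screened", 228),
    ("forwarded to full text", 116),
    ("open-access full texts obtained", 31),
]


def retention(stages):
    """Return (label, count, share of previous stage, share of first stage)."""
    first = stages[0][1]
    prev = first
    rows = []
    for label, count in stages:
        rows.append((label, count, count / prev, count / first))
        prev = count
    return rows


rows = retention(flow)
for label, count, step_share, overall_share in rows:
    print(f"{label:<32} {count:>4}  {step_share:6.1%} of previous  "
          f"{overall_share:6.1%} of identified")
```

Roughly half of the screened records were forwarded, and about a quarter of those were obtained as open full texts, which is why the open-access bias appears among the limitations.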
What the evidence actually supports (pre-conclusion)
One sentence
Transfer to resource-limited and Latin American monitoring requires separating macro-litter computer vision from microplastic-specific identification, reporting detection metrics completely (e.g. precision and mAP), and prioritising tiered architectures that pair affordable imaging with spectroscopic confirmation—not headline accuracy alone.
Six findings you can defend from open full texts
- Lab spectroscopy + ML delivers high polymer-class accuracy on defined environmental fractions when samples reach a hub (μFTIR, μ-Raman, bench Raman), at high capital cost and with heavy sample prep.
- Microscopy and flow imaging can automate counting or classification on controlled matrices; matrix limits must be stated.
- Field microplastic (MP) vision is emerging. One obtained full-text coastal study reports high precision (~85–88%) but low mAP (~34–36%) on small particles: usable for targeted audits, weak as a sole compliance sensor.
- Macro litter CV in Global South rivers (e.g. India urban river YOLO 89% mAP) is real but not substitutable for microplastic polymer monitoring.
- Satellite and debris CV can reach high scenario accuracies for floating debris at resolutions that cannot resolve sub-millimetre microplastics.
- In-situ aquatic MP sensing remains review-forward and prototype-heavy compared to ex-situ lab spectroscopy.
The metric trap (precision vs mAP)
Many stakeholders read “87% precision” and assume the system finds most particles. In one Thailand UV + Faster R-CNN field study, precision was ~85–88% while mAP was ~34–36%. Precision is the share of reported detections that are correct, so it says nothing about particles the detector never flagged. mAP averages precision across recall levels, so missed small objects pull it down. Report both when evaluating procurement options.
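A toy calculation shows how the two numbers can diverge. The detections below are invented for illustration and are not data from the Thailand study or any other paper in the corpus. The sketch assumes a single class, that each detection has already been matched to ground truth at some fixed IoU threshold, and that average precision (AP) is computed with all-point interpolation; mAP would be the mean of such AP values over classes.

```python
def average_precision(detections, n_ground_truth):
    """All-point interpolated AP for one class.

    detections: list of (confidence, is_true_positive) pairs.
    n_ground_truth: number of annotated particles, found or not.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = 0
    precisions, recalls = [], []
    for rank, (_, hit) in enumerate(ranked, start=1):
        tp += int(hit)
        precisions.append(tp / rank)
        recalls.append(tp / n_ground_truth)
    # Interpolate: precision at each point becomes the best precision
    # achieved at that recall or any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas under the precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Hypothetical scene: 20 annotated particles, 8 detections, 7 of them correct.
N_GROUND_TRUTH = 20
detections = [
    (0.95, True), (0.92, True), (0.90, True), (0.85, False),
    (0.80, True), (0.74, True), (0.66, True), (0.58, True),
]

true_positives = sum(hit for _, hit in detections)
precision = true_positives / len(detections)
recall = true_positives / N_GROUND_TRUTH
ap = average_precision(detections, N_GROUND_TRUTH)
print(f"precision={precision:.3f} recall={recall:.3f} AP={ap:.3f}")
```

In this toy case precision is 87.5% while AP is 32.5%: the detector is rarely wrong when it fires, but it finds only 7 of 20 particles, and AP cannot exceed the recall it reaches.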
The task trap (litter ≠ microplastics)
Riverine YOLO papers often target floating solid waste, not polymer-level MPs in water. Satellite workflows may report 98% accuracy for debris patches at 10 m pixels. These results are valuable for litter-management KPIs, but it is unsafe to present them as microplastic compliance evidence without re-annotation for microplastics, size-class information, and polymer confirmation.
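The scale mismatch is easy to quantify. The sketch below compares a 10 m pixel with a particle at the 5 mm upper size limit for microplastics; treating both the pixel and the particle as squares is a simplifying assumption made only for this back-of-envelope estimate.

```python
PIXEL_SIDE_M = 10.0        # pixel size quoted in the text
MP_UPPER_BOUND_M = 0.005   # 5 mm, the usual upper size limit for microplastics


def pixel_area_fraction(particle_side_m, pixel_side_m=PIXEL_SIDE_M):
    """Share of one square pixel covered by one square particle."""
    return (particle_side_m / pixel_side_m) ** 2


largest = pixel_area_fraction(MP_UPPER_BOUND_M)
one_mm = pixel_area_fraction(0.001)

print(f"5 mm particle covers {largest:.2e} of a pixel")
print(f"1 mm particle covers {one_mm:.2e} of a pixel")
print(f"5 mm particles needed to tile one pixel: {round(1 / largest):,}")
```

Even the largest particle that still counts as a microplastic covers well under a millionth of a pixel, so a debris-patch accuracy at this resolution describes a different object class.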
Tiered monitoring (what we recommend as a hypothesis)
No obtained paper validates a single “AI microplastic box” for Colombia or LATAM. Evidence instead supports a layered design:
| Tier | Role | Readiness in corpus |
|---|---|---|
| 0 | Prevention / source (textiles, discharge) | Policy; not CV |
| 1 | Macro litter / debris surveillance (drone, satellite) | Moderate for litter; wrong task for MPs |
| 2 | Field screening (UV/RGB alerts) | Low–moderate; report mAP + precision |
| 3 | Laboratory confirmation (μ-Raman / μFTIR on subsets) | High lab performance; not edge |
| 4 | Research pilots (microfluidics, holographic flow) | Lab demos; small field n |
For Colombia, Magdalena, or Caribbean programmes: do not cite this map as evidence of Colombia-validated ML/CV MP performance. Do run local pilots with dual metrics (detection rate + polymer confirmation) before regulation or large procurement.
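One way to keep the dual metrics together in a pilot is to record them per sampling batch. The sketch below is a hypothetical record layout, not a format used in the report: the class name, every field name, and the example numbers are assumptions for illustration. It pairs Tier 2 field screening with Tier 3 laboratory confirmation on a subset.

```python
from dataclasses import dataclass


@dataclass
class PilotBatch:
    """One pilot sampling batch; all field names are hypothetical."""
    reference_particles: int  # particles counted by the reference method
    field_detections: int     # candidates flagged by Tier 2 field screening
    matched_detections: int   # flagged candidates matching a reference particle
    sent_to_lab: int          # flagged candidates forwarded to Tier 3
    polymer_confirmed: int    # forwarded candidates confirmed as plastic

    def detection_rate(self) -> float:
        """Share of reference particles that field screening found (recall)."""
        return self.matched_detections / self.reference_particles

    def precision(self) -> float:
        """Share of field detections that matched a reference particle."""
        return self.matched_detections / self.field_detections

    def confirmation_rate(self) -> float:
        """Share of lab-checked candidates confirmed as a plastic polymer."""
        return self.polymer_confirmed / self.sent_to_lab


# Invented numbers, chosen only to show the report layout.
batch = PilotBatch(
    reference_particles=200,
    field_detections=80,
    matched_detections=70,
    sent_to_lab=40,
    polymer_confirmed=30,
)
print(f"detection rate:    {batch.detection_rate():.1%}")
print(f"precision:         {batch.precision():.1%}")
print(f"polymer confirmed: {batch.confirmation_rate():.1%}")
```

Reporting all three values per batch keeps a high precision figure from standing in for coverage or for polymer identity.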
Colombia and LATAM (honest negative)
| Question | Answer in this corpus |
|---|---|
| Magdalena River + ML/CV MPs in obtained full text? | No |
| Colombian Caribbean coast + ML/CV MPs? | No |
| Any Colombia-linked paper? | One Raman ML study forwarded to full text, linked by author affiliation only; no field site in available text |
| Closest LATAM field MP+ML signal | Brazil beach RS (abstract; full text not obtained in this run) |
Transferable patterns (India river litter YOLO, Thailand UV screening, shared spectroscopy hubs) are documented in the downloadable report as method cards with explicit transfer risk—not as proven national solutions.
What we already created (artifacts)
| Artifact | Role |
|---|---|
| Full report (download) | This page’s button—Markdown bundle: map + LATAM + Colombia + claims + extraction + protocol |
| Systematic map manuscript | Long-form slr.md in the notebook research repo |
| Evidence explorer | Filterable HTML table (modality, matrix, search) |
| LATAM gap analysis | Regional synthesis |
| Colombia transferable / non-transferable methods | Actionable and “do not transfer” lists |
| Build-in-public thread outline | Social sharing with traceable paper_ids |
Journal paper? The map manuscript exists as a draft. It is not ready for peer-review submission until a human screening audit is added, the methods are tightened, and the text is edited for a target venue. Faster paths: this article + download, preprint (Zenodo/OSF), or build-in-public thread.
Limitations (read before you cite us)
- OpenAlex only — retrieval and language bias
- 31 / 116 full texts obtained — paywall bias toward open-access synthesis
- Single-reviewer screening in the factory run — fine for a map; weak for formal SLR without audit
- Agent-assisted harvest and synthesis — citation audit and validation script are the corrective controls
- Not pooled statistics — do not compare “accuracy” across papers naively
What you can do next
- Researchers: Spot-check 10–15% of screening decisions; add SciELO/Redalyc pass; extend search after lock date.
- Programmes: Use the tiered architecture as a pilot design, not as a procurement spec.
- Communicators: Use the download as the “show your work” appendix for funders or policy briefings.
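For the spot-check suggested to researchers above, a seeded random draw makes the audit sample reproducible. This is a minimal sketch: the `paper_NNN` identifier format, the function name `audit_sample`, and the seed value are assumptions, and the real corpus IDs would replace the generated list.

```python
import math
import random


def audit_sample(record_ids, fraction, seed=2026):
    """Draw a reproducible random subset of screening decisions to re-check."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    size = math.ceil(len(record_ids) * fraction)
    rng = random.Random(seed)  # local generator, so global state is untouched
    return sorted(rng.sample(record_ids, size))


# 228 placeholder IDs standing in for the screened records.
record_ids = [f"paper_{i:03d}" for i in range(1, 229)]

low = audit_sample(record_ids, 0.10)
high = audit_sample(record_ids, 0.15)
print(f"10% audit: {len(low)} records, 15% audit: {len(high)} records")
```

Rounding up means a 10–15% audit of 228 screened records covers 23 to 35 papers, and rerunning with the same seed returns the same set.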
The science is moving fast; the map is locked to 2026-05-18. The honest story is that lab ID is ahead of field MP vision, and local validation—especially in Colombia—still needs to be built, not assumed from global headlines.