Want to learn how to USE AI technology to make money and/or your life easier? Join our FREE AI community here: https://www.skool.com/ai-with-apex/about
From Pokémon Go Scans to PDDL Plans: The Data Stack Behind “Physical AI”
Today’s thread is simple: machines get better at moving through the real world when perception becomes reusable infrastructure—and when planning becomes verifiable.
From crowdsourced city scans to formal solvers, the week’s research points to a new stack for robotics and embodied AI: sense, localize, plan, then act.
TL;DR
- Niantic Spatial says it’s training a Large Geospatial Model from 30B+ “posed images” captured across “millions of locations.”
- MIT introduced a dual vision-language approach that turns a single image into PDDL files + a solver-produced plan, reporting ~70% average success vs ~30% for its best baselines.
- An MIT Media Lab profile of Joseph Paradiso highlights sensing deployments from wearables to ecology, including acoustic monitoring for endangered honeybees in Patagonia.
- MIT’s Matthew Jones is building predictive models of tumor progression and treatment resistance, framing it as trying to anticipate cancer’s “moves” rather than reacting after resistance appears.
- NVIDIA’s guest post on Hugging Face pushes “open data for AI” and directs builders to datasets, recipes, and community tooling—signaling that “open” competition is increasingly about data, not just weights.
1) Niantic Spatial: Pokémon Go-era scans become a Large Geospatial Model
What happened
Niantic’s spatial computing spinout, Niantic Spatial, says it is training a “Large Geospatial Model” using 30+ billion “posed images” collected across “millions of locations.” The company frames the result as a way for machines to reconstruct, localize within, and understand real-world places.
Why it matters
This is one of the cleanest examples yet of consumer AR data collection turning into a foundation layer for autonomy: playful scanning becomes industrial-grade localization. It also sharpens the next big competition after text and image models—building “world models” that can anchor robots and devices in messy physical spaces.
Key details
- Niantic Spatial describes “posed images,” imagery paired with camera pose and geometry, as a core ingredient for its Large Geospatial Model and the format that makes 3D understanding scalable; see the record sketch after this list. (link)
- The company positions its services around reconstruction, localization, and understanding of real-world spaces. (link)
- Public background describes Niantic Spatial as a Niantic spinout and places it in the context of Niantic’s restructuring and shift in focus. (link)
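Niantic hasn’t published a schema, but it helps to make “posed image” concrete. Here’s a minimal sketch of what one such record could contain; every field name and unit below is an assumption, not Niantic’s actual format:

```python
from dataclasses import dataclass

@dataclass
class PosedImage:
    """Illustrative only: one plausible layout for a "posed image" record,
    an RGB frame paired with camera intrinsics and a world-space pose.
    Field names and units are assumptions, not Niantic's schema."""
    image_path: str   # RGB frame from a phone scan
    fx: float         # focal length x (pixels)
    fy: float         # focal length y (pixels)
    cx: float         # principal point x (pixels)
    cy: float         # principal point y (pixels)
    qw: float         # camera rotation as a unit quaternion (w, x, y, z)
    qx: float
    qy: float
    qz: float
    tx: float         # camera position in world coordinates (meters)
    ty: float
    tz: float
    lat: float        # coarse geolocation for anchoring into a global map
    lon: float
```

The pairing is the point: with intrinsics plus pose, every pixel becomes a ray in world coordinates, which is what lets billions of casual phone scans fuse into reusable 3D structure.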
Source links
https://www.nianticspatial.com/en/blog/niantic-spatial-day-one
https://en.wikipedia.org/wiki/Niantic_Spatial
2) MIT: A dual VLM that turns one image into a formal plan (PDDL + solver)
What happened
MIT researchers presented “VLM-guided formal planning (VLMFP),” a method that uses two vision-language models and a classical planner to generate step-by-step plans from a single image. In MIT’s description, a smaller model (SimVLM) describes the scene, simulates actions, and checks whether a goal is satisfied, while a larger model (GenVLM) converts that information into PDDL domain and problem files that a classical solver can turn into an executable plan.
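If you haven’t met PDDL, the shape of the artifact matters: a domain file declares predicates and actions, and a problem file declares objects, an initial state, and a goal. The toy pair below is written by hand (it is not GenVLM output) and shows what a classical planner actually consumes; the solver invocation assumes a local Fast Downward install:

```python
from pathlib import Path
import subprocess

# Hand-written toy PDDL, illustrative rather than the paper's actual domain.
DOMAIN = """
(define (domain table)
  (:requirements :strips)
  (:predicates (clear ?x) (on-table ?x) (holding ?x) (hand-empty))
  (:action pick-up
    :parameters (?x)
    :precondition (and (clear ?x) (on-table ?x) (hand-empty))
    :effect (and (holding ?x)
                 (not (on-table ?x)) (not (clear ?x)) (not (hand-empty)))))
"""

PROBLEM = """
(define (problem grab-a)
  (:domain table)
  (:objects a b)
  (:init (on-table a) (on-table b) (clear a) (clear b) (hand-empty))
  (:goal (holding a)))
"""

Path("domain.pddl").write_text(DOMAIN)
Path("problem.pddl").write_text(PROBLEM)

# Assumes Fast Downward is installed locally; any classical PDDL planner
# works. Fast Downward writes the resulting plan to ./sas_plan.
subprocess.run(
    ["fast-downward.py", "domain.pddl", "problem.pddl",
     "--search", "astar(blind())"],
    check=True,
)
```

On this toy problem the plan is a single action, pick-up a; the value of the formalism is that the solver either returns a legal plan or provably fails, which is the “verifiable” half of VLMFP’s pitch.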
Why it matters
The practical move here is forcing an AI system to externalize its intent into a formal representation that can be checked and solved, rather than relying on end-to-end neural reasoning for long-horizon tasks. It’s a credible bridge between perception (what’s in the image) and reliability (a plan a solver can validate and step through).
Key details
- MIT reports ~70% average success for VLMFP vs ~30% for the best baseline across the comparisons it describes. (link)
- MIT reports SimVLM achieved ~85% success at describing/simulating/checking goals in tests. (link)
- The workflow includes iterative refinement: solver output is compared with simulator expectations to improve the generated PDDL (a control-flow sketch follows this list). (link)
- MIT links the work to an open-access arXiv preprint for additional methodological details. (link, link)
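Since MIT’s interfaces aren’t public, here’s a hypothetical sketch of that refinement loop with the two models and the solver passed in as callables; names like describe and replay are stand-ins, not the paper’s API:

```python
from typing import Callable, Optional

def plan_with_refinement(
    describe: Callable[[bytes], str],                  # SimVLM role: image -> scene facts
    to_pddl: Callable[[str], tuple[str, str]],         # GenVLM role: facts -> (domain, problem)
    solve: Callable[[str, str], Optional[list[str]]],  # classical planner -> actions or None
    replay: Callable[[bytes, list[str]], tuple[bool, str]],  # SimVLM role: step-check a plan
    image: bytes,
    max_rounds: int = 3,
) -> Optional[list[str]]:
    """Hypothetical control flow paraphrasing the VLMFP description, not
    MIT's code: regenerate the PDDL whenever the solver's plan disagrees
    with the simulator's step-by-step expectations."""
    facts = describe(image)
    for _ in range(max_rounds):
        domain, problem = to_pddl(facts)
        plan = solve(domain, problem)
        if plan is None:  # unsolvable PDDL; ask for a rewrite
            facts += "\nsolver found no plan; revise the PDDL"
            continue
        ok, feedback = replay(image, plan)
        if ok:
            return plan   # plan survives simulated execution
        facts += "\nmismatch during replay: " + feedback
    return None
```

Capping the rounds matters: the loop should fail loudly rather than iterate forever on PDDL the solver keeps rejecting.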
Source links
https://news.mit.edu/2026/better-method-planning-complex-visual-tasks-0311
https://arxiv.org/abs/2510.03182
3) MIT Media Lab: Joseph Paradiso’s sensing work across wearables, art, and ecology
What happened
MIT professor Joseph Paradiso is profiled for decades of work on wearable and environmental sensing and the idea of “responsive environments.” The piece spans prototypes and field deployments, emphasizing sensing as a cross-disciplinary tool that can be worn, embedded, and carried into the world.
Why it matters
If “physical AI” is the story of machines acting in the world, sensing is the quieter prerequisite: what you can reliably measure determines what you can safely automate. Paradiso’s career is also a reminder that data collection isn’t only web-scale scraping—it can be embodied, ecological, and designed for specific real-world constraints.
Key details
- The profile describes collaborations with National Geographic Explorers using sensors for animal behavior and environmental monitoring, including work involving lions/hyenas in Botswana and goats in Chile. (link)
- The article highlights acoustic sensors with onboard AI used for monitoring endangered honeybees in Patagonia. (link)
- Paradiso was named an IEEE Fellow in January for contributions to wireless wearable sensing and mobile energy harvesting. (link)
Source links
https://news.mit.edu/2026/mit-professor-joseph-paradiso-sensing-innovations-0310
4) Predictive oncology: Modeling how tumors evolve—and how resistance emerges
What happened
In a Q&A, MIT Assistant Professor Matthew Jones describes research aimed at building predictive models of tumor progression and therapy resistance across genetic, epigenetic, metabolic, and microenvironmental factors. The framing is explicitly forward-looking: understanding how tumors change in space and time to anticipate failure modes earlier.
Why it matters
Across AI and biology, a shared theme is shifting from reactive pattern recognition to forecasting dynamics—what happens next, not only what happened. In cancer research, that means trying to predict which resistance pathways are likely to appear, and when, rather than waiting for treatments to stop working.
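To make “forecasting dynamics” concrete, here’s a deliberately toy two-clone model (illustrative, not from the Q&A): therapy shrinks the sensitive population while a small resistant subclone keeps growing, so total burden dips and then rebounds. The modeling ambition Jones describes is to see that rebound, and its mechanism, coming:

```python
import math

def tumor_burden(days: int, s0=1e9, r0=1e3, k_s=-0.05, k_r=0.03) -> list[float]:
    """Toy exponential mixture: a sensitive clone (s0 cells, decaying at
    rate k_s under therapy) plus a resistant clone (r0 cells, growing at
    rate k_r). Every number here is illustrative, not fit to data."""
    return [s0 * math.exp(k_s * t) + r0 * math.exp(k_r * t) for t in range(days)]

burden = tumor_burden(400)
rebound_day = next(t for t in range(1, 400) if burden[t] > burden[t - 1])
print(f"total burden starts rebounding around day {rebound_day}")  # ~day 180
```

The real problem is vastly harder (multiple biological layers, spatial structure), but that dip-then-rebound curve is the failure mode predictive models are trying to get ahead of.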
Key details
- Jones describes therapy resistance as a central challenge: treatments can work initially and then fail as tumors evolve. (link)
- He outlines a multi-level view of tumor evolution spanning genetic, epigenetic, metabolic, and microenvironmental layers. (link)
- The Q&A uses a “play chess with cancer” metaphor to emphasize anticipating tumor evolution rather than reacting after resistance appears. (link)
Source links
https://news.mit.edu/2026/3-questions-building-predictive-models-characterize-tumor-progression-0310
5) NVIDIA on Hugging Face: The “open” conversation shifts toward datasets
What happened
NVIDIA published a guest post on Hugging Face positioning “open data for AI” as a key lever for progress, pointing readers to datasets, recipes, and community resources. The post fits a broader NVIDIA message: openness isn’t only about releasing model weights; it’s also about how training and evaluation data is packaged, shared, and reproduced.
Why it matters
As foundation models commoditize, data becomes the differentiator—especially in robotics and “physical AI,” where collecting high-quality multimodal ground truth is expensive and slow. This also creates a clearer battleground for transparency: provenance, documentation, and repeatable training pipelines.
Key details
- The Hugging Face post describes NVIDIA’s framing around open datasets and invites builders to explore NVIDIA datasets on the platform; a minimal loading sketch follows this list. (link)
- NVIDIA’s broader blog framing emphasizes open models and open data as part of its ecosystem strategy. (link)
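In practice, “explore datasets on the platform” means the standard Hugging Face datasets workflow. A minimal sketch; the dataset ID below is a placeholder, not a specific NVIDIA release:

```python
from datasets import load_dataset  # pip install datasets

# "nvidia/example-dataset" is a placeholder ID; browse the real catalog at
# https://huggingface.co/nvidia. streaming=True lets you inspect records
# without downloading the full dataset first.
ds = load_dataset("nvidia/example-dataset", split="train", streaming=True)

for i, example in enumerate(ds):
    print(example)  # check schema and provenance fields before committing
    if i >= 2:
        break
```

The dataset card on the Hub is where provenance, documentation, and licensing claims live, which is exactly the transparency battleground described above.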
Source links
https://huggingface.co/blog/nvidia/open-data-for-ai
https://blogs.nvidia.com/blog/open-models-data-ai/
Closing takeaway
This week’s research lands on a shared lesson: the next leap in real-world AI won’t come from “smarter” models alone, but from better inputs and tighter guarantees—richer sensing, stronger localization priors, and planning layers that can be checked before anything moves.
—
Want to learn how to USE AI technology to make money and/or your life easier? Join our FREE AI community here: https://www.skool.com/ai-with-apex/about