AlphaEarth is remapping the world in remarkable detail

AlphaEarth, a new AI model from Google DeepMind, is remapping the world in stunning detail. This technology is a powerful tool for environmental monitoring, agriculture, and water management, making planetary-scale analysis efficient and accessible.
9 mins read
Aug 18, 2025

Our view of the Earth has continually evolved, from the first maps drawn by hand to the satellite imagery that now provides a daily global view. Each new technology has brought us closer to a complete picture of our planet.

Today, we’re at the forefront of another major leap forward with AlphaEarth Foundations: a new AI model developed by Google DeepMind.

AlphaEarth (Brown et al., “AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data,” arXiv:2507.22291, 2025) is an AI model that integrates vast amounts of Earth observation data to create a unified digital representation of our world. It’s a powerful tool that helps us manage the immense scale of planetary data by distilling it into meaningful, actionable insights.

This newsletter will explore the challenges we face with a patchwork of data from different sensors and the scarcity of high-quality labels for machine learning models. We'll also unpack the core technology of AlphaEarth that aims to solve these problems at scale, its real-world applications, and how it represents a new chapter in global mapping.

The problem(s) with today’s planetary data#

Suppose you’re trying to understand the world by looking at a vast, scattered collection of photos: some taken on sunny days, others on foggy ones, and some at night.

You must manually sort through them, guess what’s in the blurry ones, and combine them into a coherent story. This is similar to the struggle we face when dealing with Earth Observation (EO) data. We have an immense amount of it — petabytes of satellite imagery and other environmental datasets collected constantly — but turning this raw data into useful, planet-scale information is incredibly difficult because of two major, intertwined challenges.

1. The challenge of scarcity and scale#

High-quality maps depend on having high-quality labeled data, yet these labels remain stubbornly scarce. While our satellites give us unprecedented volumes of raw data, the ground-truth information needed to train machine learning models is costly and time-consuming to acquire. Think of it as the difference between having trillions of puzzle pieces and only a few hundred photos of the finished puzzle to guide you.

The dilemma of Earth observation: Vast amounts of satellite data exist, but high-quality labeled data from “ground-truthing” efforts are rare, resulting in a limited, scattered understanding on a global scale
  • The cost of ground-truthing: Making physical measurements and observations on the ground, a process known as ground-truthing, is a massive undertaking. Teams of researchers must travel to specific locations, often in remote areas, to take measurements, conduct surveys, and manually annotate data. This effort requires significant human and financial resources, from planning a field campaign to collecting and curating the labels.

  • The precision vs. coverage dilemma: This scarcity forces a fundamental trade-off. We can have highly precise, ground-based annotations for a very small area, but this data is extremely sparse when we try to apply it to a global scale. In contrast, datasets that cover large areas often have broad, less descriptive labels. For instance, a label might just say “forest” for a massive region, when dozens of different tree species and ecosystems exist within that area. This makes it difficult to train models with global coverage and fine-grained detail.

For example, a team might spend months collecting precise, ground-based data on different crop types in a small region of Ethiopia. They’ll have detailed labels for wheat, barley, and teff. However, this detailed data is extremely sparse compared to the vast agricultural lands across an entire continent. The scarcity of such labels makes it incredibly difficult to train models to produce accurate, large-scale crop maps without extensive and expensive fieldwork in every location.

2. The challenge of the patchwork #

Even when we have data, it’s often a disorganized “patchwork” of information. Our understanding of the planet is a mosaic stitched together from many different sensors and platforms, each with its quirks and limitations. Combining this data is known as multimodal data assimilation, and it is far from seamless.

The top visual shows how data from multiple sensors (Sentinel-2, Landsat, Sentinel-1, etc.) is collected at different times and often contains gaps. The bottom visual represents how AlphaEarth integrates this data into a continuous and consistent record over a “valid period,” providing a complete picture where a single source might fail
  • Diverse data sources: We use a variety of satellite-based instruments to monitor the Earth.

    • Optical imagery (e.g., from Sentinel-2 and Landsat) gives us high-resolution, color-rich images. Its strength is visual detail, but it’s ineffective when persistent clouds, smoke, or haze cover a location.

    • Radar data (e.g., from Sentinel-1) can penetrate through clouds and darkness, providing a consistent signal regardless of weather or time of day. However, radar data is not intuitive to the human eye; it measures surface roughness and properties, not color.

    • 3D laser mapping (LiDAR) (e.g., from GEDI) provides incredible detail on the vertical structure of vegetation, giving us a 3D view of a forest canopy. The trade-off? LiDAR data is extremely sparse, covering only small, intermittent “swaths” of the planet’s surface.

  • The problem of inconsistency: Beyond the different types of data, there are inconsistency problems. These sources are captured at different times, resolutions, and orbital paths. Correcting these inconsistencies and filling in the gaps (for example, with a “best-available-pixel” approach) is a major, often imperfect, challenge.

For example, we might have an optical image from a Sentinel-2 satellite to monitor a forest, but clouds block it. We can also have radar data from a Sentinel-1 satellite, which sees through clouds but provides different information. Before a cohesive system existed, there was no easy way to synthesize these different data types, leading to gaps and inconsistencies in the final analysis.
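The “best-available-pixel” heuristic mentioned above is easy to sketch. Here is a minimal, toy NumPy version: the scene values are random stand-ins for monthly optical imagery, with NaN marking cloud-obscured pixels.

```python
import numpy as np

# Toy stack of 4 monthly optical scenes over a 3x3 area; NaN marks cloud cover.
# All values are synthetic -- this only illustrates the compositing idea.
rng = np.random.default_rng(0)
stack = rng.random((4, 3, 3))
stack[0, :2, :] = np.nan   # clouds in the first scene
stack[2, 1:, 1:] = np.nan  # clouds in the third scene

def best_available_pixel(stack):
    """Keep, per location, the most recent cloud-free observation."""
    composite = np.full(stack.shape[1:], np.nan)
    for scene in stack:                  # oldest -> newest
        clear = ~np.isnan(scene)
        composite[clear] = scene[clear]  # newer clear pixels overwrite older ones
    return composite

composite = best_available_pixel(stack)
print(int(np.isnan(composite).sum()))  # 0: every pixel was clear at least once
```

The limitation is visible in the code: the composite mixes observations from different dates, so the result is not a snapshot of any single moment — exactly the “often imperfect” gap-filling the article describes.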

Bespoke modeling and an unsustainable approach#

These deep-seated challenges have led to bespoke modeling efforts, where specialized, custom models are developed to solve a single, specific problem.

This is the dominant approach today. A research team might spend a year or more on a single project, navigating a complex and time-intensive pipeline that typically includes:

  1. Data collection and preprocessing: Gathering all the necessary data from different sources, manually cleaning it (e.g., removing cloud cover), and preparing it for a specific model.

  2. Custom model training: Building a machine learning model from scratch and training it with a limited, often sparse, set of labels.

  3. Inference and output: Running the model and producing a specific output, like a static map of one land-use type for a single year.

While these models are effective for their intended purpose, they are time-consuming and expensive (you may be noticing a trend here).

They are not a scalable solution for the immense and ever-growing volume of data we need to manage. This piecemeal approach is a fundamental roadblock to achieving a truly planetary-scale, on-demand understanding of our world.

This is the problem that AlphaEarth Foundations is meant to solve: to move beyond these piecemeal approaches and provide a single, universal solution for making sense of our planet’s data. Instead of treating each satellite image as a separate piece of information, AlphaEarth synthesizes data from multiple sources, including optical imagery, radar, and 3D laser mapping (LiDAR). This multimodal approach provides a consistent, comprehensive view of the planet, offering a clearer picture than ever before, even when parts of the view are obscured.

The core breakthrough: AI-powered pixels#

AlphaEarth’s central innovation is its use of satellite embeddings, an “AI-powered pixel” that holds a wealth of information in a compact, intelligent form.

AlphaEarth transforms multiple layers of data into a single, compact “satellite embedding.” Each point on the grid represents a 10-meter square on Earth. The arrow shows how a year’s worth of data for that square is condensed into a 64-dimensional vector, or “digital fingerprint.”

Unlike a regular pixel showing a color or a light reading from a single moment, each AlphaEarth embedding is a 64-dimensional data point. This single data point summarizes an entire year of activity for a specific 10-meter-by-10-meter square on Earth. It acts like a digital fingerprint for that location, capturing a wide range of characteristics, from seasonal vegetation changes to subtle land-use shifts.

This technology is groundbreaking because it makes global analysis substantially more efficient. By condensing a year’s worth of data for each square, the embedding requires significantly less storage space — about 16 times less than other AI systems. This efficiency makes it practical to perform planetary-scale analysis without immense computational cost.
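To get a feel for what a 64-dimensional “fingerprint” buys you, here is a toy sketch. The vectors are random stand-ins (only the 64-dimension figure comes from the article), and similar locations are modeled as nearby vectors:

```python
import numpy as np

rng = np.random.default_rng(42)

def random_embedding():
    """A synthetic 64-dimensional unit vector standing in for one pixel-year."""
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)

forest_a = random_embedding()
noisy = forest_a + rng.normal(scale=0.03, size=64)  # a visually similar patch
forest_b = noisy / np.linalg.norm(noisy)
city = random_embedding()                           # an unrelated patch

# Cosine similarity: near 1 for similar land cover, near 0 for unrelated patches.
print(forest_a @ forest_b > forest_a @ city)  # True

# Storage intuition (rough, assuming float32): 64 values x 4 bytes = 256 bytes
# per pixel-year, versus kilobytes for a year of raw multi-band imagery.
```

Because similarity reduces to a dot product, comparing locations at planetary scale becomes cheap vectorized arithmetic rather than image processing.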

From data sources to intelligent insights#

The technology behind AlphaEarth is a deep learning model known as an “embedding field model.” This model is trained to process multiple data types simultaneously, a process known as multimodal data assimilation. By learning to correlate optical images with radar data, for example, the model can infer what’s happening on the ground even when clouds block the view.

The embeddings produced by the model are “analysis-ready.” This means developers and researchers don’t have to spend time on difficult, manual data preprocessing steps, such as atmospheric correction or cloud masking. The embeddings are structured in a way that is immediately compatible with machine learning algorithms, allowing for quick and powerful analysis. Because the embedding space is consistent over different years, it’s easy to compare data from one year to the next and detect subtle changes.
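That year-to-year consistency suggests a very simple change detector: compare the same pixel’s embedding across two years and flag a drop in similarity. A toy sketch with synthetic vectors and a made-up threshold:

```python
import numpy as np

rng = np.random.default_rng(1)

def unit(v):
    return v / np.linalg.norm(v)

# Synthetic embeddings for the same square in consecutive years (values made up).
year_a = unit(rng.normal(size=64))
year_b_stable = unit(year_a + rng.normal(scale=0.03, size=64))  # little change
year_b_cleared = unit(rng.normal(size=64))                      # e.g. land cleared

def changed(e1, e2, threshold=0.5):
    """Flag change when cosine similarity falls below an illustrative threshold."""
    return float(e1 @ e2) < threshold

print(changed(year_a, year_b_stable), changed(year_a, year_b_cleared))  # False True
```

The threshold here is arbitrary; in practice it would be tuned against known examples of change, but the core operation stays this cheap.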

This process is a major evolution in managing and utilizing satellite data, shifting the effort from raw data management to meaningful insight generation.

Mapping a better future#

AlphaEarth is a new, powerful tool for addressing some of the most critical issues confronting our planet. By providing a consistent and highly detailed view of the Earth, it helps experts make better-informed decisions.

This illustrates AlphaEarth’s performance in a real-world crop-mapping task. The bar chart on the left compares the balanced accuracy (BA) of various models, demonstrating that AlphaEarth Foundations (aef) consistently achieves the highest score. The qualitative images on the right visually support this finding. The top-left map, produced by AlphaEarth, provides a clear, accurate, and spatially coherent representation of different crop types, such as “Forages” and “Cereals”. This map is a significant improvement over the top-right map, generated by a traditional method (CCDC), which appears less precise and blockier. The image at the bottom, showing a cloud-free satellite view of the location, serves as a visual reference for the models’ outputs.
  • Environmental monitoring and conservation: AlphaEarth tracks changes to forests, wetlands, and coastal waters in unprecedented detail. This helps monitor deforestation, urban expansion, and the impact of climate change on delicate ecosystems.

  • Food security and agriculture: AlphaEarth provides a clearer, year-round picture of agricultural land, helping to analyze crop health and land use. This information is vital for addressing global food security.

  • Water resource management: The model’s ability to track changes in water bodies and coastal areas allows for better water resource management, which is crucial in a changing climate.

The technology’s efficiency and accessibility mean that these powerful capabilities are now within reach of a wider community of scientists and organizations worldwide.
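The crop-mapping example above is, at heart, classification from sparse labels. A toy nearest-centroid sketch shows why analysis-ready embeddings make this tractable (class names echo the Ethiopian example earlier; all embeddings and numbers are synthetic):

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic class "centers" in a 64-dimensional embedding space. In practice the
# embeddings would come from the dataset and the labels from a field campaign.
centers = {"wheat": rng.normal(size=64), "teff": rng.normal(size=64)}

def labeled_samples(name, n=5):
    """n noisy embeddings around a class center -- stand-ins for labeled pixels."""
    return centers[name] + rng.normal(scale=0.1, size=(n, 64))

train = {name: labeled_samples(name) for name in centers}   # just 5 labels each
centroids = {name: x.mean(axis=0) for name, x in train.items()}

def classify(embedding):
    """Assign the class whose centroid is nearest in embedding space."""
    return min(centroids, key=lambda c: np.linalg.norm(embedding - centroids[c]))

print(classify(labeled_samples("teff", n=1)[0]))  # teff
```

The point is the label budget: because the embeddings already encode a year of multi-sensor context, even a handful of labeled points per class can separate crop types, instead of requiring fieldwork in every region.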

Accessing the technology#

The results of the AlphaEarth model are made available through the Google Satellite Embedding V1 ANNUAL dataset (https://developers.google.com/earth-engine/datasets/catalog/GOOGLE_SATELLITE_EMBEDDING_V1_ANNUAL). This dataset is hosted on Google Earth Engine and includes embeddings for every 10-meter pixel on Earth’s land surface from 2017 to 2023. This gives developers and researchers a powerful, analysis-ready tool to create their projects and applications.

Ready to get started? Explore the dataset today and begin generating insights for your projects.

The technology is a testament to the collaborative efforts of Google DeepMind, Google Earth, and the research community, as detailed in the original paper, “AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data” (Brown et al., arXiv:2507.22291, 2025).

The evolution of Earth Observation is accelerating, and AlphaEarth is a clear sign that AI is leading the way toward a more detailed, intelligent, and insightful understanding of our world.


Intrigued by AlphaEarth’s AI-powered pixels? Our “Vector Databases: From Embeddings to Applications” course explores the technology that makes such innovations possible. You’ll learn how to work with the same kind of embeddings that AlphaEarth uses to make sense of a world of data and build your own real-world applications.

Vector Databases: From Embeddings to Applications

Vector databases transform how we search, analyze, and recommend data in today’s AI-driven world. These databases are at the heart of modern applications like semantic search, multimodal search, recommendation systems, and retrieval augmented generation (RAG) for large language models. By using embeddings, which are numerical representations that capture the meaning of data, vector databases allow us to find similar information quickly and accurately, even across vast datasets. This makes them essential for building intelligent systems that understand data beyond simple keyword matching. In this vector databases course, you’ll learn to generate embeddings for various data types and use vector databases to store and query them. Using the power of embeddings and vector databases, you’ll build semantic search apps, recommendation systems, and multimodal search solutions. After completing this course, you’ll have the skills to determine when and how to effectively apply vector databases to different projects.

3hrs 15mins
Intermediate
17 Playgrounds
2 Quizzes

Written By:
Fahim ul Haq