
Neptune ML

Explore how Neptune ML combines Amazon Neptune's graph database with Amazon SageMaker AI to train and deploy graph neural network models. Understand the complete pipeline from data export to real-time inference through Gremlin or SPARQL queries, enabling predictions such as node and edge classification, regression, and link prediction. Gain insight into operational considerations such as retraining cadence, IAM roles, and endpoint scaling so you can integrate machine learning into graph queries effectively.

While Neptune Analytics provides deterministic, in-memory graph algorithms for investigatory workloads, many production scenarios demand something different: learned predictions that generalize from graph topology and node features to answer questions the data does not explicitly contain. Neptune ML fills this gap by integrating Amazon Neptune with Amazon SageMaker AI and the Deep Graph Library (DGL), an open-source framework optimized for building and training graph neural networks on large-scale graph-structured data, to deliver machine learning predictions directly through graph queries.

Neptune ML is not a standalone training system embedded inside the database engine. It orchestrates an external pipeline that exports graph data, trains graph neural network models on SageMaker infrastructure, and surfaces predictions back through Neptune's Gremlin or SPARQL query interface. This is the fundamental distinction from Neptune Analytics: Analytics runs deterministic algorithms such as PageRank or shortest path in memory and produces exact answers, whereas Neptune ML produces probabilistic predictions learned from graph structure and feature signals, estimating outcomes that are not yet recorded in the graph.
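To make "predictions surfaced through a Gremlin query" more concrete, the following minimal sketch assembles the text of a node-classification inference query. It assumes the `Neptune#ml.endpoint` and `Neptune#ml.classification` predicates described in the Neptune ML Gremlin inference documentation. The helper name, the endpoint name, the vertex IDs, and the `genre` property are all hypothetical, and the sketch only builds the query string; it does not connect to a cluster.

```python
import json
from typing import Sequence


def build_node_classification_query(
    endpoint: str, vertex_ids: Sequence[str], prop: str
) -> str:
    """Build Gremlin text asking a Neptune ML endpoint to predict `prop`.

    Hypothetical helper for illustration only. json.dumps is used to
    produce double-quoted string literals for simple identifiers.
    """
    ids = ", ".join(json.dumps(v) for v in vertex_ids)
    return (
        # Assumption: this predicate selects the SageMaker inference endpoint.
        f'g.with("Neptune#ml.endpoint", {json.dumps(endpoint)})'
        f".V({ids})"
        # Assumption: this predicate marks the property as a learned
        # classification to infer rather than a stored value to read.
        f'.properties({json.dumps(prop)}).with("Neptune#ml.classification")'
    )


# Hypothetical endpoint name, vertex IDs, and property key.
query = build_node_classification_query(
    "my-hypothetical-endpoint", ["movie_1", "movie_2"], "genre"
)
print(query)
```

The point of the sketch is that the query keeps the shape of an ordinary property lookup. Only the added `with(...)` modulators signal that the value should come from the trained model rather than from the graph store.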

Note: Neptune ML does not replace Neptune Analytics or vice versa. Production architectures often combine both, using Analytics for algorithmic scoring and ML for predictive inference, depending on whether the question requires a computed answer or a learned one.

Several key terms anchor this lesson. Graph neural networks (GNNs), a class of deep learning models that learn node and edge representations by iteratively aggregating information from graph neighborhoods, form the model architecture. SageMaker training jobs handle compute-intensive model fitting. SageMaker hosted inference endpoints serve predictions at query time. The four core inference task families (node classification, edge classification, regression, and link prediction) define what Neptune ML can predict.
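The idea of "iteratively aggregating information from graph neighborhoods" can be illustrated with a deliberately simplified sketch. It uses plain mean aggregation over a hypothetical three-node toy graph. Real GNN layers, including those trained through DGL, add learned weight matrices and nonlinearities, so this is not Neptune ML's actual model; it only shows how repeated rounds let information travel across multiple hops.

```python
from typing import Dict, List


def mean_aggregate(
    features: Dict[str, List[float]],
    neighbors: Dict[str, List[str]],
) -> Dict[str, List[float]]:
    """Run one aggregation round.

    Each node's new vector is the element-wise mean of its own vector
    and the vectors of its direct neighbors.
    """
    updated = {}
    for node, own in features.items():
        vectors = [own] + [features[n] for n in neighbors.get(node, [])]
        updated[node] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return updated


# Hypothetical toy graph: an undirected path a - b - c.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

# After one round, "a" has mixed in information from "b" only.
round_one = mean_aggregate(features, neighbors)
# After two rounds, "a" also reflects "c", which is two hops away.
round_two = mean_aggregate(round_one, neighbors)
```

After the first round, node `a` holds `[0.5, 0.5]`, the mean of its own features and those of `b`. The second round pulls in `b`'s updated vector, which already contains information from `c`, so stacking rounds widens the neighborhood each node representation summarizes.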

The following diagram illustrates how data flows through the Neptune ML pipeline from the graph store to application-consumable predictions.

End-to-end Neptune ML workflow moving graph data through export, preprocessing, training, and deployment before predictions surface in Neptune queries

The Neptune ML workflow

Each stage of the Neptune ML pipeline maps to a distinct AWS component and carries its own operational considerations. Understanding the boundaries between stages clarifies where cost, ...