Neptune Database Fundamentals
Explore the operational fundamentals of Amazon Neptune: its cluster architecture, endpoint routing, high availability mechanisms, continuous backups, and secure connectivity. This lesson equips you to run production graph workloads on Neptune, plan for failover, and perform large-scale data ingestion.
In the previous lesson, you explored graph database fundamentals: property graphs, RDF triples, vertices, edges, labels, and properties. Those concepts describe what a graph database stores. This lesson shifts focus to how Amazon Neptune runs those models as a production-grade managed service on AWS. Neptune is not simply a query endpoint you point at. It is a clustered database engine with its own distributed storage layer, dedicated compute instances, automated failover, continuous backups, and fine-grained access controls. Understanding these operational mechanics is essential for designing resilient, secure, and performant graph workloads, and for answering exam questions that test whether you can distinguish Neptune's architecture from relational or NoSQL patterns.
By the end of this lesson, you will understand Neptune's cluster architecture and endpoint routing, high availability and failover behavior, continuous backups and point-in-time recovery, secure VPC-based connectivity with optional IAM authentication, bulk loading from Amazon S3, and the three supported query languages (Gremlin, openCypher, and SPARQL). The next lesson dives into query syntax and modeling; here, the emphasis is on operational mechanics.
Cluster architecture and endpoints
Amazon Neptune organizes resources into a DB cluster, which separates a compute layer of database instances from a shared storage layer.
Compute layer
The compute layer consists of database instances that process queries and manage client connections. A Neptune cluster has exactly one primary instance, which handles all write operations: inserts, updates, and deletes. You can add up to fifteen read replicas that serve read queries concurrently. Each replica maintains its own in-memory page cache, so adding replicas increases aggregate read capacity for concurrent workloads. Replicas do not increase write throughput, however, because every mutation still flows through the single primary.
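To make the read/write split concrete, here is a minimal in-memory sketch of how traffic is divided between the primary and the replicas. The `NeptuneClusterModel` class and the instance names are hypothetical and are not part of any Neptune API. In a real deployment you would usually send writes to the cluster endpoint, which always points at the primary, and reads to the reader endpoint, which spreads connections across the replicas (and falls back to the primary when the cluster has no replicas). This model routes to instance names directly so the distribution is visible.

```python
from dataclasses import dataclass, field
from itertools import cycle

MAX_READ_REPLICAS = 15  # Neptune allows up to fifteen read replicas per cluster


@dataclass
class NeptuneClusterModel:
    """Hypothetical model of a cluster: one primary plus zero or more read replicas."""
    primary: str
    replicas: list = field(default_factory=list)

    def __post_init__(self):
        if len(self.replicas) > MAX_READ_REPLICAS:
            raise ValueError(f"at most {MAX_READ_REPLICAS} read replicas are allowed")
        # Round-robin over replicas for reads; with no replicas, reads go to the primary
        self._read_targets = cycle(self.replicas or [self.primary])

    def route(self, is_write: bool) -> str:
        """Return the instance that should serve a request."""
        if is_write:
            # Every mutation goes to the single primary, no matter how many replicas exist
            return self.primary
        # Reads rotate across the replicas
        return next(self._read_targets)


cluster = NeptuneClusterModel("writer-1", ["reader-1", "reader-2"])
print(cluster.route(is_write=True))   # writer-1
print(cluster.route(is_write=False))  # reader-1
print(cluster.route(is_write=False))  # reader-2
print(cluster.route(is_write=False))  # reader-1
```

Adding a third replica would spread reads over three targets, but `route(is_write=True)` would still return the same single primary, which is why replicas scale reads and not writes.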
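Instance classes determine the CPU and memory available to each instance. You can choose different instance classes for the primary and the replicas, which lets you reduce cost by running lighter traversal workloads on smaller replicas while the primary handles heavier mutation processing. Keep in mind that a replica can be promoted to primary during failover, so a replica much smaller than the primary may not keep up with the write load after promotion.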
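As a sketch of mixing instance classes, the hypothetical helper below assembles keyword arguments for the boto3 Neptune client's `create_db_instance` call. The cluster and instance identifiers and the `db.r5.*` classes are placeholders, and the parameter names (`DBClusterIdentifier`, `DBInstanceIdentifier`, `DBInstanceClass`, `Engine`, `PromotionTier`) should be checked against the current boto3 and Neptune API documentation before use. The function only builds a dictionary, so it runs without AWS credentials.

```python
def replica_request(cluster_id: str, instance_id: str, instance_class: str,
                    promotion_tier: int = 1) -> dict:
    """Build kwargs for adding a read replica to an existing Neptune cluster (hypothetical helper)."""
    # Promotion tiers range from 0 (highest failover priority) to 15 (lowest)
    if not 0 <= promotion_tier <= 15:
        raise ValueError("promotion tier must be between 0 and 15")
    return {
        "DBClusterIdentifier": cluster_id,   # the replica attaches to this cluster's shared storage
        "DBInstanceIdentifier": instance_id,
        "DBInstanceClass": instance_class,   # may differ from the primary's class
        "Engine": "neptune",
        "PromotionTier": promotion_tier,     # lower tiers are preferred when promoting on failover
    }


# A replica sized like the primary, placed first in line for failover
failover_target = replica_request("graph-prod", "graph-prod-r1", "db.r5.2xlarge", promotion_tier=0)
# A smaller, cheaper replica for read traffic, placed last in line for failover
read_helper = replica_request("graph-prod", "graph-prod-r2", "db.r5.large", promotion_tier=15)

# With credentials configured, you could then submit a request, for example:
# import boto3
# boto3.client("neptune").create_db_instance(**read_helper)
```

The design choice here is to pair each smaller replica with a low-priority promotion tier, so that failover prefers a replica sized like the primary and the cost savings on read capacity do not weaken write performance after a failure.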
Storage layer
Neptune's storage is a distributed, fault-tolerant volume that replicates data six ways across three Availability Zones automatically. Storage scales from 10 GB up to 128 TB without manual provisioning or capacity planning. Because storage is decoupled from compute, you can ...