Hands-on labs and tutorials for Neptune are fewer and further between than those for other AWS services, so I thought I’d have a go at writing one.

Useful resources:

Neptune is a graph database built to handle billions of relationships and to query them with millisecond latency. It runs on a cluster model in which you have one primary (writer) instance and up to 15 read replicas*, whose failover order is governed by a configured “pecking order”. A cluster can be accessed via the usual tools:

  • AWS Console in browser (steps listed below)
  • scripts
  • CLI

  1. Log in at aws.amazon.com and select a Region.
  2. Pick Neptune from the Services dropdown, then Databases from the left sidebar/drawer.
  3. A database is a cluster of a writer and readers; these are shown in a nested hierarchical view with useful metrics including status and CPU.
  4. Using the orange Create Database button, explore the basic config, including Engine Version (Engine Releases documented here), DB Cluster Identifier (unique within your AWS account and Region), Templates, DB Instance Size (resembling the classic EC2 instance sizes, such as db.t3.medium at roughly 10p per hour), Availability & Durability (including AZ preference), and Connectivity (I’ve yet to need an option other than “Default VPC”, but your security boundary requirements may be more advanced).
  5. Additional Configuration is available, including DB Instance Identifier, DB Cluster Parameter Group, DB Parameter Group, IAM DB Authentication, Failover Priority, backup and encryption options (retention periods of 1–35 days, and key management) and audit logging (e.g. to CloudWatch).
  6. Maintenance, the maintenance window, and deletion protection are features towards the end of Additional Configuration. You may find them particularly useful, but they require a detailed understanding to configure safely and cost-effectively, so please refer to the docs before using them.
  7. Whilst waiting for creation to complete (Status will change from Creating to Available), create a notebook instance for analysis of the cluster: select Notebooks from the left sidebar/drawer, hit Create Notebook, and select an instance type (as above, the default, ml.t3.medium, has served me well). These are of course SageMaker notebooks behind the scenes (so you access/open them via the SageMaker service selected from the Services dropdown); they are powered by Jupyter and are sometimes referred to as “workbenches”.
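
For the "scripts" route, the console steps above can be roughly sketched with boto3, the AWS SDK for Python. This is a minimal sketch rather than a recommended setup: the identifiers, the eu-west-1 Region and the chosen option values are my own assumptions for illustration, and boto3 plus AWS credentials are needed for the final function.

```python
import time


def cluster_params(cluster_id: str, retention_days: int = 7) -> dict:
    """Build keyword arguments for the Neptune create_db_cluster call."""
    if not 1 <= retention_days <= 35:
        raise ValueError("backup retention must be between 1 and 35 days")
    return {
        "DBClusterIdentifier": cluster_id,
        "Engine": "neptune",
        "BackupRetentionPeriod": retention_days,
        "StorageEncrypted": True,
        "EnableIAMDatabaseAuthentication": False,
        "EnableCloudwatchLogsExports": ["audit"],  # audit logs to CloudWatch
        "DeletionProtection": True,
    }


def instance_params(cluster_id: str, instance_id: str,
                    instance_class: str = "db.t3.medium",
                    failover_tier: int = 1) -> dict:
    """Build keyword arguments for the Neptune create_db_instance call."""
    if not 0 <= failover_tier <= 15:
        raise ValueError("failover priority must be between 0 and 15")
    return {
        "DBInstanceIdentifier": instance_id,
        "DBInstanceClass": instance_class,
        "Engine": "neptune",
        "DBClusterIdentifier": cluster_id,
        "PromotionTier": failover_tier,  # the "pecking order" position
    }


def wait_until_available(client, cluster_id: str,
                         poll_seconds: float = 30, max_polls: int = 60) -> bool:
    """Poll the cluster status until it reads 'available' or polls run out."""
    for _ in range(max_polls):
        response = client.describe_db_clusters(DBClusterIdentifier=cluster_id)
        if response["DBClusters"][0]["Status"] == "available":
            return True
        time.sleep(poll_seconds)
    return False


def create_cluster(cluster_id: str, instance_id: str,
                   region: str = "eu-west-1") -> bool:
    """Create a cluster with one instance. The Region is an assumption."""
    import boto3  # imported here so the helpers above work without it

    client = boto3.client("neptune", region_name=region)
    client.create_db_cluster(**cluster_params(cluster_id))
    client.create_db_instance(**instance_params(cluster_id, instance_id))
    return wait_until_available(client, cluster_id)
```

Calling `create_cluster("demo-cluster", "demo-instance-1")` (both names hypothetical) would start billing for real resources, so treat it with the same care as the orange button.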

Every cluster has a default parameter group. One of its parameters is neptune_lab_mode, used for enabling experimental features which are not recommended for use in Production; these have included e.g. Neptune Streams… for the full list/pipeline please visit the link at the top of this article. Another is neptune_query_timeout, which I have found useful in projects to date.
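
As a rough illustration of setting neptune_query_timeout from a script, here is a minimal boto3 sketch. It assumes a custom cluster parameter group already exists (default groups cannot be modified); the group name, the Region and the 60-second value are hypothetical. The timeout is given in milliseconds.

```python
def timeout_change(group_name: str, timeout_ms: int) -> dict:
    """Build arguments for modify_db_cluster_parameter_group."""
    if timeout_ms <= 0:
        raise ValueError("timeout must be a positive number of milliseconds")
    return {
        "DBClusterParameterGroupName": group_name,
        "Parameters": [
            {
                "ParameterName": "neptune_query_timeout",
                "ParameterValue": str(timeout_ms),  # the API expects strings
                "ApplyMethod": "pending-reboot",  # takes effect after a reboot
            }
        ],
    }


def apply_change(change: dict, region: str = "eu-west-1"):
    """Send the change to AWS. The Region is an assumption."""
    import boto3  # imported here so timeout_change works without it

    client = boto3.client("neptune", region_name=region)
    return client.modify_db_cluster_parameter_group(**change)
```

For example, `apply_change(timeout_change("my-neptune-params", 60000))` would request a 60-second query timeout on the hypothetical group my-neptune-params.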

*The limit of 15 is found elsewhere in AWS too: Aurora (another database service) also maxes out at 15 “Aurora Replicas”. I expect this to rise in 2021 and will link to product announcements as they are published.

DataCat @AstraZeneca. Bass/BV @StornowayBand. Fan @lowislandmusic. CTO @Tigmuso. Voc/Gtr @DrachmaBand. DataEng @ICISOfficial. DataSys @GBioinf. DataSci @GA.