Amazon Redshift Architecture

Oli Steadman
2 min read · Dec 16, 2020

New users may already be familiar with SQL clients and/or BI tools; conceptually Redshift simply gives these tools (via JDBC/ODBC) a Leader node to which they connect, which in turn orchestrates a set of Compute nodes. Redshift performs Load/Unload/Backup/Restore to and from S3 as its (exabyte-scale) data source.
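Once connected through the Leader node, you can see the Compute layer for yourself. A minimal sketch (STV_SLICES is a standard Redshift system view; no other assumptions):

-- Slices are the units of parallelism within each Compute node;
-- this shows how many slices each node contributes.
SELECT node, COUNT(slice) AS slices
FROM stv_slices
GROUP BY node
ORDER BY node;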

Redshift Spectrum goes further, supplying an elastic fleet of Compute nodes that query S3 in any of the open formats (including those listed below under “Data loading via COPY command”).
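In practice that means registering an external schema and table, then querying S3 in place. A sketch, assuming a Glue Data Catalog database plus placeholder bucket and role names:

CREATE EXTERNAL SCHEMA spectrum
FROM DATA CATALOG
DATABASE 'spectrum_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

CREATE EXTERNAL TABLE spectrum.sales (
  sale_id   integer,
  amount    decimal(8,2),
  sale_date date
)
STORED AS PARQUET
LOCATION 's3://example-bucket/sales/';

-- The Spectrum fleet does the S3 scan; only the aggregation lands on the cluster.
SELECT sale_date, SUM(amount) FROM spectrum.sales GROUP BY sale_date;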

Worker threads are one of the many helpful features operating behind the scenes; they automatically collect table statistics, which power the ANALYZE command (one manual tool a Redshift user can reach for when performing diagnostics ad hoc; another is arguably the EXPLAIN command). In the same vein, Amazon Redshift Advisor offers specific, customised recommendations to improve performance and decrease operating costs on the cluster. Advisor bases its recommendations on ML-powered observations of those performance statistics.
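By way of example (table and query here are placeholders), refreshing statistics and then asking the optimiser to show its working looks like:

-- Refresh optimiser statistics for one table; Redshift's worker
-- threads also do this automatically in the background.
ANALYZE sales;

-- Inspect the plan the optimiser builds from those statistics.
EXPLAIN
SELECT customer_id, SUM(amount)
FROM sales
GROUP BY customer_id;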

Popular models used in Redshift

  • Star (sketched just after this list)
  • Highly denormalised
  • Snowflake
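A minimal star schema sketch (table and column names are hypothetical; note PRIMARY KEY and REFERENCES are informational only in Redshift, used by the planner rather than enforced):

-- Small dimension: DISTSTYLE ALL copies it to every Compute node.
CREATE TABLE dim_customer (
  customer_id integer PRIMARY KEY,
  name        varchar(100)
) DISTSTYLE ALL;

-- Central fact table: DISTKEY spreads rows by customer for co-located
-- joins; SORTKEY keeps date-range scans cheap.
CREATE TABLE fact_sales (
  sale_id     bigint,
  customer_id integer REFERENCES dim_customer (customer_id),
  amount      decimal(8,2),
  sale_date   date SORTKEY
) DISTKEY (customer_id);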

Data types available in Redshift

These adhere to standard SQL data types but always handy to have them printed out for ref (a quick sketch follows the list):

  • Numeric: smallint, integer, bigint, decimal, real, double precision
  • Text: char, varchar
  • Datetime: date, timestamp, timestamptz
  • Other: geometry (brief but insightful example here), boolean
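Exercising most of these in one table definition (names are arbitrary):

CREATE TABLE type_demo (
  id         bigint,
  qty        smallint,
  price      decimal(10,2),
  ratio      double precision,
  code       char(3),
  label      varchar(50),
  active     boolean,
  location   geometry,
  created_on date,
  created_at timestamptz
);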

Data loading via COPY command

Impressive versatility here with the ability to reference specific buckets across accounts… and it adheres to the command syntax already familiar to users of DynamoDB, EMR, SSH, etc:

COPY <table> FROM '<location>'
CREDENTIALS 'aws_access_key_id=<access_key>;aws_secret_access_key=<secret_key>'
REGION '<region>';

-- or, preferably, with an IAM role in place of static keys:
COPY <table> FROM '<location>'
IAM_ROLE '<arn>'
REGION '<region>';

This can be further enhanced with options including (see the sketch after this list):

  • specific formats incl CSV, TXT, JSON, ORC, Parquet, Avro
  • compression details
  • encryption details
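Pulling those together, a hedged sketch (bucket, table, and role names are placeholders): first a gzipped CSV load, then a Parquet load where the columnar format carries its own structure; for client-side-encrypted S3 objects there is additionally an ENCRYPTED option.

-- Gzipped CSV with a header row:
COPY sales
FROM 's3://example-bucket/sales_csv/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV
IGNOREHEADER 1
GZIP
REGION 'eu-west-1';

-- Columnar Parquet needs far fewer options:
COPY sales
FROM 's3://example-bucket/sales_parquet/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS PARQUET;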
