AWS Solutions Architect Associate (SAA) 2018 – IV

Topics covered:

What RDS is used for?

RDS is for OLTP which means  online transaction processing process.
  • SQL
  • MYSQL
  • PostgreSQL
  • Oracle
  • Aurora
  • MariaDB
 
DynamoDB – NO SQL
RedShift  – OLAP (Online Analytic Processing)

RDS

RDS types of storage

General Purpose SSD – General Purpose SSD, also called gp2, volumes offer cost-effective storage that is ideal for a broad range of workloads. These volumes deliver single-digit millisecond latencies and the ability to burst to 3,000 IOPS for extended periods of time. Baseline performance for these volumes is determined by the volume’s size.
 

Baseline I/O performance for General Purpose SSD storage is 3 IOPS for each GiB. This relationship means that larger volumes have better performance. For example, baseline performance for a 100-GiB volume is 300 IOPS. Baseline performance for a 1-TiB volume is 3,000 IOPS. And baseline performance for a 5.34-TiB volume is 16,000 IOPS.

Volumes below 1 TiB in size also have ability to burst to 3,000 IOPS for extended periods of time. Burst is not relevant for volumes above 1 TiB. Instance I/O credit balance determines burst performance. For more information about instance I/O credits see, I/O Credits and Burst Performance.

Many workloads never deplete the burst balance, making General Purpose SSD an ideal storage choice for many workloads. However, some workloads can exhaust the 3000 IOPS burst storage credit balance, so you should plan your storage capacity to meet the needs of your workloads.

What happens when we modify the storage

https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ModifyInstance.MySQL.html

RDS maintenance

Auto Minor Version Upgrade is a feature that you can enable to have your database automatically upgraded when a new minor database engine version is available.

 Upgrading RDS

There are two kinds of upgrades: major version upgrades and minor version upgrades. In general, a major engine version upgrade can introduce changes that are not compatible with existing applications. In contrast, a minor version upgrade includes only changes that are backward-compatible with existing applications.

The version numbering sequence is specific to each database engine. For example, Amazon RDS MySQL 5.7 and 8.0 are major engine versions and upgrading from any 5.7 version to any 8.0 version is a major version upgrade. Amazon RDS MySQL version 5.7.22 and 5.7.23 are minor versions and upgrading from 5.7.22 to 5.7.23 is a minor version upgrade.

For major version upgrades, you must manually modify the DB engine version through the AWS Management Console, AWS CLI, or RDS API. For minor version upgrades, you can manually modify the engine version, or you can choose to enable auto minor version upgrades.

A major version upgrade can take longer than a minor version upgrade. You can also run into compatibility issues during or after a major version upgrade, which are very unlikely after a minor version upgrade.

For both minor and major version upgrades, the system takes the following steps during the upgrade process for MySQL and MariaDB for Amazon RDS:

  1. A snapshot is taken (if backups are enabled) while the instance is still running the previous version.
  2. The instance shuts down in a special mode called a slow shutdown to ensure data consistency.
  3. The instance is restarted with mysqld running the new engine version with networking disabled to prevent remote connections.
  4. The mysql_upgrade program runs on the database to convert system and user tables to the new version.
  5. An RDS-specific upgrade script runs to upgrade RDS-managed tables and stored procedures.
  6. The instance is restarted with networking enabled to allow for remote connections.

The main difference between the process for a minor version upgrade and a major version upgrade is the use of the mysql_upgrade program. This program checks all tables (both system and user) on the database. Depending on the state of a table, the program might need to physically recreate the table to conform to the new version. In most cases, no physical changes are needed and mysql_upgrade merely updates metadata in the table to indicate that the table has been upgraded.

In the case of a minor version upgrade, these table checks are usually trivial. The entire process usually takes only a few seconds to a few minutes depending on the number of tables. However, a major version upgrade can require more extensive changes to tables, as discussed later.

MultiAZ fallacy about upgrades

One other caveat about upgrade downtime is how Multi-AZ fits into the picture. One common fallacy is that Multi-AZ configurations prevents downtime during an upgrade. We do recommend that you use Multi-AZ for high availability, because it can prevent extended downtime due to hardware failure or a network outage. However, in the case of a MySQL or MariaDB engine upgrade, Multi-AZ doesn’t eliminate downtime. The slow shutdown and the physical changes made on the active server by the mysql_upgrade program require this downtime.

Performance Insights

https://www.youtube.com/watch?v=4462hcfkApM

RDS Backups

There are two different types of Backups of AWS. Automated Backups and Database snapshots.
 

Automated backups

Automated backups allow you to recover your database to any point in time within a “retention period”. The retention period can be between one and 35 days. Automated backups will take a full daily snapshot and will also store transaction logs throughout the day. When you do a recovery, AWS will first choose the most recent daily backup, and then apply transaction logs relevant to that day. This allows you to do a point in time recovery down to a second, within the retention period.
 
Automated backups are enabled by default. You get free storage equal the size of your database. Backups are taken within a defined window. During the backup window, storage I/O maybe be suspended while your data is being backed up and you may experience elevated latency.
 

Snapshots

Are done manually, they are stored even after you delete the original RDS, unlike automated backups.
Whenever you restore, snapshot or backup, the restored version will be a new RDS with a new endpoint.

Point in time recovery

https://stackoverflow.com/questions/39053777/point-in-time-data-restore-on-amazon-rds

Backups and MultiAZ

IO suspension doesn’t happen for Multi-AZ. From AWS RDS user guide “During the automatic backup window, storage I/O might be briefly suspended while the backup process initializes (typically under a few seconds) and you might experience a brief period of elevated latency. No I/O suspension occurs for Multi-AZ DB deployments because the backup is taken from the standby.”

Enhanced Monitoring

la Enhanced Monitoring no supone un reinicio. Tiene un coste extra. No tiene relación con habilitar el performance schema.

Performance Schema

El performance schema si requiere reinicio. Si tenemos un parche pendiente de aplicar, y queremos por ejemplo habilitar el PE, podemos hacer el cambio en el parameterGroup (0 a 1) y luego aplicar el parche y hacemos las dos acciones en un solo reinicio.

Best practices

https://aws.amazon.com/blogs/database/best-practices-for-upgrading-amazon-rds-for-mysql-and-amazon-rds-for-mariadb/

Max Connections in SQL SERVER

There is a parameter “user connections” in SQL server RDS
This parameter defines Maximum number of simultaneous user connections. Please note that the service may use up to 40 connections for system maintenance.
You can change this parameter as per your choice.
 
Max simultaneous user connections is not related, with the profile of the RDS instance as it happens with MySQL RDS. If we want to handle up to 10K connections we just need to change parameter
Ex:
in MySQl default value of max connection is set based on instance memory [DBInstanceClassMemory/12582880]
But in MySQl also you can change the max connections to custom value.
in SQLl server the default value for max connection is 0 (which means no limit).
You can change it custom value
 

Encryption

https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.Encryption.html

https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL.html

Amazon RDS encrypted DB instances provide an additional layer of data protection by securing your data from unauthorized access to the underlying storage. You can use Amazon RDS encryption to increase data protection of your applications deployed in the cloud, and to fulfill compliance requirements for data-at-rest encryption.

You can encrypt your Amazon RDS DB instances and snapshots at rest by enabling the encryption option for your Amazon RDS DB instances. Data that is encrypted at rest includes the underlying storage for DB instances, its automated backups, Read Replicas, and snapshots.

Amazon RDS encrypted DB instances use the industry standard AES-256 encryption algorithm to encrypt your data on the server that hosts your Amazon RDS DB instances. After your data is encrypted, Amazon RDS handles authentication of access and decryption of your data transparently with a minimal impact on performance. You don’t need to modify your database client applications to use encryption.

Amazon RDS also supports encrypting an Oracle or SQL Server DB instance with Transparent Data Encryption (TDE). 

TDE automatically encrypts data before it is written to storage, and automatically decrypts data when the data is read from storage.

TDE can be used with encryption at rest, although using TDE and encryption at rest simultaneously might slightly affect the performance of your database. You must manage different keys for each encryption method.

Amazon RDS supports TDE for the following SQL Server versions and editions:

– SQL Server 2017 Enterprise Edition

– SQL Server 2016 Enterprise Edition

– SQL Server 2014 Enterprise Edition

– SQL Server 2012 Enterprise Edition

– SQL Server 2008 R2 Enterprise Edition

Limitations

  • You can only enable encryption for an Amazon RDS DB instance when you create it, not after the DB instance is created.However, because you can encrypt a copy of an unencrypted DB snapshot, you can effectively add encryption to an unencrypted DB instance. That is, you can create a snapshot of your DB instance, and then create an encrypted copy of that snapshot. You can then restore a DB instance from the encrypted snapshot, and thus you have an encrypted copy of your original DB instance. For more information, see Copying a Snapshot.
  • DB instances that are encrypted can’t be modified to disable encryption.
  • You can’t have an encrypted Read Replica of an unencrypted DB instance or an unencrypted Read Replica of an encrypted DB instance.
  • Encrypted Read Replicas must be encrypted with the same key as the source DB instance.
  • You can’t restore an unencrypted backup or snapshot to an encrypted DB instance.
  • To copy an encrypted snapshot from one AWS Region to another, you must specify the KMS key identifier of the destination AWS Region. This is because KMS encryption keys are specific to the AWS Region that they are created in.The source snapshot remains encrypted throughout the copy process. AWS Key Management Service uses envelope encryption to protect data during the copy process. For more information about envelope encryption, see Envelope Encryption.

RDS IAM DB Authentication

IAM database authentication provides the following benefits:

  • Network traffic to and from the database is encrypted using Secure Sockets Layer (SSL).
  • You can use IAM to centrally manage access to your database resources, instead of managing access individually on each DB instance.
  • For applications running on Amazon EC2, you can use profile credentials specific to your EC2 instance to access your database instead of a password, for greater security

https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.IAMDBAuth.html

RDS UPDATES

UPDATE 2019

  • New Script migration tool from Windows to Linux. You can use it in any virtual machine hosted in AWS or located in your datacenter as long as SQL SERVER is 2008 and above, and import to an EC2 instance running SQL SERVER 2016.
  • Import S3 Data into PostgreSQL: csv,text or binary supported from copy command of PostgreSQL
  • Storage AutoScaling:
    • automatically scales storage capacity with 0 downtimes. Applies to Mysql, PostgreSQL, Oracle, MariaDB. All you have to do is set up maximum storage limit and AutoScaling will monitor storage consumption and scale when required.
    • Can be enabled to existing or new RDS with no additional cost.
  • RDS Now Available on VMWARE
UPDATE 2018
  • Deletion protection, for Aurora, RDS.
  • Publish RDS PostgreSQL logs to CloudWatch
  • Always ON
    • now offers Always On Availability Groups for the Multi-AZ configuration in all AWS Regions. This new high availability option meets the requirements of enterprise-grade production workloads on SQL Server.
    • https://aws.amazon.com/about-aws/whats-new/2018/11/amazon-rds-for-sql-server-now-supports-alwayson-availability-groups/

Redshift

Redshift is optimized for batched write operations and reading high volumes of data to minimize I/O and maximize data throughput
 
  • Data Warehouse
  • OLAP transaction
  • Single node (160GB)
    • Multi-node
    • Leader node, manages client connections
    • Computer node, up to 128 compute nodes
  • Columnar Data Storage: instead using rows, AWS Redshift organizes the data by column.
  • Advanced Compressión: can be compressed much more than row-based because similar data is stored sequentially on disk. Dont require indexes or views, uses less space than traditional relational database systems. When loading data in a table, AWS Redshift automatically samples your data and selects most appropriate compressions scheme.
  • Massively Parallel Processing: distributes data and query loads across all nodes.
 

Pricing

    Compute Node Hours (total number of hours you can run across all your nodes. Billed for 1 unit per node per hour. No charged for leader node hours, just compute)
    Backup
    Data Transfer(only within a VPC)
 
  • Encrypted in transit using SSL
  • Encrypted at rest using AES-256
By default takes care of key management
    your own keys through HSM
    AWS Key Management Service
1 Zone availability
 Can restore snapshots to news AZs in the event of an outage.
 
 

Cross Region Automatic Snapshots

You can configure Amazon Redshift to copy snapshots for a cluster to another region. To configure cross-region snapshot copy, you need to enable this copy feature for each cluster and configure where to copy snapshots and how long to keep copied automated snapshots in the destination region. When cross-region copy is enabled for a cluster, all new manual and automatic snapshots are copied to the specified region. 

VPC Enhanced Routing

When you use Amazon Redshift Enhanced VPC Routing, Amazon Redshift forces all COPY and UNLOAD traffic between your cluster and your data repositories through your Amazon VPC. By using Enhanced VPC Routing, you can use standard VPC features, such as VPC security groups, network access control lists (ACLs), VPC endpoints, VPC endpoint policies, internet gateways, and Domain Name System (DNS) servers. Hence, enabling Enhanced VPC routing on your Amazon Redshift cluster is the correct answer.

You use these features to tightly manage the flow of data between your Amazon Redshift cluster and other resources. When you use Enhanced VPC Routing to route traffic through your VPC, you can also use VPC flow logs to monitor COPY and UNLOAD traffic. If Enhanced VPC Routing is not enabled, Amazon Redshift routes traffic through the Internet, including traffic to other services within the AWS network.

 
 
 

UPDATES

2019

  • automatic workload management for Redshift. Now makes it easy to maximize query throughput and get consistent performance for your most demanding analytics workloads. Automatic workload management (WLM) uses machine learning to dynamically manage memory and concurrency helping maximize query throughput. In addition, you can now easily set the priority of your most important queries, even when hundreds of queries are being submitted.

Aurora

  • 5x better performance  than mysql
  • 10 GB of allocated space at start, 10 GB increments up to 64 TB (autoscaling storage!!!!!!!)
  • can scale up to 32vCPUS 244GB of RAM, if you scale there is a maintenance window, couple of minutes
  • 2 copies of your db in each of availability zone, with a minimum of a 3 availability zone. 6 copies of your data.
  • designed to handle the loss of up to two copies of data without affecting database write availability and up to three copies of without affecting read availability.
  • storage is self-healing.

Replicas

2 types of replicas are available
    Aurora Replicas, up to 15
        basically separate database from your original database
    MySQL Read Replicas, up to 5 

 

 

Failover

In case of problem with master aurora,  you’ll failover automatically to your Aurora Replica.
 
Failover is automatically handled by Amazon Aurora so that your applications can resume database operations as quickly as possible without manual administrative intervention.
 
If you have an Amazon Aurora Replica in the same or a different Availability Zone, when failing over, Amazon Aurora flips the canonical name record (CNAME) for your DB Instance to point at the healthy replica, which in turn is promoted to become the new primary. Start-to-finish, failover typically completes within 30 seconds.
 
If you do not have an Amazon Aurora Replica (i.e. single instance), Aurora will first attempt to create a new DB Instance in the same Availability Zone as the original instance. If unable to do so, Aurora will attempt to create a new DB Instance in a different Availability Zone. From start to finish, failover typically completes in under 15 minutes.
 
Aurora relays in a virtual machine located in a one availability zone, for example eu-west-1c, if something fails you will have to failover to some Aurora replica.
 
The cluster DNS failover  of the cluster endpoint is automatic.

Backup retention

from 1 to 35 days
 

Custom Endpoints

Imagine scenario where there is a requirement to optimize your database workloads in your cluster where you have to direct the write operations of the production traffic to your high-capacity instances and point the reporting queries sent by your internal staff to the low-capacity instances. How do you achieve that? With custom endpoints.

The custom endpoint provides load-balanced database connections based on criteria other than the read-only or read-write capability of the DB instances. For example, you might define a custom endpoint to connect to instances that use a particular AWS instance class or a particular DB parameter group. Then you might tell particular groups of users about this custom endpoint. For example, you might direct internal users to low-capacity instances for report generation or ad hoc (one-time) querying, and direct production traffic to high-capacity instances
 

 

RDS TIPS

 
RDS – OLTP
    SQL
    MYSQL
    POSTGRESL
    ORACLE
    MARIA
    AURORA
 
NO SQL
    DYNAMO
 
RedShift – OLAP
 
Elastic
    Redis
    Memcached
 
Limit RDS storage
 
Aurora 64 TiB
MySQL 16 TB
MariaDB 16 TB
 
Maximum Storage RDS VOLUME
SG RDS port protocol number
Maximum IOPS capacity RDS instance is 30K IOPS

DynamoDB

Mirar el siguiente post:

AWS DEVELOPER 2019

 

 

DMS (Data Migration Service)

WS Database Migration Service can migrate your data to and from most of the widely used commercial and open source databases. It supports homogeneous migrations such as Oracle to Oracle, as well as heterogeneous migrations between different database platforms, such as Oracle to Amazon Aurora. Migrations can be from on-premises databases to Amazon RDS or Amazon EC2, databases running on EC2 to RDS, or vice versa, as well as from one RDS database to another RDS database. It can also move data between SQL, NoSQL, and text based targets
 
In heterogeneous database migrations the source and target databases engines are different, like in the case of Oracle to Amazon Aurora, Oracle to PostgreSQL, or Microsoft SQL Server to MySQL migrations. In this case, the schema structure, data types, and database code of source and target databases can be quite different, requiring a schema and code transformation before the data migration starts. That makes heterogeneous migrations a two step process
 
First use the AWS Schema Conversion Tool to convert the source schema and code to match that of the target database, and then use the AWS Database Migration Service to migrate data from the source database to the target database. All the required data type conversions will automatically be done by the AWS Database Migration Service during the migration. The source database can be located in your own premises outside of AWS, running on an Amazon EC2 instance, or it can be an Amazon RDS database. The target can be a database in Amazon EC2 or Amazon RDS.
 
  • Allows to migrate data from
    • on premises
    • EC2
    • supports
      • Oracle
      • Microsoft SQL
      • MySQL
      • MariaDB
      • PostgreSQL
      • Sap
      • MongoDB
      • Db2
    • from this sources to
      • on premises and EC2 instances databases
      • RDS
      • Redshift
      • DynamoDB
      • S3
      • ElasticCache
      • Kinesis
      • DocumentDB
  • You can do migrations of type
    • homogeneus
    • heterogeneus
      • if you dou heterogeneus migration you will need SCT (Schema Conversion Tool)

DocumentDB

 
Características:
 
 * Amazon indica que ese servicio soporta la API de Mongo 3.6(ahora la ultima “stable” de MongoDB es la 4.0). https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html 
 * Tengo dudas que Amazon anadira el soporte de las nuevas versiones de MongoDB en relación con el cambio de licensing de MongoDB al principio de diciembre.(explico esto después)
 * La instancia mas pequeña que se puede elegir es la db.r4.large
 * el sistema de backup permite el “point-in-time-recovery” al segundo. Significa que se puede restaurar la base de datos de MongoDB a un momento definido
 * Soporta “encyrption at rest” y connection SSL sin coste adicional
 * Necesita el driver version >= 3.4
 
 
 
MongoDB new licensing(SSPL):
 
Al principio del mes de Deciembre, MongoDB cambio el licensing de su producto para obligar todos los provedores que venden un servicio con el soporte de la API de MongoDB a abrir sus codigos.
Esto no se aplica para la versiones anterior a la 4. 
 
Amazon no es una empreza que le gusta abrir/compartir su codigo. Entonces esto significa que pueden vender un servicio con el soporte de MongoDB hasta la version 3.6 del API pero no con las nuevas versiones.
 
No sé si esto va a cambiar en el futuro. Depende de Amazon.
 
 
Comparacion:
 
 
MongoDB: Configuracion actual Rental(1 ReplicaSet – 4 instances)
 
 
Instance Type:        m4.large  (8GB Ram, 2vCPU)  -> 0.111$/hour        = 79.92$/month/instance
Disks:                      150GB GP2                              -> 0.11$/GB/month = 16.5$/month/instance 
 
Total:
 
319.68$/month para las 4 instancias + 66$/month para el storage = 385.68$/month (esto no inclue el coste S3 de los backup)
 
DocumentDB(MongoDB): 
 
El precio de ese servicio, depende del tipo de instancia, del storage y de los I/O.
 
Instance Type:    db.r4.large(15.25GB Ram, 2vCPU) -> 0.306$/hour                                     -> 220.32$/month/instance
Storage:              0.11$/GB/month                             -> 10GB(tamano de vuestro dataset)  -> 10*0.11 = 1.1$/month
I/O:                      0.22$/1 million requests/month     -> 10*                                                   -> 10*0.22  = 2.2$/month  
 
 
*es complicado de hacer una estimación para los I/O. Ademas Amazon dice que hizo optimización a este nivel para reducirlos(no siempre se necesita acceder al disco para leer or escribir datos)
 
 
Para guardar la alta disponibilidad, se necesitaria a menos 2 instancias(1 Primary y 1 Secondary). Como estas instancias tienen doble recursos en comparación a las vuestras y con las optimizaciones de ese servicio, me parece suficiente 2(No hace falta dedicar un Secondary para hacer los backups).
 
Total:
 
440.64$/month para las 2 instancias + 1.1$/month para el storage + 2.2$/month de I/O  = 443.94$/month (el backup esta incluido en el precio cuando se hace solo 1 backup cada dia)
 
 
PRO:
 
 – Servicio gestionado por Amazon con alta disponibilidad y auto recuperación/creación de los nodos si ocurre un problema.
 – de lo que he visto/leído el rendimiento es muy potente
 – Para añadir un nuevo Secondary se hace en menos de 5minutos.
 – backup automático con “point-in-time restore” al segundo. Significa que se puede restaurar la base de datos de MongoDB a un momento especifico.
 – soporta la integración con VPC entonces se puede guardarlo privado como ahora.
 – se puede importar/exportar los backups como si fuese un MongoDB “standard”
 
 
CON:
 
 – La instancia mas pequeña que se puede elegir es la db.r4.large(precio alto)
 – Dudas sobre su evolución para añadir soporte de las nuevas versiones del API de MongoDB
 – Coste adicional 60$/month 

Leave a Reply

Your email address will not be published. Required fields are marked *