Database Selection

Database Types:

Relational Databases
- Based on the relational model
  - Organizes data in tables with rows and columns
- Standard choice due to:
  - Robust consistency
  - Support for complex queries
  - Adherence to ACID properties
    - Guaranteeing data consistency and integrity
  - Enforce consistent structure
    - It facilitates the management and maintenance of data
  - Referential integrity
    - Relationships between tables
    - Ensures efficient querying of related data
    - Supports complex data relationships
  - SQL Support
  - Support transactions
    - Actions either succeed or fail as a whole
  - Indexing and optimization
    - It offers various indexing techniques and query optimization strategies
    - Improves query performance and reduce resource consumption
- Drawbacks:
  - Limited scalability
    - Scaling relational databases horizontally can be challenging
  - Rigidity
    - The predefined schema in relational databases can make it difficult to adapt to changing requirements
  - Performance issues with large datasets
    - Performance issues when dealing with complex queries and large-scale data manipulations
  - Inefficient for unstructured or semi-structured data

Database	Performance	Scalability	Ease of Use	Support for complex queries	License	Community Support
MySQL	High	Medium	High	High	Open Source	Strong
PostgreSQL	High	High	Medium	Very High	Open Source	Strong
MS SQL Server	High	High	Medium	Very High	Proprietary	Strong
Oracle	Very High	Very High	Low	Very High	Proprietary	Moderate

NoSQL
- Developed as a response to the limitations of relational databases, particularly in terms of:
  - Scalability
  - Flexibility
  - Performance under certain conditions
- Do not strictly follow the relational data model or traditional table-based storage
- Store data in various formats
- Types:
  - Document-based databases
    - Store data in semi-structured documents such as JSON or BSON
    - It provides great flexibility in data modeling
    - It allows more dynamic schemas
    - Well-suited for applications that deal with hierarchical or nested data structures
    - E.g.
      - Mongo DB
      - Couchbase
  - Column-based – Wide-column stores, Column-family stores
    - Organizes data in columns rather than rows
    - Optimizes column-based queries providing:
      - Improved compression
      - Better read performance
    - Designed for applications that need to store and query large amounts of data across many nodes
    - Popular choice for:
      - Big Data and Analytics applications
      - Applications with high write and read workloads
    - E.g.
      - Apache Cassandra
      - HBase
  - Key-Value stores
    - Store data as key-value pairs
    - Ideal for:
      - High-speed reads and writes
      - Horizontal scalability
    - It can serve as:
      - Caching layers
      - Session stores
      - Configuration storage
    - Popular choice for:
      - Applications where performance and low latency are crucial
      - Gaming platforms, real-time analytics systems, and recommendation engines
    - E.g.
      - Redis
      - Amazon DynamoDB
  - Graph Databases
    - Store data as nodes and edges in a graph
    - Enables efficient processing of:
      - Complex relationships
      - Traversals
      - Graph-Based Algorithms
    - Useful for applications that:
      - Involve intricate relationships between entities
        Social networks
        Fraud detection systems
        Recommendation engines
      - Provide powerful querying capabilities for traversing and analyzing interconnected data.
    - E.g.
      - Neo4j
      - Amazon Neptune
- Drawbacks:
  - Lack of standardization
    - They don’t follow a standardized query language
    - Use their own query language or APIs
    - Big learning curves
    - Difficulties when migrating between different NoSQL databases
    - Difficulties when integrating them with other systems
  - Weaker consistency
    - It employs eventual consistency to achieve higher performance and availability
  - Limited support for complex queries and transactions

Database	Type	Query Language	ACID	Consistency Model
MongoDB	Document-oriented	MLQ	Yes (with trans)	Eventual
Couchbase	Document-oriented	N1QL (SQL for JSON)	Yes (with trans)	Eventual
Redis	Key-value store	Redis command	Yes (with trans)	Eventual
Apache HBase	Wide-Column store	HBase Shell / Java API	No (Strong cons. With R/W trans)	Strong consistency
Cassandra	Wide-Column store	CQL	No (tunnable cons.)	Tunable
Neo4j	Graph Database	Cypher	No	Strong consistency

New SQL
- Modern approach to combine the strengths of both relational and NoSQL databases.
- They maintain characteristics from Relational Databases:
  - Relational data model
  - ACID properties
  - SQL support
- While adding functionalities of NoSQL databases:
  - Improved scalability
  - Distributed architecture
  - Performance enhancements
- Designed to address the challenge of modern applications such as:
  - Handling large scale, distributed, and highly concurrent workloads without sacrificing data consistency and integrity
- Main characteristics
  - Distributed architecture
    - Leverage data partitioning and replication across multiple nodes or data centers
    - Good for:
      - Fault tolerance
      - High availability
      - Global scale
  - Scalability
    - NewSQL databases can scale horizontally
    - Good for applications that demand strong consistency and ability to handle a high volume of transactions or users
  - Concurrency Control
    - Advanced Concurrency Control Mechanisms
    - It allows efficient handling of a large number of simultaneous transactions
  - SQL Support and Compatibility
    - Retaining the familiar SQL language for querying and manipulating data
- Drawbacks:
  - Complexity
    - Configuration
    - Maintenance
    - Troubleshooting
  - Vendor lock-in
  - Lack of maturity
- E.g.
  - CockroachDB
  - Google Spanner
  - TiDB
- When considering a NewSQL database, it is essential to evaluate the specific needs of the application in terms of scalability, data consistency, performance, and developer familiarity

Database	Data Model	Scalability	Ecosystem	Cost	Key Advantage
Google Cloud Spanner	Relational	Globally distributed	Integrated with other GC services	Pay-as-you-go	Strong consistency guarantees
Cockroach DB	Relational	Strongly consistent	Integrated with Kubernetes & other cloud services	Open-source	Strong consistency guarantees & Cloud-Native architecture
NuoDB	Relational	Cloud + On-Prem	Support various programming languages	Open-source	Supports cloud and On-Premises
VoltDB	Relational	In-Memory processing	Enterprise focused	Commercial License	In-memory processing
TiDB	Hybrid (Relational & NoSQL)	Cloud + On-Prem	Support various programming languages	Open-source	Distributed transactions, hybrid relationship and NoSQL data model

Time-Series
- Specialize in handling time-stamped data
  - Time-stamped data: sequential nature and time-based ordering
- Designed to:
  - Optimized storage
  - Retrieval
  - Analysis of time-stamped data
- Features:
  - High write and query performance
  - Data compression
  - Time-based data retention policies
  - Built-in time-series functions
  - Scalability
    - Designed to scale horizontally
- E.g.
  - InfluxDB
  - TimescaleDB

Database	Scalability	Performance	Data Consistency	Security	Query Language
InfluxDB	Supports clustering	Designed for high write & query throughput, low latency	Eventual consistency and support customization	Robust	InfluxQL and Flux
TimescaleDB	Supports clustering	Designed for high write and query throughput, low latency	Strong consistency with ACID compliant	Robust	SQL with extensions for time-series data
Prometheus	Supports clustering	Designed for real-time monitoring and alerting	Eventual Consistency	Limited	PromQL
Graphite	Basic Horizontally Scalable	Designed for high performance and low overhead	Eventual Consistency	Limited	Graphite-QL