Database Selection

Database Types:

  • Relational Databases
    • Based on the relational model
      • Organizes data in tables with rows and columns
    • Standard choice due to:
      • Robust consistency
      • Support for complex queries
      • Adherence to ACID properties
        • Guaranteeing data consistency and integrity
      • Enforce consistent structure
        • It facilitates the management and maintenance of data
      • Referential integrity
        • Relationships between tables
        • Ensures efficient querying of related data
        • Supports complex data relationships
      • SQL Support
      • Support transactions
        • Actions either succeed or fail as a whole
      • Indexing and optimization
        • It offers various indexing techniques and query optimization strategies
        • Improves query performance and reduce resource consumption
    • Drawbacks:
      • Limited scalability
        • Scaling relational databases horizontally can be challenging
      • Rigidity
        • The predefined schema in relational databases can make it difficult to adapt to changing requirements
      • Performance issues with large datasets
        • Performance issues when dealing with complex queries and large-scale data manipulations
      • Inefficient for unstructured or semi-structured data
DatabasePerformanceScalabilityEase of UseSupport for complex queriesLicenseCommunity Support
MySQLHighMediumHighHighOpen SourceStrong
PostgreSQLHighHighMediumVery HighOpen SourceStrong
MS SQL ServerHighHighMediumVery HighProprietaryStrong
OracleVery HighVery HighLowVery HighProprietaryModerate
  • NoSQL
    • Developed as a response to the limitations of relational databases, particularly in terms of:
      • Scalability
      • Flexibility
      • Performance under certain conditions
    • Do not strictly follow the relational data model or traditional table-based storage
    • Store data in various formats
    • Types:
      • Document-based databases
        • Store data in semi-structured documents such as JSON or BSON
        • It provides great flexibility in data modeling
        • It allows more dynamic schemas
        • Well-suited for applications that deal with hierarchical or nested data structures
        • E.g.
          • Mongo DB
          • Couchbase
      • Column-based – Wide-column stores, Column-family stores
        • Organizes data in columns rather than rows
        • Optimizes column-based queries providing:
          • Improved compression
          • Better read performance
        • Designed for applications that need to store and query large amounts of data across many nodes
        • Popular choice for:
          • Big Data and Analytics applications
          • Applications with high write and read workloads
        • E.g.
          • Apache Cassandra
          • HBase
      • Key-Value stores
        • Store data as key-value pairs
        • Ideal for:
          • High-speed reads and writes
          • Horizontal scalability
        • It can serve as:
          • Caching layers
          • Session stores
          • Configuration storage
        • Popular choice for:
          • Applications where performance and low latency are crucial
          • Gaming platforms, real-time analytics systems, and recommendation engines
        • E.g.
          • Redis
          • Amazon DynamoDB
      • Graph Databases
        • Store data as nodes and edges in a graph
        • Enables efficient processing of:
          • Complex relationships
          • Traversals
          • Graph-Based Algorithms
        • Useful for applications that:
          • Involve intricate relationships between entities
            • Social networks
            • Fraud detection systems
            • Recommendation engines
          • Provide powerful querying capabilities for traversing and analyzing interconnected data.
        • E.g.
          • Neo4j
          • Amazon Neptune
    • Drawbacks:
      • Lack of standardization
        • They don’t follow a standardized query language
        • Use their own query language or APIs
        • Big learning curves
        • Difficulties when migrating between different NoSQL databases
        • Difficulties when integrating them with other systems
      • Weaker consistency
        • It employs eventual consistency to achieve higher performance and availability
      • Limited support for complex queries and transactions
DatabaseTypeQuery LanguageACIDConsistency Model
MongoDBDocument-orientedMLQYes (with trans)Eventual
CouchbaseDocument-orientedN1QL (SQL for JSON)Yes (with trans)Eventual
RedisKey-value storeRedis commandYes (with trans)Eventual
Apache HBaseWide-Column storeHBase Shell / Java APINo (Strong cons. With R/W trans)Strong consistency
CassandraWide-Column storeCQLNo (tunnable cons.)Tunable
Neo4jGraph DatabaseCypherNoStrong consistency
  • New SQL
    • Modern approach to combine the strengths of both relational and NoSQL databases.
    • They maintain characteristics from Relational Databases:
      • Relational data model
      • ACID properties
      • SQL support
    • While adding functionalities of NoSQL databases:
      • Improved scalability
      • Distributed architecture
      • Performance enhancements
    • Designed to address the challenge of modern applications such as:
      • Handling large scale, distributed, and highly concurrent workloads without sacrificing data consistency and integrity
    • Main characteristics
      • Distributed architecture
        • Leverage data partitioning and replication across multiple nodes or data centers
        • Good for:
          • Fault tolerance
          • High availability
          • Global scale
      • Scalability
        • NewSQL databases can scale horizontally
        • Good for applications that demand strong consistency and ability to handle a high volume of transactions or users
      • Concurrency Control
        • Advanced Concurrency Control Mechanisms
        • It allows efficient handling of a large number of simultaneous transactions
      • SQL Support and Compatibility
        • Retaining the familiar SQL language for querying and manipulating data
    • Drawbacks:
      • Complexity
        • Configuration
        • Maintenance
        • Troubleshooting
      • Vendor lock-in
      • Lack of maturity
    • E.g.
      • CockroachDB
      • Google Spanner
      • TiDB
    • When considering a NewSQL database, it is essential to evaluate the specific needs of the application in terms of scalability, data consistency, performance, and developer familiarity
DatabaseData ModelScalabilityEcosystemCostKey Advantage
Google Cloud SpannerRelationalGlobally distributedIntegrated with other GC servicesPay-as-you-goStrong consistency guarantees
Cockroach DBRelationalStrongly consistentIntegrated with Kubernetes & other cloud servicesOpen-sourceStrong consistency guarantees & Cloud-Native architecture
NuoDBRelationalCloud + On-PremSupport various programming languagesOpen-sourceSupports cloud and On-Premises
VoltDBRelationalIn-Memory processingEnterprise focusedCommercial LicenseIn-memory processing
TiDBHybrid (Relational & NoSQL)Cloud + On-PremSupport various programming languagesOpen-sourceDistributed transactions, hybrid relationship and NoSQL data model
  • Time-Series
    • Specialize in handling time-stamped data
      • Time-stamped data: sequential nature and time-based ordering
    • Designed to:
      • Optimized storage
      • Retrieval
      • Analysis of time-stamped data
    • Features:
      • High write and query performance
      • Data compression
      • Time-based data retention policies
      • Built-in time-series functions
      • Scalability
        • Designed to scale horizontally
    • E.g.
      • InfluxDB
      • TimescaleDB
DatabaseScalabilityPerformanceData ConsistencySecurityQuery Language
InfluxDBSupports clusteringDesigned for high write & query throughput, low latencyEventual consistency and support customizationRobustInfluxQL and Flux
TimescaleDBSupports clusteringDesigned for high write and query throughput, low latencyStrong consistency with ACID compliantRobustSQL with extensions for time-series data
PrometheusSupports clusteringDesigned for real-time monitoring and alertingEventual ConsistencyLimitedPromQL
GraphiteBasic Horizontally ScalableDesigned for high performance and low overheadEventual ConsistencyLimitedGraphite-QL