If you’re taking on big data, you’ll quickly realize that this means taking on a lot of unstructured data. Unstructured data doesn’t fit neatly into a typical relational database, the kind you query with SQL. That means you’ll be looking into a different type of database that can accept and store unstructured data. In most cases, that means you’ll be selecting a NoSQL database.
As you would expect, not all NoSQL databases are created equal. What works nicely for your buddy in retail won’t necessarily work for your insurance company or software development company. NoSQL databases are inherently scalable, but achieving this scalability usually means making sacrifices elsewhere, such as in immediate consistency.
Examples of popular key-value databases include Riak, Berkeley DB, Redis, Memcached, upscaledb, Couchbase, and one notable example that isn’t open source: Amazon DynamoDB. Key-value storage is the simplest of all NoSQL data storage solutions, at least from the API perspective: you store a value under a key, fetch it by that key, or delete it. Values are just opaque blobs that the database stores with no consideration of what they actually contain. The application holds the understanding of what’s stored in the database. Key-value databases typically deliver excellent performance and scale quite easily.
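To make the key-value idea concrete, here is a minimal in-memory sketch in Python. It is not the API of Redis, Riak, or any other product named above; the `KeyValueStore` class and its `put`, `get`, and `delete` methods are hypothetical names chosen for illustration. The point is that the store only ever sees bytes, while the application decides that those bytes happen to be JSON.

```python
import json
from typing import Dict, Optional


class KeyValueStore:
    """Hypothetical toy store: keys map to bytes the store never inspects."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # The store accepts any bytes; it has no schema and no validation.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op in this sketch.
        self._data.pop(key, None)


store = KeyValueStore()

# The application, not the store, knows the value is a JSON-encoded cart.
cart = {"user": "alice", "items": ["sku-1", "sku-2"]}
store.put("cart:alice", json.dumps(cart).encode("utf-8"))

raw = store.get("cart:alice")
restored = json.loads(raw.decode("utf-8"))
```

Because every operation addresses exactly one key, data like this is straightforward to spread across many machines, which is one reason these databases tend to scale easily.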
Next on the list are document databases. Notable examples include MongoDB, RavenDB, CouchDB (not to be confused with Couchbase, which is a key-value database), Terrastore, and OrientDB. As the name indicates, the primary idea here is storing documents. These databases can store and retrieve documents in formats including JSON, BSON, and XML. Documents in these databases are self-describing and feature hierarchical tree data structures, which may include maps, collections, and scalar values. The documents stored in these databases can be very similar, but they do not need to be identical. Some, such as MongoDB, have impressively rich query languages and are easy to transition to when you’re migrating data off a relational database.
Popular examples of column-family databases include HBase, Cassandra, and Hypertable. Of these, Cassandra is probably the most notable. Cassandra is marked by speedy and easy-to-scale write operations across the cluster. Because a Cassandra cluster has no master node, any node within the cluster can handle read and write requests. Column-family databases keep data in rows, with multiple columns associated with each row key. Column families are essentially groups of related columns that are usually accessed together.
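As a rough picture of that layout, the following sketch models a table as row key, then column family, then individual columns. It is an in-memory illustration only and does not reflect how HBase or Cassandra actually store or query data; the `ColumnFamilyTable` class and the family names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


class ColumnFamilyTable:
    """Hypothetical toy table: row key -> column family -> column -> value."""

    def __init__(self, families: Iterable[str]) -> None:
        # Column families are declared up front; columns inside them are not.
        self._families = set(families)
        self._rows: Dict[str, Dict[str, Dict[str, str]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def put(self, row_key: str, family: str, column: str, value: str) -> None:
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        self._rows[row_key][family][column] = value

    def get_family(self, row_key: str, family: str) -> Dict[str, str]:
        """Fetch all columns of one family for one row as a group."""
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        row = self._rows.get(row_key)
        if row is None:
            return {}
        return dict(row.get(family, {}))


users = ColumnFamilyTable(["profile", "activity"])
users.put("user:1", "profile", "name", "Ada")
users.put("user:1", "profile", "email", "ada@example.com")
users.put("user:1", "activity", "last_login", "2024-01-01")
users.put("user:2", "profile", "name", "Grace")  # this row has fewer columns

profile = users.get_family("user:1", "profile")
```

Rows do not have to share the same columns, so a row that lacks a value simply stores nothing for it.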
Examples of graph databases include OrientDB, FlockDB, Neo4j, and InfiniteGraph. Graph databases let you store relationships between the entities stored within the database. Entities are called nodes, and nodes have properties. A node can be thought of as an instance of an object within a given application. Relationships in a graph database are called edges, and edges have their own properties. Edges are directional, and nodes are organized by the relationships between them. This setup gives you the ability to discover interesting patterns and correlations between the nodes.
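Here is a small sketch of that model: nodes with properties, directed edges with their own properties, and a two-hop traversal that surfaces a pattern. It is a toy in-memory structure, not the API or query language of Neo4j or any other product; the `Graph` class, the `FOLLOWS` label, and the sample data are all made up for illustration.

```python
from typing import Any, Dict, List, Tuple


class Graph:
    """Hypothetical toy property graph with directed, labeled edges."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}
        # Each edge is (source, label, target, properties).
        self.edges: List[Tuple[str, str, str, Dict[str, Any]]] = []

    def add_node(self, node_id: str, **props: Any) -> None:
        self.nodes[node_id] = props

    def add_edge(self, source: str, label: str, target: str, **props: Any) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append((source, label, target, props))

    def outgoing(self, source: str, label: str) -> List[str]:
        """Targets reachable from `source` along edges with this label."""
        return [t for s, l, t, _ in self.edges if s == source and l == label]


g = Graph()
g.add_node("alice", kind="person")
g.add_node("bob", kind="person")
g.add_node("carol", kind="person")
g.add_edge("alice", "FOLLOWS", "bob", since=2020)
g.add_edge("bob", "FOLLOWS", "carol", since=2021)

# Pattern: people followed by someone alice follows, whom she does not follow yet.
followed = set(g.outgoing("alice", "FOLLOWS"))
suggestions = sorted(
    {
        candidate
        for friend in followed
        for candidate in g.outgoing(friend, "FOLLOWS")
        if candidate != "alice" and candidate not in followed
    }
)
```

Direction matters here: bob follows carol, but nothing in the data says carol follows bob, so a traversal starting from carol finds no outgoing `FOLLOWS` edges.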
Obviously, there’s much more to it than this! But these brief introductions should be enough to set you on the right path to discovering the right NoSQL database for your specific needs and applications.