This week I am in San Jose, California, for the 2013 edition of the NoSQL Now! conference. This is the third annual NoSQL Now!, which bills itself as the largest vendor-neutral forum focused on NoSQL technologies.
But what is NoSQL? I am not an advocate of naming anything based on what it is not, and that is exactly what NoSQL does… or perhaps I should say did. NoSQL used to mean just what the name says: No + SQL = NoSQL. These days, though, the NoSQL crowd bills itself as Not Only SQL. That is wise, because technologies like Hive bring SQL-like functionality to NoSQL database systems. Once you try to use NoSQL databases without SQL (for example, by coding MapReduce logic in Java to access your data), you will be thrilled by the simplicity of SQL-like access.
But I digress. The conference offers a great way to learn about the various types of NoSQL database systems (document stores, graph databases, key/value data stores, columnar databases, in-memory data grids, streaming solutions, and so on). The educational sessions on Tuesday, the first day of the conference, were half-day, in-depth dives into technologies like Hadoop and graph databases.
The first session I attended provided a wealth of information on Hadoop and alone made attending the conference worthwhile. The session, Introduction to the Hadoop Ecosystem, by Dr. Vladimir Bacvanski (of SciSpike), offered a fantastic overview of Hadoop and related technologies. Key takeaways from the session included:
- The mindset of Hadoop is that storage of data is very cheap… and processing is cheap, too.
- The idea is to move the job to the data, not the data to the job… or, stated another way, “Don’t move petabytes to the program, but the program to the petabytes.”
- When you deal with Hadoop you generally store the data in its original format instead of breaking it apart or filtering it.
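To make “move the program to the data” concrete, here is a toy scheduler in Python. It is a minimal sketch under stated assumptions, not how Hadoop’s scheduler actually works: the block names, node names, and placement map are hypothetical. HDFS does replicate each block (three copies by default), so a task can usually be assigned to a node that already holds its input instead of shipping the data across the network.

```python
from collections import defaultdict

# Hypothetical placement map: the nodes holding a replica of each data block.
block_locations = {
    "block-1": ["node-a", "node-b", "node-c"],
    "block-2": ["node-b", "node-c", "node-d"],
    "block-3": ["node-a", "node-d", "node-e"],
}

def schedule(locations):
    """Assign one task per block to a node that already stores that block.

    Among the replica holders, pick the node with the fewest tasks so far
    (ties broken by name), so the job moves to the data and work spreads out.
    """
    load = defaultdict(int)
    assignment = {}
    for block, nodes in sorted(locations.items()):
        node = min(nodes, key=lambda n: (load[n], n))
        assignment[block] = node
        load[node] += 1
    return assignment

plan = schedule(block_locations)
```

Every task lands on a node that holds its block locally; no block is copied to run the job.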
Of course, Dr. Bacvanski described MapReduce (the algorithm), Hadoop (the implementation of the algorithm), and the general environment (multiple, redundant nodes for processing large amounts of data). MapReduce is basically two steps:
- the “Map” step, where the input is split into pieces that nodes process in parallel, storing the intermediate results of the “query.”
- the “Reduce” step, where the mapped data is aggregated by worker nodes under the control of the Job Tracker.
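The two steps above can be sketched with the classic word-count example. This is a single-process illustration in Python, not Hadoop code: the function names are hypothetical, a thread pool stands in for parallel worker nodes during the map step, and the grouping (“shuffle”) that Hadoop performs between the two steps is written out explicitly.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain

def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: aggregate all values emitted for a single key."""
    return key, sum(values)

def map_reduce(splits):
    # Map each split in parallel (threads stand in for worker nodes).
    with ThreadPoolExecutor() as pool:
        grouped = shuffle(chain.from_iterable(pool.map(map_phase, splits)))
    # Reduce each key's group of values to one result.
    return dict(reduce_phase(k, v) for k, v in grouped.items())

splits = ["the program to the data", "not the data to the program"]
counts = map_reduce(splits)
```

In a real cluster the map outputs would be written locally and partitioned by key across reducers; here everything stays in memory to keep the idea visible.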
He also discussed other aspects of the Hadoop ecosystem, including HDFS, Hive, Pig, HBase, HCatalog, ZooKeeper, Sqoop, Flume, and more. The speaker covered some of the more common Hadoop distributions, such as those from Cloudera, Hortonworks, and IBM (BigInsights), among others, and closed with a nice introduction to stream computing.
The second half-day session I attended was interesting, but a little more disjointed and difficult to follow than the first. This session was titled NoSQL Database Patterns and was delivered by Srini Penchikala of InfoQ. It was more application-development oriented and covered the architecture and design patterns of NoSQL database systems.
Penchikala discussed popular application development frameworks like Spring Data, Spring Integration and others, showing how to implement applications with polyglot persistence. What is polyglot persistence, you ask? Well, it is basically a fancy (and confusing) term for storing data in multiple database systems while accessing them, as needed, via a framework that masks their differences.
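Here is a minimal sketch of the idea behind polyglot persistence, written in Python rather than with Spring Data: application logic is coded once against a common repository interface, while two interchangeable back ends hide how each store keeps its data. All class and function names are hypothetical, and the “stores” are in-memory stand-ins for a key/value database and a document database.

```python
import json
from abc import ABC, abstractmethod
from typing import Optional

class CustomerRepository(ABC):
    """Hypothetical common interface that the application codes against."""

    @abstractmethod
    def save(self, customer: dict) -> None: ...

    @abstractmethod
    def find_by_id(self, customer_id: str) -> Optional[dict]: ...

class KeyValueCustomerRepository(CustomerRepository):
    """Stand-in for a key/value store: opaque serialized values under a key."""

    def __init__(self):
        self._store = {}

    def save(self, customer):
        self._store[customer["id"]] = json.dumps(customer)

    def find_by_id(self, customer_id):
        raw = self._store.get(customer_id)
        return json.loads(raw) if raw is not None else None

class DocumentCustomerRepository(CustomerRepository):
    """Stand-in for a document store: a collection queried by field value."""

    def __init__(self):
        self._collection = []

    def save(self, customer):
        # Upsert: drop any document with the same id, then insert a copy.
        self._collection = [d for d in self._collection if d["id"] != customer["id"]]
        self._collection.append(dict(customer))

    def find_by_id(self, customer_id):
        return next((dict(d) for d in self._collection if d["id"] == customer_id), None)

def rename_customer(repo: CustomerRepository, customer_id: str, new_name: str) -> dict:
    """Application logic written once, unaware of which store backs the repository."""
    customer = repo.find_by_id(customer_id)
    if customer is None:
        raise KeyError(customer_id)
    customer["name"] = new_name
    repo.save(customer)
    return customer
```

Swapping `KeyValueCustomerRepository` for `DocumentCustomerRepository` requires no change to `rename_customer`; that masking of store-specific differences is what a framework like Spring Data aims to provide at much larger scale.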
The highlights of this session were the demos the presenter walked through, using the Spring Data framework against several different NoSQL data stores.
Finally, the day ended with a series of 5-minute lightning talks on a variety of topics, including quantum computing, data quality, and FoundationDB, a new NoSQL database system announced for general availability today (August 20, 2013) that combines NoSQL with ACID transactions. Better yet, FoundationDB is offering a free community license for the new offering.
I’m looking forward to day two of this conference, which follows a more traditional format with multiple 60-minute sessions throughout the day… and the opening of the exhibition area, where a bevy of NoSQL vendors will promote and demo their offerings.
Be sure to tune in tomorrow for my coverage of Day Two of NoSQL Now!