Daniel D. Gutierrez – Managing Editor, insideBIGDATA
insideBIGDATA: Cassandra has had a long and illustrious history. How do you feel the DataStax departure will affect this open source project long term?
Ben Bromhead: There is no doubt that after the DataStax departure, it took some time for the community to reflect and regroup with fewer resources. What I believe has emerged in its wake, though, is a more balanced ecosystem with more contributions from some of the largest users of this technology – such as Apple, Uber, Instagram and Netflix, to name a few.
DataStax certainly helped to bootstrap the Apache Cassandra project, but over time a rich ecosystem of other vendors and operators started to grow around the community. These users have quickly filled the void, and now there is a strong community made up of those who run open source Apache Cassandra in production (and rely on this technology for some of their most important revenue-generating applications).
insideBIGDATA: What’s the word on turning Cassandra into a new commercialized version? What would that do to the open source version? How well would it be accepted in the marketplace?
Ben Bromhead: There are a number of new vendors emerging that have taken the baseline of the technology and developed their own spin on it. For many organizations choosing to deploy Apache Cassandra – and then deciding between commercial and open source – it really comes down to a strategic technology decision. But we see that many of the technology leaders and the largest users of this database will only ever deploy the open source version of Apache Cassandra. No technology lock-in, more transparency, and a large and vibrant community are the key reasons why. These large users need to have the decision-making in their own hands, rather than in the hands of a vendor promoting its own product.
insideBIGDATA: Can you give a few words about any upcoming features for Cassandra?
Ben Bromhead: There’s a lot on the horizon that I think will be very interesting – and useful – to the Cassandra community. Among them:
- Pluggable storage with RocksDB will deliver substantial improvements to performance.
- Virtual tables. Some great work is being done by Cassandra committer Jeff Jirsa on this front, and it will allow API developers to create virtual tables in Cassandra.
- Change Data Capture (CDC) improvements. Uber is putting a chunk of time into improving CDC performance, since they have built some in-process CDC mechanisms – which means this code path is going to be better tested.
- Decoupling redundancy from availability. Instagram and Apple each have ongoing work, using different approaches, that allows Cassandra to have nodes that act as hint stores or lightweight replicas in specific situations.
insideBIGDATA: What role will Instaclustr play in all of this?
Ben Bromhead: Within Instaclustr we have now established a team of dedicated developers to work on community-related Apache Cassandra activities. This includes writing code, participating in project activities, fixing bugs, and writing new features that the community can take advantage of. Our intent here is to build a team of contributors who are actively involved and can provide some real operational experience to the community. We have well over 15 million node hours of Apache Cassandra management experience, and have seen this amazing database deployed in very effective (and very ineffective) ways, both at very large scale and for smaller projects. We believe that our unique operational experience is a real positive for the Apache Cassandra community, and we feel compelled to step up and do our part to make sure that experience is readily available.