Beyond the Delta: Compression is a Must for Big Data

In an era of big data, high-speed, reliable, cheap and scalable databases are no luxury. Our friends over at SQream Technologies invest a lot of time and effort into providing their customers with the best performance at scale. To that end, SQream DB (the GPU data warehouse) uses state-of-the-art HPC techniques. Some of these techniques adapt existing algorithms to take advantage of external technological advances, while other algorithms are home-brewed.

Dr. Benjamin C. van Zuiden of SQream wrote a special report, “Beyond the Delta: Compression is a Must for Big Data,” that focuses on compression algorithms that make big data-at-scale possible.

In data and signal processing, data compression is the process of encoding information using fewer bits than the original representation. Data compression is useful for saving disk space or for reducing the I/O or bandwidth used when moving data (e.g., over the internet, or from storage to RAM).
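To make "fewer bits" concrete, here is a minimal sketch using Python's standard zlib module. The sample data is made up for illustration and is not taken from the report; it simply shows that a repetitive input can be stored in far fewer bytes and still be restored exactly.

```python
import zlib

# Hypothetical sample: a highly repetitive byte string standing in for stored data.
original = b"2024-01-01,OK;" * 1000

# Compress at the highest zlib level (9), then restore the input.
compressed = zlib.compress(original, 9)
restored = zlib.decompress(compressed)

print(f"original:   {len(original)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {len(original) / len(compressed):.1f}x")
print(f"round trip intact: {restored == original}")
```

Fewer bytes on disk also means fewer bytes to read back, which is where the I/O and bandwidth savings come from.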

Compression algorithms generally come in two varieties: lossy and lossless. Lossy compression is highly advantageous for images, audio, and video because it discards data beyond a certain level of fine-grained detail, and most people will not notice that information is missing. The typical database user, however, does not want to lose any of the data inserted into the database. Lossy compression is therefore of little use in databases.

The report therefore focuses on lossless compression algorithms. Lossless compression encodes data in such a way that the original can be fully recovered by decompressing it. This is very important for databases, as even a tiny change in the stored data could make it unusable. SQream DB in particular stores many different types of numeric sequences. Compressing these sequences well yields faster queries and insertions, which makes the topic worth studying.
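As an illustration of lossless compression on a numeric sequence, the sketch below uses plain delta encoding, the technique the report's title alludes to. This is a generic textbook version written for this article, not SQream DB's implementation; the function names and the sample timestamps are hypothetical.

```python
import struct
import zlib

def delta_encode(values):
    """Keep the first value, then store each difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Rebuild the original sequence with a running sum over the deltas."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out

def pack(values):
    """Serialize integers as little-endian signed 64-bit values."""
    return struct.pack(f"<{len(values)}q", *values)

# Hypothetical column: evenly spaced timestamps, a common kind of numeric sequence.
timestamps = [1_700_000_000 + 17 * i for i in range(10_000)]
deltas = delta_encode(timestamps)

# Apply the same general-purpose compressor to both representations.
raw_size = len(zlib.compress(pack(timestamps)))
delta_size = len(zlib.compress(pack(deltas)))

print(f"zlib on raw values: {raw_size} bytes")
print(f"zlib on deltas:     {delta_size} bytes")
print(f"lossless round trip: {delta_decode(deltas) == timestamps}")
```

The deltas of a regular sequence are nearly all identical, so a general-purpose compressor has much less to encode, and the running sum recovers every original value exactly.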

SQream DB uses advanced compression schemes extensively. Its CPU and GPU compression schemes are designed to improve performance and reduce storage costs, all while working transparently. The efficient storage methods discussed in this article are just some of the compression schemes that SQream DB uses to help you maintain your data in physical storage more effectively.

Download the complete report HERE.