Machine Learning Solves Data Center Problems, But Also Creates New Ones

In this special guest feature, Geoff Tudor, VP and GM of Cloud Data Services at Panzura, argues that AI presents both opportunities and risks in the automation of the data center. This article provides an overview of the impact of AI in the data center and how companies can prepare their storage infrastructure for these technologies. Geoff has over 22 years of experience in storage, broadband, and networking. As Chief Cloud Strategist at Hewlett Packard Enterprise, he led CxO engagements for Fortune 100 private cloud opportunities, resulting in 10X growth to over $1B in revenues while positioning HPE as the #1 private cloud infrastructure supplier globally. Geoff holds an MBA from The University of Texas at Austin and a BA from Tulane University, and is a patent holder in satellite communications.

Artificial intelligence (AI) with machine learning (ML) capabilities offers the promise of increased efficiency in data centers. As evidence, Deloitte Global predicts that the number of ML pilots and implementations will double in 2018 compared to 2017, and double again by 2020. According to McKinsey, total annual investment in AI was between $8B and $12B in 2016 [1].

AI with ML is particularly important for data centers with 100 or more physical servers, where 24×7 support becomes extremely complex because a large number of systems must be managed by multiple people. Now imagine how much more efficient managing data centers that house big data could be if ML were applied.

An example of a big data use case is the analysis of complex data sets. It is extremely difficult to detect infrastructure problems, intruders, or business issues as they happen using static rules or humans watching dashboards. Applying ML to this problem can surface insights that would otherwise go unseen.
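To make this concrete, here is a minimal, hypothetical sketch (pure Python, not any specific vendor's product) that flags anomalous per-minute request counts from a server log using a simple z-score. A real ML system would use far richer models, but the idea is the same: learn what "normal" looks like from the data rather than hand-writing rules.

```python
import statistics

def find_anomalies(counts, threshold=2.5):
    """Flag indices whose value lies more than `threshold` standard
    deviations from the mean -- a crude stand-in for the kind of
    pattern detection an ML system performs on log-derived metrics."""
    mean = statistics.mean(counts)
    stdev = statistics.pstdev(counts)
    if stdev == 0:
        return []
    return [i for i, c in enumerate(counts) if abs(c - mean) / stdev > threshold]

# Per-minute request counts from a hypothetical server log,
# with a sudden spike at index 5.
requests_per_minute = [120, 118, 125, 122, 119, 900, 121, 123]
print(find_anomalies(requests_per_minute))  # -> [5]
```

A dashboard-watching human might miss one spike among thousands of series; a detector like this runs over every series continuously.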

An Opportunity and a Problem

The opportunity in data center automation via ML comes from new ML algorithms that can be run against the log data generated by the servers, firewalls, routers, and other devices that comprise the data center. However, big data growth is exploding. Zion Market Research estimates that the global Hadoop market will reach $87.14B by 2022 [2].

The big data problem quickly arises with ML because you need to manage and store all of the operational log data in order to analyze it. Moreover, most corporate compliance mandates require that you retain data for three years. The log data that feeds these ML systems quickly becomes a larger data set than the application data itself.

While ML can help drive efficiencies in data center management, storing and making data available for ML is becoming a new problem that data center operators will have to address. If this new problem can be addressed, automation powered by ML will become extremely valuable.

Where to Start with Machine Learning

Log management is the easiest place to begin using ML for data center automation. The growth of Splunk shows this, and new open-source technologies like Elasticsearch are showing great promise. But with new IoT devices, the problem becomes even more complex, requiring centralized log management and retention storage.

Centralizing and ingesting all of your log data into Elasticsearch is a great first step toward preparing your enterprise data center for ML optimization. ML is a data problem: if you are not feeding it all of the data, you get imperfect decisions. It is a garbage-in, garbage-out problem, so having a single source of truth for all of your data sets is absolutely critical. A new paradigm is needed to support the management of ML data.
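Centralization usually starts by normalizing heterogeneous log lines into one common schema before indexing. The sketch below is illustrative only (the field names and sample line are hypothetical, and the regex handles just one simple syslog-like shape); in a real pipeline the resulting records would be bulk-indexed into Elasticsearch.

```python
import json
import re

# Matches a simple "timestamp host message" line; real sources need
# one parser per format, all mapping onto the same output schema.
SYSLOG = re.compile(r"^(?P<ts>\S+) (?P<host>\S+) (?P<msg>.*)$")

def normalize(line, source):
    """Map a raw log line onto one common schema so every source
    feeds the same store -- the 'single source of truth'."""
    m = SYSLOG.match(line)
    if not m:
        return None  # unparseable lines should be quarantined, not dropped silently
    return {"timestamp": m.group("ts"), "host": m.group("host"),
            "source": source, "message": m.group("msg")}

record = normalize("2018-03-01T12:00:00Z fw01 DROP tcp 10.0.0.5:443", "firewall")
print(json.dumps(record))
```

Because every device emits records in the same shape, one ML model can train across firewalls, servers, and routers at once instead of per-source silos.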

A New Data Storage Paradigm

A hybrid cloud storage platform can simplify your big data storage environment by collapsing your data center footprint. With this type of platform, you can retain your data sets indefinitely by offloading them to object storage. At the same time, you are able to automate data protection, replication, tiering, and recovery.

Such a platform can solve the big data problem by providing a single filer interface for ingesting, storing, accessing, and exposing this data, while keeping costs minimal by deduplicating the data and storing it on low-cost object storage in the cloud. You can store petabytes of machine-generated content on a cost-effective VM or 2U filer.
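Deduplication pays off especially well on machine-generated logs, which are full of repeated lines. The toy content-addressed store below (an illustration of the general technique, not any product's implementation) keeps each unique chunk once, keyed by its SHA-256 digest:

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: identical chunks are written once,
    keyed by their SHA-256 digest -- the basic idea behind deduplicating
    data before it lands on object storage."""
    def __init__(self):
        self.chunks = {}   # digest -> bytes (stands in for object storage)
        self.logical = 0   # bytes clients believe they have written

    def put(self, data: bytes) -> str:
        self.logical += len(data)
        digest = hashlib.sha256(data).hexdigest()
        self.chunks.setdefault(digest, data)  # store only if unseen
        return digest

    def physical(self) -> int:
        return sum(len(c) for c in self.chunks.values())

store = DedupStore()
for _ in range(1000):                   # the same health-check line, repeated
    store.put(b"HEALTHCHECK OK 200\n")
store.put(b"ERROR disk full\n")
print(store.logical, store.physical())  # logical bytes vs. deduplicated bytes
```

Here 1,001 writes collapse to two stored chunks; at log-data scale that gap is what makes long retention on cloud object storage affordable.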

The Future: Ecosystems of ML Toolsets in Public Clouds

Right now, much of this analysis consists of batch-style post-processing of data sets. The future is in real-time ingestion and processing, such as the development happening around Apache Spark. Longer term, you will see a flourishing ecosystem of unique ML toolsets in the public clouds processing data in real time, driving real-time automation in the cloud.
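The shift from batch to streaming boils down to maintaining rolling computations as events arrive instead of scanning data after the fact. This minimal sketch (pure Python, standing in for the windowed aggregations engines like Spark perform at scale) keeps only the events from the last N seconds and exposes the current rate:

```python
from collections import deque

class SlidingRate:
    """Minimal streaming-style computation: retain events from the last
    `window` seconds and report the current event rate, the kind of
    rolling aggregation a stream processor maintains continuously."""
    def __init__(self, window=60):
        self.window = window
        self.events = deque()

    def observe(self, ts):
        self.events.append(ts)
        # Evict events that have fallen out of the window.
        while self.events and self.events[0] <= ts - self.window:
            self.events.popleft()

    def rate(self):
        return len(self.events) / self.window

r = SlidingRate(window=10)
for t in range(25):        # one event per "second" for 25 seconds
    r.observe(t)
print(r.rate())            # -> 1.0 (events per second over the last window)
```

A batch job would have answered "what was the rate yesterday?"; the streaming version answers "what is the rate right now?", which is what real-time automation needs to act on.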

[1] https://www.forbes.com/sites/louiscolumbus/2018/02/18/roundup-of-machine-learning-forecasts-and-market-estimates-2018/#60fdc6c82225

[2] https://globenewswire.com/news-release/2017/11/27/1206043/0/en/Global-Hadoop-Market-Share-Will-Hit-USD-87-14-billion-by-2022-Zion-Market-Research.html
