insideBIGDATA Guide to Data Platforms for Artificial Intelligence and Deep Learning – Part 3


With AI and DL, storage is the cornerstone of handling the deluge of data constantly generated in today’s hyperconnected world. It is a vehicle that captures and shares data to create business value. In this technology guide, insideBIGDATA Guide to Data Platforms for Artificial Intelligence and Deep Learning, we’ll see how current implementations for AI and DL applications can be deployed using new storage architectures and protocols specifically designed to deliver data with high throughput, low latency and maximum concurrency.

The target audience for the guide is enterprise thought leaders and decision makers who understand that enterprise information is being amassed like never before and that a data platform is both an enabler and accelerator for business innovation.

Characteristics of Storage Solutions Optimized for AI & DL

Achieving successful AI and DL deployments requires ingest, processing and continuous engagement of large and diverse data sets, often at scale. Further, these deployments run different workloads concurrently, so the storage system must cope with a wide range of mixed access patterns at the same time. As a result, a data storage solution that supports these workflows must be specifically optimized for AI and DL. What’s needed is an AI data platform.

An AI data platform must enable rapid iteration for model training and refreshing. Rapid prototyping, experimentation and refinement of models are critical for achieving the best possible yield from neural networks. The AI data platform must also enable easy transition of models from training into validation and inference. Seamless transition from development to production enables organizations to quickly leverage and monetize innovation.

Where data is the new source code, the process begins by learning from a very large amount of data. Success for a DL development environment depends on having access to very large data sets that are meaningful to the problem domain. The size, quality and diversity of the data sets are all critical. The AI data platform must be capable of ingesting, handling and delivering heterogeneous data types and mixed workloads simultaneously and without compromise. The AI data platform must scale seamlessly in multiple dimensions (capacity, capability and performance) to match evolving workflow needs. It should provide flexible configuration options to achieve optimal technical and economic benefits for organizations.

There is a balance between technical and economic realities: you want to be efficient and smart about the way you store your data, and you want to spend a lot of time thinking about these realities when you’re defining your infrastructure.

Data protection is also very important. Organizations engaging in AI and DL programs incur significant expense to collect data sets. Enterprises need to start thinking about how to accumulate data from multiple sources and of heterogeneous types (images, text, etc.), and how to hold onto it indefinitely. How can this data be protected? When you’re talking about tens or hundreds of petabytes over decades, this can raise overwhelming and complex considerations. How can you make sure your data is there when you need it? And maybe even more importantly, how can you organize this data in a way that you can find it again? A thorough data governance plan is a cornerstone for enterprises engaging in AI and DL.
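The two questions above, whether the data is still intact and whether it can be found again, can be illustrated with a minimal sketch. It assumes a hypothetical in-memory catalog that records a checksum and descriptive tags for each object; the names `CatalogEntry`, `DataCatalog`, `register`, `find` and `verify` are illustrative only and are not taken from the guide or from any particular product.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical metadata record kept alongside one stored object."""
    path: str
    sha256: str
    size_bytes: int
    tags: set = field(default_factory=set)


class DataCatalog:
    """Illustrative in-memory catalog; a real deployment would persist this."""

    def __init__(self):
        self._entries = {}

    def register(self, path, payload, tags):
        # Record a checksum at ingest time so later corruption is detectable.
        digest = hashlib.sha256(payload).hexdigest()
        entry = CatalogEntry(path, digest, len(payload), set(tags))
        self._entries[path] = entry
        return entry

    def find(self, *tags):
        # Return the paths whose tag set contains every requested tag.
        wanted = set(tags)
        return sorted(p for p, e in self._entries.items() if wanted <= e.tags)

    def verify(self, path, payload):
        # Compare the current content against the checksum stored at ingest.
        return hashlib.sha256(payload).hexdigest() == self._entries[path].sha256


catalog = DataCatalog()
catalog.register("images/cat_001.jpg", b"fake-image-bytes", ["image", "training"])
catalog.register("text/review_17.txt", b"great product", ["text", "training"])

print(catalog.find("training"))
print(catalog.verify("images/cat_001.jpg", b"fake-image-bytes"))
```

Keeping the checksum and the tags in the same record is a deliberate simplification: one lookup answers both "is it still what we stored?" and "which data sets match this description?". At the petabyte scale discussed above, the same idea would live in a dedicated metadata service rather than in process memory.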

“Data plasticity” is an important notion for AI and DL, where data is the lifeblood of the march toward insights (the term is analogous to neuroplasticity, the brain’s ability to reorganize itself). When it comes to AI pipelines, data must be easily molded into the forms needed by the application. Once captured, you have a giant pool of very valuable information, and you want to be able to consume it easily, rapidly and in very different forms. Making the AI “think” fast is accomplished by using large amounts of information and making it accessible faster. The more reliable the access, the better the AI results will be. These are all virtues of a robust AI data platform.

This is the third in a series of articles appearing over the next few weeks in which we explore these topics surrounding data platforms for AI & deep learning:

  • Introduction, Data is the New Source Code
  • Unique Storage Demands for AI and DL Workloads
  • Characteristics of Storage Solutions Optimized for AI & DL
  • Accelerated, Any-scale AI Solutions
  • Data Storage for AI/DL Case Studies, Summary

If you prefer, the complete insideBIGDATA Guide to Data Platforms for Artificial Intelligence and Deep Learning is available for download from the insideBIGDATA White Paper Library, courtesy of DDN.