ClusterGX™ makes it easy to deploy Spark/Hadoop clusters on premises, on AWS or as a hybrid, without any previous big data experience. After the initial cluster deployment, the process of filling the data lake begins. This requires identifying the relevant data sources and ingesting the data, both historical records and ongoing real-time streams.
“As we have engaged in conversations about our vision for simplification, the complexity surrounding data ingestion has been a recurring theme,” said Rob Mustarde, CEO of Galactic Exchange. “Often people know where the data is, but getting it into the cluster and keeping it fresh is just plain difficult, and projects can fail as a result.”
DataEnchilada™ simplifies data pipeline building by automating the process of connecting to and ingesting data from common sources. Using an intuitive wizard-driven UI, common on-premises and cloud data sources such as Oracle, MySQL, Twitter, LinkedIn and MixPanel can easily be configured for both historical and real-time data ingestion into a ClusterGX™ data lake.
Deep Learning Powers Auto Data Classification
DataEnchilada™ automatically ingests data into a Kafka cluster running inside ClusterGX™. It is at this point that the embedded deep learning of DataEnchilada™ kicks in. Instead of requiring users to manually configure data ingestion parameters, DataEnchilada™ automatically classifies the incoming data into discrete Kafka topics, which are then maintained in long-term persistent storage and made available for real-time analysis.
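As a rough illustration of routing incoming records into discrete topics, the sketch below groups records by the topic whose expected fields they most resemble. It is not DataEnchilada™'s method: a simple field-overlap heuristic stands in for the deep learning classifier, and the topic names, the field hints, and the functions `classify_record` and `route` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical topic names and field hints; none come from DataEnchilada itself.
TOPIC_HINTS = {
    "social.posts": {"tweet", "hashtags", "retweets"},
    "sales.orders": {"order_id", "sku", "amount"},
    "web.events": {"event", "session_id", "page"},
}
DEFAULT_TOPIC = "unclassified"


def classify_record(record: dict) -> str:
    """Pick the topic whose hint fields overlap most with the record's keys."""
    keys = set(record)
    best_topic, best_overlap = DEFAULT_TOPIC, 0
    for topic, hints in TOPIC_HINTS.items():
        overlap = len(keys & hints)
        if overlap > best_overlap:
            best_topic, best_overlap = topic, overlap
    return best_topic


def route(records):
    """Group records by topic; a real pipeline would publish to Kafka instead."""
    topics = defaultdict(list)
    for record in records:
        topics[classify_record(record)].append(record)
    return dict(topics)
```

Records that match no hint fall into a catch-all topic rather than being dropped, so nothing is lost while the classification is refined.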
“Now you can deploy Spark/Kafka clusters anywhere you want them in minutes, easily ingest and auto-classify your data using deep learning algorithms and launch applications with a single click from the embedded AppHub™,” continued Mustarde. “We are essentially facilitating end-to-end data pipelines that can be created 10 to 100x faster than was previously possible, by people with limited to zero big data experience, and all without writing a single line of code.”
Data Watching Data: Artificial Intelligence Delivers Real-Time Data Anomaly Detection
As well as using deep learning algorithms to auto-classify data, DataEnchilada™ uses artificial intelligence to profile each Kafka topic created and quickly learn an expected pattern of stream “activity”. If any stream then departs from the expected activity profile, a sub-topic is created to capture that stream anomaly and alerts are sent to the ClusterGX™ administrator. Stream anomalies could be generated for a variety of reasons — including fraudulent activity, malware, viruses, system overloads or even simple data errors.
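The idea of learning an expected activity profile and flagging departures from it can be sketched with a simple rolling statistic. This is an assumption-laden stand-in, not the product's algorithm: it uses a z-score over a sliding window of message rates instead of deep learning, and the class `ActivityProfile`, its parameters, and the `.anomalies` sub-topic naming are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class ActivityProfile:
    """Tracks recent message rates for one topic (hypothetical helper)."""

    def __init__(self, window=60, threshold=3.0, min_samples=10):
        self.history = deque(maxlen=window)  # sliding window of normal rates
        self.threshold = threshold           # allowed deviation, in std devs
        self.min_samples = min_samples       # observations needed before flagging

    def observe(self, rate: float) -> bool:
        """Return True if the rate departs from the learned profile."""
        anomalous = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma > 0:
                anomalous = abs(rate - mu) > self.threshold * sigma
            else:
                anomalous = rate != mu
        if not anomalous:
            # Only normal observations update the profile.
            self.history.append(rate)
        return anomalous


def anomaly_topic(topic: str) -> str:
    """Name of the sub-topic that would capture anomalies (assumed convention)."""
    return f"{topic}.anomalies"
```

In use, each new rate measurement for a topic would be passed to `observe`; a `True` result would trigger a write to the anomaly sub-topic and an alert to the administrator.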
ClusterGX™ applies the same deep learning logic to monitoring the Docker containers running across the cluster. Every application and process running across ClusterGX™ is deployed within its own Docker container, and deep learning is used to create an activity profile for each container. Certain applications running within a container may reach out to the internet from time to time to gather updates or other data. Despite the best security measures, it is always possible for applications and services to be compromised through such activity. Using deep learning, any change in expected container activity, including change caused by malicious activity, can be identified quickly. Alerts are then raised and the affected containers are automatically placed into quarantine.
“Pretty much without exception, the traditional Hadoop and Hadoop-as-a-Service vendors focus on delivering a vertical stack of open source big data software tools and deliver service around that,” said Robin Bloor, Chief Analyst at The Bloor Group. “By contrast, Galactic Exchange is delivering a data pipeline solution, a horizontal integration of cluster deployment, data ingestion and application launch, all orchestrated in a way to abstract a huge amount of the typical complexity associated with big data projects. Galactic is using machine learning and artificial intelligence to turn Big Data clusters designed for the Fortune 1000 into Smart Data clusters designed for pretty much anyone.”
ClusterGX™ Standard Edition for deployment on premises or on Amazon AWS is available for free with unlimited cluster scaling. Standard Edition includes one free DataEnchilada™ data source, but for a limited period all available data sources will be offered for free. After the free period, additional data sources can be added by upgrading to the Premium or Enterprise Edition.