DataScience Inc. today announced a new interactive tool for exploring and visualizing trends in data from more than 2.8 million GitHub repositories. With DataScience Trends, users can easily compare activity across the open source libraries now replacing legacy solutions in enterprise data science, from deep learning to distributed processing, all without writing code.
The tool allows non-technical audiences to visually compare more than 20 event types, from new commits to pull requests, across the three terabytes of data that GitHub and Google released publicly last year. DataScience Trends' easy-to-use interface also makes it simple to mine GitHub for information and to interact with trends on a timeline, all within a customizable date range.
“DataScience Trends instantly broadens the audience that can access what is essentially a treasure trove of data hidden behind a technical-knowledge barrier,” said William Merchan, chief strategy officer at DataScience. “Our mission at DataScience is to enable data science for every business; now, everyone has access to rich data from open source repos and can use it to generate beautiful, interactive data visualizations.
“As open source tools continue to eclipse proprietary solutions in the enterprise data science space, this information will become important to a wide array of decision makers looking for the right open source software for their data science teams,” he explained. “Just last year, 62% of analytics professionals responding to a Burtch Works survey said they preferred the open source languages Python and R to the legacy solution SAS. That's part of a larger trend we're seeing at the enterprise level.”
Because DataScience Trends is built on top of a three-terabyte dataset, the possibilities for exploration are nearly endless. To get users started, DataScience has included data from 10,000 of GitHub's most popular repos, which can be viewed in terms of development activity, popularity, and collaboration. DataScience Trends also includes several other useful features for exploring open source software data:
- Specific dates and values: Users can hover over any data visualization to see the values and dates associated with a specific data point.
- Normalized comparative trends: Libraries of any size or popularity can be compared using a common frame of reference, and a single click switches between the “relative” and “absolute” views.
- Easy sharing features: Each exploration generates a unique URL for sharing, or users can use the social sharing buttons to send trends directly to their news or Twitter feeds.
“With DataScience Trends, we can put data exploration in the hands of the masses,” Merchan said. “There are many avenues to explore across the GitHub archive, from the popularity of certain repositories, which we have gleaned from trends in ‘starred’ (bookmarked) repos, to collaboration in open source toolsets, as signified by the number of pull requests.
“We use DataScience Trends to identify the most popular open source tools and incorporate them into the DataScience Cloud, our enterprise platform. For instance, we know Google's release of machine intelligence library TensorFlow has driven interest in the compatible neural network library Keras, and that data visualization tool ggplot is steadily gaining momentum among Python practitioners,” Merchan added. “And as GitHub's archive continues to grow, so too will the number of insights that we, and DataScience Trends users, are able to identify.”
To try out DataScience Trends, please visit www.datascience.com/resources/tools/trends.