One of the most exciting challenges I have at Hitachi as the Vice-Chairman of Hitachi’s “Data Science 部会” is to help lead the development of Hitachi’s data science capabilities. We have a target number of people whom we want trained and operational by 2020, so there is definitely a sense of urgency. And I like urgency because it’s required to sweep aside the inhibitors and resistors to change.
I started this assignment with a blog titled “What’s the Difference Between Data Integration and Data Engineering?” that laid out the differences between traditional Data Integration and modern Data Engineering (see Figure 1).
Figure 1: Data Integration versus Data Engineering
But that blog only addressed the Data Engineer role. To achieve the goals for the “Data Science 部会” – which are to become more effective at leveraging data and analytics to optimize key business and operational processes, mitigate compliance and security risks, uncover new revenue opportunities and create a more compelling, differentiated user experience – we need to consider the three key roles that round out the data science community, and the interactions between them. We need to understand the responsibilities, capabilities, expectations and competencies of the Data Engineer, Data Scientist and Business Stakeholder.
Researching the Data Science Triumvirate
We started this assignment by researching the job hiring profiles of Data Engineers and Data Scientists at Silicon Valley’s leading data science organizations (Thanks John!). We then created a graphic to highlight the focus and capabilities of those roles, as well as the interactions between those roles – the Data Science Capabilities Venn Diagram (see Figure 2).
Figure 2: Data Science Capabilities Venn Diagram
The Data Science Capabilities Venn Diagram identifies the key objectives and capabilities for the three key data science roles. And data science innovation occurs at the point where all three of those roles intersect around:
- Hypothesis Development (clearly defining what it is you’re trying to achieve and how progress and success will be measured),
- Data Monetization (identifying, validating, valuing and prioritizing the business and operational use cases) and
- Governance (operationalization and adoption).
Note: I’ll dive into these points of innovation and their role in driving digital transformation and data monetization in more detail in a future blog.
The supporting details of the research can be seen in the Figure 3 eye chart.
Figure 3: Defining Data Science Responsibilities and Tools
This is a great foundation that helps us understand what skills we are going to need to hire and/or develop. However, the chart in and of itself isn’t yet enough. We now need to turn this research into something actionable.
Data Science Capabilities Spider Charts
Our next step was to lay Spider Charts on top of the Data Science Capabilities Venn Diagram. We can then use the Venn Diagrams not only to assess the current capabilities of the data science community, but also as the basis, or benchmarks, against which we can build individualized development plans to improve the data science capabilities across all three roles.
Figure 4 shows an example of such a mapping exercise for a Junior Data Scientist whose data engineering skills we want to develop.
Figure 4: Sample Data Scientist Development Plan
Again, we can use the Data Science Capabilities Venn Diagram Spider Chart to assess the current capabilities of this individual across all three dimensions, and then put together a personalized training program or curriculum to advance them along the different dimensions.
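To make the assessment idea concrete, here is a minimal sketch of how a spider chart reading could be turned into a prioritized development plan. Everything in it is an assumption for illustration only: the 0-5 scoring scale, the `CapabilityScore` and `development_gaps` names, the use of the three roles as dimensions, and the sample scores are hypothetical and do not come from Figure 4.

```python
from dataclasses import dataclass


@dataclass
class CapabilityScore:
    """One axis of a spider chart (hypothetical 0-5 scale)."""
    dimension: str
    current: int  # assessed level today
    target: int   # level the development plan aims for


def development_gaps(scores):
    """Return (dimension, gap) pairs for axes below target, largest gap first."""
    gaps = [
        (s.dimension, s.target - s.current)
        for s in scores
        if s.target > s.current
    ]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


# Hypothetical assessment of a Junior Data Scientist; the numbers are invented.
junior_data_scientist = [
    CapabilityScore("Data Science", current=3, target=4),
    CapabilityScore("Data Engineering", current=1, target=3),
    CapabilityScore("Business Stakeholder", current=2, target=2),
]

for dimension, gap in development_gaps(junior_data_scientist):
    print(f"{dimension}: close a gap of {gap} level(s)")
```

Sorting by gap size is just one possible way to prioritize; a real curriculum would likely also weigh which skills matter most for the individual's next role.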
Figure 5 shows another example of using the Spider Chart to improve the data science and data engineering awareness of a Business Stakeholder.
Figure 5: Sample Business Stakeholder Data Science Development Plan
While our goal is not to turn our Business Stakeholders into data scientists, we do want to train our Business Stakeholders to “Think like a data scientist”. That way the Business Stakeholders can understand how best to collaborate with a data scientist and a data engineer to uncover the customer, product, service and operational insights that will drive business success. See the blog “Refined Thinking like a Data Scientist Series” for more details on creating “Citizens of Data Science.”
The next step will be to refine the gradients that differentiate junior from senior from master data scientists and data engineers, and then create the curriculum and content to get there. Hello Global Learning team!
Building Your Data Science Team Summary
Data science is a team sport composed of Data Engineers, Data Scientists and Business Stakeholders. And just as a baseball team can’t function effectively with only shortstops and catchers, a data science initiative MUST clearly articulate the roles, responsibilities and expectations of the Data Engineers, Data Scientists and Business Stakeholders.
If the goal of your organization is to become more effective at leveraging data and analytics to power your business models and drive digital transformation, you can’t win that game with a team full of pitchers. Right, Alec?
- What’s the Difference Between Data Integration and Data Engineering?
- Great Data Scientists Don’t Just Think Outside the Box, They Redefine the Box
- Identifying Variables That Might Be Better Predictors