KDnuggets How to Get Stuff Done at a Data Startup

This post is a followup to how to structure data science teams, with a focus on how we get stuff done. The same principles we follow can be applied at your data startup or data science team.

By Shion Deysarkar, CEO Datafiniti.

In my last post, I told you how we structure our team at Datafiniti. Now I’m going to tell you how we get stuff done. The same principles we follow can be applied at your data startup or data science team.

I’ll break down this post into what I think are the three keys to getting stuff done: process, accountability, and communication.


Process

Each team within Datafiniti follows either a Scrum or Kanban methodology to accomplish tasks. Knowing when to apply each one is crucial to maximizing the productivity of the team. In researching these methodologies, I found these articles to be the most helpful in understanding their use cases:

  • The Scrum Guide
  • Scrum Methodology
  • The Kanban Methodology

Project-oriented teams (Data, Infrastructure, Product, and Growth) follow a Scrum approach. Delivery-oriented teams (Crawl Development, Customer Success) follow a Kanban approach. See my last post on what these teams do.

Scrum teams try to accomplish their sprint goals by the end of each sprint. Kanban teams try to reduce the delivery time of their tasks while maintaining a high level of quality.

Process for Scrum Teams
In general, each Scrum team sets goals according to a monthly sprint. At the beginning of each month, these teams will meet to determine sprint goals and identify individual tasks to accomplish these goals.

Let’s use the Data Team as an example. As a reminder, the Data Team is responsible for improving data quality and coverage within Datafiniti. A possible set of sprint goals may be:

  • Goal 1: Improve product data coverage by adding two more online retailers to the websites we crawl.
  • Goal 2: Improve our internal data validation API by adding modules for two business data attributes.
  • Goal 3: Reduce delivery time of new crawls by automatically checking for feasibility (i.e., crawlability) of new websites.

After the team sets these goals, they split them further into separate tasks. For the above goals, the tasks may be:

  • Goal 1/Task 1: Crawl discountautoparts.com
  • Goal 1/Task 2: Crawl home-electronics.com
  • Goal 2/Task 1: Implement validation module for business.name attribute
  • Goal 2/Task 2: Implement validation module for business.address attribute
  • Goal 3/Task 1: Check for robots.txt allowance
  • Goal 3/Task 2: Run test crawl and check for existence of HTTP status codes besides ‘200’

To give you a more tangible feel for the backlog, here’s a snapshot from September:

[Image: sprint backlog snapshot]

For each task, the team estimates the amount of time required to complete it. If the total time across all goals exceeds the time available for the month, the team will eliminate some tasks from the sprint.
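
To make the trimming step concrete, here is a small Python sketch under stated assumptions: the hour estimates, the priorities, and the 100-hour capacity are invented for illustration, and the greedy rule (keep tasks in priority order while they still fit) is just one plausible way to decide what to cut. The post does not say how the team actually chooses.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    estimate_hours: float
    priority: int  # lower number = more important (assumed convention)


def plan_sprint(tasks: list[Task], capacity_hours: float) -> tuple[list[Task], list[Task]]:
    """Walk tasks in priority order, keeping each one that still fits; defer the rest."""
    kept: list[Task] = []
    deferred: list[Task] = []
    remaining = capacity_hours
    for task in sorted(tasks, key=lambda t: t.priority):
        if task.estimate_hours <= remaining:
            kept.append(task)
            remaining -= task.estimate_hours
        else:
            deferred.append(task)
    return kept, deferred


# Hypothetical estimates and priorities for some of the tasks listed above.
backlog = [
    Task("Crawl discountautoparts.com", 40, priority=1),
    Task("Crawl home-electronics.com", 40, priority=2),
    Task("Validation module for business.name", 30, priority=3),
    Task("Check for robots.txt allowance", 15, priority=4),
]

kept, deferred = plan_sprint(backlog, capacity_hours=100)
print([t.name for t in kept])
print([t.name for t in deferred])
```

With these made-up numbers, the backlog totals 125 hours against 100 available, so one task is pushed out of the sprint.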

With this approach, each employee knows exactly what they need to focus on for the month. If they accomplish their individual goals, they know that month was a success. That sense of accomplishment is a big win each month and keeps the momentum going. It’s also very helpful in terms of measuring each employee’s or team’s productivity (more on this later).

Process for Kanban Teams
Kanban teams are very different. Instead of having monthly goals, they need to ship deliverables quickly while maintaining a high level of quality.

At Datafiniti, a great example of a Kanban team is the Crawl Development team. This team builds out new web crawls for Datafiniti. Each build-out goes through the following steps:

  1. Configure scraper: Implement a small JavaScript application that crawls through the website and collects specific data points from each listing.
  2. Live test crawl: Run a small test crawl to make sure the scraper works on a limited basis.
  3. Live production crawl: Set up a recurring, full-scale crawl that will crawl the entire website.
  4. Import: Set up a recurring import of scraped data into Datafiniti.
  5. Final QA: Validate data collected from the website as it appears in Datafiniti.

We try to keep the cycle time of this entire process below five days. More importantly, we want to maintain above 95% accuracy in the data. Accuracy, in this case, means all available attributes are collected and conform to specific rules. Speed and quality can be opposing goals. We spend a lot of time iterating on our process and developing new technology so that we have the best of both worlds.
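
The accuracy definition above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the attribute names, the three validation rules, and the sample records are hypothetical, since the post does not describe the actual rules. A record counts as accurate only if every expected attribute is present and passes its rule.

```python
import re
from typing import Callable

# Hypothetical attribute rules; the real rules are not described in the post.
RULES: dict[str, Callable[[str], bool]] = {
    "name": lambda v: bool(v.strip()),
    "price": lambda v: re.fullmatch(r"\d+(\.\d{2})?", v) is not None,
    "url": lambda v: v.startswith(("http://", "https://")),
}


def is_accurate(record: dict[str, str]) -> bool:
    """A record is accurate if every expected attribute exists and passes its rule."""
    return all(attr in record and rule(record[attr]) for attr, rule in RULES.items())


def accuracy(records: list[dict[str, str]]) -> float:
    """Fraction of records that are accurate; 0.0 for an empty crawl."""
    if not records:
        return 0.0
    return sum(is_accurate(r) for r in records) / len(records)


# Hypothetical scraped records, for illustration only.
sample = [
    {"name": "Brake pad", "price": "19.99", "url": "https://example.com/p/1"},
    {"name": "Oil filter", "price": "7", "url": "https://example.com/p/2"},
    {"name": "Wiper blade", "price": "N/A", "url": "https://example.com/p/3"},
    {"name": "Spark plug", "price": "3.50", "url": "https://example.com/p/4"},
]

score = accuracy(sample)
meets_target = score > 0.95
print(score, meets_target)
```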

Accountability

Accountability through Leadership
Each team in Datafiniti is led by a “Project Owner”, who is essentially a Scrum master. Each Project Owner isn’t so much a manager as a guide for the team. I try to keep the hierarchy pretty flat, but I do hold each Project Owner accountable for meeting sprint goals on time.

In turn, the Project Owner holds each team member accountable. Typically, Project Owners ask team members to post daily updates on each of their tasks. If things are going well, there’s an update on everything currently “in progress” each day.

Accountability through Measurement
Measurement is also a key part of accountability. For Scrum teams, we track how reliably the team is delivering sprint goals via burndown charts.

[Image: burndown chart]
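
For readers unfamiliar with burndown charts, the underlying numbers are simple: the work remaining after each day, compared against a straight line from the sprint total down to zero. The Python sketch below computes both series. The 100-hour total and the daily figures are hypothetical, and a real sprint would span a month rather than four days.

```python
def burndown(total_hours: float, completed_per_day: list[float]) -> list[float]:
    """Work remaining at the start and after each day of the sprint."""
    remaining = [total_hours]
    for done in completed_per_day:
        remaining.append(max(remaining[-1] - done, 0))
    return remaining


def ideal_line(total_hours: float, days: int) -> list[float]:
    """Straight-line burndown from total_hours on day 0 to zero on the last day."""
    return [total_hours * (1 - day / days) for day in range(days + 1)]


# Hypothetical sprint: 100 estimated hours, with daily completed hours below.
actual = burndown(100, [10, 0, 30, 20])
ideal = ideal_line(100, 4)
behind_schedule = actual[-1] > ideal[-1]
print(actual, ideal, behind_schedule)
```

When the actual series sits above the ideal line at the end of the sprint, the team did not deliver everything it committed to.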

For Kanban, we track cycle times via a control chart.

[Image: control chart]
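
As a rough illustration of what a control chart summarizes, the Python sketch below computes a center line and control limits from a list of cycle times. The mean plus or minus three standard deviations is a common convention for control limits, not necessarily what Datafiniti or its tooling uses, and the dates and cycle times are hypothetical. The five-day target comes from the Kanban process described earlier.

```python
from datetime import date
from statistics import mean, pstdev


def cycle_time_days(started: date, finished: date) -> int:
    """Whole days between starting a crawl build-out and finishing final QA."""
    return (finished - started).days


def control_limits(cycle_times: list[int], sigmas: float = 3.0) -> tuple[float, float, float]:
    """Return (lower limit, center line, upper limit) for the given cycle times."""
    center = mean(cycle_times)
    spread = pstdev(cycle_times)
    lower = max(center - sigmas * spread, 0.0)  # a cycle time cannot be negative
    upper = center + sigmas * spread
    return lower, center, upper


# Hypothetical cycle times (in days) for five completed crawl build-outs.
first = cycle_time_days(date(2024, 1, 1), date(2024, 1, 4))
cycle_times = [first, 4, 5, 4, 9]

lower, center, upper = control_limits(cycle_times)
TARGET_DAYS = 5  # the post's target is to keep cycle time below five days
over_target = [t for t in cycle_times if t > TARGET_DAYS]
out_of_control = [t for t in cycle_times if t < lower or t > upper]
print(lower, center, upper, over_target, out_of_control)
```

In this made-up data, one build-out misses the five-day target but still falls inside the control limits, which suggests a slow delivery rather than a process that has changed.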

If a Scrum team is reliably meeting their goals each month, we can generally assume they’re performing well. Similarly, if a Kanban team is delivering high-quality work (e.g., web crawls) within a reasonable amount of time, we can assume they’re also performing well. That said, it’s important that Project Owners and executive management regularly check in to ensure the team is meeting company goals.

Communication

Project Owners talk directly with each other for high-level communication. For example, if a customer has concerns about data quality, the Customer Success Project Owner would communicate that issue to the Data Project Owner.

Any updates on issues or projects should be logged directly on the relevant card in JIRA, which provides documentation for anyone else on the team. We avoid using email, Slack, or in-person communication for activity logging as much as possible.

Beyond the above-mentioned structured communication, I encourage everyone in our team to discuss issues, problems, and even successes in a group-wide setting. This doesn’t always happen though, and private communication can happen frequently — it’s part of human nature, I suppose. One of my big focuses right now is how to foster more public discussion. I’m a big believer that more public discussion is a “wheel greaser” — it eliminates unseen friction within a team.

In Summary

If you made it this far, I applaud you! The details of team management can be dry, but the effects of bad team management can be subtle and painful at the same time. It’s usually only after we’ve improved something about our process, accountability, or communication that we realize how slowly we were operating before. That lost time hurts, but you can only take what you’ve learned and move forward. Hopefully, this post gives you a head start on improving your data team’s productivity!

Bio: Shion Deysarkar is CEO & Founder of Datafiniti. He holds a BS from Carnegie Mellon University and an MBA from Rice University - Jesse H. Jones Graduate School of Management. Datafiniti offers a comprehensive set of databases on product, business, and property data sourced from the web.
