A few weeks ago I asked my LinkedIn connections what a typical day in the life of a data scientist looks like for them. The response was genuinely overwhelming! Sure, no two data scientist roles are the same, and that's the reason for the inquiry. So many aspiring data scientists are interested in knowing what those on the other side keep themselves busy with all day, so I thought that having a few connections provide their insight might be a useful endeavor.
What follows is yet another round of some of the great feedback I received via email and LinkedIn messages from those who were interested in providing a few paragraphs on their daily professional tasks. The short daily summaries are presented in full and without edits, allowing the quotes to speak for themselves.
If you are interested in the first two posts of this series, you can find them here:
- A Day in the Life of a Data Scientist (Part 1)
- Another Day in the Life of a Data Scientist (Part 2)
Leila Afzali is a Data Scientist and a Statistician at Healthgrades, and is based in the Denver, Colorado area.
Being a Data Scientist goes beyond the technical details of writing code or implementing algorithms. Fundamentally, our role is to solve real world problems, which means understanding the problems people and the company are facing.
Having the ability to adapt and embrace the unexpected with an open mind is key to being a Data Scientist. You should anticipate projects not working as expected, being put on hold, or being dropped altogether as business needs change. Rather than viewing these events as failures, a good Data Scientist embraces these as opportunities to learn and develop new skills. In this field, opportunities like these are endless. Here is what a typical day looks like.
Today was a typical day. I arrived in anticipation of my weekly meeting with the Vice President of Data Science. I provide a brief overview of the progress on current projects and then address pending issues. Today, I highlighted a few forecasting issues with time series models intended to impute metric values when there are issues tracking traffic on our site or when the site is down.
Mornings are the fun part of the day where my time is spent on my projects. The initial stages of most projects involve a good amount of cleaning and transforming data, the rest being research and development primarily focused in statistics and software development. Today I spent the rest of my morning taking insights from our meeting and additional research on time series analysis to recognize the revisions required to resolve forecasting issues and proceed with next steps.
I allocate some time in the afternoon to meet with product teams and other departments to understand current challenges and seek out new projects where I can apply my expertise. This can involve evaluating the effectiveness of products, problems with data, and suggestions on next steps. Being involved in these conversations allows me to do my job more effectively by minimizing the number and complexity of issues down the road.
I try to devote the final hours of the day to keeping myself current and learning new skills in the field of Data Science, though in today’s case I had to make some time for a last-minute team meeting, where we were notified of changes in business needs, resulting in one of my projects being placed on indefinite hold. As a Data Scientist, it is essential to have the ability to adapt and embrace the unexpected, and to be ready to take on the next task with an open mind. You should anticipate projects not working as expected, being put on hold, or being dropped altogether as business needs change. Rather than viewing these events as failures, a good Data Scientist embraces these as opportunities to learn and develop new skills. In this field, opportunities like these are endless.
Matt Dancho is Founder of Business Science, a consultancy firm which applies data science to drive ROI and make informed decisions. He is based in State College, Pennsylvania.
For those that don’t know me, I’m Matt, Founder of Business Science, a consultancy that helps organizations apply data science to drive ROI and make better decisions. I have a rather peculiar daily schedule, but here’s some insight into what I do and why I do it. Every day starts the same - I wake up, eat a banana, drink a glass of water, and get a cup of coffee (Gevalia or Starbucks). The banana and water started years ago to prevent cramping during sports and has since turned into a habit over time. The coffee is my jet fuel that helps me get in the “zone”.
As soon as I’m ready, which can be as early as 4:30AM or as late as 8AM, I begin to work. My start is always the same. I create my to-do list. I review my to-do list against my goals. I ask myself which items help me achieve my goals. Those that do get prioritized to the top of the list. Those that don’t are either removed or pushed to the bottom. Then I begin.
My day can go in a number of different directions, but generally it’s split by my three main areas of interest: consulting, educating, and creating. Consulting runs the business, so it usually gets priority. I’ll schedule calls with clients, work on projects, do preparation for meetings, and so on. The preparation is not super thrilling, but the projects become very exciting when we help a client solve a complex problem.
Education is my second focus. I spend a lot of time in two areas: writing and building our University. Our blog is where Business Science got its start, identifying cool topics that matter and showing how data scientists can use their skills to solve these challenges. I love the process - finding an interesting topic, researching it to expose details that often go overlooked, and implementing cutting-edge technology. People are often surprised when they find out an article can take 2 or 3 weeks to put together, but I spend that much time off-and-on because of the research and my own personal learning.
My new focus is Business Science University, an online education platform that takes the blog to the next level. I work with my team creating the storyline, developing the problem, implementing the data science, and creating interactive web applications. My team is great. They are super talented in many different skills, and they are all striving toward one mission: to educate. The culmination of our hard work is coming soon. We are preparing our first two courses as we speak, and I couldn’t be more impressed with the direction we are going.
The third and final area of focus is creating. We create tools and software to help data scientists do great things, and it’s our way of giving back to the community. My software manager, Davis, has since taken over responsibilities, and he’s taking this area to a new level. Together we’ve created four amazing, open source software packages for R: tidyquant, timetk, sweep and tibbletime. They are all packages that we built to help us perform analytics, and we open sourced them so you can use the same tools!
I hope this gives you a better understanding of what I do and why I do it. I’m really passionate about data science. The crazy thing is there’s a lot of untapped potential just waiting to be found. If you’re on the fence, jump in. Last, I love feedback, so feel free to connect with or contact me.
Anton Prokopyev is a Data Scientist Consultant at The World Bank, located in Washington, D.C.
The World Bank has been a data-driven organization since its inception. It is a recognized leader in production of data and cross-country indicators, which are ultimately used to unlock growth, policy improvements, and research in developing economies.
I am happy to be a part of the Big Data Program here at the Bank. It is a small, tightly knit team with distinct cross-cutting functionality. One day we may be working on monitoring and predicting electricity outages, the next one would be about sourcing data from our partners to measure employment. Things are developing rapidly, and it is important to be able to handle a wide variety of work.
It is the weekly cycles that define our work, which is why it feels very non-linear to me. I arrive at 9 am to either follow up on important emails or dedicate time to technical projects. In either case, this lasts until noon. Depending on the scheduling, I proceed to meetings with other teams, our internal clients. We serve them to enable data science capabilities within their units. Many are economists and have worked with data their entire careers. We present new methods and sources of data to add to the traditional survey and other toolboxes. The meetings sometimes require physical presence, travel, or using Webex to connect with country offices. Then it is time to grab lunch in our cafeteria, with most of the world’s cuisines to choose from.
The late afternoons are for the technical work. Ultimately, this is when I get most of my R&D done. Given the nature of our projects, we have to be able to build prototypes in different domains and using a variety of tools. A successful pilot project in country A may be scaled up to a few more next year, and potentially globally too. My background has been in text analytics and GIS. However, many other fields have come up since I started. A particularly interesting one was network science and graph theory.
I like to end the day by going to our fitness center. So far I have been on a streak going to instructor-led classes. Before a week’s end, I try to go to the sauna. This is the best part of the gym, especially during the winter.
Shubhadeep Roychowdhury is not a data scientist! He is Senior Data Engineer at Kpler, located in Paris, France, and his different yet related job title brings some professional diversity to the conversation.
My day at Kpler, Paris starts at 9:30, and at first, I usually go over all the systems (especially some upstream data collection and filtering pipelines) and ensure that everything is working properly. Although we have automated alerts and monitoring in place, checking manually from time to time and looking into the granular stats helps a lot to identify potential bottlenecks and issues and to create game plans to tackle them. We also collect and analyze logs from all our systems in great detail, and I often look into them to find odd corner cases that we never expected and thus did not cover while writing the unit tests. Some mornings also involve meetings, although we normally have a light meeting culture.
We are in the energy market, and we track thousands of large seafaring vessels containing expensive cargoes with precision and in real time. As a senior data engineer and a member of the core team, I have already worked on creating a scalable and easily accessible data pipeline using Amazon S3, Redis, Python and AWS Lambda (the upstream system mentioned earlier) to make this process smooth and easy. Now I am actively involved in introducing some new business objects in the system. This means I usually analyze the business problems we are facing presently and suggest approaches/algorithms for overcoming them. A good part of my day is usually spent writing complex SQL queries and trying to make sense of the data returned. I also do some quick POC/Data exploration using Python and Jupyter. And often you may find me writing 100s of lines of mission-critical Python code in our production systems as well.
We have a good code review system in place for all the PRs we open in GitHub across all our repos. Sometimes I spend time reviewing PRs from other teammates and providing my feedback to them, and reading comments others have left while reviewing any of my PRs and then either incorporating the changes that they suggested or commenting to explain or ask for more details. Task management in JIRA and other issue tracking systems is also a part of a working day.
Finally, reading about newer technologies (especially Machine Learning, as I am deeply interested in it) and papers from arXiv is also often a part of my day, to keep me up to date with what is happening and keep the learning ongoing. We also apply some ideas from them in our work sometimes.
The day usually ends around 19:00 h Paris time.