It is no secret that data science is on the rise. While the biggest hype is arguably behind us, data scientists are in high demand and data science seems to be here to stay.
While the applications of data science are being picked up rapidly by firms in the tech industry, there seems to be a lag in other domains. That said, there is clear potential for value creation in these other areas as well. While (in my estimate) only a small portion of companies is today effectively using data science to drive decisions, most are aware of the developments and are at the very least exploring what could be in it for them.
A first difficulty to overcome is understanding what data science is. This is not the easiest thing to do, but it is essential in order to select a relevant first use case. When someone asks me what a data scientist does for a living, I generally tell them something along the lines of: a data scientist is someone who is at ease with data, programming languages, mathematics and statistics, and who is able to quickly build up knowledge of the business domain the project is situated in. He (or she) uses these skills to deliver analyses and models which in turn aid the decision making of the company. The decision-making power of a model is often very concrete and specialized. A model can, for example, aid decision making in cases such as “Is the risk of this customer churning high enough for us to take preventive action?”, “Is the probability that this transaction is fraudulent high enough to block it?” or “How much of resource X should I stock at location Y to meet the demand for the coming week?”. As you can see, the models in these examples can only give answers to very specific questions. A first take-away is therefore that a data science project will (at the very best) give you an accurate answer to a specific question. If it is able to do so, it can potentially create tremendous value. This illustrates how important it is to correctly define a data science project and, thus, also your first use case.
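To make the link between a model's output and a business decision concrete, here is a minimal Python sketch of the churn question above. The function name, the cost figures and the success rate of the preventive action are all hypothetical assumptions chosen for illustration; in a real project the model would supply the churn probability and the business would supply the other numbers.

```python
def should_take_preventive_action(
    churn_probability: float,
    customer_value: float,
    action_cost: float,
    action_success_rate: float,
) -> bool:
    """Return True when the expected gain of acting exceeds the cost of acting.

    All inputs are hypothetical: churn_probability would come from a model,
    the other three values from the business.
    """
    # Expected value saved = chance of churn * chance the action works * value kept.
    expected_gain = churn_probability * action_success_rate * customer_value
    return expected_gain > action_cost


# Illustrative numbers only (assumptions, not real data).
high_risk = should_take_preventive_action(
    0.40, customer_value=500.0, action_cost=50.0, action_success_rate=0.30
)
low_risk = should_take_preventive_action(
    0.10, customer_value=500.0, action_cost=50.0, action_success_rate=0.30
)
print(high_risk, low_risk)
```

With these made-up numbers the rule says to act on the first customer but not on the second. The point is only that the model answers one narrow question, and the business context turns that answer into a decision.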
I will not try to delve any deeper into what data science is, as whole books have been written on the topic. However, it is useful to keep in mind a sort of scale by which to measure data science projects. The two extremes of this scale are, on the one hand, (simple) descriptive analyses and, on the other hand, fully automated decision-making solutions. Generally, but not always, the former has a lower business impact than the latter.
Another aspect to consider is that data science projects are not IT projects. At the start of an IT project, the objective and feasibility are often quite clear. For a data science project, we usually cannot make an accurate estimate of the feasibility before the project starts. While the objective can be very clear (and should be), gaining a thorough understanding of the feasibility will always require an exploration phase. A clear red flag is someone who says “My model will have at least an accuracy of X percent” without exploring the actual data and building the first test models.
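One reason such a promise means little before the data has been explored is that accuracy depends heavily on how the outcomes are distributed. The minimal Python sketch below uses a hypothetical, made-up data set (1,000 transactions, 20 of them fraudulent) to show the accuracy of a "model" that never flags anything.

```python
from collections import Counter


def majority_baseline_accuracy(labels: list[int]) -> float:
    """Accuracy of a 'model' that always predicts the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical data set: 1,000 transactions, of which 20 are fraudulent (label 1).
labels = [1] * 20 + [0] * 980
baseline = majority_baseline_accuracy(labels)
print(f"Baseline accuracy: {baseline:.1%}")  # prints "Baseline accuracy: 98.0%"
```

In this invented example, 98 percent accuracy is reached without detecting a single fraudulent transaction. Whether a promised accuracy is impressive or worthless can therefore only be judged once the actual data is known.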
Below you can find a step-by-step list of things we have picked up over the years that are important when aiming for a successful first use case.
0. Understand what data science is
Make sure that you understand what data science is and what it could potentially mean for you and your business. A basic business-level understanding of data science is essential, as it will allow you to discuss potential projects in the context of their proposed value offering. There are quite a lot of books on the subject; one of the business-focused ones I can recommend is Data Science for Business. Another interesting page is the Awesome Data Science Ideas list, which has gathered quite a few business-relevant examples of data science use cases.
1. Assemble the right project team
The right team is key (isn’t it always?). The right team in this case is one that is multidisciplinary. It needs someone who has a thorough understanding of the business, someone with decision power who will make the difficult decisions when they come up, someone who is able to make the project visible within the company, someone to follow up on the project status, someone with good knowledge of the company’s data resources and, last but not least, a strong data scientist. While not every role has to be filled by a separate person (starting small can be a good thing), all roles are valuable.
The data scientist will be at the center of the team and is the one who generates the main deliverable. Therefore, it is essential that an expert-level data scientist is selected to work on this first use case, so that there is no doubt whatsoever about his (or her) skills.
2. Select the right case
For a company to gain a good understanding of the value that data science can offer, a first use case should have a clear impact on value creation or at the very least show a potential to do so.
In this phase, there are four questions which I generally use to estimate the potential success of a first use case.
- Do we expect the case to have an impact on value creation (and aid decision making)?
- Are the results translatable to other departments, so that they can stimulate engagement and excitement around data science within the company?
- Do we have the basic necessities to start working on this case (availability of domain experts within the company, the right data, etc.)?
- Is the objective of the case defined in such a way that the output of the case is actionable? In other words, will it be able to solve a business problem in the short to medium term? In the long run this correlates strongly with value creation.
3. Invest in the case
Once a case is selected we need to make sure that the data scientist has the basics worked out. This means that he needs easy access to the data as well as input from business domain experts. It is important to support him in such a way that he is able to focus most of his time on relevant exploration and modelling. Allow him to work with the tools he prefers, as he will perform best when using these.
As it is doubtful that you would have this person in-house when you are starting out on a first project, there is a good chance that you will have to look outside of your company for data science consultants. This is a landscape that is somewhat difficult to navigate. There has been a huge number of developments in the data science world in terms of both theory and tools. The big players in the consultancy world have had a hard time keeping up, and consultants working only with a specific proprietary tool often lag behind the state of the art. Too often, people with a very limited resume are presented as senior data scientists. Be critical about resumes, press for (more) detailed descriptions of the projects they’ve done and judge them based on their actual experience with data science.
Data is crucial. To be able to go for a successful first use case, make sure the data scientist has the correct data at hand. While building models on small-scale data is not impossible, the chance of mediocre results is generally higher. This is not something that you would want to risk in a first project. If the data does not exist in the right format, or does not exist at all but is essential to the case, invest in making it available through manual efforts or go for another first project.
In this phase, you might very well get the advice from someone to invest in a “Big Data Infrastructure” like Hadoop. I’ve even heard of an investment in big data infrastructure being presented as a first data science project. While big data infrastructure has some very concrete merits, it is not a showstopper if you do not have this infrastructure at the moment or simply find it too early to invest in. A data scientist is generally able to adapt rather quickly to the environment and will make do with the means he has available. While in the short run this could mean that you are not able to deploy a solution in a more stable production environment, it does not keep you from getting the first results and gaining an in-depth understanding of the feasibility and value of the project when it does scale.
4. Explore quickly and fail fast
As already mentioned, exploration is an essential part of any data science project. During this exploration phase, the data scientist tries to better understand if it is possible to find a solution to the problem at hand. If these initial explorations lead you to conclude that the resulting model will most likely have a below-par result, you should either adjust the objective based on the new knowledge that was gained or provide/invest in the correct resources which do enable you to attain the original objective.
This makes for a specific environment where failure is possible and well-founded (intermediary) conclusions should be cheered on, even if those conclusions are found to be negative.
5. Regular status updates and adjustments
Throughout the project, make sure that all parties are involved and up-to-date on the status of the project at all times. This helps increase the feeling of involvement, allows input on potential further explorations and enables shifts in focus if initial explorations give voice to concerns about feasibility.
6. Report on the findings
Give attention to success! Make a detailed report on the findings and share it in combination with a more digestible format such as a small article or poster. Share the story of what data science allowed you to do more quickly or more accurately than before.
Before sharing these results, always ask yourself “What are the new insights you gained?”, “Will new actions be taken based on these new insights?” and “Do you see a path for data science in your company?”. Make sure to include the answers to these questions in your storytelling.
In the case where the project was not a huge success (and let’s be honest, there is a risk of that happening), make sure that everyone understands why the project didn’t result in the expected output. Is it something that can be resolved in the future? Could better preparation have avoided this? Would a project with a different focus in terms of business topic result in more actionable or valuable results?
While the above steps are by no means a certain route to success, they are intended to give a few pointers based on the experience we’ve gathered over the years. Feel free to reach out if you have any comments or questions!
by Bart Smeets
dataroots is a data science consultancy company