When I introduce myself as a data scientist, I often get questions like “What’s the difference between that and machine learning?” or “Does that mean you work on artificial intelligence?” I’ve responded enough times that my answer easily qualifies for my “rule of three”:
When you’ve written the same code 3 times, write a function
When you’ve given the same in-person advice 3 times, write a blog post
— David Robinson (@drob) November 9, 2017
The fields do have a great deal of overlap, and there’s enough hype around each of them that the choice can feel like a matter of marketing. But they’re not interchangeable: most professionals in these fields have an intuitive understanding of how particular work could be classified as data science, machine learning, or artificial intelligence, even if it’s difficult to put into words.
So in this post, I’m proposing an oversimplified definition of the difference between the three fields:
- Data science produces insights
- Machine learning produces predictions
- Artificial intelligence produces actions
To be clear, this isn’t a sufficient qualification: not everything that fits each definition is a part of that field. (A fortune teller makes predictions, but we’d never say that they’re doing machine learning!) These also aren’t a good way of determining someone’s role or job title (“Am I a data scientist?”), which is a matter of focus and experience. (This is true of any job description: I write as part of my job, but I’m not a professional writer.)
But I think this definition is a useful way to distinguish the three types of work, and to avoid sounding silly when you’re talking about it. It’s worth noting that I’m taking a descriptivist rather than a prescriptivist approach: I’m not interested in what these terms “should mean”, but rather how people in the field typically use them.
Data science produces insights
Data science is distinguished from the other two fields because its goal is an especially human one: to gain insight and understanding. Jeff Leek has an excellent definition of the types of insights that data science can achieve, including descriptive (“the average client has a 70% chance of renewing”), exploratory (“different salespeople have different rates of renewal”), and causal (“a randomized experiment shows that customers assigned to Alice are more likely to renew than those assigned to Bob”).
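The three kinds of insight can be made concrete with a minimal Python sketch. The eight customer records, their layout, and the helper names (`rate`, `permutation_p_value`) are hypothetical and chosen only for illustration; no real data is involved. The causal step uses an exact permutation test, which is one reasonable choice among many rather than a method the post prescribes, and it is only meaningful if customers really were randomly assigned to salespeople.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (customer_id, salesperson, renewed).
# Assumption: customers were randomly assigned to Alice or Bob,
# which is what would license the causal reading below.
customers = [
    (1, "Alice", True), (2, "Alice", True), (3, "Alice", False), (4, "Alice", True),
    (5, "Bob", False), (6, "Bob", True), (7, "Bob", False), (8, "Bob", False),
]


def rate(outcomes):
    """Fraction of True values in a list of renewal outcomes."""
    return sum(outcomes) / len(outcomes)


# Descriptive: the overall renewal rate.
overall = rate([renewed for _, _, renewed in customers])

# Exploratory: renewal outcomes and rates grouped by salesperson.
outcomes_by_person = defaultdict(list)
for _, person, renewed in customers:
    outcomes_by_person[person].append(renewed)
rates = {person: rate(outs) for person, outs in outcomes_by_person.items()}


def permutation_p_value(a, b):
    """Exact two-sided permutation test for a difference in rates.

    Enumerates every way to split the pooled outcomes into groups of the
    original sizes and returns the share of splits whose absolute rate
    difference is at least as large as the observed one.
    """
    pooled = a + b
    observed = abs(rate(a) - rate(b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(rate(group_a) - rate(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Causal (only under random assignment): could the Alice/Bob gap be chance?
p_value = permutation_p_value(outcomes_by_person["Alice"], outcomes_by_person["Bob"])

print(f"overall: {overall:.0%}")
for person, r in sorted(rates.items()):
    print(f"{person}: {r:.0%}")
print(f"permutation p-value: {p_value:.2f}")
```

With only eight made-up customers, the 75% versus 25% gap gives a p-value of about 0.49, so this toy data alone would not support the causal claim; the point is the shape of each question, not the numbers.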
Again, not everything that produces insights qualifies as data science (the classic definition of data science is that it involves a combination of statistics, software engineering, and domain expertise). But we can use this definition to distinguish it from ML and AI. The main distinction is that in data science there’s always a human in the loop: someone is understanding the insight, seeing the figure, or benefiting from the conclusion. It would make no sense to say “Our chess-playing algorithm uses data science to choose its next move” or “Google Maps uses data science to recommend driving directions.”
This definition of data science thus emphasizes:
- Statistical inference
- Data visualization
- Experiment design
- Domain knowledge
Data scientists might use simple tools: they could report percentages and make line graphs based on SQL queries. They could also use very complex methods: they might work with distributed data stores to analyze trillions of records, develop cutting-edge statistical techniques, and build interactive visualizations. Whatever they use, the goal is to gain a better understanding of their data.
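On the simple end, a percentage suitable for a line graph can come straight out of a SQL query. This sketch uses Python's built-in `sqlite3` module with an in-memory database; the `renewals` table, its columns, and its rows are invented for illustration, and the plotting step itself is left out.

```python
import sqlite3

# Hypothetical table of renewal decisions; the schema and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE renewals (month TEXT, renewed INTEGER)")
conn.executemany(
    "INSERT INTO renewals VALUES (?, ?)",
    [
        ("2017-09", 1), ("2017-09", 0), ("2017-09", 1), ("2017-09", 1),
        ("2017-10", 1), ("2017-10", 0), ("2017-10", 0), ("2017-10", 1),
        ("2017-11", 1), ("2017-11", 1), ("2017-11", 1), ("2017-11", 0),
    ],
)

# Percentage of customers who renewed, per month: the kind of series
# that would then be drawn as a line graph.
query = """
    SELECT month, 100.0 * AVG(renewed) AS pct_renewed
    FROM renewals
    GROUP BY month
    ORDER BY month
"""
monthly = conn.execute(query).fetchall()
for month, pct in monthly:
    print(f"{month}: {pct:.1f}%")
conn.close()
```

The same question could be answered at a much larger scale with a distributed store, but the output is the same kind of thing: a summary meant for a person to read.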