- Classification (e.g., email spam or not spam)
- Recommendations (e.g., suggested Amazon products)
- Prediction (e.g., expense and revenue forecasts)
- Media Content Recognition (e.g., video, image, text, audio)
- Scoring/ranking (e.g., credit score)
- Pattern / Anomaly detection (e.g., payment fraud detection)
Each of these is intended to address a specific goal and/or solve a specific problem.
The Data Science Process
The data science approach may vary based on project type and typically involves the following phases:
- Data gathering
- Data Discovery and goal clarification (ask the right questions)
- Data Cleaning (munging/wrangling)
- Exploratory data analysis (EDA)
- Choosing a model and algorithm
- Applying data science techniques (e.g., machine learning, statistical modeling)
- Measuring and improving results (validation and tuning)
- Presenting final results to determine next steps
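To make the phases above concrete, here is a minimal sketch in Python that walks a toy dataset through them. Everything in it is an illustrative assumption rather than part of any particular project: the ad-spend and revenue figures are made up, the function names are hypothetical, and a hand-rolled least-squares line stands in for whatever model a real team would choose.

```python
import statistics


def gather():
    # Data gathering: hypothetical (ad_spend, revenue) records; None marks a missing value.
    return [(1.0, 3.1), (2.0, 4.9), (3.0, None), (4.0, 9.2), (5.0, 10.8),
            (6.0, 13.1), (None, 7.0), (7.0, 15.0), (8.0, 17.1)]


def clean(rows):
    # Data cleaning: drop any record with a missing field.
    return [(x, y) for x, y in rows if x is not None and y is not None]


def explore(rows):
    # Exploratory data analysis: a few summary statistics.
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    return {"n": len(rows), "mean_x": statistics.mean(xs), "mean_y": statistics.mean(ys)}


def fit_line(rows):
    # Model choice and application: ordinary least squares for y = slope * x + intercept.
    mean_x = statistics.mean(x for x, _ in rows)
    mean_y = statistics.mean(y for _, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, rows):
    # Measuring results: average absolute prediction error on the given rows.
    slope, intercept = model
    return statistics.mean(abs(y - (slope * x + intercept)) for x, y in rows)


def run_pipeline():
    rows = clean(gather())
    summary = explore(rows)
    train, holdout = rows[:-2], rows[-2:]  # keep the last two records for validation
    model = fit_line(train)
    return {"summary": summary, "model": model, "holdout_mae": mean_abs_error(model, holdout)}


if __name__ == "__main__":
    # Presenting results: print the summary, fitted model, and validation error.
    print(run_pipeline())
```

Each function maps to one phase, so a team can revisit a single phase (say, the cleaning rules) without touching the others.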
That’s the process in a nutshell. So how does Agile improve this process?
Facilitating The Data Science Process with Agile Practices
Let’s go over three popular Agile practices that appear to be most suitable for Data Science projects.
Scrum is the most widely used Agile framework. According to a 2015 survey by the Scrum Alliance, 95 percent of Agile organizations use Scrum as their development approach. Scrum is based on a small number of best practices and rules such as
- roles (e.g., Developer, Product Owner, Scrum Master),
- events (sprint planning, daily scrum, sprint review & retrospective) and
- artifacts (product backlog, sprint backlog, product increment)
Kanban is Japanese for “visual signal” or “card.” Toyota line workers used a kanban (i.e., an actual card) to signal steps in their manufacturing process. Today, Kanban is a popular framework among software teams practicing Agile development. Physical Kanban boards use sticky notes on a whiteboard; online Kanban boards carry the whiteboard metaphor over into software.
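As a rough illustration of the whiteboard metaphor in software, the sketch below models a board as named columns holding cards. The class name `KanbanBoard`, the three column names, and the card titles are hypothetical choices for this example and are not taken from any specific Kanban tool.

```python
from dataclasses import dataclass, field


def _default_columns():
    # Assumed three-column layout; real boards often define their own columns.
    return {"To Do": [], "In Progress": [], "Done": []}


@dataclass
class KanbanBoard:
    columns: dict = field(default_factory=_default_columns)

    def add_card(self, title):
        # New work always starts in the first column.
        self.columns["To Do"].append(title)

    def move_card(self, title, source, target):
        # list.remove raises ValueError if the card is not in the source column.
        self.columns[source].remove(title)
        self.columns[target].append(title)


board = KanbanBoard()
board.add_card("Clean payment data")
board.add_card("Run EDA on fraud labels")
board.move_card("Clean payment data", "To Do", "In Progress")
```

After these calls, "Clean payment data" sits in the "In Progress" column while the EDA card is still waiting in "To Do", which is the same state a sticky note would show on a physical board.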
Extreme Programming (XP) is a method for improving software quality and responsiveness to changing requirements. Example XP practices include
- Test-driven development (write a test before you write code to fulfill that test)
- Refactoring (small code changes that make the code easier to understand and maintain)
- Pair programming (one developer, the driver, writes code while the other, the observer or navigator, reviews it as it is written)
- Continuous Integration (frequent code check-ins and merging)
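Test-driven development is the easiest of these practices to show in a few lines. The sketch below applies it to a small data-cleaning helper: the two tests are written first, and `drop_missing` is then written to satisfy them. The helper name, the `amount` field, and the sample rows are hypothetical and chosen only for illustration.

```python
# Step 1: write the tests first. They describe the behavior we want
# before any implementation exists.
def test_drop_missing_removes_rows_with_none():
    rows = [{"amount": 10.0}, {"amount": None}, {"amount": 2.5}]
    assert drop_missing(rows, "amount") == [{"amount": 10.0}, {"amount": 2.5}]


def test_drop_missing_keeps_zero_values():
    # Zero is a valid measurement and must not be treated as missing.
    rows = [{"amount": 0.0}]
    assert drop_missing(rows, "amount") == [{"amount": 0.0}]


# Step 2: write just enough code to make the tests pass.
def drop_missing(rows, field_name):
    # Keep only rows where the field is present and not None.
    return [row for row in rows if row.get(field_name) is not None]
```

A test runner such as pytest would collect and run the two `test_` functions. The second test guards against a common shortcut, filtering on truthiness, which would silently discard legitimate zero values.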
Review these Agile practices with your team to determine how they can be built into your current data science process to improve the quality of deliverables. For example, you may want to introduce Kanban and XP practices first, before changing an existing sequential approach to iterative sprints, to minimize disruption to the current process. Identifying new Agile software tools and assigning Scrum roles will require an open discussion about the change readiness of the entire data science team.