Data Science Project Management Framework – CRISP-DM
A problem solver, integrator, and simplifier with a B.E. degree in Computer Science and 17 years of IT experience in databases, cloud, and project management. I am currently pursuing a postgraduate (PG) program in Data Science and would be happy to connect with anyone who wants to learn more about the field.
Data is the new oil, the new currency. Data is the king, or at least the kingmaker.
Data Science, or Business Analytics, has applications in almost every area. More and more companies are adopting a data-driven approach to decision making. The number of Data Science projects is growing rapidly, and project managers must understand Data Science project methodologies. CRISP-DM is one such framework; it gives project managers a structured approach to managing Data Science projects.
CRISP-DM (CRoss-Industry Standard Process for Data Mining) was developed in the late 1990s by an industry consortium that included DaimlerChrysler, SPSS, and NCR; SPSS was later acquired by IBM. It is a popular, industry-neutral, and application-neutral framework that can be applied to most Data Science projects.
According to CRISP-DM, a Data Science project has a life cycle consisting of six adaptive, iterative phases.
CRISP-DM – The Six Phases
- Problem/Business/Research Understanding Phase
  - First, state the project objectives and requirements.
  - Then translate these objectives and restrictions into a data mining problem definition.
  - Prepare a strategy for achieving these objectives.
- Data Understanding Phase
  - Collect the data.
  - Familiarize yourself with the data and discover initial insights.
  - Evaluate the quality of the data.
  - Finally, if desired, select an interesting subset of the data that may contain actionable patterns.
- Data Preparation Phase
  - This labor-intensive phase covers all aspects of preparing the final data set, which is used in the subsequent phases.
  - Select the cases and variables you want to analyze.
  - Perform data transformations.
  - Clean the raw data so that it is ready for the modeling tools.
- Modeling Phase
  - Select and apply appropriate modeling techniques.
  - Calibrate model settings to optimize the results.
  - Loop back to the Data Preparation Phase when needed to bring the data into the form the model requires.
- Evaluation Phase
  - Evaluate the model generated in the Modeling Phase for effectiveness, quality, and cost.
  - Evaluate whether the model achieves the project objectives stated in phase 1.
  - Review whether all important factors have been considered and nothing has been overlooked.
  - Finally, decide which model to choose and whether to proceed to deployment, iterate further, close the project, or start a new project.
- Deployment Phase
  - Creating a model does not mean the project is complete; the model must be put to use to deliver business results and ROI.
  - Deploy the model in-house or on a cloud service such as AWS or Azure.
  - Monitor the model's quality and performance, and make corrections as required.
  - Report the overall performance and quality of the model.
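Because the phases are iterative rather than strictly sequential, it can help to picture the life cycle as a set of allowed moves between phases. The following is only a minimal Python sketch of that idea, not part of CRISP-DM itself: the names `Phase`, `ALLOWED_MOVES`, and `ProjectTracker` are hypothetical. The Modeling to Data Preparation loopback comes from the list above; treating "iterate further" as a return from Evaluation to Business Understanding, and allowing a return from Data Understanding to Business Understanding, are assumptions based on the usual CRISP-DM diagram.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Allowed moves between phases (hypothetical table for illustration).
# Forward moves follow the order of the six phases. Backward moves model
# the loopbacks: Modeling -> Data Preparation is stated in the phase list;
# the other two backward moves are assumptions.
ALLOWED_MOVES = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING,
                               Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),
}


class ProjectTracker:
    """Hypothetical helper that records which phase a project is in."""

    def __init__(self):
        self.current = Phase.BUSINESS_UNDERSTANDING
        self.history = [self.current]

    def move_to(self, target):
        """Move to another phase, rejecting moves the table does not allow."""
        if target not in ALLOWED_MOVES[self.current]:
            raise ValueError(
                f"Cannot move from {self.current.name} to {target.name}"
            )
        self.current = target
        self.history.append(target)

    def loopbacks(self):
        """Count backward moves, i.e. how often the project iterated."""
        pairs = zip(self.history, self.history[1:])
        return sum(1 for prev, nxt in pairs if nxt.value < prev.value)


if __name__ == "__main__":
    tracker = ProjectTracker()
    for step in (Phase.DATA_UNDERSTANDING, Phase.DATA_PREPARATION,
                 Phase.MODELING, Phase.DATA_PREPARATION, Phase.MODELING,
                 Phase.EVALUATION, Phase.DEPLOYMENT):
        tracker.move_to(step)
    print(tracker.current.name, "after", tracker.loopbacks(), "loopback(s)")
```

A project manager could use a record like `history` to see where a project spent its iterations, for example how many times modeling sent the team back to data preparation before the model passed evaluation.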
CRISP-DM is just one of the frameworks for Data Science projects; there are many other frameworks and tools that can be very useful for project managers. I look forward to sharing more on the intersection of Data Science and project management in upcoming newsletters.
References
- Daniel T. Larose, Chantal D. Larose, Data Mining and Predictive Analytics
- Peter Chapman, Julian Clinton, Randy Kerber, Thomas Khabaza, Thomas Reinartz, Colin Shearer, Rüdiger Wirth, CRISP-DM Step-by-Step Data Mining Guide, 2000