1 Introduction
1.1 DATA 793 Data Science Practicum
1.1.1 Purpose
As the capstone experience in the MS program in Data Science, DATA 793 provides a unique opportunity for a student to collaborate with a client from academia, government, or various industries and institutions, working on real-world problems in the realm of Data Science. The goal is for the student to demonstrate and strengthen their competencies as a professional data scientist while creating a project solution they will be proud to share as part of their portfolio of data science accomplishments.
1.1.2 Description
This is a student project-oriented course. A key aspect is for each student to identify a client/sponsor for a project of interest and then collaborate with them in developing a solution to their problem.
There will be some lectures and presentations, but the bulk of class time will be spent hearing from each student as they share the status of their projects and get feedback from their peers.
Student learning in this course depends upon peer engagement and each student's willingness to engage at a meaningful level across the diversity of projects and their use of data science knowledge, methods, and tools.
1.1.3 Learning Outcomes
Upon successful completion of this course, a student will be able to demonstrate competence in developing solutions requiring diverse data science methods.
Specifically, the student should be able to:
- Research and conceptualize a problem, define the scope of the project and break it down into workable components.
- Gather data necessary to support the solution and clean, shape, explore and analyze the data as required to support the solution.
- Manage a project effectively by identifying key tasks, projecting and monitoring resources, assessing performance and risk, and ensuring all deadlines are met.
- Establish a positive working relationship with a client sponsor by understanding the client's needs and deadlines, responding quickly to client requests, and managing their expectations.
- Deliver an organized, articulate presentation demonstrating the effective application of data science methods and tools and communicating results.
- Prepare a professional, cleanly written document that encapsulates reproducible results on the topic.
1.1.4 Prerequisites
This course requires successful completion of DATA-613 Data Science as a prerequisite.
1.1.5 Required Resources
There is no specific text for this course.
1.2 Course Themes
This course is structured around three themes:
- Project Solution
- Project Management
- Professional Development
Project Solution is about following the data science life cycle to create a solution to a client’s problem. While each client project will have a different emphasis, students are expected to go through the steps of the life cycle at least through communicating their results.
Project Management requires students to formalize the collaboration with the client sponsor in a Performance Work Statement (PWS) and then follow a project management life cycle to plan and execute the project so that it meets the requirements of the PWS.
The Professional Development theme covers course activities that prepare one for a career as a professional data scientist beyond gaining technical competency. These include job searches, resume development, interview preparation, maintaining an online presence, and developing a professional network that includes mentors.
1.3 A Data Science Life Cycle
Responsible Data Science depends upon following a repeatable process or life cycle for analysis and solution development.
There are many different life cycles and frameworks in the community. Some are tailored to one aspect of data science. Others attempt to include all aspects of data science in a single framework.
This course will use the following life cycle as a frame of reference given its focus on answering a question of interest.

- Figure 1.1 portrays eight steps for a Data Science life cycle that start with someone asking a question and end with observing the outcomes of the solution.
- Some might be tempted to stop at an earlier step, but a data scientist knows that every analysis and solution is based on assumptions, explicit and implicit.
- Observing outcomes is a responsible approach to checking whether those assumptions were valid.

- Figure 1.2 provides additional details on the types of activities that can occur within each step.
- It also highlights that while Figure 1.1 shows a nice, circular process that is always making progress, responsible data science often takes one step forward and then two steps backwards.
- Feedback from the activities at a step might indicate one should back up and repeat an earlier step.
- As an example, if modeling and analysis show that the data is not as robust as desired, or reveal sampling bias that will make the results less useful for the question, one may need to back up to step 3 to get more data or even to step 1 to get guidance on reframing the question of interest.

- As implementation occurs, it will usually generate new data that could support future analysis.
- Responsible data science will use this new data to assess assumptions made in building the solutions and whether there is disparate impact on the populations affected by the implementation.
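As a rough illustration of checking for disparate impact with post-implementation data, the sketch below compares the rate of favorable outcomes across groups. It is only one simple screening approach, not a method prescribed by this course; the function names, the group labels, and the records are all hypothetical and made up for the example.

```python
from collections import defaultdict


def selection_rates(records):
    """Return the share of favorable outcomes for each group.

    `records` is an iterable of (group, outcome) pairs, where outcome is
    True for a favorable result. Both fields are hypothetical.
    """
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        if outcome:
            favorable[group] += 1
    return {group: favorable[group] / totals[group] for group in totals}


def disparate_impact_ratios(records):
    """Divide each group's selection rate by the highest group's rate."""
    rates = selection_rates(records)
    best = max(rates.values())
    if best == 0:
        raise ValueError("no favorable outcomes in any group")
    return {group: rate / best for group, rate in rates.items()}


# Made-up outcomes observed after a solution was implemented.
observed = (
    [("A", True)] * 8 + [("A", False)] * 2
    + [("B", True)] * 5 + [("B", False)] * 5
)
ratios = disparate_impact_ratios(observed)
for group, ratio in sorted(ratios.items()):
    print(f"group {group}: ratio {ratio:.3f}")
```

A ratio well below 1 for a group is a signal to revisit the assumptions behind the solution, not proof of unfairness on its own; what counts as "well below" depends on the context and any applicable guidelines.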

- Figure 1.4 shows that responsible data science is not a single step in the life cycle but underlies activities at each step in the life cycle.
- The top of the figure identifies several considerations for shaping the analysis or solution at each step to ensure the analysis or solution complies with laws and ethical guidelines while minimizing risks to fairness, privacy, and confidentiality of data and people.
- The bottom of the figure identifies attributes for the activities at each step to ensure the work aligns with principles for responsible data science.
- We will address aspects of responsible data science in more detail throughout the course.

- Figure 1.5 shows that the data science community is here to help you as you work through each step in the data science life cycle.
- The data science community is not a single organization but an ecosystem of professional organizations, online forums, individual mentors, peers, and others who can help you.
- Engaging in the community helps your professional development and helps you contribute back to the community by helping others.