4 Framing the Project

Published

January 7, 2026

Keywords

Requirements Analysis, User Workflows, Technical Approach, Responsible Data Science, Feasibility

4.1 Context

A Project Plan creates a framework for managing the work on the project to meet the requirements on time and within resources. It does not specify how the work will be done.

Framing the project establishes a disciplined foundation for all subsequent technical and project management decisions.

Rather than beginning with tools or methods, effective framing starts by clarifying what problem is being addressed, under what constraints, and according to what standards of evidence and responsibility.

This process begins with requirements analysis, which identifies stakeholder needs, acceptance criteria, and project constraints.
It is followed by a review of relevant literature and prior work to understand established methods, performance expectations, and known risks.
Together, these inputs inform the development of a coherent technical approach that integrates data strategy, methodologies, methods, tools, and workflows.
Once the technical approach is defined, it must be examined through a Responsible Data Science lens to assess ethical, fairness, privacy, and societal implications.
Finally, the proposed approach is subjected to a feasibility analysis to ensure it is acceptable, affordable, and deliverable within the project plan.

Taken together, these steps ensure that the project is not only technically sound, but also responsible, realistic, and aligned with stakeholder expectations.

Note

Framing the project can be placed in the context of a higher-level life cycle of project selection and execution, whose results inform the current project’s requirements and constraints.

Most projects start from a germ of an idea, an intuition or inspiration of how one might answer a question, create new knowledge, or solve a known problem.

Before a project can be executed, however, a decision must be made to allocate resources to it.

Because multiple potential projects often compete for the same limited resources, this decision typically involves some level of screening.

For small projects requiring few resources, the screening may be informal and completed quickly.
For larger projects requiring substantial resources, the screening process may involve a formal cost-benefit analysis (CBA) or return on investment (ROI) assessment to determine whether the project should be undertaken at all; in some cases, this analysis may itself constitute a separate project.
The US Defense Advanced Research Projects Agency (DARPA) and many other organizations use a structured approach such as the Heilmeier Catechism as part of their project screening process.

Once the decision has been made to initiate a project, the outcomes of the screening, whether a brief discussion or a detailed CBA, shape the project’s requirements and guidance.

These are typically formalized in a Performance Work Statement (PWS) or a project description document.
The decision-making process also typically shapes or constrains the resources and timeline for the project.

The requirements and constraints from those documents inform and guide the work to frame the project.

Conducting an analysis of the requirements and literature review helps mature and refine the ideas and assumptions from the initial screening into the detailed and feasible technical approach necessary for the project’s success.

When a project concludes, its results inform future potential projects and the project life cycle rolls on.


4.2 Requirements Analysis

4.2.1 Introduction

Central Question: How do you analyze a Performance Work Statement (PWS) to determine what’s really required to meet the performance requirements and acceptance criteria?

Data scientists generally possess strong analytical skills and expertise in data wrangling, statistical modeling, machine learning, and data interpretation. When given a new problem, we are eager to jump right into coding to answer the question. However, while the requirements in a PWS may be specific, they tend to be “high-level”: they describe required capabilities or outcomes, not detailed specifications for the code.

The “Jump to Code” Trap

Data scientists are often tempted to immediately start coding without thoroughly understanding what they’re actually trying to build. This leads to:

Scope Creep: Discovering requirements mid-project that change everything.
Technical Debt: Quick hacks that lack robustness and fail as the code expands.
Requirements Misalignment: Building something that doesn’t meet actual standards or expectations.
Resource Overruns: Underestimating complexity and having to redo work.

When facing high-level requirements, a best practice is summarized by the maxim “Think before you code”. Taking the time for an expanded approach to problem analysis and solution design helps ensure you understand what the requirements really mean: what are the specified and implied tasks, what constraints apply, and what standards will be used to judge the performance of your solution? This enhanced understanding helps you design a technical approach that meets all the client’s expectations and requirements, and then implement your code more efficiently and at lower risk.

This expanded approach is often called “Requirements Analysis”.

Requirements analysis helps avoid the “Jump to Code” problems by forcing systematic thinking about the complete solution before any code is written.
Requirements analysis provides a framework for a critical professional skill: the systematic interpretation and translation of high-level requirements into actionable and feasible technical approaches for research/analysis or development.

Note

Project Planning and Requirements Analysis are typically parallel and integrated processes. Large projects may have three (or more) teams involved.

The Project Management Team focuses on which work packages must be done, by when, and to what performance standards to meet the PWS requirements; it also projects the resources required to execute the work within budget.
The Technical Team focuses on the technical approach, how to execute the work given the projected resources.
The Quality Assurance (QA) and Test Team focuses on building and executing quality checks and tests to ensure all deliverables meet the PWS quality and performance standards.

No single team is in charge! Each must collaborate with the others to share information so the project solution is feasible, affordable, and acceptable to the client.

When there are conflicts (and there always are), e.g., the Technical Team wants two additional data scientists and the Project Management Team says that will cost too much, each team submits its analysis and recommendations to the Project Manager, who has to decide.

The Good News/ Bad News for this course is you play all four roles: Project Management Team, Technical Team, QA and Test Team, and the Project Manager 😎.

4.2.2 A Systematic Framework for Analyzing High-level Requirements

Projects based on high-level requirements, such as those in a PWS, require systematic interpretation of the requirements to extract both the explicit requirements and the hidden assumptions (implied requirements) that will drive the technical approach.

A systematic framework for requirements analysis answers four questions.

Specified Tasks: What tasks or requirements are explicit?
Implied Requirements: What tasks or requirements are implied, i.e., are necessary to accomplish the specified tasks or requirements, or are expected by the client, but are not stated?
Constraints: Are there explicit or implicit factors that limit or shape suitable technical approaches?
Performance Standards: Are the specified performance standards sufficient for testing and ensuring acceptability?

4.2.2.1 Specified Tasks

Identifying specified tasks is straightforward. Look for concrete, actionable statements in the PWS that specify what must be accomplished or delivered. Table 4.1 has several examples.

Table 4.1: Examples of specified requirements

PWS Language	Specified Task	Technical Approach Option
“Shall develop predictive model for customer churn”	Prediction capability required	Classification model
“Create interactive dashboard”	Build a dashboard	Shiny, Dash, or Tableau
“Analyze seasonal patterns in sales data”	Identify patterns in sales data based on seasons	Time series analysis
“Provide recommendations for marketing strategy”	As-Is and To-Be Strategy Analysis	Framework for Marketing Tools

Many PWS documents use the words “shall” or “must” to specify requirements that are legally enforceable.

There are many tools to search for all the “shall statements” in a PWS and extract them into a Requirements Traceability Matrix to facilitate tracking all the requirements for a project to ensure compliance.
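As a rough illustration of what such a tool does, the following minimal Python sketch pulls sentences containing “shall” or “must” out of a short text and turns each into a row of a simple traceability matrix. The sample PWS text, the `REQ-001` style identifiers, and the row fields are all assumptions made for this example; the sentence splitting is deliberately naive and would need refinement for real documents with numbered clauses or abbreviations.

```python
import re

# Hypothetical PWS excerpt, used only to illustrate the extraction.
PWS_TEXT = """
The contractor shall develop a predictive model for customer churn.
The dashboard will be used by regional managers.
The contractor must deliver a user guide with the final release.
"""

# Match the binding keywords as whole words, case-insensitively.
BINDING = re.compile(r"\b(shall|must)\b", re.IGNORECASE)


def extract_requirements(text):
    """Return one traceability row per sentence containing 'shall' or 'must'."""
    # Naive split: break after terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    rows = []
    for sentence in sentences:
        match = BINDING.search(sentence)
        if match:
            rows.append({
                "req_id": f"REQ-{len(rows) + 1:03d}",  # illustrative ID scheme
                "keyword": match.group(1).lower(),
                "statement": sentence.strip(),
                "status": "open",
            })
    return rows


rtm = extract_requirements(PWS_TEXT)
for row in rtm:
    print(row["req_id"], row["keyword"], "|", row["statement"])
```

The “will” sentence is skipped on purpose, since it expresses an expectation rather than a binding requirement. In practice, each row would also be linked to a deliverable and a test so compliance can be tracked through the project.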

Note

In the Code of Federal Regulations, “shall” and “must” are defined as “imperative”, which means they are used in the sense of issuing a command or directive. A “shall statement” creates a mandatory, legally enforceable requirement.

The use of “will” typically expresses future intention or expectation without the same binding legal force. As an example, in the statement “The contractor shall build system X that will be used by future generations,” the contractor is responsible for building the system, not whether future generations actually use it.

4.2.2.2 Implied Tasks and Requirements

Specified tasks often have subtasks that are part of the standard technical approach for performing that task. These are usually derived during the work breakdown analysis.

However, specified requirements can also imply or assume other tasks that are necessary to accomplish the high-level requirements but are not themselves specified.

These are often predecessor or parallel tasks.

Table 4.2 includes some possible examples for the tasks from Table 4.1.

Table 4.2: Examples of possible implied requirements

Specified Task	Possible Implied Tasks
Prediction capability required	Develop a data repository and mechanism for version control of the data used for training and testing. Develop scripts for quickly updating model training and test data.
Build a dashboard	Conduct a stakeholder analysis to identify key metrics and user experience requirements. Build a streaming API for real time updating of input data. Provide a User Help/Guide.
Identify patterns in sales data based on seasons	Establish framework to clearly define seasons that is consistent with client expectations and incorporates the geographic locations of customer sales.
As-Is and To-Be Strategy Analysis	Conduct stakeholder analysis of existing marketing strategy visibility, understanding, strengths, and gaps.

Other examples of common implied tasks include:

Building data scraping tools.
Integrating with multiple data systems.
Identifying security and privacy requirements and building in access controls.
Conducting web accessibility testing.
Ensuring compliance with regulatory requirements.
Building scripts or actions to support continuous integration and deployment.

It is good practice to confirm client expectations around implied or assumed tasks, especially if they add significant effort, time, or risk to the project.

4.2.2.3 Identifying Constraints

High-level requirements often specify some constraints but may omit others that clients “assumed” were known. These can affect the technical approach as well as performance, schedule, and cost projections.

Possible examples include:

Specified Constraints:

The solution shall operate on hardware and software that is compatible with the organization’s production environments.
Only use open-source software with specific licenses to allow for proprietary development, e.g., the MIT license.
Software development shall be done by a team certified as CMMI-level 4.

Implied Constraints:

Follow organization’s data access permissions and approval processes
Legacy system integration requirements

4.2.2.4 Deriving Performance Standards

High-level requirements should include explicit performance standards, especially for firm fixed price deliverables. These may be in the requirements statements, the deliverable acceptance criteria, or a separate section of the document.

However, there may be general or implied standards, such as “rapid response” or “reasonable accuracy”, which you must translate into testable standards for performance or for making evaluation choices.

Examples include:

Specified Standards

Accuracy Expectations: Prediction models should exceed 85% accuracy with no more than 5% False Negatives
Reliability Standards: Systems shall maintain no less than 99.9% operational availability each month.

Derived Standards

Quality Standards: Determine how clean the data must be, e.g., “The system shall ensure that ≥ 99.5% of ingested records conform to defined schema, data type, and domain constraints at the time of validation.”
Completeness Thresholds: Determine what missing data percentage is acceptable, e.g., “For all mandatory fields, the system shall ensure that ≥ 99.8% of records contain non-null, non-placeholder values.”
Response Time Requirements: Define real-time, near real-time, or batch processing responses, e.g., “The system shall update the dashboard within 100 ms for 99% of events; late updates may be dropped.”
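A derived standard is only useful if it can be tested. The sketch below, in Python, shows one way the completeness and response-time examples above might be checked against observed data. The records, the placeholder values, and the latency measurements are invented for illustration, and the nearest-rank percentile is one of several reasonable definitions that a project would need to agree on with the client.

```python
import math

# Hypothetical records: None or a placeholder counts as missing.
records = [
    {"customer_id": "C1", "region": "East"},
    {"customer_id": "C2", "region": "N/A"},
    {"customer_id": "C3", "region": "West"},
    {"customer_id": None, "region": "West"},
]
PLACEHOLDERS = {"", "N/A", "UNKNOWN"}  # assumed placeholder values


def completeness(rows, field):
    """Share of rows whose `field` is non-null and not a placeholder."""
    filled = sum(
        1 for r in rows
        if r.get(field) is not None and str(r[field]).strip() not in PLACEHOLDERS
    )
    return filled / len(rows)


def percentile_nearest_rank(values, pct):
    """Smallest observed value with at least pct% of observations at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Hypothetical dashboard update latencies in milliseconds.
latencies_ms = [12, 18, 25, 31, 40, 44, 52, 67, 80, 140]

region_complete = completeness(records, "region")
p99 = percentile_nearest_rank(latencies_ms, 99)

print(f"region completeness: {region_complete:.1%} (standard: >= 99.8%)")
print(f"p99 latency: {p99} ms (standard: <= 100 ms)")
print("completeness met:", region_complete >= 0.998)
print("latency met:", p99 <= 100)
```

With this toy data both standards fail, which is the point of writing them in testable form: the shortfall is visible before the client finds it.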

4.2.3 Requirements Analysis for Research and Analysis Projects

Research and analysis projects focus on generating insights, testing hypotheses, and building predictive models.

Requirements analysis must identify the analytical complexity, methodological constraints, and validation standards that will drive technical approach decisions.

4.2.3.1 Critical Requirements Assessment Areas

Analytical Scope and Complexity

Scope Creep Risk Factors

Scope creep is a common risk in projects where requirements have not been well defined or analyzed, which leads to a mismatch between the client and the project team about what is “in scope” for the project.

Resolving this mismatch often leads to the unplanned addition of requirements to the scope of the project.
Scope creep can result in re-planning, re-working code, schedule delays or budget overruns and ultimately in failed projects and/or dissatisfied clients.

When reviewing the high-level requirements watch out for High-Risk Indicators such as the following:

Vague success criteria (e.g., “provide insights” or “highly accurate”) or vague acceptance criteria (e.g., “user-friendly”).
Multiple stakeholder groups with different expectations or no clear client decision maker.
“Exploratory” analysis without specifying hypotheses or the number of hypotheses to examine.
Limited specifications about the scope or currency of the data.
Requests for “comprehensive” analysis of complex phenomena

Table 4.3 provides an example framework for asking questions to analyze the requirements for a research/analysis project.

Table 4.3: A Framework for Analyzing Research/Analysis Projects

Requirement Area	Key Questions	Technical Implications
Research Questions	Are hypotheses clearly defined? How many research questions?	Determines statistical methods, sample size needs
Analytical Depth	Descriptive, predictive, or causal inference required?	Affects methodology complexity, validation approaches
Temporal Scope	Historical analysis, real-time monitoring, forecasting?	Influences data architecture, update mechanisms
Comparative Analysis	Internal benchmarks, industry comparisons, A/B testing?	Affects data requirements, statistical power needs

Data and Methodology Constraints

High-Effort/High-Risk Data Scenarios:

Multiple disparate data sources requiring complex integration
Unstructured data (text, images, audio) requiring preprocessing
Sensitive data with privacy/compliance restrictions
Real-time data streams requiring specialized infrastructure
Historical data with quality/completeness issues

Methodological Complexity Factors:

Novel or cutting-edge techniques requiring significant learning time
Custom algorithm development vs. off-the-shelf solutions
Ensemble methods or model stacking approaches
Causal inference techniques requiring specialized expertise
Large-scale distributed computing requirements

Validation and Performance Standards

Performance Standard Categories

Statistical Performance: Accuracy, precision, recall, statistical significance
Business Performance: ROI impact, decision improvement, process efficiency
Operational Performance: Processing speed, reliability, maintainability
Communication Performance: Stakeholder understanding, actionability of results

Critical Assessment Questions:

Validation Rigor: Academic peer-review standards vs. business validation needs?
Performance Thresholds: What accuracy levels justify deployment/implementation?
Interpretability Requirements: Black-box models acceptable or explanation required?
Uncertainty Communication: How will confidence intervals/limitations be conveyed?
Reproducibility Standards: What documentation/code sharing is required?

4.2.3.2 Research Project Risk Assessment Framework

Table 4.4 provides a framework for analyzing risks in a research/analysis project. What might be complex or hard to complete?

Table 4.4: Framework for Research/Analysis Project Risk Assessment

Risk Category	High-Risk Indicators	Mitigation Strategies
Methodological	Unfamiliar techniques, custom algorithms	Start with simpler approaches, build complexity gradually
Data Quality	Multiple sources, historical data gaps	Early data exploration, backup data identification
Scope Management	Vague success criteria, multiple stakeholders	Define specific hypotheses, prioritize research questions
Validation	Novel domains, limited benchmarks	Plan multiple validation approaches, stakeholder review cycles

4.2.4 Risk Framework for Application Development Projects

The requirements for application development projects (dashboards, Shiny apps, web applications) or even analysis pipelines are often high-level descriptions about what “users” should be able to do with the app.

These are insufficient to design code.

As an example, the client could be considering multiple users for a dashboard from the Senior Executives to mid-level managers to their business analysts.
Each of these users probably has different questions they want to answer with the dashboard and different skill sets for using the dashboard.
These different expectations will affect the technical approach for designing and building the dashboard.

Thus, application development projects require a different perspective on requirements analysis than research/analysis projects to shape the technical approach and manage risk.

When working on an application development project, consider deriving requirements for user workflows, interface requirements, and system integration.

4.2.4.1 Critical Requirements Assessment Areas

User Experience Workflow Requirements

Software developers use many different approaches to derive and describe the requirements for an application, ranging from general descriptions to highly detailed templates and diagrams (Scott Ambler 2023).

These include requirements matrices, wireframe diagrams, use cases, user templates, etc.

Table 4.5 provides an example framework for asking questions to analyze the requirements for an application development project.

Table 4.5: Framework for Application Development Requirements Analysis

User Dimension	Assessment Questions	Technical Implications
User Profiles	Technical skill level? Domain expertise? Time constraints?	Interface complexity, help system needs
Use Cases	Primary tasks? Frequency of use? Decision support needs?	Feature prioritization, performance requirements
Context of Use	Desktop/mobile? Individual/collaborative? High-pressure situations?	Responsive design, collaboration features
Success Metrics	Task completion rates? User satisfaction? Adoption metrics?	Usability testing, analytics integration

Developing a fully detailed specification can take a lot of time and effort and is no longer as common, as it can violate the “build a little, test a little” axiom for getting user feedback early and often.

Developers have learned that spending a lot of time to lock down requirements creates its own challenges as clients and users tend to change their mind once they see prototypes, or at least they communicate their expectations differently.

This has led to the concept of defining “user workflows”, which is popular in the design of user interfaces (Tamara Martinez 2025).

These are much shorter descriptions of the essential requirements to help scope the project while providing clear, testable, expectations in client-friendly language.

A user workflow captures who the user is (a profile of expertise and interests), what you expect them to be able to do with the app (their workflow steps), and the eventual outcome.

Technical elegance means nothing if users can’t or won’t engage with the system.

Specifying the user workflows helps inform the technical approach, design, and testing strategy for the app.

A project to create an application or analysis pipeline may have multiple user workflows. Here is one example:

Workflow 1: Exploratory Data Analysis

User Profile: Business analyst with basic statistical knowledge
Goal: Understand customer behavior patterns
User Workflow Steps:
1. Upload customer dataset
2. Generate summary statistics and visualizations
3. Filter data by customer segments
4. Export insights for presentation
Data Interactions: Interactive charts, filtering controls, drill-down capabilities
Outcome: Clear understanding of customer segments for strategic planning
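Because a user workflow has a regular shape (profile, goal, steps, interactions, outcome), it can be recorded as structured data and reused to generate test checklists. The following Python sketch encodes Workflow 1 this way. The class name `UserWorkflow`, its field names, and the `acceptance_checklist` helper are illustrative choices, not a standard template.

```python
from dataclasses import dataclass, field


@dataclass
class UserWorkflow:
    """Structured form of one user workflow (field names are illustrative)."""
    name: str
    user_profile: str
    goal: str
    steps: list[str]
    outcome: str
    data_interactions: list[str] = field(default_factory=list)

    def acceptance_checklist(self):
        """Turn each workflow step into a testable statement."""
        return [
            f"[{self.name}] Step {i}: user can {step[0].lower()}{step[1:]}"
            for i, step in enumerate(self.steps, start=1)
        ]


eda = UserWorkflow(
    name="Workflow 1",
    user_profile="Business analyst with basic statistical knowledge",
    goal="Understand customer behavior patterns",
    steps=[
        "Upload customer dataset",
        "Generate summary statistics and visualizations",
        "Filter data by customer segments",
        "Export insights for presentation",
    ],
    outcome="Clear understanding of customer segments for strategic planning",
    data_interactions=[
        "Interactive charts",
        "Filtering controls",
        "Drill-down capabilities",
    ],
)

for item in eda.acceptance_checklist():
    print(item)
```

Each generated line can seed a usability test or an automated interface test, which keeps the testing strategy tied directly to the workflow the client reviewed.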

Functional and Technical Complexity

Another area where high-level requirements can lead to risk is High-Effort/High-Risk Application Features.

Review the requirements for the following kinds of features and be sure there are clear definitions and standards for performance:

Live data feeds and automatic updates
Real-time collaboration and multi-user access
Interactive visualizations with instant response
Risk Factors: Infrastructure complexity, performance optimization needs

Complex dynamic filtering and drill-down capabilities
Custom visualization libraries and animations
User-configurable dashboards and views
Risk Factors: Frontend development expertise needs, testing complexity

Multiple data source connections and APIs
User data upload and processing capabilities
Integration with existing enterprise systems
Risk Factors: Authentication, security, data validation needs

On-demand model training and prediction
Statistical analysis and hypothesis testing interfaces
Machine learning model explanation and interpretation
Risk Factors: Backend computational requirements, result interpretation

System Integration and Deployment Requirements

Most applications need to be deployed somewhere to be useful.

If the app is intended for other data scientists, an alternative could be to convert the application (and data?) into a package for distribution rather than deployment.

Table 4.6 provides a framework for asking questions about infrastructure and deployment requirements that can shape the technical approach.

Table 4.6: Framework for Application Deployment Requirements Analysis

Requirement Area	Key Questions	Resource Implications
Hosting/Deployment	Internal servers, cloud platforms, client infrastructure?	DevOps expertise, hosting costs
Authentication	Single sign-on, role-based access, public vs. private?	Security implementation time
Performance	Concurrent users? Data volume? Response time expectations?	Architecture complexity, testing needs
Maintenance	Update frequency? Bug fix responsibility? Feature evolution?	Long-term support planning

4.2.4.2 Application Development Risk Assessment

Common Application Development Pitfalls

Underestimating UI/UX Work: Interface design and user experience often require 40-60% of development time
Deployment Complexity: Getting applications from local development to production can require significant additional work
User Adoption Challenges: Building something users can use doesn’t guarantee they will use it

To help avoid some of the pitfalls, Table 4.7 provides a framework for asking questions about the requirements to identify risk areas and potential mitigation strategies.

Table 4.7: Framework for Application Development Risk Analysis

Risk Category	High-Risk Indicators	Mitigation Strategies
User Experience	Unclear user requirements, complex workflows	User interviews, prototype testing
Technical Complexity	Multiple integrations, real-time features	Phased development, MVP approach
Deployment	Unfamiliar hosting platforms, security requirements	Early deployment testing, infrastructure research
Adoption	Organizational change resistance, training needs	Change management planning, pilot programs

4.2.5 Summary

Consider Requirements Analysis as an important professional skill.

Requirements analysis represents a fundamental shift from individual problem solving to professional solution design.
The process forces systematic thinking about problem interpretation, solution feasibility, and stakeholder alignment.
These are skills that distinguish successful data science practitioners from those who struggle with project delivery.

Requirements Analysis Builds Additional Competencies

Systems Thinking: Understanding how technical choices affect stakeholders, timelines, and maintenance requirements
Risk Assessment: Anticipating problems and designing mitigation strategies before implementation begins
Communication Integration: Designing technical solutions that support diverse stakeholder communication needs
Feasibility Evaluation: Honestly assessing what’s possible given real constraints rather than ideal conditions

The “think before you code” discipline developed through systematic requirements analysis provides a foundation for:

Project Leadership: Ability to guide technical teams through complex, ambiguous requirements
Stakeholder Management: Skills to translate between business needs and technical capabilities
Strategic Planning: Understanding how technical decisions affect organizational capabilities
Risk Management: Proactive identification and mitigation of project threats

The investment in thorough requirements analysis pays dividends for projects and throughout professional careers as project complexity increases and stakeholder expectations evolve. The ability to systematically analyze problems, design feasible solutions, and communicate effectively with diverse audiences represents core competencies that enable long-term professional success in data science roles.

4.3 Integrate Requirements Analysis with the Literature Review

The requirements analysis and literature review should work together to inform your technical approach.

Be Strategic with Your Literature Review

Don’t just summarize what others have done. Use the literature to validate your requirements analysis and inform technical decisions.

Analyze the literature sources to validate your understanding of complexity, identify proven solutions, and anticipate implementation challenges.

4.3.1 Data Approach

Consider the Analysis Inputs:

Data source complexity and access constraints
Quality standards and validation needs
Integration and update requirements
Volume and performance expectations

to shape Technical Approach Outputs:

Specific data acquisition and preprocessing workflows
Quality control and validation protocols
Architecture for data storage and access
Update and maintenance procedures

4.3.2 Methodology Selection and Validation

Use Literature to answer:

Have similar problems been solved successfully with your proposed methods?
What performance benchmarks exist for comparable projects?
What are common failure points and how were they addressed?
Are there simpler approaches that achieve similar results?

to shape technical decisions:

Research/Analysis Projects:

Statistical approaches aligned with research questions and data constraints
Validation strategies matching performance standards and stakeholder needs
Interpretation frameworks supporting communication requirements

Application Projects:

User interface frameworks supporting identified workflows and user profiles
Backend architectures handling identified performance and integration needs
Testing and deployment strategies addressing organizational constraints

4.3.3 Tool Selection

Tool Selection Integration

Your tool choices should emerge logically from the combination of requirements, constraints, and literature evidence, not just from personal preference or familiarity.

4.3.4 Complexity and Risk Assessment

The literature review can also help inform decisions about risks and mitigation strategies.

Table 4.8 provides some examples of risk indicators you can derive from the literature review.

Table 4.8: Literature Review Risk Indicators

High Complexity / Higher Risk	Lower Complexity / Lower Risk
Multiple papers describing partial solutions rather than complete approaches	Established methods with consistent results across multiple studies
Recent publication dates suggesting cutting-edge or immature techniques	Available implementations in major libraries or frameworks
Extensive preprocessing or feature engineering requirements	Clear performance benchmarks and evaluation metrics
Custom implementation needs rather than library availability	Successful applications in similar domains or contexts
Mixed or inconclusive results across similar studies	—

4.4 Deciding on a Technical Approach

When making decisions on the technical approach, consider these four questions:

Requirements Constraints: What are the non-negotiable technical requirements?
Literature Evidence: What tools have proven successful for similar challenges?
Resource Constraints: What tools align with your skill level and timeline?
Risk Assessment: What tools offer the best balance of capability and reliability?

Your technical approach must address four core elements that work together to deliver your solution.

Data Strategy
Methodologies and Methods / Application Design
Primary Tools and Technology, including those that enable Version Control, Collaboration, and Reproducibility
Project Workflow

4.4.1 Data Strategy

The data strategy defines how data are accessed, managed, transformed, and preserved across the full project life cycle.

This includes raw data ingestion, intermediate processing, analytical or modeling datasets, and any derived data used for validation, testing, or deployment.

A sound data strategy is foundational for reproducibility, transparency, and maintainability, regardless of whether the project is primarily a research/analysis effort or an application development effort.

The data strategy must explicitly address:

How raw data are obtained and preserved
How data are transformed into analysis- or application-ready forms
How datasets used for modeling, validation, and testing are defined and managed
How data provenance and versioning are maintained
Risk and mitigation strategies for the data life cycle

All downstream technical decisions depend on these choices.

4.4.1.1 Sources and Access Strategy

This element specifies where data originate and how they are accessed in a repeatable manner.

Key components include:

Data Source Documentation: Specific databases, APIs, file systems, sensors, or manual collection methods
Access Protocols: Authentication, approval workflows, rate limits, and retrieval procedures
Data Rights and Permissions: Usage restrictions, licensing, privacy, and compliance requirements
Backup and Contingency Sources: Alternative or secondary data sources if primary sources become unavailable

Clearly documenting access strategies reduces project risk and supports reproducibility.

4.4.1.2 Data Structure, Scale, and Lifecycle Planning

This element characterizes the form, size, and evolution of the data over time.

Key components include:

Data Formats: CSV, JSON, Parquet, database tables, unstructured text, images, streaming data
Volume and Growth Expectations: Current size, anticipated growth, computational implications
Schema Documentation: Variables, data types, keys, and relationships across datasets
Temporal Characteristics: Historical coverage, update frequency, latency, and seasonality

For application projects, this also includes how data are refreshed or updated during operation.

4.4.1.3 Data Quality Considerations and Mitigation

This element defines how data quality risks are identified, measured, and addressed.

Key components include:

Missing Data Strategy: Acceptable thresholds, imputation methods, exclusion rules
Bias Identification: Sampling, measurement, temporal, or selection biases
Validation Protocols: Range checks, consistency rules, cross-source comparisons
Ongoing Quality Monitoring: Automated checks, logging, and alerts where appropriate

Explicit quality strategies prevent hidden assumptions from undermining results or functionality.
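The missing-data thresholds and range checks listed above can be written down as executable rules. The following is a minimal sketch using only the Python standard library; the column names, the 20% missingness threshold, and the valid ranges are hypothetical values chosen for illustration, not recommendations.

```python
from typing import Any

# Hypothetical rules for illustration; real values come from the data strategy.
MAX_MISSING_FRACTION = 0.20
VALID_RANGES = {"age": (0, 120), "income": (0, 1_000_000)}


def missing_fraction(rows: list[dict[str, Any]], column: str) -> float:
    """Share of rows where the column is absent or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def range_violations(rows: list[dict[str, Any]], column: str,
                     low: float, high: float) -> list[int]:
    """Indices of rows whose non-missing value falls outside [low, high]."""
    return [
        i for i, row in enumerate(rows)
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]


def quality_report(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Apply the missingness threshold and range check to each configured column."""
    report = {}
    for column, (low, high) in VALID_RANGES.items():
        fraction = missing_fraction(rows, column)
        report[column] = {
            "missing_fraction": fraction,
            "missing_ok": fraction <= MAX_MISSING_FRACTION,
            "out_of_range_rows": range_violations(rows, column, low, high),
        }
    return report


sample = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 150, "income": 48000},
    {"age": 29, "income": None},
]
report = quality_report(sample)
```

Writing the rules as named constants makes the assumptions reviewable: a reader can see which thresholds were applied without reading the analysis code.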

4.4.1.4 Analytical and Modeling Data Management

For projects involving statistical analysis, machine learning, or predictive modeling, the data strategy must clearly define how datasets are constructed and separated.

Key components include:

Dataset Partitioning: Training, validation, testing, and cross-validation splits
Reproducible Splits: Fixed random seeds, deterministic partition logic
Feature Construction: Derived variables, transformations, and feature selection
Leakage Prevention: Ensuring no information flows improperly between data partitions

For application projects, this includes alignment between analytical datasets and data used in production workflows.
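One way to obtain the deterministic partition logic mentioned above is to hash a stable record identifier instead of drawing random numbers. The sketch below assumes every record has such an identifier; the function name `assign_partition` and the 70/15/15 proportions are illustrative choices.

```python
import hashlib


def assign_partition(record_id: str, train: float = 0.70,
                     validation: float = 0.15) -> str:
    """Map a stable record ID to 'train', 'validation', or 'test'.

    The ID is hashed, so the assignment does not depend on row order,
    random state, or which other records are present.
    """
    digest = hashlib.sha256(record_id.encode("utf-8")).hexdigest()
    # First 8 hex digits -> integer in [0, 2**32) -> fraction in [0, 1).
    fraction = int(digest[:8], 16) / 2**32
    if fraction < train:
        return "train"
    if fraction < train + validation:
        return "validation"
    return "test"


ids = [f"customer-{i}" for i in range(1000)]
counts: dict[str, int] = {}
for record_id in ids:
    part = assign_partition(record_id)
    counts[part] = counts.get(part, 0) + 1
print(counts)  # roughly 70/15/15, and identical on every run
```

Hashing at the level of the entity (for example, a customer rather than a single transaction) keeps all rows for that entity in one partition, which is one common guard against leakage.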

4.4.1.5 Data Provenance, Versioning, and Reproducibility

Reproducibility requires that all data transformations are traceable and repeatable.

Key components include:

Raw Data Preservation: Immutable storage of original inputs where feasible
Transformation Documentation: Scripts or pipelines that generate derived datasets
Versioning Strategy: Dataset versions tied to code and environment versions
Reconstruction Capability: Ability to regenerate analytical or application datasets from raw data

These practices ensure results can be reproduced, audited, and extended.
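Raw data preservation can be checked mechanically by recording a checksum for every raw file and comparing against it later. The following is a minimal sketch, assuming raw inputs live in a single directory; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def build_manifest(raw_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to raw_dir) to its SHA-256 digest."""
    return {
        path.relative_to(raw_dir).as_posix(): sha256_of_file(path)
        for path in sorted(raw_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(raw_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Files that were added, removed, or modified since the manifest was built."""
    current = build_manifest(raw_dir)
    names = set(current) | set(manifest)
    return sorted(n for n in names if current.get(n) != manifest.get(n))
```

A manifest produced by `build_manifest` can be saved as JSON and committed alongside the code, so that a given code version is tied to a specific set of raw inputs.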

Data Strategy Reality Check

Common underestimations include:

Time required for data access approvals and API setup
Data cleaning and preprocessing effort (commonly estimated at 60–80% of project time)
Integration challenges when combining multiple data sources
Quality issues discovered only after significant processing

Planning for these realities improves schedule reliability and project outcomes.

4.4.1.6 Data Risk Identification and Mitigation

Data-related risks can materially affect project validity, reproducibility, timelines, and operational reliability.

Identifying these risks early allows for mitigation strategies to be incorporated into the technical approach rather than addressed reactively.

Consider known or anticipated data risks and how they will be mitigated.

Data risks typically fall into several overlapping categories:

Availability Risks: Data may be delayed, incomplete, or become inaccessible
Quality Risks: Errors, missingness, or inconsistencies may undermine analysis or functionality
Bias and Representation Risks: Data may not reflect the target population or use case
Stability Risks: Data distributions or schemas may change over time
Compliance and Ethical Risks: Legal, privacy, or licensing constraints may restrict use

Explicitly identifying which categories apply improves transparency and planning.

4.4.1.7 Risk Assessment and Mitigation Planning

Each identified data risk should be assessed for likelihood, impact, and mitigation strategy.

Table 4.9 provides some examples of data risks.

Table 4.9: Common data risks and potential mitigation strategies

Risk Area	Example Risk	Potential Impact	Mitigation Strategy
Data Access	API rate limits or approval delays	Project schedule delays	Early access requests, caching, backup sources
Missing Data	High missingness in key variables	Reduced statistical power, biased results	Imputation strategy, sensitivity analysis
Data Bias	Non-representative samples	Invalid inference, unfair outcomes	Re-weighting, stratification, bias audits
Schema Changes	Variable definitions change over time	Pipeline failures, inconsistent results	Schema validation, version-controlled datasets
Data Leakage	Information bleed between datasets	Inflated performance metrics	Strict partitioning, leakage checks

Documenting these risks supports informed decision-making and reviewer confidence.
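A simple risk register makes the likelihood and impact assessment explicit and sortable. The sketch below assumes a 1 to 5 scale for each and a score equal to their product, which is a common convention rather than a requirement; the ratings shown are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataRisk:
    area: str
    likelihood: int  # 1 (rare) to 5 (almost certain)
    impact: int      # 1 (negligible) to 5 (severe)
    mitigation: str

    @property
    def score(self) -> int:
        """Likelihood times impact; higher scores deserve earlier attention."""
        return self.likelihood * self.impact


def prioritize(risks: list[DataRisk]) -> list[DataRisk]:
    """Order risks from highest to lowest score (ties keep their input order)."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)


# Hypothetical ratings for three of the risk areas in the table above.
register = [
    DataRisk("Data Leakage", 2, 5, "Strict partitioning, leakage checks"),
    DataRisk("Data Access", 4, 3, "Early access requests, caching, backup sources"),
    DataRisk("Missing Data", 3, 4, "Imputation strategy, sensitivity analysis"),
]

for risk in prioritize(register):
    print(f"{risk.score:>2}  {risk.area}: {risk.mitigation}")
```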

Monitoring and Review

Data risks are not static and may evolve as a project progresses.

Recommended practices include:

Periodic reassessment of data risks
Automated checks for schema, distribution, and volume changes
Logging and alerting for data ingestion or validation failures
Revisiting mitigation strategies after major data updates

Ongoing monitoring ensures that data risks remain visible and manageable.
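Automated checks for schema and volume changes can start very small. The following sketch assumes records arrive as dictionaries; the expected schema and the 25% volume tolerance are hypothetical values for illustration.

```python
from typing import Any

# Hypothetical expected schema: column name -> required Python type.
EXPECTED_SCHEMA = {"customer_id": str, "order_total": float, "order_date": str}


def schema_problems(rows: list[dict[str, Any]],
                    expected: dict[str, type] = EXPECTED_SCHEMA) -> list[str]:
    """Describe missing columns, unexpected columns, and type mismatches."""
    problems = []
    for i, row in enumerate(rows):
        missing = expected.keys() - row.keys()
        extra = row.keys() - expected.keys()
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
        if extra:
            problems.append(f"row {i}: unexpected columns {sorted(extra)}")
        for column, expected_type in expected.items():
            value = row.get(column)
            if value is not None and not isinstance(value, expected_type):
                problems.append(
                    f"row {i}: {column} is {type(value).__name__}, "
                    f"expected {expected_type.__name__}"
                )
    return problems


def volume_changed(previous_count: int, current_count: int,
                   tolerance: float = 0.25) -> bool:
    """True when the row count moved by more than the tolerated fraction."""
    if previous_count == 0:
        return current_count != 0
    return abs(current_count - previous_count) / previous_count > tolerance
```

Running checks like these at ingestion time, and logging any non-empty result, turns a silent schema change into a visible event.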

Explicit identification and management of data risks:

Improves reproducibility and analytical validity
Reduces downstream rework and unexpected failures
Supports ethical and compliant data use
Strengthens confidence in results and system behavior

Incorporating data risk management into the data strategy ensures that data-related uncertainties are acknowledged and addressed as part of the overall technical approach.

4.4.1.8 Summary

A robust data strategy:

Treats data management as a first-class design decision
Supports reproducibility from raw inputs through final outputs
Applies consistently across research/analysis and application projects
Reduces technical risk and improves long-term maintainability

Clear data strategy decisions strengthen the entire technical approach by ensuring that all subsequent analysis, modeling, and application behavior is grounded in well-managed data.

4.4.2 Methodologies for Research/Analysis Projects

Your methodology section must clearly connect research/analysis questions to analytical approaches.

The following framework is consistent with methodological expectations commonly found in journals emphasizing clarity, reproducibility, and rigor.

4.4.2.1 1. Research Questions and Objectives

The research questions define the purpose and scope of the project and shape all subsequent methodological choices.

Key components include:

Clearly articulated primary and secondary research questions
Hypotheses or decision objectives, where applicable
Scope boundaries and assumptions

Some examples of question types and associated methods are:

Descriptive Questions (“What patterns exist?”)
- Exploratory data analysis, clustering, visualization methods
Predictive Questions (“What will happen?”)
- Machine learning models, time series forecasting, regression analysis
Causal Questions (“What causes what?”)
- Experimental design, causal inference methods, natural experiments
Comparative Questions (“Which is better?”)
- A/B testing, statistical hypothesis testing, comparative analysis

A well-defined research question ensures that the analysis is targeted, interpretable, and testable.

4.4.2.2 2. Core Analytical or Research Elements

This element specifies what is being studied and analyzed.

Key components include:

Data sources and data provenance
Variables, features, or constructs of interest
Data quality assumptions and preprocessing requirements
Conceptual or theoretical framework, if applicable

Clearly defining analytical elements establishes transparency and supports reproducibility.

4.4.2.3 3. Study Design and Analytical Strategy

The study design describes how the research or analysis shall be conducted.

Key components include:

Study type (e.g., observational, experimental, quasi-experimental, simulation)
Sampling strategy or data partitioning approach
Analytical methods, statistical models, or algorithms
Validation, robustness checks, or sensitivity analyses

Table 4.10 provides additional considerations for designing the research or analysis project.

Table 4.10: Example Study Design Components

Design Element	Specification Required	Risk Considerations
Sampling Strategy	Population definition, sample size calculation, selection method	Representation, power analysis, bias sources
Variable Selection	Dependent/independent variables, control variables, feature engineering	Multicollinearity, confounding, measurement validity
Statistical Methods	Specific tests, model algorithms, validation approaches	Assumption violations, multiple testing, overfitting
Effect Size Planning	Minimum detectable effects, practical significance thresholds	Statistical power, sample size adequacy

Sound study design is critical for managing bias, ensuring validity, and supporting reliable inference.
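The sample size calculation and minimum detectable effect in the table above are linked by a standard formula. As a worked sketch, assume a two-sided comparison of two group means with equal group sizes, a common standard deviation, and a normal approximation; other designs need different formulas, and an exact t-based calculation gives slightly larger numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_group(min_effect: float, sd: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Uses n = 2 * ((z_{1 - alpha/2} + z_{power}) * sd / min_effect) ** 2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / min_effect) ** 2
    return math.ceil(n)


# Detecting a difference of 5 units when the standard deviation is 10
# (a standardized effect of 0.5) at alpha = 0.05 and 80% power.
print(sample_size_per_group(min_effect=5, sd=10))  # 63
```

Halving the minimum detectable effect roughly quadruples the required sample size, which is why effect size planning belongs in the feasibility discussion.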

4.4.2.4 4. Metrics, Evaluation, and Interpretation Criteria

Metrics define how results are evaluated and how conclusions are drawn.

Key components include:

Performance or outcome metrics
Statistical tests or uncertainty measures
Error tolerances and confidence criteria
Predefined success or decision thresholds

Defining metrics in advance helps prevent post-hoc interpretation and strengthens analytical rigor.

Consider drawing on the elements of pre-registration, such as those described by the Center for Open Science.
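Predefined success thresholds are easier to honor when they are recorded before any results exist. The sketch below shows one possible way to do that; the metric names and threshold values are hypothetical.

```python
import operator

# Hypothetical criteria, written down before results are examined.
SUCCESS_CRITERIA = {
    "recall": (">=", 0.80),
    "false_positive_rate": ("<=", 0.10),
}

_COMPARATORS = {">=": operator.ge, "<=": operator.le}


def evaluate(results: dict[str, float],
             criteria: dict[str, tuple[str, float]] = SUCCESS_CRITERIA) -> dict[str, bool]:
    """Check each pre-specified metric against its threshold.

    A missing metric raises an error, so a criterion cannot be dropped quietly.
    """
    outcome = {}
    for metric, (comparison, threshold) in criteria.items():
        if metric not in results:
            raise KeyError(f"pre-specified metric not reported: {metric}")
        outcome[metric] = _COMPARATORS[comparison](results[metric], threshold)
    return outcome


print(evaluate({"recall": 0.84, "false_positive_rate": 0.12}))
# {'recall': True, 'false_positive_rate': False}
```

Committing the criteria to version control before the analysis runs gives them a timestamp that reviewers can verify.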

4.4.2.5 Methodology Risk Assessment

Choosing methodologies and methods often requires trade-offs among performance, speed, and risk.

Consider the following groups, listed from higher to lower risk, as you choose your methodologies and methods.

Novel/Experimental Methods: Cutting-edge techniques without established validation
Complex Ensemble Models: Multiple algorithms requiring extensive tuning and interpretation
Causal Inference Methods: Instrumental variables, difference-in-differences requiring strong assumptions
Custom Algorithm Development: Building methods from scratch rather than using established libraries

Advanced Machine Learning: Deep learning, complex feature engineering, hyper-parameter optimization
Time Series Methods: Sophisticated forecasting models, change point detection
Multivariate Statistical Methods: Factor analysis, structural equation modeling
Bayesian Methods: MCMC sampling, hierarchical models requiring specialized expertise

Standard Statistical Tests: t-tests, ANOVA, chi-square tests with established implementations
Basic Machine Learning: Linear regression, decision trees, random forests using standard libraries
Descriptive Analytics: Summary statistics, basic visualization, correlation analysis
Established Survey Methods: Validated instruments, standard sampling procedures

4.4.2.6 Summary

This four-element framework aligns closely with expectations found in journals that emphasize:

Clear scientific objectives and clearly articulated research questions
Explicit data and assumption disclosure
Well-defined analytical elements
Reproducible study design and data (see JASA Reproducibility Guidelines)
Transparent evaluation and interpretation following pre-specified metrics and evaluation criteria

Together, these elements provide a defensible and transparent foundation for rigorous research and analysis.

4.4.3 Application Design and Architecture

The application design must clearly connect functional requirements and user workflow descriptions to interface, architectural, and implementation decisions.

The following framework emphasizes clarity, traceability, and feasibility, consistent with best practices in applied software design and engineering.

4.4.3.1 1. Functional Requirements and User Workflows

Functional requirements and user workflows define the purpose, scope, and behavior of the application and constrain all downstream design decisions.

Key components include:

Clearly defined user roles and personas
Primary and secondary user workflows
Functional requirements mapped to user actions
Assumptions and scope boundaries

Examples of workflow-driven design considerations include:

Exploratory Workflows (“What information do users need to explore?”)
- Interactive filtering, summary views, flexible navigation
Operational Workflows (“What actions must users complete?”)
- Form inputs, validation, stepwise processes, confirmations
Decision-Support Workflows (“What decisions are users making?”)
- Visual prioritization, comparisons, alerts, thresholds
Administrative Workflows (“How is the system managed?”)
- Configuration panels, logging, access control

Well-defined workflows ensure the application is usable, testable, and aligned with user needs.

4.4.3.2 2. Core Application Elements

This element specifies the essential components required to support the defined workflows and functional requirements.

Key components include:

Input types (forms, file uploads, parameters, selections)
Output types (tables, visualizations, reports, notifications)
State management and data flow
Usability, accessibility, and performance assumptions

Clearly defining core application elements establishes a shared understanding of system behavior and reduces implementation ambiguity.

4.4.3.3 3. Interface Design and Interaction Strategy

The interface design describes how users interact with the application and how functionality is organized and presented.

Key components include:

Layout and navigation structure
Visual hierarchy and information prioritization
Interaction patterns and feedback mechanisms
Error handling and validation strategies

Table 4.11 provides additional considerations for interface planning.

Table 4.11: Example UI and Interaction Design Components

Design Component	Specification Required	Development Implications
Page Structure	Multi-page vs. single-page layout, navigation flow, responsive breakpoints	Frontend framework choice, routing complexity
Visual Hierarchy	Information prioritization, accessibility compliance	Styling effort, usability testing
Interaction Design	User input methods, feedback and error handling	Client-side logic, validation complexity
Data Presentation	Visualization types, filtering, export options	Charting libraries, performance considerations

Sound interface design is critical for usability, adoption, and maintainability.

4.4.3.4 4. Architecture and Implementation Strategy

Architecture decisions describe how application functionality is implemented and deployed.

Key components include:

Frontend and backend responsibilities
Data storage and persistence strategy
Performance and scalability considerations
Deployment and maintenance assumptions

Table 4.12 summarizes common architectural decision points.

Table 4.12: Example Application Architecture Decisions

Architecture Component	Technical Choices	Resource Implications
Frontend Framework	R Shiny, Python Dash, React, vanilla JavaScript	Learning curve, development time
Backend Processing	Server-side, client-side, or hybrid execution	Performance, scalability
Data Storage	In-memory, files, databases, cloud storage	Persistence, access speed, backups
Deployment Platform	Local, cloud, organizational servers	DevOps effort, cost, maintenance

Architecture choices should balance functional requirements, technical feasibility, and long-term sustainability.

4.4.3.5 Application Design Risk Assessment

Design and architecture decisions involve trade-offs among functionality, complexity, development effort, and future extensibility.

Consider the following categories, listed from higher to lower risk, when selecting tools and design patterns.

Highly Custom Interfaces: Extensive bespoke UI logic and styling
Tightly Coupled Architectures: Frontend and backend interdependencies
Unproven Frameworks: Limited documentation or community support
Real-Time or Low-Latency Requirements: Strict performance constraints

Advanced Interactivity: Complex state management, reactive updates
Scalable Architectures: Multi-user concurrency, shared resources
Integration with External Systems: APIs, authentication providers
Moderate Custom Visualization: Specialized charting or interaction patterns

Standard Layouts: Well-established navigation and page patterns
Established Frameworks: Mature tools with strong community support
Static or Semi-Static Outputs: Reports, dashboards with limited interactivity
Single-User or Read-Only Applications: Minimal concurrency concerns

4.4.3.6 Summary

This application design framework emphasizes:

Clear functional requirements and user workflows
Explicit definition of core application elements
Thoughtful interface and interaction design
Feasible and maintainable architectural choices

Together, these elements provide a structured and defensible foundation for designing applications that are usable, maintainable, and aligned with user needs.

4.4.4 Primary Tools and Technology Stack

The selection of tools and technologies must be guided by project objectives, functional requirements, and operational constraints, regardless of whether the project is primarily a research/analysis effort or an application development effort.

While specific tools may differ, the underlying selection principles remain the same: tools should enable the required work efficiently, support reproducibility and maintainability, and introduce acceptable levels of risk.

4.4.4.1 Tool Selection Framework

Tool selection should emerge from the intersection of requirements, capabilities, and constraints, not from novelty or personal preference.

Tool Selection Pitfalls

Avoid these common mistakes:

Selecting tools based on personal interest rather than project needs
Choosing cutting-edge technologies without accounting for learning curve or stability
Underestimating integration and interoperability complexity
Ignoring long-term maintenance, documentation, and support requirements

A defensible tool choice can be justified by answering:

What requirement does this tool satisfy?
What alternatives were considered?
What trade-offs or risks does this choice introduce?
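The three questions above can be supported by a weighted decision matrix that records requirements, alternatives, and trade-offs in one place. This is a minimal sketch; the criteria, weights, candidate names, and 1 to 5 ratings are all hypothetical.

```python
# Hypothetical criteria weights (summing to 1) and 1-5 ratings per candidate.
WEIGHTS = {"meets_requirements": 0.5, "team_expertise": 0.3, "maturity": 0.2}

CANDIDATES = {
    "Tool A": {"meets_requirements": 4, "team_expertise": 5, "maturity": 5},
    "Tool B": {"meets_requirements": 5, "team_expertise": 2, "maturity": 3},
}


def weighted_score(ratings: dict[str, int],
                   weights: dict[str, float] = WEIGHTS) -> float:
    """Sum of rating times weight over all criteria."""
    return sum(weights[criterion] * ratings[criterion] for criterion in weights)


def rank(candidates: dict[str, dict[str, int]],
         weights: dict[str, float] = WEIGHTS) -> list[str]:
    """Candidate names ordered from highest to lowest weighted score."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name], weights),
        reverse=True,
    )


for name in rank(CANDIDATES):
    print(f"{name}: {weighted_score(CANDIDATES[name]):.1f}")
```

The score is an aid to discussion, not a verdict: the value of the exercise lies in agreeing on the criteria and weights before comparing tools.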

4.4.4.2 Technology Stack Components

Table 4.13 illustrates common layers of a technical stack with parallel examples for research/analysis and application development contexts.

Table 4.13: Examples of a technical stack with possible tools

Stack Layer	Research / Analysis Examples	Application Development Examples	Selection Criteria
Programming Language	R, Python, SAS	R (Shiny), Python (Dash/Streamlit), JavaScript	Team expertise, library ecosystem, stakeholder expectations
Data Processing	`pandas`, `dplyr`, `data.table`	Same plus reactive or streaming processing	Data volume, transformation complexity, performance
Analysis / Modeling	`scikit-learn`, `caret`, `tidymodels`	Interactive or on-demand modeling	Methodological needs, interpretability, documentation
Visualization	`ggplot2`, `matplotlib`, `plotly`	Interactive dashboards and UI components	Interactivity needs, accessibility, customization
Execution Environment	RStudio, Jupyter, VS Code	Same plus deployment and build tools	Debugging, version control, collaboration
Deployment (if applicable)	Batch scripts, reports, pipelines	Web apps, services, cloud deployments	Audience, scalability, operational overhead

Documenting tool choices at each layer supports transparency and reproducibility while also enabling feasibility reviews.

4.4.4.3 Tool Risk Assessment

Tool choices introduce different levels of technical, operational, and project risk, independent of project type.

These risks should be evaluated explicitly; the groups below run from higher-risk to lower-risk choices.

Bleeding-Edge Tools: Recently released libraries or frameworks with limited validation
Custom Infrastructure: Building core functionality from scratch instead of using established solutions
Complex Toolchains: Many tightly coupled components requiring careful coordination
Strict Performance Constraints: Real-time or low-latency requirements with limited tolerance for failure

Specialized Libraries: Domain-specific tools with limited transferability
Cloud or Platform Dependencies: Vendor lock-in, cost uncertainty, connectivity requirements
Advanced Modeling Frameworks: Steep learning curves or complex configuration
Multi-System Integration: APIs, authentication systems, or external services

Established Libraries: Mature, well-documented tools with active communities
Standard Platforms: Widely adopted environments with proven reliability
Open Source Software: Transparent development and community support
Previously Used Tools: Technologies successfully applied in prior projects

4.4.5 Tools for Reproducibility and Maintainability

Reproducible and maintainable technical work, whether research-focused or application-oriented, requires explicit management of software environments, dependencies, and collaboration workflows.

Without appropriate processes and tools, results become difficult to reproduce, extend, or maintain over time.

Following best practices can enable reproducibility and maintainability across project types.

4.4.5.1 Dependency and Environment Management

Projects must explicitly define and manage their software environments to ensure consistent behavior across systems and over time.

Key goals include:

Reproducible package versions
Isolation from system-wide dependencies
Ease of setup for collaborators and reviewers

Table 4.14 shows several examples of tools used to help manage and control the technical environment.

Table 4.14: Common environment management approaches

Project Context	Environment Tooling	Purpose
R-based Projects	`renv`	Lock R package versions, restore environments reliably
Python-based Projects	`uv`, `venv`, `conda`	Isolate Python dependencies, manage versions
Mixed R / Python Projects	`renv` + Python environment manager	Prevent cross-language dependency conflicts
Application Deployment	Containerization (e.g., Docker)	Ensure consistent runtime environments

Best practices include:

Committing lockfiles (e.g., renv.lock, uv.lock) to version control
Documenting environment setup steps
Avoiding reliance on implicit or system-installed packages

Well-managed environments reduce “it worked on my machine” failures and support long-term project viability and reproducibility.
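As a complement to lockfiles (not a replacement for them), a project can record the environment that actually produced a result. The following sketch uses only the Python standard library; the function name `environment_snapshot` and the output layout are illustrative assumptions.

```python
import json
import platform
from importlib import metadata


def environment_snapshot() -> dict:
    """Record the interpreter version, platform, and installed package versions."""
    packages = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing metadata
            packages.add(f"{name}=={dist.version}")
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": sorted(packages),
    }


snapshot = environment_snapshot()
print(json.dumps(snapshot, indent=2)[:300])  # preview of the first few lines
```

Saving this snapshot next to each set of results makes it possible to confirm later that the lockfile and the environment in use actually matched.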

4.4.5.2 Version Control and Collaborative Development

Version control systems are essential for reproducibility, traceability, and collaboration in both research and application development.

Projects should use a distributed version control system (e.g., Git) to track changes to:

Code
Configuration files, including hyperparameters
Documentation
Environment definitions

Cloud-based repositories provide additional benefits, including issue tracking, code review, and automated testing.

Common platforms include GitHub, GitLab, and Bitbucket.

4.4.5.3 Reproducibility and Collaboration Practices

Effective use of Git and hosted repositories supports both individual rigor and team collaboration.

Recommended practices include:

Structured Repositories
- Clear directory layout (data, code, docs, config)
- Explicit README describing project purpose and setup
Commit Discipline
- Small, meaningful commits
- Informative commit messages
- Version-controlled milestones or releases
Collaboration Workflows
- Feature branches for development
- Pull/merge requests for review
- Issue tracking for bugs and enhancements
Reproducibility Support
- Version-controlled data inputs or data access instructions
- Tagged releases corresponding to results or deliverables
- Automated checks where feasible

These practices support peer review, facilitate onboarding, and enable reliable reuse of project outputs.

Managing Environments and Code

Managing tool environments and collaboration infrastructure is a core component of a sound technical approach.

Across both research/analysis and application projects, these practices:

Enable reproducibility and transparency
Reduce maintenance and onboarding costs
Support collaboration and review
Preserve the integrity of results and deliverables

Explicit attention to environment management and version control ensures that technical work remains reliable, extensible, and defensible over time.

4.4.5.4 Summary

A unified approach to tool and technology selection:

Applies consistent principles across research/analysis and application projects
Uses parallel examples to reflect different project contexts
Encourages explicit justification and risk awareness
Supports reproducibility, maintainability, and long-term project success

By focusing on shared concepts rather than specific tools, this framework ensures technology choices align with project goals.

4.4.6 A Workflow for the Technical Approach

Once the data strategy, methodologies, methods, and tools have been selected, the next step is to define a technical workflow that makes the relationships among these elements explicit.

The workflow demonstrates your ability to integrate all the elements of the technical approach into a concise, coherent process, showing how data, methods, and tools interact over time and how work progresses from inputs to outputs.

A well-designed workflow should:

Demonstrate clear connections between data sources, analytical methods, and tools
Reflect the logical sequencing of technical tasks
Make dependencies and handoffs between steps visible
Align the technical approach with the project plan and timeline
Be readily convertible into a graphical representation

The workflow is not merely a process diagram; it is a concise visual argument for why the technical approach is coherent and feasible.

4.4.6.1 Workflow Design Considerations

When designing a workflow graphic, consider the following:

Phases: Distinct stages such as data ingestion, preprocessing, analysis/modeling, validation, and delivery
Data Artifacts: Where raw, intermediate, and final datasets are created and used
Methodological Transitions: How outputs of one method become inputs to the next
Tool Usage: Which tools or environments are responsible for each step
Iteration and Feedback: Where revision, validation, or retraining may occur

There is no single “correct” workflow structure; the flow should reflect the specific technical approach of the project.

4.4.6.2 Example Workflow

Figure 4.1 provides one example of a project workflow that integrates data preparation, modeling, evaluation, and deployment.

Figure 4.1: Example workflow. A sample workflow of five steps: data preparation, model training, model optimization, model evaluation, and deployment.

Design notes:

Workflow steps are explicitly numbered to support easy reference
Each step corresponds to a distinct technical phase
The flow communicates both sequence and dependency

You can create the workflow diagram using a variety of tools, including Mermaid or Graphviz in Quarto.
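A numbered, linear workflow such as the five-step example can also be generated from a list of step names. The sketch below writes Mermaid flowchart text from Python; the function name `mermaid_flowchart` and the node identifiers are hypothetical, and the output is intended to be pasted into a Mermaid diagram cell in Quarto.

```python
def mermaid_flowchart(steps: list[str], direction: str = "LR") -> str:
    """Build Mermaid flowchart text with numbered nodes linked in sequence."""
    lines = [f"flowchart {direction}"]
    # One node per step, labeled with its position for easy reference.
    for i, label in enumerate(steps, start=1):
        lines.append(f'    S{i}["Step {i}: {label}"]')
    # Arrows connect each step to the next one.
    for i in range(1, len(steps)):
        lines.append(f"    S{i} --> S{i + 1}")
    return "\n".join(lines)


steps = [
    "Data preparation",
    "Model training",
    "Model optimization",
    "Model evaluation",
    "Deployment",
]
print(mermaid_flowchart(steps))
```

Real workflows usually include feedback loops, for example from evaluation back to training; those can be added as extra arrow lines in the same syntax.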

4.4.6.3 Using the Workflow as a Communication Tool

A clear workflow graphic functions as a “one-picture story” that can be used to:

Communicate the big picture of the technical approach to non-technical audiences
Support technical reviews by showing method and data dependencies
Align team members on scope, sequencing, and responsibilities
Track project status by mapping progress to workflow stages

A strong workflow graphic makes the technical approach easier to understand, evaluate, and defend.

4.4.6.4 Summary

The technical workflow:

Integrates data strategy, methodology, and tooling into a single coherent view
Makes sequencing and dependencies explicit
Supports reproducibility and project planning
Serves as a high-impact visual summary of the project’s technical approach

Students should treat the workflow diagram as a first-class deliverable that communicates not just what they did, but how and why the technical approach fits together.

4.4.7 Documenting the Rationale for the Technical Approach

Once the technical approach has been finalized, it is good practice to provide a concise rationale explaining why specific data strategies, methodologies, methods, and tools were selected.

The rationale is not a restatement of the technical approach. Instead, it is a justification layer that makes explicit how decisions were informed by requirements, constraints, and existing evidence.

The rationale should demonstrate that technical decisions were:

Requirement-driven rather than preference-driven
Informed by relevant literature or established practice
Appropriate given project constraints (data, time, risk, resources)
Internally consistent across data, methods, and tools

A strong rationale allows reviewers to understand why these choices make sense, even if alternative approaches exist.

The rationale must explicitly connect requirements analysis findings and literature review evidence to technical decisions.

This connection should be made visible for each major element of the technical approach, including:

Data strategy
Methodological choices
Analytical or modeling methods
Tool and technology selection

4.4.7.1 Recommended Rationale Structure

Rationale statements should follow a consistent, evidence-based pattern.

Example structure:

Based on requirements analysis identifying [specific constraint or need] and literature evidence showing [proven approach, benchmark result, or established best practice], we selected [specific method, design choice, or tool] because [explicit reasoning linking requirements and evidence to the decision].

4.4.7.2 Illustrative Examples

Based on requirements analysis identifying limited labeled data and literature demonstrating strong performance of regularized linear models in low-sample settings, we selected ridge regression to reduce overfitting risk.
Based on requirements analysis indicating a need for reproducible analysis pipelines and literature emphasizing environment isolation for computational reproducibility, we adopted renv to manage R package dependencies.
Based on requirements analysis highlighting exploratory user workflows and literature supporting interactive visualization for sense-making, we selected an interactive dashboard-based interface.

4.4.7.3 Scope and Level of Detail

The rationale should be:

Concise: Focus on major decisions, not every minor implementation detail
Selective: Emphasize choices with meaningful trade-offs or alternatives
Evidence-based: Reference literature, benchmarks, or established practice where applicable

Length is less important than clarity and traceability.

4.4.7.4 Summary

Documenting the rationale for the technical approach ensures:

Technical choices are transparent and defensible
Decisions can be evaluated independently of outcomes
The project demonstrates methodological maturity and rigor

A clear rationale shows not only what technical approach was chosen, but why it was the appropriate choice given the project’s requirements and evidence base.

4.5 Responsible Data Science Review

Responsible data science considerations should be reviewed throughout the data science life cycle.

This review is especially important during the framing step, because framing shapes the technical approach, which can dominate other aspects of the project.

Once the technical approach has been defined and justified, data scientists should conduct a Responsible Data Science (RDS) review to evaluate the ethical, social, and practical implications of their choices.

The RDS review should not be an abstract discussion of ethics or bias. It is a structured examination of the specific data, methods, tools, and workflows used in the project, using an explicit responsibility framework.

The RDS review ensures the technical approach:

Anticipates potential harms or unintended consequences and mitigates their effects
Treats data subjects and stakeholders responsibly
Produces results that are fair, interpretable, and appropriate for use
Aligns with professional, legal, and societal expectations

A strong RDS review demonstrates how the data scientist considered technical rigor and responsibility together, not separately.

4.5.1 Framework Selection

There are multiple frameworks for responsible data science, and you do not need to review all of them.

Selecting a recognized Responsible Data Science or AI ethics framework to structure the review helps establish credibility for the analysis.

Examples of commonly used frameworks include:

General Frameworks
- American Statistical Association (ASA): The Ethical Guidelines for Statistical Practice help evaluate professional responsibility in statistical reasoning, data collection, analysis, interpretation, reproducibility, and communication of results.
- Association for Computing Machinery (ACM): The ACM Code of Ethics and Professional Conduct helps assess ethical responsibilities in software, algorithms, systems design, privacy protection, and accountability for deployed technical systems.
- Royal Statistical Society: A Guide for Ethical Data Science addresses five themes from multiple ethical frameworks that apply across data science work.

Data-Focused Frameworks
- FAIR Data Principles (Findable, Accessible, Interoperable, Reusable): The FAIR Guiding Principles for Scientific Data Management and Stewardship help evaluate data stewardship practices, reproducibility, metadata quality, and responsible data sharing.
- CARE Principles for Indigenous Data Governance: The CARE Principles apply when working with Indigenous data or community-associated data, to ensure collective benefit, authority to control, responsibility, and ethics.

There are many other domain-specific frameworks that cover topics such as human subjects research, working with data about minors, or working with medical data that raises privacy concerns.

The chosen framework should be briefly identified and justified based on the project context.

4.5.2 Core Dimensions to Examine

Using the selected framework, examine the technical approach across the following dimensions.

4.5.2.1 Data Responsibility

Consider how the framework applies to how data are collected, used, and managed.

Key questions could include:

Are data sources appropriate and ethically obtained?
Do data rights, consent, or licensing restrictions apply?
Could data quality or bias affect downstream outcomes?
Are privacy and confidentiality adequately protected?

4.5.2.2 Methodology and Methods

Consider how the framework applies to choices in methodology or methods that may introduce risk or harm.

Key questions could include:

Do chosen methods amplify bias or inequity?
Are assumptions transparent and defensible?
Are results interpretable by intended audiences?
Could model misuse or overconfidence lead to harm?

The goal is not to avoid advanced methods, but to acknowledge their implications.

4.5.2.3 Evaluation and Metrics Responsibility

Consider how the framework applies to whether evaluation criteria align with responsible use.

Key questions could include:

Do metrics reflect meaningful outcomes, not just technical performance?
Are error rates or uncertainty communicated clearly?
Are trade-offs (e.g., accuracy vs. fairness) acknowledged?
Are validation and testing procedures sufficient to detect failure modes?

Metrics shape behavior and interpretation; this makes them ethically relevant.
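One way to make the accuracy versus fairness trade-off visible is to report performance per group alongside the overall figure. The following is a minimal sketch in plain Python, not a prescribed method: the function names, the labels, the predictions, and the group memberships are all hypothetical, and the accuracy gap is only one of many possible disparity measures.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return overall accuracy and a dict of accuracy per group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    per_group = {g: correct[g] / total[g] for g in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_group


def accuracy_gap(per_group):
    """Largest difference in accuracy between any two groups."""
    return max(per_group.values()) - min(per_group.values())


# Hypothetical labels, predictions, and group memberships (illustration only).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

overall, per_group = accuracy_by_group(y_true, y_pred, groups)
gap = accuracy_gap(per_group)
print(f"overall={overall:.3f} per_group={per_group} gap={gap:.3f}")
```

Reporting the per-group values next to the overall value keeps a single headline metric from hiding uneven error rates. Whether a given gap is acceptable is a judgment to make against the chosen framework, not something the code can decide.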

4.5.2.4 Deployment, Use, and Communication Context

For application or decision-support projects, consider how the framework applies to the use cases and results.

Key questions could include:

Who will use the outputs, and for what purpose?
Could results be misunderstood or misapplied?
Are limitations and appropriate use clearly communicated?
Is ongoing monitoring or review required?

Even research outputs can influence decisions and should be framed responsibly.

4.5.2.5 Identified Risks and Mitigations

The RDS review should explicitly document:

Identified ethical, fairness, privacy, or misuse risks
The likelihood and potential impact of these risks
Mitigation strategies incorporated into the technical approach

If the review identifies significant issues in the technical approach, update the technical approach and the rationale discussion.

The RDS review complements the data and technical risk assessments but focuses on human and societal impact.
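The three items to document (risk, likelihood and impact, mitigation) map naturally onto a small risk register. The sketch below assumes 1 to 5 ordinal scales for likelihood and impact, a likelihood-times-impact score, and a review threshold of 12. All of these are illustrative conventions rather than requirements of any framework, and the example risks are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    description: str
    category: str        # e.g., "privacy", "fairness", "misuse"
    likelihood: int      # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int          # assumed scale: 1 (minor) to 5 (severe)
    mitigation: str = "" # empty string means no mitigation planned yet

    @property
    def score(self) -> int:
        """Simple likelihood-times-impact score (an assumed convention)."""
        return self.likelihood * self.impact


def unmitigated_high_risks(risks, threshold=12):
    """Return risks at or above the threshold that have no mitigation."""
    return [r for r in risks if r.score >= threshold and not r.mitigation.strip()]


# Hypothetical register entries for illustration.
register = [
    Risk("Re-identification from quasi-identifiers", "privacy", 3, 5,
         "Aggregate records to a coarser geography"),
    Risk("Model underperforms for small subgroups", "fairness", 4, 4),
    Risk("Dashboard read as causal evidence", "misuse", 2, 3,
         "Add a limitations note to every view"),
]

# List risks from highest to lowest score.
for risk in sorted(register, key=lambda r: r.score, reverse=True):
    print(risk.score, risk.category, risk.description)

flagged = unmitigated_high_risks(register)
print("Needs mitigation:", [r.description for r in flagged])
```

Any risk that is flagged here is a candidate for revisiting the technical approach and its rationale before the review is closed.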

4.5.2.6 Summary

The Responsible Data Science review:

Applies an explicit framework to the finalized technical approach
Makes ethical and societal considerations visible and reviewable
Demonstrates professional responsibility alongside technical competence

A well-executed RDS review shows that the project is not only technically sound, but also thoughtful, defensible, and appropriate for its intended context.

4.6 Feasibility and Risk Alignment Check

Warning

An elegant technical approach that does everything you want can be a beautiful creation; it can also be a siren call to project failure.

The technical approach must be feasible within the constraints of the project plan to be useful.

Given the integrated nature of project management and solution development, the technical approach must be evaluated for acceptability, affordability, and feasibility after it has been designed and justified.

Acceptability: The technical approach must deliver a solution that meets defined acceptance criteria and stakeholder expectations.
Affordability: The technical approach must be executable within the available resources defined in the project plan.
Feasibility: The technical approach must be capable of delivering the solution within the required timeline and operational constraints.

4.6.1 Checking Feasibility Is a Reality Check

After developing the technical approach, return to the requirements analysis and project plan to verify alignment.

Ask the following questions explicitly:

Do the selected data, methods, and tools address the identified requirements and acceptance criteria? (Acceptable?)
Can the technical approach be executed within the available time, staffing, and computational resources? (Affordable?)
Can the complexity of the approach realistically deliver results on the required timeline? (Feasible?)
Have appropriate mitigation strategies been planned for the highest-risk components? (Feasible?)

This is the point where optimism must meet realism.

Identifying issues now is far less costly than discovering them mid-project.
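The acceptability, affordability, and feasibility questions can be recorded as explicit comparisons so that the answers are traceable. This is a minimal sketch under stated assumptions: the field names, the use of hours and weeks as units, and the example numbers are all hypothetical, and a real check would also weigh the risk mitigation question, which does not reduce to a single comparison.

```python
from dataclasses import dataclass


@dataclass
class ApproachEstimate:
    criteria_met: int        # acceptance criteria the approach addresses
    criteria_total: int      # acceptance criteria in the requirements
    estimated_hours: float   # effort the approach is expected to need
    available_hours: float   # effort available in the project plan
    estimated_weeks: float   # calendar time the approach is expected to need
    deadline_weeks: float    # calendar time until required delivery


def reality_check(est: ApproachEstimate) -> dict:
    """Answer the three questions as booleans."""
    return {
        "acceptable": est.criteria_met == est.criteria_total,
        "affordable": est.estimated_hours <= est.available_hours,
        "feasible": est.estimated_weeks <= est.deadline_weeks,
    }


def failed_checks(checks: dict) -> list:
    """Names of the checks that did not pass, in their original order."""
    return [name for name, ok in checks.items() if not ok]


# Hypothetical estimate: all criteria covered, on schedule, but over budget.
estimate = ApproachEstimate(
    criteria_met=5, criteria_total=5,
    estimated_hours=320, available_hours=300,
    estimated_weeks=10, deadline_weeks=12,
)
checks = reality_check(estimate)
print(checks)
print("Needs adjustment:", failed_checks(checks))
```

The value of writing the check down is less the arithmetic than the record: each input is an estimate that can be challenged and revised as the project proceeds.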

4.6.2 Phase-Level Feasibility Analysis

For each project phase, analyze the alignment between the technical approach and planned resource allocation.

Consider:

What could go wrong with this part of the technical approach?
How likely are these problems given your constraints?
What adjustments are being made to reduce risk?
How will early warning signs be monitored during execution?

Data Acquisition and Preparation Phase

Planned Effort vs. Reality

How much time is allocated for data access, cleaning, and preparation?
Are iterative data quality discoveries accounted for?
What happens if data are messier or less complete than expected?

Technical Approach Alignment

Do preprocessing methods match the actual data complexity?
Are selected tools appropriate for data volume and format?
Is sufficient time allocated for exploratory analysis?

Analysis and Modeling Phase

Planned Effort vs. Reality

How much time is allocated for method learning curves?
Are hyper-parameter tuning, validation, and robustness checks included?
Is sufficient time allocated for interpretation and documentation?

Technical Approach Alignment

Do selected methods match the team’s demonstrated skill level?
Are tools mature and well-supported, or experimental?
Is iteration explicitly planned rather than assumed away?

Application Development Phase

Planned Effort vs. Reality

How much time is allocated for UI/UX design and implementation?
Are testing, debugging, and revision cycles included?
Is time allocated for user feedback and refinement?

Technical Approach Alignment

Do framework choices align with development experience?
Is the focus on a minimum viable product or unnecessary feature expansion?
Are deployment and integration tasks explicitly planned?
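The planned-effort questions above can be made concrete by comparing the hours allocated to each phase with an estimate that includes a contingency buffer. The sketch below is illustrative only: the phase names, the hour values, and the 25% buffer are assumptions, not recommended figures.

```python
def effort_gaps(planned_hours, estimated_hours, buffer=0.25):
    """Return planned minus buffered-estimate hours for each phase.

    A negative value means the plan allocates less time than the
    estimate plus contingency suggests is needed.
    """
    gaps = {}
    for phase, planned in planned_hours.items():
        needed = estimated_hours[phase] * (1 + buffer)
        gaps[phase] = planned - needed
    return gaps


# Hypothetical plan and bottom-up estimates, in hours per phase.
planned = {"data": 60, "modeling": 80, "application": 40}
estimated = {"data": 56, "modeling": 60, "application": 40}

gaps = effort_gaps(planned, estimated)
shortfalls = [phase for phase, gap in gaps.items() if gap < 0]

for phase, gap in gaps.items():
    status = "short by" if gap < 0 else "margin of"
    print(f"{phase}: {status} {abs(gap):.0f} hours")
print("Phases to revisit:", shortfalls)
```

A shortfall does not by itself mean the phase will fail. It signals where to look first when deciding whether to adjust scope, methods, or resources.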

4.6.3 Risk Assessment Integration

Common Risk Blind Spots

Technical Overconfidence: “I’ll figure it out as I go”
Timeline Optimism: “Everything will work the first time”
Scope Creep Denial: “The requirements won’t change”
Resource Assumptions: “Help will be available when needed”

Table 4.15 provides some examples of risks that affect overall project feasibility and possible mitigation strategies.

Table 4.15: Common areas for feasibility risks in projects.

Risk Category	Common Issues	Mitigation Strategies
Technical Risks	New tools, complex methods, integration challenges	Early proof-of-concept, fallback approaches
Data Risks	Access delays, quality issues, format instability	Early data exploration, backup data sources
Timeline Risks	Learning curves, debugging delays, scope expansion	Buffer time, phased delivery, scope negotiation
Resource Risks	Skill gaps, infrastructure limits, support constraints	Training plans, alternative tools, consultation

4.6.4 Adjustment Strategies

When the feasibility analysis reveals problems in acceptability, affordability, or feasibility, the project must be adjusted, as shown in Figure 4.2.

graph TD
    A[PWS Requirements] --> B[Requirements Analysis]
    B --> C[Methods Selection]
    C --> D[Resource Assessment]
    D --> E[Feasibility Check]
    E -->|Feasible| F[Technical Approach]
    E -->|Not Feasible| G[Performance Adjustment]
    G --> B
    E -->|Not Feasible| H[Method Simplification]
    H --> C
    E -->|Not Feasible| I[Resource Negotiation]
    I --> D

Figure 4.2: Checking Project

Here are possible options for adjusting the project.

All of these require Project Manager approval.
Some of these (e.g., changes to requirements, acceptance criteria, or schedule) require collaboration with and approval from the client.
Changes in budgets or resources may require client approval as well depending upon the contract type.

4.6.4.1 Option 1: Scope Adjustment

Reduce deliverable complexity
Focus on core requirements
Plan phased implementation

4.6.4.2 Option 2: Method Simplification

Choose more familiar techniques
Reduce analytical sophistication
Prioritize reliable over optimal

4.6.4.3 Option 3: Resource Reallocation

Extend timeline if possible
Seek additional expertise
Reduce other project commitments

4.6.4.4 Option 4: PWS Renegotiation

Communicate constraints early
Propose alternative success metrics
Document trade-off implications

Project Reality Triangle

You cannot optimize all three simultaneously:

Performance: What is the balance between meeting the acceptance criteria and going beyond the requirements or the client’s expectations? “If the minimum performance to meet the acceptance criteria were not ‘good enough’, they would have set different acceptance criteria.”
Schedule: Is there benefit to a sophisticated or complicated approach that might delay delivery for a small increase in performance?
Resources: How much time and expertise is realistically available? Can you substitute a less expensive person for a more expensive one without undue risk?

Your technical approach must make explicit trade-offs between these factors.

After making any necessary adjustments, your feasibility analysis should conclude with a clear statement:

Example Feasibility Conclusions

Strong Feasibility: “Based on this analysis, the proposed technical approach can deliver PWS requirements within the planned timeline and resource allocation, with acceptable risk levels managed through identified mitigation strategies.”

Conditional Feasibility: “The technical approach is feasible with the following adjustments: [specific changes to scope/timeline/methods], which maintain PWS core objectives while ensuring realistic delivery expectations.”

Challenged Feasibility: “Analysis reveals significant feasibility concerns requiring PWS renegotiation around [specific requirements] to align expectations with available resources and realistic technical constraints.”

4.6.5 Summary

This final feasibility and risk review ensures the technical approach is not only conceptually sound, but also:

Deliverable within project constraints
Aligned with requirements and acceptance criteria
Realistic in terms of effort, complexity, and risk

Tip

Feasibility analysis is an ongoing process throughout project execution. The discipline of systematic thinking about requirements, constraints, and feasibility should inform decision-making at every project phase, not just the initial framing of the project.