Considerations for Responsible Data Science
American University
1/22/23
By the end of this discussion, you should be able to:
Laws (statutes) specify permissible and/or impermissible activities as well as potential punishments (judicial, e.g., imprisonment)
Policies (regulations) provide guidance on implementing activities (within legal constraints) along with adjudication procedures and potential punishments (non-judicial, e.g., debarment)
Legal issues in big data include how you gather, protect, and share data, and increasingly, how you use it.
Laws are “local,” not universal: pirates or privateers
Laws do not keep pace with technology and can be difficult to interpret.
Ethical considerations arise when asking
“What should I do?”
“What is right?”
Ethical Choices can be hard, especially when choices may require violating a law, regulation, or professional guideline.
Individual principles and cultural norms shape options and guide choices in complex situations.
Often, there is no universally accepted, or even good, “right answer.”
May have to choose between two bad outcomes; the Trolley Problem has many analogs. (Merriam-Webster 2021)
May have to choose between individual and group outcomes.
Ethical choices can lead to feelings of guilt, group reprobation, civil action (torts), or criminal charges.
What will produce the most good and do the least harm? (Utilitarian)
What respects the rights of everyone affected by the decision? (Rights)
What treats people equally or proportionately? (Justice)
What serves the entire community, not just some members? (Common Good)
What leads me to act as the sort of person I want to be? (Virtues)
Under stress, we tend to bypass the higher-level cognitive centers that evolved later and take more time to reason.
Humans get comfortable with patterns, which can lead to systematic deviations from rational judgment.
These mental shortcuts are the source of “Unconscious Biases” or “Implicit Biases.”
Ethical Challenges can arise from our own implicit biases or the implicit biases of others affecting our data, thoughts, and actions.
Not a new issue - goes back decades.
However, the explosive growth of AI/Machine Learning systems to support and even implement decisions is generating concerns in multiple fields.
Are algorithms really less biased than people?
How can you tell with “black box” models?
What are the trade-offs among accuracy, explainability, and transparency?
Bias in the (training) data (historical, sampling, …) can drive biased outcomes.
Algorithms find “hidden” relationships among proxy variables that can distort the interpretations.
When is it ethical to use Machine Learning or other Big Data systems?
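The proxy-variable concern above can be made concrete: before training, screen each candidate feature for a strong statistical relationship with a protected attribute. A minimal sketch, using invented toy data, hypothetical feature names, and an arbitrary flagging threshold:

```python
# Hypothetical sketch: screening candidate features for proxy relationships
# with a protected attribute. The data, feature names, and threshold below
# are invented for illustration only.

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Toy records: a protected attribute (0/1) alongside candidate model features.
protected = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "zip_code_income_rank": [8, 7, 9, 6, 2, 1, 3, 2],  # tracks the attribute
    "years_experience":     [3, 9, 4, 7, 5, 8, 2, 6],  # roughly independent
}

THRESHOLD = 0.8  # arbitrary cutoff for flagging a feature for human review
for name, values in features.items():
    r = pearson_r(values, protected)
    flag = " <- possible proxy" if abs(r) > THRESHOLD else ""
    print(f"{name}: r={r:+.2f}{flag}")
```

A flagged feature is not automatically disqualifying; the point is to surface hidden relationships so a human can judge whether using the feature is defensible.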
Active area for research and publication. Two examples:
Higher error rates in classifying the gender of darker-skinned women than of lighter-skinned men (O’Brien 2019)
Big Data used to generate unregulated e-scores in lieu of FICO scores for Credit in Lending (Bracey and Moeller 2018)
Contradictions and competition among legal, professional, and ethical guidelines
Using biased data (even unknowingly) or eliminating extreme values or small groups
Using Proxies for “Protected” Attributes (even unknowingly)
Protection of Intellectual Property versus Explainability, Transparency and Accountability
Law of Unintended Consequences - people will use your products and solutions in “creative” ways that you never intended.
Ask a question: Is the question about equity or equality? Who are the stakeholders? What are the trade-offs across groups? What are our interests? Recency Bias? Confirmation Bias?
Frame the Analysis: What is the population? Are we using proxy variables? How are metrics for “fairness” affecting groups or individuals? Do we need IRB review? (APA 2022)
Get Data: How was it collected? Was consent required/given for this use? Is there balanced representation of the population? Selection Bias? Availability Bias? Survivorship Bias?
Shape Data: Are we aggregating distinct groups? How do we treat missing data? Are we separating training and testing data?
Model and Analyze: Are we using proxies or over-fitting? How do we treat extreme values? Are we examining assumptions? Are we checking multiple performance metrics? Is our work reproducible?
Communicate Results: Are the graphs misleading? Do we cherry-pick or data snoop? Are we reporting p-values and hyper-parameters?
Deploy/Implement: Is the deployment accessible to all?
Observe Outcomes: How can we validate assumptions and analyze outcomes for bias?
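Two of the checklist steps above, checking multiple performance metrics and analyzing outcomes for bias, can be sketched together: disaggregate error rates by group rather than reporting a single aggregate accuracy. The group names and outcome data below are invented for illustration:

```python
# Hypothetical sketch: compare accuracy AND false-positive rate per group,
# since a model can look accurate overall while concentrating its errors
# in one group. All data below are invented.

def rates(y_true, y_pred):
    """Accuracy and false-positive rate for one group's outcomes."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    negatives = [(t, p) for t, p in zip(y_true, y_pred) if t == 0]
    fp = sum(p == 1 for _, p in negatives)
    fpr = fp / len(negatives) if negatives else 0.0
    return correct / len(y_true), fpr

# Toy (y_true, y_pred) outcomes for two hypothetical demographic groups.
groups = {
    "group_a": ([0, 0, 1, 1, 0, 1], [0, 0, 1, 1, 0, 1]),  # no errors
    "group_b": ([0, 0, 1, 1, 0, 1], [1, 0, 1, 1, 1, 1]),  # false positives here
}

for name, (y_true, y_pred) in groups.items():
    acc, fpr = rates(y_true, y_pred)
    print(f"{name}: accuracy={acc:.2f}, false_positive_rate={fpr:.2f}")
```

In this toy example the pooled accuracy looks reasonable, but every false positive falls on one group, which is exactly the pattern a per-group breakdown is meant to expose.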
Published by the Royal Statistical Society and the Institute and Faculty of Actuaries in A Guide for Ethical Data Science
DoD AI Capabilities shall be:
Responsible. … exercise appropriate levels of judgment and care, while remaining responsible for the development, deployment, and use….
Equitable. … take deliberate steps to minimize unintended bias in AI capabilities.
Traceable. … develop and deploy AI capabilities such that relevant personnel have an appropriate understanding …, including with transparent and auditable methodologies, data sources, and design procedures and documentation.
Reliable. … AI capabilities will have explicit, well-defined uses, and the safety, security, and effectiveness … will be subject to testing and assurance within those defined uses ….
Governable. … design and engineer AI capabilities to fulfill their intended functions while possessing the ability to detect and avoid unintended consequences, and the ability to disengage or deactivate deployed systems that demonstrate unintended behavior.
(DoD 2020)
A guide to Building “Trustworthy” Data Products
Based on the golden rule: Treat others’ data as you would have them treat your data
Consent - Get permission from the owners or subjects of the data before …
Clarity - Ensure permission is based on a clear understanding of the extent of your intended usage
Consistency - Build trust by ensuring third parties adhere to your standards/agreements
Control (and Transparency) - Respond to data subject requests for access/modification/deletion, e.g., the right to be forgotten
Consequences (and Harm) - Consider how your usage may affect others in society and potential unintended applications.
We will not design or deploy AI in the following application areas: Weapons, Surveillance, …
(IBM 2019)
Integrate ethical decision making into your analysis life cycle.
To paraphrase American frontiersman Davy Crockett, “Be sure you’re right, then go ahead.”
Going forward, analyze data science work from legal, professional, and ethical perspectives and examine the choices you and others make while learning more about responsible data science.
“We must address, individually and collectively, moral and ethical issues raised by cutting-edge research in artificial intelligence and biotechnology, which will enable significant life extension, designer babies, and memory extraction.” - Klaus Schwab
“Action indeed is the sole medium of expression for ethics.” - Jane Addams