AU Winter Institute in Data Science
2024-01-09
Privacy, security, and ethics all fall within the realm of Responsible Data Science.
We will discuss Responsible Data Science through four topics:
Laws (statutes) specify permissible and/or impermissible activities as well as potential punishments (judicial, e.g., imprisonment).
Policies (regulations) provide guidance on implementing activities (within legal constraints) along with adjudication procedures and potential punishments (non-judicial, e.g., debarment).
Legal issues in big data include how you gather, protect, and share data, and increasingly, how you use it.
Ethical considerations arise when asking:
What should I do?
What is the “right” or “moral” thing to do?
Ethical choices can be hard, especially when a choice may require violating a law, regulation, and/or professional guideline.
Individual principles and cultural norms shape options and guide choices in complex situations.
Often, there is no universally accepted, or even good, “right answer”.
You may have to choose between two bad outcomes.
You may have to choose between individual and group outcomes.
Ethical choices can lead to feelings of guilt, group reprobation, civil action (torts), or criminal charges.
These are the source of “Unconscious Biases” or “Implicit Biases”.
Under stress, we tend to bypass the higher-level cognitive centers that evolved later and take more time to reason.
Humans get comfortable with patterns, which can lead to systematic deviations from rational judgment.
Ethical challenges can arise when our own implicit biases, or those of others, affect our data, thoughts, and actions.
Not a new issue - goes back decades. However, the explosive growth of AI systems to support and even make decisions is generating concerns.
Bias in the (training) data (historical, sampling, …) can drive biased outcomes.
Are algorithms really less biased than people? It depends …
Active area for research and publication.
Higher error rates in classifying the gender of darker-skinned women than of lighter-skinned men (O’Brien 2019)
Big Data used to generate unregulated e-scores in lieu of FICO scores for Credit in Lending (Bracey and Moeller 2018)
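One way to surface the kind of disparity described in the first example is to break a classifier's error rate down by group rather than reporting a single overall figure. The sketch below is a minimal illustration only: the function name `error_rate_by_group`, the group labels "A" and "B", and all of the data are hypothetical and are not taken from the cited studies.

```python
from collections import defaultdict


def error_rate_by_group(y_true, y_pred, groups):
    """Return {group: fraction of misclassified records} for each group."""
    errors = defaultdict(int)
    counts = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        counts[group] += 1
        if truth != pred:
            errors[group] += 1
    return {g: errors[g] / counts[g] for g in counts}


# Made-up toy labels, purely for illustration; these are not real results.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = error_rate_by_group(y_true, y_pred, groups)
print(rates)  # {'A': 0.0, 'B': 0.5}
```

In this toy data the overall error rate is 25%, which hides the fact that every error falls on group B. A disaggregated view like this is a starting point for questions, not proof of bias on its own.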
Contradictions and competition among legal, professional, and ethical guidelines.
Using biased data (even unknowingly)
Eliminating extreme values or combining small groups.
Using Proxies for “Protected” Attributes (even unknowingly).
Protection of Intellectual Property versus Explainability, Transparency and Accountability
Law of Unintended Consequences - people will use your products and solutions in “creative” ways that
What Can You Do? What Should You Do?
Get Data: How was it collected? Was informed consent required/given? Is there balanced representation? Selection Bias? Availability Bias? Survivorship Bias?
Shape Data: Are we aggregating distinct groups? How do we treat missing data? Are we separating training and testing data?
Model and Analyze: How are we documenting assumptions, treating extreme values, or checking over-fitting? Are we checking multiple fairness and performance metrics?
Communicate Results: Are the graphs misleading? Did we cherry-pick results or snoop the data? Are we reporting \(p\)-values and hyperparameters?
Deploy/Implement: Is the deployment accessible to all?
Observe Outcomes: Can we check assumptions and analyze outcomes for bias?
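The "Model and Analyze" question about checking multiple fairness and performance metrics can be sketched as a small report that puts accuracy next to two common group-fairness gaps. This is a hedged illustration, not a prescribed method: the helper names, the two groups "A" and "B", and the labels are all hypothetical, and the choice of demographic parity and equal opportunity as the fairness measures is an assumption made for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def selection_rate(y_pred, groups, group):
    """Fraction of records in `group` that received a positive prediction."""
    preds = [p for p, g in zip(y_pred, groups) if g == group]
    return sum(preds) / len(preds)


def true_positive_rate(y_true, y_pred, groups, group):
    """Fraction of truly positive records in `group` predicted positive.

    Assumes the group contains at least one positive record.
    """
    preds = [p for t, p, g in zip(y_true, y_pred, groups)
             if g == group and t == 1]
    return sum(preds) / len(preds)


# Made-up toy labels, purely for illustration; these are not real results.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

report = {
    "accuracy": accuracy(y_true, y_pred),
    # Gap in positive-prediction rates between the two groups.
    "demographic_parity_diff": abs(
        selection_rate(y_pred, groups, "A")
        - selection_rate(y_pred, groups, "B")
    ),
    # Gap in true positive rates between the two groups.
    "equal_opportunity_diff": abs(
        true_positive_rate(y_true, y_pred, groups, "A")
        - true_positive_rate(y_true, y_pred, groups, "B")
    ),
}
print(report)
```

Reporting the metrics side by side matters because a model can look acceptable on one (here, 75% accuracy) while showing large gaps on the others. Which fairness metric is appropriate depends on the application and usually cannot be settled by the numbers alone.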
Published by the Royal Statistical Society and the Institute and Faculty of Actuaries in A Guide for Ethical Data Science
Department of Defense AI Capabilities shall be:
Responsible … exercise appropriate levels of judgment and care, while remaining responsible for the development, deployment, and use….
Equitable … take deliberate steps to minimize unintended bias in AI capabilities.
Traceable … develop and deploy AI capabilities such that relevant personnel have an appropriate understanding …, including with transparent and auditable methodologies, data sources, and design procedures and documentation.
Reliable … AI capabilities will have explicit, well-defined uses, and the safety, security, and effectiveness … will be subject to testing and assurance within those defined uses ….
Governable … design and engineer AI capabilities to fulfill their intended functions while possessing the ability to detect and avoid unintended consequences, and the ability to disengage or deactivate deployed systems that demonstrate unintended behavior.
(DoD 2020)
AI Principles
We will not design or deploy AI in the following application areas: Weapons, Surveillance, …
IBM Principles for Trust and Transparency
(IBM 2019)
A guide to Building “Trustworthy” Data Products
Based on the golden rule: Treat others’ data as you would have them treat your data
Consent - Get permission from the owners or subjects of the data before …
Clarity - Ensure permission is based on a clear understanding of the extent of your intended usage
Consistency - Build trust by ensuring third parties adhere to your standards/agreements
Control (and Transparency) - Respond to data subject requests for access/modification/deletion, e.g., the right to be forgotten
Consequences (and Harm) - Consider how your usage may affect others in society and potential unintended applications.
Integrate ethical decision making into your environment.
As Davy Crockett might say, “Try to be sure you are right, then Go Ahead!”
“Davy Crockett” (2024)
After completing this module, you should be able to demonstrate the LOs:
You should also have greater competence in making choices that practice and promote responsible data science, today and in the future.
Finally, when it comes to Responsible Data Science, Jane Addams reminds us that just thinking about ethics is not enough:
“Action indeed is the sole medium of expression for ethics.”