AU Department of Mathematics and Statistics Colloquium
2024-01-30
Big Data and Artificial Intelligence (AI) surround us:
What is our role in preparing our graduates and students for the changing world?
How do we/should we inculcate ideas and practices of “Responsible Data Science”?
There are books, articles, and talks on “Responsible Data Science” but no standard definition.
We have defined Responsible Data Science (DS) in the context of a DS Life Cycle.
The goal for today is to have a “Conversation.”
How are we teaching Responsible Data Science?
Is what we are doing sufficient, useful? What else could/should we cover?
Is there a place for a Departmental strategy on teaching ethical practices?
Legal considerations address the criminal and civil risks of violating laws.
Laws (statutes) specify permissible and/or impermissible activities as well as potential punishments (judicial, e.g., imprisonment).
Policies (regulations) provide guidance on implementing activities (within legal constraints) along with adjudication procedures and potential punishments (non-judicial, e.g., debarment).
Legal issues in big data include how you gather, protect, and share data, and increasingly, how you use it.
Laws are “local,” not universal: the same activity can make you a pirate under one flag and a privateer under another.
Professional considerations address the risks of violating the guidelines or codes of conduct of the organizations with which you affiliate.
Ethical considerations arise when asking not just “What can you do?” but “What should you do?”
Ethical choices can be hard, especially when they may require violating a law, regulation, and/or professional guideline.
Individual principles and cultural norms shape options and guide choices in complex situations.
Often, there is no universally accepted, or even good, “right answer.”
May have to choose between two bad outcomes.
May have to choose between individual and group outcomes.
Ethical choices can lead to feelings of guilt, group reprobation, civil action (torts), or criminal charges.
This is not a new issue; it goes back decades. However, the explosive growth of AI systems that support, and even make, decisions is generating new concerns.
Higher error rates in classifying the gender of darker-skinned women than for lighter-skinned men (O’Brien 2019)
Big Data used to generate unregulated e-scores in lieu of FICO scores for Credit in Lending (Bracey and Moeller 2018)
What Can You Do? What Should You Do?
Get Data: How was it collected? Was informed consent required/given? Is there balanced representation? Selection Bias? Availability Bias? Survivorship Bias?
Shape Data: Are we aggregating distinct groups? How do we treat missing data? Are we separating training and testing data?
Model and Analyze: How are we documenting assumptions, treating extreme values, or checking over-fitting? Are we checking multiple fairness and performance metrics? (A code sketch follows this list.)
Communicate Results: Are the graphs misleading? Did we cherry pick or data snoop? Are we reporting \(p\)-values and hyper-parameters?
Deploy/Implement: Is the deployment accessible to all?
Observe Outcomes: Can we check assumptions and analyze outcomes for bias?
These life-cycle questions draw on A Guide for Ethical Data Science, published by the Royal Statistical Society and the Institute and Faculty of Actuaries.
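To make a couple of these checks concrete in an assignment, here is a minimal sketch in Python (pandas and scikit-learn), not the course’s actual materials: it builds a small synthetic data set (the columns `x1`, `x2`, `group`, and `y` are hypothetical placeholders), separates training and testing data, and compares accuracy and false-positive rate across groups, one way to look at “multiple fairness and performance metrics.”

```python
# Minimal, illustrative sketch: train/test separation plus a group-wise check.
# The data set is synthetic; column names and model choice are placeholders.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "group": rng.choice(["A", "B"], size=n),   # hypothetical protected attribute
})
df["y"] = (df["x1"] + 0.5 * rng.normal(size=n) > 0).astype(int)  # binary label

X = df[["x1", "x2"]]          # features only; the group label is kept out
y = df["y"]

# Stratify on the label so train and test sets have similar class balance;
# carry the group labels through the same split for the per-group check.
X_train, X_test, y_train, y_test, g_train, g_test = train_test_split(
    X, y, df["group"], test_size=0.25, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_test)

# Report more than one metric, broken out by group.
results = pd.DataFrame({"group": g_test.values, "y": y_test.values, "pred": pred})
summary = results.groupby("group")[["y", "pred"]].apply(
    lambda d: pd.Series({
        "n": len(d),
        "accuracy": (d["y"] == d["pred"]).mean(),
        "false_positive_rate":
            ((d["pred"] == 1) & (d["y"] == 0)).sum() / max((d["y"] == 0).sum(), 1),
    })
)
print(summary)  # large gaps between groups are a signal to investigate, not proof of bias
```

Reporting per-group numbers alongside the overall metrics is the habit the checklist is trying to build; which metrics matter depends on the application.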
Department of Defense AI Capabilities shall be:
Responsible … exercise appropriate levels of judgment and care, while remaining responsible for the development, deployment, and use….
Equitable … take deliberate steps to minimize unintended bias in AI capabilities.
Traceable … develop and deploy AI capabilities such that relevant personnel have an appropriate understanding …, including with transparent and auditable methodologies, data sources, and design procedures and documentation.
Reliable … AI capabilities will have explicit, well-defined uses, and the safety, security, and effectiveness … will be subject to testing and assurance within those defined uses ….
Governable … design and engineer AI capabilities to fulfill their intended functions while possessing the ability to detect and avoid unintended consequences, and the ability to disengage or deactivate deployed systems that demonstrate unintended behavior.
(DoD 2020)
AI Principles
We will not design or deploy AI in the following application areas: Weapons, Surveillance, …
IBM Principles for Trust and Transparency
(IBM 2019)
A Guide to Building “Trustworthy” Data Products
Based on the golden rule: Treat others’ data as you would have them treat your data
Consent - Get permission from the owners or subjects of the data before …
Clarity - Ensure permission is based on a clear understanding of the extent of your intended usage
Consistency - Build trust by ensuring third parties adhere to your standards/agreements
Control (and Transparency) - Respond to data subject requests for access/modification/deletion, e.g., the right to be forgotten
Consequences (and Harm) - Consider how your usage may affect others in society and potential unintended applications.
Integrate ethical decision making into your environment.
As Davy Crockett might say, “Try to be sure you are right, then Go Ahead!”
“Davy Crockett” (2024)
Should we incorporate Responsible DS into more Assignments and Projects?
Recent Papers/Examples/Case Studies?
Ideas for In-Class Exercises?
Other Teaching Approaches?
The Math, Stat, and DS professions have common ground in their ethical guidelines.
AU has a University-level learning outcome for Ethics covering all programs.
All undergraduates take a course in “Ethical Reasoning” with four learning outcomes.
Should we create a Math-Stat Department course in Ethical Reasoning?
Thank you for your contributions to the conversation on Teaching Responsible Data Science!
We have made progress over the past few years, but have a ways to go.
As the closing of the Data 413/613 module states:
When it comes to Responsible Data Science, Jane Addams reminds us that just thinking about ethics is not enough, …
“Action indeed is the sole medium of expression for ethics.”