An Introduction to Data Science and Artificial Intelligence
Author
Michael Robinson and Richard Ressler
Published
May 23, 2025
Keywords
data science, R Programming, Tidyverse, Artificial Intelligence
Preface
These lecture notes support a mini-course titled An Introduction to Data Science and Artificial Intelligence.
The notes are based on the R statistical programming language (R Core Team 2018) and the {tidyverse} package (Wickham et al. 2019). The notes also use many other R packages.
- See Appendix B — Environment and R Packages for a more detailed description of the environment and the packages with their versions.
- This work was produced using Quarto (“Quarto” 2025) from Posit. You can choose to read the notes in “dark mode” by using the toggle under the title in the left margin.
Course Survey
When you are ready, please provide your feedback by completing a short survey.
Please send any additional corrections or recommendations to rressler@american.edu.
Copyright and License
This work is copyrighted by Michael Robinson and Richard Ressler 2025.
This work is licensed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).
ASA, American Statistical Association. 1983. “Auto Data Set from the 1983 Exposition of Statistical Graphics Technology.” https://lib.stat.cmu.edu/data-expo/1983.html.
Barrett, Tyson, Matt Dowle, Arun Srinivasan, Jan Gorecki, Michael Chirico, Toby Hocking, Benjamin Schwendinger, and Ivan Krylov. 2025. “Data.table: Extension of ‘Data.frame’.” https://r-datatable.com.
Bartley, Kevin. 2024. “Data Statistics (2025) - How Much Data Is There in the World?” Rivery. https://rivery.io/blog/big-data-statistics-how-much-data-is-there-in-the-world/.
Davenport, Thomas H., and D J. Patil. 2012. “Data Scientist: The Sexiest Job of the 21st Century.” Harvard Business Review October 2012: 70–76. http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1.
Firke, Sam. 2024. “Janitor: Simple Tools for Examining and Cleaning Dirty Data.” https://sfirke.github.io/janitor/.
Fox, John, and Sanford Weisberg. 2019. An R Companion to Applied Regression. 3rd ed. Sage. https://www.john-fox.ca/Companion/.
Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2010. “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software 33 (1): 1–22. https://doi.org/10.18637/jss.v033.i01.
Godsey, Brian. 2017. Think Like a Data Scientist. Manning.
Grolemund, Garrett, and Hadley Wickham. 2011. “Lubridate: Dates and Times Made Easy with Lubridate.” Journal of Statistical Software 40 (3): 1–25. https://www.jstatsoft.org/v40/i03/.
———. n.d. R for Data Science. https://r4ds.had.co.nz/.
Horst, Allison, Alison Presmanes Hill, and Kristen B Gorman. 2020. “Palmerpenguins R Data Package.” https://doi.org/10.5281/zenodo.3960218.
Fellows, Ian. 2022. “Wordcloud: Word Clouds.” Comprehensive R Archive Network. https://doi.org/10.32614/CRAN.package.wordcloud.
Institute of Statistics (INSTAT). 2025. “Statistical Database.” https://databaza.instat.gov.al:8083/pxweb/en/DST/.
Queensland Brain Institute. 2017. “Stunning Neuroscience Images.” https://qbi.uq.edu.au/blog/2017/07/stunning-neuroscience-images.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2023. An Introduction to Statistical Learning with Applications in R. 2nd ed. Springer. https://www.statlearning.com.
Kalinowski, Tomasz et al. 2022. “TensorFlow for R.” https://tensorflow.rstudio.com/.
Kalinowski, Tomasz, JJ Allaire, and François Chollet. 2025. “Keras3: R Interface to Keras.” https://keras3.posit.co/.
Kanwal, Maxinder S., Avinash S. Ramesh, and Lauren A. Huang. 2013. “A Novel Pseudoderivative-Based Mutation Operator for Real-Coded Adaptive Genetic Algorithms.” F1000Research. https://doi.org/10.12688/f1000research.2-139.v2.
Kaplan, Daniel, and Randall Pruim. 2023. “Ggformula: Formula Interface to the Grammar of Graphics.” https://www.mosaic-web.org/ggformula/.
Karatzoglou, Alexandros, Alex Smola, Kurt Hornik, National ICT Australia (NICTA), Michael A. Maniscalco, and Choon Hui Teo. 2024. “Kernlab: Kernel-Based Machine Learning Lab.” https://CRAN.R-project.org/package=kernlab.
Kelechava, Brad. 2018. “The SQL Standard - ISO/IEC 9075:2023 (ANSI X3.135).” The ANSI Blog. https://blog.ansi.org/sql-standard-iso-iec-9075-2023-ansi-x3-135/.
Lin, Hause, and Tawab Safi. 2025. “Ollamar: An R Package for Running Large Language Models.” Journal of Open Source Software. https://doi.org/10.21105/joss.07211.
Meyer, David. 2024. “E1071: Misc Functions of the Department of Statistics, Probability Theory Group.” https://cran.r-project.org/web/packages/e1071.
Neuroscientifically Challenged. 2014. “2-Minute Neuroscience: The Neuron.” https://www.youtube.com/watch?v=6qS83wD29PY.
Nohe, Patrick. 2017. “What Is the Dark Web?” https://www.thesslstore.com/blog/what-is-the-dark-web/.
“Ollama.” 2025. https://ollama.com.
“Open Data Inventory—Global Index of Open Data - Open Data Inventory.” n.d. Accessed May 8, 2025. https://odin.opendatawatch.com/.
Posit. 2025a. “Posit Cheatsheets.” https://posit.co/resources/cheatsheets/.
———. 2025b. “RStudio IDE User Guide.” RStudio User Guide. https://docs.posit.co/ide/user/.
“Quarto.” 2025. Posit.co. https://quarto.org/.
Rashida048. 2020. “Machine Learning: Gradient Descent Concept – Regenerative.” https://regenerativetoday.com/machine-learning-gradient-descent-concept/.
Silge, Julia, and David Robinson. n.d. Text Mining with R. Accessed April 12, 2020. https://www.tidytextmining.com/.
Ruman. 2023. “Convex Vs. Non-Convex Functions: Why It Matters in Optimization for Machine Learning.” Medium. https://rumn.medium.com/convex-vs-non-convex-functions-why-it-matters-in-optimization-for-machine-learning-39cd9427dfcc.
Silge, Julia, and David Robinson. 2022. Tidytext: Text Mining with R. O’Reilly Media, Inc. https://www.tidytextmining.com/.
———. 2023. “Tidytext: Text Mining Using Tidy Tools.” https://juliasilge.github.io/tidytext/.
Sjoberg, Daniel D., Karissa Whiting, Michael Curry, Jessica A. Lavery, and Joseph Larmarange. 2021. “Reproducible Summary Tables with the Gtsummary Package.” The R Journal 13 (1): 570–80. https://doi.org/10.32614/RJ-2021-053.
R Core Team. 2018. “R: A Language and Environment for Statistical Computing.” R Foundation for Statistical Computing. https://www.R-project.org/.
Ushey, Kevin, JJ Allaire, and Yuan Tang. 2023. “Reticulate: R Interface to Python.” https://rstudio.github.io/reticulate/.
van Rossum, Guido. 2025. “Python.” https://www.python.org/.
Waring, Elin, Michael Quinn, Amelia McNamara, Eduardo Arino de la Rubia, Shu Hao, and Shannon Ellis. 2025. “Skimr: Compact and Flexible Summaries of Data.” https://docs.ropensci.org/skimr/.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org/.
———. 2023a. “Forcats: Tools for Working with Categorical Variables (Factors).” https://forcats.tidyverse.org/.
———. 2023b. “Modelr: Modelling Functions That Work with the Pipe.” https://modelr.tidyverse.org/.
———. 2023c. “Stringr: Simple, Consistent Wrappers for Common String Operations.” https://stringr.tidyverse.org/.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain Francois, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software, no. 43: 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Mine Cetinkaya-Rundel, and Garrett Grolemund. 2023. R for Data Science (2e). O’Reilly Media, Inc. https://r4ds.hadley.nz/.
Wickham, Hadley, Romain Francois, Lionel Henry, Kirill Muller, and Davis Vaughan. 2023. “Dplyr: A Grammar of Data Manipulation.” https://dplyr.tidyverse.org.
Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2024. “Readr: Read Rectangular Text Data.” https://readr.tidyverse.org/.
Wickham, Hadley, Davis Vaughan, and Maximilian Girlich. 2023. “Tidyr: Tidy Messy Data.” https://tidyr.tidyverse.org.
Wikipedia. 2025. “Data Science.” Wikipedia. https://en.wikipedia.org/wiki/Data_science.