An Introduction to Data Science and Artificial Intelligence
Author
Michael Robinson and Richard Ressler
Published
May 20, 2026
Keywords
data science, R Programming, Tidyverse, Artificial Intelligence
Preface
These lecture notes support a one-week course called An Introduction to Data Science and Artificial Intelligence.
The notes are based on the R statistical programming language (R Core Team 2018) and the {tidyverse} package (Wickham et al. 2019). The notes also use many other R packages.
- See Appendix B — Environment and R Packages for a more detailed description of the environment and the packages with their versions.
- This work was produced using Quarto (Quarto 2025) from Posit. You can choose to read the notes in “dark mode” by using the toggle under the title in the left margin.
The notes are also based on many references which are listed in Appendix A — References.
Please send any corrections or recommendations to datascience@american.edu
Copyright and License
This work is copyright 2026 by Michael Robinson and Richard Ressler.
This work is licensed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).
References
ASA, American Statistical Association. 1983. Auto Data Set from the 1983 Exposition of Statistical Graphics Technology. https://lib.stat.cmu.edu/data-expo/1983.html.
Barrett, Tyson, Matt Dowle, Arun Srinivasan, et al. 2025. Data.table: Extension of ‘Data.frame’. https://r-datatable.com.
Bartley, Kevin. 2024. “Data Statistics (2025) - How Much Data Is There in the World?” In Rivery. https://rivery.io/blog/big-data-statistics-how-much-data-is-there-in-the-world/.
Davenport, Thomas H., and D. J. Patil. 2012. “Data Scientist: The Sexiest Job of the 21st Century.” Harvard Business Review October 2012: 70–76. http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1.
Firke, Sam. 2024. Janitor: Simple Tools for Examining and Cleaning Dirty Data. https://sfirke.github.io/janitor/.
Fox, John, and Sanford Weisberg. 2019. An R Companion to Applied Regression. 3rd ed. Sage. https://www.john-fox.ca/Companion/.
Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2010. “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software 33 (1): 1–22. https://doi.org/10.18637/jss.v033.i01.
Godsey, Brian. 2017. Think Like a Data Scientist. Manning.
Grolemund, Garrett, and Hadley Wickham. 2011. “Lubridate: Dates and Times Made Easy with Lubridate.” Journal of Statistical Software 40 (3): 1–25. https://www.jstatsoft.org/v40/i03/.
Grolemund, Garrett, and Hadley Wickham. n.d. R for Data Science. https://r4ds.had.co.nz/.
Horst, Allison, Alison Presmanes Hill, and Kristen B Gorman. 2020. Palmerpenguins R Data Package. https://doi.org/10.5281/zenodo.3960218.
Fellows, Ian. 2022. Wordcloud: Word Clouds. Comprehensive R Archive Network. https://doi.org/10.32614/CRAN.package.wordcloud.
Institute of Statistics (INSTAT). 2025. Statistical Database. https://databaza.instat.gov.al:8083/pxweb/en/DST/.
Queensland Brain Institute. 2017. Stunning Neuroscience Images. https://qbi.uq.edu.au/blog/2017/07/stunning-neuroscience-images.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2023. An Introduction to Statistical Learning with Applications in R. 2nd ed. Springer. https://www.statlearning.com.
Kalinowski, Tomasz, JJ Allaire, and François Chollet. 2025. Keras3: R Interface to Keras. https://keras3.posit.co/.
Kanwal, Maxinder S., Avinash S. Ramesh, and Lauren A. Huang. 2013. “A Novel Pseudoderivative-Based Mutation Operator for Real-Coded Adaptive Genetic Algorithms.” F1000Research 2: 139. https://doi.org/10.12688/f1000research.2-139.v2.
Kaplan, Daniel, and Randall Pruim. 2023. Ggformula: Formula Interface to the Grammar of Graphics. https://www.mosaic-web.org/ggformula/.
Karatzoglou, Alexandros, Alex Smola, Kurt Hornik, National ICT Australia (NICTA), Michael A. Maniscalco, and Choon Hui Teo. 2024. Kernlab: Kernel-Based Machine Learning Lab. https://CRAN.R-project.org/package=kernlab.
Kelechava, Brad. 2018. “The SQL Standard - ISO/IEC 9075:2023 (ANSI X3.135).” In The ANSI Blog. https://blog.ansi.org/sql-standard-iso-iec-9075-2023-ansi-x3-135/.
Lin, Hause, and Tawab Safi. 2025. “Ollamar: An R Package for Running Large Language Models.” Journal of Open Source Software, ahead of print. https://doi.org/10.21105/joss.07211.
Meyer, David. 2024. E1071: Misc Functions of the Department of Statistics, Probability Theory Group. https://cran.r-project.org/web/packages/e1071.
Nohe, Patrick. 2017. What Is the Dark Web? https://www.thesslstore.com/blog/what-is-the-dark-web/.
Ollama. 2025. https://ollama.com.
Ooms, Jeroen. 2014. “Jsonlite: The Jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects.” arXiv:1403.2805 [Stat.CO]. https://arxiv.org/abs/1403.2805.
Open Data Inventory—Global Index of Open Data - Open Data Inventory. n.d. Accessed May 8, 2025. https://odin.opendatawatch.com/.
Posit. 2025a. Posit Cheatsheets. https://posit.co/resources/cheatsheets/.
Posit. 2025b. “RStudio IDE User Guide.” In RStudio User Guide. https://docs.posit.co/ide/user/.
Quarto. 2025. Posit.co. https://quarto.org/.
Rashida048. 2020. Machine Learning: Gradient Descent Concept – Regenerative. https://regenerativetoday.com/machine-learning-gradient-descent-concept/.
Silge, Julia, and David Robinson. n.d. Text Mining with R. Accessed April 12, 2020. https://www.tidytextmining.com/.
Ruman. 2023. “Convex Vs. Non-Convex Functions: Why It Matters in Optimization for Machine Learning.” In Medium. https://rumn.medium.com/convex-vs-non-convex-functions-why-it-matters-in-optimization-for-machine-learning-39cd9427dfcc.
Silge, Julia, and David Robinson. 2023. Tidytext: Text Mining Using Tidy Tools. https://juliasilge.github.io/tidytext/.
Silge, Julia, and David Robinson. 2025. Tidytext: Text Mining with R. O’Reilly Media, Inc. https://www.tidytextmining.com/.
Sjoberg, Daniel D., Karissa Whiting, Michael Curry, Jessica A. Lavery, and Joseph Larmarange. 2021. “Reproducible Summary Tables with the Gtsummary Package.” The R Journal 13 (1): 570–80. https://doi.org/10.32614/RJ-2021-053.
R Core Team. 2018. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. https://www.R-project.org/.
Ushey, Kevin, JJ Allaire, and Yuan Tang. 2023. Reticulate: R Interface to Python. https://rstudio.github.io/reticulate/.
van Rossum, Guido. 2025. Python. https://www.python.org/.
Waring, Elin, Michael Quinn, Amelia McNamara, Eduardo Arino de la Rubia, Hao Zhu, and Shannon Ellis. 2025. Skimr: Compact and Flexible Summaries of Data. https://docs.ropensci.org/skimr/.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org/.
Wickham, Hadley. 2023a. Forcats: Tools for Working with Categorical Variables (Factors). https://forcats.tidyverse.org/.
Wickham, Hadley. 2023b. Modelr: Modelling Functions That Work with the Pipe. https://modelr.tidyverse.org/.
Wickham, Hadley. 2023c. Stringr: Simple, Consistent Wrappers for Common String Operations. https://stringr.tidyverse.org/.
Wickham, Hadley, Mara Averick, Jennifer Bryan, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, and Jennifer Bryan. 2023. Readxl: Read Excel Files. https://readxl.tidyverse.org.
Wickham, Hadley, Mine Cetinkaya-Rundel, and Garrett Grolemund. 2023. R for Data Science (2e). O’Reilly Media, Inc. https://r4ds.hadley.nz/.
Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. Dplyr: A Grammar of Data Manipulation. https://dplyr.tidyverse.org.
Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2024. Readr: Read Rectangular Text Data. https://readr.tidyverse.org/.
Wickham, Hadley, Davis Vaughan, and Maximilian Girlich. 2023. Tidyr: Tidy Messy Data. https://tidyr.tidyverse.org.
Wikipedia. 2025. “Data Science.” In Wikipedia. https://en.wikipedia.org/wiki/Data_science.