Appendix C — Install Python on your Computer
C.1 Introduction
Configuring your computer to work with R is fairly straightforward.
- Download R from the Comprehensive R Archive Network (CRAN), the organization that manages the software for “base R”.
- Download Posit’s RStudio Desktop. You can use a command line interface for R or use other Integrated Development Environments (IDEs), e.g., VS Code or Jupyter, but RStudio is the IDE of choice for most data scientists working on R-centric solutions.
- Install whatever packages you want from CRAN or other sources, e.g., GitHub or Bioconductor. R is designed to minimize conflicts between packages and tells you when there is a conflict. It also incorporates methods for handling conflicts.
- Open up RStudio and get to work.
Installing Python is a different experience as there are more choices.
- Python is a more general-purpose programming language than R and is widely used across a spectrum of needs (including data science).
- To meet these diverse needs, Python offers multiple installation methods and package management systems.
- Understanding these options will help you create a stable, efficient environment for your data science work.
C.2 Python Installation Options: Standard Python vs Anaconda
Option 1: Standard Python (“Vanilla” Python)
- Installed directly from python.org.
- Comes with pip, the default package manager.
- Requires manual setup of virtual environments (venv or virtualenv) to avoid conflicts between packages.
- Lightweight and flexible, allowing full control over package management.
Option 2: Anaconda
- The full Anaconda distribution installs Python and about 300 Python packages commonly used in data science work.
- It includes Conda, an alternative package manager (to pip) that includes capabilities to simplify dependency management across packages.
- The graphical installer includes other elements such as JupyterLab, a browser-based IDE.
- The Miniconda distribution installs Python and a small number of packages to provide a smaller install.
For our class, Anaconda (or Miniconda) is the preferred choice because it simplifies package installation, reduces dependency compatibility issues, and makes environments easier to reproduce and move between machines.
C.3 Choosing an IDE for Python: JupyterLab vs. VS Code
For a data scientist or Python developer, selecting the right IDE is crucial for an efficient workflow. Two of the most popular choices are JupyterLab and VS Code, each offering unique strengths. Note: the IDE does not affect the code you write, but it shapes how you write, debug, and version control your code.
- JupyterLab: Good for Interactive Computing and Notebooks
JupyterLab is an interactive development environment built around Jupyter Notebooks. It combines quick experimentation, visualization, and documentation in one place.
- Key Features
✅ Native Jupyter Notebook Support: Best for exploratory data analysis and iterative workflows.
✅ Quarto Integration: Easily create interactive reports and export to formats like HTML, PDF, and Word.
✅ Interactive Development: Run code cell by cell and immediately see outputs.
✅ Rich Visualization Support: Works seamlessly with matplotlib, seaborn, plotly, etc.
- Advanced Debugging in JupyterLab
Basic debugging support using xeus-python, allowing variable inspection and breakpoints.
Limitations: Debugging is less sophisticated than in VS Code.
- Git and GitHub with JupyterLab
- 📌 Basic Git Integration – Install the JupyterLab Git extension to manage repositories visually.
- 📌 Manual Workflow – Without the extension, Git must be used via the terminal (git add, git commit, git push).
- 📌 Versioning Notebooks – Jupyter Notebooks store output inside .ipynb files, which makes them difficult to track in Git. Best practice: use nbdime (Jupyter Notebook Diff and Merge tool) to handle differences.
- 📌 Collaborative Work – GitHub handles versioning, but resolving merge conflicts in notebooks can be challenging.
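The versioning problem above comes from outputs being stored inside the .ipynb file, which is plain JSON. The following is a minimal sketch of one alternative to diffing with nbdime: clearing outputs before committing. It assumes the version 4 notebook layout (a top-level "cells" list in which code cells carry "outputs" and "execution_count"); the function name strip_outputs and the sample notebook are made up for illustration.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs cleared."""
    # Round-tripping through JSON gives a simple deep copy of JSON-like data.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny in-memory notebook, invented for this example.
example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["4\n"]}
            ],
            "source": ["print(2 + 2)"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

stripped = strip_outputs(example)
print(json.dumps(stripped["cells"][1], indent=1))
```

With outputs cleared, a Git diff shows only changes to the source of each cell. The trade-off is that collaborators must re-run the notebook to see results.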
- VS Code: The Best for Hybrid Workflows (Notebooks + Scripts + Debugging)
VS Code is a lightweight yet powerful IDE with support for Python scripting, interactive notebooks, and Quarto. It is widely used for software development, debugging, and data science.
- Key Features
✅ Supports Both Python Scripts & Jupyter Notebooks: Run .py and .ipynb files within the same environment.
✅ Quarto Integration: Edit, preview, and render .qmd documents inside VS Code.
✅ Advanced Debugging: Full debugging support with breakpoints, call stacks, and variable inspection.
✅ IntelliSense & Code Navigation: Powerful autocompletion and refactoring tools.
✅ Integrated Terminal & Git Support: Manage repositories directly within the IDE.
- Advanced Debugging in VS Code
Breakpoints & Step-through Debugging: More advanced than JupyterLab.
Variable Explorer: Inspect variables while debugging.
Notebook Debugging: Debug Jupyter notebooks with a similar experience to scripts.
- Git and GitHub with VS Code
- 📌 Built-in Git Support: View changes, commit, push, and pull without using the terminal.
- 📌 Better Conflict Resolution: VS Code provides diff tools to compare and merge conflicting changes.
- 📌 Versioning Notebooks: Displays notebook diffs more clearly than GitHub alone.
- 📌 Collaborative Work: GitHub integration makes it easier to collaborate, review code, and manage projects.
C.4 Managing Environments and Packages
Both R and Python have multiple standardized repositories of diverse packages and both languages support using “environments” to manage the set of packages (and their versions) being used for a given project.
- CRAN is one example of a repository of open source software for the R programming language.
- The Python Package Index (pypi.org) is one example of a repository of open source software for the Python programming language.
One can do a lot of work in R without worrying about managing the computing environment other than updating versions of R and the packages on a regular basis.
- However, when you want to ensure you are controlling the environment for a project to ensure shared reproducibility over time, you can use the {renv} package.
- The renv package helps you create reproducible environments for your R projects.
- The {renv} workflow uses a JSON lockfile (renv.lock) that is version controlled, so you can share the environment configuration and other users can create an identical R environment.
Python best practices (and IDEs) are more explicit/insistent about creating “virtual environments” as part of every project. See the Python Packaging User Guide for details about the Python package life cycle.
- Virtual environments allow you to install Python packages in an isolated location for a particular project without disrupting the base python install.
When starting a new project, you create an environment for the project.
- The environment has its own installation directories (like {renv} for R) and it does not share libraries with other virtual environments.
- You then install just the packages you need for that environment.
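To see the isolation described above from inside Python, a short sketch like the following reports which interpreter is running and whether it belongs to a virtual environment. The helper name in_virtual_environment is made up for illustration. The prefix comparison applies to environments created with venv or virtualenv; for Conda, the sketch instead reads the CONDA_DEFAULT_ENV variable that conda activate sets.

```python
import os
import sys


def in_virtual_environment() -> bool:
    """True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the "base" Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Environment prefix:", sys.prefix)
print("In a venv-style environment:", in_virtual_environment())
# Conda environments are not venvs, so check the variable Conda sets instead.
print("Active Conda environment:",
      os.environ.get("CONDA_DEFAULT_ENV", "none detected"))
```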
How you install packages depends on which Python distribution you installed (Section C.2).
C.4.1 Standard Python uses venv and pip to Manage Environments and Packages.
The venv module creates lightweight “virtual environments”, each with their own independent set of Python packages installed in their site directories.
- A virtual environment is created on top of an existing Python installation, known as the virtual environment’s “base” Python.
- When used from within a virtual environment, common installation tools such as pip will install Python packages into a virtual environment without needing to be told to do so explicitly.
- pip is the default package installer for Python.
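As a rough illustration of what the venv module does, the sketch below creates a throwaway environment from Python and prints the pyvenv.cfg file that records its "base" Python. The directory name demo-env is hypothetical, and pip is skipped (with_pip=False) only to keep the demonstration fast.

```python
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "demo-env"  # hypothetical environment name
    # with_pip=False keeps this demo quick; running `python -m venv`
    # from the command line installs pip into the environment by default.
    venv.create(env_dir, with_pip=False)

    config_file = env_dir / "pyvenv.cfg"
    created = config_file.exists()
    print("Environment created:", created)
    # pyvenv.cfg records, among other things, where the base Python lives.
    print(config_file.read_text())
```

In day-to-day work the same step is usually done in a terminal with python -m venv .venv, followed by source .venv/bin/activate on macOS or Linux, or .venv\Scripts\activate on Windows.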
C.4.2 Anaconda uses Conda to Manage Environments and Packages.
Anaconda comes with the Conda environment and package manager.
Conda environment management allows you to use the terminal to:
- Create an environment you call ‘my-env’ with its own version of Python and/or packages with conda create --name my-env.
- Update your ‘my-env’ environment with conda env update --file environment.yml --prune.
- List your environments with conda env list.
- Remove the ‘my-env’ environment with conda remove --name my-env --all.
- Activate your ‘my-env’ environment with conda activate my-env so you can install packages and work in it.
- By default, conda activate will deactivate the current environment before activating the new environment and reactivate the previously deactivated environment when deactivating the current environment.
- By default, the active environment (the one you are currently using) is shown in parentheses () or brackets [] at the beginning of your command prompt: (my-env) $.
- Deactivate the current active environment with conda deactivate.
- By default, it will return you to the previous environment.
- A better way to get to the base environment is with just conda activate with no name.
- Export and share a YAML environment.yml file describing the environment.
- Create an environment from an environment.yml file.
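To make the environment.yml file mentioned above more concrete, here is a small sketch that prints the typical layout of such a file: a name, a list of channels, and a list of dependencies. The helper environment_yaml, the conda-forge channel, and the python=3.12 pin are illustrative assumptions, not values from this appendix. In practice Conda writes the file for you with conda env export > environment.yml, and conda env create --file environment.yml rebuilds the environment from it.

```python
def environment_yaml(name, channels, dependencies):
    """Build the text of a minimal environment.yml file."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dependency}" for dependency in dependencies]
    return "\n".join(lines) + "\n"


# Channel and version pin below are assumptions chosen for illustration.
text = environment_yaml(
    name="my-env",
    channels=["conda-forge"],
    dependencies=["python=3.12", "scipy"],
)
print(text)
```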
Conda package management allows you to:
- Search for packages with conda search scipy.
- Install packages such as scipy into your environment ‘my-env’ with conda install --name my-env scipy.
- Update packages or Python with conda update packagename.
- Remove a package with conda remove packagename.
Use Conda to install as many packages as you can, but if a package is not available through Conda, you can use pip.
See Conda: Using pip in an environment for details.
See Conda Cheat sheet to download a cheat sheet.
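After installing packages, it can help to confirm from Python what the active environment actually contains. The sketch below uses the standard library's importlib.metadata, which generally reports packages whether they were installed by Conda or by pip. The helper installed_version and the second package name are made up for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# The second name is deliberately fictitious to show the "missing" case.
for name in ["scipy", "a-package-that-does-not-exist"]:
    version = installed_version(name)
    print(name, "->", version if version else "not installed")
```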
C.5 Installing and Configuring Python
This section covers setting up a working environment with VS Code, the full Anaconda distribution, Quarto, and the ability to use Git and GitHub.
- Go to Download Visual Studio Code and follow the instructions for your operating system and chip set.
- Go to Installing Anaconda Distribution and follow the instructions for your operating system.
- Go to Quarto Get Started and follow the instructions to download the Quarto Command Line interface.
- Go to Git Downloads and follow the instructions for your operating system.
- If you do not have a GitHub account, go to GitHub and sign up for a free account.
Open VS Code
- Go to the extensions view.
- Install the Python extension for Visual Studio Code.
- Install the Quarto Extension.
- See the Quarto Guide for VS Code.
- Install the GitHub Pull Requests and Issues Extension.
- Check your Git configuration with
git config --global --list
- If your name or email is missing or needs updating, use:
git config --global user.name "Your Name"
git config --global user.email "your-email@example.com"
- If you have already set up Git to authenticate with GitHub, you should be ready.
- For more details see Working with GitHub in VS Code.
C.6 Checking Your Configuration
Go to a terminal window.
- Verify Visual Studio Code Installation
- Open VS Code and confirm it launches successfully.
- Check the version in the terminal with code --version.
- Go to the VS Code extensions view and ensure the extensions show as installed.
- Verify Anaconda Installation
- Check the version in the terminal with conda --version.
- If it returns a version (e.g., conda 24.11.3), Anaconda is installed.
- Check Conda environments work by listing them.
- Go to the terminal and enter conda env list.
- You should see at least a base environment.
- Try creating and activating a test environment:
- Run conda create --name test-env python. Say Y to execute.
- Run conda activate test-env. Your prompt should change to show test-env.
- Run which python (where python on Windows). You should see a path inside the test-env environment.
- Verify Quarto
- Run quarto --version. You should get a version as the result.
- Run quarto check to verify the installation.
- Verify Git with git --version. You should get a version.
- Check GitHub authentication with gh auth status. This command requires the GitHub CLI (gh), which is a separate install from Git.
- Verify GitHub Integration with VS Code.
- Open VS Code and check that Git is detected:
- Go to the Source Control view (Ctrl+Shift+G or Cmd+Shift+G).
- If Git is installed, you should see options to initialize a repository or clone an existing one.
- If Git is not detected, ensure it is installed and configured correctly.
- Activate the base environment with conda activate. The prompt should change to show base.
- Remove the test-env environment with conda remove --name test-env --all.
If everything worked, you are now ready to go to your projects in VS Code.
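The command-line checks in this section can also be run in one pass. The following is a minimal sketch that only looks for each tool on your PATH; it does not check versions or authentication. The helper name find_tools is made up, and gh (the GitHub CLI) is included on the assumption that you installed it. Depending on how Conda was set up, conda may be available as a shell function rather than an executable on PATH, in which case this check can miss it.

```python
import shutil

# Command names used in the checks in this section.
TOOLS = ["code", "conda", "quarto", "git", "gh"]


def find_tools(names):
    """Map each command name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


results = find_tools(TOOLS)
for name, path in results.items():
    print(f"{name:<8}{path or 'NOT FOUND'}")
```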
C.7 Working with Python using VS Code
See Quick Start Guide for Python in VS Code for a guide to working with Python in VS Code.
Section C.4 showed the commands for managing environments using a terminal window.
VS Code has built-in capabilities for working with vanilla Python or Anaconda distributions to create and manage environments.
- See Python environments in VS Code for details on how to use the VS Code Command Palette for managing environments.
C.8 Have the Best of Both Worlds
While the above discussion was cast as either/or, many data scientists use both.
- Some have a standard Python installation and an Anaconda installation.
- Some use JupyterLab for interactive or exploratory analysis and switch to VS Code for more complex scripts and functions that might require more debugging capabilities.
Sometimes you need multiple versions of Python to support different projects.
If you have vanilla Python, pyenv is one way to switch easily between multiple versions of Python. “It’s simple, unobtrusive, and follows the UNIX tradition of single-purpose tools that do one thing well.”
If you have Anaconda, see Managing Python for using Conda to create your environment and install whatever version of Python you need.
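When several Python versions live on one machine, a project can guard against being run with the wrong one. The sketch below compares the running interpreter against a minimum version. The (3, 10) minimum and the helper name meets_minimum are hypothetical choices for illustration.

```python
import sys

MIN_VERSION = (3, 10)  # hypothetical minimum version for this project


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """True when the major.minor version is at least the given minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


print("Running Python", ".".join(str(part) for part in sys.version_info[:3]))
if not meets_minimum():
    needed = ".".join(str(part) for part in MIN_VERSION)
    print(f"Warning: this project expects Python {needed} or newer.")
```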