Appendix C — Projects, Files, and Paths
C.1 References
- Using RStudio Projects (2025)
- {here} Package Kirill Müller (2025)
- {fs} Package Jim Hester, Hadley Wickham, and Gábor Csárdi (2025)
- What They Forgot to Teach you about R Jennifer Bryan et al. (2025)
C.2 Data Science Projects
Data science workflows are usually constructed around the concept of a “Project”.
From a conceptual perspective, a data science project is a coherent set of work structured to answer a set of questions, solve a problem, or create a capability.
- In typical settings, a project may be associated with a defined outcome and a budget.
- For this course, the entire set of lecture notes is a project. Each assignment will be its own project.
From a physical perspective, a data science “project” is a coherent set of files that are created or configured to support the work.
A best practice is to keep all project files within or beneath a single project folder (directory) on your computer.
- Many coding tools such as Git expect this structure.
- Putting your project under a single folder makes your work easier to maintain by future you or others.
However, it does not all go in a single folder. You can, and should, have multiple sub-folders to organize your work.
- Separate your files based on their purpose, e.g., separate raw data, cleaned data, R scripts, and analysis files.
- Depending upon the nature of the work, there may be a specific folder structure that is required, e.g., for an R package.
- Figure C.1 shows the potential structure for a project in R.
myproject/ # Top-level project folder
├── data_raw/ # Folder for raw, untouched, data and data cleaning scripts
│ ├── my_data.csv # Raw data files
│ └── clean_data.R # R script for cleaning raw data
├── data/ # Folder for cleaned data
│ └── my_data.rds # Cleaned data files
├── R/ # Folder for scripts and custom functions
└── analysis/ # Folder for analysis and reports documents
  └── my_analysis.qmd # Quarto report
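A folder layout like the one in Figure C.1 could be scaffolded with a few lines of base R. This is only a minimal sketch: the project name and the use of a temporary directory are assumptions made so the example has no side effects on your real files.

```r
# Hypothetical project location; tempdir() keeps the sketch side-effect free.
project_root <- file.path(tempdir(), "myproject")

# Sub-folders taken from Figure C.1.
sub_folders <- c("data_raw", "data", "R", "analysis")

for (folder in sub_folders) {
  # recursive = TRUE also creates project_root itself on the first pass.
  dir.create(file.path(project_root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# List the folders directly beneath the project root.
created <- list.dirs(project_root, full.names = FALSE, recursive = FALSE)
print(created)
```

For a real project you would replace `project_root` with the folder you actually want to use.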
Most Integrated Development Environments (IDEs) expect you to organize your work under a single top-level folder, often referred to as the project root.
- VS Code and Positron expect you to open a folder to define your workspace.
- You can also open multiple folders and configure a workspace to access files from multiple projects at once.
- RStudio allows you to create an RStudio Project, which includes special .Rproj and configuration files to track the state of your project.
- Most modern workflows for developing R packages assume the project files are organized within an RStudio Project.
- Jupyter does not enforce a formal project structure. However, best practice is to launch Jupyter from your top-level project folder and create or activate a virtual environment within that directory.
- This ensures consistent paths and environment isolation across notebooks.
Figure C.2 shows the folder structure for an RStudio Project that is managed in Git. It also has files that support Quarto creating a website/book and publishing it online.
myproject/ # Top-level project folder
├── .git/ # Git repository folder (hidden)
├── .gitignore # Git ignore file for excluding files from version control
├── myproject.Rproj # RStudio project file defining project settings
├── .Rproj.user/ # RStudio user-specific session data (auto-generated, hidden)
├── _quarto.yml # Quarto website/book configuration
├── _publish.yml # Publication config for quarto.pub
├── data_raw/ # Raw, untouched data and cleaning scripts
│ ├── my_data.csv # Raw data file
│ └── clean_data.R # Data cleaning script
├── data/ # Cleaned, processed data
│ └── my_data.rds # Cleaned data file
├── R/ # Custom R scripts and functions
├── chapter_01.qmd # First chapter content
├── chapter_01_images/ # Images for chapter 01
└── README.md # GitHub project overview
These are just examples; projects may take on many different shapes.
- You may have sub-folders for SQL or Python, or different types of R scripts.
- If you have created modules, you may have subfolders for each module.
- If you have a website with multiple tabs or pages, you may have folders for each tab or page.
- If you are creating an R package to publish on CRAN, you will have specific guidelines for some elements of your project. See the ggplot2 GitHub repository as one example.
Unless you are an expert in working with Git submodules, there is rarely a good reason to nest one project inside another. Doing so can lead to a number of problems:
- Git confusion: Git expects each repository to maintain its own isolated history. Nesting repositories can lead to unexpected behavior and complicate version control workflows.
- Broken relative paths: Tools like Quarto assume a single project root. Nested projects can disrupt these assumptions and break file references.
- Conflicting configurations: Having multiple .Rproj files, renv environments, or environment.yml files can create ambiguity about which settings or environment should be used.
✅ Best practice: Before you create a project, check that its root directory is not already inside another project. You can run `git status` in the terminal, where you want the command to fail because that means no enclosing Git repository was found, or you can look up the directory tree for `.Rproj` files.
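The check described above can also be sketched in base R. The helper name `enclosing_projects()` is hypothetical, and the sketch only looks for two markers (a `.git` entry or an `.Rproj` file), so treat it as an illustration rather than a complete test.

```r
# Return every folder, from `dir` upward, that looks like a project root,
# i.e., contains a .git entry or a file ending in .Rproj.
enclosing_projects <- function(dir = getwd()) {
  dir <- normalizePath(dir, winslash = "/", mustWork = TRUE)
  found <- character(0)
  repeat {
    has_git   <- file.exists(file.path(dir, ".git"))
    has_rproj <- length(list.files(dir, pattern = "\\.Rproj$")) > 0
    if (has_git || has_rproj) found <- c(found, dir)
    parent <- dirname(dir)
    if (identical(parent, dir)) break  # reached the top of the file system
    dir <- parent
  }
  found
}

# Demonstration in a temporary directory (hypothetical folder names).
outer <- file.path(tempdir(), "outer_project")
inner <- file.path(outer, "inner")
dir.create(inner, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(outer, "outer_project.Rproj"))

hits <- enclosing_projects(inner)
print(hits)  # a non-empty result means `inner` is already inside a project
```

If the result is empty for the folder you plan to use, no enclosing project was detected by these two markers.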
The key is to organize your files in a way that is easy to understand (by you and others), separates or modularizes the work, facilitates configuration management, and meets deployment or publishing requirements.
C.2.1 RStudio Projects
When working in RStudio, you should convert every project of any duration into an “RStudio Project” to take full advantage of the IDE’s capabilities.
To convert a folder into an RStudio Project, you can use the IDE menu File → New Project → New Directory → New Project or the drop-down arrow in the top right of the IDE.
- If you have an existing folder, you can use it as the basis for the RStudio Project.
RStudio will create the two files (shown in Figure C.2) to keep track of your project.
- You can open the project in RStudio by opening the `.Rproj` file, by using the File menu (File/Open Project...), or by using the drop-down arrow at the top right of the IDE.
- RStudio will automatically set your Console working directory, your Files pane working directory, and your Terminal pane prompt working directory to the root folder of the RStudio Project.
See Working with RStudio Projects for more details.
VS Code and Positron will also set the working directory of the environment to the project root when a Folder or Workspace is opened.
C.3 File Paths and Working Directories
Good practices include separating files into different folders. Thus, you will usually need to connect from the file you are in, be it `.qmd`, `.R`, or `.py`, to another file that has data, images, or functions you want to read or write.
Connecting to another file requires telling your code the “path” to that file.
- A file path tells your code where to find a file or folder. It’s like the GPS location for reading data, saving plots, or loading resources.
There are two types of File Paths: Absolute and Relative.
C.3.1 Absolute Paths
An absolute path specifies the full location of a file, starting from the root of the computer’s file system and working down through each level of folders to the final file name.
- On macOS/Linux:
"/Users/jane/my_documents/DATA-413/Assignments/hw_01/data/mydata.csv"
- On Windows:
"C:/Users/Jane/Documents/DATA-413/Assignments/hw_01/data/mydata.csv"
The advantage of absolute paths is they are always accurate on your machine.
The major disadvantage is they are always broken on someone else’s machine: your code is not portable to other systems or reproducible by other people.
If you can see your computer user name in a path, it is an absolute path and will not work on my machine! Change it.
The solution is to use a relative path.
C.3.2 Relative Paths
A relative path starts from the current working directory and navigates up (if needed) and then down through the folder levels to get to the file of interest.
Assume you are working in a `.qmd` file in your RStudio Project `analysis` folder and you want to read in some data from a file in the `data` folder that is at the same level as the `analysis` folder. The relative paths might look like:
- On macOS/Linux:
"../data/mydata.csv"
- On Windows:
"../data/mydata.csv"
The major advantage of relative paths is they will work on someone else’s machine. You do not need to care what their computer’s folder structure looks like (as long as they are using the same folder structure for the project as you, which they should).
A minor disadvantage is that if someone changes the project folder structure, the path will not work on anyone’s machine. This should be rare.
Building a relative path is straightforward.
- Use `.` for the current directory, `..` to go up one level, and `/name` to go down one level to the `name` folder or file.
- You can connect these together as much as needed to traverse the levels of folders.
- You should only need to go up as many levels as needed and then down as many levels as needed to find the file.
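To see how the pieces of a relative path combine, the sketch below resolves `../data/mydata.csv` starting from an `analysis` folder. The folder layout is a hypothetical one built in a temporary directory so the example can run anywhere.

```r
# Hypothetical layout: <root>/analysis and <root>/data/mydata.csv.
root <- file.path(tempdir(), "relpath_demo")
dir.create(file.path(root, "analysis"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "data"), showWarnings = FALSE)
writeLines(c("x,y", "1,2"), file.path(root, "data", "mydata.csv"))

# Starting point: the analysis folder (where a .qmd file might live).
start <- file.path(root, "analysis")

# ".." goes up one level to the root; "data/mydata.csv" goes back down.
relative <- "../data/mydata.csv"

# Join the starting folder and the relative path, then resolve the "..".
resolved <- normalizePath(file.path(start, relative), winslash = "/")
print(resolved)
print(file.exists(resolved))
```

The resolved path is absolute and specific to the machine running the code, while the relative path itself stays the same for everyone who shares the project structure.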
C.3.3 Where is the Working Directory?
A working directory is the folder your computer uses as the default starting point to look for files the code wants to read or write.
- Your computer can have many working directories at once as each computing process (a file or interactive pane with a cursor e.g., the Console pane or a Terminal window) typically has its own, independent working directory.
- Understanding how working directories are set in different tools and file types is key to writing portable, reliable code.
RStudio has at least three possible working directories: one for the interactive Console pane, one for the Source file, e.g., `.qmd` or `.R`, and one for the Terminal window.
- Console Pane: RStudio will set the Console working directory to the project root when opening an RStudio project or to the default from `Global Options - General` if not in a project.
  - It will show the absolute path starting from the User root folder, shown as `~`, at the top of the Console pane.
  - You can also run `getwd()` to see the path and `setwd("new_path")` to change the Console working directory.
  - You can also use the File pane `More` menu to change the working directory.
- Source Pane
  - Quarto (`.qmd`) Files: When working interactively in a document code chunk or rendering a `.qmd` file, RStudio sets the working directory to the location of the file, not the project root.
    - If the file is saved in the `analysis` folder, that is the working directory.
  - R Scripts (`.R` files)
    - Running an R script uses the current Console Pane working directory, which defaults to the project root if opened as an RStudio Project.
  - Python Scripts (`.py` files)
    - If you run Python in RStudio via {reticulate}, it uses the R Console Pane working directory.
- Terminal Window: RStudio will set the Terminal window working directory to the RStudio project root when first opening a project or leave it at the last folder that was being used if restoring a project or opening outside a project.
  - Whatever folder is showing in front of the cursor is the working directory for the window.
  - Any new terminal windows in a project will open at the root of the project.
  - You can see the full path starting from the user root, shown with `~`, at the top of the terminal window.
  - If you want to change the working directory, navigate using the bash command `cd newpath`, where `newpath` is built from combinations of `..` and `/foldername`, as used in a relative path.
Avoid using `setwd()` in Quarto documents or scripts as it is not reliable for ensuring your code works on anyone else’s computer.
- You will see lots of blog posts and tutorials about how to use `setwd()` but most are old. Newer posts recommend against it in favor of using relative paths or functions from the {here} package or Base R.
The multiple working directories can cause challenges when running code in a file versus running it in the console.
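The sketch below illustrates the kind of mismatch this can cause: the same relative path is found from one starting folder but not from another. The folder layout and the helper `found_from()` are hypothetical, and `setwd()` appears only to simulate the two starting points inside a function that restores the original directory on exit.

```r
# Hypothetical layout: <root>/analysis and <root>/data/mydata.csv.
root <- file.path(tempdir(), "wd_demo")
dir.create(file.path(root, "analysis"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "data"), showWarnings = FALSE)
writeLines(c("x", "1"), file.path(root, "data", "mydata.csv"))

# Does `path` exist when code starts in the folder `wd`?
# The previous working directory is restored when the function exits.
found_from <- function(wd, path) {
  old <- setwd(wd)
  on.exit(setwd(old), add = TRUE)
  file.exists(path)
}

analysis <- file.path(root, "analysis")

from_root        <- found_from(root, "data/mydata.csv")        # Console at root
from_analysis    <- found_from(analysis, "data/mydata.csv")    # chunk in analysis/
from_analysis_up <- found_from(analysis, "../data/mydata.csv") # adjusted path

print(c(from_root, from_analysis, from_analysis_up))
```

The path that works from the project root fails from the `analysis` folder, which is why code that runs in the Console may fail when the document is rendered.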
C.4 The {here} Package for Finding Files in R Projects
The {here} package was designed to help users find files in their projects for both notebooks such as Quarto files and R scripts and help their work be reproducible.
The package assumes your work is in some sort of project structure that has a root directory.
It uses a helper package called {rprojroot} for finding the project root directory.
- It does this by looking for standard files that indicate a project such as `.Rproj`, `.git`, `.here`, `DESCRIPTION`, etc.
Once it knows the project root directory, it creates an absolute path for it for the machine that is running the code.
Then when working in a document, it checks the working directory for the document relative to the root.
This allows you to then specify the path from the root directory to any other file (in or outside the project) with the `here::here()` function.
- The function will figure out from where it is being called and the correct path to the file.
Rather than using the current working directory (`getwd()`), `here()` always interprets paths relative to the project root.
It then converts those paths into absolute paths for the computer running the code. This makes your file references consistent whether you or someone else is:
- running a script,
- rendering a Quarto document,
- or, executing code in an interactive R session.
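The idea can be illustrated with a simplified base R sketch. This is not how {here} or {rprojroot} are actually implemented; the helpers `find_root()` and `path_from_root()` are hypothetical and check only a few of the markers mentioned above.

```r
# Walk up from `start` until a folder containing a project marker is found.
find_root <- function(start = getwd(),
                      markers = c(".here", ".git", "DESCRIPTION")) {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    is_root <- any(file.exists(file.path(dir, markers))) ||
      length(list.files(dir, pattern = "\\.Rproj$")) > 0
    if (is_root) return(dir)
    parent <- dirname(dir)
    if (identical(parent, dir)) stop("No project root found above ", start)
    dir <- parent
  }
}

# Build an absolute path from the project root, whatever the starting folder.
path_from_root <- function(..., start = getwd()) {
  file.path(find_root(start), ...)
}

# Demonstration with a hypothetical project in a temporary directory.
root <- file.path(tempdir(), "here_demo")
dir.create(file.path(root, "analysis"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, ".here"))

from_root     <- path_from_root("data", "file.csv", start = root)
from_analysis <- path_from_root("data", "file.csv",
                                start = file.path(root, "analysis"))
print(identical(from_root, from_analysis))
```

Both calls produce the same absolute path even though they start from different folders, which is the behavior that makes root-based paths consistent across scripts, rendered documents, and interactive sessions.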
The `here()` syntax allows you to pass the complete path from the project root to the file name as a single argument, or to pass one argument for each folder level down to the file name, which it will concatenate using `/` as the separator.
Assume we have a `.qmd` file in an `analysis` folder just below the project root and we want to load data from a `data` folder at the same level. The following code allows us to do that.
```r
library(here) # 1
readr::read_csv(here("data", "file.csv")) # 2
```
- (1) When you run `library(here)` in a `.qmd` file located in an analysis subfolder, it automatically walks up the directory tree to find the project root. It then creates a variable with the absolute path to the root.
- (2) The `here("data", "file.csv")` call concatenates `data/file.csv` to the absolute path to the root to create a complete path to the file.
If you are wondering why a path is not working, call `here::dr_here()` and it will provide a message that, by default, includes the reason why `here()` is set to a particular directory.
You may notice the help file for `here()` states “This package is intended for interactive use only.” That does not mean you should not use `here()` in your projects or scripts. That is where it is best.
Interpret that statement as a warning to be careful about using it inside a package you may be developing as it creates another dependency in the package.
- If you are building a package, consider using `file.path()`, a base R function for building file paths in a cross-platform way.
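As a brief illustration of the base R alternative, `file.path()` joins its arguments with a forward slash. The folder and file names below are placeholders.

```r
# Each argument is one level of the path; file.path() inserts the separator.
data_path <- file.path("data", "file.csv")
print(data_path)

# Arguments are vectorized, so several paths can be built in one call.
csv_paths <- file.path("data", c("a.csv", "b.csv"))
print(csv_paths)
```

Unlike `here()`, `file.path()` does not look for a project root; it only joins the pieces you give it, so the result is relative unless the first piece is an absolute path.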
Also note that other packages such as {plyr} or {txtutils} have a `here()` function so be careful about what gets loaded after {here}. You can always use the `::` operator to be precise, `here::here()`.
C.4.1 The {fs} package
Given the discussion of working with files and directories, if you are trying to work programmatically with folders and files, the {fs} package may be of interest. It is a tidyverse package that provides a “cross-platform, uniform interface to file system operations.”
The {fs} package (I think of it as file/folder support) provides functions in four main categories:
- `path_` for manipulating and constructing paths
- `file_` for files
- `dir_` for directories
- `link_` for links
Like other tidyverse functions, it works well with pipes, is vectorized, returns “tidy” results, and “fails nice” in that it provides detailed error messages.
It works well with {purrr} so if you need to work on all files in a directory, {fs} may be right for you.
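A short sketch of the function families in use is shown below. It assumes the {fs} package is installed; the folder and file names are hypothetical and everything is created inside a temporary directory.

```r
library(fs)

# Hypothetical working area inside a temporary directory.
demo_dir <- path(tempdir(), "fs_demo")
dir_create(demo_dir)

# path_* builds paths; file_* creates files (vectorized over its input).
csv_files <- path(demo_dir, c("a.csv", "b.csv"))
file_create(csv_files)
file_create(path(demo_dir, "notes.txt"))

# dir_* lists folder contents; glob keeps only the .csv files.
found <- dir_ls(demo_dir, glob = "*.csv")

# path_* again: drop the folder and the extension to get clean names.
names_only <- path_ext_remove(path_file(found))
print(names_only)
```

Because `dir_ls()` returns a vector of paths, the result can be passed directly to a {purrr} function such as `map()` to process every file in the folder.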
C.4.2 Summary
- Use well-organized Projects for clean, reproducible work for R or Python.
- When using RStudio, convert each project into an RStudio Project for ease of navigation.
- For R and Python, keep code portable with relative paths.
- Although Python has different rules for determining the working directory, relative paths are still the portable choice.
- If working in R, the {here} package can make it much easier to create reusable code.