Appendix C — Installing and Configuring Ollama

Published April 20, 2026

Ollama is a local runtime framework for large language models that can be used from R without needing a cloud API key.

This appendix provides practical steps for installing and configuring Ollama on various operating systems, pulling and testing models, verifying the local API, and calling Ollama from R.

It also includes rough guidelines for model size and RAM tradeoffs.

The goals are to:

  • install Ollama on macOS, Windows, or Linux
  • pull and test one or two local models
  • verify the local HTTP API
  • call Ollama from R

C.1 What Ollama Is

Ollama is a local runtime for large language models. It exposes models through:

  • a command line interface
  • a local HTTP API

In this course, we use it to run models locally and call them from R without needing a cloud API key.

Note: the free plan allows you to run one model at a time.

C.2 Installation

C.2.1 macOS

Download Ollama from the official site or install it from the Terminal with Homebrew. Note that the official install script (install.sh) targets Linux, not macOS.

C.2.1.1 Download manually

Visit https://ollama.com/download and download the macOS installer.

C.2.1.2 Install from Terminal

brew install ollama

C.2.1.3 macOS note

The Ollama download page indicates the current macOS release requires macOS 14 Sonoma or later.

C.2.2 Windows

Download Ollama from the official site or install it from the command line with winget.

C.2.2.1 Download manually

Visit https://ollama.com/download and download the Windows installer.

C.2.2.2 Install from PowerShell

winget install Ollama.Ollama

C.2.2.3 Windows note

The Ollama download page indicates that the current Windows release requires Windows 10 or later.

C.2.3 Linux

On Linux, the simplest install is the official shell script.

curl -fsSL https://ollama.com/install.sh | sh

You can also run Ollama in Docker, especially on Linux systems with NVIDIA GPUs.
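
A common invocation, following the Ollama Docker documentation, looks like the sketch below; the container name and volume name are conventions, not requirements, and GPU access assumes the NVIDIA Container Toolkit is installed on the host:

```sh
# Run the Ollama server in Docker with NVIDIA GPU access,
# persisting models in a named volume and exposing the local API port
docker run -d --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
```

Models can then be pulled inside the container, for example with docker exec -it ollama ollama pull llama3.2.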

C.3 First Verification

After installation, verify that Ollama is available.

C.3.1 Check installed version

ollama --version

C.3.2 See available models

ollama list

If no models are installed yet, the list may be empty.

C.4 Pulling Two Models

For class, it is useful to install:

  • one small general-purpose model
  • one coding-oriented or larger model

Below are two good examples for local experimentation.

C.4.1 Example 1: Llama 3.2

llama3.2 is a lightweight general-purpose family available in 1B and 3B sizes.

ollama pull llama3.2

If you want the smaller version explicitly, you can pull it by tag:

ollama pull llama3.2:1b

C.4.2 Example 2: Qwen 2.5 Coder

qwen2.5-coder is a coding-focused model family available in several sizes.

  • This is a smaller version for fast results:

ollama pull qwen2.5-coder:3b

  • If you want a larger (slower) coding model for more complex challenges, consider adding:

ollama pull qwen2.5-coder:7b

Suggested classroom combination

A practical two-model setup is:

  • llama3.2 for lightweight general prompting
  • qwen2.5-coder:3b for coding tasks

Model Size vs File Size

Model names often refer to the number of parameters (e.g., 1B, 3B, 7B), not the size of the file on disk.

For example, llama3.2:latest is typically a ~3B parameter model, but may appear as only ~2 GB on disk due to quantization (compression for efficient local use).

  • B (billions) means model size (capacity)
  • GB (gigabytes) means storage size (after compression)

These are related but not the same.
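
A quick back-of-the-envelope calculation shows why the two numbers differ; the 4.5 bits-per-weight figure below is an assumption, roughly typical of 4-bit quantization plus metadata overhead:

```r
# Rough disk footprint: parameters × bits per weight / 8 bits per byte
params          <- 3e9   # a ~3B-parameter model
bits_per_weight <- 4.5   # assumed: 4-bit quantization plus overhead

params * bits_per_weight / 8 / 1e9   # ≈ 1.7 GB on disk
```

That is in the same ballpark as the ~2 GB observed for llama3.2:latest; the exact figure depends on the quantization scheme used.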

C.5 Running Models from the Terminal

C.5.1 Start an interactive session

ollama run llama3.2

Or:

ollama run qwen2.5-coder:7b

If the model responds in the terminal, Ollama is working correctly.

C.5.2 Stop the session

Use Ctrl + D or Ctrl + C, or type /bye at the prompt.

C.6 Using Ollama from R

Ollama exposes a local API endpoint, usually at:

http://localhost:11434

This means we can call it from R with a normal HTTP request.
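
Before writing any R code, you can hit the same endpoint with curl; this assumes the Ollama service is running and llama3.2 has already been pulled:

```sh
# One-shot, non-streaming request to the local generate endpoint
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Reply with one word: ready",
  "stream": false
}'
```

The JSON reply contains the generated text in its response field, which is exactly what the R function below extracts.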

C.6.1 Minimal R function

library(httr2)

call_ollama <- function(prompt, model = "llama3.2") {
  # POST to the local generate endpoint; stream = FALSE asks for one
  # complete JSON response instead of a token-by-token stream
  req <- request("http://localhost:11434/api/generate") |>
    req_method("POST") |>
    req_body_json(list(
      model = model,
      prompt = prompt,
      stream = FALSE
    ))

  resp <- req_perform(req)
  body <- resp_body_json(resp)

  # The generated text lives in the "response" field
  body$response
}

C.6.2 Test from R

call_ollama("Explain what a workflow is in one sentence.")
[1] "A workflow is a series of connected tasks, processes, and activities that are designed to achieve a specific goal or objective, often involving multiple stakeholders, tools, and systems."

C.6.3 Switch models from R

call_ollama(
  "Write R code using dplyr to count rows in penguins. Return just the code and no explanation. Use the R native pipe as appropriate",
  model = "qwen2.5-coder:7b"
)
[1] "```R\nlibrary(dplyr)\n\npenguins %>% \n  count()\n```"

C.7 Comparing Two Models

A useful early exercise is to send the same prompt to two models and compare:

  • clarity
  • code quality
  • latency
  • formatting consistency

prompt <- "Write R code using ggplot2 to plot mpg vs hp in mtcars."

call_ollama(prompt, model = "llama3.2")
[1] "Here's an example of how you can use ggplot2 to create a scatter plot of mpg vs hp from the mtcars dataset:\n\n```r\n# Load the required libraries\nlibrary(ggplot2)\nlibrary(dplyr)\n\n# Load the mtcars dataset\ndata(mtcars)\n\n# Create a new data frame with only mpg and hp columns\nnew_data <- mtcars %>%\n  select(mpg, hp) %>%\n  arrange(desc(mpg)) # Arrange the data by mpg in descending order\n\n# Create a scatter plot of mpg vs hp\nggplot(new_data, aes(x = hp, y = mpg)) +\n  geom_point() +\n  labs(title = \"Scatter Plot of MPG vs HP\",\n       subtitle = \"from the mtcars dataset\",\n       x = \"HP\",\n       y = \"MPG\")\n```\n\nThis code first loads the ggplot2 and dplyr libraries. It then loads the mtcars dataset using the `data()` function.\n\nNext, it creates a new data frame called `new_data` that includes only the mpg and hp columns from the original dataset, arranged in descending order by mpg.\n\nFinally, it uses the `ggplot()` function to create a scatter plot of mpg vs hp. The `aes()` function is used to map the x and y aesthetics to the hp and mpg variables, respectively. The `geom_point()` function creates the scatter points, and the `labs()` function is used to set the title, subtitle, x-axis label, and y-axis label for the plot.\n\nYou can customize this plot as needed by adding additional layers (e.g., a regression line) or modifying the aesthetics."
call_ollama(prompt, model = "qwen2.5-coder:7b")
[1] "Certainly! Below is an example of how you can use the `ggplot2` package in R to create a scatter plot of miles per gallon (mpg) versus horsepower (hp) for the cars dataset (`mtcars`):\n\n```r\n# Load necessary library\nlibrary(ggplot2)\n\n# Load mtcars dataset\ndata(mtcars)\n\n# Create a ggplot scatter plot\nggplot(mtcars, aes(x = hp, y = mpg)) +\n  geom_point() + \n  labs(title = \"Scatter Plot of MPG vs Horsepower\",\n       x = \"Horsepower (hp)\",\n       y = \"Miles Per Gallon (mpg)\") +\n  theme_minimal()\n```\n\nThis code will generate a basic scatter plot with the following features:\n- The x-axis represents horsepower.\n- The y-axis represents miles per gallon.\n- Points are plotted for each car in the `mtcars` dataset.\n- A minimalistic theme is applied to the plot.\n\nYou can customize the plot further by adding more layers, changing colors, and other graphical elements as needed."

C.8 Rough RAM Guidelines

Model fit depends on more than parameter count alone. Actual memory needs depend on:

  • quantization level
  • context length
  • CPU vs GPU execution
  • number of concurrent models
  • operating system overhead

Still, a rough planning table is useful.

C.8.1 Rule of thumb

In general:

  • smaller models are easier to run locally and respond faster
  • larger models often perform better, but require more RAM or VRAM

The table below is a rough planning guide, not a guarantee.

Model Size Class | Example Model Sizes | Rough System RAM Guidance       | Typical Use
Very small       | 0.5B–1.5B           | 8 GB may be enough              | simple prompting, demos, lightweight classification
Small            | 2B–4B               | 8–16 GB                         | summaries, basic coding help, fast local tests
Medium           | 7B–9B               | 16 GB is a practical target     | better general chat, stronger code generation
Upper-medium     | 12B–14B             | 24 GB preferred                 | more capable reasoning and coding
Large local      | 27B–32B             | 48 GB or more                   | advanced local experimentation
Very large       | 70B+                | 128 GB+ or specialized hardware | not typical for classroom laptops

Important caveat

These RAM ranges are approximate planning heuristics, not official guarantees.

Readers should interpret them as:

  • “likely comfortable”
  • not “always sufficient under all settings”

C.8.2 Model families and available sizes

These official Ollama model families are useful reference points:

  • llama3.2: 1B and 3B
  • mistral: 7B
  • llama3: 8B and 70B
  • gemma2: 2B, 9B, and 27B
  • gemma3: 1B, 4B, 12B, and 27B
  • qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
  • qwen2.5-coder: 0.5B, 1.5B, 3B, 7B, 14B, and 32B

C.9 Common Troubleshooting

C.9.1 Problem: ollama command not found

Possible causes:

  • install failed
  • terminal needs restarting
  • PATH was not updated

Try:

  • restarting the terminal
  • re-installing from the official download page

C.9.2 Problem: model not found

You may be using a model name that has not been pulled yet.

Check:

ollama list

Then pull the model you want:

ollama pull qwen2.5-coder:7b

C.9.3 Problem: R cannot connect to localhost:11434

Possible causes:

  • Ollama is not running
  • firewall/security software is interfering
  • the local service failed to start

Test from the terminal first with:

ollama run llama3.2

If the terminal call works but R does not, verify the URL in your R function.
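
A minimal connectivity check from R, assuming httr2 is installed and the service is running, is to hit the /api/tags endpoint, which lists installed models:

```r
library(httr2)

# If this returns 200, the Ollama service is reachable from R
resp <- request("http://localhost:11434/api/tags") |>
  req_perform()

resp_status(resp)
```

If this request fails while ollama run works in the terminal, the problem is on the R/network side rather than with Ollama itself.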

C.9.4 Problem: responses are very slow

Possible causes:

  • model is too large for available hardware
  • CPU-only inference
  • insufficient RAM causing swap

Solutions:

  • use a smaller model
  • reduce context and prompt length
  • avoid running other heavy applications
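
If context length is the bottleneck, the generate endpoint accepts an options field; the sketch below uses the documented num_ctx option, with 1024 as an illustrative value:

```r
library(httr2)

# Variant of call_ollama() that requests a smaller context window
call_ollama_small <- function(prompt, model = "llama3.2") {
  resp <- request("http://localhost:11434/api/generate") |>
    req_body_json(list(
      model = model,
      prompt = prompt,
      stream = FALSE,
      options = list(num_ctx = 1024)  # smaller context = less memory use
    )) |>
    req_perform()

  resp_body_json(resp)$response
}
```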

C.10 Suggested Classroom Defaults

If students have typical laptops, start with:

  • llama3.2
  • qwen2.5-coder:3b

If students have stronger machines:

  • llama3.2
  • qwen2.5-coder:7b

If the goal is only lightweight experimentation:

  • llama3.2:1b

C.11 Quick Checklist

Before class, verify that each student can:

  • run ollama list
  • pull a model
  • run ollama run llama3.2
  • call the model from R
  • switch the model argument in call_ollama()
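
The checklist above can be condensed into a short terminal smoke test (assumes ollama is on the PATH; ollama run also accepts a one-shot prompt as shown):

```sh
# Quick pre-class smoke test
ollama --version                 # is Ollama installed?
ollama list                      # which models are present?
ollama pull llama3.2             # fetch the default class model
ollama run llama3.2 "Say OK"     # one-shot generation from the CLI
```

If every command succeeds, the remaining step is calling the model from R with call_ollama().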
