Appendix C — Installing and Configuring Ollama

Published April 10, 2026

Ollama is a local runtime framework for large language models that can be used from R without needing a cloud API key.

The goals of this appendix are to:

  • install and configure Ollama on macOS, Windows, or Linux
  • pull and test local models, and verify the local API
  • call Ollama from R
  • understand rough model size and RAM tradeoffs

C.1 What Ollama Is

Ollama is a local runtime for large language models. It exposes models through:

  • a command line interface
  • a local HTTP API

In this course, we use it to run models locally and call them from R without needing a cloud API key.

Ollama itself is free and open source; no account or paid plan is needed to run models locally.

C.2 Installation

C.2.1 macOS

Download Ollama from the official site or install it with Homebrew.

C.2.1.1 Download manually

Get the macOS installer from https://ollama.com/download and open it.

C.2.1.2 Install with Homebrew

brew install ollama

C.2.1.3 macOS note

The Ollama download page indicates the current macOS release requires macOS 14 Sonoma or later.

C.2.2 Windows

Download Ollama from the official site or install it from the command line with winget.

C.2.2.1 Download manually

Get OllamaSetup.exe from https://ollama.com/download and run it.

C.2.2.2 Install from PowerShell

winget install Ollama.Ollama

C.2.2.3 Windows note

The Ollama download page indicates that the current Windows release requires Windows 10 or later.

C.2.3 Linux

On Linux, the simplest install is the official shell script.

curl -fsSL https://ollama.com/install.sh | sh

You can also run Ollama in Docker, especially on Linux systems with NVIDIA GPUs.
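
The Docker route can be sketched as follows; ollama/ollama is the official image, and the --gpus flag assumes the NVIDIA container toolkit is installed:

```shell
# Start the Ollama server in a container, exposing the usual API port
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull and chat with a model inside the container
docker exec -it ollama ollama run llama3.2
```

On machines without a supported GPU, the same commands work with the --gpus flag omitted, falling back to CPU inference.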

C.3 First Verification

After installation, verify that Ollama is available.

C.3.1 Check installed version

ollama --version

C.3.2 See available models

ollama list

If no models are installed yet, the list may be empty.

C.4 Pulling Two Models

For class, it is useful to install:

  • one small general-purpose model
  • one coding-oriented or larger model

Below are two good examples for local experimentation.

C.4.1 Example 1: Llama 3.2

llama3.2 is a lightweight general-purpose family available in 1B and 3B sizes.

ollama pull llama3.2

If you want the smaller version explicitly, pull it by tag:

ollama pull llama3.2:1b

C.4.2 Example 2: Qwen 2.5 Coder

qwen2.5-coder is a coding-focused model family available in several sizes.

A smaller version gives fast results:

ollama pull qwen2.5-coder:3b

If you want a larger (slower) coding model for more complex challenges, consider adding:

ollama pull qwen2.5-coder:7b

C.4.3 Suggested classroom combination

A practical two-model setup is:

  • llama3.2 for lightweight general prompting
  • qwen2.5-coder:3b for coding tasks

C.4.4 Model Size vs File Size

Model names often refer to the number of parameters (e.g., 1B, 3B, 7B), not the size of the file on disk.

For example, llama3.2:latest is typically a ~3B parameter model, but may appear as only ~2 GB on disk due to quantization (compression for efficient local use).

  • B (billions) means model size (capacity)
  • GB (gigabytes) means storage size (after compression)

These are related but not the same.
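
As a quick sanity check on the B-versus-GB distinction, the arithmetic can be sketched in R; the 4-bit figure below is an assumption (a common quantization level for local models), not an official number:

```r
params <- 3e9        # llama3.2 has roughly 3 billion parameters
bits_per_param <- 4  # assume 4-bit quantization, common for local use
file_gb <- params * bits_per_param / 8 / 1e9
file_gb              # ~1.5 GB of weights
#> [1] 1.5
```

Runtime memory use is higher than the file size, since the context window and activations also need RAM.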

C.5 Running Models from the Terminal

C.5.1 Start an interactive session

ollama run llama3.2

Or:

ollama run qwen2.5-coder:7b

If the model responds in the terminal, Ollama is working correctly.

C.5.2 Stop the session

Type /bye or press Ctrl + D to exit; Ctrl + C interrupts a response that is still generating.

C.6 Using Ollama from R

Ollama exposes a local API endpoint, usually at:

http://localhost:11434

This means we can call it from R with a normal HTTP request.

C.6.1 Minimal R function

library(httr2)

call_ollama <- function(prompt, model = "llama3.2") {
  # build a request to the local generate endpoint;
  # req_body_json() automatically makes this a POST
  req <- request("http://localhost:11434/api/generate") |>
    req_body_json(list(
      model = model,
      prompt = prompt,
      stream = FALSE  # ask for one complete response rather than a stream
    ))

  resp <- req_perform(req)
  body <- resp_body_json(resp)

  # the generated text is returned in the `response` field
  body$response
}

C.6.2 Test from R

call_ollama("Explain what a workflow is in one sentence.")
[1] "A workflow is a series of coordinated tasks and activities that are performed in a logical order to achieve a specific goal or objective, often involving multiple stages, processes, and interactions between people, systems, and data."

C.6.3 Switch models from R

call_ollama(
  "Write R code using dplyr to count rows in penguins. Return just the code and no explanation. Use the R native pipe as appropriate",
  model = "qwen2.5-coder:7b"
)
[1] "```R\nlibrary(dplyr)\n\npenguins %>%\n  count()\n```"

C.7 Comparing Two Models

A useful early exercise is to send the same prompt to two models and compare:

  • clarity
  • code quality
  • latency
  • formatting consistency

prompt <- "Write R code using ggplot2 to plot mpg vs hp in mtcars."

call_ollama(prompt, model = "llama3.2")
[1] "Here is a simple example of how you can use ggplot2 to create a scatterplot of MPG vs HP from the mtcars dataset:\n\n```r\n# Install and load necessary library if not already installed\ninstall.packages(\"ggplot2\")\nlibrary(ggplot2)\n\n# Load the mtcars dataset\ndata(mtcars)\n\n# Create a new dataframe with mpg and hp as columns\nmpg_hp_df <- data.frame(MPG = mtcars$mpg, HP = mtcars$hp)\n\n# Create a scatterplot of MPG vs HP\nggplot(mpg_hp_df, aes(x = MPG, y = HP)) +\n  geom_point() +\n  labs(title = \"Scatterplot of MPG vs HP\", x = \"Miles per Gallon (MPG)\", y = \"Horsepower (HP)\")\n```\n\nThis code will create a simple scatterplot with the miles per gallon on the X-axis and the horsepower on the Y-axis."
call_ollama(prompt, model = "qwen2.5-coder:7b")
[1] "To create a scatter plot of `mpg` (miles per gallon) versus `hp` (horsepower) from the `mtcars` dataset using `ggplot2`, you can use the following R code:\n\n```R\n# Load the ggplot2 library if it's not already loaded\nlibrary(ggplot2)\n\n# Create the scatter plot\nggplot(mtcars, aes(x = hp, y = mpg)) +\n  geom_point() +\n  labs(title = \"Miles per Gallon vs Horsepower\",\n       x = \"Horsepower\",\n       y = \"Miles per Gallon\") +\n  theme_minimal()\n```\n\nThis code will produce a scatter plot where the x-axis represents horsepower (`hp`) and the y-axis represents miles per gallon (`mpg`). The `geom_point()` function is used to create the actual scatter plot points, and `labs()` is used to add a title and labels for the axes. The `theme_minimal()` function provides a clean, minimalist theme for the plot.\n\nMake sure you have the `ggplot2` package installed in your R environment. If it's not installed, you can install it using:\n\n```R\ninstall.packages(\"ggplot2\")\n```\n\nThen load the library with:\n\n```R\nlibrary(ggplot2)\n```\n\nAnd run the above code to generate the plot."
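
To compare latency as well, the same prompt can be timed against each model. A minimal sketch, assuming the call_ollama() function defined earlier and both models already pulled:

```r
models <- c("llama3.2", "qwen2.5-coder:7b")
prompt <- "Write R code using ggplot2 to plot mpg vs hp in mtcars."

for (m in models) {
  # system.time() reports wall-clock time in the "elapsed" slot
  elapsed <- system.time(call_ollama(prompt, model = m))["elapsed"]
  cat(m, "took", round(elapsed, 1), "seconds\n")
}
```

Expect the larger model to be noticeably slower, especially on CPU-only machines.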

C.8 Rough RAM Guidelines

Model fit depends on more than parameter count alone. Actual memory needs depend on:

  • quantization level
  • context length
  • CPU vs GPU execution
  • number of concurrent models
  • operating system overhead

Still, a rough planning table is useful.

C.8.1 Rule of thumb

In general:

  • smaller models are easier to run locally and respond faster
  • larger models often perform better, but require more RAM or VRAM

The table below is a rough planning guide, not a guarantee.

| Model Size Class | Example Model Sizes | Rough System RAM Guidance | Typical Use |
|------------------|---------------------|---------------------------|-------------|
| Very small | 0.5B–1.5B | 8 GB may be enough | simple prompting, demos, lightweight classification |
| Small | 2B–4B | 8–16 GB | summaries, basic coding help, fast local tests |
| Medium | 7B–9B | 16 GB is a practical target | better general chat, stronger code generation |
| Upper-medium | 12B–14B | 24 GB preferred | more capable reasoning and coding |
| Large local | 27B–32B | 48 GB or more | advanced local experimentation |
| Very large | 70B+ | 128 GB+ or specialized hardware | not typical for classroom laptops |

C.8.2 Important caveat

These RAM ranges are approximate planning heuristics, not official guarantees.

Readers should interpret them as:

  • “likely comfortable”
  • not “always sufficient under all settings”

C.8.3 Model families and available sizes

These official Ollama model families are useful reference points:

  • llama3.2: 1B and 3B
  • mistral: 7B
  • llama3: 8B and 70B
  • gemma2: 2B, 9B, and 27B
  • gemma3: 1B, 4B, 12B, and 27B
  • qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
  • qwen2.5-coder: 0.5B, 1.5B, 3B, 7B, 14B, and 32B

C.9 Common Troubleshooting

C.9.1 Problem: ollama command not found

Possible causes:

  • install failed
  • terminal needs restarting
  • PATH was not updated

Try:

  • restarting the terminal
  • re-installing from the official download page

C.9.2 Problem: model not found

You may be using a model name that has not been pulled yet.

Check:

ollama list

Then pull the model you want:

ollama pull qwen2.5-coder:7b

C.9.3 Problem: R cannot connect to localhost:11434

Possible causes:

  • Ollama is not running
  • firewall/security software is interfering
  • the local service failed to start

Test from the terminal first with:

ollama run llama3.2

If the terminal call works but R does not, verify the URL in your R function.
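
It can also help to confirm from R that the server is reachable at all. A small helper sketch, assuming only that httr2 is installed; /api/tags is a lightweight Ollama endpoint that lists installed models:

```r
library(httr2)

ollama_running <- function(base_url = "http://localhost:11434") {
  tryCatch({
    resp <- request(base_url) |>
      req_url_path("api/tags") |>  # cheap endpoint listing installed models
      req_perform()
    resp_status(resp) == 200
  }, error = function(e) FALSE)
}

ollama_running()  # TRUE if the local server is up
```

If this returns FALSE while the terminal commands work, the problem is likely the URL or a firewall rather than Ollama itself.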

C.9.4 Problem: responses are very slow

Possible causes:

  • model is too large for available hardware
  • CPU-only inference
  • insufficient RAM causing swap

Solutions:

  • use a smaller model
  • reduce context and prompt length
  • avoid running other heavy applications

C.10 Suggested Classroom Defaults

If students have typical laptops, start with:

  • llama3.2
  • qwen2.5-coder:3b

If students have stronger machines:

  • llama3.2
  • qwen2.5-coder:7b

If the goal is only lightweight experimentation:

  • llama3.2:1b

C.11 Quick Checklist

Before class, verify that each student can:

  • run ollama list
  • pull a model
  • run ollama run llama3.2
  • call the model from R
  • switch the model argument in call_ollama()
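
Most of this checklist can be run from a single R script. A sketch, assuming ollama is on the PATH and the call_ollama() function from earlier is defined; check_setup() is a hypothetical helper name, not part of any package:

```r
check_setup <- function() {
  # `ollama list` works and shows installed models
  cat(system2("ollama", "list", stdout = TRUE), sep = "\n")

  # the default model answers when called from R
  cat(call_ollama("Reply with the single word OK."), "\n")

  # switching the model argument also works
  cat(call_ollama("Reply with the single word OK.", model = "qwen2.5-coder:3b"), "\n")
}
```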

C.12 References