Appendix C — Installing and Configuring Ollama

Published April 20, 2026

Ollama is a local runtime framework for large language models that can be used from R without needing a cloud API key.

This appendix provides practical steps for installing and configuring Ollama on various operating systems, pulling and testing models, verifying the local API, and calling Ollama from R.

It also includes rough guidelines for model size and RAM tradeoffs.

The goals are to:

  • install Ollama on macOS, Windows, or Linux
  • pull and test one or two local models
  • verify the local HTTP API
  • call Ollama from R

C.1 What Ollama Is

Ollama is a local runtime for large language models. It exposes models through:

  • a command line interface
  • a local HTTP API

In this course, we use it to run models locally and call them from R without needing a cloud API key.

Note: the free plan allows you to run one model at a time.

C.2 Installation

C.2.1 macOS

Download Ollama from the official site or install it from the Terminal with Homebrew. Note that the official install script (install.sh) targets Linux, not macOS.

C.2.1.1 Download manually

Visit https://ollama.com/download and download the macOS installer.

C.2.1.2 Install from Terminal

brew install ollama

C.2.1.3 macOS note

The Ollama download page indicates the current macOS release requires macOS 14 Sonoma or later.

C.2.2 Windows

Download Ollama from the official site or install it from the command line with winget.

C.2.2.1 Download manually

Visit https://ollama.com/download and download the Windows installer.

C.2.2.2 Install from PowerShell

winget install Ollama.Ollama

C.2.2.3 Windows note

The Ollama download page indicates that the current Windows release requires Windows 10 or later.

C.2.3 Linux

On Linux, the simplest install is the official shell script.

curl -fsSL https://ollama.com/install.sh | sh

You can also run Ollama in Docker, especially on Linux systems with NVIDIA GPUs.
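
A common invocation, following the Ollama Docker documentation, looks like the sketch below; the container name and volume name are conventions, not requirements, and GPU access assumes the NVIDIA Container Toolkit is installed on the host:

```sh
# Run the Ollama server in Docker with NVIDIA GPU access,
# persisting models in a named volume and exposing the local API port
docker run -d --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
```

Models can then be pulled inside the container, for example with docker exec -it ollama ollama pull llama3.2.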

C.3 First Verification

After installation, verify that Ollama is available.

C.3.1 Check installed version

ollama --version

C.3.2 See available models

ollama list

If no models are installed yet, the list may be empty.

C.4 Pulling Two Models

For class, it is useful to install:

  • one small general-purpose model
  • one coding-oriented or larger model

Below are two good examples for local experimentation.

C.4.1 Example 1: Llama 3.2

llama3.2 is a lightweight general-purpose family available in 1B and 3B sizes.

ollama pull llama3.2

If you want the smaller version explicitly, you can pull it by tag:

ollama pull llama3.2:1b

C.4.2 Example 2: Qwen 2.5 Coder

qwen2.5-coder is a coding-focused model family available in several sizes.

  • This is a smaller version for fast results:

ollama pull qwen2.5-coder:3b

  • If you want a larger (slower) coding model for more complex challenges, consider adding:

ollama pull qwen2.5-coder:7b

Suggested classroom combination

A practical two-model setup is:

  • llama3.2 for lightweight general prompting
  • qwen2.5-coder:3b for coding tasks

Model Size vs File Size

Model names often refer to the number of parameters (e.g., 1B, 3B, 7B), not the size of the file on disk.

For example, llama3.2:latest is typically a ~3B parameter model, but may appear as only ~2 GB on disk due to quantization (compression for efficient local use).

  • B (billions) means model size (capacity)
  • GB (gigabytes) means storage size (after compression)

These are related but not the same.
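
A quick back-of-the-envelope calculation shows why the two numbers differ; the 4.5 bits-per-weight figure below is an assumption, roughly typical of 4-bit quantization plus metadata overhead:

```r
# Rough disk footprint: parameters × bits per weight / 8 bits per byte
params          <- 3e9   # a ~3B-parameter model
bits_per_weight <- 4.5   # assumed: 4-bit quantization plus overhead

params * bits_per_weight / 8 / 1e9   # ≈ 1.7 GB on disk
```

That is in the same ballpark as the ~2 GB observed for llama3.2:latest; the exact figure depends on the quantization scheme used.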

C.5 Running Models from the Terminal

C.5.1 Start an interactive session

ollama run llama3.2

Or:

ollama run qwen2.5-coder:7b

If the model responds in the terminal, Ollama is working correctly.

C.5.2 Stop the session

Use Ctrl + D or Ctrl + C, or type /bye at the prompt.

C.6 Using Ollama from R

Ollama exposes a local API endpoint, usually at:

http://localhost:11434

This means we can call it from R with a normal HTTP request.
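
Before writing any R code, you can hit the same endpoint with curl; this assumes the Ollama service is running and llama3.2 has already been pulled:

```sh
# One-shot, non-streaming request to the local generate endpoint
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Reply with one word: ready",
  "stream": false
}'
```

The JSON reply contains the generated text in its response field, which is exactly what the R function below extracts.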

C.6.1 Minimal R function

library(httr2)

call_ollama <- function(prompt, model = "llama3.2") {
  # POST to the local generate endpoint; stream = FALSE asks for one
  # complete JSON response instead of a token-by-token stream
  req <- request("http://localhost:11434/api/generate") |>
    req_method("POST") |>
    req_body_json(list(
      model = model,
      prompt = prompt,
      stream = FALSE
    ))

  resp <- req_perform(req)
  body <- resp_body_json(resp)

  # The generated text lives in the "response" field
  body$response
}

C.6.2 Test from R

call_ollama("Explain what a workflow is in one sentence.")
[1] "A workflow is a series of connected tasks, processes, and activities that are designed to achieve a specific goal or objective, often involving multiple stakeholders, tools, and systems."

C.6.3 Switch models from R

call_ollama(
  "Write R code using dplyr to count rows in penguins. Return just the code and no explanation. Use the R native pipe as appropriate",
  model = "qwen2.5-coder:7b"
)
[1] "```R\nlibrary(dplyr)\n\npenguins %>% \n  count()\n```"

C.7 Comparing Two Models

A useful early exercise is to send the same prompt to two models and compare:

  • clarity
  • code quality
  • latency
  • formatting consistency

prompt <- "Write R code using ggplot2 to plot mpg vs hp in mtcars."

call_ollama(prompt, model = "llama3.2")
[1] "Here's an example of how you can use ggplot2 to create a scatter plot of mpg vs hp from the mtcars dataset:\n\n```r\n# Load the required libraries\nlibrary(ggplot2)\nlibrary(dplyr)\n\n# Load the mtcars dataset\ndata(mtcars)\n\n# Create a new data frame with only mpg and hp columns\nnew_data <- mtcars %>%\n  select(mpg, hp) %>%\n  arrange(desc(mpg)) # Arrange the data by mpg in descending order\n\n# Create a scatter plot of mpg vs hp\nggplot(new_data, aes(x = hp, y = mpg)) +\n  geom_point() +\n  labs(title = \"Scatter Plot of MPG vs HP\",\n       subtitle = \"from the mtcars dataset\",\n       x = \"HP\",\n       y = \"MPG\")\n```\n\nThis code first loads the ggplot2 and dplyr libraries. It then loads the mtcars dataset using the `data()` function.\n\nNext, it creates a new data frame called `new_data` that includes only the mpg and hp columns from the original dataset, arranged in descending order by mpg.\n\nFinally, it uses the `ggplot()` function to create a scatter plot of mpg vs hp. The `aes()` function is used to map the x and y aesthetics to the hp and mpg variables, respectively. The `geom_point()` function creates the scatter points, and the `labs()` function is used to set the title, subtitle, x-axis label, and y-axis label for the plot.\n\nYou can customize this plot as needed by adding additional layers (e.g., a regression line) or modifying the aesthetics."
call_ollama(prompt, model = "qwen2.5-coder:7b")
[1] "Certainly! Below is an example of how you can use the `ggplot2` package in R to create a scatter plot of miles per gallon (mpg) versus horsepower (hp) for the cars dataset (`mtcars`):\n\n```r\n# Load necessary library\nlibrary(ggplot2)\n\n# Load mtcars dataset\ndata(mtcars)\n\n# Create a ggplot scatter plot\nggplot(mtcars, aes(x = hp, y = mpg)) +\n  geom_point() + \n  labs(title = \"Scatter Plot of MPG vs Horsepower\",\n       x = \"Horsepower (hp)\",\n       y = \"Miles Per Gallon (mpg)\") +\n  theme_minimal()\n```\n\nThis code will generate a basic scatter plot with the following features:\n- The x-axis represents horsepower.\n- The y-axis represents miles per gallon.\n- Points are plotted for each car in the `mtcars` dataset.\n- A minimalistic theme is applied to the plot.\n\nYou can customize the plot further by adding more layers, changing colors, and other graphical elements as needed."

C.8 Rough RAM Guidelines

Model fit depends on more than parameter count alone. Actual memory needs depend on:

  • quantization level
  • context length
  • CPU vs GPU execution
  • number of concurrent models
  • operating system overhead

Still, a rough planning table is useful.

C.8.1 Rule of thumb

In general:

  • smaller models are easier to run locally and respond faster
  • larger models often perform better, but require more RAM or VRAM

The table below is a rough planning guide, not a guarantee.

Model Size Class | Example Model Sizes | Rough System RAM Guidance       | Typical Use
Very small       | 0.5B–1.5B           | 8 GB may be enough              | simple prompting, demos, lightweight classification
Small            | 2B–4B               | 8–16 GB                         | summaries, basic coding help, fast local tests
Medium           | 7B–9B               | 16 GB is a practical target     | better general chat, stronger code generation
Upper-medium     | 12B–14B             | 24 GB preferred                 | more capable reasoning and coding
Large local      | 27B–32B             | 48 GB or more                   | advanced local experimentation
Very large       | 70B+                | 128 GB+ or specialized hardware | not typical for classroom laptops

Important caveat

These RAM ranges are approximate planning heuristics, not official guarantees.

Readers should interpret them as:

  • “likely comfortable”
  • not “always sufficient under all settings”

C.8.2 Model families and available sizes

These official Ollama model families are useful reference points:

  • llama3.2: 1B and 3B
  • mistral: 7B
  • llama3: 8B and 70B
  • gemma2: 2B, 9B, and 27B
  • gemma3: 1B, 4B, 12B, and 27B
  • qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
  • qwen2.5-coder: 0.5B, 1.5B, 3B, 7B, 14B, and 32B

C.9 Common Troubleshooting

C.9.1 Problem: ollama command not found

Possible causes:

  • install failed
  • terminal needs restarting
  • PATH was not updated

Try:

  • restarting the terminal
  • re-installing from the official download page

C.9.2 Problem: model not found

You may be using a model name that has not been pulled yet.

Check:

ollama list

Then pull the model you want:

ollama pull qwen2.5-coder:7b

C.9.3 Problem: R cannot connect to localhost:11434

Possible causes:

  • Ollama is not running
  • firewall/security software is interfering
  • the local service failed to start

Test from the terminal first with:

ollama run llama3.2

If the terminal call works but R does not, verify the URL in your R function.
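
A minimal connectivity check from R, assuming httr2 is installed and the service is running, is to hit the /api/tags endpoint, which lists installed models:

```r
library(httr2)

# If this returns 200, the Ollama service is reachable from R
resp <- request("http://localhost:11434/api/tags") |>
  req_perform()

resp_status(resp)
```

If this request fails while ollama run works in the terminal, the problem is on the R/network side rather than with Ollama itself.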

C.9.4 Problem: responses are very slow

Possible causes:

  • model is too large for available hardware
  • CPU-only inference
  • insufficient RAM causing swap

Solutions:

  • use a smaller model
  • reduce context and prompt length
  • avoid running other heavy applications
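
If context length is the bottleneck, the generate endpoint accepts an options field; the sketch below uses the documented num_ctx option, with 1024 as an illustrative value:

```r
library(httr2)

# Variant of call_ollama() that requests a smaller context window
call_ollama_small <- function(prompt, model = "llama3.2") {
  resp <- request("http://localhost:11434/api/generate") |>
    req_body_json(list(
      model = model,
      prompt = prompt,
      stream = FALSE,
      options = list(num_ctx = 1024)  # smaller context = less memory use
    )) |>
    req_perform()

  resp_body_json(resp)$response
}
```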

C.10 Suggested Classroom Defaults

If students have typical laptops, start with:

  • llama3.2
  • qwen2.5-coder:3b

If students have stronger machines:

  • llama3.2
  • qwen2.5-coder:7b

If the goal is only lightweight experimentation:

  • llama3.2:1b

C.11 Quick Checklist

Before class, verify that each student can:

  • run ollama list
  • pull a model
  • run ollama run llama3.2
  • call the model from R
  • switch the model argument in call_ollama()
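
The checklist above can be condensed into a short terminal smoke test (assumes ollama is on the PATH; ollama run also accepts a one-shot prompt as shown):

```sh
# Quick pre-class smoke test
ollama --version                 # is Ollama installed?
ollama list                      # which models are present?
ollama pull llama3.2             # fetch the default class model
ollama run llama3.2 "Say OK"     # one-shot generation from the CLI
```

If every command succeeds, the remaining step is calling the model from R with call_ollama().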
