Appendix C — Installing and Configuring Ollama
The helper function below is used throughout this appendix to call the local Ollama API from R:

```r
library(httr2)
library(jsonlite)

call_ollama <- function(prompt, model = "llama3.2") {
  req <- request("http://localhost:11434/api/generate") |>
    req_method("POST") |>
    req_body_json(list(
      model = model,
      prompt = prompt,
      stream = FALSE
    ))
  resp <- req_perform(req)
  body <- resp_body_json(resp)
  body$response
}
```
Ollama is a local runtime framework for large language models that can be used from R without needing a cloud API key.
This appendix provides practical steps for installing and configuring Ollama on various operating systems, pulling and testing models, verifying the local API, and calling Ollama from R.
It also includes rough guidelines for model size and RAM tradeoffs.
The goals are to:
- install Ollama on macOS, Windows, or Linux
- pull and test at least two models
- verify that the local API is working
- call Ollama from R
- understand rough model-size and RAM tradeoffs
C.1 What Ollama Is
Ollama is a local runtime for large language models. It exposes models through:
- a command line interface
- a local HTTP API
In this course, we use it to run models locally and call them from R without needing a cloud API key.
- Local use is free and open source; no account or paid plan is required.
C.2 Installation
C.2.1 macOS
Download Ollama from the official site or install using the official script.
C.2.1.1 Download manually
- Go to https://ollama.com/download/mac
- Install the application
- Launch Ollama
C.2.1.2 Install from Terminal
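On macOS, Ollama can also be installed with Homebrew. This is a sketch that assumes Homebrew itself is already set up; the app download above is the officially highlighted route:

```shell
# Install the Ollama formula (assumes Homebrew is installed)
brew install ollama

# Run the server in the background so the local API is available
brew services start ollama
```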
C.2.1.3 macOS note
The Ollama download page indicates the current macOS release requires macOS 14 Sonoma or later.
C.3 Windows
Download Ollama from the official site or use the PowerShell install command.
C.3.0.1 Download manually
- Go to https://ollama.com/download/windows
- Run the installer
- Launch Ollama
C.3.1 Install from PowerShell
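One option, sketched here under the assumption that winget is available and that the package id is Ollama.Ollama (verify with `winget search ollama`):

```shell
# Install Ollama via the Windows Package Manager
winget install Ollama.Ollama
```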
C.3.1.1 Windows note
The Ollama download page indicates that the current Windows release requires Windows 10 or later.
C.3.2 Linux
On Linux, the simplest install is the official shell script.
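The official script, as documented on the Ollama download page:

```shell
# Download and run the official install script
curl -fsSL https://ollama.com/install.sh | sh
```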
You can also run Ollama in Docker, especially on Linux systems with NVIDIA GPUs.
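A Docker sketch based on the commands documented on the ollama/ollama Docker Hub page; the GPU variant assumes the NVIDIA Container Toolkit is installed:

```shell
# CPU-only container; the API is published on port 11434
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# With NVIDIA GPUs
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```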
C.4 First Verification
After installation, verify that Ollama is available.
C.4.1 Check installed version
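From a terminal:

```shell
# Print the installed Ollama version
ollama --version
```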
C.4.2 See available models
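The list subcommand shows what is installed locally:

```shell
# Show locally installed models and their sizes
ollama list
```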
If no models are installed yet, the list may be empty.
D Pulling Two Models
For class, it is useful to install:
- one small general-purpose model
- one coding-oriented or larger model
Below are two good examples for local experimentation.
D.1 Example 1: Llama 3.2
llama3.2 is a lightweight general-purpose family available in 1B and 3B sizes.
You can pull the default tag, or name the 1B tag explicitly if you want the smaller version.
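Assuming the tags listed on the Ollama library page for llama3.2:

```shell
ollama pull llama3.2      # default tag (typically the 3B variant)
ollama pull llama3.2:1b   # the smaller 1B variant
```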
D.2 Example 2: Qwen 2.5 Coder
qwen2.5-coder is a coding-focused model family available in several sizes.
- The 3B version is small and returns results quickly.
- The larger (slower) 7B version is worth considering for more complex coding challenges.
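Assuming the tags listed on the Ollama library page for qwen2.5-coder:

```shell
ollama pull qwen2.5-coder:3b   # small and fast
ollama pull qwen2.5-coder:7b   # larger and slower, but stronger
```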
A practical two-model setup is:
- `llama3.2` for lightweight general prompting
- `qwen2.5-coder:3b` for coding tasks
Model names often refer to the number of parameters (e.g., 1B, 3B, 7B), not the size of the file on disk.
For example, llama3.2:latest is typically a ~3B parameter model, but may appear as only ~2 GB on disk due to quantization (compression for efficient local use).
- B (billions) means model size (capacity)
- GB (gigabytes) means storage size (after compression)
These are related but not the same.
D.3 Running Models from the Terminal
D.3.1 Start an interactive session
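For example, with the general-purpose model pulled earlier:

```shell
# Opens an interactive chat; type a prompt and press Enter
ollama run llama3.2
```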
Or:
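for example, the coding model pulled in Section D.2:

```shell
ollama run qwen2.5-coder:3b
```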
If the model responds in the terminal, Ollama is working correctly.
D.4 Stop the session
Use Ctrl + D or Ctrl + C, or type /bye at the prompt.
D.5 Using Ollama from R
Ollama exposes a local HTTP API, by default at http://localhost:11434, with text generation served at the /api/generate endpoint.
This means we can call it from R with a normal HTTP request.
D.6 Minimal R function
The call_ollama() helper shown at the start of this appendix is all we need: it POSTs the prompt to /api/generate with stream = FALSE and returns the response text.
D.7 Test from R
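A quick smoke test, assuming the Ollama server is running, llama3.2 has been pulled, and the call_ollama() helper above has been sourced (the prompt itself is arbitrary):

```r
# Should return a short character string from the model
call_ollama("In one sentence, what is R?")
```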
D.8 Switch models from R
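Switching models is just a matter of the model argument; the sketch below assumes both models from the two-model setup are installed:

```r
prompt <- "Write an R function that returns the mean of a numeric vector."

# Same prompt, two different local models
call_ollama(prompt, model = "llama3.2")
call_ollama(prompt, model = "qwen2.5-coder:3b")
```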
E Comparing Two Models
A useful early exercise is to send the same prompt to two models and compare:
- clarity
- code quality
- latency
- formatting consistency
```r
prompt <- "Write R code using ggplot2 to plot mpg vs hp in mtcars."
call_ollama(prompt, model = "llama3.2")
```

```
[1] "Here is a simple example of how you can use ggplot2 to create a scatterplot of MPG vs HP from the mtcars dataset:\n\n```r\n# Install and load necessary library if not already installed\ninstall.packages(\"ggplot2\")\nlibrary(ggplot2)\n\n# Load the mtcars dataset\ndata(mtcars)\n\n# Create a new dataframe with mpg and hp as columns\nmpg_hp_df <- data.frame(MPG = mtcars$mpg, HP = mtcars$hp)\n\n# Create a scatterplot of MPG vs HP\nggplot(mpg_hp_df, aes(x = MPG, y = HP)) +\n geom_point() +\n labs(title = \"Scatterplot of MPG vs HP\", x = \"Miles per Gallon (MPG)\", y = \"Horsepower (HP)\")\n```\n\nThis code will create a simple scatterplot with the miles per gallon on the X-axis and the horsepower on the Y-axis."
```

```r
call_ollama(prompt, model = "qwen2.5-coder:3b")
```

```
[1] "To create a scatter plot of `mpg` (miles per gallon) versus `hp` (horsepower) from the `mtcars` dataset using `ggplot2`, you can use the following R code:\n\n```R\n# Load the ggplot2 library if it's not already loaded\nlibrary(ggplot2)\n\n# Create the scatter plot\nggplot(mtcars, aes(x = hp, y = mpg)) +\n geom_point() +\n labs(title = \"Miles per Gallon vs Horsepower\",\n x = \"Horsepower\",\n y = \"Miles per Gallon\") +\n theme_minimal()\n```\n\nThis code will produce a scatter plot where the x-axis represents horsepower (`hp`) and the y-axis represents miles per gallon (`mpg`). The `geom_point()` function is used to create the actual scatter plot points, and `labs()` is used to add a title and labels for the axes. The `theme_minimal()` function provides a clean, minimalist theme for the plot.\n\nMake sure you have the `ggplot2` package installed in your R environment. If it's not installed, you can install it using:\n\n```R\ninstall.packages(\"ggplot2\")\n```\n\nThen load the library with:\n\n```R\nlibrary(ggplot2)\n```\n\nAnd run the above code to generate the plot."
```
E.1 Rough RAM Guidelines
Model fit depends on more than parameter count alone. Actual memory needs depend on:
- quantization level
- context length
- CPU vs GPU execution
- number of concurrent models
- operating system overhead
Still, a rough planning table is useful.
E.1.1 Rule of thumb
In general:
- smaller models are easier to run locally and respond faster
- larger models often perform better, but require more RAM or VRAM
The table below is a rough planning guide, not a guarantee.
| Model Size Class | Example Model Sizes | Rough System RAM Guidance | Typical Use |
|---|---|---|---|
| Very small | 0.5B–1.5B | 8 GB may be enough | simple prompting, demos, lightweight classification |
| Small | 2B–4B | 8–16 GB | summaries, basic coding help, fast local tests |
| Medium | 7B–9B | 16 GB is a practical target | better general chat, stronger code generation |
| Upper-medium | 12B–14B | 24 GB preferred | more capable reasoning and coding |
| Large local | 27B–32B | 48 GB or more | advanced local experimentation |
| Very large | 70B+ | 128 GB+ or specialized hardware | not typical for classroom laptops |
These RAM ranges are approximate planning heuristics, not official guarantees.
Readers should interpret them as:
- “likely comfortable”
- not “always sufficient under all settings”
E.1.2 Model families and available sizes
These official Ollama model families are useful reference points:
- `llama3.2`: 1B and 3B
- `mistral`: 7B
- `llama3`: 8B and 70B
- `gemma2`: 2B, 9B, and 27B
- `gemma3`: 1B, 4B, 12B, and 27B
- `qwen2.5`: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
- `qwen2.5-coder`: 0.5B, 1.5B, 3B, 7B, 14B, and 32B
E.2 Common Troubleshooting
E.2.1 Problem: ollama command not found
Possible causes:
- install failed
- terminal needs restarting
- PATH was not updated
Try:
- restarting the terminal
- re-installing from the official download page
E.2.2 Problem: model not found
You may be using a model name that has not been pulled yet.
Check:
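List what is actually installed:

```shell
ollama list
```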
Then pull the model you want:
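using the exact name and tag shown on the Ollama library page (llama3.2 here is only an example):

```shell
ollama pull llama3.2
```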
E.2.3 Problem: R cannot connect to localhost:11434
Possible causes:
- Ollama is not running
- firewall/security software is interfering
- the local service failed to start
Test from the terminal first with:
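If the server is up, the root endpoint replies with a short status message:

```shell
# Expect the reply "Ollama is running"
curl http://localhost:11434
```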
If the terminal call works but R does not, verify the URL in your R function.
E.2.4 Problem: responses are very slow
Possible causes:
- model is too large for available hardware
- CPU-only inference
- insufficient RAM causing swap
Solutions:
- use a smaller model
- reduce context and prompt length
- avoid running other heavy applications
E.3 Suggested Classroom Defaults
If students have typical laptops, start with:
- `llama3.2`
- `qwen2.5-coder:3b`
If students have stronger machines:
- `llama3.2`
- `qwen2.5-coder:7b`
If the goal is only lightweight experimentation:
- `llama3.2:1b`
E.4 Quick Checklist
Before class, verify that each student can:
- run `ollama list`
- pull a model
- run `ollama run llama3.2`
- call the model from R
- switch the `model` argument in `call_ollama()`
E.5 References
- Ollama downloads: https://ollama.com/download
- Ollama library: https://ollama.com/library
- Llama 3.2 on Ollama: https://ollama.com/library/llama3.2
- Qwen 2.5 on Ollama: https://ollama.com/library/qwen2.5
- Qwen 2.5 Coder on Ollama: https://ollama.com/library/qwen2.5-coder
- Gemma 3 on Ollama: https://ollama.com/library/gemma3