When it comes to Large Language Models (LLMs), there are moments when I ask myself: Do I really need to send my data halfway across the globe to OpenAI just to get a summary? With Ollama, there is now a standard for running models like Llama 3 or Mistral locally. And the best part: These models can be controlled directly from R. This not only saves money but also solves the data privacy problem quite elegantly. This article is about how to best implement this in R. Spoiler: There isn’t just one way, but (at least) two very good packages with completely different philosophies.
Ollama acts as a backend server that abstracts away the complexity of loading model weights, tokenization, and hardware acceleration. It is based on llama.cpp, a library that allows LLMs to run efficiently on standard hardware (especially Apple Silicon and consumer GPUs). A key element here is the GGUF format (GPT-Generated Unified Format). GGUF is a binary file format optimized for fast loading and mapping into RAM. Unlike PyTorch models, which often require gigantic VRAM resources, GGUF models support quantization. Quantization reduces the precision of weights from 16-bit floating point numbers (FP16) to 4-bit integers (Int4) or even less. This drastically reduces memory requirements with often negligible loss of quality. By default, Ollama binds to localhost (127.0.0.1) on port 11434. This means the R client and the Ollama server run on the same machine.
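A quick way to verify from R that the server is actually listening is to query the /api/tags endpoint, which lists the locally installed models. A minimal sketch using httr2 (more on that package below); the endpoint path is from the Ollama API docs:
R
library(httr2)

# Ollama listens on localhost:11434 by default; /api/tags lists installed models
resp <- request("http://localhost:11434/api/tags") |>
  req_perform() |>
  resp_body_json()

# Names of all models available locally
sapply(resp$models, function(m) m$name)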
A quick side note on hardware, because I’m often asked what this runs on: Macs with Apple Silicon (M1/M2/M3/M4) are better than expected for local LLMs. The reason is the Unified Memory Architecture. On a classic PC, memory is separated: The CPU has its RAM, the graphics card has its VRAM. A large model like Llama 3 70B needs about 40 GB of memory. An Nvidia RTX 4090—the flagship for gamers—has “only” 24 GB of VRAM. The model simply doesn’t fit. On a Mac, the CPU and GPU share the same memory. With my M4 Max and 128 GB of RAM, I effectively have nearly 100 GB of video memory available. This allows me to run huge models locally that would otherwise require expensive enterprise hardware in the PC world. So if you have a Mac with a lot of RAM: You are sitting on an AI workstation.
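A rough back-of-the-envelope calculation shows why 24 GB of VRAM is not enough for a 70B model, assuming roughly 4 bits (half a byte) per weight for a Q4 quantization:
R
params <- 70e9            # Llama 3 70B: ~70 billion weights
bytes_per_weight <- 0.5   # 4-bit quantization: ~half a byte per weight
params * bytes_per_weight / 1024^3
# ~32.6 GB for the weights alone; KV cache and context overhead push this towards 40 GB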
The Pragmatist: ollamar
If you just want results quickly, ollamar is the tool of choice. The package feels “natural,” like any other good R package. It doesn’t try to reinvent the wheel but maps the Ollama API almost 1:1. It is also extremely flexible with its output. You can get the result as simple text, as a list, or—my favorite—directly as a dataframe (tibble). This makes it incredibly easy to process the results directly in a dplyr pipeline.
R
library(ollamar)
df <- generate("llama3.1:70b", "Why is R better than Excel?", output = "df")
df$response
[1] "A question that interests many data analysts and scientists!\n\nR and Excel are both popular tools for data analysis and visualization, but they have different strengths and weaknesses. Here are some reasons why R can be better than Excel in certain aspects:\n\n1. **Flexibility and Extensibility**: R is a programmable environment..." <truncated>
The results aren’t necessarily as good as those from the OpenAI or Gemini apps, and web search is (still) missing—you would have to configure that yourself. But for some tasks, these models are definitely good enough.
The Scientist: rollama
Then there is rollama. This package comes from academia. Johannes Gruber developed it, and you notice immediately that reproducibility is the focus here.
LLMs are famously “stochastic parrots”—they babble something different every time. For scientific work (or if we need a reliable data pipeline), this is a nightmare. rollama makes it very easy to set the “seed” and strictly control parameters like “temperature.” For text classification or annotations, this is the better way. rollama also has a brilliant helper function called make_query that helps to cleanly assemble complex prompts with system instructions.
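A minimal sketch of a reproducible call; the argument names follow the rollama documentation at the time of writing, with seed and temperature passed through model_params to Ollama:
R
library(rollama)

# A fixed seed and temperature 0 make repeated runs return the same answer
query(
  "Classify the sentiment of this sentence as positive, neutral or negative: 'The docs could be better.'",
  model = "llama3.1",
  model_params = list(seed = 42, temperature = 0)
)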
For the Purists: httr2
Of course, you can also build everything yourself. Since Ollama provides a simple REST API, the great httr2 package is actually completely sufficient. This makes sense if you really want to control every detail of the request or want to build streaming responses (word by word) into your own Shiny App.
R
library(httr2)
req <- request("http://localhost:11434/api/generate") |>
  req_method("POST") |>
  req_body_json(list(
    model = "llama3.1:70b",
    prompt = "Tell me a statistics joke!",
    stream = FALSE
  ))
resp <- req_perform(req)
result_json <- resp |> resp_body_json()
cat(result_json$response)
Why did the statistician go to the disco?
Because she heard there was a 100% chance of meeting someone. When she got there, however, she was alone.
The answer: She was the mean and everyone else was the standard deviation!
Though most jokes aren’t really that funny 🙂
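For the word-by-word streaming mentioned above, the same request can be consumed chunk by chunk with req_perform_stream(). A rough sketch; the callback is kept deliberately simple and assumes each chunk arrives as complete newline-delimited JSON lines:
R
library(httr2)
library(jsonlite)

req_stream <- request("http://localhost:11434/api/generate") |>
  req_body_json(list(
    model = "llama3.1:70b",
    prompt = "Tell me a statistics joke!",
    stream = TRUE
  ))

# Each chunk contains one or more JSON lines; print the tokens as they arrive
print_chunk <- function(chunk) {
  lines <- strsplit(rawToChar(chunk), "\n", fixed = TRUE)[[1]]
  for (line in lines) {
    if (nzchar(line)) cat(fromJSON(line)$response)
  }
  TRUE  # returning TRUE keeps the stream open
}

req_stream |> req_perform_stream(callback = print_chunk, round = "line")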
mall
The package mall follows a completely different approach. It integrates LLMs directly into dplyr pipelines as verbs for data manipulation. Instead of “chatting with the model,” you apply NLP operations to columns of a dataframe.
Core Concepts:
- Row-wise Processing: Functions like llm_sentiment(), llm_summarize(), or llm_translate() operate row-by-row over a text column.
- Caching: A massive problem when working with LLMs is computation time. mall implements automatic caching. If the same text is processed again with the same settings, the result comes from the cache, not the model. This saves hours of computing time during iterative analyses.
- Backend Agnostic: mall uses packages like ollamar or ellmer internally but abstracts them away. You can switch the backend (e.g., to OpenAI) without changing your analysis code.
- Tidyverse Workflow: This is the most elegant way to integrate LLMs into existing ETL (Extract, Transform, Load) processes.
R
library(tibble)
library(dplyr)
library(mall)
llm_use("ollama", model = "llama3.1")
reviews <- tribble(
  ~id, ~kommentar,
  1, "The installation was super easy and runs stably.",
  2, "Total disaster, nothing works as promised.",
  3, "Quite okay for the price, but the docs could be better."
)

results <- reviews |>
  llm_sentiment(
    col = kommentar,
    options = c("positive", "neutral", "negative")
  )
results
# A tibble: 3 × 3
     id kommentar                                                .sentiment
  <dbl> <chr>                                                    <chr>
1     1 The installation was super easy and runs stably.         positive
2     2 Total disaster, nothing works as promised.               negative
3     3 Quite okay for the price, but the docs could be better.  neutral
- Website: mlverse.github.io/mall
chattr and ellmer: Interactive Assistants
While the packages above are intended for programmatic use, chattr and ellmer aim to support the programmer directly.
chattr offers a Shiny-based interface (gadget) directly in RStudio (Viewer Pane). It knows the context of the R session (loaded dataframes) and can thus generate context-aware code. Since version 0.3, chattr uses the backend package ellmer (developed by Posit) to establish connections to LLMs. ellmer offers a robust Chat class that manages conversation history and token counting. The connection to Ollama is trivial. This setup is excellent for building your own “Copilot” that runs locally and doesn’t send sensitive code snippets to the cloud.
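Connecting ellmer to the local server is essentially a one-liner. A minimal sketch, assuming llama3.1 has already been pulled via Ollama:
R
library(ellmer)

# Talks to the local Ollama server on localhost:11434
chat <- chat_ollama(model = "llama3.1")

# The Chat object keeps the conversation history, so follow-ups have context
chat$chat("Write an R function that z-standardizes all numeric columns of a dataframe.")
chat$chat("Now add an argument that lets me exclude certain columns.")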
Conclusion: Your data stays with you, and your laptop gets pleasantly warm in the winter—win-win.