How to run Ollama on your Mac: complete setup guide
Step-by-step guide to installing and running Ollama on your Mac. Run AI models locally with no API costs, no internet required, and full privacy.
Running Ollama on your Mac lets you use powerful AI models completely locally — no API keys, no monthly bills, no data sent to anyone's servers. I set mine up in about 10 minutes and it changed how I think about using AI for my projects.
Here's the complete setup guide for Mac, written for people who've never done anything like this before.
What is Ollama, and why run it locally?
Ollama is a free, open-source tool that lets you download and run large language models (LLMs) directly on your computer. Instead of paying for API calls to OpenAI or Anthropic, you run the model yourself, completely offline if you want.
This matters for a few reasons. Privacy — nothing leaves your machine. Speed — no network latency once the model is loaded. Cost — completely free after setup. And flexibility — you can use it in your own tools, scripts, and vibe coding projects. You can also connect Ollama to MCP servers to extend what it can do; check out the MCP servers directory on Vibestack for ideas.
What you'll need
- A Mac running macOS 12 (Monterey) or later
- Apple Silicon (M1/M2/M3/M4) or Intel Mac — both work, but Apple Silicon is noticeably faster
- At least 8GB of RAM (16GB+ recommended for larger models)
- At least 10GB of free disk space (models range from 2GB to 20GB+)
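If you're not sure whether your Mac meets these requirements, you can check with a short Python script (run it with python3 in Terminal; the 8GB and 10GB thresholds mirror the list above):

```python
import os
import shutil

def system_overview() -> dict:
    """Report installed RAM and free disk space in GB (macOS and Linux)."""
    ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    free_gb = shutil.disk_usage("/").free / 1024**3
    return {"ram_gb": round(ram_gb, 1), "free_disk_gb": round(free_gb, 1)}

info = system_overview()
print(info)
print("RAM OK:", info["ram_gb"] >= 8, "| Disk OK:", info["free_disk_gb"] >= 10)
```

You can also see the same numbers in "About This Mac" and in Finder's disk info; the script is just a quick way to check both at once.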
Step 1: Download and install Ollama
Head to ollama.com and click the download button. You'll get a .zip file.
Unzip it and move the Ollama app to your Applications folder, just like any other Mac app. Double-click to open it. You'll see a small llama icon appear in your menu bar — that means Ollama is running.
That's it. Installation done.
Step 2: Open Terminal
Ollama is controlled from the Terminal. Don't worry — the commands are simple and I'll walk you through exactly what to type.
Open Terminal by pressing Cmd + Space, typing "Terminal", and pressing Enter.
Step 3: Pull your first model
In Terminal, type this command to download a model:
ollama pull llama3.2
This downloads Meta's Llama 3.2 model — a solid, capable model that runs well on most Macs. It's about 2GB. You'll see a progress bar as it downloads.
Other good starter models:
- ollama pull mistral — fast and efficient, great for text tasks
- ollama pull phi4 — Microsoft's compact but capable model
- ollama pull gemma3 — Google's Gemma 3, excellent for coding and writing
- ollama pull qwen2.5 — strong multilingual support
Step 4: Run the model
Once downloaded, start a conversation with:
ollama run llama3.2
You'll see a prompt where you can type messages. It works just like ChatGPT — type your message, hit Enter, and the model responds. The first response might take a few seconds while the model loads into memory; after that it's fast.
To exit, type /bye and press Enter.
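You can also run the model non-interactively by passing the prompt as an argument to ollama run, which is handy for scripts. Here's a minimal Python sketch (it assumes Ollama is installed and llama3.2 has been pulled; the ollama_cmd and ask helpers are just illustrative names):

```python
import subprocess

def ollama_cmd(model: str, prompt: str) -> list:
    """Build the argv for a one-shot, non-interactive `ollama run` call."""
    return ["ollama", "run", model, prompt]

def ask(model: str, prompt: str) -> str:
    """Run the model once and return its reply from stdout."""
    result = subprocess.run(ollama_cmd(model, prompt),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Example (requires Ollama to be installed and running):
# print(ask("llama3.2", "Explain unified memory in one sentence."))
```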
Step 5: Check what models you have
To see all models you've downloaded:
ollama list
To delete a model you no longer need:
ollama rm model-name
Step 6: Use Ollama with other apps (optional but powerful)
Ollama runs a local API server at http://localhost:11434, including an OpenAI-compatible endpoint at http://localhost:11434/v1. This means any app that can talk to an OpenAI-compatible API can use Ollama instead — for free, with no internet connection.
Apps that work with Ollama out of the box include Cursor, Open WebUI (a browser-based ChatGPT-style interface), Continue.dev, and many others. You can explore tools that pair well with local AI setups in Vibestack's tool directory.
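Your own scripts can call the same local API directly. Here's a minimal sketch using only Python's standard library (it assumes the Ollama app is running and llama3.2 has been pulled; build_request and generate are illustrative names):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> dict:
    """JSON body for /api/generate; stream=False returns one complete reply."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """POST a prompt to the local Ollama server and return the reply text."""
    body = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires the Ollama app to be running):
# print(generate("llama3.2", "Summarize this article in one sentence."))
```

This is the same API those apps use under the hood, so anything you build this way works entirely offline.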
For a browser-based interface (so you can use Ollama like ChatGPT), install Open WebUI:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main
Then open localhost:3000 in your browser.
Troubleshooting common issues
The model is very slow: This is usually a RAM issue. Try a smaller model like llama3.2:1b or gemma3:1b, and quit other apps to free up memory.
"Could not connect" errors: Make sure the Ollama app is running (look for the llama icon in your menu bar). If not, open it from Applications.
"Model too large" error: You need more disk space, or the model requires more RAM than you have. Try a smaller variant: ollama pull llama3.2:1b downloads the 1B-parameter version.
Slow first response: Normal — the model needs to load into memory first. Subsequent responses in the same session are much faster.
Which Mac hardware works best?
Apple Silicon Macs (M1 and later) are significantly better for running local AI models. The unified memory architecture means GPU and CPU share the same memory pool, which allows larger models to run faster. An M2 Mac with 16GB RAM can comfortably run 7B parameter models. An M3 or M4 with 32GB+ can run 13B–30B models well.
Intel Macs work, but are slower and limited to smaller models.
What can you build with Ollama?
Once you've got Ollama running, the possibilities are genuinely exciting. You can use it as the brain for personal automation workflows, integrate it into vibe coding projects, power local chatbots, summarize documents without sending them to a third-party service, and more.
For ideas on how to use it in vibe coding specifically, check out the Vibestack guide to using Ollama for vibe coding projects.
FAQ
Do I need internet to use Ollama after setup? No — once a model is downloaded, it runs entirely offline. Great for privacy and for working on planes or in areas with poor connectivity.
Can I run multiple models at the same time? Technically yes, but each model uses a lot of RAM. On most Macs, running more than one model simultaneously will be very slow. Switch between models rather than running them in parallel.
Is Ollama free? Yes, completely free and open-source. The models themselves are also free (they're openly released by companies like Meta, Google, and Microsoft). You're only "paying" in disk space and RAM.
Ready to go deeper? Explore more local AI tools and vibe coding resources at vibestack.in — the curated directory for non-coders who want to build with AI.