DeepSeek V3/R1 Local Deployment: Run the Strongest Open-Source Models on Your Computer

No expensive H100 GPUs needed. Learn to deploy DeepSeek V3 and the reasoning model R1 locally with Ollama and vLLM.

In early 2025, one name dominated the AI world: DeepSeek.

This AI lab from China released DeepSeek-V3 and DeepSeek-R1, models that go head-to-head with GPT-4 and Claude 3.5 on a range of benchmarks. More importantly, both are released as open weights. DeepSeek-R1's strong reasoning capabilities make it well suited to complex math and programming problems.

Today, we’re not just talking about it—we’re teaching you how to run it on your own machine.

Why Run DeepSeek Locally?

  1. Privacy: Your code, your documents—completely offline.
  2. Latency: No network delay; local inference speed depends on your GPU.
  3. No Censorship: Local models (usually) don’t have the strict moderation of cloud APIs.
  4. Free: No token costs beyond electricity.

Hardware Requirements

DeepSeek's open-source releases include several distilled models (based on Llama and Qwen), so they can run on ordinary consumer GPUs.

  • DeepSeek-R1-Distill-Llama-8B:
    • VRAM: ~6GB (4-bit quantized)
    • Recommended: RTX 3060 / 4060
  • DeepSeek-R1-Distill-Qwen-32B:
    • VRAM: ~20GB (4-bit quantized)
    • Recommended: RTX 3090 / 4090 or Mac M2/M3 Max (32GB+)
  • DeepSeek-V3 (671B MoE):
    • VRAM: Massive (multi-GPU H800 or high-memory Mac Studio). Normal users should use the API or distilled versions.
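The VRAM figures above follow a rough rule of thumb: weight memory is roughly parameter count times bits per weight divided by 8, plus some headroom for the KV cache and activations. The sketch below uses an assumed 20% overhead factor, which is an illustration, not a precise sizing formula:

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: int,
                     overhead: float = 0.2) -> float:
    """Rough VRAM estimate: weight bytes plus a fudge factor for the
    KV cache and activations (the 20% overhead is an assumption)."""
    weight_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits ~ 1 GB
    return round(weight_gb * (1 + overhead), 1)

# 8B model at 4-bit quantization (in the same ballpark as the ~6 GB above)
print(estimate_vram_gb(8, 4))
# 32B model at 4-bit quantization (close to the ~20 GB above)
print(estimate_vram_gb(32, 4))
```

Real usage also grows with context length, so treat these numbers as a lower bound when planning hardware.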

Method 1: Using Ollama (Easiest)

Ollama is currently the most popular tool for running local LLMs.

1. Install Ollama

Go to ollama.com to download and install.

2. Run DeepSeek Models

Open terminal and choose a command based on your setup:

Run 8B version (works on most computers):

ollama run deepseek-r1:8b

Run 32B version (for 24GB VRAM or M-chip Mac):

ollama run deepseek-r1:32b

Run 70B version (for dual 3090/4090):

ollama run deepseek-r1:70b
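Besides the interactive CLI, a running Ollama instance also serves a local REST API (on port 11434 by default), so you can call the model from your own scripts. A minimal sketch using only the standard library, assuming the server is running and the `deepseek-r1:8b` model has been pulled:

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Build the JSON body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def ask_ollama(prompt: str, model: str = "deepseek-r1:8b") -> str:
    """Send a chat request to a locally running Ollama server."""
    body = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Non-streaming responses carry the reply in message.content
        return json.loads(resp.read())["message"]["content"]

# With the Ollama server running:
#   print(ask_ollama("Why is the sky blue?"))
```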

3. Test Reasoning Capabilities

DeepSeek-R1’s signature feature is that it “thinks” (Chain of Thought). Try asking a logic puzzle:

“A pound of cotton and a pound of iron—which has larger volume? Reason step by step.”

You’ll see it first output content wrapped in <think> tags, showing its detailed reasoning process, then give a conclusion.
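If you consume R1's output programmatically, you usually want to separate that reasoning block from the final answer. A small sketch that splits on the <think> tags:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate the <think>...</think> reasoning block from the final
    answer. Returns (reasoning, answer); reasoning is "" if absent."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

sample = ("<think>Both weigh one pound, but cotton is far less dense."
          "</think>Cotton has the larger volume.")
reasoning, answer = split_reasoning(sample)
print(answer)  # Cotton has the larger volume.
```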

Method 2: Using vLLM (High-Performance Deployment)

If you’re a developer wanting to deploy a high-concurrency API service, vLLM is a better choice.

1. Install vLLM

pip install vllm

2. Start the Server

python -m vllm.entrypoints.openai.api_server \
  --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
  --trust-remote-code \
  --port 8000

3. Call the API

Now your local machine becomes an OpenAI-compatible API server:

from openai import OpenAI

# Point the client at the local vLLM server; vLLM ignores the API key,
# so any placeholder string works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="empty")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    messages=[{"role": "user", "content": "Write a Python snake game"}],
)
print(response.choices[0].message.content)
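For long R1 answers you'll often prefer streaming, which the OpenAI client supports via `stream=True`. A sketch of a helper that reassembles the streamed deltas into the full reply, demonstrated here with stand-in chunk objects so it runs without a server:

```python
from types import SimpleNamespace

def collect_stream(chunks) -> str:
    """Concatenate the incremental text deltas from a streamed
    chat-completion response into the full reply."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta may be None
            parts.append(delta)
    return "".join(parts)

# Offline demo with fake chunks shaped like the client's stream events:
fake = [SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=c))])
        for c in ["Hel", "lo", None]]
print(collect_stream(fake))  # Hello

# Against the live server from step 2, you would instead write:
#   stream = client.chat.completions.create(
#       model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
#       messages=[{"role": "user", "content": "Hello"}],
#       stream=True,
#   )
#   print(collect_stream(stream))
```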

DeepSeek vs Other Models

| Model | Strengths | Speed | Hardware Requirements |
| --- | --- | --- | --- |
| DeepSeek-V3 | General chat, multilingual | Fast (MoE) | High |
| DeepSeek-R1 | Math, coding, logical reasoning | Slow (long thinking) | Medium |
| Llama 3 | General chat, creative writing | Medium | Low |

Summary

DeepSeek's emergence breaks the monopoly of closed-source models. R1 shows that reinforcement learning (RL) can dramatically raise a model's reasoning ceiling, and its distilled versions bring much of that capability to smaller models. Now, run this AI from the future right in your terminal.


The transparency of DeepSeek-R1's visible thinking process is also an important step toward explainable AI.