PolarSPARC

Quick Bytes on Docker Model Runner


Bhaskar S *UPDATED* 02/01/2026


Overview

The Docker platform allows application developers to develop, build, deploy, and run applications in containers that are isolated from one another on a single host.

The Docker platform now brings a new and exciting extension called the Docker Model Runner, which allows one to manage, run, and deploy AI models locally using Docker.

Just as Docker enables developers to pull and run container images, Docker Model Runner allows one to pull and run LLMs or other AI models directly from Docker Hub or any OCI-compliant registry.


Installation

The installation is on an Ubuntu 24.04 LTS based Linux desktop.

To refresh the package index files on the Linux system, execute the following command in a terminal window:


$ sudo apt-get update


To ensure the prerequisite certificates and utilities are installed, execute the following command in a terminal window:


$ sudo apt-get install -y ca-certificates curl jq


To ensure valid Docker cryptographic keys are set up, execute the following commands in a terminal window:


$ sudo install -m 0755 -d /etc/apt/keyrings

$ sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc

$ sudo chmod a+r /etc/apt/keyrings/docker.asc


To add the Docker package repository, execute the following command in a terminal window:


$ echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
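To see what the shell substitutions in the above command expand to, the following minimal Python sketch mimics them. The helper names parse_os_release and build_repo_line are hypothetical and are not part of any Docker tooling; the sample values assume an amd64 machine running Ubuntu 24.04, whose codename is noble.

```python
# Sketch only: mirrors the shell substitutions used in the repository line.
# parse_os_release() and build_repo_line() are hypothetical helper names.


def parse_os_release(text):
    """Parse KEY=VALUE lines (as found in /etc/os-release) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def build_repo_line(arch, os_release_text):
    """Build the apt source line for the Docker package repository."""
    fields = parse_os_release(os_release_text)
    # Same rule as ${UBUNTU_CODENAME:-$VERSION_CODENAME}:
    # fall back to VERSION_CODENAME when UBUNTU_CODENAME is unset or empty
    codename = fields.get("UBUNTU_CODENAME") or fields.get("VERSION_CODENAME", "")
    return (
        f"deb [arch={arch} signed-by=/etc/apt/keyrings/docker.asc] "
        f"https://download.docker.com/linux/ubuntu {codename} stable"
    )


# Assumed sample values for an Ubuntu 24.04 (noble) desktop on amd64
SAMPLE_OS_RELEASE = 'VERSION_ID="24.04"\nVERSION_CODENAME=noble\nUBUNTU_CODENAME=noble\n'
repo_line = build_repo_line("amd64", SAMPLE_OS_RELEASE)
print(repo_line)
```

On a real system, the architecture comes from dpkg --print-architecture and the codename from /etc/os-release, so the generated line may differ from this sample.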


Once again, to refresh the package index files on the Linux system, execute the following command in a terminal window:


$ sudo apt-get update


To install Docker on the system, execute the following command in a terminal window:


$ sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin docker-model-plugin


To add the logged-in user to the docker system group, execute the following commands in a terminal window (the -f option lets groupadd succeed even if the docker group already exists):


$ sudo groupadd -f docker

$ sudo usermod -aG docker $(whoami)


To reboot the system, execute the following command in a terminal window:


$ sudo shutdown -r now


Once the system is up (after the reboot), execute the following command in a terminal window:


$ docker info


The following should be the typical trimmed output:


Output.1

Client: Docker Engine - Community
 Version:    29.2.0
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.31.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  2.37.1+ds1-0ubuntu2~24.04.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose
  model: Docker Model Runner (Docker Inc.)
    Version:  v1.0.9
    Path:     /usr/libexec/docker/cli-plugins/docker-model
.....[TRIM].....

With this, we complete the installation of the Docker platform with the Docker Model Runner on the Linux system.


Hands-on with Docker Model Runner

To display the version of the installed Docker Model Runner, execute the following command in the terminal window:


$ docker model version


The following would be a typical output:


Output.2

Docker Model Runner version v1.0.9
Docker Engine Kind: Docker Engine
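When scripting against the Docker Model Runner, it can be handy to extract the version programmatically. The following is a minimal Python sketch that assumes the output keeps the format shown in Output.2 above, which may change between releases. The function names are hypothetical.

```python
import re
import subprocess


def parse_runner_version(text):
    """Return the version string (for example 'v1.0.9') or None if not found."""
    match = re.search(r"Docker Model Runner version\s+(\S+)", text)
    return match.group(1) if match else None


def installed_runner_version():
    """Run 'docker model version' and parse its output (needs Docker installed)."""
    result = subprocess.run(
        ["docker", "model", "version"],
        capture_output=True, text=True, check=True,
    )
    return parse_runner_version(result.stdout)


# Copy of Output.2, so the sketch runs even without Docker installed
SAMPLE_OUTPUT = "Docker Model Runner version v1.0.9\nDocker Engine Kind: Docker Engine\n"
print(parse_runner_version(SAMPLE_OUTPUT))
```

On a host with Docker installed, calling installed_runner_version() would run the actual command instead of using the sample text.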

To display the sub-commands available for the Docker Model Runner, execute the following command in the terminal window:


$ docker model help


The following would be a typical output:


Output.3

Usage:  docker model COMMAND

Docker Model Runner

Commands:
  bench            Benchmark a model's performance at different concurrency levels
  df               Show Docker Model Runner disk usage
  inspect          Display detailed information on one model
  install-runner   Install Docker Model Runner (Docker Engine only)
  list             List the models pulled to your local environment
  logs             Fetch the Docker Model Runner logs
  package          Package a GGUF file, Safetensors directory, DDUF file, or existing model into a Docker model OCI artifact.
  ps               List running models
  pull             Pull a model from Docker Hub or HuggingFace to your local environment
  purge            Remove all models
  push             Push a model to Docker Hub
  reinstall-runner Reinstall Docker Model Runner (Docker Engine only)
  requests         Fetch requests+responses from Docker Model Runner
  restart-runner   Restart Docker Model Runner (Docker Engine only)
  rm               Remove local models downloaded from Docker Hub
  run              Run a model and interact with it using a submitted prompt or chat mode
  search           Search for models on Docker Hub and HuggingFace
  start-runner     Start Docker Model Runner (Docker Engine only)
  status           Check if the Docker Model Runner is running
  stop-runner      Stop Docker Model Runner (Docker Engine only)
  tag              Tag a model
  uninstall-runner Uninstall Docker Model Runner (Docker Engine only)
  unload           Unload running models
  version          Show the Docker Model Runner version

Run 'docker model COMMAND --help' for more information on a command.

To list all the AI models on the local host, execute the following command in the terminal window:


$ docker model list


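If the local host does NOT have an NVIDIA GPU, the following would be a typical output on the first invocation, which also pulls the docker/model-runner image and starts the model runner container: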


Output.4

latest: Pulling from docker/model-runner
f784408d7713: Pull complete 
ed3b61b56cf4: Pull complete 
027c5e81f27e: Pull complete 
d70c5952b45f: Pull complete 
e06a12754a26: Pull complete 
a77e4ee68446: Pull complete 
b9b9c5dd5c4b: Pull complete 
1ff8feaee46b: Pull complete 
ff8244fefab4: Pull complete 
Digest: sha256:795efebeaf009009ccc9d4ee5671969fed8de46559062f6816ed53e0efe3b6bf
Status: Downloaded newer image for docker/model-runner:latest
Successfully pulled docker/model-runner:latest
Creating model storage volume docker-model-runner-models...
Starting model runner container docker-model-runner...
MODEL NAME  PARAMETERS  QUANTIZATION  ARCHITECTURE  MODEL ID  CREATED  CONTEXT  SIZE

However, if the local host does have an NVIDIA GPU, the following would be a typical output:


Output.5

latest-cuda: Pulling from docker/model-runner
0622fac788ed: Already exists 
2b2da1c48640: Already exists 
b2276ea4fcfd: Already exists 
2311d82dd6d8: Already exists 
1bba15468fcc: Already exists 
62d562df65b5: Pull complete 
364bc44563a1: Pull complete 
5444f6e0d4f1: Pull complete 
c54176f7c0ef: Pull complete 
b0581931af32: Pull complete 
8e6b2bd1e51d: Pull complete 
4f4fb700ef54: Pull complete 
387e6d0853b2: Pull complete 
6413e37a99fd: Pull complete 
88cabecab9c9: Pull complete 
d40a38ce3086: Pull complete 
9023d5deab17: Pull complete 
Digest: sha256:212efa8e3805f6c47a66be2078b8ed38d9a0e1d64d1a5803276a7a9e54c8933c
Status: Downloaded newer image for docker/model-runner:latest-cuda
Successfully pulled docker/model-runner:latest-cuda
Starting model runner container docker-model-runner...
MODEL NAME  PARAMETERS  QUANTIZATION  ARCHITECTURE  MODEL ID  CREATED  CONTEXT  SIZE


As is evident from Output.4 (or Output.5) above, there are no AI models currently downloaded on the local host.

Browse the currently available AI models in the Official Docker Models Catalog.

For the hands-on demonstration, we will pull the Gemma-3n 2B LLM.

To fetch the ai/gemma3n:2B-F16 LLM model from the Docker Hub registry and store it on the local host, execute the following command in the terminal window:


$ docker model pull ai/gemma3n:2B-F16


The above command downloads about 8.9 GB, so it may take a while depending on the network speed. Be a little patient!

The following would be a typical output:


Output.6

c2b52d60a238: Pull complete [==================================================>]  8.344kB/8.344kB
96c32662a056: Pull complete [==================================================>]  8.919GB/8.919GB
Model pulled successfully

Once again, to list all the AI models on the local host, execute the following command in the terminal window:


$ docker model list


The following would be a typical output:


Output.7

MODEL NAME      PARAMETERS  QUANTIZATION  ARCHITECTURE  MODEL ID      CREATED       CONTEXT  SIZE      
gemma3n:2B-F16  4.46 B      F16           gemma3n       f45ebd23a7bf  7 months ago           8.30 GiB 
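The PARAMETERS and SIZE columns may look surprising for a model tagged 2B. The following back-of-the-envelope check in Python assumes that F16 stores each weight in 2 bytes and uses the rounded figures from Output.6 and Output.7, so it is only an approximation. The variable names are illustrative.

```python
# Rounded figures taken from Output.6 and Output.7 above
PARAMETERS = 4.46e9          # 4.46 B parameters
BYTES_PER_F16_WEIGHT = 2     # F16 means 16 bits (2 bytes) per weight
DOWNLOADED_BYTES = 8.919e9   # 8.919GB layer reported by 'docker model pull'

# Approximate storage needed for the weights alone
estimated_bytes = PARAMETERS * BYTES_PER_F16_WEIGHT

# The pull output uses decimal GB, while the list output uses binary GiB
downloaded_gib = DOWNLOADED_BYTES / 2**30

print(f"estimated weights: {estimated_bytes / 1e9:.2f} GB")
print(f"downloaded layer:  {downloaded_gib:.2f} GiB")
```

The estimate lands within a fraction of a percent of the downloaded layer, and the conversion to GiB gives roughly 8.3, in line with the SIZE column. In other words, the size follows from the 4.46 B parameters actually stored, not from the 2B in the tag.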

To run the just-downloaded Gemma-3n 2B model locally in interactive chat mode, execute the following command in the terminal window:


$ docker model run ai/gemma3n:2B-F16


The following would be a typical output:


Output.8

> Send a message (/? for help)

To chat with the running AI model, type the following user prompt and press the ENTER key:


> what are the top three consumer GPUs in the market today ?


The following would be a typical output:


Output.9

Okay, here are the top three consumer GPUs as of late 2023/early 2024, considering performance, price, and availability.  It's important to note that the "best" GPU depends on your specific needs (gaming resolution, desired frame rates, budget).  This is a general ranking:

1.  **NVIDIA GeForce RTX 4090:** 
    *   **Performance:**  The undisputed king.  It offers the highest performance for gaming and professional workloads.  It consistently delivers the highest frame rates at 4K resolution and excellent performance at 1440p and even 1080p. It also has excellent ray tracing and DLSS 3 capabilities.
    *   **Price:**  Very expensive. It's the most costly consumer GPU available.
    *   **Availability:**  Generally available, but can still have occasional supply constraints.
    *   **Ideal For:** Enthusiast gamers, content creators, professionals who need the absolute best performance for demanding tasks.

2.  **NVIDIA GeForce RTX 4080 SUPER:**
    *   **Performance:** A significant step up from the 4080, the Super version offers a noticeable performance boost.  It's an excellent choice for 4K gaming and delivers very strong performance at 1440p. Ray tracing and DLSS 3 are also great here.
    *   **Price:**  More affordable than the 4090, but still a premium GPU.
    *   **Availability:**  Generally good availability.
    *   **Ideal For:** High-end gamers who want a great 4K experience without breaking the bank, and content creators who need a powerful GPU.

3.  **AMD Radeon RX 7900 XTX:**
    *   **Performance:**  AMD's top-tier card, the RX 7900 XTX, is a strong contender, particularly at 4K resolution. It often trades blows with the RTX 4080 SUPER in many games, and can sometimes outperform it depending on the game and settings.
    *   **Price:**  Generally priced competitively with the RTX 4080 SUPER, or slightly lower in some cases.
    *   **Availability:**  Generally good availability.
    *   **Ideal For:**  Enthusiast gamers looking for a strong alternative to NVIDIA, especially at 4K.  It's also a good choice for high-end 1440p gaming.

**Important Considerations:**

*   **DLSS vs. FSR:** NVIDIA's DLSS (Deep Learning Super Sampling) is generally considered superior in image quality and performance gains compared to AMD's FSR (FidelityFX Super Resolution). However, FSR has improved significantly and is now a viable option.
*   **Ray Tracing:**  NVIDIA's RTX cards have a more mature ray tracing implementation than AMD's RX 7000 series.
*   **Power Consumption:** The RTX 4090 is the most power-hungry, requiring a robust power supply. The RX 7900 XTX is also power-hungry but generally less so than the 4090.
*    **Price Fluctuations:** GPU prices can vary significantly depending on the retailer and market conditions.

**Where to find more detailed comparisons:**

*   **TechPowerUp:** [https://www.techpowerup.com/gpu-specs/](https://www.techpowerup.com/gpu-specs/)
*   **Tom's Hardware:** [https://www.tomshardware.com/gpu](https://www.tomshardware.com/gpu)
*   **GamersNexus:** [https://www.gamersnexus.net/](https://www.gamersnexus.net/)

I hope this helps! Let me know if you have any other questions.

Hooray!!! We have successfully tested the Docker Model Runner running the Gemma-3n 2B LLM.
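The help text in Output.3 mentions that the run sub-command also works with a submitted prompt, not just the chat mode. The following minimal Python sketch assumes the prompt is passed as an argument after the model name; the function names are hypothetical and the call returns whatever the command prints to the standard output.

```python
import shlex
import subprocess

MODEL = "ai/gemma3n:2B-F16"


def build_run_command(model, prompt):
    """Build the argument list for a one-shot 'docker model run MODEL PROMPT' call."""
    return ["docker", "model", "run", model, prompt]


def ask_once(model, prompt):
    """Send a single prompt and return the reply text (needs Docker Model Runner)."""
    result = subprocess.run(
        build_run_command(model, prompt),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Only print the command here, so the sketch runs without Docker installed
command = build_run_command(MODEL, "what are the top three consumer GPUs in the market today ?")
print(shlex.join(command))
```

Calling ask_once(MODEL, "...") on a host with the model pulled would execute the command and capture the reply for use in a script.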

To list all the running Docker Models, execute the following command in a new terminal window:


$ docker model ps


The following would be a typical output:


Output.10

MODEL NAME      BACKEND    MODE        LAST USED       
gemma3n:2B-F16  llama.cpp  completion  46 seconds ago

To list all the running Docker Containers, execute the following command in a new terminal window:


$ docker ps


The following would be a typical output:


Output.11

CONTAINER ID   IMAGE                        COMMAND               CREATED         STATUS         PORTS                                                     NAMES
bff97003bde0   docker/model-runner:latest   "/app/model-runner"   7 minutes ago   Up 7 minutes   127.0.0.1:12434->12434/tcp, 172.17.0.1:12434->12434/tcp   docker-model-runner

As can be inferred from Output.11 above, the Docker Model Runner (and hence the LLM it serves) is accessible via the network on localhost at port 12434.
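A quick way to confirm from a script that the port is accepting connections is a plain TCP connect. The following minimal Python sketch uses the address from Output.11; the function names are hypothetical, and a successful connect only shows that something is listening, not that a model is loaded.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def runner_reachable():
    """Check the address published in Output.11 (127.0.0.1:12434)."""
    return is_port_open("127.0.0.1", 12434)


print("reachable" if runner_reachable() else "not reachable")
```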

To display the detailed information about the AI models accessible via the network, execute the following command in a new terminal window:


$ curl -s http://localhost:12434/models | jq


The following would be a typical output:


Output.12

[
  {
    "id": "sha256:f45ebd23a7bfc435bfbdac2f39fc1e798644ba40c8dd4dadcad284fd99230d32",
    "tags": [
      "docker.io/ai/gemma3n:2B-F16"
    ],
    "created": 1751014610,
    "config": {
      "format": "gguf",
      "quantization": "F16",
      "parameters": "4.46 B",
      "architecture": "gemma3n",
      "size": "8.30 GiB",
      "gguf": {
        "gemma3n.activation_sparsity_scale": "1.644854, 1.644854, 1.644854, 1.644854, 1.644854, 1.644854, 1.644854, 1.644854, 1.644854, 1.644854, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf",
        "gemma3n.altup.active_idx": "0",
        "gemma3n.altup.num_inputs": "4",
        "gemma3n.attention.head_count": "8",
        "gemma3n.attention.head_count_kv": "2",
        "gemma3n.attention.key_length": "256",
        "gemma3n.attention.layer_norm_rms_epsilon": "0.000001",
        "gemma3n.attention.shared_kv_layers": "10.000000",
        "gemma3n.attention.sliding_window": "512",
        "gemma3n.attention.sliding_window_pattern": "true, true, true, true, false, true, true, true, true, false, true, true, true, true, false, true, true, true, true, false, true, true, true, true, false, true, true, true, true, false",
        "gemma3n.attention.value_length": "256",
        "gemma3n.block_count": "30",
        "gemma3n.context_length": "32768",
        "gemma3n.embedding_length": "2048",
        "gemma3n.embedding_length_per_layer_input": "256",
        "gemma3n.feed_forward_length": "8192",
        "gemma3n.rope.freq_base": "1000000.000000",
        "general.architecture": "gemma3n",
        "general.base_model.0.name": "Gemma 3n E4b It",
        "general.base_model.0.organization": "Google",
        "general.base_model.0.repo_url": "https://huggingface.co/google/gemma-3n-E4b-it",
        "general.base_model.count": "1",
        "general.basename": "gemma",
        "general.file_type": "1",
        "general.finetune": "3n-E2B-it",
        "general.license": "gemma",
        "general.name": "Gemma 3n E2B It",
        "general.quantization_version": "2",
        "general.size_label": "4.5B",
        "general.tags": "automatic-speech-recognition, automatic-speech-translation, audio-text-to-text, video-text-to-text, image-text-to-text",
        "general.type": "model",
        "tokenizer.chat_template": "{{ bos_token }}\n{%- if messages[0]['role'] == 'system' -%}\n    {%- if messages[0]['content'] is string -%}\n        {%- set first_user_prefix = messages[0]['content'] + '\n\n' -%}\n    {%- else -%}\n        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '\n\n' -%}\n    {%- endif -%}\n    {%- set loop_messages = messages[1:] -%}\n{%- else -%}\n    {%- set first_user_prefix = \"\" -%}\n    {%- set loop_messages = messages -%}\n{%- endif -%}\n{%- for message in loop_messages -%}\n    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}\n        {{ raise_exception(\"Conversation roles must alternate user/assistant/user/assistant/...\") }}\n    {%- endif -%}\n    {%- if (message['role'] == 'assistant') -%}\n        {%- set role = \"model\" -%}\n    {%- else -%}\n        {%- set role = message['role'] -%}\n    {%- endif -%}\n    {{ '' + role + '\n' + (first_user_prefix if loop.first else \"\") }}\n    {%- if message['content'] is string -%}\n        {{ message['content'] | trim }}\n    {%- elif message['content'] is iterable -%}\n        {%- for item in message['content'] -%}\n            {%- if item['type'] == 'audio' -%}\n                {{ '' }}\n            {%- elif item['type'] == 'image' -%}\n                {{ '' }}\n            {%- elif item['type'] == 'text' -%}\n                {{ item['text'] | trim }}\n            {%- endif -%}\n        {%- endfor -%}\n    {%- else -%}\n        {{ raise_exception(\"Invalid content type\") }}\n    {%- endif -%}\n    {{ '\n' }}\n{%- endfor -%}\n{%- if add_generation_prompt -%}\n    {{'model\n'}}\n{%- endif -%}",
        "tokenizer.ggml.add_bos_token": "true",
        "tokenizer.ggml.add_eos_token": "false",
        "tokenizer.ggml.add_sep_token": "false",
        "tokenizer.ggml.add_space_prefix": "false",
        "tokenizer.ggml.bos_token_id": "2",
        "tokenizer.ggml.eos_token_id": "1",
        "tokenizer.ggml.model": "llama",
        "tokenizer.ggml.padding_token_id": "0",
        "tokenizer.ggml.pre": "default",
        "tokenizer.ggml.unknown_token_id": "3"
      }
    }
  }
]
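The JSON in Output.12 is verbose, so a script would typically pick out only a few fields. The following minimal Python sketch does that with the standard library; the function names and the choice of fields are illustrative assumptions, and the sample data is a trimmed copy of Output.12 so that the sketch runs without a live Docker Model Runner.

```python
import json
import urllib.request

BASE_URL = "http://localhost:12434"  # address published in Output.11


def fetch_models(base_url=BASE_URL):
    """GET /models from a running Docker Model Runner and decode the JSON."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=10) as response:
        return json.load(response)


def summarize_models(models):
    """Reduce each model entry to a few headline fields."""
    summary = []
    for model in models:
        config = model.get("config", {})
        arch = config.get("architecture", "")
        gguf = config.get("gguf", {})
        summary.append({
            "tag": (model.get("tags") or ["<untagged>"])[0],
            "parameters": config.get("parameters"),
            "quantization": config.get("quantization"),
            "size": config.get("size"),
            # In Output.12 the GGUF keys are prefixed with the architecture name
            "context_length": gguf.get(f"{arch}.context_length"),
        })
    return summary


# Trimmed copy of Output.12
SAMPLE = [{
    "id": "sha256:f45ebd23a7bfc435bfbdac2f39fc1e798644ba40c8dd4dadcad284fd99230d32",
    "tags": ["docker.io/ai/gemma3n:2B-F16"],
    "created": 1751014610,
    "config": {
        "format": "gguf",
        "quantization": "F16",
        "parameters": "4.46 B",
        "architecture": "gemma3n",
        "size": "8.30 GiB",
        "gguf": {"gemma3n.context_length": "32768"},
    },
}]

for entry in summarize_models(SAMPLE):
    print(entry)
```

With the Docker Model Runner up, summarize_models(fetch_models()) would produce the same summary from the live endpoint.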

To display the AI models accessible for interaction through the OpenAI-compatible API, execute the following command in a new terminal window:


$ curl -s http://localhost:12434/engines/llama.cpp/v1/models | jq


The following would be a typical output:


Output.13

{
  "object": "list",
  "data": [
    {
      "id": "docker.io/ai/gemma3n:2B-F16",
      "object": "model",
      "created": 1751014610,
      "owned_by": "docker"
    }
  ]
}

To interact with the running AI model using the OpenAI-compatible API, execute the following command in a new terminal window:


$ curl -s http://localhost:12434/engines/llama.cpp/v1/chat/completions -H 'Content-Type: application/json' \
    -d '{ "model": "ai/gemma3n:2B-F16", "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "List the top 3 consumer CPUs as a bullet-list item" } ] }' | jq


The following would be a typical output:


Output.14

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Okay, here are the top 3 consumer CPUs as of late 2024, based on performance, value, and popularity.  It's worth noting that \"top\" can vary slightly depending on the specific use case (gaming, productivity, etc.), but these are consistently highly regarded:\n\n*   **Intel Core i7-14700K:** This is often considered the top-performing consumer CPU overall. It offers excellent gaming performance and strong productivity capabilities. It's a powerful choice for enthusiasts and gamers.\n\n*   **AMD Ryzen 7 7700X:**  A strong competitor to the i7-14700K, the Ryzen 7 7700X delivers excellent performance in both gaming and content creation. It's known for its efficiency and great value.\n\n*   **Intel Core i5-14600K:** A great balance of price and performance, the i5-14600K is an excellent choice for gamers and everyday users. It provides a significant performance boost over previous generations and is a popular option for those building a mid-range system.\n\n\n\n**Important Considerations:**\n\n*   **Price:** Prices fluctuate, so check current pricing from retailers.\n*   **Motherboard Compatibility:**  Make sure the CPU you choose is compatible with your motherboard (e.g., Intel vs. AMD, socket type).\n*   **Cooling:**  High-performance CPUs like these require good cooling solutions (air coolers or liquid coolers).\n\n\n\nI hope this helps!\n\n\n\n**Disclaimer:** *CPU performance can vary depending on the specific workload and system configuration.*\n\n\n\n"
      }
    }
  ],
  "created": 1769985106,
  "model": "model.gguf",
  "system_fingerprint": "b1-34ce48d",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 344,
    "prompt_tokens": 29,
    "total_tokens": 373
  },
  "id": "chatcmpl-oHvJiiA8Uvni9qxIgbNvL3MQKu4uKmec",
  "timings": {
    "cache_n": 4,
    "prompt_n": 25,
    "prompt_ms": 15.26,
    "prompt_per_token_ms": 0.6103999999999999,
    "prompt_per_second": 1638.2699868938403,
    "predicted_n": 344,
    "predicted_ms": 7206.789,
    "predicted_per_token_ms": 20.949968023255813,
    "predicted_per_second": 47.73276975363092
  }
}
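The same request can be issued from application code. The following minimal Python sketch uses only the standard library and the endpoint from the curl command above; the function names are hypothetical. It also recomputes the generation rate from the timings block, which appears to be an addition by the llama.cpp backend and is not part of the standard OpenAI response, so it may be absent with other backends. The sample data is a trimmed copy of Output.14 with the reply text shortened.

```python
import json
import urllib.request

CHAT_URL = "http://localhost:12434/engines/llama.cpp/v1/chat/completions"


def build_chat_request(model, user_prompt, system_prompt="You are a helpful assistant."):
    """Build the POST request carrying an OpenAI-style chat payload."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(request, timeout=300):
    """Send the request to a running Docker Model Runner and decode the reply."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def extract_reply(completion):
    """Return the assistant text of the first choice."""
    return completion["choices"][0]["message"]["content"]


def generation_rate(timings):
    """Tokens generated per second: predicted_n / predicted_ms * 1000."""
    return timings["predicted_n"] / timings["predicted_ms"] * 1000.0


# Trimmed copy of Output.14 (reply text shortened for brevity)
SAMPLE = {
    "choices": [{"finish_reason": "stop", "index": 0,
                 "message": {"role": "assistant", "content": "I hope this helps!"}}],
    "usage": {"completion_tokens": 344, "prompt_tokens": 29, "total_tokens": 373},
    "timings": {"cache_n": 4, "prompt_n": 25, "predicted_n": 344, "predicted_ms": 7206.789},
}

request = build_chat_request("ai/gemma3n:2B-F16", "List the top 3 consumer CPUs as a bullet-list item")
print(request.get_method(), request.full_url)
print(extract_reply(SAMPLE))
print(f"{generation_rate(SAMPLE['timings']):.2f} tokens/second")
```

With the model pulled and the Docker Model Runner up, extract_reply(send_chat(request)) would return the actual reply. Note that in Output.14 the 29 prompt tokens split into 4 cached tokens (cache_n) and 25 newly processed tokens (prompt_n), and that total_tokens is the sum of the prompt and completion tokens.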

To delete the locally stored AI model ai/gemma3n:2B-F16 with the model ID f45ebd23a7bf, execute the following command in the terminal window:


$ docker model rm f45ebd23a7bf


The following would be a typical output:


Output.15

Untagged: docker.io/ai/gemma3n:2B-F16
Deleted: sha256:f45ebd23a7bfc435bfbdac2f39fc1e798644ba40c8dd4dadcad284fd99230d32

Once again, to list all the locally stored AI models, execute the following command in the terminal window:


$ docker model list


The following would be a typical output:


Output.16

MODEL NAME  PARAMETERS  QUANTIZATION  ARCHITECTURE  MODEL ID  CREATED  SIZE

With this, we conclude the hands-on demonstration on the Docker Model Runner.


References

Official Docker Model Runner Documentation



© PolarSPARC