| PolarSPARC |
Quick Bytes on Docker Model Runner
| Bhaskar S | *UPDATED* 02/01/2026 |
Overview
The Docker platform allows application developers to develop, build, deploy, and run applications in containers on a single host that are isolated from one another.
The Docker platform now brings a new and exciting extension called the Docker Model Runner, which allows one to manage, run, and deploy AI models locally using Docker.
Just like Docker enabled developers to pull and run container images, Docker Model Runner allows one to pull and run LLM or other AI models directly from Docker Hub or any OCI-compliant registry.
Installation
The installation is on an Ubuntu 24.04 LTS based Linux desktop.
To refresh the package index files on the Linux system, execute the following command in a terminal window:
$ sudo apt-get update
To ensure the prerequisite certificates and utilities are installed, execute the following command in a terminal window:
$ sudo apt-get install -y ca-certificates curl jq
To ensure valid Docker cryptographic keys are set up, execute the following commands in a terminal window:
$ sudo install -m 0755 -d /etc/apt/keyrings
$ sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
$ sudo chmod a+r /etc/apt/keyrings/docker.asc
To add the Docker package repository, execute the following command in a terminal window:
$ echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
Once again, to refresh the package index files on the Linux system, execute the following command in a terminal window:
$ sudo apt-get update
To install Docker on the system, execute the following command in a terminal window:
$ sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin docker-model-plugin
To add the logged-in user to the docker system group, execute the following commands in a terminal window:
$ sudo groupadd docker
$ sudo usermod -aG docker $(whoami)
To reboot the system, execute the following command in a terminal window:
$ sudo shutdown -r now
Once the system is up (after the reboot), execute the following command in a terminal window:
$ docker info
The following should be the typical trimmed output:
Client: Docker Engine - Community
Version: 29.2.0
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.31.1
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: 2.37.1+ds1-0ubuntu2~24.04.1
Path: /usr/libexec/docker/cli-plugins/docker-compose
model: Docker Model Runner (Docker Inc.)
Version: v1.0.9
Path: /usr/libexec/docker/cli-plugins/docker-model
.....[TRIM].....
With this, we complete the installation of the Docker platform with the Docker Model Runner on the Linux system.
Hands-on with Docker Model Runner
To display the version of the installed Docker Model Runner, execute the following command in the terminal window:
$ docker model version
The following would be a typical output:
Docker Model Runner version v1.0.9
Docker Engine Kind: Docker Engine
To display the sub-commands available for the Docker Model Runner, execute the following command in the terminal window:
$ docker model help
The following would be a typical output:
Usage:  docker model COMMAND

Docker Model Runner

Commands:
  bench              Benchmark a model's performance at different concurrency levels
  df                 Show Docker Model Runner disk usage
  inspect            Display detailed information on one model
  install-runner     Install Docker Model Runner (Docker Engine only)
  list               List the models pulled to your local environment
  logs               Fetch the Docker Model Runner logs
  package            Package a GGUF file, Safetensors directory, DDUF file, or existing model into a Docker model OCI artifact.
  ps                 List running models
  pull               Pull a model from Docker Hub or HuggingFace to your local environment
  purge              Remove all models
  push               Push a model to Docker Hub
  reinstall-runner   Reinstall Docker Model Runner (Docker Engine only)
  requests           Fetch requests+responses from Docker Model Runner
  restart-runner     Restart Docker Model Runner (Docker Engine only)
  rm                 Remove local models downloaded from Docker Hub
  run                Run a model and interact with it using a submitted prompt or chat mode
  search             Search for models on Docker Hub and HuggingFace
  start-runner       Start Docker Model Runner (Docker Engine only)
  status             Check if the Docker Model Runner is running
  stop-runner        Stop Docker Model Runner (Docker Engine only)
  tag                Tag a model
  uninstall-runner   Uninstall Docker Model Runner (Docker Engine only)
  unload             Unload running models
  version            Show the Docker Model Runner version

Run 'docker model COMMAND --help' for more information on a command.
To list all the AI models on the local host, execute the following command in the terminal window:
$ docker model list
If the local host does NOT have an NVIDIA GPU, the following would be a typical output:
latest: Pulling from docker/model-runner
f784408d7713: Pull complete
ed3b61b56cf4: Pull complete
027c5e81f27e: Pull complete
d70c5952b45f: Pull complete
e06a12754a26: Pull complete
a77e4ee68446: Pull complete
b9b9c5dd5c4b: Pull complete
1ff8feaee46b: Pull complete
ff8244fefab4: Pull complete
Digest: sha256:795efebeaf009009ccc9d4ee5671969fed8de46559062f6816ed53e0efe3b6bf
Status: Downloaded newer image for docker/model-runner:latest
Successfully pulled docker/model-runner:latest
Creating model storage volume docker-model-runner-models...
Starting model runner container docker-model-runner...
MODEL NAME  PARAMETERS  QUANTIZATION  ARCHITECTURE  MODEL ID  CREATED  CONTEXT SIZE
However, if the local host does HAVE an NVIDIA GPU, the following would be a typical output:
latest-cuda: Pulling from docker/model-runner
0622fac788ed: Already exists
2b2da1c48640: Already exists
b2276ea4fcfd: Already exists
2311d82dd6d8: Already exists
1bba15468fcc: Already exists
62d562df65b5: Pull complete
364bc44563a1: Pull complete
5444f6e0d4f1: Pull complete
c54176f7c0ef: Pull complete
b0581931af32: Pull complete
8e6b2bd1e51d: Pull complete
4f4fb700ef54: Pull complete
387e6d0853b2: Pull complete
6413e37a99fd: Pull complete
88cabecab9c9: Pull complete
d40a38ce3086: Pull complete
9023d5deab17: Pull complete
Digest: sha256:212efa8e3805f6c47a66be2078b8ed38d9a0e1d64d1a5803276a7a9e54c8933c
Status: Downloaded newer image for docker/model-runner:latest-cuda
Successfully pulled docker/model-runner:latest-cuda
Starting model runner container docker-model-runner...
MODEL NAME  PARAMETERS  QUANTIZATION  ARCHITECTURE  MODEL ID  CREATED  CONTEXT SIZE
As is evident from either of the outputs above, there are no AI models currently downloaded on the local host.
Browse the currently available AI models in the Official Docker Models Catalog.
For the hands-on demonstration, we will pull the recently released Gemma-3n 2B LLM model.
To fetch the ai/gemma3n:2B-F16 LLM model from the Docker Hub registry and store it on the local host, execute the following command in the terminal window:
$ docker model pull ai/gemma3n:2B-F16
The above command downloads close to 9 GB of model data and may take a few minutes depending on the network speed, so be a little patient!
The following would be a typical output:
c2b52d60a238: Pull complete [==================================================>] 8.344kB/8.344kB
96c32662a056: Pull complete [==================================================>] 8.919GB/8.919GB
Model pulled successfully
Once again, to list all the AI models on the local host, execute the following command in the terminal window:
$ docker model list
The following would be a typical output:
MODEL NAME      PARAMETERS  QUANTIZATION  ARCHITECTURE  MODEL ID      CREATED       CONTEXT SIZE
gemma3n:2B-F16  4.46 B      F16           gemma3n       f45ebd23a7bf  7 months ago  8.30 GiB
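The listed size follows directly from the parameter count and the quantization: an F16 model stores each weight in 16 bits, i.e. 2 bytes. A quick back-of-the-envelope check in Python (using the figures from the listing above):

```python
# Rough disk-size estimate for an F16 model: ~2 bytes per parameter.
params = 4.46e9          # parameter count reported by 'docker model list'
bytes_per_param = 2      # F16 quantization: 16 bits = 2 bytes per weight
size_gib = params * bytes_per_param / 2**30

print(f"{size_gib:.2f} GiB")  # close to the 8.30 GiB shown in the listing
```

The estimate lands within a fraction of a GiB of the reported size; the small difference is metadata and non-weight tensors in the GGUF file.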
To serve the just downloaded AI model for Gemma-3n 2B locally, execute the following command in the terminal window:
$ docker model run ai/gemma3n:2B-F16
The following would be a typical output:
> Send a message (/? for help)
To chat with the running AI model, execute the following user prompt and press the ENTER key:
> what are the top three consumer GPUs in the market today ?
The following would be a typical output:
Okay, here are the top three consumer GPUs as of late 2023/early 2024, considering performance, price, and availability. It's important to note that the "best" GPU depends on your specific needs (gaming resolution, desired frame rates, budget). This is a general ranking:
1. **NVIDIA GeForce RTX 4090:**
* **Performance:** The undisputed king. It offers the highest performance for gaming and professional workloads. It consistently delivers the highest frame rates at 4K resolution and excellent performance at 1440p and even 1080p. It also has excellent ray tracing and DLSS 3 capabilities.
* **Price:** Very expensive. It's the most costly consumer GPU available.
* **Availability:** Generally available, but can still have occasional supply constraints.
* **Ideal For:** Enthusiast gamers, content creators, professionals who need the absolute best performance for demanding tasks.
2. **NVIDIA GeForce RTX 4080 SUPER:**
* **Performance:** A significant step up from the 4080, the Super version offers a noticeable performance boost. It's an excellent choice for 4K gaming and delivers very strong performance at 1440p. Ray tracing and DLSS 3 are also great here.
* **Price:** More affordable than the 4090, but still a premium GPU.
* **Availability:** Generally good availability.
* **Ideal For:** High-end gamers who want a great 4K experience without breaking the bank, and content creators who need a powerful GPU.
3. **AMD Radeon RX 7900 XTX:**
* **Performance:** AMD's top-tier card, the RX 7900 XTX, is a strong contender, particularly at 4K resolution. It often trades blows with the RTX 4080 SUPER in many games, and can sometimes outperform it depending on the game and settings.
* **Price:** Generally priced competitively with the RTX 4080 SUPER, or slightly lower in some cases.
* **Availability:** Generally good availability.
* **Ideal For:** Enthusiast gamers looking for a strong alternative to NVIDIA, especially at 4K. It's also a good choice for high-end 1440p gaming.
**Important Considerations:**
* **DLSS vs. FSR:** NVIDIA's DLSS (Deep Learning Super Sampling) is generally considered superior in image quality and performance gains compared to AMD's FSR (FidelityFX Super Resolution). However, FSR has improved significantly and is now a viable option.
* **Ray Tracing:** NVIDIA's RTX cards have a more mature ray tracing implementation than AMD's RX 7000 series.
* **Power Consumption:** The RTX 4090 is the most power-hungry, requiring a robust power supply. The RX 7900 XTX is also power-hungry but generally less so than the 4090.
* **Price Fluctuations:** GPU prices can vary significantly depending on the retailer and market conditions.
**Where to find more detailed comparisons:**
* **TechPowerUp:** [https://www.techpowerup.com/gpu-specs/](https://www.techpowerup.com/gpu-specs/)
* **Tom's Hardware:** [https://www.tomshardware.com/gpu](https://www.tomshardware.com/gpu)
* **GamersNexus:** [https://www.gamersnexus.net/](https://www.gamersnexus.net/)
I hope this helps! Let me know if you have any other questions.
Hooray!!! We have successfully tested the Docker Model Runner running the Gemma-3n 2B LLM model.
To list all the running Docker Models, execute the following command in a new terminal window:
$ docker model ps
The following would be a typical output:
MODEL NAME      BACKEND    MODE        LAST USED
gemma3n:2B-F16  llama.cpp  completion  46 seconds ago
To list all the running Docker Containers, execute the following command in a new terminal window:
$ docker ps
The following would be a typical output:
CONTAINER ID   IMAGE                        COMMAND               CREATED         STATUS         PORTS                                                     NAMES
bff97003bde0   docker/model-runner:latest   "/app/model-runner"   7 minutes ago   Up 7 minutes   127.0.0.1:12434->12434/tcp, 172.17.0.1:12434->12434/tcp   docker-model-runner
As can be inferred from the output above, the LLM model is accessible over the network on localhost at port 12434.
To display the detailed information about the AI models accessible via the network, execute the following command in a new terminal window:
$ curl -s http://localhost:12434/models | jq
The following would be a typical output:
[
{
"id": "sha256:f45ebd23a7bfc435bfbdac2f39fc1e798644ba40c8dd4dadcad284fd99230d32",
"tags": [
"docker.io/ai/gemma3n:2B-F16"
],
"created": 1751014610,
"config": {
"format": "gguf",
"quantization": "F16",
"parameters": "4.46 B",
"architecture": "gemma3n",
"size": "8.30 GiB",
"gguf": {
"gemma3n.activation_sparsity_scale": "1.644854, 1.644854, 1.644854, 1.644854, 1.644854, 1.644854, 1.644854, 1.644854, 1.644854, 1.644854, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf",
"gemma3n.altup.active_idx": "0",
"gemma3n.altup.num_inputs": "4",
"gemma3n.attention.head_count": "8",
"gemma3n.attention.head_count_kv": "2",
"gemma3n.attention.key_length": "256",
"gemma3n.attention.layer_norm_rms_epsilon": "0.000001",
"gemma3n.attention.shared_kv_layers": "10.000000",
"gemma3n.attention.sliding_window": "512",
"gemma3n.attention.sliding_window_pattern": "true, true, true, true, false, true, true, true, true, false, true, true, true, true, false, true, true, true, true, false, true, true, true, true, false, true, true, true, true, false",
"gemma3n.attention.value_length": "256",
"gemma3n.block_count": "30",
"gemma3n.context_length": "32768",
"gemma3n.embedding_length": "2048",
"gemma3n.embedding_length_per_layer_input": "256",
"gemma3n.feed_forward_length": "8192",
"gemma3n.rope.freq_base": "1000000.000000",
"general.architecture": "gemma3n",
"general.base_model.0.name": "Gemma 3n E4b It",
"general.base_model.0.organization": "Google",
"general.base_model.0.repo_url": "https://huggingface.co/google/gemma-3n-E4b-it",
"general.base_model.count": "1",
"general.basename": "gemma",
"general.file_type": "1",
"general.finetune": "3n-E2B-it",
"general.license": "gemma",
"general.name": "Gemma 3n E2B It",
"general.quantization_version": "2",
"general.size_label": "4.5B",
"general.tags": "automatic-speech-recognition, automatic-speech-translation, audio-text-to-text, video-text-to-text, image-text-to-text",
"general.type": "model",
"tokenizer.chat_template": "{{ bos_token }}\n{%- if messages[0]['role'] == 'system' -%}\n {%- if messages[0]['content'] is string -%}\n {%- set first_user_prefix = messages[0]['content'] + '\n\n' -%}\n {%- else -%}\n {%- set first_user_prefix = messages[0]['content'][0]['text'] + '\n\n' -%}\n {%- endif -%}\n {%- set loop_messages = messages[1:] -%}\n{%- else -%}\n {%- set first_user_prefix = \"\" -%}\n {%- set loop_messages = messages -%}\n{%- endif -%}\n{%- for message in loop_messages -%}\n {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}\n {{ raise_exception(\"Conversation roles must alternate user/assistant/user/assistant/...\") }}\n {%- endif -%}\n {%- if (message['role'] == 'assistant') -%}\n {%- set role = \"model\" -%}\n {%- else -%}\n {%- set role = message['role'] -%}\n {%- endif -%}\n {{ '' + role + '\n' + (first_user_prefix if loop.first else \"\") }}\n {%- if message['content'] is string -%}\n {{ message['content'] | trim }}\n {%- elif message['content'] is iterable -%}\n {%- for item in message['content'] -%}\n {%- if item['type'] == 'audio' -%}\n {{ '' }}\n {%- elif item['type'] == 'image' -%}\n {{ '' }}\n {%- elif item['type'] == 'text' -%}\n {{ item['text'] | trim }}\n {%- endif -%}\n {%- endfor -%}\n {%- else -%}\n {{ raise_exception(\"Invalid content type\") }}\n {%- endif -%}\n {{ '\n' }}\n{%- endfor -%}\n{%- if add_generation_prompt -%}\n {{'model\n'}}\n{%- endif -%}",
"tokenizer.ggml.add_bos_token": "true",
"tokenizer.ggml.add_eos_token": "false",
"tokenizer.ggml.add_sep_token": "false",
"tokenizer.ggml.add_space_prefix": "false",
"tokenizer.ggml.bos_token_id": "2",
"tokenizer.ggml.eos_token_id": "1",
"tokenizer.ggml.model": "llama",
"tokenizer.ggml.padding_token_id": "0",
"tokenizer.ggml.pre": "default",
"tokenizer.ggml.unknown_token_id": "3"
}
}
}
]
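The JSON returned by the /models endpoint is easy to consume programmatically. The following is a minimal sketch that extracts the short model ID (the same 12-character form shown in the MODEL ID column of `docker model list`) along with the tag and size; the JSON below is a trimmed copy of the output above, whereas in practice one would fetch it from http://localhost:12434/models:

```python
import json

# Trimmed sample of the JSON returned by GET /models (copied from above);
# in practice, fetch it over HTTP with urllib.request.urlopen().
models_json = """
[
  {
    "id": "sha256:f45ebd23a7bfc435bfbdac2f39fc1e798644ba40c8dd4dadcad284fd99230d32",
    "tags": ["docker.io/ai/gemma3n:2B-F16"],
    "config": {"format": "gguf", "quantization": "F16",
               "parameters": "4.46 B", "architecture": "gemma3n",
               "size": "8.30 GiB"}
  }
]
"""

for model in json.loads(models_json):
    # The short form of the ID is the first 12 hex characters of the digest
    short_id = model["id"].removeprefix("sha256:")[:12]
    print(short_id, model["tags"][0], model["config"]["size"])
```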
To display the AI models accessible for interaction through the OpenAI API, execute the following command in a new terminal window:
$ curl -s http://localhost:12434/engines/llama.cpp/v1/models | jq
The following would be a typical output:
{
"object": "list",
"data": [
{
"id": "docker.io/ai/gemma3n:2B-F16",
"object": "model",
"created": 1751014610,
"owned_by": "docker"
}
]
}
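Since the endpoint follows the OpenAI list-models response shape, pulling the model IDs out of it is a one-liner. A small sketch, using the response above as a canned sample (in practice one would fetch it from the /engines/llama.cpp/v1/models URL):

```python
import json

# Canned response from GET /engines/llama.cpp/v1/models (copied from above)
response = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "docker.io/ai/gemma3n:2B-F16", "object": "model",
     "created": 1751014610, "owned_by": "docker"}
  ]
}
""")

# Each entry in "data" describes one servable model
model_ids = [entry["id"] for entry in response["data"]]
print(model_ids)
```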
To interact with the running AI model using the OpenAI API interface, execute the following command in a new terminal window:
$ curl -s http://localhost:12434/engines/llama.cpp/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "ai/gemma3n:2B-F16",
      "messages": [
        { "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": "List the top 3 consumer CPUs as a bullet-list item" }
      ]
    }' | jq
The following would be a typical output:
{
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"role": "assistant",
"content": "Okay, here are the top 3 consumer CPUs as of late 2024, based on performance, value, and popularity. It's worth noting that \"top\" can vary slightly depending on the specific use case (gaming, productivity, etc.), but these are consistently highly regarded:\n\n* **Intel Core i7-14700K:** This is often considered the top-performing consumer CPU overall. It offers excellent gaming performance and strong productivity capabilities. It's a powerful choice for enthusiasts and gamers.\n\n* **AMD Ryzen 7 7700X:** A strong competitor to the i7-14700K, the Ryzen 7 7700X delivers excellent performance in both gaming and content creation. It's known for its efficiency and great value.\n\n* **Intel Core i5-14600K:** A great balance of price and performance, the i5-14600K is an excellent choice for gamers and everyday users. It provides a significant performance boost over previous generations and is a popular option for those building a mid-range system.\n\n\n\n**Important Considerations:**\n\n* **Price:** Prices fluctuate, so check current pricing from retailers.\n* **Motherboard Compatibility:** Make sure the CPU you choose is compatible with your motherboard (e.g., Intel vs. AMD, socket type).\n* **Cooling:** High-performance CPUs like these require good cooling solutions (air coolers or liquid coolers).\n\n\n\nI hope this helps!\n\n\n\n**Disclaimer:** *CPU performance can vary depending on the specific workload and system configuration.*\n\n\n\n"
}
}
],
"created": 1769985106,
"model": "model.gguf",
"system_fingerprint": "b1-34ce48d",
"object": "chat.completion",
"usage": {
"completion_tokens": 344,
"prompt_tokens": 29,
"total_tokens": 373
},
"id": "chatcmpl-oHvJiiA8Uvni9qxIgbNvL3MQKu4uKmec",
"timings": {
"cache_n": 4,
"prompt_n": 25,
"prompt_ms": 15.26,
"prompt_per_token_ms": 0.6103999999999999,
"prompt_per_second": 1638.2699868938403,
"predicted_n": 344,
"predicted_ms": 7206.789,
"predicted_per_token_ms": 20.949968023255813,
"predicted_per_second": 47.73276975363092
}
}
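The request body and the response above follow the standard OpenAI chat-completions shape, so the same interaction is straightforward from code. The following sketch builds the payload, then extracts the assistant reply and the token throughput from a response dict shaped like the one above (the dict here is a stub carrying only the fields used below; in practice it would come from POSTing the payload to the endpoint):

```python
def chat_payload(model: str, system: str, user: str) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }

payload = chat_payload("ai/gemma3n:2B-F16",
                       "You are a helpful assistant.",
                       "List the top 3 consumer CPUs as a bullet-list item")

# Stub response with just the fields used below (values from the output above)
response = {
    "choices": [{"message": {"role": "assistant", "content": "..."}}],
    "usage": {"completion_tokens": 344, "prompt_tokens": 29, "total_tokens": 373},
    "timings": {"predicted_n": 344, "predicted_ms": 7206.789},
}

reply = response["choices"][0]["message"]["content"]

# Generation throughput: predicted tokens over predicted wall-clock time
timings = response["timings"]
tokens_per_sec = timings["predicted_n"] / timings["predicted_ms"] * 1000.0
print(f"{tokens_per_sec:.2f} tokens/sec")
```

The computed throughput matches the predicted_per_second value reported in the timings block of the output above.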
To delete the locally stored AI model ai/gemma3n:2B-F16 with the model ID f45ebd23a7bf, execute the following command in the terminal window:
$ docker model rm f45ebd23a7bf
The following would be a typical output:
Untagged: docker.io/ai/gemma3n:2B-F16
Deleted: sha256:f45ebd23a7bfc435bfbdac2f39fc1e798644ba40c8dd4dadcad284fd99230d32
Once again, to list all the locally stored AI models, execute the following command in the terminal window:
$ docker model list
The following would be a typical output:
MODEL NAME PARAMETERS QUANTIZATION ARCHITECTURE MODEL ID CREATED SIZE
With this, we conclude the hands-on demonstration of the Docker Model Runner.
References