| PolarSPARC |
Quick Bytes on Docker Model Runner
| Bhaskar S | *UPDATED* 02/01/2026 |
Overview
The Docker platform allows application developers to develop, build, deploy, and run applications in containers on a single host that are isolated from one another.
The Docker platform now brings a new and exciting extension called the Docker Model Runner, which allows one to manage, run, and deploy AI models locally using Docker.
Just like Docker enabled developers to pull and run container images, Docker Model Runner allows one to pull and run LLM or other AI models directly from Docker Hub or any OCI-compliant registry.
Installation
The installation is on an Ubuntu 24.04 LTS based Linux desktop.
To refresh the package index files on the Linux system, execute the following command in a terminal window:
$ sudo apt-get update
To ensure the prerequisite certificates and utilities are installed, execute the following command in a terminal window:
$ sudo apt-get install -y ca-certificates curl jq
To ensure valid Docker cryptographic keys are set up, execute the following commands in a terminal window:
$ sudo install -m 0755 -d /etc/apt/keyrings
$ sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
$ sudo chmod a+r /etc/apt/keyrings/docker.asc
To add the Docker package repository, execute the following command in a terminal window:
$ echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
Once again, to refresh the package index files on the Linux system, execute the following command in a terminal window:
$ sudo apt-get update
To install Docker on the system, execute the following command in a terminal window:
$ sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin docker-model-plugin
To add the logged-in user to the docker system group, execute the following commands in a terminal window:
$ sudo groupadd docker
$ sudo usermod -aG docker $(whoami)
To reboot the system, execute the following command in a terminal window:
$ sudo shutdown -r now
Once the system is up (after the reboot), execute the following command in a terminal window:
$ docker info
The following should be the typical trimmed output:
Client: Docker Engine - Community
Version: 29.2.0
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.31.1
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: 2.37.1+ds1-0ubuntu2~24.04.1
Path: /usr/libexec/docker/cli-plugins/docker-compose
model: Docker Model Runner (Docker Inc.)
Version: v1.0.9
Path: /usr/libexec/docker/cli-plugins/docker-model
.....[TRIM].....
With this, we complete the installation of the Docker platform with the Docker Model Runner on the Linux system.
Hands-on with Docker Model Runner
To display the version of the installed Docker Model Runner, execute the following command in the terminal window:
$ docker model version
The following would be a typical output:
Docker Model Runner version v1.0.9
Docker Engine Kind: Docker Engine
To display the sub-commands available for the Docker Model Runner, execute the following command in the terminal window:
$ docker model help
The following would be a typical output:
Usage:  docker model COMMAND

Docker Model Runner

Commands:
  bench              Benchmark a model's performance at different concurrency levels
  df                 Show Docker Model Runner disk usage
  inspect            Display detailed information on one model
  install-runner     Install Docker Model Runner (Docker Engine only)
  list               List the models pulled to your local environment
  logs               Fetch the Docker Model Runner logs
  package            Package a GGUF file, Safetensors directory, DDUF file, or existing model into a Docker model OCI artifact.
  ps                 List running models
  pull               Pull a model from Docker Hub or HuggingFace to your local environment
  purge              Remove all models
  push               Push a model to Docker Hub
  reinstall-runner   Reinstall Docker Model Runner (Docker Engine only)
  requests           Fetch requests+responses from Docker Model Runner
  restart-runner     Restart Docker Model Runner (Docker Engine only)
  rm                 Remove local models downloaded from Docker Hub
  run                Run a model and interact with it using a submitted prompt or chat mode
  search             Search for models on Docker Hub and HuggingFace
  start-runner       Start Docker Model Runner (Docker Engine only)
  status             Check if the Docker Model Runner is running
  stop-runner        Stop Docker Model Runner (Docker Engine only)
  tag                Tag a model
  uninstall-runner   Uninstall Docker Model Runner (Docker Engine only)
  unload             Unload running models
  version            Show the Docker Model Runner version

Run 'docker model COMMAND --help' for more information on a command.
To list all the AI models on the local host, execute the following command in the terminal window:
$ docker model list
If the local host does NOT have an NVIDIA GPU, the following would be a typical output:
latest: Pulling from docker/model-runner
f784408d7713: Pull complete
ed3b61b56cf4: Pull complete
027c5e81f27e: Pull complete
d70c5952b45f: Pull complete
e06a12754a26: Pull complete
a77e4ee68446: Pull complete
b9b9c5dd5c4b: Pull complete
1ff8feaee46b: Pull complete
ff8244fefab4: Pull complete
Digest: sha256:795efebeaf009009ccc9d4ee5671969fed8de46559062f6816ed53e0efe3b6bf
Status: Downloaded newer image for docker/model-runner:latest
Successfully pulled docker/model-runner:latest
Creating model storage volume docker-model-runner-models...
Starting model runner container docker-model-runner...
MODEL NAME  PARAMETERS  QUANTIZATION  ARCHITECTURE  MODEL ID  CREATED  CONTEXT SIZE
However, if the local host does HAVE an NVIDIA GPU, the following would be a typical output:
latest-cuda: Pulling from docker/model-runner
0622fac788ed: Already exists
2b2da1c48640: Already exists
b2276ea4fcfd: Already exists
2311d82dd6d8: Already exists
1bba15468fcc: Already exists
62d562df65b5: Pull complete
364bc44563a1: Pull complete
5444f6e0d4f1: Pull complete
c54176f7c0ef: Pull complete
b0581931af32: Pull complete
8e6b2bd1e51d: Pull complete
4f4fb700ef54: Pull complete
387e6d0853b2: Pull complete
6413e37a99fd: Pull complete
88cabecab9c9: Pull complete
d40a38ce3086: Pull complete
9023d5deab17: Pull complete
Digest: sha256:212efa8e3805f6c47a66be2078b8ed38d9a0e1d64d1a5803276a7a9e54c8933c
Status: Downloaded newer image for docker/model-runner:latest-cuda
Successfully pulled docker/model-runner:latest-cuda
Starting model runner container docker-model-runner...
MODEL NAME  PARAMETERS  QUANTIZATION  ARCHITECTURE  MODEL ID  CREATED  CONTEXT SIZE
As is evident from either of the outputs above, there are no AI models currently downloaded on the local host.
Browse the currently available AI models in the Official Docker Models Catalog.
For the hands-on demonstration, we will pull the recently released Gemma-3n 2B LLM model.
To fetch the ai/gemma3n:2B-F16 LLM model from the Docker Hub registry and store it on the local host, execute the following command in the terminal window:
$ docker model pull ai/gemma3n:2B-F16
The above command downloads close to 9 GB of model data and may take a few minutes depending on the network speed, so be a little patient!
The following would be a typical output:
c2b52d60a238: Pull complete [==================================================>] 8.344kB/8.344kB
96c32662a056: Pull complete [==================================================>] 8.919GB/8.919GB
Model pulled successfully
Once again, to list all the AI models on the local host, execute the following command in the terminal window:
$ docker model list
The following would be a typical output:
MODEL NAME      PARAMETERS  QUANTIZATION  ARCHITECTURE  MODEL ID      CREATED       CONTEXT SIZE
gemma3n:2B-F16  4.46 B      F16           gemma3n       f45ebd23a7bf  7 months ago  8.30 GiB
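The listed size follows directly from the parameter count and the quantization: an F16 model stores each weight in 16 bits, i.e. 2 bytes. A quick back-of-the-envelope check in Python (using the figures from the listing above):

```python
# Rough disk-size estimate for an F16 model: ~2 bytes per parameter.
params = 4.46e9          # parameter count reported by 'docker model list'
bytes_per_param = 2      # F16 quantization: 16 bits = 2 bytes per weight
size_gib = params * bytes_per_param / 2**30

print(f"{size_gib:.2f} GiB")  # close to the 8.30 GiB shown in the listing
```

The estimate lands within a fraction of a GiB of the reported size; the small difference is metadata and non-weight tensors in the GGUF file.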
To serve the just downloaded AI model for Gemma-3n 2B locally, execute the following command in the terminal window:
$ docker model run ai/gemma3n:2B-F16
The following would be a typical output:
> Send a message (/? for help)
To chat with the running AI model, execute the following user prompt and press the ENTER key:
> what are the top three consumer GPUs in the market today ?
The following would be a typical output:
Okay, here are the top three consumer GPUs as of late 2023/early 2024, considering performance, price, and availability. It's important to note that the "best" GPU depends on your specific needs (gaming resolution, desired frame rates, budget). This is a general ranking:
1. **NVIDIA GeForce RTX 4090:**
* **Performance:** The undisputed king. It offers the highest performance for gaming and professional workloads. It consistently delivers the highest frame rates at 4K resolution and excellent performance at 1440p and even 1080p. It also has excellent ray tracing and DLSS 3 capabilities.
* **Price:** Very expensive. It's the most costly consumer GPU available.
* **Availability:** Generally available, but can still have occasional supply constraints.
* **Ideal For:** Enthusiast gamers, content creators, professionals who need the absolute best performance for demanding tasks.
2. **NVIDIA GeForce RTX 4080 SUPER:**
* **Performance:** A significant step up from the 4080, the Super version offers a noticeable performance boost. It's an excellent choice for 4K gaming and delivers very strong performance at 1440p. Ray tracing and DLSS 3 are also great here.
* **Price:** More affordable than the 4090, but still a premium GPU.
* **Availability:** Generally good availability.
* **Ideal For:** High-end gamers who want a great 4K experience without breaking the bank, and content creators who need a powerful GPU.
3. **AMD Radeon RX 7900 XTX:**
* **Performance:** AMD's top-tier card, the RX 7900 XTX, is a strong contender, particularly at 4K resolution. It often trades blows with the RTX 4080 SUPER in many games, and can sometimes outperform it depending on the game and settings.
* **Price:** Generally priced competitively with the RTX 4080 SUPER, or slightly lower in some cases.
* **Availability:** Generally good availability.
* **Ideal For:** Enthusiast gamers looking for a strong alternative to NVIDIA, especially at 4K. It's also a good choice for high-end 1440p gaming.
**Important Considerations:**
* **DLSS vs. FSR:** NVIDIA's DLSS (Deep Learning Super Sampling) is generally considered superior in image quality and performance gains compared to AMD's FSR (FidelityFX Super Resolution). However, FSR has improved significantly and is now a viable option.
* **Ray Tracing:** NVIDIA's RTX cards have a more mature ray tracing implementation than AMD's RX 7000 series.
* **Power Consumption:** The RTX 4090 is the most power-hungry, requiring a robust power supply. The RX 7900 XTX is also power-hungry but generally less so than the 4090.
* **Price Fluctuations:** GPU prices can vary significantly depending on the retailer and market conditions.
**Where to find more detailed comparisons:**
* **TechPowerUp:** [https://www.techpowerup.com/gpu-specs/](https://www.techpowerup.com/gpu-specs/)
* **Tom's Hardware:** [https://www.tomshardware.com/gpu](https://www.tomshardware.com/gpu)
* **GamersNexus:** [https://www.gamersnexus.net/](https://www.gamersnexus.net/)
I hope this helps! Let me know if you have any other questions.
Hooray!!! We have successfully tested the Docker Model Runner running the Gemma-3n 2B LLM model.
To list all the running Docker Models, execute the following command in a new terminal window:
$ docker model ps
The following would be a typical output:
MODEL NAME      BACKEND    MODE        LAST USED
gemma3n:2B-F16  llama.cpp  completion  46 seconds ago
To list all the running Docker Containers, execute the following command in a new terminal window:
$ docker ps
The following would be a typical output:
CONTAINER ID   IMAGE                        COMMAND               CREATED         STATUS         PORTS                                                     NAMES
bff97003bde0   docker/model-runner:latest   "/app/model-runner"   7 minutes ago   Up 7 minutes   127.0.0.1:12434->12434/tcp, 172.17.0.1:12434->12434/tcp   docker-model-runner
As can be inferred from the output above, the LLM model is accessible over the network on localhost at port 12434.
To display the detailed information about the AI models accessible via the network, execute the following command in a new terminal window:
$ curl -s http://localhost:12434/models | jq
The following would be a typical output:
[
{
"id": "sha256:f45ebd23a7bfc435bfbdac2f39fc1e798644ba40c8dd4dadcad284fd99230d32",
"tags": [
"docker.io/ai/gemma3n:2B-F16"
],
"created": 1751014610,
"config": {
"format": "gguf",
"quantization": "F16",
"parameters": "4.46 B",
"architecture": "gemma3n",
"size": "8.30 GiB",
"gguf": {
"gemma3n.activation_sparsity_scale": "1.644854, 1.644854, 1.644854, 1.644854, 1.644854, 1.644854, 1.644854, 1.644854, 1.644854, 1.644854, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf",
"gemma3n.altup.active_idx": "0",
"gemma3n.altup.num_inputs": "4",
"gemma3n.attention.head_count": "8",
"gemma3n.attention.head_count_kv": "2",
"gemma3n.attention.key_length": "256",
"gemma3n.attention.layer_norm_rms_epsilon": "0.000001",
"gemma3n.attention.shared_kv_layers": "10.000000",
"gemma3n.attention.sliding_window": "512",
"gemma3n.attention.sliding_window_pattern": "true, true, true, true, false, true, true, true, true, false, true, true, true, true, false, true, true, true, true, false, true, true, true, true, false, true, true, true, true, false",
"gemma3n.attention.value_length": "256",
"gemma3n.block_count": "30",
"gemma3n.context_length": "32768",
"gemma3n.embedding_length": "2048",
"gemma3n.embedding_length_per_layer_input": "256",
"gemma3n.feed_forward_length": "8192",
"gemma3n.rope.freq_base": "1000000.000000",
"general.architecture": "gemma3n",
"general.base_model.0.name": "Gemma 3n E4b It",
"general.base_model.0.organization": "Google",
"general.base_model.0.repo_url": "https://huggingface.co/google/gemma-3n-E4b-it",
"general.base_model.count": "1",
"general.basename": "gemma",
"general.file_type": "1",
"general.finetune": "3n-E2B-it",
"general.license": "gemma",
"general.name": "Gemma 3n E2B It",
"general.quantization_version": "2",
"general.size_label": "4.5B",
"general.tags": "automatic-speech-recognition, automatic-speech-translation, audio-text-to-text, video-text-to-text, image-text-to-text",
"general.type": "model",
"tokenizer.chat_template": "{{ bos_token }}\n{%- if messages[0]['role'] == 'system' -%}\n {%- if messages[0]['content'] is string -%}\n {%- set first_user_prefix = messages[0]['content'] + '\n\n' -%}\n {%- else -%}\n {%- set first_user_prefix = messages[0]['content'][0]['text'] + '\n\n' -%}\n {%- endif -%}\n {%- set loop_messages = messages[1:] -%}\n{%- else -%}\n {%- set first_user_prefix = \"\" -%}\n {%- set loop_messages = messages -%}\n{%- endif -%}\n{%- for message in loop_messages -%}\n {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}\n {{ raise_exception(\"Conversation roles must alternate user/assistant/user/assistant/...\") }}\n {%- endif -%}\n {%- if (message['role'] == 'assistant') -%}\n {%- set role = \"model\" -%}\n {%- else -%}\n {%- set role = message['role'] -%}\n {%- endif -%}\n {{ '' + role + '\n' + (first_user_prefix if loop.first else \"\") }}\n {%- if message['content'] is string -%}\n {{ message['content'] | trim }}\n {%- elif message['content'] is iterable -%}\n {%- for item in message['content'] -%}\n {%- if item['type'] == 'audio' -%}\n {{ '' }}\n {%- elif item['type'] == 'image' -%}\n {{ '' }}\n {%- elif item['type'] == 'text' -%}\n {{ item['text'] | trim }}\n {%- endif -%}\n {%- endfor -%}\n {%- else -%}\n {{ raise_exception(\"Invalid content type\") }}\n {%- endif -%}\n {{ '\n' }}\n{%- endfor -%}\n{%- if add_generation_prompt -%}\n {{'model\n'}}\n{%- endif -%}",
"tokenizer.ggml.add_bos_token": "true",
"tokenizer.ggml.add_eos_token": "false",
"tokenizer.ggml.add_sep_token": "false",
"tokenizer.ggml.add_space_prefix": "false",
"tokenizer.ggml.bos_token_id": "2",
"tokenizer.ggml.eos_token_id": "1",
"tokenizer.ggml.model": "llama",
"tokenizer.ggml.padding_token_id": "0",
"tokenizer.ggml.pre": "default",
"tokenizer.ggml.unknown_token_id": "3"
}
}
}
]
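The JSON returned by the /models endpoint is easy to consume programmatically. The following is a minimal sketch that extracts the short model ID (the same 12-character form shown in the MODEL ID column of `docker model list`) along with the tag and size; the JSON below is a trimmed copy of the output above, whereas in practice one would fetch it from http://localhost:12434/models:

```python
import json

# Trimmed sample of the JSON returned by GET /models (copied from above);
# in practice, fetch it over HTTP with urllib.request.urlopen().
models_json = """
[
  {
    "id": "sha256:f45ebd23a7bfc435bfbdac2f39fc1e798644ba40c8dd4dadcad284fd99230d32",
    "tags": ["docker.io/ai/gemma3n:2B-F16"],
    "config": {"format": "gguf", "quantization": "F16",
               "parameters": "4.46 B", "architecture": "gemma3n",
               "size": "8.30 GiB"}
  }
]
"""

for model in json.loads(models_json):
    # The short form of the ID is the first 12 hex characters of the digest
    short_id = model["id"].removeprefix("sha256:")[:12]
    print(short_id, model["tags"][0], model["config"]["size"])
```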
To display the AI models accessible for interaction through the OpenAI API, execute the following command in a new terminal window:
$ curl -s http://localhost:12434/engines/llama.cpp/v1/models | jq
The following would be a typical output:
{
"object": "list",
"data": [
{
"id": "docker.io/ai/gemma3n:2B-F16",
"object": "model",
"created": 1751014610,
"owned_by": "docker"
}
]
}
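Since the endpoint follows the OpenAI list-models response shape, pulling the model IDs out of it is a one-liner. A small sketch, using the response above as a canned sample (in practice one would fetch it from the /engines/llama.cpp/v1/models URL):

```python
import json

# Canned response from GET /engines/llama.cpp/v1/models (copied from above)
response = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "docker.io/ai/gemma3n:2B-F16", "object": "model",
     "created": 1751014610, "owned_by": "docker"}
  ]
}
""")

# Each entry in "data" describes one servable model
model_ids = [entry["id"] for entry in response["data"]]
print(model_ids)
```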
To interact with the running AI model using the OpenAI API interface, execute the following command in a new terminal window:
$ curl -s http://localhost:12434/engines/llama.cpp/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "ai/gemma3n:2B-F16",
      "messages": [
        { "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": "List the top 3 consumer CPUs as a bullet-list item" }
      ]
    }' | jq
The following would be a typical output:
{
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"role": "assistant",
"content": "Okay, here are the top 3 consumer CPUs as of late 2024, based on performance, value, and popularity. It's worth noting that \"top\" can vary slightly depending on the specific use case (gaming, productivity, etc.), but these are consistently highly regarded:\n\n* **Intel Core i7-14700K:** This is often considered the top-performing consumer CPU overall. It offers excellent gaming performance and strong productivity capabilities. It's a powerful choice for enthusiasts and gamers.\n\n* **AMD Ryzen 7 7700X:** A strong competitor to the i7-14700K, the Ryzen 7 7700X delivers excellent performance in both gaming and content creation. It's known for its efficiency and great value.\n\n* **Intel Core i5-14600K:** A great balance of price and performance, the i5-14600K is an excellent choice for gamers and everyday users. It provides a significant performance boost over previous generations and is a popular option for those building a mid-range system.\n\n\n\n**Important Considerations:**\n\n* **Price:** Prices fluctuate, so check current pricing from retailers.\n* **Motherboard Compatibility:** Make sure the CPU you choose is compatible with your motherboard (e.g., Intel vs. AMD, socket type).\n* **Cooling:** High-performance CPUs like these require good cooling solutions (air coolers or liquid coolers).\n\n\n\nI hope this helps!\n\n\n\n**Disclaimer:** *CPU performance can vary depending on the specific workload and system configuration.*\n\n\n\n"
}
}
],
"created": 1769985106,
"model": "model.gguf",
"system_fingerprint": "b1-34ce48d",
"object": "chat.completion",
"usage": {
"completion_tokens": 344,
"prompt_tokens": 29,
"total_tokens": 373
},
"id": "chatcmpl-oHvJiiA8Uvni9qxIgbNvL3MQKu4uKmec",
"timings": {
"cache_n": 4,
"prompt_n": 25,
"prompt_ms": 15.26,
"prompt_per_token_ms": 0.6103999999999999,
"prompt_per_second": 1638.2699868938403,
"predicted_n": 344,
"predicted_ms": 7206.789,
"predicted_per_token_ms": 20.949968023255813,
"predicted_per_second": 47.73276975363092
}
}
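The request body and the response above follow the standard OpenAI chat-completions shape, so the same interaction is straightforward from code. The following sketch builds the payload, then extracts the assistant reply and the token throughput from a response dict shaped like the one above (the dict here is a stub carrying only the fields used below; in practice it would come from POSTing the payload to the endpoint):

```python
def chat_payload(model: str, system: str, user: str) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }

payload = chat_payload("ai/gemma3n:2B-F16",
                       "You are a helpful assistant.",
                       "List the top 3 consumer CPUs as a bullet-list item")

# Stub response with just the fields used below (values from the output above)
response = {
    "choices": [{"message": {"role": "assistant", "content": "..."}}],
    "usage": {"completion_tokens": 344, "prompt_tokens": 29, "total_tokens": 373},
    "timings": {"predicted_n": 344, "predicted_ms": 7206.789},
}

reply = response["choices"][0]["message"]["content"]

# Generation throughput: predicted tokens over predicted wall-clock time
timings = response["timings"]
tokens_per_sec = timings["predicted_n"] / timings["predicted_ms"] * 1000.0
print(f"{tokens_per_sec:.2f} tokens/sec")
```

The computed throughput matches the predicted_per_second value reported in the timings block of the output above.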
To delete the locally stored AI model ai/gemma3n:2B-F16 with the model ID f45ebd23a7bf, execute the following command in the terminal window:
$ docker model rm f45ebd23a7bf
The following would be a typical output:
Untagged: docker.io/ai/gemma3n:2B-F16
Deleted: sha256:f45ebd23a7bfc435bfbdac2f39fc1e798644ba40c8dd4dadcad284fd99230d32
Once again, to list all the locally stored AI models, execute the following command in the terminal window:
$ docker model list
The following would be a typical output:
MODEL NAME PARAMETERS QUANTIZATION ARCHITECTURE MODEL ID CREATED SIZE
With this, we conclude the hands-on demonstration of the Docker Model Runner.
References