| PolarSPARC |
OpenCode Decoded: The Essential Primer - Part 1
| Bhaskar S | 04/25/2026 |
Overview
OpenCode is an open source AI coding assistant that is an alternative to the wildly popular commercial offering - Claude Code !!!
In short, OpenCode is a natural language, conversational, provider agnostic, agentic coding tool that integrates with the user's terminal (command line interface) to assist developers with their tasks.
OpenCode can navigate and understand a codebase for any project, plan the architecture, and apply changes to the codebase.
While OpenCode excels at coding tasks, it can also help with anything one can do from the command line, such as writing docs, running commands, searching files, researching topics, and much more !!!
While Claude Code is locked to the LLM models from Anthropic, OpenCode is truly provider agnostic, allowing one to try LLM models from a plethora of providers including using locally running LLM model(s).
Installation and Setup
The installation and setup will be on a Ubuntu 24.04 LTS based Linux desktop.
To install OpenCode, execute the following command in a terminal window:
$ curl -fsSL https://opencode.ai/install | bash
At the time of this article, the following was the typical output:
Note that the OpenCode binary is installed in the directory $HOME/.opencode/bin.
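If the installer has not already added this directory to the shell PATH, one can add it manually by executing the following commands in a terminal window (a minimal sketch, assuming the default bash shell on Ubuntu):

$ echo 'export PATH="$HOME/.opencode/bin:$PATH"' >> $HOME/.bashrc

$ source $HOME/.bashrc

After this, the opencode command will be available from any new terminal session.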
We will be using the llama.cpp platform for local model serving. Ensure that Docker is installed and set up on the desktop (see INSTRUCTIONS).
We will create the required models directory by executing the following command in a terminal window:
$ mkdir -p $HOME/.llama_cpp/models
From the llama.cpp docker REPOSITORY, one can identify the current version of the docker image. At the time of this article, the latest docker image was tagged with the version b8925.
We require the docker image whose tag contains the word full. If the desktop has an Nvidia GPU, one can look for the docker image whose tag contains the words full-cuda.
To pull and download the full docker image for llama.cpp with CUDA support, execute the following command in a terminal window:
$ docker pull ghcr.io/ggml-org/llama.cpp:full-cuda-b8925
The following should be the typical output:
full-cuda-b8925: Pulling from ggml-org/llama.cpp
5a7813e071bf: Pull complete
a102f36d092c: Pull complete
05ec76e31584: Pull complete
398182656c47: Pull complete
73389fbd088f: Pull complete
cbb9175a9bc5: Pull complete
3d6ab8c799cd: Pull complete
7209097bfb98: Pull complete
545a3ada5b6b: Pull complete
78b86fd7e3b2: Pull complete
9cf4bad41205: Pull complete
a6678c064c57: Pull complete
4f4fb700ef54: Pull complete
6c56250a02bb: Pull complete
Digest: sha256:6854d27e47626172f239a518e5df52b3a16ac1c383dc2959d23b760f82ed09a9
Status: Downloaded newer image for ghcr.io/ggml-org/llama.cpp:full-cuda-b8925
ghcr.io/ggml-org/llama.cpp:full-cuda-b8925
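To verify the just pulled docker image, one can optionally execute the following standard docker command in a terminal window:

$ docker images ghcr.io/ggml-org/llama.cpp

The output should list the tag full-cuda-b8925 along with the image size.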
For the OpenCode demonstration, we will download and use the just released Qwen 3.6 LLM model from Huggingface - the bartowski/Qwen_Qwen3.6-35B-A3B-GGUF model.
Download the Qwen 3.6 35B A3B (4-bit) model to the directory $HOME/.llama_cpp/models.
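One way to fetch the GGUF model file is to download it directly from Huggingface using wget, as sketched below - the URL assumes the standard Huggingface resolve path and the model file name Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf referenced later in this article:

$ cd $HOME/.llama_cpp/models

$ wget https://huggingface.co/bartowski/Qwen_Qwen3.6-35B-A3B-GGUF/resolve/main/Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf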
To start the llama.cpp server for serving the Qwen 3.6 35B A3B (4-bit) model, execute the following command in the terminal window:
$ docker run --rm --name llama_cpp --gpus all --network host -v $HOME/.llama_cpp/models:/models ghcr.io/ggml-org/llama.cpp:full-cuda-b8925 --server --model /models/Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf --alias qwen3.6-a3b --host 192.168.1.25 --port 8000 --device CUDA0 --temp 1.0 --top_k 64 --top_p 0.95 --no-mmap --threads 4 --ctx-size 65536 --flash-attn on -ctk q4_0 -ctv q4_0
The following should be the typical trimmed output:
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 15944 MiB):
  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes, VRAM: 15944 MiB
load_backend: loaded CUDA backend from /app/libggml-cuda.so
load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
main: n_parallel is set to auto, using n_parallel = 4 and kv_unified = true
build_info: b8925-0adede866
system_info: n_threads = 4 (n_threads_batch = 4) / 16 | CUDA : ARCHS = 500,610,700,750,800,860,890,1200 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
Running without SSL
init: using 15 threads for HTTP server
start: binding port with default address family
main: loading model
srv load_model: loading model '/models/Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf'
...[TRIM]...
common_fit_params: successfully fit params to free device memory
common_fit_params: fitting params to free memory took 3.12 seconds
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4060 Ti) (0000:04:00.0) - 15336 MiB free
llama_model_loader: loaded meta data with 48 key-value pairs and 733 tensors from /models/Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf (version GGUF V3 (latest))
...[TRIM]...
llama_context: n_seq_max = 4
llama_context: n_ctx = 65536
llama_context: n_ctx_seq = 65536
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = enabled
llama_context: kv_unified = true
llama_context: freq_base = 10000000.0
llama_context: freq_scale = 1
llama_context: n_ctx_seq (65536) < n_ctx_train (262144) -- the full capacity of the model will not be utilized
llama_context: CUDA_Host output buffer size = 3.79 MiB
llama_kv_cache: CUDA0 KV buffer size = 360.00 MiB
llama_kv_cache: size = 360.00 MiB ( 65536 cells, 10 layers, 4/1 seqs), K (q4_0): 180.00 MiB, V (q4_0): 180.00 MiB
llama_kv_cache: attn_rot_k = 1, n_embd_head_k_all = 256
llama_kv_cache: attn_rot_v = 1, n_embd_head_k_all = 256
llama_memory_recurrent: CUDA0 RS buffer size = 251.25 MiB
llama_memory_recurrent: size = 251.25 MiB ( 4 cells, 40 layers, 4 seqs), R (f32): 11.25 MiB, S (f32): 240.00 MiB
...[TRIM]...
main: model loaded
main: server is listening on http://192.168.1.25:8000
main: starting the main loop...
srv update_slots: all slots are idle
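Before wiring OpenCode to this server, one can optionally verify the OpenAI compatible endpoint using curl in another terminal window (a minimal check, assuming the host and port from the docker command above):

$ curl http://192.168.1.25:8000/v1/models

The response should be a JSON document listing the model alias qwen3.6-a3b.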
We need to create a config directory for OpenCode by executing the following command in the terminal window:
$ mkdir -p $HOME/.config/opencode
Next, we will create a JSON config file called opencode.json in that directory with the following content:
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama",
      "options": {
        "baseURL": "http://192.168.1.25:11434/v1"
      },
      "models": {
        "gemma4:e4b": {
          "name": "Gemma 4 4B"
        }
      }
    },
    "llama-cpp": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llama-cpp",
      "options": {
        "baseURL": "http://192.168.1.25:8000/v1"
      },
      "models": {
        "qwen3.6-a3b": {
          "name": "Qwen 3.6 35B A3B"
        }
      }
    }
  }
}
Note that we have defined the configuration for two local model providers - one for Ollama and the other for llama-cpp !!!
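To confirm that the llama-cpp baseURL from the above config responds to OpenAI compatible requests, one can optionally send a minimal chat completion request using curl (a sketch, assuming the llama.cpp server from the earlier step is still running):

$ curl -s http://192.168.1.25:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "qwen3.6-a3b", "messages": [{"role": "user", "content": "Say hello"}]}'

A JSON response containing a choices array indicates the provider endpoint is ready for OpenCode.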
Hands-on with OpenCode
Create a trusted directory for OpenCode projects by executing the following commands in the terminal window:
$ mkdir -p $HOME/MyProjects/OpenCode
$ cd $HOME/MyProjects/OpenCode
To list all the available provider models in OpenCode, execute the following command in the terminal window:
$ opencode models
The following should be the typical output:
opencode/big-pickle
opencode/gpt-5-nano
opencode/minimax-m2.5-free
opencode/nemotron-3-super-free
llama-cpp/qwen3.6-a3b
ollama/gemma4:e4b
Launch OpenCode by executing the following command in the terminal window:
$ opencode
The user would be presented with the following conversational screen:
Notice that the default provider is OpenCode in the cloud with the LLM model Big Pickle.
OpenCode includes a set of pre-defined tasks in the form of / (slash) commands. The following table summarizes some of the slash commands:
| Slash Command | Description |
|---|---|
| /help | Display help information |
| /init | Initialize opencode within a project |
| /models | Allows one to choose a model |
| /new | Create a new session |
| /sessions | Switch between sessions (including past sessions) |
| /status | View the status |
| /themes | Select a theme |
| /exit | Exit opencode |
We will go ahead and type the command /models as shown below:
Press Enter and navigate to the desired model as shown below:
Press Enter and we will be back to the main conversation screen as shown below:
To test the setup, type a request prompt as shown below:
Press Enter after typing the user prompt and OpenCode will respond as shown below:
AWESOME - we have successfully tested OpenCode using a local model !
To undo the operation just performed, type the command /undo as shown below:
Press Enter and we are taken to the conversation screen with the appropriate message at the top left as shown below:
To exit the OpenCode CLI, type the /exit command as shown below:
Press Enter and the OpenCode CLI terminates, returning us to the system terminal.
With this, we conclude Part 1 of the OpenCode primer series !!!
References