PolarSPARC

Quick Tip :: Fix for Claude Code using Qwen 3.6


Bhaskar S 06/12/2026


As described in the article Claude Using Qwen 3.6, one can configure Claude Code to use Qwen 3.6 local model via llama.cpp !

However, if you recently upgraded Claude Code to the recent version, you will likely encounter the following error:


!!! ERROR !!!

* API Error: 400 Unable to generate parser for this template. Automatic parser generation failed: 
  ------------
  While executing CallExpression at line 85, column 32 in source:
  ...first %}            {{- raise_exception('System message must be at the beginnin...
                                             ^
  Error: Jinja Exception: System message must be at the beginning.

This is because Qwen 3.6 imposes a strict rule expecting to see the system messages before the human messages in the input request (aka prompt).


This quick tip is about providing a fix for this issue by following the following three steps:

Ensure that Python 3.x programming language is installed and setup on the desktop.

In addition, install the following necessary Python module by executing the following command:


$ pip install gguf


Assuming the Qwen 3.6 gguf model file Qwen3.6-35B-A3B-Q4_K_M.gguf is at $HOME/.llama_cpp/models.

Execute the following Python script to extract the prompt template as a Jinja file in the $HOME/Downloads directory:


#
# @Description: Create Qwen3 Jinja Template from GGUF
# @Author:      Bhaskar S
# @Blog:        https://polarsparc.github.io
# @Date:        12 June 2026
#

import gguf
import os

HOME = os.environ.get('HOME', '')

GGUF_FILE = HOME + '/.llama_cpp/models/Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf'
QWEN3_JINJA_FILE = HOME + '/Downloads/qwen-3.6.jinja'

def main():
  print(f'Reading GGUF file: {GGUF_FILE}')
  gguf_reader = gguf.GGUFReader(GGUF_FILE)
  for field in gguf_reader.fields.values():
    if "chat_template" in field.name:
      chat_template = field.parts[field.data[0]].tobytes().decode('utf-8')
      print(f'Ready to dump chat template to {QWEN3_JINJA_FILE}')
      with open(QWEN3_JINJA_FILE, 'w') as f:
        f.write(chat_template)
      print("Chat template dumped to Qwen3.jinja")
      break

if __name__ == "__main__":
  main()

In the Jinja chat template file, identify the following lines:


{%- if message.role == "system" %}
  {%- if not loop.first %}
    {{- raise_exception('System message must be at the beginning.')}}
  {%- endif %}

Modify the lines to look as follows:


{%- if message.role == "system" %}
  {# %- if not loop.first % #}
    {# {- raise_exception('System message must be at the beginning.')} #}
  {# %- endif % #}

Copy the modified Jinja chat template file to $HOME/.llama_cpp/models.

Serve the Qwen 3.6 model by specifiying the modified chat template file via llama.cpp by executing the following:


$ docker run --rm --name llama_cpp --gpus all --network host -v $HOME/.llama_cpp/models:/models ghcr.io/ggml-org/llama.cpp:full-cuda-b9544 --server --model /models/Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf --alias qwen3.6-a3b --host 192.168.1.25 --port 8000 --device CUDA0 --temp 1.0 --top_k 64 --top_p 0.95 --no-mmap --threads 4 --ctx-size 65536 --flash-attn on -ctk q4_0 -ctv q4_0 --chat-template-file /models/qwen-3.6.jinja


Claude Code will work without any exceptions now !!!


References

Claude Code Using Local Model



© PolarSPARC