| PolarSPARC |
Quick Tip :: Fix for Claude Code using Qwen 3.6
| Bhaskar S | 06/12/2026 |
As described in the article Claude Using Qwen 3.6, one can configure Claude Code to use Qwen 3.6 local model via llama.cpp !
However, if you recently upgraded Claude Code to the recent version, you will likely encounter the following error:
* API Error: 400 Unable to generate parser for this template. Automatic parser generation failed:
------------
While executing CallExpression at line 85, column 32 in source:
...first %} {{- raise_exception('System message must be at the beginnin...
^
Error: Jinja Exception: System message must be at the beginning.
This is because Qwen 3.6 imposes a strict rule expecting to see the system messages before the human messages in the input request (aka prompt).
This quick tip is about providing a fix for this issue by following the following three steps:
Extracting the built-in prompt template from the Qwen 3.6 gguf model file as a Jinja file
Modifying the extracted prompt template Jinja file to remove the strict rule
Using the modified prompt template Jinja file as the chat template with llama.cpp
Ensure that Python 3.x programming language is installed and setup on the desktop.
In addition, install the following necessary Python module by executing the following command:
$ pip install gguf
Assuming the Qwen 3.6 gguf model file Qwen3.6-35B-A3B-Q4_K_M.gguf is at $HOME/.llama_cpp/models.
Execute the following Python script to extract the prompt template as a Jinja file in the $HOME/Downloads directory:
#
# @Description: Create Qwen3 Jinja Template from GGUF
# @Author: Bhaskar S
# @Blog: https://polarsparc.github.io
# @Date: 12 June 2026
#
import gguf
import os
HOME = os.environ.get('HOME', '')
GGUF_FILE = HOME + '/.llama_cpp/models/Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf'
QWEN3_JINJA_FILE = HOME + '/Downloads/qwen-3.6.jinja'
def main():
print(f'Reading GGUF file: {GGUF_FILE}')
gguf_reader = gguf.GGUFReader(GGUF_FILE)
for field in gguf_reader.fields.values():
if "chat_template" in field.name:
chat_template = field.parts[field.data[0]].tobytes().decode('utf-8')
print(f'Ready to dump chat template to {QWEN3_JINJA_FILE}')
with open(QWEN3_JINJA_FILE, 'w') as f:
f.write(chat_template)
print("Chat template dumped to Qwen3.jinja")
break
if __name__ == "__main__":
main()
In the Jinja chat template file, identify the following lines:
{%- if message.role == "system" %}
{%- if not loop.first %}
{{- raise_exception('System message must be at the beginning.')}}
{%- endif %}
Modify the lines to look as follows:
{%- if message.role == "system" %}
{# %- if not loop.first % #}
{# {- raise_exception('System message must be at the beginning.')} #}
{# %- endif % #}
Copy the modified Jinja chat template file to $HOME/.llama_cpp/models.
Serve the Qwen 3.6 model by specifiying the modified chat template file via llama.cpp by executing the following:
$ docker run --rm --name llama_cpp --gpus all --network host -v $HOME/.llama_cpp/models:/models ghcr.io/ggml-org/llama.cpp:full-cuda-b9544 --server --model /models/Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf --alias qwen3.6-a3b --host 192.168.1.25 --port 8000 --device CUDA0 --temp 1.0 --top_k 64 --top_p 0.95 --no-mmap --threads 4 --ctx-size 65536 --flash-attn on -ctk q4_0 -ctv q4_0 --chat-template-file /models/qwen-3.6.jinja
Claude Code will work without any exceptions now !!!
References