Tshirtman

Well, it's a blog, you know the drill.

Local “copilot”-like development with Vim

Why?

Like many people, I’ve been curious (but hesitant) to jump on the trend of using LLMs for coding. One of my reluctances is that I didn’t want to depend on a 3rd-party service, paid or not, during my development. I know all things are by nature ephemeral, but I would like, if possible, my tools to stay in my control.

I’ve also not been too good at using a separate tool: stopping my workflow to ask a question to an LLM, giving it context, etc., only made sense when I was hitting a stumbling block, and in that case I should rather think and do research than ask an LLM for a magical solution (though sometimes it can help). I had the impression that the more useful case was for mundane things, when I know full well what to write, but an LLM can also pretty quickly see where this is going and complete the idea, saving me a lot of typing.

So I was more tempted to use local models than remote ones, and I wanted things to integrate with Vim (no, not Neovim; for reasons I won’t get into now, I’m sticking with the traditional one, at least for now), functioning as a completion engine.

How

After exploring a few solutions, here is what I found to work decently for me.

  • installing llama.cpp to run models
  • adding Plug 'ggml-org/llama.vim' to my ~/.vim/plugin/plug-list.vim
  • adding a llama-config.vim file (see below) in ~/.vim/plugin/ to set my preferences
  • adding a copillot script (see below) in ~/.local/bin/ to start the engine with my preferred parameters

llama-config.vim

" put before llama.vim loads
" let g:llama_config = { 'show_info': 0 }
highlight llama_hl_hint guifg=#f8732e ctermfg=209
highlight llama_hl_info guifg=#50fa7b ctermfg=119
let g:llama_config = {
    \ 'endpoint':         'http://127.0.0.1:8012/infill',
    \ 'api_key':          '',
    \ 'n_prefix':         512,
    \ 'n_suffix':         128,
    \ 'n_predict':        128,
    \ 't_max_prompt_ms':  500,
    \ 't_max_predict_ms': 500,
    \ 'show_info':        1,
    \ 'auto_fim':         v:true,
    \ 'max_line_suffix':  8,
    \ 'max_cache_keys':   250,
    \ 'ring_n_chunks':    16,
    \ 'ring_chunk_size':  64,
    \ 'ring_scope':       1024,
    \ 'ring_update_ms':   1000,
    \ }

copillot

#!/usr/bin/env sh

# pretty slow supposedly better?
# MODEL="Qwen/Qwen2.5-Coder-32B-Instruct-GGUF"
# also a bit slow
# MODEL="ggml-org/Qwen2.5-Coder-14B-Q8_0-GGUF"
# pretty fast!
MODEL="Qwen/Qwen2.5-Coder-3B-Instruct-GGUF"
# really fast!
# MODEL="Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF"

PORT=8012
BATCH_SIZE=2048
GPU_LAYERS=99
CTX_SIZE=0 # 0 = use model max context
CACHE_REUSE=256

llama-server \
    -hf $MODEL \
    --port $PORT \
    -ngl $GPU_LAYERS \
    -fa \
    -ub $BATCH_SIZE \
    -b $BATCH_SIZE \
    --ctx-size $CTX_SIZE \
    --cache-reuse $CACHE_REUSE

(need to run chmod +x ~/.local/bin/copillot)
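When the plugin stays silent, it is handy to check that the server actually came up before blaming Vim. A small sketch in Python (llama-server exposes a /health route; the port matches PORT in the script above):

```python
from urllib import request, error

def server_ready(url: str = "http://127.0.0.1:8012/health") -> bool:
    """Return True if llama-server answers on its health endpoint."""
    try:
        with request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (error.URLError, OSError):
        # connection refused, timeout, etc.: the server is not up yet
        return False

print(server_ready())
```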

How does it work together?

For now, I manually run copillot in a terminal when I need/want to, and mostly forget about it. Then I simply edit any file with Vim, and the plugin will use the shared port to get suggestions. When I type in insert mode, the model will generate one, and the plugin will use virtual text to display it. At this point, I can either:

  • keep typing, ignoring it
  • press Shift+Tab to complete only the current line
  • press Tab to insert the whole suggestion
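Under the hood, the plugin talks to llama-server’s /infill endpoint, sending the text before and after the cursor. A minimal sketch of such a request in Python (it only builds the request rather than sending it; the input_prefix/input_suffix/n_predict field names follow llama.cpp’s server API, and the endpoint is the one from llama-config.vim):

```python
import json
from urllib import request

def build_infill_request(prefix: str, suffix: str,
                         endpoint: str = "http://127.0.0.1:8012/infill"):
    """Build the kind of fill-in-the-middle request llama.vim sends."""
    payload = {
        "input_prefix": prefix,  # text before the cursor
        "input_suffix": suffix,  # text after the cursor
        "n_predict": 128,        # mirrors 'n_predict' in llama-config.vim
    }
    return request.Request(
        endpoint,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_infill_request("from click import ", "\n")
```

Sending it with urllib.request.urlopen(req) while copillot is running returns the model’s suggested infill as JSON.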

As I selected a small variant of the model, I trade accuracy for speed: the model is not going to suggest very smart things, but it’ll usually answer in much less than a second when I pause my typing. And since most of the code I type is not groundbreaking, it often sees where I’m going and can save me a few lines of typing (and the typos that come with them), even if I might need to edit them (after all, I’m using Vim, editing is what we are good at), and let’s not fool ourselves, I’d have to edit them anyway.

If I type from click import

my buffer immediately looks like this:

from click import |command, option, argument
from typing import List

@command()
@option('--name', default='World', help='Name to greet')
def greet(name: str) -> None:
    """Greet someone."""

While my cursor is still on the space after import, I can decide to accept this suggestion, which will give me the start of a quick hello world with click, neat! If I accept it, I’ll get the rest of it as a follow-up suggestion.

But of course, that’s a very simple demo. If I have more context, with multiple buffers, classes defined in them, etc., it can use them relatively smartly and infill my current line depending on what’s being done elsewhere in the file. It’s not very smart; I still need to type some code (or sometimes a comment) to indicate where I’m going, but I’m quite impressed by how much of the day-to-day stuff it can churn out.

There is a rule, though: when I get a completion, it should look like what I’m expecting. If not, I should be able to read and understand it (of course, one must understand the code they commit), and failing that, I should really look up the parts of it I don’t know, and see if they fit. The danger of “vibe coding” is that you get a lot of code you don’t understand and can’t debug, and that’s a terrible place to get your project to. It’s not really a new danger: copy/pasting code from somewhere and tinkering to make it work has been the practice of many coders for many years, and the cause of many regrets.

But sometimes, too, it teaches me a simpler way to do things than what I was about to do, and after checking that it really does work, I appreciate it just like I would if a coworker had shared it in a pairing session.

Let's do a bit more testing, shall we?

Can we put an image in there?

A sculpture of a baby elephant, sitting on the street, in London

A poster plastered on the wall, it's black and white, showing two hands holding a representation of the earth, engulfed in flames. Around them, a series of large lines, all pointing to the bottom center of the image, adds to the impression of urgency. At the bottom, the lettering "you can panic". Someone wrote on the continents, with a small marker, "climate change is not real". The poster is slightly scratched; someone might have tried to remove it.

Picture of a building, made of bricks, in London

These images are hosted on Google Photos, and integrated using markdown. The process to get a link to a Google Photos image takes a few steps, but is not too hard:

  • first you have to share a picture, or a series of pictures, with a public link
  • open that link, and select the (or a, if you shared multiple ones) picture from the gallery to view it
  • right-click it and copy the link to the image, or open the image in a new tab and copy the link from the URL bar there
  • type ![]() in your blog post
  • paste the link to the image inside the parentheses
  • type an image description inside the brackets
  • if you want the description to be visible when the mouse hovers over the image, you can add it again, after the URL, inside quotes
  • the end result will look like ![this is an image description](https://example.com/url/to/image.jpeg "this is the image title")
  • you are done!

Testing this nice piece of software.

Well, it's not Vim, I can tell you that, but it's still comfortable to write and read; certainly a good place to dump one's thoughts without too much complication.

I used to have a blog somewhere; should I try to unearth those old posts? Not sure. A couple of posts were interesting, I think, but half are probably mostly outdated anyway.

Anyway, one thing I'm not totally sure about with this software is the localisation situation: I don't see any way to set the language in the configuration, but it seems to display some parts of the UI in French (like dates), to please my browser's settings surely, while other elements seem to be in English.