Introducing Llama.vim
Local LLM-Assisted Text Completion
Llama.vim brings local LLM-assisted text completion to Vim: suggestions are generated by a model running on your own machine, with no cloud service required.
Features
- Auto-suggest activated by cursor movement in Insert mode
- Manual toggle for suggestions using Ctrl+F
- Accept suggestions with Tab
- Accept the first line of a suggestion with Shift+Tab
- Control over maximum text generation time
- Configurable scope around the cursor
- Ring context with data from open and edited files and yanked text
- Supports large contexts on low-end hardware through smart context reuse
- Displays performance stats
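Most of these features are tunable through the g:llama_config dictionary. The sketch below shows what such an override might look like; the key names are assumptions based on the plugin's documented options, so verify them with :help llama before copying.

```vim
" Hedged example: override a few llama.vim settings in your .vimrc.
" Key names are assumptions - confirm each one with :help llama.
let g:llama_config = {
    \ 'endpoint':         'http://127.0.0.1:8012/infill',
    \ 'n_prefix':         256,
    \ 'n_suffix':         64,
    \ 't_max_predict_ms': 1000,
    \ 'show_info':        2,
    \ }
```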
Installation
Plugin Setup
vim-plug: Add Plug 'ggml-org/llama.vim' to your plugin list in .vimrc.
Vundle:
cd ~/.vim/bundle
git clone https://github.com/ggml-org/llama.vim
Then, add Plugin 'llama.vim' to your .vimrc within the vundle#begin() section.
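For vim-plug, the resulting .vimrc fragment looks like this (a minimal sketch; fold it into your existing plug#begin() block if you have one):

```vim
" Minimal vim-plug setup for llama.vim.
call plug#begin()
Plug 'ggml-org/llama.vim'
call plug#end()
" Then run :PlugInstall inside Vim to fetch the plugin.
```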
llama.cpp Setup
The plugin requires a running llama.cpp server instance at g:llama_config.endpoint.
For macOS: brew install llama.cpp
For other systems: build from source or download the latest binaries from the llama.cpp releases page.
llama.cpp Settings
Recommended settings based on VRAM:
More than 16GB VRAM:
llama-server \
    -hf ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF \
    --port 8012 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 \
    --ctx-size 0 --cache-reuse 256
Less than 16GB VRAM:
llama-server \
    -hf ggml-org/Qwen2.5-Coder-3B-Q8_0-GGUF \
    --port 8012 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 \
    --ctx-size 0 --cache-reuse 256
Less than 8GB VRAM:
llama-server \
    -hf ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF \
    --port 8012 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 \
    --ctx-size 0 --cache-reuse 256
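The three tiers above follow a simple rule, sketched here as a hypothetical helper (for illustration only; the model names match the commands above, and the 16/8 GB cutoffs are taken from the recommendations as written):

```shell
# Hypothetical helper: pick the recommended model for a given amount of VRAM (GiB).
pick_model() {
  vram=$1
  if [ "$vram" -gt 16 ]; then
    echo "ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF"    # more than 16GB VRAM
  elif [ "$vram" -ge 8 ]; then
    echo "ggml-org/Qwen2.5-Coder-3B-Q8_0-GGUF"    # less than 16GB VRAM
  else
    echo "ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF"  # less than 8GB VRAM
  fi
}

pick_model 24   # prints ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF
```

The chosen value is what you would pass to llama-server via -hf, as in the commands above.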
For further assistance, refer to :help llama.
Recommended LLMs
The plugin supports FIM-compatible (fill-in-the-middle) models, such as those in the Hugging Face collection.
Examples of Usage
Running llama.vim on an M1 Pro (2021) with Qwen2.5-Coder 1.5B Q8_0, the plugin suggests text dynamically and displays performance metrics such as token counts and generation time. On an M2 Ultra with Qwen2.5-Coder 7B Q8_0, it maintains a global context across multiple files, demonstrating its efficiency in large codebases.
Implementation Details
Llama.vim is designed to be simple and lightweight, delivering high-quality local FIM completions even on consumer-grade hardware. For more technical detail, see the documentation on the initial implementation and on classic Vim support.