Introducing Llama.vim

Local LLM-Assisted Text Completion

llama.vim is a Vim plugin that brings local LLM-assisted text completion to the editor, generating fill-in-the-middle (FIM) suggestions as you type, with no data leaving your machine.

Features

  • Auto-suggest activated by cursor movement in Insert mode
  • Manual toggle for suggestions using Ctrl+F
  • Accept suggestions with Tab
  • Accept the first line of a suggestion with Shift+Tab
  • Control over maximum text generation time
  • Configurable scope around the cursor
  • Ring context with data from open and edited files and yanked text
  • Supports large contexts on low-end hardware through smart context reuse
  • Displays performance stats
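The "ring context" above can be pictured as a bounded buffer of recent text chunks gathered from opened or edited files and yanks, which is then attached to completion requests. The following is an illustrative sketch of that idea only (class and method names are invented for illustration, not taken from llama.vim's actual implementation):

```python
# Illustrative sketch of a "ring context": a bounded buffer of recent text
# chunks that could be sent along with a completion request as extra context.
# Names here are hypothetical; llama.vim's real implementation differs.
from collections import deque

class RingContext:
    def __init__(self, max_chunks=16):
        # Oldest chunks are evicted automatically once the ring is full.
        self.chunks = deque(maxlen=max_chunks)

    def add(self, text):
        # Skip empty and duplicate chunks to keep the ring useful.
        if text and text not in self.chunks:
            self.chunks.append(text)

    def as_prompt_extra(self):
        # Concatenate all chunks for inclusion in a completion request.
        return "\n".join(self.chunks)
```

Bounding the buffer keeps request sizes predictable, which matters on low-end hardware.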

Installation

Plugin Setup

vim-plug: Add Plug 'ggml-org/llama.vim' to your .vimrc.

Vundle:

cd ~/.vim/bundle
git clone https://github.com/ggml-org/llama.vim

Then, add Plugin 'llama.vim' to your .vimrc, between the vundle#begin() and vundle#end() lines.

llama.cpp Setup

The plugin requires a running llama.cpp server instance, reachable at the URL configured in g:llama_config.endpoint.
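The endpoint can be overridden from your .vimrc. A minimal sketch follows; the URL shown assumes the default --port 8012 used by the server commands below, and :help llama documents the full set of configuration keys:

```vim
" Minimal override sketch -- 'endpoint' is the key referenced above; the URL
" assumes the server is started with --port 8012 as in the commands below.
let g:llama_config = {
    \ 'endpoint': 'http://127.0.0.1:8012/infill',
    \ }
```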

For macOS: brew install llama.cpp

For other systems: Build from source or download the latest binaries from the llama.cpp releases page.

llama.cpp Settings

Recommended llama-server settings, depending on available VRAM (-ngl 99 offloads all model layers to the GPU, and --cache-reuse enables the KV-cache reuse behind the plugin's smart context handling):

More than 16GB VRAM:

llama-server \
    -hf ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF \
    --port 8012 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 \
    --ctx-size 0 --cache-reuse 256

Less than 16GB VRAM:

llama-server \
    -hf ggml-org/Qwen2.5-Coder-3B-Q8_0-GGUF \
    --port 8012 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 \
    --ctx-size 0 --cache-reuse 256

Less than 8GB VRAM:

llama-server \
    -hf ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF \
    --port 8012 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 \
    --ctx-size 0 --cache-reuse 256

For further assistance, refer to :help llama.
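Before starting Vim, it can be useful to confirm the server is actually reachable. A minimal sketch, assuming the llama.cpp server is listening on the --port 8012 chosen above and exposes a /health endpoint (verify both against your llama.cpp version):

```python
# Sanity-check that the llama.cpp server is reachable. Assumes the server
# from the commands above listens on port 8012 and exposes /health; verify
# against your llama.cpp version.
import json
import urllib.request

def health_url(host="127.0.0.1", port=8012):
    return f"http://{host}:{port}/health"

def server_status(url, timeout=2.0):
    """Return the reported status string, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("status")
    except OSError:
        return None  # connection refused or timed out
```

If this returns None, check that llama-server is running and that the port matches g:llama_config.endpoint.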

Recommended LLMs

The plugin works with models that support fill-in-the-middle (FIM) completion; see the curated Hugging Face collection for compatible options.
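Under the hood, a FIM request sends the text before and after the cursor separately, so the model can fill the gap between them. A hedged sketch of what such a request body might look like; the field names (input_prefix, input_suffix, n_predict) are assumptions about the llama.cpp /infill API and should be verified against your server version:

```python
# Sketch of a fill-in-the-middle (FIM) request payload. Field names are
# assumptions about the llama.cpp /infill API -- verify against your version.
def fim_payload(prefix, suffix, n_predict=64):
    return {
        "input_prefix": prefix,   # text before the cursor
        "input_suffix": suffix,   # text after the cursor
        "n_predict": n_predict,   # cap on the number of generated tokens
    }

# Example: complete the body of a function, given its header and the text
# that follows the cursor.
payload = fim_payload("def add(a, b):\n    ", "\n\nprint(add(1, 2))")
```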

Examples of Usage

On an M1 Pro (2021) running Qwen2.5-Coder 1.5B Q8_0, llama.vim suggests text as you type and displays performance stats such as token counts and generation time. On an M2 Ultra with Qwen2.5-Coder 7B Q8_0, the plugin maintains a global context across multiple open files, demonstrating how it scales to large codebases.

Implementation Details

llama.vim is deliberately simple and lightweight, targeting high-quality local FIM completions even on consumer-grade hardware. Further technical details are available in the documentation on the initial implementation and on classic Vim support.

Other IDEs