
Autocomplete with your local LLM

April 9, 2025 | by codeyourassets


This article shows how I set up autocomplete in VSCode using a local LLM, with minimal hardware requirements. I find autocomplete boosts my productivity as I don’t have to spend energy on boilerplate code such as scaffolding HTML tag structures.

What we’re aiming for

Our local LLM in action

When working on new ideas, I prefer a workflow that lets me get to the new-idea part as soon as possible, without sacrificing too much energy on the repetitive boilerplate code that is often necessary to get things off the ground. This is where I find autocomplete functionality really shines.

However, I also prefer things private and cost effective. Let’s have a look at what we’ll be doing in order to have our own, private LLM and get that productivity boost going without leaking our code across the world wide web.

Contents

  • Hardware spec I’ll be using
  • Which models I currently run
  • Setting up ollama and running the models
  • Setting up VSCode to try out our private autocomplete
  • Summary

My hardware spec

I am using an older HP ZBook which has an Nvidia Quadro T1000 Mobile GPU acting as a support card for the integrated Intel graphics. The Nvidia GPU has a total of 4GB of VRAM, which makes this a modest setup hardware-wise.

Furthermore, my laptop has an SSD and 32GB of RAM, and I’m running NixOS for all of my needs.

Models I’m running as local LLM

We’ll be using two models: one for general chat and the other for autocomplete. Admittedly, I do not care too much for the chat side and may not be using the best model there. If you sit on better hardware than me, you might also want to experiment with one of the bigger models for code autocompletion.

In any case, I’m using the Llama 3.2 model for chat and Qwen 2.5 Coder for code autocompletion, as I’ve found these models to provide a good enough developer experience on my laptop. With that out of the way, let’s get started!

Setting up Ollama and running our local LLM

Although you can run these models using CPU and RAM only, I would recommend making sure your GPU drivers are enabled if you have a card such as an Nvidia one. Configuring Nvidia drivers is out of scope here, but it should be fairly easy to find a good guide on the internet if you need one.
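
As a quick sanity check before installing anything, you can verify that the Nvidia driver is actually loaded. This is just a generic check and not specific to Ollama or NixOS:

nvidia-smi
# should list your GPU (in my case the Quadro T1000) along with driver version and VRAM usage;
# if the command is missing or errors out, the driver isn't set up yet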

In either case, head over to Ollama’s official documentation to set it up when you are ready. Once everything is in place, let’s pull our chat model by executing the following command:

ollama pull llama3.2:latest

Depending on your bandwidth, it might take a bit of time to download the model. Once we’ve got our chat model, we can pull our model for autocomplete like this:

ollama pull qwen2.5-coder:latest
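
If you want to make sure both models are in place and that inference actually works before touching VSCode, a couple of optional commands work as a smoke test (the prompt below is just an example, use whatever you like):

ollama list
# should show llama3.2:latest and qwen2.5-coder:latest among the downloaded models

ollama run qwen2.5-coder:latest "Write a hello world in Python"
# loads the model, prints a completion and exits; the first run is slower while the model loads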

Finally, let’s put our models to action using VSCode!

Setting up VSCode with autocompletion using our local LLM

For my VSCode setup, I will be using the Continue extension to enable code autocompletion using our local LLMs. So start your VSCode, navigate to the Extensions tab and install Continue.
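
If you prefer the command line, the extension can also be installed through the VSCode CLI. I believe the marketplace ID is Continue.continue, but double-check it in the Extensions tab if the command complains:

code --install-extension Continue.continue
# installs the Continue extension; restart or reload VSCode afterwards if it doesn't show up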

Once installed, the extension should pop up in the VSCode side bar, so head over there and configure it like so:

Configuring our local LLM using Continue part 1
Configuring our local LLM using Continue part 2

Once there, open config.yaml and locate the models property, then paste in the following so that part looks like this:

models:
  - name: Llama3.2
    provider: ollama
    model: llama3.2:latest
  - name: Qwen2.5-Coder
    provider: ollama
    model: qwen2.5-coder:latest
    roles:
      - autocomplete

Save the config file along with any settings changes for the extension. Sometimes it might need a bit of time to activate the GPU, which again depends on your hardware, but start writing some code and verify that autocomplete starts presenting suggestions.
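
If suggestions are slow or don’t show up at all, it can be worth checking whether the model is actually running on the GPU. While autocomplete is active (or right after a suggestion), run:

ollama ps
# lists the currently loaded models; the processor column shows something like "100% GPU" or "100% CPU"
# if it reports CPU on a machine with an Nvidia card, revisit the driver setup from earlier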

Summary

If you’ve gotten this far, well done and thank you for following along!

You should now know how to set up your own environment for local LLMs using Ollama, as well as how to configure those models to boost your code writing in VSCode.

Autocomplete is a great way to keep your coding momentum up, as you’re not slowed down as much by writing daunting boilerplate code, and this way you can do it safely on your own premises.
