How to run an AI girlfriend locally: the complete self-hosted companion setup for total privacy
A start-to-finish guide to running a private, self-hosted AI companion on your own hardware, covering the model, the backend, the frontend, memory, and the realistic tradeoffs versus cloud platforms.
May 27, 2026 ·
Running an AI girlfriend locally means the entire experience lives on your machine. No subscription, no content filters, no company reading your conversations, no platform that can shut down and take your companion with it. For users who've watched platforms remove features overnight or worried about where their intimate conversations actually go, self-hosting is the answer to a question cloud platforms can't address: what if the only person with access to this is me?
The setup is more involved than downloading an app, but it's far more approachable in 2026 than it was even a year ago. Here's the complete picture, from hardware through to your first conversation, with honest notes on where the experience shines and where it falls short of the polished cloud alternatives.
What "locally" actually requires
A local AI companion has three components working together. The model is the AI itself, a file you download that contains the trained weights. The backend is the program that loads the model and runs inference, turning your messages into responses. The frontend is the interface you actually interact with, where you manage characters, conversations, and personality settings.
The most common 2026 stack pairs a GGUF-format model with KoboldCPP as the backend and SillyTavern as the frontend. This combination is the de facto standard for single-user creative and companion use because it's free, runs entirely offline, and supports the features that make a companion feel persistent rather than session-based.
You also need hardware capable of running the model. This is the gating factor. Unlike a cloud platform where the provider's servers do the work, local AI runs on your GPU, and your GPU's VRAM determines what's possible.
The hardware reality
VRAM is the constraint that matters. A model that fits entirely in your GPU's VRAM runs roughly 10x faster than one that spills into system RAM, which is the difference between a companion that responds in two seconds and one that takes thirty.
The entry point for a genuinely good experience is a GPU with 12GB of VRAM, like an RTX 3060, which handles 14B-parameter models at 4-bit quantization at around 25-30 tokens per second. That's fast enough to read comfortably. A 16GB card opens up more model options, and a 24GB card like an RTX 4090 or RX 7900 XTX lets you run larger, more capable models. Our complete GPU guide for local LLMs breaks down every tier, but the short version: 12GB is the practical minimum for a satisfying companion, 16GB is comfortable, and 24GB+ is where the experience starts rivaling cloud quality.
You'll also want at least 32GB of system RAM (64GB is better), and a modern CPU to handle the parts of inference the GPU doesn't. If you don't have a capable GPU, you can still run a local frontend connected to an affordable uncensored cloud model via OpenRouter while you decide whether a GPU investment is worth it, which gives you SillyTavern's interface and customization without the hardware requirement.
Step one: pick and download a model
Your model choice shapes the entire companion experience. For roleplay and companion use specifically, you want a model tuned for character consistency and natural dialogue rather than a general-purpose assistant model. Our guide to the best uncensored models in 2026 covers the field in depth, but for companion use the standout choices are Nous Hermes 3 (premier for roleplay, holds character over thousands of turns), L3-8B-Stheno-v3.2 (entry-tier friendly, Llama 3 based, RP-tuned), and MythoMax-L2-13B (the long-standing community classic).
Download the model in GGUF format from Hugging Face, choosing the Q4_K_M quantization unless you have a specific reason to go higher or lower. Q4_K_M is the gold standard that balances quality and VRAM efficiency. Match the model size to your VRAM: a 7B-8B model for 8-12GB cards, a 13B model for 12-16GB, and larger models as your VRAM allows.
Step two: set up the backend
KoboldCPP is the simplest backend to start with because it's a single executable with no installation. Download koboldcpp.exe from the official GitHub releases page, double-click to launch the GUI loader, point it at your downloaded GGUF model file, and start it. KoboldCPP has the broadest hardware compatibility of any local LLM server, running on everything from integrated GPUs to decade-old CPUs for smaller models.
Our full KoboldCPP setup guide covers the configuration options, GPU layer offloading, and context size settings in detail. For a first run, the defaults work. KoboldCPP loads the model, opens its built-in Kobold Lite interface, and exposes an OpenAI-compatible API that SillyTavern connects to. Keep the KoboldCPP window running in the background; it's the engine.
Step three: set up the frontend
SillyTavern is where the companion experience actually happens. Download it from the official SillyTavern GitHub repository, extract it to a folder, and run Start.bat on Windows or the shell script on Mac and Linux. SillyTavern opens in your browser at localhost:8000.
In SillyTavern's connection settings, select the KoboldCPP API type and point it at the local address KoboldCPP is serving (typically localhost:5001). Once connected, SillyTavern routes your messages through KoboldCPP to your model and displays the responses in a polished chat interface that makes Character.AI and similar platforms look basic by comparison.
SillyTavern is the reason the local stack competes with cloud platforms on experience. It supports character cards (portable personality definitions), lorebooks (background information the AI references contextually), group chats, custom system prompts, and granular sampler controls. The customization depth exceeds what any cloud companion platform offers, because you control every parameter.
Step four: build your companion
Character cards are how you define your companion's personality, appearance, backstory, and behavior. You can import community-created cards from sites like the SillyTavern card repositories, or build your own. A well-constructed character card includes the personality description, speech patterns, example dialogue, and a scenario that frames the interaction.
This is where local self-hosting beats cloud platforms on depth. Where Grok Ani gives you five preset characters with no editing, and most cloud platforms limit how much you can shape personality, SillyTavern lets you define every detail and adjust it freely. The character you build is exactly the character you get, and you can refine it indefinitely.
Lorebooks add persistent world and relationship knowledge. Entries trigger when relevant keywords appear in conversation, injecting background information the model uses to stay consistent. For a companion, a lorebook might contain relationship history, shared experiences, preferences, and personality details that you want referenced naturally over time. This is the local stack's answer to the memory systems that define platforms like Nomi, and while it requires manual setup, it gives you direct control over what your companion remembers.
The memory question
Memory is where local setups require the most thought. Cloud platforms like Nomi run sophisticated proprietary memory systems that handle recall automatically. Local setups handle memory through a combination of context window size, lorebooks, and SillyTavern's built-in summarization features.
The context window is how much conversation the model can "see" at once. Modern models support large contexts (some up to millions of tokens), but larger contexts consume more VRAM. SillyTavern manages this with automatic summarization, condensing older conversation into summaries that preserve key information without consuming the full context budget. Combined with lorebooks for permanent facts, this produces a companion that maintains continuity across long relationships. Our explainer on how AI companion memory works covers the mechanics in detail, and the concepts apply directly to tuning your local setup.
Adding voice, images, and more
A text companion is the starting point, but the local stack supports voice and images with additional setup. KoboldCPP includes built-in text-to-speech and image generation support, and SillyTavern integrates with both. For voice, SillyTavern connects to TTS engines that speak your companion's responses, and to speech-to-text for talking rather than typing. The quality ranges from basic to genuinely natural depending on the TTS model you choose.
For images, the standard approach pairs SillyTavern with a local Stable Diffusion setup (via tools like AUTOMATIC1111 or ComfyUI) so your companion can generate images on demand using the same private, local infrastructure. This requires more VRAM, since you're running an image model alongside your language model, so it's most practical on higher-VRAM cards. The payoff is companion-generated images that never touch a cloud service, with no content restrictions and no per-image coin costs like the cloud platforms charge.
These additions turn the local stack from a text companion into a full multimodal experience that rivals what cloud platforms like Dream Companion or Candy AI offer, with the difference that everything runs on hardware you own. The tradeoff is setup complexity and VRAM requirements, but for users committed to the local approach, the capability is there.
Keeping your setup running
Both SillyTavern and KoboldCPP update frequently, with new features, model support, and bug fixes shipping regularly. Bookmark their GitHub pages and update periodically to stay current. SillyTavern in particular adds capabilities often, and the community produces extensions that extend it further.
Back up your character cards, lorebooks, and chat history. These live in SillyTavern's data folder, and they represent the relationship you've built. Unlike a cloud platform where the company holds your data (and can lose it or shut down), your local data is yours to back up and preserve. A periodic copy of your SillyTavern data folder protects against the one risk local setups still face: your own hardware failing.
The honest tradeoffs
Local self-hosting wins decisively on privacy, control, cost over time, and freedom from platform risk. Nothing leaves your machine. No content restrictions. No monthly fees after the hardware investment. No company that can change policies, raise prices, or shut down.
It loses on convenience and out-of-box polish. The setup takes an hour or two for a first-timer. A local 13B model, however good, won't match the conversational sophistication of a cloud platform running a much larger model with a custom memory architecture. Image generation, voice, and video require additional setup and additional models rather than coming built in. And you're responsible for your own troubleshooting when something breaks.
The decision comes down to what you value. If privacy and control are paramount, or if you've been burned by platform changes and want a companion nothing can take away, local self-hosting is genuinely the best option available in 2026, and the tooling has matured to the point where it's achievable for any reasonably technical user. If you want the smoothest possible experience with zero technical overhead and don't mind the tradeoffs of cloud platforms, the best cloud companion platforms deliver more polish for less effort.
For users who land on local, the path is clear: a capable GPU, a roleplay-tuned model in Q4_K_M, KoboldCPP as the backend, SillyTavern as the frontend, and a well-built character card with a lorebook for memory. An afternoon of setup buys you a companion that's entirely yours, running on hardware you own, answerable to no one's content policy but your own.