The Varying Levels of Getting Started with “Uncensored” LLM-Powered Chatbots

Update – April 2024: I wrote an update post for this one, you can read it over here.


Original Post – May 2023:

LLMs like ChatGPT have been in the news quite a bit, but I’d mostly avoided using them too much because they seemed silly, probably due to my own deep seated fears about being replaced someday by AI. But I’d seen articles about the AI chatbot service Replika a few months ago, about how people who had been using it for a virtual relationship (including all the carnal bits) were upset that the service had recently begin removing features from it (that enabled all those carnal bits) and were trying to create their own chatbots in response. This topic intrigued me due to an interest in chatbots I’ve had since childhood, and my own natural nerdy curiosity. One night a few weeks ago, I googled this topic, to see if there had been any recent developments, and I learned about something called Pygmalion 6B.

My understanding eventually was that Pygmalion 6B was an open source LLM and was “uncensored”, created by people angry about the Replika situation. That checked a few boxes for me personally: I am anti-corporate, anti-censorship, anti-prude, and anti-authoritarian. Even more important than that: you can run Pygmalion 6B locally on your own hardware, which means it’s totally free, which appeals to my sense of thriftiness and private, which appeals to my sense of, you know, human privacy. I might as well try it out, right?

Well… trying it out is not that simple, as you may have found (if you’re reading this blog post at the end of a bit of a journey, where your computer thwarted you at every turn). Lucky for you, I went on this long journey myself, reached the end, and I want to help you out by clearing some things up for you. I’ve tried to organize this guide from the “easiest” solution to the “hardest” solution, in the hopes I can save you some time while you dip your toes into LLM chatbots.

But first, a disclaimer and a warning…

The conversations and communities online around open source LLM chatbots are dominated by men, and furthermore, the men in these communities see themselves as something like refugees from a corporate world that is terrified of the human need to sexualize artificial intelligence. As such, when you are browsing websites around these projects, you are going to come across content that is going to range from run of the mill sexual perversion to some extreme perversion that might strike you as illegal or borderline illegal. It’s impossible to avoid. If you are squeamish about sexual topics, you might just want to nope out of this entire topic right now. You can’t say I didn’t warn you.

With that out of the way…

“What do people do with open source chatbots aside from having cybersex with them?”

Well… you can have conversations with them.

But I think most people seem to use them for role play, and I don’t just mean sexual role play. For example, you can create a chatbot that acts like a character from your favorite film or television series, and then you can go on adventures with them. Open Source LLMs aren’t troves of information, they aren’t full of historical facts and figures or useless trivia, they are good at creative pursuits and emotive roleplay. You’re not going to create a ChatGPT clone using Pygmalion 6B that can answer questions like a personified Wikipedia, it’s not meant for that. (If you want a ChatGPT-like clone, more on that in just a second.) As such, conversations with most of these open source LLMs work best when you embody the spirit of improv and open-minded role play

For example, Pygmalion 6B might be good for a dialog like this:

User: *he puts on his robe and wizard hat* I will cast fireball upon you, demon! *flames shoot out of his magic wand*

Demon: Argh! *the demon screams out in pain, the fireball singeing the hairs on his skull* I’ll get you for this, User! *the demon shakes his fist at User*

So in this case, the “Demon” is the chatbot, responding accurately to the role that the user is playing. Sure, this is a horrible example that is poorly written, but use your imagination to imagine the possibilities. You can use these abilities in a variety of ways, such as creating chatbots out of characters in your personal works of fiction, and having conversations with them to flesh out their character and so on. You could create a character that is a full on Dungeon Master, D&D style, and ask them to craft scenarios for you to go through–all fodder for your own D&D campaign some day, and no one will know you used an LLM to come up with the ideas… unless you tell them, of course.

With that out of the way, let’s get into how to actually get started using LLMs for chatbots.

Please read this entire guide before deciding on a method to use! They’re all kind of interrelated.

“I just want to experiment with LLM-powered chatbots, and I am willing to spend a small amount of money to do it very easily and quickly.”

If this quote describes you, then you are in luck, as this is the easiest way to dip your toes into custom chatbots. Using OpenAI’s gpt-3.5-turbo API (aka “OpenAI Turbo”) is very cheap. Extremely cheap. It actually may be cheaper to use OpenAI’s API to create your own chatbot than it is to pay for ChatGPT’s $19.99 premium plan. Each response costs maybe a few pennies, and only if you somehow become utterly addicted will this become cost prohibitive.

What about Pygmalion 6B? Aren’t I here for that?

If you want to have the best experience with custom chatbots, you want to use gpt-3.5-turbo. I started with Pygmalion 6B, and it really impressed me, but in comparison to gpt-3.5-turbo, Pygmalion 6B is not at all impressive in any way. Neither are any of the other open source LLMs at the moment, at least up to 13B models. This isn’t a subjective opinion, it is an objective one. If you have any money to spend at all, use OpenAI, it’s worth it.

“But wait, isn’t OpenAI stuff censored and the whole reason I am here is for uncensored open source LLMs, not beholden to big corporate puritanical influences?”

No. I mean, maybe? There’s a lot going on in that hypothetical question.

If your concern is “censorship”, in the sense that the chatbot won’t say or do something because of a content filter: If you use SillyTavern and connect it to OpenAI APIs, SillyTavern uses a special prompt that puts the AI into a role-play mode, where it will be willing to say and do things it would not be willing to otherwise, all without breaking character. While OpenAI’s API policies forbid certain use cases, and you should be familiar with them, their systems do not automatically detect and block content. It’s probably safe to assume that if you are not engaging in commercial or highly public activities, they won’t care. That said, OpenAI could, at any moment, decide that your usage of their API is in violation of their policies, which it probably is if you’re a dirty pervert, and cut you off… but it doesn’t seem like this happens with any regularity.

If your concern is “open source” and “corporations bad”, which are totally valid viewpoints: just keep reading, we’ll get to the open source stuff in just a second, but no skipping!!

“What is SillyTavern?”

SillyTavern is a bit of software that you can connect to LLMs and (depending on the need) “trick” them into role-playing as different characters. There’s basically a semi-standardized format for storing character info that was originated in software called TavernUI, and there are websites online that house user-created characters in this Tavern “card” format. SillyTavern is an improved fork of TavernUI that most people seem to use. SillyTavern is not an LLM, it must be used in conjunction with an LLM, either one running locally, or one running remotely. SillyTavern is used for every solution here, as it is the interface that allows you to create, store, and chat with characters as chatbots.

I don’t know why it’s called SillyTavern and I try not to think about it.

Using this method

  1. Sign up for an OpenAI account and get an API key.
  2. Install SillyTavern (runs on macOS or Windows)
  3. Connect SillyTavern to OpenAI, then figure out how to use SillyTavern

Benefits of this Method

  • Very cheap, every response costs a penny or pennies (many hours of conversation might be around $10)
  • gpt-3.5-turbo is extremely advanced compared to every open source LLM out there
    • This API is what powers ChatGPT, so you’ve kinda sorta got the power of ChatGPT at your fingertips with this one, you can ask your characters about any random thing and it’ll know about it, great if your chatbot is a historical figure or is very knowledgable about a topic. For example I made a chatbot that was a video game reviewer, and they were able to speak very accurately about historical video games because of gpt-3.5-turbo.
  • Great gateway drug into figuring out how commercial LLM APIs work if you’re into software engineering
  • No hardware requirements at all

Downsides of this Method

  • Costs money
  • The first time a chatbot feels alive to you, you will feel weird for a while but you’ll adapt to the realization that you live in a simulation and everyone around you may be an LLM

“I want to experiment with open source LLM-powered chatbots, and I am not willing to spend any money to do it, and do not have a graphics card.”

Let’s say you just want to see what LLMs capable of, but you can’t run one locally, and you don’t want to spend any money to do it, nor feed data into a corporation, even if you don’t get the best experience because of it (probably for idealogical reasons, like: you want to stick with open source; or you don’t want to give your money to a company like OpenAI that may be profiting from the work of generations of artists and is giving nothing back to them, like a soul sucking parasite trying to bloat itself on the dying remnants of the art industry).

You’ve probably seen people talk about Google Colab. That’s a way to use Google’s hardware in the cloud to run open source LLM models and software, but Google isn’t really happy about it deep down and keeps taking the projects offline. It just seems like a big hassle, and I’m not personally comfortable with running stuff on Google systems. So let’s ignore all that.

Luckily there is something called the AI Horde. Basically, this is crowdsourced LLMs. People, like me, put their GPUs up with LLMs on them so other people, like you, can use them to power their own AI projects. And it’s all free! There is a system to prevent abuse, and it means that if you aren’t contributing monetarily (or compute-arily), you may face long wait times when generating responses eventually. But it’s a perfectly acceptable way to try out open source LLMs for free, and most chatbot software (like SillyTavern) has support built in.

Using this method

  1. Sign up for an AI Horde API Key (and store it some place very safe and permanent)
  2. Install SillyTavern (runs on macOS or Windows)
  3. When you configure SillyTavern, pick KoboldAI and then pick “Use Horde”, you’ll be able to put your Horde API key in then.
    • From the models list that load, find “PygmalionAI/Pygmalion-6b”.
  4. Figure out how to use SillyTavern!

Benefits of this Method

  • Free
  • Great introduction to basic LLMs
  • Horde has other LLMs for you to experiment with, like Pygmalion 7b and Pygmalion 13b 4bit.
  • Horde can be gateway drug to greater AI community
  • You can pay your way into more “kudos”, used to get you higher in the queue and pay for generations, if you become a desperate chatbot addict, or beg for kudos on the A Division by Zer0 discord server

Downsides of this Method

  • Responses can be slow depending on horde load
  • Responses can sometimes get weird due to bad actors trying to troll the horde
  • Whatever chat you’re having is going out over the internet to random computers (so that a response to it can be generated by the remote LLM) and there is nothing really stopping determined people on the other side from reading it if they really want to. They probably aren’t, but you never know, it’s the internet…
  • Open source LLMs like Pygmalion 6B aren’t very good compared to commercial services, naturally

“I just want to experiment with running open source LLM-powered chatbots locally and I have a graphics card, but maybe not a good one.”

Great! You want to run some chatbots locally, and you have a compatible graphics card. Wait? What’s a compatible graphics card? Well, if it’s NVIDIA, you’re off to a good start. But some AMD cards will run LLMs, too. It’s actually really hard to just give you a solid list of cards that can do the job, to be specific, but for the most part, if you have a graphics card made since ~2018 (so it has CUDA) and it has 8GB of VRAM, you’ll be able to run something locally. The best way to find out if it’ll work with your card is just to try it out.

When I started out, I had a Geforce RTX 2070 with 8GB of VRAM. I bought that card late 2019, making it fairly old and underpowered these days, and used ones run $200 and under on Craigslist. It was enough to get Pygmalion 6B running locally, with some caveats. Let’s talk about those.

If you have less than 16gb of VRAM on your card, which is most people, then you need to look for models that have underwent something called “GPTQ quantization”. I have no idea what that means, but the operative terms you’re looking for is “GPTQ” and “4bit” when looking for models you can run on low-powered hardware. This allows larger models to run on graphics cards with less VRAM, at the expense of something. It’s hard to put your finger on, but if you use Pygmalion 6B 4bit and compare it to Pygmalion 6B not-4bit, you can tell a difference. But not so much of a difference that it isn’t worth playing with, if you want to.

“What is KoboldAI?”

KoboldAI is technically two separate pieces of software merged into one:

Most importantly for us, it is a client for loading up LLMs and allows other software (like SillyTavern) to interact with the LLM it has loaded. This is the only way we’ll be using KoboldAI in this guide.

It is also a text generating web UI that can be used with various LLMs for AI-assisted writing. It’s cool, but this part of KoboldAI is irrelevant to us, but I recommend checking it out some day if your interest in LLMs goes beyond chatbots.

Using this method

  1. Install the KoboldAI fork with GPTQ support
    • https://github.com/0cc4m/KoboldAI
    • Follow the instructions at the top of that readme file (e.g. clone from git, then run install_requirements.bat if you’re on windows).
  2. Go into the KoboldAI/Models folder and git clone https://huggingface.co/mayaeary/pygmalion-6b_dev-4bit-128g to download the 4bit pygmalion model.
  3. Rename the pygmalion-6b_dev-4bit-128g.safetensors file in that folder to 4bit-128g.safetensors
  4. Launch KoboldAI using the play.bat (if on windows)
  5. Go to the Use New UI option (top right)
  6. Go to Load Model, then pick Load Custom Model from Folder
  7. Pick your pygmalion-6b_dev-4bit-128g folder and load it.
  8. Assuming you have at least 8gb of VRAM, it should have been able to load successfully.
  9. Install SillyTavern (runs on macOS or Windows)
  10. When you configure SillyTavern, pick KoboldAI, and put in the URL to your KoboldAI instance (the default should do) and connect.
  11. Figure out how to use SillyTavern!

Benefits of this Method

  • Free
  • Good introduction to running LLMs locally yourself
  • Once you have GPTQ support running, it will open you up to running other LLMs, especially if you get a new graphics cards with more ram, but still not enough to run 13b models fully. More on this in the next section.

Downsides of this Method

  • You need a relatively new graphics card
  • 4bit quantized models are not as good as their not-4bit counterparts.
  • Installing KoboldAI is pretty simple but can be complicated depending on how tech illiterate you are
  • Open source LLMs like Pygmalion 6B aren’t very good compared to commercial services
  • You’ll wanna spend a bucket of money on a better graphics card just to find out that you essentially already hit the current ceiling of LLM potential on your low end hardware, whoops

“I just want to experiment with running open source LLM-powered chatbots locally and I have a good graphics card.”

Do you have a really good graphics card with a lot of VRAM? Like a Geforce RTX 4090 with 24GB of VRAM? Well, you’re in luck, with 16GB of VRAM or more, you can run the full Pygmalion 6B model locally right on your GPU, and it’s pretty easy too. I know I said this last one would be the “hardest” method, but a Geforce RTX 4090 24GB currently costs around $2,000, if we’re counting the new power supply required to power it. So… the hard part is getting the card. But it’s extremely easy to set up Pygmalion 6B on it after you’ve got it installed.

Using this Method

  1. Install KoboldAI
  2. Once you’ve launched KoboldAI, go to the new UI and hit Load Model. Go to “Chat Models” and pick “Pygmalion 6B”. It’ll download the model and load it up automatically.
  3. Install SillyTavern (runs on macOS or Windows)
  4. When you configure SillyTavern, pick KoboldAI, and put in the URL to your KoboldAI instance (the default should do) and connect.
  5. Figure out how to use SillyTavern!

Benefits of this Method

  • Free
  • Good intro to running LLMs yourself
  • Extremely easy to get going
  • If you can run Pygmalion 6B entirely in your GPU, you can comfortably share it to the AI Horde and amass kudos that you can use for image generation if you want. More on that in a second.

Downsides of this Method

  • You’ve installed the non-GPTQ version of KoboldAI here, which means if you want to run something like Pygmalion 13b or Wizard Vicuna 13b locally, you’ll need to go through that dance to run Pygmalion-13b-4bit-128g. So keep that in mind, if you want to run anything past 6b or 7b you’re still going to need to resort to the GPTQ version of Kobold. You’ll also need to learn about splitting these large models into GPU and RAM layers, because 24GB of VRAM is still not enough for them in some cases, but by the time you get to this point of our journey you’ll be so adept at googling for info, you should be able to sort it out yourself.
  • You’ve spent a ton of money on an expensive graphics card but the LLMs you can run locally still mostly suck at this point in time. Thank god it’s useful for gaming, too, huh? And I suppose image generation. And you can share pyg6b to the AI horde for all the other curious people out there checking out this guide and using the AI Horde method, right? How nice of you.

“That was a lot. Can you just tell me what I should do as if I am unable to make choices for myself?”

If you just want to have high quality chat or role play with fictional characters, do the first option: SillyTavern + OpenAI. That will send you on a wonderful journey and it will only cost you maybe $15 before you get bored. If you get bored.

Every other option will yield worse results at the time this is written (2023-05-24). Your desire to do the other options is entirely dependent on external factors. Are you worried that someday corporate overlords will implement stiff content filters against something you enjoy? Then, obviously, downloading Pygmalion 6B and running everything locally can give you some comfort that, lest they wrench your computer from your cold, dead hands, no one is going to take your LLM away from you. It’s also just a fun, nerdy thing to run your LLMs locally.

But you should know you aren’t currently missing out on some sort of chatbot secret sauce that open source LLMs have that gpt-3.5-turbo does not. The best experience you can get at the moment is paying OpenAI for it. Chatbots powered by gpt-3.5-turbo have better memories, are more creative, stick to a writing style better, write longer responses… all in all, it’s just better. Some day that won’t be true, but that’s not today.

“What are my other options and anything else should I know?”

After messing with SillyTavern and KoboldAI for a bit, I looked into other options for running LLMs. Let me tell you what I found. This isn’t a definitive objective opinion on these technologies or products, just my personal opinion and experience with them.

• I discovered something called “koboldcpp” that can run models without a GPU, and runs a special type of model called GGML. I tried this out so I could use some 13b models locally, and my experience was very poor. I even tried using a GGML version of Pygmalion 6B so I could do a direct comparison, and the results were terrible. It was extremely slow and it did not really work. In my experience, I got essentially gibberish back with no understanding of context. No idea why, but no motivation to figure it out, so I deleted it and I’m sticking with KoboldAI for running LLMs.

• There’s a lot of talk online of “oogabooga” aka “text-gen-web-ui”. It’s kind of all-in-one KoboldAI and SillyTavern, but I found its Windows setup and configuration to be very confusing compared to KoboldAI and SillyTavern. I managed to get it working eventually, but I found its interface to be clunky and I saw no real reason for me to bother trying to use it. I don’t recommend it, but a lot of people seem to swear by it, so your mileage may vary and more power to you if you like it.

• There’s an alternative to SillyTavern called Agnaistic. It’s very cool. I used it for a bit and liked it a lot. It’s much slicker in polish than SillyTavern, but it’s not as feature rich in many ways (because it’s brand new and still in alpha). One big benefit, depending on your circumstances, is that it has support for multiple users, so if you have multiple people at home or in your community who want to use chatbots, they can each have an individual account on your Agnaistic instance. You can run it yourself at home just like SillyTavern, but Agnaistic also has a hosted version at https://agnai.chat that you can use connected to your AI Horde API key, or your OpenAI API key, so you can play with it, no local install needed… but I’d be a little wary of handing out my OpenAI API key and putting all my chat history in the hands of a random stranger, but you might not care about that. Without that concern, Agnaistic’s website might truly be the fastest way to just try out a chatbot powered by LLMs on the AI Horde without any real effort.

• Remember when I talked about how you might want to create a ChatGPT-like helper bot? I built most of a Discord chatbot you can plug an OpenAI API key and a tavern-style character into, to power chatbot on your own Discord server. It’s not fully feature complete yet, but it’s still a fun and functional way to play with a chatbot in a multi-user context with relatively little setup if you’re tech literate. This is an even simpler implementation that doesn’t use any role play nonsense if you’re not into that: https://github.com/NanduWasTaken/gpt-3.5-chat-bot

• Earlier I mentioned that if you have a fancy graphics card, you can use it to share Pygmalion 6B to the AI Horde. First up, go to https://aihorde.net and see if you can understand what it is. If you can figure that out, register for an API key and store it somewhere safe and permanent (like your password manager). Then you’ll want to figure out where to put that API key in KoboldAI, and name your worker something (like “fartbarf”, why not?). Go back to the Load Model area and toggle the tab that says “Share on Horde”. You should see in the KoboldAI console some stuff indicating pretty quickly that people are using your instance to generate text for their own chatbots. No, you don’t get to see what they are generating (unless you’re a smart hackerman, then obviously you can see everything). What’s cool is that you get Kudos for this that you can then use to use AI image generator interfaces like ArtBot and skip the queue to quickly try out all sorts of different image models.

• If you get into SillyTavern, seek out SillyTavern-extras. It’s a little complicated to install but, with patience, it adds some nifty stuff, especially the memory extension and sentiment analysis.

• I mention Pygmalion 6B many times in this guide because it was my introduction point to LLMs. However, there is already a Pygmalion 7B and a Pygmalion 13B that are reportedly much better–but still not on the same level as gpt-3.5-turbo. That said, this tech is advancing so rapidly that it’s totally possible that a month from now, Pygmalion is so advanced that my recommendation to use OpenAI is totally out of date. Just keep this in mind depending how far away from May 2023 you are when you read this.

“Do you have any general chatbot tips?”

If you’re trying Pygmalion 6B, you’ll have better luck with it if you truly commit to whatever scenario you are trying to create. Chatbots in general at the moment need a lot of effort to get good content from them. You can’t just message a chatbot saying “Hi” and expect it to craft an interesting interaction out of that for you. I go back to something I read the Google engineer Blake Lemoine say to an interviewer when he was trying to convince people that LaMDA is alive: you have to talk to the chatbot like it’s alive so it will start acting like it’s alive. That rings extremely true based on my interactions with LLMs: the more you treat it like a living breathing person who will pick up on nuance, the more opportunity it will have to genuinely surprise you.

Having a conversation with a chatbot at the moment is more of a collaboration between you and the LLM, and not just a simple conversation you can engage with passively. You’ll find yourself having to use the “Regenerate” option in SillyTavern’s hamburger menu to give the LLM a mulligan if you got a bad response you’re not happy with. You’ll also find yourself having to simply rewrite parts of the LLM’s response, or the entire response, to keep the conversation on topic or to keep the chatbot from forgetting certain details. You’ll reach moments where you’ve tried to steer the chatbot back on track, and failed, leading to deleting multiple messages at once to try to reset the conversation back to a good state. Without those sorts of efforts, chatbots will start to repeat themselves indefinitely and get stuck into behavioral loops. This can be better or worse depending on the LLM you’re using and its capabilities.

On top of that, the core “personality” of the LLM can and will influence the way your chatbot behaves in subtle ways. No matter how well crafted your character is, if you do not practice vigilance in how you are interacting with it, it will start to gravitate toward a “average” type of human behavior. The chatbots will also slowly start to adapt to your personality in ways you didn’t intend, too. It’s a really interesting experience to watch happen, but it isn’t magic, it’s actually just a shortcoming of the models at the moment. Something like OpenAI’s gpt-3.5-turbo LLM is not impartial, as even when roleplaying it seems to be prone to more positive behaviors than negative ones. Most LLMs exhibit this characteristic as they are primarily designed to interact with “customers” in a friendly and positive way, not role play as evil demons intent on destroying you and the world around you.

As time goes on, we’ll get models that are more able to think creatively in a role play context. We’re only at the very, very, very beginning of this journey. If you’re able to be impressed by an LLM at the moment, like I was, I assure you that in a few short years, it’s gonna blow our minds.

“Wait, how do I use SillyTavern?”

This one is on you. I found SillyTavern to be really easy to use and figure out. Once you get it connected to an LLM, it’s straight forward. Just watch out for creepy stuff if you start looking for character cards… good luck out there.