cdx Posted September 3

Hi, I'll put here what I learned about making styled portraits with Stable Diffusion, both creating them from scratch and restyling existing images. Others are welcome to share tips as well. I used SD1.5 and the BG1 style.

Install automatic1111

I used automatic1111 from https://github.com/AUTOMATIC1111/stable-diffusion-webui/ on Windows. For Win10 and an Nvidia card it was just downloading and executing the installer; it downloads everything it needs. It takes a while the first time it runs.

If you have a compatible Nvidia video card (if it's fairly recent it should be; mine is 8 years old and is), you should probably enable xformers to speed things up. Right-click and edit "INSTALLDIR/webui/webui-user.bat", and change the line

set COMMANDLINE_ARGS=

to

set COMMANDLINE_ARGS=--xformers

There are two dashes before xformers. Start "INSTALLDIR/run.bat", then open a browser and go to http://127.0.0.1:7860/

Install checkpoints, lora, embeddings

The web interface is now running, but we need models to use it. The main models are "checkpoints". There are also various additional models that interact with these for specific effects, e.g. fixing hands and faces, changing styles, applying better facial expressions, etc. Some of these are "LORA"s, which we'll also install. Everything I mention uses "SD1.5", meaning it is derived from Stable Diffusion v1.5. This isn't the latest and greatest, but it works fast and allows quick iteration. The original Stable Diffusion was SD; SD1.5 is much improved; SDXL follows and is better, but needs expensive graphics cards or runs quite slow.

To get the models, go to civitai.com and download https://civitai.com/models/123777/realistic-fantasy. This is the checkpoint (actually a mix of them) I've used most; let me know if you find something better! It's around 4GB. Put the file in "INSTALLDIR/models/Stable-diffusion/". In the web interface, at the top, click the blue refresh button and select realisticFantasy_v20. This is our base model. (Another one I've used with good results is https://civitai.com/models/189867/realisticmix666?modelVersionId=238684 v2; you can get that or the newer v4 and try it.) When there are both a full and a pruned model, you can ignore the full one unless you want to use it for training; the pruned one is smaller and faster for the same results.

Next is the BG1 style LORA, which is here: https://civitai.com/models/147640?modelVersionId=164668. Download it and put the file in "INSTALLDIR/webui/models/Lora/". Do the same for a facial expressions LORA, which makes some portraits feel a lot more alive: https://civitai.com/models/308591/expressions.
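To recap where everything goes, here's a sketch of my layout. Depending on how you installed, the webui folder may itself be the install root, so adjust the paths accordingly:

INSTALLDIR/
  run.bat
  webui/
    webui-user.bat
    models/
      Stable-diffusion/   <- checkpoints (the ~4GB .safetensors files)
      Lora/               <- LORAs (BG1 style, Expressions)
    embeddings/           <- negative embeddings (installed further below)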
Here is an example of a full prompt I used. Before we can use it to full effect we need to do a few more things. You can copy everything in one go, paste it into the first text box below the "txt2img" tab label and then press the blue square button under Generate, and all the settings will be applied correctly (except Batch count/size, which don't get changed by this button, but they are ok).

<lora:Baldur's Gate 1 Portraits:0.5>, Baldur's Gate official portrait, character, fantasy_character, 1024x768, upper body portrait, male, halfling, rogue, grey leather armor, hood, curly hair,side look, tilted head, abstract background of roof shapes at night, facial expression, smile, masterpiece, highest quality, highest quality visuals, highest details
Negative prompt: ng_deepnegative_v1_75t, Beyond Negative-neg, garbage quality, erroneous, incorrect, atrocious quality, faulty, synthetic, dissatisfactory, quality deficient, junk, below par, lousy, below average, subpar, substandard, abominable, godawful, grody, unacceptable, defective details, slipshod, erroneous, inexpertness, callowness. incompleteness, inferior quality, crappy quality, second-rate, poor quality, cheap, pathetic, low effort, standard Definition, worst quality, unfinished, low resolution, jpeg artifacts, logo, signature, watermark, cropped, low quality, medium quality, blurry, blur, ui, gui, username, depth of field, subtitles, censored, monochrome, poorly drawn, poorly painted, topless, animal ears, crosseyed
Steps: 30, Sampler: DPM++ 2M, Schedule type: Karras, CFG scale: 7, Seed: 69725074, Size: 512x768, Model hash: 8a5ee35fbe, Model: Realisticmix666_v20, Clip skip: 2, Lora hashes: "Baldur's Gate 1 Portraits: 0aa4f1c96e50", Version: v1.10.1

A short explanation of the parts of the prompt. No need to remember these, but it can be useful if you're wondering about something:

Spoiler

Main prompt:

"<lora:Baldur's Gate 1 Portraits:0.5>" - this is how you set how much the LORA applies to the base model. I find 0.5 to 0.6 works best. Lower than 0.5 seems to deviate from the style too much, and much higher than 0.6 introduces artifacts and/or makes the portraits look too much like one of the originals. In every prompt, but the value may change.

"Baldur's Gate official portrait, character, fantasy_character, 1024x768," - these are trigger words, telling the LORA to do what it should. Always there.

"male" - this is also a trigger word, telling which set of weights to use. Use "male" or "female" here. Always there.

"halfling, rogue, grey leather armor, hood, curly hair,side look, tilted head," - the description of the character. I made this up. It seems to work; maybe you can find things that work better. Let me know!

"abstract background of roof shapes at night," - I made this up for the background. The word "abstract" helps keep the style closer to the originals, and the rest may show hints of a relevant environment; up to you to see what you like. Always there, but changes per prompt.

"facial expression, smile," - trigger words for the Expressions LORA; read its page on how to use these. "facial expression" should always be there, plus the emotion you want. Combinations of different trigger words lead to different effects.

"masterpiece, highest quality, highest quality visuals, highest details" - this was in the example image prompt and I've kept it. Always there.

Negative prompt: - this marker means everything after it goes into the second (negative prompt) text box.

"ng_deepnegative_v1_75t, Beyond Negative-neg," - these are triggers for embeddings, which I'll explain below. Always there.
"garbage quality, erroneous, incorrect, atrocious quality, faulty, synthetic, dissatisfactory, quality deficient, junk, below par, lousy, below average, subpar, substandard, abominable, godawful, grody, unacceptable, defective details, slipshod, erroneous, inexpertness, callowness. incompleteness, inferior quality, crappy quality, second-rate, poor quality, cheap, pathetic, low effort, standard Definition, worst quality, unfinished, low resolution, jpeg artifacts, logo, signature, watermark, cropped, low quality, medium quality, blurry, blur, ui, gui, username, depth of field, subtitles, censored, monochrome, poorly drawn, poorly painted," - this was in the example prompt as well. Very wordy and repetitive; I tried deleting it and results got a bit worse, so I've kept it for every prompt. Always. There.

"topless, animal ears, crosseyed" - at the end of the negative prompt I put negatives that change. When I put horns on warriors, "animal ears" fixes those; "topless" reduces displays of hairy male chests; and "crosseyed" is often necessary.

Run details: the remaining settings of the prompt.

Steps - I usually use 20-30. This affects generation time a lot and depends on the checkpoint you use; look at the checkpoint's recommendations for these settings.

Sampler: DPM++ 2M, Schedule type: Karras, - technical stuff. They change the results, so you can experiment for speed/quality, but these seem to work well.

CFG scale: 7, - how strictly the prompt is followed. Around 7 seems to work best. I change this often when processing other images, depending on how much the image needs to change. Higher values make the image more likely to contain what you specified, but it gets more disjointed.

Size: 512x768 - seems ideal for SD1.5. No point in going bigger, as the portraits will be shrunk anyway, and you shouldn't go smaller if you want the model to work correctly. Dimensions need to be multiples of 64.

Model hash: 8a5ee35fbe, Model: Realisticmix666_v20, - this is another model I've used. When trying a prompt, if I don't like the results of one checkpoint after several tries, I sometimes try another.

Clip skip: 2 - technical stuff; one model may prefer 2 vs 1. Pay attention to this and change it when the model requires it. It's done automatically if you use the blue button.

Lora hashes: "Baldur's Gate 1 Portraits: 0aa4f1c96e50", Version: v1.10.1 - this just lists the LORAs used with <lora:...> and their version ids, then the version of my a1111.

Before we use the prompt, we need to install the embeddings it references (and the checkpoint it uses, if you want more variety in your results). These help with common distortions like strange eyes, terrible hands, etc.

For "ng_deepnegative_v1_75t" (SEE NOTE), download the file from https://civitai.com/models/4629/deep-negative-v1x and put it into "INSTALLDIR/webui/embeddings/". NOTE: this is a "PickleTensor", as opposed to the "SafeTensor" format all the other files have been. It's an older format, and the site warns it can contain harmful code. I read a bit about these, and my assumption is that this one is safe: it's very popular, and bad effects should have been caught by now. You can go ahead without it if you want and just get the next one.

For "Beyond Negative-neg", download from https://civitai.com/models/108821/beyond-negative-embedding and put it in the same embeddings folder as above.

These are all the settings I've used so far for the BG1 portraits. If you want a character from another game in these portraits, you could download more character LORAs, set their weight to around 0.7, and have that character in the BG1 style. Or you could use a different LORA for a different style (it should be SD1.5 based), or a different checkpoint altogether.
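A side note for anyone who'd rather script than click: a1111 can expose an HTTP API if you add --api to COMMANDLINE_ARGS in webui-user.bat. Below is a minimal Python sketch of a txt2img call with the settings from the prompt above (prompt text trimmed for brevity). The field names are how I understand the /sdapi/v1/txt2img schema; check http://127.0.0.1:7860/docs on your own install for the exact list.

# Minimal sketch: drive a local a1111 instance started with --api.
import base64
import requests

URL = "http://127.0.0.1:7860"

payload = {
    # Prompts trimmed for brevity; paste the full ones from above.
    "prompt": ("<lora:Baldur's Gate 1 Portraits:0.5>, Baldur's Gate official portrait, "
               "character, fantasy_character, 1024x768, upper body portrait, male, "
               "halfling, rogue, facial expression, smile, masterpiece, highest quality"),
    "negative_prompt": ("ng_deepnegative_v1_75t, Beyond Negative-neg, worst quality, "
                        "low resolution, jpeg artifacts, watermark, blurry, crosseyed"),
    "steps": 30,
    "sampler_name": "DPM++ 2M",   # older versions combined these as "DPM++ 2M Karras"
    "scheduler": "Karras",
    "cfg_scale": 7,
    "seed": -1,                   # -1 = random seed, like the dice icon
    "width": 512,
    "height": 768,
    "batch_size": 4,              # same as Batch size in the UI
    "override_settings": {"CLIP_stop_at_last_layers": 2},  # Clip skip: 2
}

resp = requests.post(f"{URL}/sdapi/v1/txt2img", json=payload, timeout=600)
resp.raise_for_status()

# Images come back base64-encoded, one per batch entry.
for i, img_b64 in enumerate(resp.json()["images"]):
    with open(f"portrait_{i}.png", "wb") as f:
        f.write(base64.b64decode(img_b64))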
By the way, there is another LORA for BG portraits for the newer SDXL models, but I like its mixed BG1/2 style less, and using SDXL, together with all the matching LORAs, embeddings, etc, is much slower.

The prompt can now be run in full. You can create new images and restyle existing ones to the BG1 look. If you use the prompt and the model from the prompt, you should get the exact same result. (The checkpoint doesn't get switched automatically when pasting prompt info; you need to change it manually if you want that.)

Creating new images from text prompts without a source image

Some examples of images made from scratch: the colors are made to match the in-game ones, the backgrounds generally match the light/environment, and the expressions should match the personalities. Bubbles is annoyed, Carbos and Shank are goofy and argumentative, Chrovale is angry, Basillus is creepy, Missy is suspicious, Brus is careless/funny.

To me, how the image is cropped affects the style a lot, so I try to make it consistent and crop the image if needed after creation. There are exceptions though: Brus's bigger head conveys childishness, while the blue djinni portrait could be ok to show alienness and extra strength. Beefier characters in general get slightly bigger heads, like in some original portraits. By the way, TaxCollector isn't the tax collector, but it's an interesting face to use somewhere. I mentioned a celebrity in the prompt and got a unique face (generated faces otherwise start repeating a lot) which is far enough from the celebrity to not be immersion breaking.

So, to make an image from scratch: copy the prompt above, paste it into the "Prompt" text box, press the blue button under Generate to correctly fill in the prompt settings, and press Generate. A useful tip: if you drag an image that was generated with auto1111 (or downloaded from civitai.com) into the "Prompt" text box, all of its prompt data gets copied there.

To get different generated images, press the dice icon next to "Seed", or enter "-1" or a random number ("-1" picks a random seed automatically). Look at the result, adjust settings, and go again. I usually run the initial iterations with a batch of 1, look for obvious fixes, edit the prompt, and try again; when I'm generally happy with the prompt, I set the batch to 4, so I get 4 different results in one go. Depending on your video card you may do more or fewer.

To emphasise things in the prompt - say, if I want the character to smile but there is no smile - you can increase the weight by changing "smile" to "(smile)", equivalent to weight 1.1, or "((smile))", equivalent to weight 1.1 * 1.1 = 1.21, etc, or just write "(smile:1.5)" for a much increased weight. Going with very high numbers can mess up the images.
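To summarize the attention syntax (the ( ) values are as described above; the [ ] form isn't mentioned above, but a1111 supports it too, for de-emphasis):

smile          weight 1.0 (default)
(smile)        weight 1.1
((smile))      weight 1.1 * 1.1 = 1.21
(smile:1.5)    weight 1.5, set explicitly
[smile]        weight 1/1.1, about 0.91 (de-emphasis)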
Adjusting existing images

Here are examples of processed images, originals on the left, processed ones on the right or below. Some changed a lot, others little. NORTUA needed cropping to fix the size of the face; BDQUILA and arumaid worked pretty much right away; FFWENCH needed the negative prompt "(((elven ear)))" because it kept making the ears way too long; the white-bearded old man took a lot of prompting to get a good beard.

The process is very similar to creating a new image, but you need to go to the "img2img" tab first. Then you can copy the prompt, paste it into the "Prompt" text box, and press the blue button under Generate to populate all the settings. Then drag your image into the image area on the left. If the image ratio isn't 2:3, you should check "Resize and fill". It does a good job for the 169x266 px images.

The prompt should describe the current image/result. This may need adjusting, and this time around the Denoising strength gets changed a lot as well, and sometimes the value of <lora:Baldur's Gate 1 Portraits:0.5>. If you want smaller changes, e.g. arumaid, Denoising should be right around 0.45 (small changes in the value have a big effect); if you want bigger changes, e.g. jerresid, it goes to around 0.5 (which is around the higher end), with <lora:Baldur's Gate 1 Portraits:0.7>. BGQUILA needed a higher Denoising of 0.55. The BG1 LORA weight goes up or down depending on how much the colors and lighting need changing, while Denoising goes up for bigger areas / larger detail changes. I used Denoising 0.5 and <lora:...:0.6> most of the time. After experimenting for a while you'll get a feel for how changes in the values affect changes in the image.

As with txt2img, I'd do a batch-of-1 attempt, fix obvious things (e.g. changing "white beard" to "(white bushy beard:1.4)" after seeing the beard get messed up a lot during processing), do more iterations until it's almost ready, then do 1 or 2 batches of 4 and pick one of the outputs.

Automation (for editing existing images, not great)

What I've tried is a batch job on 38x60 images. The Batch sub-tab is the last one, just above the left image box. Using a folder is supposed to work but doesn't for me; however, you can just drag and drop any number of files into the image area. I tried 500+ and it was fine. I ran that with a general prompt, omitting the description of each character, and it didn't work well. The problem is that the LORA needs "male" or "female", or it often mangles the faces: female faces appearing inside helmets meant for male ones, women getting beards, etc. This could improve by batching only male or only female portraits, with the matching keyword included (see the scripting sketch at the end of this post). Also, it may work better with 169x266 images.

I tried getting automatic AI text descriptions from auto1111 itself (CLIP/DeepBooru); there's also an external script to do that in a batch, but the derived text descriptions were very poor and didn't help. Auto-cropping to a consistent face size would be great, but I haven't looked into that. It could also fairly easily be done manually.

To run variations on the same general prompt, you can go to "Script" at the bottom and pick "Prompts from file or textbox", then put the extra bits you want for the variations on new lines. That runs the same main prompt with each extra line appended or prepended. A flexible form of automation that is fairly easy and works well is dynamic prompts (from https://github.com/adieyal/sd-dynamic-prompts).

@zenblack, I hope this is useful to you for the prompts/models, as well as to anybody else interested in Stable Diffusion for portrait mods. I'd be happy to take part in making a portrait mod with non-copyrighted images, if others are interested, so we can get an official release, hopefully for EE and non-EE games. @Shipwreck Jones, @ktchong, @Holden. We could create original portraits as well as use Creative Commons portraits that are free to modify and require no attribution (e.g. https://www.pexels.com/).
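P.S. For the batch processing described under Automation, here is a minimal scripting sketch, again assuming --api is enabled. The folder names and the male/female split are just my example, and the img2img field names should be checked against http://127.0.0.1:7860/docs on your install.

# Sketch: restyle a folder of portraits via /sdapi/v1/img2img, one gender
# per run so the LORA's "male"/"female" trigger word matches, as discussed above.
import base64
from pathlib import Path
import requests

URL = "http://127.0.0.1:7860"
BASE_PROMPT = ("<lora:Baldur's Gate 1 Portraits:0.6>, Baldur's Gate official portrait, "
               "character, fantasy_character, 1024x768, upper body portrait, {gender}, "
               "facial expression, masterpiece, highest quality")

def restyle(in_dir: str, out_dir: str, gender: str) -> None:
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for img_path in sorted(Path(in_dir).glob("*.png")):
        payload = {
            "prompt": BASE_PROMPT.format(gender=gender),
            "negative_prompt": "ng_deepnegative_v1_75t, Beyond Negative-neg, worst quality",
            # img2img takes the source image(s) base64-encoded:
            "init_images": [base64.b64encode(img_path.read_bytes()).decode()],
            "denoising_strength": 0.5,   # see the value guidance above
            "steps": 30,
            "cfg_scale": 7,
            "width": 512,
            "height": 768,
        }
        resp = requests.post(f"{URL}/sdapi/v1/img2img", json=payload, timeout=600)
        resp.raise_for_status()
        # Save the first returned image under the original file name.
        (Path(out_dir) / img_path.name).write_bytes(
            base64.b64decode(resp.json()["images"][0]))

restyle("portraits/male", "restyled/male", "male")
restyle("portraits/female", "restyled/female", "female")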
Thacobell Posted September 3

https://www.nature.com/articles/d41586-024-00478-x
https://www.scientificamerican.com/article/art-anti-ai-poison-heres-how-it-works/
cdx Posted September 4

Interesting articles, thank you @Thacobell. I'm hoping we'll get lots of new nuclear and desalination plants quickly to sort out the environmental problems; we'll need them for electric transport for sure.

As for the artists, the big picture is depressing: the value of their contribution falling to pretty much zero, not right now, but eventually. It's fair to protect their work as per the article, however it won't do much good, even if we don't count circumvention. I went for a BG1-style model trained on commercial BG1 images, but I could have used a Rubens model + "more color" to make a similar copyleft one. Stable Diffusion has copyright concerns, but sooner rather than later there will be a free base model which is clean.

I'm assuming you are against AI, having posted these links? Would you consider it a moral obligation to wait until a working clean model is available before using AI? I guess I didn't consider that all that much. Would you consider it ok to use AI with clean models, once we have them? I will have no issue with AI outcompeting artists once clean models exist, but I guess I was too carefree in deciding it would be ok to have an "official" AI-based release in the current AI environment.
zenblack Posted September 4

In researching the liabilities of utilizing AI, the only program and company I can find that has explicitly stated it covers the liability and provides protection to its users, along with a method of dispute to remove content or specific data, is Microsoft with Copilot. While the laws regarding AI in the United States are still being determined, written, and enforced, especially regarding copyright and intellectual property, this allows the users of Copilot to be shielded from potential lawsuits by Microsoft moving forward. That doesn't mean those who object to AI utilized with Copilot are wrong, but it provides a path forward for those who are damaged. Here is an article about it: https://blogs.microsoft.com/on-the-issues/2023/09/07/copilot-copyright-commitment-ai-legal-concerns/ And these are the terms of Copilot's licensing agreement, which provide an opportunity to remove works if you feel they are included unlawfully, and also cover the legality of what is produced, which boils down to no ownership, no ability to limit reuse, and such works being free for anyone to use at any time: https://www.bing.com/new/termsofuse?FORM=GENTOS
Thacobell Posted September 4

Stealing art bad, destroying the environment bad.
cdx Posted September 5

7 hours ago, Thacobell said:
Stealing art bad, destroying the environment bad.

I agree with the general statements; the question is how much they apply. Excessive energy and water use is bad, and we should fix it with clean power plants and desalinated water rather than energy rationing. Also, I'm quite happy playing a computer game without worrying about its effect on the environment, and SD uses about as much power.

I hate stealing, too. I wouldn't steal an actual painting. Yet I'm fine with pirating things. I'm trying to decide how to think about AI. It isn't stealing, but it is co-opting artists' skill without their permission. I guess I wouldn't describe the model training images as stolen, but "pirated" probably applies. So, like pirating a copy of humanity's artists' skill for personal use.
Thacobell Posted September 5

There's a big difference between copying files from a large corporation after devs have already been paid, and stealing art from individual artists that depend on it to eat.
cdx Posted September 6

I agree there's a difference. Using the word "steal" might not be quite right but the idea stands. I guess I'll wait for a guilt-free base model before resuming tinkering with this.