Imagen on MiocAI: A full rundown
A full walkthrough of image generation on MiocAI. This article is a guide to MiocAI's image generation feature, explaining the basic generation flow, under-the-hood processes like prompt generation and image processing, and advanced prompt construction. It covers best practices for crafting effective prompts, including camera angles, dos and don'ts, and techniques for emphasis. Finally, the article aims to improve users' understanding and control over image generation within the platform.
<p>Image generation is one of MiocAI's most carefully considered features, and one of the features we work on the most. As a result, it's packed with options and details, which can make it overwhelming or even difficult to navigate at times. While we are working on making it more approachable for new users, we still recommend reading through this documentation to better understand how our image generation works.</p>
<h3>Basic generation flow</h3>
<p>As long as the studio feature isn't out yet, the primary way to generate images of your character is through a chat with that character. These pictures are called <strong>context-aware images</strong>, abbreviated here as <strong>CAI</strong> (not to be mistaken for the AI chat site with the same abbreviation).</p>
<p>In order to get started with generating an image, you will need a chat with at least a handful of messages, so the context engine has something to go off of. If the character supports images, you should see a button like this appear on the message:</p>
<p><img src="https://miocai.com/media/uploads/68022c2ceab23_paste.png" alt="Generate Image button" /></p>
<p>After you click it, your image starts generating. You can wait the 10-40 seconds it takes, or simply continue chatting in the meantime. Once it's done, the image appears in a slot on that message.</p>
<p><img src="https://miocai.com/media/uploads/68022ce27c193_paste.png" alt="Generated image.. :3" /></p>
<h3>Under the hood</h3>
<p>Image generation is as stable as it is because every request passes through a number of steps designed to keep it reliable and fast.</p>
<h4>Prompt generation</h4>
<p>The moment you click the <strong>CAI</strong> generation button, the system collects messages from the chat to use in prompt generation. At the time of writing, it takes the last 5 messages into account. The reason for using only 5 messages is prompt pollution: if too much action occurs within the given context, the AI model may overload the prompt with actions, which can cause mutated characters, inaccurate images or worse.</p>
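<p>The context window described above can be illustrated with a minimal sketch. This is not MiocAI's actual code: the function name <code>collect_context</code> and the constant <code>CONTEXT_WINDOW</code> are hypothetical, and messages are assumed to be plain strings ordered oldest to newest.</p>

```python
from typing import List

# From the article: the last 5 messages are used at the time of writing.
CONTEXT_WINDOW = 5


def collect_context(messages: List[str], window: int = CONTEXT_WINDOW) -> List[str]:
    """Return the most recent `window` messages, oldest first (hypothetical helper)."""
    if window <= 0:
        # Guard: messages[-0:] would otherwise return the whole chat.
        return []
    return messages[-window:]


chat = [f"message {i}" for i in range(1, 9)]  # a chat with 8 messages
recent = collect_context(chat)
print(recent)
```

<p>Keeping the window small is what limits prompt pollution: older actions never reach the prompt generator.</p>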
<p>The prompt below was generated for the image shown in the <strong>Basic generation flow</strong> section.</p>
<p><code>safe, rising from sitting, standing with hands on hips, face smug and annoyed, pointing catalyst, stomach growls, looking determined, smug while hungry</code></p>
<p>Breaking it down:</p>
<p><code>safe</code>: The model defines how far the image is allowed to go. The options are generally "safe", "sensitive", and "explicit". As of writing, this isn't processed any further than being passed as a keyword to the image model, although it may soon be used for dynamic <code>negative_prompts</code> and LoRA selection.</p>
<p><code>rising from... face smug and annoyed</code>: Pose and expression are determined by the model and described briefly. Repetition does not help here, and the model got this wrong in this specific example: "smug" appears twice.</p>
<p><code>pointing catalyst</code>: The model is tasked with defining objects that may be in the scene, which holds a lesser priority than pose and expression.</p>
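<p>To make the breakdown above concrete, here is a small sketch that splits the example prompt into its rating keyword and the remaining keywords, and flags words that occur in more than one keyword. The helper names <code>split_prompt</code> and <code>repeated_words</code> are illustrative assumptions, not part of MiocAI.</p>

```python
from collections import Counter
from typing import List, Optional, Tuple

# Rating keywords named in the article.
RATINGS = {"safe", "sensitive", "explicit"}


def split_prompt(prompt: str) -> Tuple[Optional[str], List[str]]:
    """Split a comma-separated prompt into (rating, keywords). Hypothetical helper."""
    tags = [t.strip() for t in prompt.split(",") if t.strip()]
    if tags and tags[0] in RATINGS:
        return tags[0], tags[1:]
    return None, tags


def repeated_words(keywords: List[str]) -> List[str]:
    """Return words that show up in more than one keyword."""
    counts = Counter()
    for keyword in keywords:
        counts.update(set(keyword.lower().split()))
    return sorted(word for word, n in counts.items() if n > 1)


prompt = ("safe, rising from sitting, standing with hands on hips, "
          "face smug and annoyed, pointing catalyst, stomach growls, "
          "looking determined, smug while hungry")
rating, keywords = split_prompt(prompt)
print(rating, keywords)
print(repeated_words(keywords))
```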
<h4>Imagen initiation</h4>
<p>After a prompt is received (and confirmed, if applicable), the request is placed in the queue. To conserve server resources, at most 25 images are generated concurrently. The queue wait time is usually well under a second, though.</p>
<p>Once the request has cleared the queue and the system has confirmed that you are in fact eligible to generate images (rate limit or image limit), the system retrieves the character details relevant to the current chat, normalizes the prompt and does minor pre-processing (including scenario details, user details etc.), before tasking the model with generating the actual image.</p>
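<p>A cap on concurrent generations like the one described above is commonly built with a semaphore. The sketch below is only an illustration under that assumption: it uses Python's <code>asyncio.Semaphore</code>, and <code>generate</code>, <code>run_jobs</code> and the short sleep standing in for real generation are all hypothetical.</p>

```python
import asyncio

# From the article: at most 25 images are generated at once.
MAX_CONCURRENT = 25


async def generate(job_id: int, slots: asyncio.Semaphore, stats: dict) -> str:
    """Wait for a free slot, then 'generate' an image (simulated with a sleep)."""
    async with slots:
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(0.01)  # stand-in for the actual image generation
        stats["running"] -= 1
    return f"image-{job_id}"


async def run_jobs(count: int):
    slots = asyncio.Semaphore(MAX_CONCURRENT)
    stats = {"running": 0, "peak": 0}
    results = await asyncio.gather(*(generate(i, slots, stats) for i in range(count)))
    return results, stats["peak"]


results, peak = asyncio.run(run_jobs(60))
print(len(results), "images generated, peak concurrency:", peak)
```

<p>Requests beyond the cap simply wait for a free slot, so short bursts are absorbed without rejecting anyone.</p>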
<h4>Image generation</h4>
<p>Upon receiving the image generation task from the pre-processor, a model router selects the applicable model (realistic NSFW, realistic SFW, anime NSFW, anime SFW) and dynamically selects a set of parameters for maximum performance.</p>
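<p>The routing step above can be pictured as a simple lookup over the four model categories. This is a sketch under assumptions: the function <code>select_model</code>, its <code>style</code> and <code>nsfw</code> inputs, and the returned labels are hypothetical, and the dynamic parameter selection is not modeled at all.</p>

```python
# The four model categories named in the article, keyed by (style, nsfw).
MODELS = {
    ("realistic", True): "realistic NSFW",
    ("realistic", False): "realistic SFW",
    ("anime", True): "anime NSFW",
    ("anime", False): "anime SFW",
}


def select_model(style: str, nsfw: bool) -> str:
    """Pick a model category for a request (hypothetical router)."""
    key = (style.strip().lower(), bool(nsfw))
    if key not in MODELS:
        raise ValueError(f"unknown style: {style!r}")
    return MODELS[key]


print(select_model("anime", False))
print(select_model("Realistic", True))
```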
<p>The user receives an estimate of the generation duration, and the image model starts generating the image. Depending on the user's settings, the finished image then goes through additional processing, such as adding the watermark.</p>
<p>Finally, the image is scanned for <a href="https://miocai.com/documentation/2">prohibited material</a>. This is done by a model that looks for underage pornographic material. All hits are verified through a number of computational steps, both to avoid human exposure to said material and to make sure the hit is not a false positive. Detected and confirmed instances will result in staff being informed, an account suspension and potential legal action. Note that accidental generation is not grounds for this procedure, and we apply it very sparingly.</p>
<h3>Advanced prompt construction</h3>
<p>The more curious tinkerers have most likely already discovered that MiocAI lets you edit or define the action prompt for your image yourself. This can be enabled in two ways:</p>
<p><img src="https://miocai.com/media/uploads/680233aec1db9_paste.png" alt="Through the chat settings" /></p>
<p><img src="https://miocai.com/media/uploads/680233cdce689_paste.png" alt="Through the user settings" /></p>
<p>With this option enabled, the image won't be generated right away. Instead, you are first shown the prompt that is about to be used. You can then alter it and, when satisfied, send it off.</p>
<p><img src="https://miocai.com/media/uploads/6802342172141_paste.png" alt="Prompt confirmation section" /></p>
<h4>Best practices in your own prompt</h4>
<p>There are many key aspects that contribute to the quality and accuracy of your images. One of the most notable is camera angle/perspective. If you would like to incorporate an angle into your image, I suggest following this format:</p>
<p><code>[angle], [pose], [objects]</code>.</p>
<p><img src="https://miocai.com/media/uploads/680237338b63a_paste.png" alt="Example angles to use" /></p>
<p>The angles shown above are the ones the image model is most familiar with, and as such will yield the best results. Here they are again:</p>
<ul>
<li>Long Shot</li>
<li>High-angle Shot</li>
<li>Low-angle Shot</li>
<li>Wide Shot</li>
<li>Bird-eye Shot</li>
<li>Medium Shot</li>
<li>Close-up Shot</li>
</ul>
<p>A couple more:</p>
<p>Wide Shot/Establishing Shot, Long Shot, Full Shot, Medium Shot, Cowboy Shot, Medium Close-Up, Close-Up, Extreme Close-Up, Two-Shot, Over-the-Shoulder Shot, Point-of-View Shot (POV), Reaction Shot, Insert Shot, Cutaway Shot, Low Angle Shot, High Angle Shot, Dutch Angle/Tilted Shot, Aerial Shot, Tracking Shot, Dolly Shot, Steadicam Shot, Crane Shot, Handheld Shot, Whip Pan Shot, Zoom Shot, Rack Focus Shot, Split Screen Shot, Freeze Frame Shot, Slow Motion Shot, Fast Motion Shot, Montage Shot, Cross-Cutting Shot, Bird's Eye View Shot, Worm's Eye View Shot, Reverse Shot, Panning Shot, Tilt Shot, Follow Shot, Static Shot, Establishing Drone Shot, Underwater Shot, POV Drone Shot, Crash Zoom Shot, Snorricam Shot, Tracking POV Shot, Vertigo Shot (Dolly Zoom), Flashback Shot, Flashforward Shot, Static Long Take Shot.</p>
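<p>As a rough illustration of the <code>[angle], [pose], [objects]</code> format, the sketch below assembles a prompt from its parts and checks the angle against the short list of best-supported angles given earlier. The helper names <code>build_prompt</code> and <code>is_preferred_angle</code> are assumptions made for this example; MiocAI does not expose such functions.</p>

```python
from typing import Iterable

# The angles the article lists as best supported (lowercased for comparison).
PREFERRED_ANGLES = {
    "long shot", "high-angle shot", "low-angle shot", "wide shot",
    "bird-eye shot", "medium shot", "close-up shot",
}


def is_preferred_angle(angle: str) -> bool:
    """True if the angle is on the article's short list."""
    return angle.strip().lower() in PREFERRED_ANGLES


def build_prompt(angle: str, pose: str, objects: Iterable[str] = ()) -> str:
    """Join angle, pose and objects into one comma-separated prompt."""
    parts = [angle, pose, *objects]
    return ", ".join(p.strip() for p in parts if p.strip())


prompt = build_prompt(
    "low-angle shot",
    "standing with hands on hips",
    ["pointing catalyst"],
)
print(prompt)
print(is_preferred_angle("Low-angle Shot"))
```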
<p>Overall, sticking to a few best practices goes a long way. Here are some dos and don'ts for prompt confirmation:</p>
<p><strong>Dos</strong></p>
<ul>
<li>Define explicit details in firm, direct language</li>
<li>Outline the pose and expression</li>
<li>Use keywords separated by commas</li>
</ul>
<p><strong>Don'ts</strong></p>
<ul>
<li>Use vague terms (e.g. <code>"his member grew"</code>: the AI has no way of knowing what "member" is supposed to mean)</li>
<li>Use full sentences</li>
<li>Define time and place; use scenarios for that instead</li>
</ul>
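<p>Some of the dos and don'ts above can be checked mechanically before you send a prompt off. The following is a rough heuristic sketch, not a MiocAI feature: <code>lint_prompt</code> is a hypothetical helper, and the six-word threshold for "looks like a full sentence" is an arbitrary assumption.</p>

```python
from typing import List


def lint_prompt(prompt: str, max_words: int = 6) -> List[str]:
    """Return warnings for a prompt, based on simple heuristics (hypothetical)."""
    warnings = []
    tags = [t.strip() for t in prompt.split(",") if t.strip()]
    if len(tags) < 2:
        warnings.append("use keywords separated by commas")
    for tag in tags:
        # Sentence-ending punctuation or a long run of words suggests prose.
        if tag.endswith((".", "!", "?")) or len(tag.split()) > max_words:
            warnings.append(f"looks like a full sentence: {tag!r}")
    return warnings


good = "low-angle shot, standing with hands on hips, face smug"
bad = "She stands up and points her catalyst at him."
print(lint_prompt(good))
print(lint_prompt(bad))
```

<p>A check like this cannot judge vagueness, so terms that rely on context still need a human eye.</p>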
<p>Another thing that can be helpful is emphasis. Repetition (repeating a specific word multiple times) can cause the model to emphasize it much more than it usually would. Putting the word in brackets can have a similar effect.</p>
<p><img src="https://miocai.com/media/uploads/680236437a56c_paste.png" alt="Something the model would usually refrain from generating, a backflip." /></p>
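<p>Both emphasis techniques amount to simple string edits on a keyword. In the sketch below, the helper names are hypothetical, and round brackets are an assumption on my part, since the text above only says "brackets" without naming a specific kind.</p>

```python
def emphasize_by_repetition(keyword: str, times: int = 2) -> str:
    """Repeat a keyword so it appears several times in the prompt."""
    return ", ".join([keyword.strip()] * max(1, times))


def emphasize_with_brackets(keyword: str) -> str:
    """Wrap a keyword in round brackets (assumed bracket style)."""
    return f"({keyword.strip()})"


base = "low-angle shot, face smug"
print(f"{base}, {emphasize_by_repetition('backflip', 3)}")
print(f"{base}, {emphasize_with_brackets('backflip')}")
```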
<p>I might polish this article in the future, but for now, I hope this helps with image generation a bit!</p>