A New AI Image Generation Enthusiast | Exploring Ice Hockey Image Variations with Midjourney v5.2 and Stable Diffusion XL 0.9

I am early in my journey with artificial intelligence. Most of my experience to date is in image creation with Midjourney, one of the popular text-to-image models. The others are DALL-E 2 from OpenAI, Stable Diffusion from Stability AI, and Adobe Firefly (based on a dataset of Adobe Stock). My experience with Midjourney and Stable Diffusion has been very interesting. The versions I have used are Midjourney v5.2 and the recently released Stable Diffusion XL 0.9 (accessed through DreamStudio).

I am getting more acclimated and have been producing better images. There is a learning curve, primarily in using parameters and structuring prompts to get the best results, and there are still some shortcomings. Each of the image models does a great job overall of producing images based on my prompts, though the results differ depending on the model used. Combining my interest in sports with technology, I have focused on hockey as the primary subject matter. I have also written more abstract prompts based on animals I have seen on my morning runs.

Let me state that I got fantastic images right from the start even though my prompts were not the best. This is truly a very powerful medium, and I have learned a great deal through this process. However, there are places where both image models struggled. By and large, the models struggled with the players’ hockey sticks, either rendering them incorrectly or omitting them entirely. It took me some time to massage my prompt to get the desired result most of the time.

Original Midjourney v5.2 Image

At first glance, the picture above looks great, and as an overall view it is, which makes it a great example to use. The devil is in the details. Looking closer, the nearest player’s stick is not in the player’s hands, and the players in the background are missing sticks entirely. Another difficulty was the number of players placed on the ice in some images. Some images showed 15 players and more than two teams on the ice, despite the prompt specifying two teams and a limit on the number of players. This is where trial and error came into play: finding more specific wording, placing it in a specific structure, and learning Midjourney’s parameters fixed this issue to a good extent. Some images still had issues, but they were far fewer in number. Another difficulty I ran into was the uniforms and numbers. If a jersey had numbers in two places, they did not match, or the number on the helmet did not match the number on the jersey. For these, I again used trial and error to massage the prompt. The biggest issue was that the jerseys did not match for players on the same team in my original prompts. Here I had to add extra information to the original prompt, specifically stating that the jerseys for the same team must be the same. Had I known this at the start, the process of getting to good results would have been much more efficient.
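As a rough illustration of the prompt structure described above, here is a minimal Python sketch that assembles a subject, a list of explicit constraints, and trailing Midjourney parameters into one prompt string. The helper name `build_prompt` and all of the wording are hypothetical and are not the actual prompts used for these images. The only Midjourney specifics assumed are the `--ar` (aspect ratio) and `--v` (version) parameters, which go at the end of a prompt.

```python
def build_prompt(subject, constraints, params):
    """Join a subject, explicit constraints, and trailing parameters into one prompt string.

    Hypothetical helper for illustration only.
    """
    # Descriptive text first: the subject followed by each constraint, comma separated.
    text = ", ".join([subject] + list(constraints))
    # Parameters last, in the "--name value" form Midjourney expects at the end of a prompt.
    flags = " ".join(f"--{name} {value}" for name, value in params.items())
    return f"{text} {flags}".strip()


# Example wording is an assumption; each constraint targets one of the problems noted above.
prompt = build_prompt(
    subject="ice hockey game, two teams on the ice",
    constraints=[
        "five skaters and one goalie per team",
        "every player holding a hockey stick in both hands",
        "teammates wearing identical jerseys and helmets",
        "jersey number matching helmet number",
    ],
    params={"ar": "16:9", "v": "5.2"},
)
print(prompt)
```

Keeping each constraint as its own list entry makes trial and error easier: one phrase can be changed at a time and the new image compared against the previous one.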

 

Corrected Midjourney v5.2 Image

This picture was the result of correcting the prompt as described in the previous paragraph. To improve it further, I would have to make the prompt more specific so that all players on the same team wear the same color helmet. One thing I learned is that what I consider logical and specific is not necessarily what works best for the model to produce the desired result. In this way, it is still very much a “garbage in, garbage out” process: the resulting image is only as good as the prompt you provide. It takes time, but anyone can improve. I also have the advantage that my wife is a Visual Information Specialist (Graphic Design), so I run the images by her. So far, not too shabby for this history major and government contract researcher.

Up to this point I have concentrated on Midjourney in this article/post. I have mentioned that I have used other image creation applications. The following picture is from DreamStudio by Stability AI, which uses the latest version of Stable Diffusion, Stable Diffusion XL 0.9.

Stable Diffusion XL 0.9 Image

I used the same prompt for this image as the one I described for the Midjourney images. At first glance, this player seems almost perfect. Yes, it is in a different style, more computer-generated than photographic, but this can be changed by editing the prompt or changing the settings in the application. The big error in this image also revolves around the hockey stick. Where the stick meets the player’s legs, the shaft is bent into an almost serpentine shape. This is a big error, because hockey stick shafts only come straight. Errors like this also generate ideas that lead to better prompts. Before seeing this result, I would not have considered putting instructions about the shape and length of the stick in the prompt. After seeing this image, I edited my prompt and received better results. The point here is that each result is a learning experience that anyone can use to adjust their skills.

The different text-to-image models offer similar results, each with its own spin on things, to a close enough extent for my use case. Each user will find which one fits them best. The point of this article/post is to show what my novice experience has been, particularly with my interest in sports and technology. I have had a blast learning this new technology, using skills I have developed through my career and expanding into the world of both prompt engineering and image creation, from ChatGPT to the text-to-image models previously discussed. My next step is looking into how to use other forms of artificial intelligence applications to increase my marketable skillset.

Now, I did mention ChatGPT in the last paragraph, which was the first mention of this technology in this article/post. As I have said in my other articles and posts, I am a history major and take the phrase “cite your sources” seriously. With new technology, specifically artificial intelligence, the line between human-generated content and machine-generated content has blurred significantly. Thus, I like to point out each content type in my writing. Everything above this point has been human generated (written by me), and everything below was provided by ChatGPT in response to a prompt.

Per ChatGPT: When using the Midjourney v5.2 and Stable Diffusion XL v0.9 text-to-image artificial intelligence learning models to create images related to ice hockey, there are several aspects to consider.

What Midjourney v5.2 did well:

Improved image detail: Midjourney v5.2 showed significant progress in generating images with enhanced detail compared to its previous versions (v3 and v4). It added more realistic skin textures, facial features, cinematic lighting, reflections, glares, shadows, and expressive angles or overviews of a scene.

Realistic hands: One notable improvement in Midjourney v5.2 is its ability to generate realistic human hands with five fingers, which was a challenge for previous AI art generators. This feature adds credibility and realism to the generated ice hockey-related images.

Issues with Midjourney v5.2:

Overwhelming perfection: Some AI art fans have expressed that the images generated by Midjourney v5.2 can be “too perfect” and photorealistic to the point where they lack a sense of novelty or surprise. This precision and lack of imperfections in the results can reduce the excitement and thrill of using the AI model to generate images [[1]].

Regarding Stable Diffusion XL v0.9, the provided information doesn’t specifically mention its performance in the context of ice hockey images. However, it is worth noting the general differences between Midjourney and Stable Diffusion for better understanding:

Differences between Midjourney and Stable Diffusion:

Machine learning model: Midjourney utilizes a proprietary machine learning model, while Stable Diffusion has its source code available as open-source.

Accessibility: Stable Diffusion can be downloaded and run on your own computer, given that it meets the requirements, whereas Midjourney can only be used through the Discord chat app and requires an internet connection.

Cost: Midjourney requires a minimum monthly cost of $10 for limited image generations, while Stable Diffusion can be used for free by running it on your own hardware or by paying a nominal fee for online services.

Customization options: Stable Diffusion offers thousands of downloadable custom models, providing more options for users, whereas Midjourney has a limited selection of models to choose from.

Ease of use vs. advanced features: Midjourney may seem easier to use due to its simpler settings, while Stable Diffusion offers more advanced features and customization options, making it suitable for both beginners and experienced users.

Since the information provided does not specifically highlight the issues or strengths of Stable Diffusion XL v0.9 in generating ice hockey-related images, it’s not possible to provide a comprehensive evaluation of its performance in this context.

Overall, Midjourney v5.2 excelled in improving image detail and generating realistic human hands for ice hockey-related images. However, it also faced criticism for producing images that were perceived as “too perfect” and lacking the element of surprise. Meanwhile, Stable Diffusion XL v0.9 differs from Midjourney in terms of its machine learning model, accessibility, cost, customization options, and the balance between ease of use and advanced features, but its specific performance in generating ice hockey-related images is not mentioned in the provided information.

*A human-written addendum to the ChatGPT information. After several conversations back and forth between ChatGPT and me, the conclusion was that its current knowledge base did not include the differences that I have written about here regarding ice hockey images, since I have not posted them to the internet yet. And yes, I do use a Chrome extension that connects ChatGPT to the internet. Thus, the best answer was to provide ChatGPT’s initial response, which was based on the basic differences between the Midjourney and Stable Diffusion versions I have written about. I guess that I am ahead of the curve.

The information provided in this post was obtained through open research and personal experience using various websites.  The information must be accessible to all in order to appear in this information piece.
