You’ve been responsible for managing the ARTA – AI Art generator from the ideation phase until now. Could you share some insights into those early days?
Of course! Those were dynamic times. We managed to release a polished application within just a week, becoming one of the first consumer app creators to offer text-to-image generation on mobile. Our goal was to build a mass-market product that gives people “an artist” in their pocket. So, from the conceptualization and early development stages onward, we have focused on usability and scalability. But despite entering the market at the right time, it was quite challenging to grow our install volumes to an adequate level, even with a brilliant media buying team like ours. A significant boost came three months after the app’s release, when our Avatar feature took off. The volume quickly became moderately high for our niche, and since then, our task has been to maintain and increase it.
What was the original tech stack that you launched on and what were some of the challenges with art generation during this period?
We launched based on Stable Diffusion 1.3 using the official API from Stability.ai. I should say the situation with the quality of generations then and now is like night and day. When we first started, our QA managers frequently reported issues related to the aesthetic value of images or inaccuracies in representing specific concepts and features. However, that was standard for Stable Diffusion at that time. Now, generation output is much better in all aspects, including stylistic reproduction, composition coherence, visual fidelity, level of detail, and more.
Shortly after the app’s release, we began renting servers on Amazon, and supporting them turned out to be quite a challenge. Even with sufficient funds, there may be no free A100 available when you need it, and you may have to wait a couple of days. As a result, we had to live without autoscaling, redirecting all excess traffic to our partners’ APIs.
Maintaining all of this remains rather tricky to this day, with minor issues occurring on one end or the other every month or so. For example, we occasionally encounter temporary problems with the quality of generations when the provider updates the server, tests weights, or implements other changes that affect the generation output. Such errors can last from an hour to half a day and are unpredictable and difficult to track. Usually, by the time our support department receives a user report about blurry images or some other issue, the API provider has already fixed the problem. Still, it’s a serious concern for our users. Therefore, we are now building a system that combines multiple providers and our own servers for special generations, giving us more control on our side.
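To make the routing idea above concrete, here is a minimal Python sketch of sending a request to the first backend that has room and falling back to the next one on failure. It is an illustration only, not ARTA's actual system: the `Provider` class, the `route` function, the capacity counter, and the backend names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class ProviderError(Exception):
    """Raised by a backend that cannot serve the request right now."""


@dataclass
class Provider:
    name: str                        # hypothetical label, e.g. "own-gpu"
    generate: Callable[[str], str]   # prompt -> image reference
    capacity: Optional[int] = None   # max concurrent jobs; None means no limit
    in_flight: int = 0               # jobs currently running on this backend

    def has_capacity(self) -> bool:
        return self.capacity is None or self.in_flight < self.capacity


def route(prompt: str, providers: List[Provider]) -> str:
    """Try providers in priority order, skipping full or failing ones."""
    errors = []
    for provider in providers:
        if not provider.has_capacity():
            errors.append(f"{provider.name}: at capacity")
            continue
        provider.in_flight += 1
        try:
            return provider.generate(prompt)
        except ProviderError as exc:
            errors.append(f"{provider.name}: {exc}")
        finally:
            provider.in_flight -= 1  # release the slot on success or failure
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Hypothetical backends: our own server has no free GPU, the partner API works.
def own_server(prompt: str) -> str:
    raise ProviderError("no free GPU")


def partner_api(prompt: str) -> str:
    return f"partner-image-for:{prompt}"


providers = [
    Provider("own-gpu", own_server, capacity=4),
    Provider("partner", partner_api),
]
result = route("a red fox", providers)
```

In this sketch the order of the list is the priority order, so excess or failed traffic spills over to the partner backend. A production version would also need timeouts and some check on output quality, which this example leaves out.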
As a product manager, what strategic decisions have been pivotal in guiding ARTA to its top-ranking position shortly after its release?
The early rise of ARTA (called Aiby at that time) resulted from the timely decision to implement the viral Avatar feature just as it started making the rounds on social media. We quickly recognized the growing interest in this functionality. Our entire team, including product, marketing, and development, was on the same wavelength and shared the same vision of its success. We also acknowledged that a short time to market was crucial. So, from day one, we dedicated all our resources to building this feature, prioritizing it above other tasks.
Since we had to ship as soon as possible so as not to miss the moment when AI Avatars reached the peak of their hype, we opted to use a third-party solution and customize it for our app. While avatars were only beginning to gain traction on mobile, the technology had already been available on the web for some time, even with an API. Thanks to the team’s concentrated efforts, our first working version was in the App Store in just five days, offering highly competitive avatar output. It helped us attain the #2 position in the American top charts and remain the second most downloaded app in the US for a week.
Your team has recently released an upgrade to ARTA’s AI avatar generation feature. Could you share some details regarding this?
AI models tend to add generic facial features during training, making avatars look different from the source photos, and the more unique a person’s traits are, the less the AI interpretation can resemble them. To address this issue, we decided to create our own avatar service. We had been using a third-party API for a long time, but it didn’t yield significant improvements. With the server shift, we were able to set up better training technology that maintains the likeness of the user’s real face in the avatar output. While I can’t disclose our unique pipeline in detail, it became possible due to a specific combination of SDXL settings, LoRAs, and face enhancers, and we haven’t yet seen better outcomes elsewhere.
With the new server, we moved away from a fixed cost for each avatar pack to a monthly server fee and can now offer avatars through a weekly subscription instead of requiring separate in-app purchases. It creates a more fulfilling experience and is much cheaper for our users if they want to generate, for example, five avatar packs within a week or change the photo input as they go. Considering all of the above, our avatar offer currently boasts the best price-performance ratio on the market. While there are apps capable of creating high-quality realistic avatars, ARTA stands out by providing a diverse range of bright and colorful output variations besides realistic styles, all with the same precise facial likeness.
In what other ways has the team improved the app’s capabilities?
We concluded that using third-party APIs is more efficient for common use cases like text-to-image generation, image conversion, and inpainting. This approach eliminates the need to spend time figuring out how to integrate these functionalities into our server infrastructure. Furthermore, it reduces costs in situations when a new feature doesn’t take off as expected and we decide to remove it. The AI image generation industry is rapidly evolving, with numerous dedicated services available, so we explore and gradually adopt those that align with our objectives.
At the same time, ARTA’s needs often turn out to be quite unique, requiring in-house solutions. In cases when tailored APIs are either non-existent or do not provide satisfactory output quality, we customize our internal services and develop our own solutions to achieve the results we want. For example, in addition to upgrading AI Avatars, our ML and prompt engineers have come up with a new pipeline for the app’s AI Filters (Selfies) feature. We’ve also developed a unique algorithm for our upcoming AI Baby feature, which allows two people to merge their photos and see how their child might look. From my perspective as a product manager, I initially doubted its success, but ad creatives featuring this concept are very popular. So, checking marketing insights is especially helpful in content-related cases.
Can users influence the artistic process in ARTA? If so, what tools and options are available for users to customize the AI-generated artwork?
We handle all the complex aspects related to generation, aiming to provide our users with a straightforward artistic experience without unnecessary technical overload. So, the primary way users influence the output is through prompts. We keep this process transparent by showing the exact text request that will be sent to the model for generation, and we offer assistance with composing effective prompts only if needed.
We select the best default settings for each integrated model so users don’t have to worry about them. Typically, there’s no need to adjust them to maximize results, as they already produce an optimal generation output. Still, if the user wants to experiment, an advanced mode is one tap away, and some deeper parameters are in the settings section.
Soon, we will add a Seed parameter, giving users complete control over generation when they need to recreate an identical image from scratch. Additionally, we plan to expand the list of aspect ratios. We are also thinking of adding several ControlNets to regular generations. They are already supported on the server side, as we use them to generate AI Filters and sketches, but they aren’t yet available to end users.
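As a rough illustration of why a seed makes an image reproducible, the toy Python sketch below replaces the diffusion model with a seeded random generator. It is not ARTA's code, and `initial_noise` and `generate` are hypothetical names. The point it shows is that the starting noise is derived from the seed, so the same prompt and seed (with the same model and settings) give the same result.

```python
import random
from typing import List


def initial_noise(seed: int, size: int = 4) -> List[float]:
    """Stand-in for the starting latent noise of a diffusion run."""
    rng = random.Random(seed)  # private generator, isolated from global state
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def generate(prompt: str, seed: int) -> str:
    """Toy 'generation': the output depends only on the prompt and seeded noise."""
    noise = initial_noise(seed)
    return f"{prompt}|" + ",".join(f"{x:.4f}" for x in noise)


first = generate("castle at sunset", seed=42)
repeat = generate("castle at sunset", seed=42)   # same seed: identical output
other = generate("castle at sunset", seed=43)    # different seed: new variation
```

Here `first` and `repeat` are equal while `other` differs, which mirrors how a user could recreate an image by reusing its seed or explore variations by changing it.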
How do you perceive the impact of AI like ARTA on the traditional art market? Do you see AI art generation as a disruption or an enhancement to the art industry?
I see it as an enhancement. Generative AI has introduced new and valuable opportunities to enhance the artistic process while significantly reducing turnaround time. It assists digital artists, designers, illustrators, and other visual content creators with a variety of tasks, from exploring ideas and developing concepts to generating sketches and ready-to-go images. Ultimately, our ability to leverage its advancements is limited only by our imagination.
For example, I have a hobby of creating PC games, and recently, I used ARTA to generate a set of icons for skills and items. I could design them on my own using Adobe Illustrator, but with an image generator, I got what I needed almost right away. My wife, in turn, is a retoucher-photographer. Thanks to Photoshop’s Generative Fill, she works much faster and has more free time (or more income if she decides to accept more retouching orders).
When done well, AI-generated images can look indistinguishable from professional artwork. However, in my opinion, AI will never replace a true professional. No matter how skilled neural networks become, they are still trained on data created by humans, meaning that everything they generate already exists somewhere. Then as now, truly innovative ideas can only be produced by people. While the traditional meaning of art remains associated with human-made pieces, AI art is like an anticipated spinoff, inviting everyone, regardless of artistic background, to try an exciting new experience.
Looking beyond just improving image quality, where do you see the future of AI image generation heading?
Along with the image quality, the speed of generations will increase, automatically leading to more cost-effective outputs.
I think it won’t be long before there is an easy way to generate the same characters in different environments and positions, and then we will see the rise of AI in comics, children’s books, game graphics, and more. Interior design and ad creative production are already actively leveraging generative AI, but more is ahead of us as the technology continues to evolve.
Considering that all generations require strong GPUs, these technologies will develop along with AI for quite some time. We are still only at the beginning of the journey. Perhaps the new Apple of our time will be Nvidia, with everyone, or at least those in the IT industry, anticipating new video card releases just as we all did with iPhones.
AI image generators will continue delivering fun and engaging experiences, whether by introducing new concepts emerging from pop culture or reviving older ideas enhanced with better technology. For example, interest in AI Baby generations is currently growing. One recent technology based on Stable Diffusion has demonstrated impressive output from merging two individuals’ features to reveal their biological child’s potential appearance. The results far surpass what was available on horoscope sites a few years ago, and people are eager to give it another try.
What are your predictions for what we should expect next from Generative AI?
The wave of popularity for video generation is on the horizon. With advancements in technology reaching a sufficient level, there will undoubtedly be attempts to train neural networks using people’s facial expressions and gestures to create video avatars, potentially even with unique user voices.
AI Audio is another significant breakthrough ushering in a new era for the music production industry. This technology has already presented amazing opportunities for composing songs based solely on text input, making it an excellent tool for creating custom non-stock soundtracks for various types of video content. Overall, it’s really fun to listen to something as mundane as Terms of Use rapped or sung with romantic intonation.
Thank you for the great interview. Readers who wish to learn more or generate some images should visit ARTA.