The rapid evolution of generative AI has sparked wide-ranging debate. Where is the relationship between generative AI and creativity heading? Arai Mono, CTO of AIHUB, who works at the intersection of entertainment and technology, and Seshita Hiroyuki, an animation director with a long career in 3DCG, discuss the future of this relationship from their respective perspectives.

■Profile
Seshita Hiroyuki
An animation director belonging to Studio KADAN. Born in 1967, he has been working in various fields of CG and VFX production since the 1980s. Seshita has directed several notable works, including “Knights of Sidonia,” “Ajin: Demi-Human,” “BLAME!,” and the “GODZILLA” trilogy. His most recent project, “Lupin III vs. Cat’s Eye,” is now available on Amazon Prime Video. Additionally, Seshita contributed to CG character direction in Shinkai Makoto’s “Suzume.”

Arai Mono
Representative Director and CTO of AIHUB Co., Ltd., where he also serves as an Artist and Engineer. He has been involved in numerous startups and projects, primarily where entertainment and tech cross over, holding roles such as Project Manager, Product Manager, and Architect. Arai played a role in establishing the Japan Linux Association and the Japan Medical Association ORCA Management Organization “ORCA Project.” Since founding AIHUB Co., Ltd., he has been focusing on research and development, developing use cases, social implementation of generative AI, and the fusion of responsible AI and web3 technology. He is a founding member of the Anime Chain Initiative, aiming to develop a clean foundation model for generative AI.

■The Mechanism of AI and Concerns
–Director Seshita, how long has it been since you dove into the world of 3DCG?

Seshita: Around 1987. I participated in a project for the Fujitsu Pavilion at the International Garden and Greenery Exposition (held in 1990), through an introduction by my vocational school teacher, Douglas Lerner. It was a part-time job on a project led by notable figures such as Dr. Nelson Max from Lawrence Livermore National Laboratory and Roman Kroitor, the co-founder of IMAX Corporation. Later, in 1989, I joined a CG production company called Links, where I ended up working on the Flora Dome (co-hosted by the Ministry of Posts and Telecommunications, NTT, and KDD) at the same exposition. I thought “Here I go again!” (laughs).

Arai: I see. At the time, I was a private secretary for Mr. Aikawa (Kiyoshi) of Omnibus Japan. I also received a lot of support from Links (laughs). In addition, I was part of a joint venture established by several companies, such as Omnibus Japan and Links.

Seshita: Wow! It seems we were quite close to each other.

Arai: However, I eventually moved away from 3DCG and decided to pursue development utilizing operating systems and open-source software. Currently, I work at the intersection of entertainment and tech, including AI.

Seshita: I remember there being an AI boom in the 1980s as well. At that time, a workstation called Symbolics, which used LISP (a high-level language used in AI research), was popular. I remember being amazed by simulations of flock movement. It was an era with many progressive ideas, but even though the theories were fascinating, there were many limitations in the computing environment, such as machine specs so low that a practical computation would have taken tens of thousands of years (laughs). That’s why I’m so excited that AI has become so common now, and I view the various discussions positively, as they are part of the process that brings practical applications closer.

Arai: The emergence of the transformer architecture in 2017 marked a turning point in the recent development of AI. It is a deep learning technique, and its arrival split the history of AI into a before and an after. In addition, current AI development has the open-source community as its backdrop: papers are shared and functions are implemented at an incredibly fast pace, which is another factor behind its rapid progress. The production cycle used to take much longer. Researchers from universities and companies would write papers, which would undergo a peer-review process before being published in academic journals. Only after that would actual products based on the research be developed.

Nowadays, papers are immediately uploaded online, and within a few days, plugins based on those papers are implemented. It’s the so-called “cathedral and bazaar” model. In AI development, new things are born in a lively, bazaar-like environment where individual stalls gather, rather than through one huge project completed stone by stone like a cathedral.

That’s why it is often said that what matters for AI innovation is creating an environment where a variety of talented people can engage with it. Against this backdrop, breakthroughs such as the “hierarchical merge” technique and the “ControlNet” extension have emerged.

Seshita: In the world of 3DCG, Blender is certainly making its presence known as a prominent piece of open-source software. Various ideas are constantly being realized and accumulated, stimulating each other and giving birth to new ideas. The benefits of the open-source community are being fully utilized, and it is becoming a dominant force in the industry.

Going back to AI, there’s a piece of news that caught my attention recently. The author of the novel that won the Akutagawa Prize (Kudan Rie’s “Tokyoto Dojo-to,” lit. “Tokyo Metropolitan Sympathy Tower”) mentioned that they had used AI for part of it. I wonder why they felt the need to reveal that information. For example, no matter how high-performance a word processor or pencil is, they are all just tools for creativity, right? AI is also one of those tools, so I thought they didn’t need to specifically mention it.

Arai: AI is indeed a new “pen” for creators to use. However, I think many people are anxious about the use of AI in the world because they can’t tell from the outside whether that pen is really safe and reliable.

Seshita: I personally want to utilize AI in some form in the future, and I’ve started experimenting with various things. However, when I look at the discussions surrounding AI, what worries me more than AI itself is that the legal arrangements and operational norms around it will move toward a large number of new regulations and constraints, ultimately reducing the overall creativity of the community.

■The Relationship Between AI Image Generators and Anime
— In the discussions surrounding AI image generators, I think people are concerned that the AI is being trained with images on the internet without the creators’ permission. Could you explain again what the training process of generative AI entails?

Arai: The training process of AI image generators can be roughly divided into three stages. The first is the training of the foundation model. Here, the AI is trained with basic information, such as how the world works and human concepts. AI image generators are trained with approximately 5 billion images on the internet. Next, in the form of additional training, the AI is fed with more specific information like anime styles or photorealistic styles, which is merged with the foundation model.

In the final process, called focused training, the AI is given data that can serve as a reference for the specific visuals desired. Article 30-4 of the Copyright Act does allow images to be used for at least foundation and additional training, as long as it is “use that is not intended for the enjoyment of ideas or feelings expressed in the copyrighted material.” On the other hand, when it comes to the use of the generated results, the judgment is based on “similarity” and “reliance,” regardless of whether the creator is an AI or a human. If both factors are present, it is recognized as copyright infringement.

— Foundation models are trained with 5 billion images?

Arai: To be precise, rather than a dataset being carefully built for training, the data is roughly gathered together, and problematic images, such as child pornography, are excluded mechanically. Nobody visually checks each image.

Seshita: In the case of a debate about whether an image generated by an AI is “similar or not,” would humans actually make that judgment? Even when humans are drawing, there are various possibilities. It could be plagiarism, homage, or even parody. It’s a tough process to reach a conclusion.

I have some concerns that if society moves in a direction where images with even the slightest similarity are regarded as copyright infringement, it may restrict creative activities. This led me to think that, in the future, there will be a need for an AI that assesses the reliance and similarity of AI-generated images, as well as for international standards. If there’s a seal of approval saying, “This is good to go,” people can take advantage of it with confidence.

Arai: The foundation models of current AI image generators are trained in a way that can’t wipe away such concerns. In fact, what we’re working on to clear up that problem is the development of an “AI image generator with a foundation model trained exclusively with authorized data” (Anime Chain Initiative “Anime Chain FAQ”). Proving that “AI wasn’t used” is extremely difficult, since it amounts to a devil’s proof (proving a negative), but proving “the use of safe AI” is possible.

Seshita: So the time is coming when we will also have to prove how an anime was made: its “origin and the raw materials used to create it,” so to speak.

Arai: I believe we need to take this approach to contain the spread of AI image generators that exploit Japanese anime and illustrations for their foundation and additional training.

Seshita: It’s just like food labels saying that the product was organically farmed or contains no genetically modified crops (laughs).

Arai: That’s right (laughs).

Seshita: However, anime is a collective effort by a staff of 100 to 300 people. Keeping a record for each output, one that can prove “safe AI is being used,” could be quite troublesome. Blockchain technology may help, but I’m worried that work outside the creative process will become a burden.

Arai: Rather than having the workers do it, I think it can be done in a way that makes a record at the application and device level.

Seshita: I’ve been thinking about using blockchain technology to reduce creators’ general office work for a while now. If it can be linked at the application and device level, I believe it can be applied to “recording authorship.” When using generative AI, being able to provide proof of the origins and a record of the production process will become a selling point of the product.

Arai: That’s right. I believe that if permission is obtained to use the images for training, a portion of the profits can be returned to the rights holders even for the use of generated results.

— Earlier, you mentioned that 5 billion images are used for training the foundation model. Will you need as many authorized images?

Arai: According to the latest papers, similar results can be achieved even with 20 to 30 million images, reducing the training time to one-tenth. Also, there’s been a tendency for the foundation model for generative AI to be trained to reflect Western aesthetics. It’s like the difference in flavor when using different types of broth.

Seshita: Broth? (laughs)

Arai: Yes (laughs). At present, the models have to be forced into generating output that suits Japanese tastes. It’s always better to have diverse values rather than a single dominant culture. A generative AI tailored to Japanese content will achieve better results in generating Japanese content. Furthermore, I think that’s where AI can help in spreading Japanese creativity.

Seshita: From a different perspective, it’s amazing that it could also lead to using AI to protect Japanese culture. Listening to you, it seems that the practical use of generative AI is much closer than I imagined, and I’m excited. I really want the day to come when I can casually use AI as my own assistant. One that doesn’t say “I can’t answer that” when I ask it to “Do something about the deadline” (laughs).

Arai: (laughs)

Seshita: I apologize for the conversation turning into sci-fi, but that’s been my ideal image of AI for a long time. It doesn’t have to be about direct instructions; I’d like to be able to have abstract conversations that lead to advice or inspiration. That would be a nice creative relationship with AI.