Vietnamese Researcher at Google Uses AI to Convert Text into Images

Dr. Lương Minh Thắng and 10 experts at Google Brain developed the Parti model, teaching artificial intelligence to create images based on descriptive text.

Dr. Thắng (34 years old) is the only Vietnamese member of the core research team at Google Brain that has been working since early 2021 on Parti (Pathways Autoregressive Text-to-Image), a model that converts text into images. Human language is commonly used for communication, but “if we apply technology to create images and artistic works, it can be considered a new advancement in AI,” Dr. Thắng stated.

Dr. Lương Minh Thắng currently works for Google Brain, specializing in AI product development. (Photo: provided by the subject).

He noted that current AI language models, deployed in chatbot systems, can already interact with humans via text, while in the field of images, AI can recognize objects in photos. “If we combine these two aspects to transform textual language into images, it will create a very modern AI model that significantly supports human creativity in image production,” Dr. Thắng said, explaining the rationale behind the Parti model.

The Parti model allows users to generate images that closely reflect their descriptions and intentions. This technology can assist professionals in image creation, such as artists, photographers, fashion designers, and graphic designers. When they have an idea for a photo, they simply need to write down the desired details, and the AI will analyze them and produce a suggested image to support their creative process. Changing a sentence, word, or detail in the text can yield a different image.

Images generated by AI based on the textual descriptions below. (Screenshot)

To create the Parti model, Dr. Thắng and the Google team trained the AI on hundreds of millions of paired text-image examples. The data was sourced from various websites and processed using an artificial neural network with approximately 20 billion parameters. “Based on the textual and image data, the AI will combine them to create a new image, helping humans generate new ideas,” Dr. Thắng shared.

The most common themes represented by the Parti model include nature, animals, and objects. The Google Research website showcases many AI-generated images that appear to be real.

According to the research team, images of people are handled carefully, following principles intended to avoid harming communities with respect to gender, race, and religion.

Oil paintings in the style of the famous artist Van Gogh, created by AI. (Screenshot)

A current limitation is that when given an overly long text, too many details, or conflicting scenes (such as a beach next to a desert), the AI may misunderstand the request or fail to produce a result.

Dr. Thắng noted that in the future, the team aims to overcome these limitations to build a more complete AI model. They plan to train the AI to edit images according to users' text requests, so that it better serves users' needs, and to research creating videos from multiple similar images.

Lương Minh Thắng was a student specializing in Mathematics at the High School for Gifted Students, Vietnam National University, Ho Chi Minh City. After graduating from high school, he studied computer science at the National University of Singapore. In 2011, he received a PhD scholarship to Stanford University (USA). In September 2016, he officially joined Google Brain, specializing in research on machine learning and natural language processing.