For most of photography’s roughly 200-year history, altering a photo convincingly required either a darkroom, some Photoshop expertise, or, at minimum, a steady hand with scissors and glue. On Tuesday, OpenAI released a tool that reduces the process to typing a sentence.
It’s not the first company to do so. While OpenAI had a conversational image-editing model in the works since GPT-4o in 2024, Google beat OpenAI to market in March with a public prototype, then refined it into a popular image model called Nano Banana (and Nano Banana Pro). The enthusiastic response to Google’s image-editing model in the AI community got OpenAI’s attention.
OpenAI’s new GPT Image 1.5 is an AI image synthesis model that reportedly generates images up to four times faster than its predecessor and costs about 20 percent less through the API. The model rolled out to all ChatGPT users on Tuesday and represents another step toward making photorealistic image manipulation a casual process that requires no particular visual skills.
The “Galactic Queen of the Universe” added to a photo of a room with a sofa using GPT Image 1.5 in ChatGPT.
GPT Image 1.5 is notable because it’s a “native multimodal” image model, meaning image generation happens inside the same neural network that processes language prompts. (In contrast, DALL-E 3, an earlier OpenAI image generator previously built into ChatGPT, used a different technique called diffusion to generate images.)
This newer type of model, which we covered in more detail in March, treats images and text as the same kind of thing: chunks of data called “tokens” to be predicted, patterns to be completed. If you upload a photo of your dad and type “put him in a tuxedo at a wedding,” the model processes your words and the image pixels in a unified space, then outputs new pixels the same way it would output the next word in a sentence.
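To make the “everything is a token” idea concrete, here is a toy sketch in Python. It is not OpenAI’s implementation, and the details of how GPT Image 1.5 tokenizes images are not public. Every name here (`TEXT_VOCAB`, `NUM_IMAGE_CODES`, `toy_next_token`, `generate`) is hypothetical, and a simple arithmetic rule stands in for the neural network. The only point is the shape of the loop: text and image tokens share one sequence, and new image tokens are produced one at a time, the same way a language model produces the next word.

```python
from typing import Callable, List

# Hypothetical shared vocabulary: text tokens first, then image "patch" codes.
TEXT_VOCAB = ["put", "him", "in", "a", "tuxedo"]
NUM_IMAGE_CODES = 8                # assumed size of a toy image codebook
IMAGE_OFFSET = len(TEXT_VOCAB)     # image token ids start after the text ids


def encode_text(words: List[str]) -> List[int]:
    """Map words to token ids in the shared vocabulary."""
    return [TEXT_VOCAB.index(w) for w in words]


def encode_image(patch_codes: List[int]) -> List[int]:
    """Map toy image patch codes to token ids in the same vocabulary."""
    return [IMAGE_OFFSET + c for c in patch_codes]


def toy_next_token(sequence: List[int]) -> int:
    """Stand-in for the neural network: a deterministic rule that
    always returns an image token based on everything seen so far."""
    return IMAGE_OFFSET + (sum(sequence) % NUM_IMAGE_CODES)


def generate(prompt: List[int], n_new: int,
             predict: Callable[[List[int]], int] = toy_next_token) -> List[int]:
    """Autoregressive loop: predict one token, append it, repeat."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(predict(seq))
    return seq[len(prompt):]       # only the newly generated tokens


# One sequence holds both the instruction and the uploaded photo.
prompt = encode_text(["put", "him", "in", "a", "tuxedo"]) + encode_image([3, 1, 4, 1])
new_image_tokens = generate(prompt, n_new=4)
```

In a real system, a learned model would replace `toy_next_token`, and a separate decoder would turn the generated image tokens back into pixels.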
Using this technique, GPT Image 1.5 can more easily alter visual reality than earlier AI image models, changing someone’s pose or position, or rendering a scene from a slightly different angle, with varying degrees of success. It can also remove objects, change visual styles, adjust clothing, and refine specific areas while preserving facial likeness across successive edits. You can converse with the AI model about a photograph, refining and revising, the same way you might workshop a draft of an email in ChatGPT.