Social media giant Meta has introduced two new generative artificial intelligence (AI) tools for creating and editing images and videos uploaded to Facebook and Instagram.
In a Nov. 16 post, Meta said Emu Video and Emu Edit would allow users to edit videos and images using text prompts. These tools are built on Meta’s Emu, the firm’s first foundational model for image generation.
The social media company added that the potential use cases of these tools are limitless, as they can help people express themselves in new ways.
Meta did not reveal when these tools would become publicly available for users. The firm has yet to respond to CryptoSlate’s request for additional commentary.
Emu Video
Emu Video allows users to create four-second-long videos using text prompts and reference images. According to Meta, Emu Video leverages the firm’s Emu model with a text-to-video feature based on diffusion models.
The video generation process involves two steps: first, an image is generated from a text prompt; then, a video is created using the generated image alongside its corresponding caption.
Additionally, the tool can "animate" user-provided images based on a text prompt.
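To make the two-step flow concrete, the sketch below shows how such a pipeline could be wired together. Emu Video has no public API, so the `text_to_image` and `image_to_video` functions are hypothetical stand-ins, and the frame objects are placeholders rather than real pixel data.

```python
# Purely illustrative sketch of the two-step text -> image -> video flow.
# Emu Video is not publicly available; these names and the frame rate are
# assumptions made for illustration only.
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """Placeholder for a generated image (e.g., a pixel buffer)."""
    prompt: str


def text_to_image(prompt: str) -> Frame:
    # Step 1: a text-to-image diffusion model renders a still frame
    # from the user's prompt.
    return Frame(prompt=prompt)


def image_to_video(frame: Frame, prompt: str, seconds: int = 4) -> List[Frame]:
    # Step 2: a video diffusion model animates the still frame,
    # conditioned on both the image and the original caption.
    frames_per_second = 16  # assumed frame rate, for illustration only
    return [frame] * (seconds * frames_per_second)


prompt = "a dog surfing a wave at sunset"
still = text_to_image(prompt)          # text -> reference image
clip = image_to_video(still, prompt)   # image + caption -> ~4-second clip
print(f"Generated {len(clip)} placeholder frames")
```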
Meta said:
“In human evaluations, our video generations are strongly preferred compared to prior work—in fact, this model was preferred over Make-A-Video by 96% of respondents based on quality and by 85% of respondents based on faithfulness to the text prompt.”
Emu Edit
Emu Edit offers a user-friendly way to tweak images using text instructions.
According to the firm, the tool “streamlines various image manipulation tasks and brings enhanced capabilities and precision to image editing.”
The tool will allow users to manipulate an image's background, tweak the color and geometry of objects within it, and perform a range of other edits.
Meta said:
“Emu Edit precisely follows instructions, ensuring that pixels in the input image unrelated to the instructions remain untouched.”
Meta’s Emu Edit tool can achieve this level of precision because it relies on a dataset containing 10 million synthesized samples, the largest of its kind.
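Meta has not detailed how Emu Edit enforces this behavior internally, but the property described in the quote can be illustrated with a toy sketch: an instruction-guided edit whose output is blended back so that pixels outside the edited region stay identical to the input. The `run_edit_model` function below is hypothetical, and the masked blend is a generic illustration rather than Meta's stated method.

```python
# Toy illustration of "pixels unrelated to the instruction remain untouched."
# run_edit_model is a hypothetical stand-in, not Emu Edit's real API.
import numpy as np


def run_edit_model(image: np.ndarray, instruction: str):
    """Pretend model call: returns (edited_image, mask), where mask is 1
    over the pixels the instruction refers to and 0 everywhere else."""
    mask = np.zeros(image.shape[:2], dtype=np.float32)
    mask[16:48, 16:48] = 1.0      # pretend the instruction targets this region
    edited = image.copy()
    edited[16:48, 16:48] = 255    # pretend the model recolors that region
    return edited, mask


original = np.zeros((64, 64, 3), dtype=np.uint8)
edited, mask = run_edit_model(original, "turn the hat red")

# Blend so pixels outside the mask stay bit-for-bit identical to the input.
result = np.where(mask[..., None] > 0.5, edited, original)
assert np.array_equal(result[0, 0], original[0, 0])           # background untouched
assert not np.array_equal(result[20, 20], original[20, 20])   # edited region changed
```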