Apple, in collaboration with the University of California, Santa Barbara, has unveiled MGIE, a cutting-edge AI model transforming photo editing through intuitive text prompts. This MLLM-Guided Image Editing model enables users to articulate changes, from basic adjustments to intricate transformations, without conventional editing tools.
Multimodal Precision and Imagination
MGIE harnesses multimodal large language models to interpret user prompts and derive the concrete edits they imply. For instance, instructing the model to “make the sky bluer” translates into increased brightness in the sky region of the image. This fusion of linguistic understanding and visual imagination sets MGIE apart, making both simple and complex edits accessible.
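To make the idea concrete, here is a minimal toy sketch of the kind of pixel-level operation such an instruction could resolve to. This is not MGIE's actual pipeline (which couples a trained MLLM with a diffusion-based editor); the function name, the `sky_rows` region parameter, and the `boost` factor are all illustrative assumptions.

```python
import numpy as np

def apply_sky_edit(image: np.ndarray, sky_rows: int, boost: float = 1.3) -> np.ndarray:
    """Toy stand-in for the concrete edit derived from 'make the sky bluer':
    brighten the top (sky) region of an RGB image.
    `sky_rows` and `boost` are illustrative parameters, not part of MGIE."""
    edited = image.astype(np.float32)
    # Scale brightness only in the assumed sky region, then clamp to 8-bit range.
    edited[:sky_rows] *= boost
    return np.clip(edited, 0, 255).astype(np.uint8)

# A 4x4 mid-gray image; the top two rows stand in for the sky.
img = np.full((4, 4, 3), 100, dtype=np.uint8)
out = apply_sky_edit(img, sky_rows=2)
print(out[0, 0, 0], out[3, 0, 0])  # sky pixel brightened, ground row unchanged
```

The point of MGIE is that the user never writes code like this: the model infers which region to touch and which operation to apply from plain language.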
User-Friendly Editing Experience
Editing with MGIE is as simple as typing the desired change. In one example, typing “make it healthier” over an image of a pepperoni pizza adds vegetable toppings. Similarly, a dark image of tigers in the Sahara can be fixed by instructing the model to “add more contrast to simulate more light,” yielding a brighter result.
Apple has made MGIE open source on GitHub, accompanied by a web demo on Hugging Face Spaces. The company remains tight-lipped about the model’s future applications beyond research, notes NIX Solutions.
Diverse Landscape of AI Image Generation
While giants like Microsoft, Meta, and Google dominate generative AI, Apple aims to bolster its AI capabilities. CEO Tim Cook’s commitment to enhancing AI features on Apple devices aligns with the release of MLX, an open-source machine learning framework for training AI models on Apple chips.