Summary

In this module, we explored vision-capable models in Microsoft Foundry and how to use them to analyze images and to generate original images and videos.

The module covered multimodal models, which support image analysis. We also covered image generation models, such as those in the GPT-Image family, for creating and editing images from prompts using Foundry tools and APIs. Finally, we introduced video generation with Sora models, which enable text‑to‑video and image‑to‑video creation through both interactive playgrounds and programmatic, asynchronous REST workflows.
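As a reminder of what an asynchronous workflow looks like in practice, here is a minimal sketch of the submit-then-poll pattern: a job is created, and the client then checks its status until it reaches a terminal state. Everything named here is an assumption made for illustration: the `poll_until_done` helper, the status strings, and the `fake_get_status` stand-in are hypothetical and are not taken from the Foundry API. In a real application, `get_status` would wrap an HTTP GET against the job's status endpoint.

```python
import time
from typing import Callable, Dict


def poll_until_done(
    get_status: Callable[[], Dict[str, str]],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Call get_status repeatedly until the job reaches a terminal state.

    The terminal status names below are assumptions for this sketch,
    not values confirmed by any particular service.
    """
    waited = 0.0
    while True:
        job = get_status()
        if job["status"] in ("succeeded", "failed", "cancelled"):
            return job
        if waited >= timeout_s:
            raise TimeoutError(f"job still {job['status']!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Hypothetical stand-in for the service: reports "queued", then "running",
# then "succeeded". A real client would issue an HTTP request here instead.
_responses = iter([
    {"id": "job-1", "status": "queued"},
    {"id": "job-1", "status": "running"},
    {"id": "job-1", "status": "succeeded"},
])


def fake_get_status() -> Dict[str, str]:
    return next(_responses)


# Zero interval and a no-op sleep keep the demonstration instant.
result = poll_until_done(fake_get_status, interval_s=0.0, sleep=lambda _: None)
print(result["status"])
```

Passing the status check in as a function keeps the polling loop independent of any specific HTTP client, and the timeout prevents the loop from waiting forever if a job never finishes.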

Overall, visual AI models in Microsoft Foundry help bridge the gap between visual data and language‑based AI. They enable scenarios such as document and image analysis, visual assistants, accessibility tools, and multimodal AI agents, which makes image understanding a natural extension of modern AI applications.

To learn more, check out the following links: