Introduction
Computer vision is a field of AI that enables machines to interpret and understand visual information from the world, such as images, videos, and live camera feeds. Computer vision capabilities are powered by AI models and support the automation of many time-intensive tasks.
This module discusses AI models that can identify and analyze objects, recognize patterns, read text within images, and interpret scenes much like a human would. It also covers visual AI models that go beyond image analysis to generate new visual content. Together, these capabilities allow systems to both see and create visual information, enabling a wide range of applications, from image search and document analysis to creative tools and interactive AI experiences.
Consider these applications of computer vision:
Defect detection in manufacturing: AI vision systems inspect products on assembly lines in real time. They detect surface defects, misalignments, or missing components using object detection and image segmentation, reducing waste and improving quality control.
Medical imaging analysis: Computer vision helps radiologists analyze X-rays, MRIs, and CT scans. AI models can highlight anomalies like tumors or fractures, assist in early diagnosis, and reduce human error.
Shelf monitoring in retail: Retailers use AI vision to monitor store shelves. Cameras detect when products are out of stock or misplaced, enabling real-time inventory updates and improving customer experience.
Autonomous vehicles: Self-driving cars rely on computer vision to recognize road signs, lane markings, pedestrians, and other vehicles. This enables safe navigation and decision-making in dynamic environments.
Next, explore multimodal models in Microsoft Foundry, Microsoft's unified platform-as-a-service offering on Azure for enterprise AI operations and application development.
Note
We recognize that different people like to learn in different ways. You can choose to complete this module in video-based format or you can read the content as text and images. The text contains greater detail than the videos, so in some cases you might want to refer to it as supplemental material to the video presentation.