04-800-K AIOps: Continuous and Automated IT and AI Monitoring
Location: Africa
Units: 12
Semester Offered: Fall
This course builds on and integrates software engineering, AI/ML, and IT skills in support of automated methods for assuring the highest level of system availability and resilience. Students will apply tools including Docker, Kubernetes, and AI-based anomaly detection models to monitor and correct hybrid cloud applications as they experience simulated disruptions and outages. Lab exercises will deploy these monitoring tools in continuous integration/delivery pipelines for proactive control, as opposed to traditional reactive and manual incident response.
The course will focus on automated monitoring of both IT components (e.g., code-based implementations of a distributed application) and AI/ML components (e.g., models and their associated pre- and post-deployment pipelines).
The course will review basic concepts of DevOps including Docker, CI/CD pipelines, and the microservices architectures used in hybrid cloud deployments. This background provides preparation for deep dives into real-time operational data gathering and the automated tools available for anomaly detection, reporting, and even self-correction.
For IT monitoring, automated monitoring will draw on multiple types of performance data, including structured metrics such as CPU, memory, and network utilization, as well as emerging methods for analyzing unstructured content such as the text of application logs.
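As an illustration of anomaly detection on a structured metric, the sketch below flags CPU utilization samples that deviate sharply from a trailing window of recent values. It is a minimal example under stated assumptions, not the method used in the course labs: the function name `detect_anomalies`, the window size, the threshold, and the sample data are all hypothetical.

```python
from statistics import mean, stdev


def detect_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples whose z-score against the trailing
    `window` samples exceeds `threshold` (hypothetical helper)."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change from the constant level as anomalous.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up CPU utilization readings (percent) with one spike at index 6.
cpu_percent = [21.0, 22.5, 20.8, 23.1, 21.7, 22.0, 95.3, 22.4, 21.9]
print(detect_anomalies(cpu_percent))
```

A trailing window keeps the baseline local, so slow changes in load are not flagged, but a single spike inflates the window's standard deviation and can mask anomalies that follow it closely.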
For AI monitoring, extensions of DevOps methods for AI will introduce ModelOps and its application to multiple types of quality metrics for automated model monitoring. Model accuracy, precision, recall, and bias will be evaluated at initial deployment and tracked for post-deployment drift over time, so that violations of quality standards can be predicted before they occur.
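One simple way to picture predicting a quality violation from drift is to fit a straight line to a model's periodic accuracy measurements and extrapolate to the point where it crosses a minimum acceptable value. The sketch below assumes a linear trend, which real drift need not follow; the function name `weeks_until_violation`, the weekly cadence, and the numbers are hypothetical and not taken from the course materials.

```python
def weeks_until_violation(accuracies, minimum):
    """Estimate weeks after the latest measurement until accuracy falls
    below `minimum`, using a least-squares linear trend.

    Returns None when the trend is flat or improving (hypothetical helper).
    """
    n = len(accuracies)
    if n < 2:
        raise ValueError("need at least two measurements to fit a trend")
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(accuracies) / n
    # Ordinary least-squares slope and intercept of accuracy versus week index.
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, accuracies)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    if slope >= 0:
        return None
    crossing_week = (minimum - intercept) / slope
    # Clamp at zero: the fitted line may already be below the minimum.
    return max(0.0, crossing_week - (n - 1))


# Made-up weekly accuracy measurements showing steady decay.
weekly_accuracy = [0.94, 0.93, 0.92, 0.91, 0.90]
print(weeks_until_violation(weekly_accuracy, minimum=0.85))
```

The same extrapolation could be applied to precision, recall, or a bias metric; the choice of a linear fit is only the simplest option for illustration.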
Hands-on lab work will include tools for configuring and executing pipelines for continuous integration and delivery, deploying applications as microservices with embedded monitoring instrumentation, dashboards for collecting performance data, and multiple methods for real-time tracking of such data with automated anomaly detection and repair.
The course consists of weekly hands-on assignments and a final project that integrates the methods covered in the class.
In this course, we will:
By the end of this course, you will be better able to:
The class is taught through weekly lectures and assignments according to this general schedule:
Grading is based on written assignments, a final portfolio of work, participation, and attendance.
Strong background in Python programming and exposure to DevOps and cloud technologies such as Docker and microservices.