
Microsoft researchers have released an open-source artificial intelligence (AI) framework for agents that operate in cloud environments. Called AIOpsLab, it is a principled research framework that enables developers to build, test, compare, and improve AIOps agents. The framework is backed by Azure AI Agent Service. AIOpsLab uses an intermediary interface, workload and fault generators, and an observability layer that exposes a wide range of telemetry data. Notably, the company said that a research paper on the framework was accepted at the annual ACM Symposium on Cloud Computing (SoCC’24).
Microsoft Releases AIOpsLab for Cloud-Based Agents
Cloud-based services, and the businesses that rely on them, often face significant operational challenges, especially in fault diagnosis and mitigation. AIOps agents, short for AI agents for IT operations, are software-based tools used to monitor, analyze, and optimize cloud systems and to address these operational challenges.
In a blog post, Microsoft researchers stressed that existing AIOps agents rely on proprietary services and datasets for tasks such as incident root cause analysis (RCA) or triage, and use frameworks that suit only one particular solution. Neither approach captures the dynamic nature of real-world cloud services.
To address this pain point, the company released AIOpsLab, an open-source, standardized framework that lets developers and researchers design, develop, evaluate, and enhance the capabilities of agents. One of the fundamental ways it solves the problem is by strictly separating the agent from the application service through an intermediary interface. This interface can also be used to integrate and extend other parts of the system.
This separation allows an AIOps agent to work through a problem step by step, mimicking real-life situations. For example, the agent can be taught to first read the problem description, then understand the instructions, and then call the available application programming interfaces (APIs) as actions.
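The step-by-step interaction described above can be illustrated with a minimal Python sketch. It is not AIOpsLab's actual API: `ToyService`, `Mediator`, `scripted_agent`, and `run_episode` are hypothetical names invented here, and a trivial rule-based function stands in for a real AI agent. The point is only that the agent never touches the service directly; it reads a problem description and calls a fixed set of actions through an interface that sits in between.

```python
from dataclasses import dataclass, field


@dataclass
class ToyService:
    """Stand-in for a cloud application (hypothetical, not part of AIOpsLab)."""
    healthy: bool = False
    logs: list = field(
        default_factory=lambda: ["ERROR: cart-service connection refused"]
    )


class Mediator:
    """Hypothetical intermediary: the agent only ever talks to this class."""

    def __init__(self, service):
        self._service = service
        # Fixed set of actions the agent is allowed to call.
        self._actions = {"get_logs": self._get_logs, "restart": self._restart}

    def problem_description(self):
        return "A service is failing. Available actions: " + ", ".join(
            sorted(self._actions)
        )

    def act(self, name):
        if name not in self._actions:
            return f"unknown action: {name}"
        return self._actions[name]()

    def solved(self):
        return self._service.healthy

    def _get_logs(self):
        return "\n".join(self._service.logs)

    def _restart(self):
        self._service.healthy = True
        return "restarted"


def scripted_agent(observation):
    """Trivial rule-based policy standing in for a real AI agent."""
    if "ERROR" in observation:
        return "restart"
    return "get_logs"


def run_episode(mediator, agent, max_steps=5):
    """Feed observations to the agent and apply its actions, one step at a time."""
    observation = mediator.problem_description()
    trace = []
    for _ in range(max_steps):
        action = agent(observation)
        observation = mediator.act(action)
        trace.append(action)
        if mediator.solved():
            break
    return trace
```

Calling `run_episode(Mediator(ToyService()), scripted_agent)` in this toy setup yields the action trace `["get_logs", "restart"]`. Because the agent sees only strings returned by the mediator, a different agent or a different service can be swapped in without changing the other side.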
AIOpsLab also comes with workload and fault generators that can be used to train these AI agents. They simulate both faulty and normal scenarios so that AIOps agents can learn how to resolve issues and eliminate any unwanted behavior.
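To make the idea of paired workload and fault generation concrete, here is a small Python sketch under stated assumptions. The endpoint names, the `generate_workload`, `handle`, and `run_scenario` functions, and the use of HTTP-style status codes are all illustrative inventions; the framework's real generators are not described in this article.

```python
import random


def generate_workload(n_requests, seed=0):
    """Produce a reproducible stream of requests against made-up endpoints."""
    rng = random.Random(seed)
    endpoints = ["/cart", "/checkout", "/search"]
    return [rng.choice(endpoints) for _ in range(n_requests)]


def handle(request, faulty_endpoints):
    """Toy service: an injected fault makes the endpoint return 500."""
    return 500 if request in faulty_endpoints else 200


def run_scenario(n_requests, faulty_endpoints=frozenset(), seed=0):
    """Replay the same workload with or without injected faults."""
    requests = generate_workload(n_requests, seed)
    statuses = [handle(r, faulty_endpoints) for r in requests]
    return {
        "requests": len(statuses),
        "errors": sum(1 for s in statuses if s == 500),
    }
```

Running `run_scenario(100)` gives a fault-free baseline, while `run_scenario(100, faulty_endpoints={"/cart"})` replays the identical traffic with one endpoint broken. Keeping the seed fixed means any difference between the two runs comes from the injected fault rather than from the workload.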
In addition, AIOpsLab has an extensible observability layer that provides monitoring capabilities to developers. Although the system collects extensive telemetry data, the framework can display only the data relevant to a particular agent, giving developers a granular way to make changes.
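The filtering behavior described above might look roughly like the following Python sketch. It assumes, purely for illustration, that relevance is decided by service name and telemetry type; `TelemetryRecord` and `relevant_telemetry` are hypothetical names, not part of the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TelemetryRecord:
    kind: str      # assumed categories: "log", "metric", or "trace"
    service: str   # name of the service that emitted the record
    payload: str


def relevant_telemetry(records, services, kinds=("log", "metric", "trace")):
    """Keep only records from the given services and of the given kinds."""
    return [r for r in records if r.service in services and r.kind in kinds]
```

An agent investigating a cart failure could then be handed `relevant_telemetry(records, services={"cart"}, kinds=("log",))` instead of the full telemetry stream.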
AIOpsLab currently supports four key tasks within the AIOps domain: incident detection, localization, root cause diagnosis, and mitigation. Microsoft’s open-source AI framework is currently available on GitHub under a license that permits both personal and commercial use.