AIOps vs. MLOps: exploring the differences and unlocking the benefits for CSPs

18 July 2023 | Research

Joseph Attwood

Article | PDF (6 pages) | AI and Data Platforms



AIOps refers to the application of artificial intelligence (AI) insights to IT and network operations. Communications service providers (CSPs) are adopting AIOps to cope with the increasing complexity of managing their IT and network environments, especially as they evolve these environments to become cloud-native. CSPs are also adopting machine learning (ML) operations (MLOps) solutions to simplify the development – and enhance the accuracy – of their AI/ML models. MLOps will be applied both to the models used in AIOps solutions and to models for other AI use cases that support business functions such as marketing, customer care and network optimisation.

What is AIOps?

AIOps combines the use of big data and ML technologies to simplify IT and network operations. AIOps solutions collect historical and real-time telemetry data from systems as well as data from IT and network operations tools, including monitoring and ticketing systems. These solutions use an ML model (or AI engine) to analyse and correlate this data. As illustrated in Figure 1, the AI engine can identify operational issues, recommend remedial actions to engineers and trigger automated responses.
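The identify, recommend and respond steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular AIOps product works: a simple statistical baseline (mean and standard deviation of historical telemetry) stands in for a trained ML model, and the metric names and the `RUNBOOK` mapping are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical runbook: maps a metric to the remediation an engineer would normally apply.
RUNBOOK = {"latency_ms": "restart_pod"}


def build_baseline(history):
    """Summarise historical telemetry for one metric as (mean, sample standard deviation)."""
    return mean(history), stdev(history)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a real-time sample whose z-score against the baseline exceeds the threshold."""
    mu, sigma = baseline
    return sigma > 0 and abs(value - mu) / sigma > threshold


def handle_sample(metric, value, baseline, auto_remediate=False):
    """Identify an issue, look up a remedial action, then either recommend or trigger it."""
    if not is_anomalous(value, baseline):
        return None
    action = RUNBOOK.get(metric, "escalate_to_engineer")
    # Closed loop: execute automatically; otherwise hand the recommendation to an engineer.
    return ("executed", action) if auto_remediate else ("recommended", action)


history = [20, 22, 21, 19, 20, 23, 21, 20]  # illustrative historical latency samples (ms)
baseline = build_baseline(history)
print(handle_sample("latency_ms", 21, baseline))  # None
print(handle_sample("latency_ms", 95, baseline))  # ('recommended', 'restart_pod')
```

The threshold of three standard deviations is an arbitrary choice for the example; a real AIOps engine would rely on learned models and would feed its output into ticketing and orchestration systems.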

Figure 1: The steps involved in AIOps



Unlike CSPs’ current monitoring tools, which typically operate within a single domain, AIOps solutions can correlate ingested data from multiple sources across different domains. AIOps solutions can therefore serve as an overlay that analyses data from disparate toolsets to identify anomalies and patterns hidden in siloed data environments. With the right data modelling capabilities in place, events in one system can be contextualised using information held in other systems, which makes it easier for IT and network engineers to understand and resolve issues.
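A minimal sketch of cross-domain correlation is shown below. It assumes that the data modelling step has already mapped each tool's identifiers onto a shared resource key (here a hypothetical site ID). Alerts from different sources that affect the same resource within a time window are grouped into a single incident; commercial AIOps products use considerably richer correlation logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str       # originating tool or domain, e.g. "ran-monitor"
    resource: str     # shared key produced by data modelling (assumed), e.g. "site-42"
    timestamp: float  # seconds


def correlate(alerts, window=300.0):
    """Group alerts on the same resource that arrive within `window` seconds of the
    previous alert in the group. Returns a list of incidents (lists of alerts)."""
    incidents = []
    latest = {}  # resource -> incident currently open for that resource
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        incident = latest.get(alert.resource)
        if incident is not None and alert.timestamp - incident[-1].timestamp <= window:
            incident.append(alert)
        else:
            incident = [alert]
            latest[alert.resource] = incident
            incidents.append(incident)
    return incidents


alerts = [
    Alert("ran-monitor", "site-42", 0),
    Alert("transport-nms", "site-42", 60),
    Alert("it-ticketing", "site-42", 120),
    Alert("ran-monitor", "site-7", 100),
    Alert("ran-monitor", "site-42", 1000),
]
for incident in correlate(alerts):
    print(incident[0].resource, sorted({a.source for a in incident}))
```

Here three alerts raised by RAN, transport and IT tools against the same site collapse into one incident, which is the kind of cross-silo context that single-domain tools cannot provide.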

AIOps solutions commonly discover IT and network asset topologies, an environment’s resources and the dependencies between applications and services automatically. Because these solutions create and maintain an up-to-date model of the IT and network environment, issues can be traced across and between assets to find their root cause.
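Once a topology model exists, tracing a symptom to its root cause can be as simple as walking the dependency graph. The sketch below assumes a hypothetical topology stored as a dictionary that maps each component to its dependencies, plus a set of components currently reporting faults. It reports the deepest failing components reachable from the symptom, i.e. those whose own dependencies are all healthy, and it assumes that failing components do not form a dependency cycle.

```python
def root_causes(dependencies, failing, symptom):
    """Return failing components reachable from `symptom` (via failing dependencies)
    whose own dependencies are all healthy."""
    causes, seen, stack = set(), set(), [symptom]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        failing_deps = [d for d in dependencies.get(node, []) if d in failing]
        if node in failing and not failing_deps:
            causes.add(node)  # nothing further down the graph explains this failure
        stack.extend(failing_deps)  # keep following the fault into its dependencies
    return causes


# Hypothetical topology discovered by an AIOps tool.
topology = {
    "billing-app": ["billing-db", "k8s-node-3"],
    "billing-db": ["storage-array"],
    "k8s-node-3": ["router-1"],
}
print(root_causes(topology, {"billing-app", "billing-db", "storage-array"}, "billing-app"))
# {'storage-array'}
```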

AIOps solutions continually improve as their constituent AI/ML models learn from the data to which they are exposed. For example, AIOps solutions can learn appropriate responses to issues by analysing ticketing records to understand how engineers have resolved past issues (as well as through explicit specification of how to respond to particular types of issue).
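Learning responses from past tickets can be approximated, very crudely, by counting which resolution engineers applied most often to each issue type and letting explicitly specified rules take precedence. The frequency count below is a deliberately simple stand-in for the ML models described above; the ticket fields, issue types and `min_support` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def learn_resolutions(tickets, explicit_rules=None, min_support=2):
    """Map each issue type to its most frequent historical resolution.

    tickets: iterable of (issue_type, resolution) pairs taken from closed tickets.
    explicit_rules: engineer-specified responses that override learned ones.
    min_support: minimum number of tickets before a learned response is trusted.
    """
    counts = defaultdict(Counter)
    for issue_type, resolution in tickets:
        counts[issue_type][resolution] += 1

    learned = {}
    for issue_type, counter in counts.items():
        if sum(counter.values()) >= min_support:
            learned[issue_type] = counter.most_common(1)[0][0]
    learned.update(explicit_rules or {})  # explicit specification wins over learned behaviour
    return learned


history = [
    ("link_down", "reroute_traffic"),
    ("link_down", "reroute_traffic"),
    ("link_down", "replace_optic"),
    ("disk_full", "purge_logs"),
]
print(learn_resolutions(history, explicit_rules={"disk_full": "expand_volume"}))
# {'link_down': 'reroute_traffic', 'disk_full': 'expand_volume'}
```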

What is the difference between AIOps and MLOps?

AIOps and MLOps are two concepts that are often misunderstood in the telecoms industry. Both relate to AI/ML and to the adoption of DevOps¹ principles and practices. AIOps is an AI/ML use case that is applied to IT and network operations, whereas MLOps addresses the development of ML models and the management of their lifecycle.

AIOps requires MLOps capabilities so that CSPs can quickly develop and deploy AI models in their AIOps platforms and prevent the performance of AIOps solutions from degrading over time.

What are the use cases of AIOps for CSPs?

AIOps will help CSPs to simplify operations and to analyse the increasingly large volumes of data that they generate. The most effective way to generate insights from such vast volumes of data is by using AI/ML. Therefore, many CSPs believe that AIOps will be a key part of their new operating models.

Examples of AIOps use cases are given below. All of them involve correlating data end-to-end from IT and network environments, across layers and across different platforms and technologies, and then generating insights from this data. The use cases related to IT and network assurance and to managing cloud-native environments also use AIOps for troubleshooting and problem remediation.

  • Intelligent alert handling. AIOps solutions can filter out ‘noisy’ (false positive or unhelpful) alerts and prioritise alerts to ensure that IT and network operations teams address the most relevant issues within their environments.
  • Supporting DevOps. AIOps can enhance DevOps-based approaches to software development by generating insights from logs, metrics, traces, usage data and feedback from newly released software. These insights can be used by developers to improve future software releases. For example, to support DevOps, Turkcell has built its own AIOps platform on top of Red Hat OpenShift.
  • Security. Verizon and Vodafone are examples of CSPs that are using AIOps to improve IT and network security through early detection of anomalies that may indicate cyberattacks (such as distributed denial of service (DDoS) attacks).
  • IT and network assurance. Operations teams can use AIOps solutions to identify and resolve issues. AIOps tools can also analyse current trends to detect and predict issues that may arise in the future, allowing CSPs to take preventative action. In advanced solutions, remediations can be activated automatically, facilitating closed-loop automation. BT has stated that it plans to use Dynatrace’s AIOps solution to introduce self-healing capabilities by 2025. However, many early CSP adopters are focussing their use of AIOps on anomaly detection.
  • Managing cloud-native environments. Cloud-native methodologies break down traditional monolithic applications into ephemeral, loosely coupled microservices, which makes applications and networks more complex to manage. AIOps can reduce some of this additional complexity, as explained in our report AIOps: supporting operations of CSPs’ IT and network environments.
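Intelligent alert handling, the first use case above, comes down to suppressing noise and ranking what remains. The sketch below is a minimal illustration under stated assumptions: duplicates are identified by a (resource, alert type) fingerprint, ‘noisy’ alert types are supplied as a fixed list (an AIOps tool would typically learn these from data), and the severity weights and impact score are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical weights; a real deployment would tune or learn these.
SEVERITY_WEIGHT = {"critical": 3, "major": 2, "minor": 1}


@dataclass(frozen=True)
class Alert:
    resource: str
    kind: str
    severity: str
    customers_affected: int


def triage(alerts, noisy_kinds=frozenset()):
    """Drop known-noisy and duplicate alerts, then rank the rest by estimated impact."""
    seen, kept = set(), []
    for alert in alerts:
        fingerprint = (alert.resource, alert.kind)
        if alert.kind in noisy_kinds or fingerprint in seen:
            continue
        seen.add(fingerprint)
        kept.append(alert)
    # Impact score: severity weight scaled by the number of customers affected.
    return sorted(
        kept,
        key=lambda a: SEVERITY_WEIGHT[a.severity] * (1 + a.customers_affected),
        reverse=True,
    )


alerts = [
    Alert("cell-1", "heartbeat_missed", "minor", 0),
    Alert("core-db", "replication_lag", "major", 500),
    Alert("core-db", "replication_lag", "major", 500),
    Alert("olt-9", "port_down", "critical", 40),
]
for alert in triage(alerts, noisy_kinds={"heartbeat_missed"}):
    print(alert.resource, alert.kind)
```

With this scoring, a major alert affecting 500 customers outranks a critical one affecting 40; whether that is the right trade-off is a policy decision for each operations team.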

What challenges do CSPs face with AI/ML?

CSPs want to simplify ML development, improve the success rate of initiatives that use AI/ML and ensure the utility of AI/ML models in production. They must overcome three key challenges with AI/ML.

  • Speeding up and scaling ML model development. To accommodate the increasing demand for AI, data scientists in CSPs’ organisations must stop relying on manual, ad hoc ML development practices. They must speed up model development by improving collaboration and by introducing more automation. Accelerating AI/ML model development was a key driver for the development of Vodafone’s AI Booster platform, which is based on Google Cloud’s Vertex AI platform.
  • Increasing the useful life of models. Models’ performance degrades over time due to model drift. This happens when the characteristics of the data that the model is acting on become dissimilar to the training dataset. For example, a model that has been used to analyse and make predictions about network traffic may begin to perform poorly if traffic volumes are much higher than those in the training dataset or if new infrastructure is added to the network. CSPs need to ensure that their AI/ML models can adapt to changing environments.
  • Improving model management and governance. The performance of models in production must be monitored so that poorly performing models can be decommissioned or updated in a timely manner. Furthermore, CSPs need to comply with increasingly stringent regulation on model governance, auditability, interpretability and explainability, especially as generative AI (GenAI) capabilities are implemented within AI solutions used to operate IT systems.
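Model drift of the kind described in the second challenge can be detected by comparing the distribution of a model input in production with its distribution in the training data. The sketch below uses the Population Stability Index (PSI), one common drift metric among several (the Kolmogorov–Smirnov test is another). The equal-width binning and the 0.25 alert threshold are widely used rules of thumb rather than values from this article, and the traffic figures are illustrative.

```python
import math


def psi(training, production, bins=10, eps=1e-6):
    """Population Stability Index between a training sample and a production sample.

    Bins are equal-width over the training range; production values outside that
    range are clamped into the first or last bin. `eps` avoids log(0) for empty bins.
    """
    lo, hi = min(training), max(training)
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(training), proportions(production)
    # PSI = sum over bins of (q_i - p_i) * ln(q_i / p_i)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


training_traffic = list(range(100))              # traffic volumes seen during training
busier_traffic = [v + 200 for v in range(100)]   # volumes far above the training range
print(psi(training_traffic, training_traffic))       # 0.0: no drift
print(psi(training_traffic, busier_traffic) > 0.25)  # True: flag the model for retraining
```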

What is MLOps and how does it benefit CSPs?

MLOps stands for machine learning operations and is also described as DevOps for machine learning. The concept of MLOps encompasses the entire lifecycle of ML models, including how training data is prepared, how models are developed, trained and deployed, and how models in production are monitored and maintained. Figure 2 illustrates how MLOps is used at each stage of the ML model lifecycle.

Figure 2: The activities involved in MLOps



MLOps introduces automated pipelines for ML development, which reduce the time needed to build models. Because these pipelines are reusable, they also reduce duplication of effort and improve collaboration. In addition, MLOps solutions commonly use this automation to improve model auditability, traceability and reproducibility by automatically versioning models, capturing ML metadata and storing the features used to create each model.
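The reproducibility features mentioned above (versioned models, captured metadata and a record of the features used) can be illustrated with a toy pipeline. This is a sketch only: `ModelRegistry` is a hypothetical in-memory stand-in for the model registries that MLOps platforms provide, and the ‘model’ is an ordinary least-squares line fitted with the Python standard library.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ModelRegistry:
    """Hypothetical in-memory model registry: one entry per trained version."""
    entries: list = field(default_factory=list)

    def register(self, model, metadata):
        version = len(self.entries) + 1
        self.entries.append({"version": version, "model": model, **metadata})
        return version


def fingerprint(rows):
    """Deterministic SHA-256 of the training data, so each model traces back to its inputs."""
    return hashlib.sha256(json.dumps(rows, sort_keys=True).encode()).hexdigest()


def train_linear(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": my - slope * mx}


def run_pipeline(rows, feature, target, registry):
    """One reusable pipeline run: prepare data, train, capture metadata, register a version."""
    xs = [r[feature] for r in rows]
    ys = [r[target] for r in rows]
    model = train_linear(xs, ys)
    metadata = {
        "data_sha256": fingerprint(rows),
        "features": [feature],
        "target": target,
        "n_rows": len(rows),
    }
    return registry.register(model, metadata)


registry = ModelRegistry()
rows = [{"hour": h, "traffic_gb": 2 * h + 5} for h in range(24)]  # illustrative data
version = run_pipeline(rows, "hour", "traffic_gb", registry)
print(version, registry.entries[0]["model"])
```

Because `run_pipeline` takes the data, feature and target as parameters, the same pipeline can be rerun for new data or reused by another team, and every registered version records the hash of the exact data it was trained on.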

MLOps solutions continuously monitor ML models in production to detect performance degradation and model bias. A poorly performing model can trigger a workflow that automatically retrains it on the latest available data. Integrating MLOps with CI/CD methodologies will allow CSPs to rapidly deploy new iterations of ML models with little human intervention.
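The monitor-and-retrain loop can be sketched as follows, assuming that labelled outcomes (for example, the traffic volume that actually materialised) arrive shortly after each prediction. The error metric (mean absolute error over a sliding window), the threshold and the deliberately trivial ‘predict the recent average’ model are hypothetical choices made for illustration.

```python
from collections import deque


def train_mean(samples):
    """Trivial model: always predict the average target in the training samples."""
    avg = sum(y for _, y in samples) / len(samples)
    return lambda x: avg


class ModelMonitor:
    """Tracks recent prediction error and retrains when it exceeds a threshold."""

    def __init__(self, model, train_fn, threshold, window=50):
        self.model = model
        self.train_fn = train_fn
        self.threshold = threshold
        self.errors = deque(maxlen=window)  # absolute errors since the last retrain
        self.recent = deque(maxlen=window)  # latest labelled data, used for retraining
        self.retrain_count = 0

    def observe(self, x, y_true):
        self.errors.append(abs(self.model(x) - y_true))
        self.recent.append((x, y_true))
        window_full = len(self.errors) == self.errors.maxlen
        if window_full and sum(self.errors) / len(self.errors) > self.threshold:
            # Degraded performance triggers automatic retraining on the latest data.
            self.model = self.train_fn(list(self.recent))
            self.retrain_count += 1
            self.errors.clear()


monitor = ModelMonitor(train_mean([(0, 100.0)]), train_mean, threshold=10.0, window=5)
for t in range(15):
    monitor.observe(t, 100.0 if t < 5 else 300.0)  # traffic level shifts after t = 4
print(monitor.retrain_count, monitor.model(0))
```

Requiring a full window before acting trades detection speed for fewer spurious retrains. In this run the first retrain happens on a mix of old and new data, so a second retrain follows once the error window fills again.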

MLOps capabilities are offered by public cloud providers (for example, Amazon SageMaker and Google Cloud’s Vertex AI) and by independent software vendors such as DataRobot and H2O.ai. Open-source MLOps solutions such as MLflow and Kubeflow can also be used. CSPs’ choice of MLOps toolset will be informed by the skillsets that they possess and by their ability to follow MLOps best practices.

For more information, see our report MLOps: streamlining machine learning for CSPs.


1 DevOps is the practice of bringing together and introducing automation into the work of software development and operations teams. MLOps is therefore the application of DevOps practices to ML.


This article was written by Joseph Attwood, a research analyst who supports programmes in Analysys Mason’s Cloud research practice, including the Data, AI and Development Platforms research programme. He has conducted primary and secondary research on AIOps and MLOps in the telecoms industry and authored reports and company profiles on these topics.
