Building confidence in machine learning models for IT
Richard Harris in Artificial Intelligence, Wednesday, June 12, 2019
OpsRamp introduces OpsQ Observed Mode to build confidence in machine learning models for IT, along with new cloud-native discovery and monitoring capabilities.
OpsRamp announced OpsQ Observed Mode to build confidence in machine learning models for IT event and performance analysis. The Summer 2019 Release also introduces automated alert suppression to reduce the human time spent on first response to alerts, continuous learning-based alert escalation using live event data, and new infrastructure monitoring capabilities for cloud-native environments.
According to OpsRamp’s 2019 State of AIOps report, 67% of respondents have concerns about the relevance and reliability of the insights delivered by artificial intelligence for IT operations (AIOps) tools. OpsQ Observed Mode enables IT teams to assess the accuracy of machine-learning-driven correlation decisions in preview mode, enhancing the integrity of data for improved decision-making.
“We already use the OpsQ event management engine to reduce alert storms from 200,000 raw events per month down to a more manageable 10,000 incidents per month,” said Tim Hebert, Chief Managed Services Officer of Carousel Industries, a leading managed services and cloud services provider. “The OpsRamp Summer Release allows our infrastructure teams to understand the alert suppression capabilities of inference models before we commit to them, and that’s tremendously beneficial in our event management workflow.”
Highlights of the OpsRamp Summer 2019 release include:
Service-Centric AIOps: OpsQ is OpsRamp's intelligent event management, alert correlation, and remediation solution. New OpsQ capabilities help IT teams prioritize incidents faster and shorten mean time to resolution (MTTR) for dynamic infrastructure workloads. They include:
- OpsQ Observed Mode: OpsQ Observed Mode helps incident management teams assess the accuracy of the OpsRamp machine learning algorithms in a live production environment before they take effect. Observed Mode creates shadow inferences that show alert correlation decisions that OpsQ would have made if enabled.
- Learning-Based Auto-Alert Suppression: OpsQ looks for recurring alert patterns in production environments and suppresses those alerts that occur at a predictable cadence. OpsQ uses seasonality-based and attribute-based auto-alert suppression techniques as a first-response mechanism so that incident responders no longer have to acknowledge, process, and triage every alert that they receive.
- Automatic Resource Creation from Third-Party Events: OpsQ now has the ability to auto-extract metadata for resources managed by other tools and use this information to automatically contextualize future alerts from these resources.
- Continuous Learning for Alert Escalation: Alert escalation policies support a continuous learning option for auto-incident creation. The OpsRamp platform continuously re-trains its machine learning models using live alert data, adapting to dynamic environments.
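To make the ideas of cadence-based suppression and Observed Mode concrete, here is a minimal Python sketch. It is not OpsRamp's implementation: the class name CadenceSuppressor, its parameters, and the jitter threshold are all hypothetical, and the "predictable cadence" test (low variation in the gaps between alerts) is a simple stand-in for whatever seasonality model the product actually uses. With observed_mode enabled, the sketch only records the suppression it would have made, loosely mirroring the shadow inferences described above.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class CadenceSuppressor:
    """Hypothetical cadence-based alert suppressor (illustrative only)."""
    min_occurrences: int = 4       # history needed before a pattern is trusted
    max_jitter_ratio: float = 0.1  # allowed stdev/mean of gaps between alerts
    observed_mode: bool = True     # if True, only record would-be suppressions
    history: dict = field(default_factory=dict)           # signature -> timestamps
    shadow_decisions: list = field(default_factory=list)  # (signature, timestamp)

    def _is_predictable(self, timestamps, new_ts):
        # Not enough history yet to call this a recurring pattern.
        if len(timestamps) < self.min_occurrences:
            return False
        gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
        avg = mean(gaps)
        if avg <= 0:
            return False
        # Gaps must be nearly constant for the cadence to count as predictable.
        if pstdev(gaps) / avg > self.max_jitter_ratio:
            return False
        # The new alert must also land close to the expected next arrival.
        expected = timestamps[-1] + avg
        return abs(new_ts - expected) <= self.max_jitter_ratio * avg

    def handle(self, signature, ts):
        """Process one alert; return True only if it is actually suppressed."""
        seen = self.history.setdefault(signature, [])
        predictable = self._is_predictable(seen, ts)
        seen.append(ts)
        if predictable and self.observed_mode:
            # Shadow decision: log what would have happened, change nothing.
            self.shadow_decisions.append((signature, ts))
            return False
        return predictable
```

In this sketch, an operator would run with observed_mode=True, review shadow_decisions against what the on-call team actually did, and switch the flag off only once the would-be suppressions look trustworthy.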
Service and Topology Maps: The Summer 2019 Release introduces new impact visibility and service context features that deliver dynamic relationship data for public cloud services and actionable insights for understanding cross-site interconnections.
- Cloud Topology for AWS: The new AWS topology map shows dependency information for cloud resources such as AWS EC2, VPC, RDS, or ELB instances so that DevOps teams can keep track of all the different moving parts in their public cloud estate.
- Cross-Site Connection Topology: OpsRamp network topology maps now incorporate routing layer relationships (BGP and OSPF) across WAN links.
Cloud Native Discovery and Monitoring: DevOps and site reliability engineering (SRE) teams can now monitor popular open-source applications used in cloud-native environments and access relevant performance insights for Mesosphere and Azure Stack in the OpsRamp platform.
- Out-of-the-Box Kubernetes Dashboards: OpsRamp can automatically create performance management dashboards for Kubernetes environments. IT teams can gain instant visibility into the health of containerized deployments by tracking cluster, pod, and node level metrics.
- Expanded Application Monitoring: OpsRamp now provides agentless monitoring for commonly used applications (ActiveMQ, Apache Spark, Apache Solr, CockroachDB, Couchbase, Apache CouchDB, Elasticsearch, Fluentd, Neo4j Graph Platform, RabbitMQ) within cloud and cloud-native stacks.
- Mesosphere: OpsRamp can now discover and monitor Mesosphere-based cloud-native environments. The integration captures performance metrics for Mesos master and agent nodes that help optimize and scale modern enterprise apps built on dynamic infrastructure.
- Azure Stack: OpsRamp can discover and monitor network connections, virtual networks and load balancers in an Azure Stack environment. Cloud admins can analyze the availability and performance of their hybrid infrastructure in Azure Stack through the integration.
“Our customers have told us that they’d like to see how AIOps inferences proactively detect, diagnose, and address service continuity issues. OpsQ Observed Mode is a no-risk option for IT operations and DevOps teams to assess the accuracy and power of machine intelligence-driven event management,” said Mahesh Ramachandran, VP of Product Management for OpsRamp. “The Summer 2019 Release provides modern IT infrastructure teams real-time intelligence to fix visibility gaps in their hybrid and multi-cloud environments.”