Prometheus, logs and root cause analysis

Log Management, Monitoring, Uncategorized, Visualization
March 30, 2022

Prometheus is a widely deployed open source monitoring system for time series metrics. For observability use cases, it is important to bring logs and metrics together in a tightly integrated framework so that root cause analysis goes faster.

A Prometheus deployment is configured with scrape targets from which metrics are collected periodically. The data is stored in a multi-dimensional data model: each metric is stored along with a set of key-value pairs, commonly referred to as labels. This allows querying by one or more dimensions as well as running aggregate queries such as rate and sum. Here is an example of what a multi-dimensional query looks like in Prometheus:

round(sum(increase(message_count{application="ingress"}[30d]))/1000000,0.01)

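In the above query, the message_count metric is filtered by the label (dimension) application, so only series with application="ingress" are selected. The query then sums the increase over the last 30 days, divides by one million, and rounds the result to the nearest 0.01.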
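To make the label model concrete, here is a minimal Python sketch of the same select, sum, and round steps over an in-memory list. The sample values and the namespace label are made up for illustration, each value stands in for a series' increase over the window (the sketch does not compute increase() itself), and the helper names select and round_to_nearest are hypothetical.

```python
import math

# Each sample is a (labels, value) pair. The values are invented and stand in
# for the per-series increase over the query window.
samples = [
    ({"application": "ingress", "namespace": "prod"}, 1_200_000.0),
    ({"application": "ingress", "namespace": "dev"}, 300_000.0),
    ({"application": "billing", "namespace": "prod"}, 950_000.0),
]

def select(samples, **matchers):
    """Keep samples whose labels equal every matcher, like {application="ingress"}."""
    return [
        (labels, value)
        for labels, value in samples
        if all(labels.get(key) == wanted for key, wanted in matchers.items())
    ]

def round_to_nearest(value, to_nearest):
    """Round to the nearest multiple of to_nearest, with ties rounding up."""
    return math.floor(value / to_nearest + 0.5) * to_nearest

# Filter on one dimension, aggregate across the rest, then scale and round.
ingress = select(samples, application="ingress")
total_millions = round_to_nearest(sum(value for _, value in ingress) / 1_000_000, 0.01)
print(total_millions)
```

Adding a second matcher, for example select(samples, application="ingress", namespace="prod"), narrows the result along another dimension without changing the rest of the computation.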

Log data tends to be mostly unstructured, with some amount of structured payload added by agents and collectors. When an interesting event occurs in the log, one possible workflow is to record it as a metric in Prometheus. This brings logs and metrics together and can serve as a powerful way to visualize what is happening in your logs. Data extracted from your logs can become labels, making dimensional queries possible.

Let’s take an example where we want to visualize when invalid ssh logins appear in the logs of the sshd process running on our Linux instances. This is useful because we can then set up Prometheus alerting rules and Alertmanager notifications that fire if the invalid logins exceed a certain threshold in a 5-minute interval. Such a burst could signal an impending attack on our infrastructure.
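The threshold idea can be sketched in plain Python as a sliding-window check. This is only an illustration of the logic an alerting rule would express, not Prometheus or Alertmanager configuration; the class name WindowThreshold, the threshold of 3, and the timestamps are assumptions chosen for the example.

```python
from collections import deque

class WindowThreshold:
    """Report when more than `threshold` events fall inside a sliding time window."""

    def __init__(self, window_seconds=300, threshold=10):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.timestamps = deque()

    def record(self, ts):
        """Record one event at time `ts` (seconds); return True if the threshold is exceeded."""
        self.timestamps.append(ts)
        # Drop events that have aged out of the window ending at `ts`.
        while self.timestamps and self.timestamps[0] <= ts - self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) > self.threshold

# Hypothetical threshold: more than 3 invalid logins within 5 minutes.
check = WindowThreshold(window_seconds=300, threshold=3)
results = [check.record(ts) for ts in (0, 60, 120, 180, 1000)]
print(results)
```

The fourth event pushes the count in the window above the threshold, and the much later fifth event finds the window empty again.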

This requires a simple regex rule applied to each log line and a Prometheus counter that is incremented every time the regex matches.

Sample log: Invalid user cactus from 121.4.86.248 port 60112

Regex: log =~ "Invalid user"
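A minimal sketch of this rule in Python, using only the standard library. The Counter here is an in-process stand-in for a Prometheus counter; in a real deployment a Prometheus client library counter would take its place. The metric name invalid_ssh_logins_total and the second log line are assumptions made for the example.

```python
import re
from collections import Counter

# The regex rule from the text: match any log line containing "Invalid user".
INVALID_USER = re.compile(r"Invalid user")

# Stand-in for a Prometheus counter: a plain tally keyed by a hypothetical metric name.
counters = Counter()

def process_line(line):
    """Increment the counter when the line matches; return whether it matched."""
    if INVALID_USER.search(line):
        counters["invalid_ssh_logins_total"] += 1
        return True
    return False

lines = [
    "Invalid user cactus from 121.4.86.248 port 60112",
    # Made-up non-matching line, included to show that it is not counted.
    "Accepted publickey for deploy from 10.0.0.5 port 50022",
]
matched = [process_line(line) for line in lines]
print(counters["invalid_ssh_logins_total"])
```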

We can then have a Prometheus counter metric that can track such regex matches. Let us see how this looks when we plot the data in Prometheus for an incoming stream. Nice! We can see unexpected spikes as well as what looks normal on our network.

The next step is going from the Prometheus graph back to the actual logs. This can be accomplished if the log stream is tagged with the event whenever a regex match is detected. Here is how that would look for the example above.

Incoming event

{
"log":"Invalid user cactus from 121.4.86.248 port 60112"
}

After the regex match, the event is transformed to:

{
"log":"Invalid user cactus from 121.4.86.248 port 60112",
"event": "invalid_ssh_event"
}
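The transformation can be sketched as a small Python function. The function name tag_event is hypothetical; the field names and the invalid_ssh_event value come from the example event.

```python
import json
import re

INVALID_USER = re.compile(r"Invalid user")

def tag_event(event):
    """Return a copy of the event, tagged when its log line matches the regex."""
    tagged = dict(event)  # copy so the incoming event is left untouched
    if INVALID_USER.search(tagged.get("log", "")):
        tagged["event"] = "invalid_ssh_event"
    return tagged

incoming = {"log": "Invalid user cactus from 121.4.86.248 port 60112"}
tagged = tag_event(incoming)
print(json.dumps(tagged, indent=2))
```

Events that do not match pass through unchanged, so the same function can sit in the path of the whole log stream.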

The transformed event can now be indexed by your log system, letting you go from logs -> metric -> logs. The metric lets you set alerts, while the event key-value pair lets you search for when the events happened.

As you can see, it is now simple to do root cause analysis using logs, Prometheus, and a bit of coding. Alternatively, if you want to try an integrated version of this flow in the apica.io platform, sign up for a free trial and we can help you get started.
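Going from a spike on the graph back to the logs then amounts to a lookup on the event key within the time range of the spike. Below is a minimal sketch over an in-memory list; a real log system would use its own query language. The timestamp field, its values, the second and third log lines, and the function name find_events are assumptions, since the example events carry no timestamps.

```python
def find_events(index, event_name, start, end):
    """Return indexed events tagged with `event_name` whose timestamp is in [start, end)."""
    return [
        entry
        for entry in index
        if entry.get("event") == event_name and start <= entry["timestamp"] < end
    ]

# Hypothetical indexed events; timestamps are seconds on an arbitrary clock.
index = [
    {"timestamp": 100, "log": "Invalid user cactus from 121.4.86.248 port 60112",
     "event": "invalid_ssh_event"},
    {"timestamp": 130, "log": "Accepted publickey for deploy from 10.0.0.5 port 50022"},
    {"timestamp": 900, "log": "Invalid user admin from 10.1.1.1 port 22",
     "event": "invalid_ssh_event"},
]

# Suppose the graph shows a spike between t=0 and t=300.
hits = find_events(index, "invalid_ssh_event", start=0, end=300)
print([entry["log"] for entry in hits])
```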

Let’s now see this in action with the apica.io platform where we have done the integration for you to go from logs to metrics and back to logs.

https://www.youtube.com/watch?v=-Uyays5TNDM

apica.io’s LogFlow platform provides real-time insights into your data as it flows into downstream systems like Splunk, Datadog, etc. Valuable insights can be tagged into data streams, so when the data reaches the target platform it carries labels that speed up root cause analysis there.

Bringing together log data and APM data from Prometheus / Thanos fills a key gap: users frequently need to look at interesting events in their environment in aggregate, as a time series graph, and then switch over to a log or trace view where the actual event can be seen. Linking the two reduces the triage time that is common when these data streams are disjoint.

Ranjan

The Apica blog

Leave a Comment Cancel reply

Table of Contents

Share this article

Related articles

More insights. More affordable. Less hassle.

Make use of our valuable resources

Leaving without a Demo?