Log aggregation tools: design considerations

Log formats, Observability
December 14, 2020


Log aggregation tools are essential for a company to be agile and secure. Log management is not only used for troubleshooting issues but is also the building block for any security strategy being adopted by an enterprise.

We recently took a close look at how a top-performing cloud company architected its log management infrastructure on AWS and which log aggregation tools it used. This write-up first describes the log system we observed at the company. We then explore the architecture and show why solutions like this, while popular, result in an architecture that becomes hard to manage over time, with growing costs and resource usage.

Centralized log management is the preferred approach to enterprise logging. Application, IT system, and cloud microservice logs are collected and managed in one central location, for example Amazon Elasticsearch Service.

DevOps and automation teams play a central role in developing and maintaining the log management infrastructure. This means source-level log monitoring and log analysis preparation need to be done first; see [1]. As nice as the system looks in an overview diagram, there is engineering overhead in addressing the infrastructure’s compute, storage, and budget limitations.

The ingestion pipeline first filters the log files, extracting the useful portion of each log to reduce noise. The trimming directives usually come from the log data end-user, such as a data scientist or system analyst, who works with the DevOps/automation team to create customized log extraction filters for the existing log files. The goal is to control the volume and content of what is logged. The DevOps/automation team builds these filters ad hoc on a best-effort basis, and the process is laborious and error-prone.
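
To make the filtering step concrete, here is a minimal Python sketch of such a log reduction filter. The keep and drop patterns and the sample lines are hypothetical, chosen only for illustration; they are not the rules used at the company described here.

```python
import re
from typing import Iterable, Iterator

# Hypothetical rules agreed with the log end-user: keep warnings and errors,
# drop lines from a noisy health-check endpoint.
KEEP = re.compile(r"\b(WARN|ERROR|FATAL)\b")
DROP = re.compile(r"GET /healthz")


def filter_logs(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that match KEEP and do not match DROP."""
    for line in lines:
        if KEEP.search(line) and not DROP.search(line):
            yield line


sample = [
    "2020-12-14T10:00:00Z INFO GET /healthz 200",
    "2020-12-14T10:00:01Z ERROR payment failed order=1234",
    "2020-12-14T10:00:02Z WARN GET /healthz slow response",
    "2020-12-14T10:00:03Z WARN disk usage at 91%",
]
kept = list(filter_logs(sample))
for line in kept:
    print(line)
```

Note that the third sample line, a warning about a slow health check, is dropped by the noise rule even though someone may have wanted it. Every rule of this kind has to be revisited whenever the applications or their log formats change, which is where the error-prone effort comes from.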

This log reduction filter exists because of the limits of the log data infrastructure, which in this example is Amazon Elasticsearch Service. Both the compute and storage resources used for indexing need to be monitored to keep the logging system healthy. To maintain a stable, well-performing operating state, total log ingestion is capped at 30GB/day, see [8], and logs are retained for 2 weeks. After that, data is backed up to more economical Amazon S3 storage.
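
The two-week retention rule can be sketched as a small Python function that splits daily log indices into those kept in the search cluster and those due for backup. This assumes one index per day, which the article does not state, and the function and variable names are hypothetical.

```python
from datetime import date, timedelta

RETENTION_DAYS = 14  # from the text: two weeks kept in the search cluster


def split_by_retention(index_days, today):
    """Split daily index dates into (keep_hot, archive) lists by age."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    keep_hot = [d for d in index_days if d > cutoff]
    archive = [d for d in index_days if d <= cutoff]
    return keep_hot, archive


today = date(2020, 12, 14)
days = [today - timedelta(days=n) for n in range(20)]  # 20 daily indices
hot, cold = split_by_retention(days, today)
print(len(hot), "indices stay searchable;", len(cold), "go to S3")
```

The selection itself is simple. The operational burden lies in running it every day, moving the data, and knowing how to restore it later.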

Different business units in the company work with different log data and use it differently. For example, the performance and capacity team extracts metrics from logs to model system usage trends and forecast future demand. The customer business unit extracts metrics to analyze customer behavior, such as churn, and create business value. DevOps keeps the system’s operating state within SLA (Service Level Agreement) requirements. Metrics extraction from log data and the subsequent analytics are highly customized, flexible, and fluid. Each functional unit exists to solve specific business problems. It is highly desirable to apply AI/ML techniques and methodology here, see [7]. For example, the log ingestion pipeline can be enhanced with tags or labels to facilitate later AI/ML analysis, with all the mutable fields in the log automatically extracted for analysis.
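
As an illustration of tagging and field extraction at ingestion time, the sketch below pulls `key=value` pairs out of a log line and attaches labels to the resulting record. The log format, the tag names, and the `enrich` function are assumptions made for this example, not part of the system described above.

```python
import re

# Matches key=value pairs such as "order=1234" (assumed log convention).
FIELD = re.compile(r"(\w+)=(\S+)")


def enrich(line: str, tags: dict) -> dict:
    """Return a record holding the raw line, extracted fields, and tags."""
    fields = dict(FIELD.findall(line))
    return {"raw": line, "fields": fields, "tags": dict(tags)}


record = enrich(
    "2020-12-14T10:00:01Z ERROR payment failed order=1234 latency_ms=850",
    {"team": "capacity", "env": "prod"},
)
print(record["fields"])
print(record["tags"])
```

Keeping the raw line alongside the extracted fields means a later analysis can re-parse the record if the extraction rules change.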

The Amazon Elasticsearch infrastructure in the design above can scale out, but doing so often requires some trial and error and knowledge of Elasticsearch internals and clustering. There are other factors to consider, for example how much data to store and for how long. In this example, incoming log data was capped at about 30GB/day with a 2-week retention time to keep the system within its operating budget and at the desired performance. The system always holds about 500GB of log records for processing. Overflow logs are backed up to Amazon S3 at $25/TB-month for an indefinite period. Someone needs to understand how to load this data back if it is ever needed, which is a non-trivial job function. AWS hosting for such a setup costs around $60k/year, not including engineering and operator costs or the knowledge base needed to manage a hybrid storage design.
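
The sizing figures above can be checked with a few lines of Python. The ingestion cap, retention time, and S3 price come from the text. The assumptions that 1 TB equals 1000 GB and that archived logs are stored uncompressed are ours, so treat the results as rough estimates.

```python
INGEST_GB_PER_DAY = 30      # from the text
RETENTION_DAYS = 14         # from the text
S3_USD_PER_TB_MONTH = 25    # from the text
GB_PER_TB = 1000            # assumption: decimal terabytes


def working_set_gb(ingest_gb_per_day=INGEST_GB_PER_DAY, days=RETENTION_DAYS):
    """Raw log volume held in the search cluster at steady state."""
    return ingest_gb_per_day * days


def archive_monthly_cost_usd(archived_days, ingest_gb_per_day=INGEST_GB_PER_DAY):
    """Monthly S3 bill for an archive holding `archived_days` days of logs."""
    archived_tb = ingest_gb_per_day * archived_days / GB_PER_TB
    return archived_tb * S3_USD_PER_TB_MONTH


print(working_set_gb())                          # raw GB kept hot
print(round(archive_monthly_cost_usd(365), 2))   # USD/month after a year archived
```

The raw working set of 30 × 14 = 420 GB is in line with the roughly 500GB the system holds. Because the archive is kept indefinitely, its monthly cost keeps growing with every day of logs added.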

Here at apica.io we have solved the problem of hybrid storage designs for log management. Here is a similar log infrastructure setup using Apica building blocks. The system becomes simpler because storage limitations are alleviated by using S3 as the primary data store. This significantly simplifies the log aggregation tooling you need to run and keeps overhead minimal. See the figure below.

Apica log management infrastructure removes the engineering overhead built into the infrastructure above. The new construct is efficient and straightforward. The table below lists the infrastructure resource engineering overheads; the tags refer to the earlier drawing.

Tag	Description	Overhead Action	Apica
[4]	Reduce log ingestion to save log processing resources. Consult and communicate with the log end-user.	The process is error-prone because logs, apps, and requirements change.	Simple log data ingestion pipeline
[5]	DevOps implements log filters to reduce the ingested log count	Implement the filters and validate the stored logs with the end-user	No need to trim log data. Store unredacted logs in S3 storage.
[8]	Maintain constant overhead and scale out ELK if needed	Elasticsearch Service does not scale seamlessly.	Apica scales easily using K8s pods
[9]	Back up the oldest logs to S3 daily to maintain a stable log working set size	Adds a backup step, with no easy process for reusing the backed-up log data	Operate directly on S3 storage: searching, reporting, events, AI/ML analysis, etc.

In summary, this article describes an operating log infrastructure setup on AWS and presents a comparable Apica setup. The Apica log management infrastructure removes the induced engineering overhead because it uses S3 storage natively.

There are plenty of references on the web about scaling Elasticsearch Service, and the common consensus among them is that such a task is not for the faint of heart.

Apica Team
