Monitoring

Definition of Monitor

A monitor observes the activity of a system.

Monitoring Concepts

A system contains physical and logical resources with state:
- CPU, RAM, disk, virtual memory, processes, threads, …
State changes as events:
- CPU/memory instructions, I/O operations, application requests…
A trace is a log of events:
- each entry at the log typically contains a timestamp and other details of the event.
The domain is the set of activities (events) observed:
- E.g., resource consumption (CPU, RAM, disk, network), I/O operations, database requests.
The detail of the information being captured (monitored) varies:
- According to the input rate:
  - how many events can the monitor observe per unit of time?
- According to the resolution:
  - at what granularities (ms, us, ns, etc) can the monitor observe events.
A monitor imposes overhead, changing the observed activity:
- E.g., the monitor also consumes system resources (e.g., CPU, RAM, I/O), such may interfere with the system being monitored and the observations.

Monitor Classification

What triggers the observation/collection of events?
- Event-driven: observation is triggered every time an event of interest occurs;
- Sampling: observation is triggered only for a subset of events (every 100 events, …);
- One is trading accuracy (observing as many events as possible - event-driven) for performance (less overhead introduced by the monitor - sampling).
When are the observations available for users and/or systems?
- On-line: collected events are sent straight away to the user and/or system (e.g., to a file or a database);
- Batch: events are grouped and sent in a batch to the user and/or system;
- One is trading real-time observation (on-line) for better performance (batch).
How is the monitor implemented?
- Hardware monitors are typically more accurate and have better resolution;
- Software-based monitors are usually less expensive, as well as more flexible and extensive in terms of the metrics and events one can observe.
How is the monitor designed distribution-wise?
- Centralized: the monitor service is deployed on a single node (e.g., server);
- Distributed: the service is spread, and even replicated, across several nodes.
What about the scope of monitoring distribution-wise?
- Single node: events collected for a single node;
- Distributed: events collected for a cluster of nodes;
- This may be orthogonal to the distributed or centralized design of the monitor service. Do not confuse the design of the monitor with its distribution scope!

Monitor Architecture

Observation of the raw events in systems (e.g., resource usage, I/O requests, …)
Collection and normalization of data (e.g., normalization of time units);
Analysis of normalized data (e.g., filter, querying, summarizing);
Presentation of the analysis results (e.g., dashboards, reports, alarms).

Observation Layer

Events of interest can be observed through different techniques;
Passive observation or spying: no need to instrument the resources and/or applications being monitored. The monitor passively observes the system;
Instrumentation: the hardware, operating system, and/or applications are instrumental to observe relevant information;
Probing: the monitor interacts (probes) with the system/application being observed to collect metrics.

Collection Layer

Observed events can be collected and normalized through two approaches:
- Push-based: events are sent by the observation layer to the collection one;
- Pull-based: events are pulled by the collection layer from the observation one.
Besides collecting data, the layer is used to normalize it (e.g., normalize time intervals, aggregate data, …);
In some cases, the layer may provide persistency (temporary storage) for collected events.

Analysis Layer

The analysis layer has two main goals;
Efficient and reliable data storage and indexing:
- if a comprehensive set of events is collected, the analysis component may have to persist and index large volumes of data.
Efficient data processing:
- stored and indexed data must be queryable, and sometimes analyzed in streams (e.g., for time-series analysis);
The choice of technology for this layer depends on the type of data being analyzed and the analysis requirements (e.g., time-series database, document-oriented database, …)

Presentation Layer

The presentation layer can be used for visualizing:
- Performance metrics of applications and/or systems (e.g., requests throughput and latency, CPU usage, RAM consumption, …);
- The configuration of cluster resources and the applications deployed in these;
- Errors at cluster resources and/or applications.
Users can use this layer through different visual representations (e.g., dashboards, reports, …)

Elastic Stack

Beats: lightweight data shippers (Observation and Collection):
- Purpose: to observe, collect and send data to Logstash or ElasticSearch for further processing and indexing.
Logstash: a data ingestion tool (Collection):
- Purpose: to collect, transform, and send data from the various sources to ElasticSearch.
ElasticSearch: a distributed, JSON-based search and analytics engine (Analysis):
- Purpose: to index, search, and analyze data.
Kibana: a data visualization and exploration tool (Presentation):
- Purpose: to visualize and manage data from ElasticSearch.

Monitoring and Actuation

Monitoring can be combined with actuation to automatically apply actions as the system;
The actions to apply are specified as policies (e.g., when a server’s CPU is above a given threshold migrate some VMs to another server);
The orchestrator is responsible for defining the best strategy for applying policies (e.g., choose what VM(s) to migrate and to which server(s));
The actuator is responsible for applying the strategy defined by the orchestrator (e.g., migrate the VM resource to another server).

📚 College Note's

Explorer

Definition of Monitor

Monitoring Concepts

Monitor Classification

Monitor Architecture

Observation Layer

Collection Layer

Analysis Layer

Presentation Layer

Elastic Stack

Monitoring and Actuation

Graph View

Table of Contents

Backlinks