Sizing Guide

Logscape performance depends on the following factors:

  • The volume of data being searched
  • The number of concurrent users
  • The number of alerts and their frequency
  • The style of search being executed

From a hardware standpoint, CPU, memory and disk IO are all equally important. To maintain an acceptable user experience, the disks must be able to deliver data fast enough, the CPU must be able to perform the required operations on that data, and there must be enough RAM for both the Logscape processes and OS kernel caching.

In every environment the ultimate goal is to be CPU-limited, with enough memory for off-heap usage by Logscape as well as OS kernel caching.

What is the User Experience?

The most common action a user performs in any Logscape environment is search. To guarantee a pleasant experience, we recommend sizing your environment so that searches over the last 24 hours' worth of generated data return within 10 seconds.
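
The 10-second target can be turned into a rough IO requirement by dividing one day's data volume by the target time. The sketch below assumes decimal units (1 GB = 1000 MB) and a search limited only by disk reads; required_mb_per_sec is a hypothetical helper name, not part of Logscape.

```shell
# Sustained read rate (MB/sec) needed to scan one day of data within a
# target time. Assumes decimal units and a purely IO-bound search.
required_mb_per_sec() {
    # $1 = data generated per day in GB, $2 = target search time in seconds
    awk -v gb="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", (gb * 1000) / secs }'
}

required_mb_per_sec 10 10    # 10 GB/day, 10-second target -> 1000
required_mb_per_sec 50 10    # 50 GB/day, 10-second target -> 5000
```

Kernel caching and CPU limits will move real figures in either direction, so treat the result as a starting point rather than a guarantee.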

File System

To gather accurate metrics, it is recommended that you use a tool such as IOzone. However, rough assumptions can be made based upon disk type.

Type               Disk Speed
Mechanical Drive   150 MB/sec
SSD                500 MB/sec
SSD x2             1 GB/sec
(SSD) RAID 5+1     2.5 GB/sec

Using these device speeds, in a scenario where we are searching 10 GB of data, we get the following search times.

Device            Dataset Size   IO Rate      Search Duration
Mechanical Disk   10 GB          150 MB/sec   66 seconds
SSD               10 GB          500 MB/sec   20 seconds
SSD x2            10 GB          1 GB/sec     10 seconds
(SSD) RAID 5+1    10 GB          2.5 GB/sec   4 seconds

NOTE:
  • Real-world performance depends upon the processor's ability to cope with the throughput
  • OS kernel caching will improve performance where enough memory (around 50%) is left free
  • Larger deployments will see users searching different sets of data
  • The performance of NFS-mounted drives depends on how they are shared and on latency
  • Load on virtualised infrastructure may reduce performance and add latency
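
The search durations in the table above follow from dividing the dataset size by the IO rate. A minimal sketch of that arithmetic, treating the rates as megabytes per second and assuming decimal units (1 GB = 1000 MB); search_seconds is a hypothetical helper name:

```shell
# Estimate how long a full scan takes: duration = dataset size / IO rate.
# Assumes decimal units and ignores CPU limits and kernel caching.
search_seconds() {
    # $1 = dataset size in GB, $2 = sustained read rate in MB/sec
    awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", (gb * 1000) / rate }'
}

search_seconds 10 150     # mechanical disk  -> 66.7
search_seconds 10 500     # SSD              -> 20.0
search_seconds 10 1000    # SSD x2           -> 10.0
search_seconds 10 2500    # (SSD) RAID 5+1   -> 4.0
```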


Measuring Cache Performance

Linux cache performance can be monitored using the 'perf' command-line tool. You can read more about perf here.

Perf provides metrics about many parts of an application. For this scenario we are interested in "cache misses". A cache miss is recorded when the data the application needs is not found in the cache and has to be fetched from slower storage. Ideally every read would be served from the cache, but this is unlikely in practice, so we aim for fewer than 5% cache misses.

First, we need to determine which process is the parent Logscape process. Because Logscape runs on the JVM and spawns child processes, this could be quite difficult. However, by using ps aux and grep we can locate it easily:

ps aux | grep -i vsomain

This will provide you with the process ID, name and command-line parameters. We are only interested in the process ID.
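
If only the process ID is wanted, the ps output can be filtered down to its PID column. This is a sketch rather than a Logscape-provided command: vsomain_pids is a hypothetical helper name, and it assumes the standard `ps aux` layout in which the PID is the second column.

```shell
# Print the PID (second column of `ps aux`) of every process whose command
# line mentions "vsomain", ignoring case. The [v] bracket keeps grep's own
# command line from matching the pattern.
vsomain_pids() {
    grep -i '[v]somain' | awk '{ print $2 }'
}

# Usage:
#   ps aux | vsomain_pids
```

If several PIDs are printed, the full ps output is still needed to tell the parent process from its children.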

We're now going to make use of perf in order to measure the performance of the Logscape process.

perf stat -p [process_id] sleep 30

After 30 seconds, this command outputs a variety of metrics about the performance of the vsomain process. The metric we are interested in is "cache misses": if this value is below 5% of all cache references, cache performance is acceptable.
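
Depending on the perf version, the default output may not include cache counters, in which case they can be requested explicitly with -e. The sketch below assumes the cache-references and cache-misses events are available on your hardware and that perf's CSV mode (-x,) puts the counter value in the first field and the event name in the third; cache_miss_pct is a hypothetical helper name.

```shell
# Ratio of cache misses to cache references, as a percentage.
# Reads perf's CSV output (perf stat -x,) on stdin: value,unit,event,...
cache_miss_pct() {
    awk -F, '
        $3 ~ /cache-references/ { refs = $1 }
        $3 ~ /cache-misses/     { miss = $1 }
        END {
            if (refs > 0) printf "%.2f\n", 100 * miss / refs
            else          print "n/a"
        }'
}

# Usage (PID is a placeholder; perf writes its statistics to stderr):
#   perf stat -x, -e cache-references,cache-misses -p "$PID" sleep 30 2>&1 | cache_miss_pct
```

A printed value under 5.00 would meet the target described above; "n/a" means no cache references were counted, for example because the events are unsupported.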

OS Tuning

The Manager and Index Store use considerable operating system resources in large deployments. The number of concurrent open files, network connections and processes can be configured for Linux. Use the following system limits as a guide.

Property          Description
nofile >= 40000   Open files: 40000 or more. If the Index Store hits this limit, Logscape will not be able to process new files or create network connections.
nproc >= 100000   If the Index Store hits this limit, Logscape will throw an exception each time a new thread is created.
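
The limits currently in effect can be compared against these minimums from the shell that launches Logscape. A minimal sketch; check_limit is a hypothetical helper name, and the thresholds are the ones from the table above.

```shell
# Compare a current limit against the recommended minimum.
# $1 = limit name, $2 = current value (may be "unlimited"), $3 = minimum
check_limit() {
    if [ "$2" = "unlimited" ] || [ "$2" -ge "$3" ]; then
        echo "$1 OK ($2)"
    else
        echo "$1 too low ($2 < $3)"
    fi
}

check_limit nofile "$(ulimit -n)" 40000
# "ulimit -u" (max user processes) is a bash option; other shells may differ.
# check_limit nproc "$(ulimit -u)" 100000
```

On PAM-based Linux distributions these limits are usually raised in /etc/security/limits.conf, where nofile and nproc are the item names; the exact mechanism depends on how Logscape is started.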

You can find more metrics about sizing a Linux installation here.

Other Considerations

How the number of users affects the Manager - Keep watch on the memory utilization of the aggspace and dashboard processes when the number of users of the system increases significantly. You may need to increase the heap of these processes (see table).

com.liquidlabs.([A-Za-z\.0-9]+)_ | _type.equals(unx-ps) RSZ_MB.max(1,) chart(line-zero)
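
Outside of Logscape, a similar view of resident memory can be taken directly from ps. This is a rough sketch, not a replacement for the search above: rss_mb is a hypothetical helper name, and matching on the string "com.liquidlabs" in the command line is an assumption based on the search expression.

```shell
# Print resident memory in MB, followed by the command line, for each
# process whose command line contains the given string.
# Reads `ps -eo rss,args` output on stdin (RSS is reported in kilobytes).
rss_mb() {
    awk -v pat="$1" 'index($0, pat) && $2 != "awk" {
        mb = $1 / 1024
        $1 = ""                      # drop the RSS column, keep the command
        printf "%.0f MB%s\n", mb, $0
    }'
}

# Usage:
#   ps -eo rss,args | rss_mb com.liquidlabs
```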

Large Alert Volumes - If you have hundreds of alerts in your deployment, watch the memory profile of the aggspace and allocate more memory when required.

Indexers - The typical deployment contains a Manager, with search operations and data balanced across a few Index Stores, and many Forwarders. Some environments make use of Indexers. If you have over 20 Indexers in your environment, increase the size of your aggspace heap.