Logscape performance depends on the following factors.
From a hardware standpoint, CPU, memory and disk IO are equally important. To maintain an acceptable user experience, the disks must be able to read the data fast enough, the CPU must be able to perform the required operations on that data, and there must be enough RAM for both the Logscape processes and OS kernel caching.
In every environment, the ultimate goal is a CPU-limited environment (one where the CPU, rather than disk or memory, is the bottleneck) with enough memory for Logscape's off-heap usage as well as OS kernel caching.
What is the User Experience?
The most common action a user performs in any Logscape environment is search. To guarantee a pleasant experience, we recommend sizing your environment so that searches over the last 24 hours' worth of generated data return within 10 seconds.
File System
To gather accurate metrics, we recommend using a benchmarking tool such as IOzone. However, rough estimates can be made based on disk type.
Disk type | Approximate read speed |
Mechanical drive | 150 MB/s |
SSD | 500 MB/s |
SSD x2 | 1 GB/s |
(SSD) RAID 5+1 | 2.5 GB/s |
Using these device speeds, the time to search 10 GB of data is roughly the dataset size divided by the IO rate:
Device | Dataset size | IO rate | Search duration |
Mechanical drive | 10 GB | 150 MB/s | 66.7 seconds |
SSD | 10 GB | 500 MB/s | 20 seconds |
SSD x2 | 10 GB | 1 GB/s | 10 seconds |
(SSD) RAID 5+1 | 10 GB | 2.5 GB/s | 4 seconds |
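The search durations above are simply dataset size divided by IO rate. The bash sketch below reproduces that arithmetic and also inverts it, estimating the IO rate needed to scan a given daily data volume within the 10-second target. It assumes decimal units (1 GB = 1000 MB) and that a search reads its data sequentially at full device speed, ignoring caching and CPU time; the function names are illustrative only.

```shell
# Rough sizing arithmetic: duration = dataset size / IO rate.
# Assumes decimal units (1 GB = 1000 MB) and full sequential read speed.

# search_seconds SIZE_MB RATE_MB_PER_SEC -> estimated search time in seconds
search_seconds() {
  awk -v size="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", size / rate }'
}

# required_rate SIZE_MB TARGET_SECONDS -> IO rate (MB/s) needed to read SIZE_MB in TARGET_SECONDS
required_rate() {
  awk -v size="$1" -v target="$2" 'BEGIN { printf "%.1f\n", size / target }'
}

# Reproduce the table for a 10 GB (10000 MB) dataset.
for entry in "Mechanical:150" "SSD:500" "SSDx2:1000" "RAID5+1:2500"; do
  name=${entry%%:*}   # text before the colon
  rate=${entry##*:}   # text after the colon
  echo "$name $(search_seconds 10000 "$rate") s"
done
```

For example, if your environment generates 10 GB of data a day, required_rate 10000 10 prints 1000.0, i.e. the 1 GB/s device in the table is the slowest option that meets the 10-second target.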
Cache performance on Linux can be monitored using the 'perf' command-line tool; you can read more about perf here.
Perf provides metrics about many aspects of an application's behaviour. For this scenario we are interested in "cache misses". Perf's cache-misses counter records memory accesses that could not be served from the CPU cache and had to be fetched from a slower level of the memory hierarchy. Ideally every access would be served from the cache, but this is unlikely, so we aim for fewer than 5% cache misses.
First, we need to determine which process is the parent Logscape process. Because Logscape runs inside a JVM and spawns child processes, this can be difficult; however, ps aux combined with grep locates it easily:
ps aux | grep -i vsomain
This lists the process ID, name and command-line parameters of each matching process; we only need the process ID.
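Because grep's own command line also contains "vsomain", the ps/grep command usually lists the grep process as well. A minimal bash sketch that prints only the PID column, assuming standard ps aux output (PID in the second column) and using the common bracket trick so the pattern cannot match grep itself; the function name is hypothetical. It may print more than one PID if several processes match.

```shell
# vsomain_pids: print the PIDs of processes whose `ps aux` line mentions
# vsomain (case-insensitive). Reads `ps aux` output on stdin so it can be tested.
# The pattern '[v]somain' matches "vsomain" but not the literal text
# "[v]somain" in grep's own command line, so grep does not report itself.
vsomain_pids() {
  grep -i '[v]somain' | awk '{ print $2 }'
}

# Typical use against the live process table:
#   ps aux | vsomain_pids
```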
Next, we use perf to measure the performance of the Logscape process.
perf stat -e cache-references,cache-misses -p [process_id] sleep 30
After 30 seconds this command prints the counters collected for the vsomain process. The metric we are interested in is "cache-misses", which perf reports as a percentage of all cache references; if this value is below 5%, cache performance is acceptable.
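Reading the percentage by eye works for a one-off check. For repeated checks, perf stat can emit CSV with -x, (counter value, unit and event name in the first three fields), and a small awk filter can compute the miss ratio and compare it with the 5% threshold. This is a minimal bash sketch: the function names are hypothetical, and because event names can carry suffixes such as :u on some systems, events are matched by substring.

```shell
# cache_miss_pct: read `perf stat -x, -e cache-references,cache-misses` CSV
# on stdin and print cache-misses as a percentage of cache-references.
# Fields: $1 = counter value, $3 = event name. Non-numeric values such as
# "<not counted>" add 0. Exits with status 1 if no cache references were seen.
cache_miss_pct() {
  awk -F, '
    $3 ~ /cache-references/ { refs += $1 }
    $3 ~ /cache-misses/     { misses += $1 }
    END {
      if (refs == 0) exit 1
      printf "%.2f\n", 100 * misses / refs
    }'
}

# below_threshold PCT [LIMIT]: succeed if PCT is below LIMIT (default 5).
below_threshold() {
  awk -v p="$1" -v limit="${2:-5}" 'BEGIN { exit !(p < limit) }'
}

# Example (perf writes its counters to stderr, so route stderr into the pipe):
#   pct=$(perf stat -x, -e cache-references,cache-misses -p "$pid" sleep 30 2>&1 >/dev/null | cache_miss_pct)
#   if below_threshold "$pct"; then echo "cache OK ($pct%)"; else echo "cache misses high ($pct%)"; fi
```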
In large deployments, the Manager and Index Store use considerable operating system resources. On Linux, the maximum number of concurrently open files (network sockets also count as open files) and processes can be configured. Use the following limits as a guide.
Limit | Description |
nofile >= 40000 | Maximum number of open files, including network sockets. If the Index Store hits this limit, Logscape will not be able to process new files or create network connections |
nproc >= 100000 | Maximum number of processes for the user; on Linux, threads count towards this limit. If the Index Store hits this limit, Logscape will throw an exception each time it tries to create a new thread |
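To see which limits a running process actually has, you can read /proc/PID/limits on Linux. The bash sketch below compares the soft limits of a given PID (for example, the Logscape PID found with ps aux) against the minimums in the table. The function names are illustrative, and "unlimited" is treated as meeting the requirement.

```shell
# meets_min VALUE MIN: succeed if VALUE is "unlimited" or >= MIN.
meets_min() {
  [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

# soft_limit PID NAME: print the soft limit for NAME ("files" or "processes")
# from /proc/PID/limits, e.g. the line "Max open files  1024  4096  files".
soft_limit() {
  case "$2" in
    files)     awk '/^Max open files/ { print $4 }' "/proc/$1/limits" ;;
    processes) awk '/^Max processes/ { print $3 }' "/proc/$1/limits" ;;
  esac
}

# check_limits PID: report nofile/nproc against the recommended minimums.
# Returns non-zero if either limit is too low.
check_limits() {
  local pid=$1 nofile nproc rc=0
  nofile=$(soft_limit "$pid" files)
  nproc=$(soft_limit "$pid" processes)
  meets_min "$nofile" 40000 || { echo "nofile too low: $nofile (want >= 40000)"; rc=1; }
  meets_min "$nproc" 100000 || { echo "nproc too low: $nproc (want >= 100000)"; rc=1; }
  return $rc
}
```

If check_limits reports a problem, raise the limits for the user that runs Logscape using your distribution's mechanism, such as /etc/security/limits.conf, and restart the process so the new limits apply.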
You can find more information about sizing a Linux installation here.
How the number of users affects the Manager - Monitor the memory usage of the aggspace and dashboard processes when the number of users of the system increases significantly. You may need to increase the heap of these processes (see table). The following search charts the maximum resident memory (RSZ_MB) of each Logscape process:
com.liquidlabs.([A-Za-z\.0-9]+)_ | _type.equals(unx-ps) RSZ_MB.max(1,) chart(line-zero)
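Outside Logscape, a rough equivalent of this search can be run from a shell. The bash sketch below assumes procps ps, where ps -eo rss=,args= prints each process's resident memory in KiB followed by its full command line. It takes the text after "com.liquidlabs." up to the first character outside [A-Za-z.0-9] as the process name, and prints the largest RSS in MB seen for each name. The function name is hypothetical, and the exact class names in your deployment's command lines may differ.

```shell
# logscape_rss_mb: read `ps -eo rss=,args=` output on stdin (RSS in KiB, then
# the command line) and print, for each com.liquidlabs.* name found,
# the largest resident set size in MB.
logscape_rss_mb() {
  awk '
    match($0, /com\.liquidlabs\.[A-Za-z.0-9]+/) {
      # Strip the 15-character "com.liquidlabs." prefix from the match.
      name = substr($0, RSTART + 15, RLENGTH - 15)
      mb = $1 / 1024
      if (!(name in peak) || mb > peak[name]) peak[name] = mb
    }
    END { for (n in peak) printf "%s %.0f\n", n, peak[n] }'
}

# Typical use, sampled periodically to watch for growth:
#   ps -eo rss=,args= | logscape_rss_mb
```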
Large Alert Volumes - If you have hundreds of alerts in your deployment, watch the memory profile of the aggspace and allocate more memory when required.
Indexers - A typical deployment contains a Manager, a few Index Stores across which search operations and data are balanced, and many Forwarders. Some environments also use Indexers. If you have more than 20 Indexers in your environment, increase the size of your aggspace heap.