1TB per day is a daunting amount of data; however, with a small amount of pre-planning, ingesting, indexing and searching this volume is a straightforward task when using Logscape.
Whilst a single Indexstore could theoretically ingest and index 1TB per day, search performance would suffer severely. Introducing additional Indexstores allows the data volume to be separated into manageable clusters, letting us leverage the increased processing and disk I/O capacity. For this specific deployment, we will install six well-provisioned Indexstores.
Assumptions

Using the specifications stated below, we get the following when comparing a 1 Indexstore environment against a 6 Indexstore environment.
Feature | 1x Indexstore | 6x Indexstore |
Disk Speed | 2GB/sec | 12GB/sec |
Network Bandwidth | 2.5GB/sec | 2.5GB/sec |
Search Speed (1TB) | 500 seconds | 83 seconds |
GB Searched Per Second | 2GB | 12GB |
Events Per Second (2KB per event) | 1,000,000 | 6,000,000 |
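The figures in the table above follow from simple arithmetic on the aggregate disk speed. A minimal sketch, using the table's own assumptions (a 1TB dataset, 2KB events, decimal units):

```python
def search_stats(disk_gb_per_sec, dataset_tb=1.0, event_kb=2.0):
    """Estimate search duration and event throughput from raw disk read speed."""
    dataset_gb = dataset_tb * 1000                      # decimal units, as in the table
    duration_sec = round(dataset_gb / disk_gb_per_sec)  # seconds to scan the dataset
    events_per_sec = int(disk_gb_per_sec * 1e9 / (event_kb * 1e3))
    return duration_sec, events_per_sec

print(search_stats(2))   # 1x Indexstore at 2GB/sec
print(search_stats(12))  # 6x Indexstores at 12GB/sec aggregate
```

Running this reproduces the 500-second and 83-second search durations, and shows that at 2KB per event, 12GB/sec of aggregate read speed corresponds to 6,000,000 events per second.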
Spread across six Indexstores, our ingest requirement of roughly 12MB/sec (1TB per day) becomes insignificant. However, even with the data spread evenly between hosts, at maximum retention each host will be responsible for searching 5TB of data, so disk speed and in-memory caching are still a concern.
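The ingest and per-host retention figures can be checked with the same back-of-envelope approach, assuming 1TB per day, 30 days of retention, and six Indexstores:

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

daily_tb = 1.0                   # 1TB ingested per day across the cluster
indexstores = 6
retention_days = 30

# Cluster-wide ingest rate in MB/sec (1TB = 1e6 MB, decimal units)
ingest_mb_per_sec = daily_tb * 1e6 / SECONDS_PER_DAY

# Data each host is responsible for searching at full retention
per_host_tb = daily_tb * retention_days / indexstores

print(f"ingest: ~{ingest_mb_per_sec:.1f} MB/sec, per host: {per_host_tb} TB")
```

1TB per day works out to roughly 11.6MB/sec of sustained ingest, and 30 days of data divided across six hosts leaves each Indexstore searching 5TB.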
Network Stats

Network requirements will vary with the individual I/O capability of your Indexstores, but the network must be provisioned to accommodate both data arriving at the Indexstores and the capacity needed to transfer data to the Manager at search time. To benchmark your network you can follow the steps here. For this scenario we will assume each Indexstore is connected via two bonded fibre channel connections, for an estimated bandwidth of 2.5GB/sec.
In order to gather accurate metrics it is recommended that you use a tool such as IOzone. However, rough assumptions can be made based upon disk type.
Type | Disk Speed |
Mechanical Drive | 150MB/sec |
SSD | 500MB/sec |
SSD x2 | 1GB/sec |
(SSD) RAID 5+1 | 2.5GB/sec |
Using these device speeds, in a scenario where we are searching 10 GB of data, we get the following search times.
Device | Dataset Size | IO Rate | Search Duration |
Mechanical Disk | 10GB | 150MB/sec | 66 seconds |
SSD | 10GB | 500MB/sec | 20 seconds |
SSD x2 | 10GB | 1GB/sec | 10 seconds |
(SSD) RAID 5+1 | 10GB | 2.5GB/sec | 4 seconds |
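These durations are simply the dataset size divided by the device's read rate. A short sketch reproducing the table (decimal units, durations truncated to whole seconds):

```python
# Device read speeds in MB/sec, taken from the disk-speed table above
disk_speeds_mb = {
    "Mechanical Disk": 150,
    "SSD": 500,
    "SSD x2": 1000,
    "(SSD) RAID 5+1": 2500,
}

dataset_gb = 10  # size of the data being searched

# duration = dataset size / IO rate, in seconds
durations = {device: int(dataset_gb * 1000 / speed)
             for device, speed in disk_speeds_mb.items()}

for device, secs in durations.items():
    print(f"{device}: {secs} seconds")
```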
Device | Manager | IndexStore | Forwarder |
Disk | RAID 5 (4+1), disks with at least 500MB/sec R/W | RAID 5 (4+1), disks with at least 500MB/sec R/W | - |
CPU | 48-core modern CPU | 48-core modern CPU | - |
Memory | 64GB, minimum of 8GB allocated to the Dashboard/AggSpace process | 64GB | 512MB |
Manager sizing is dependent upon the average size of a search, the number of concurrent users, and the number of pre-configured alerts. General advice is that the system should have enough RAM to cache search-time data, whilst also allowing an additional 5-10MB per dashboard search and alert.
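As a rough sizing sketch, the Manager's memory need can be estimated from the 8GB process minimum plus the 5-10MB per dashboard search and alert figure. The dashboard and alert counts below are hypothetical examples, not recommendations:

```python
BASE_AGGSPACE_MB = 8 * 1024   # minimum 8GB for the Dashboard/AggSpace process
PER_SEARCH_MB = 10            # upper end of the 5-10MB per search/alert guidance

dashboard_searches = 40       # hypothetical: concurrent dashboard searches
alerts = 25                   # hypothetical: pre-configured alerts

required_mb = BASE_AGGSPACE_MB + (dashboard_searches + alerts) * PER_SEARCH_MB
print(f"{required_mb} MB (~{required_mb / 1024:.1f} GB) for the AggSpace process")
```

Under these assumptions the process needs roughly 8.6GB, comfortably inside a 64GB host while leaving the remainder for OS-level caching of search-time data.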
Indexstores require high read speeds in order to search their indexes for the data requested by searches. Ideally every Indexstore should offer similar read performance: because of the load-balanced nature of a Logscape deployment, a search will be held up by the single Indexstore with the slowest read capability.
Forwarders are lightweight agents for which disk and CPU are of little concern, so long as disk I/O and CPU resources are not saturated. The agent itself is designed to use a minimal amount of RAM, and is operationally capable with as little as 256MB allocated.
The key value to note here is the search speed. The input requirements for 1TB of data per day are relatively low; however, serving 30 days' worth of collated storage back in the form of a search within a short span of time requires multiple Indexstores.