Benchmarking CPU, memory, file I/O and mutex performance using Sysbench

We have already written a blog post on Sysbench (https://minervadb.com/index.php/2018/03/13/benchmarking-mysql-using-sysbench-1-1/), so this post does not cover basics such as installing and configuring Sysbench. Here we focus specifically on benchmarking CPU, threads, memory, file I/O and mutex performance.

Benchmarking CPU using Sysbench

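The CPU benchmark is configured with the number of concurrent threads (--num-threads) and an upper limit for the prime check (--cpu-max-prime). Each event tests the numbers up to that limit to verify whether they are prime.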
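To make the parameters concrete, here is a minimal Python sketch of the work a single CPU event performs. It assumes the event uses trial division: every number from 3 up to (but not including) the --cpu-max-prime limit is divided by each integer up to its square root. The function name cpu_event is hypothetical. Python's GIL prevents CPU-bound threads from running in parallel, so the sketch times a single event rather than reproducing the 120-thread run.

```python
import math
import time


def cpu_event(max_prime: int) -> int:
    """One CPU 'event': count the primes in [3, max_prime) by trial division."""
    primes_found = 0
    for candidate in range(3, max_prime):
        # Try every divisor from 2 up to the integer square root of the candidate.
        for divisor in range(2, math.isqrt(candidate) + 1):
            if candidate % divisor == 0:
                break  # composite: stop early
        else:
            primes_found += 1  # no divisor found, so the candidate is prime
    return primes_found


start = time.perf_counter()
found = cpu_event(20_000)  # a much smaller limit than the 2000000 used in the run
elapsed = time.perf_counter() - start
print(f"primes found: {found}, one event took {elapsed:.3f} s")
```

Raising the limit increases the work per event roughly in proportion to the number of candidates times the square root of the limit. This is why a limit of 2000000 makes each event take minutes.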

[root@localhost shiv]# sysbench --test=cpu --cpu-max-prime=2000000 --num-threads=120 run
Running the test with following options:
Number of threads: 120
Initializing random number generator from current time


Prime numbers limit: 2000000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:     0.69

Throughput:
    events/s (eps):                      0.6891
    time elapsed:                        174.1418s
    total number of events:              120

Latency (ms):
         min:                               169807.71
         avg:                               172640.02
         max:                               174120.65
         95th percentile:                   100000.00
         sum:                             20716802.25

Threads fairness:
    events (avg/stddev):           1.0000/0.00
    execution time (avg/stddev):   172.6400/0.83
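“time elapsed” is the key metric we look at to measure CPU performance. In this case it is 174.1418 seconds. Because each of the 120 threads completed exactly one event (total number of events: 120), throughput works out to 120 / 174.1418 ≈ 0.69 events per second. The reported 95th percentile (100000.00 ms) is lower than the minimum latency. This appears to happen because sysbench's latency histogram does not record values above 100,000 ms, so ignore the percentile for events this long.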
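You can check the relationship between the reported fields with a small Python helper. This sketch assumes the plain-text report layout shown above, and parse_throughput is a hypothetical name that is not part of sysbench. It extracts the Throughput fields and recomputes events per second as total events divided by elapsed time.

```python
import re

# Patterns for the "Throughput:" section, based on the report layout shown above.
FIELDS = {
    "eps": r"events/s \(eps\):\s+([\d.]+)",
    "elapsed_s": r"time elapsed:\s+([\d.]+)s",
    "events": r"total number of events:\s+(\d+)",
}


def parse_throughput(report: str) -> dict:
    """Return the Throughput fields found in a sysbench text report."""
    values = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, report)
        if match:
            values[name] = float(match.group(1))
    return values


cpu_report = """
Throughput:
    events/s (eps):                      0.6891
    time elapsed:                        174.1418s
    total number of events:              120
"""

stats = parse_throughput(cpu_report)
derived_eps = stats["events"] / stats["elapsed_s"]  # 120 events / 174.1418 s
print(f"reported eps: {stats['eps']}, derived eps: {derived_eps:.4f}")
```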
Benchmarking thread performance using Sysbench
The threads benchmark measures how the scheduler copes with many threads competing for a set of mutexes. Each worker thread is assigned a mutex (a kind of lock). For each event, the thread loops a number of times (documented as the number of yields, --thread-yields). In each iteration it takes the lock and yields, which asks the scheduler to stop running it and put it back at the end of the run queue. When the thread is scheduled again, it releases the lock. The number of mutexes is set with --thread-locks.
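The following is a rough Python model of this loop, not sysbench's implementation. It assumes that thread i uses lock i % num_locks, which is our guess at how --thread-locks is applied. It also uses time.sleep(0) as a portable stand-in for a scheduler yield. All names (threads_benchmark, yields, events_per_thread) are hypothetical.

```python
import threading
import time


def threads_benchmark(num_threads: int, num_locks: int, yields: int,
                      events_per_thread: int) -> dict:
    """Hypothetical model of the threads test: lock, yield, unlock, repeated."""
    locks = [threading.Lock() for _ in range(num_locks)]
    completed = [0] * num_threads  # each thread only touches its own slot

    def worker(thread_id: int) -> None:
        lock = locks[thread_id % num_locks]  # assumption: lock picked by thread id
        for _ in range(events_per_thread):
            for _ in range(yields):
                with lock:         # take the lock
                    time.sleep(0)  # yield: let the scheduler run another thread
                # the lock is released once this thread runs again and leaves the block
            completed[thread_id] += 1

    start = time.perf_counter()
    workers = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    elapsed = time.perf_counter() - start
    events = sum(completed)
    return {"events": events, "elapsed_s": elapsed, "eps": events / elapsed}


result = threads_benchmark(num_threads=4, num_locks=2, yields=100, events_per_thread=10)
print(result)
```

With fewer locks than threads, several threads share a lock. Each yield made while holding it then blocks the others, which is the contention this test is meant to expose.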
[root@localhost shiv]# sysbench --test=threads --thread-locks=10 --max-time=60 run

sysbench 1.1.0-651e7fd (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Initializing worker threads...

Threads started!


Throughput:
    events/s (eps):                      2366.0725
    time elapsed:                        60.0003s
    total number of events:              141965

Latency (ms):
         min:                                    0.38
         avg:                                    0.42
         max:                                    8.86
         95th percentile:                        0.53
         sum:                                59942.51

Threads fairness:
    events (avg/stddev):           141965.0000/0.00
    execution time (avg/stddev):   59.9425/0.00

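To interpret the threads benchmark, we note the time elapsed (the actual time taken to complete the run), which in this case is 60.0003 seconds. Because --max-time=60 caps the run at about 60 seconds, the number of events completed in that window is just as telling: 141965 events, or about 2366 events per second.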
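The numbers in this report are internally consistent, which makes a useful sanity check. With a single thread, events run back to back, so the average latency should be close to 1 / throughput. Dividing the latency sum by the elapsed time (Little's law) gives the average number of events in flight, which should match the configured thread count. The values below are copied from the run above.

```python
# Values copied from the threads report above (--thread-locks=10, 1 thread, 60 s cap).
threads = 1
eps = 2366.0725             # events/s (eps)
avg_latency_ms = 0.42       # Latency avg
latency_sum_ms = 59942.51   # Latency sum
elapsed_s = 60.0003         # time elapsed

# One thread runs events back to back, so average latency is roughly 1 / throughput.
implied_latency_ms = 1000 / eps

# Little's law: total time spent inside events / wall-clock time = average events in flight.
avg_in_flight = (latency_sum_ms / 1000) / elapsed_s

print(f"implied average latency: {implied_latency_ms:.3f} ms (reported {avg_latency_ms} ms)")
print(f"average events in flight: {avg_in_flight:.3f} (threads configured: {threads})")
```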

Benchmarking mutex workload 
When benchmarking the mutex workload, sysbench runs a single request per thread. The request first generates CPU load with a simple incrementing loop, whose length is set by --mutex-loops. It then takes a random mutex, increments a global variable and releases the lock again. This repeats until the configured number of locks (--mutex-locks) has been taken. The random mutex is chosen from a pool whose size is set by --mutex-num.
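The request described above can be modelled in Python as follows. This is a hypothetical sketch in which mutex_benchmark and its parameters only mirror --mutex-num, --mutex-locks and --mutex-loops by name. One deliberate deviation is that each mutex guards its own counter instead of a single global variable, so the final count is exact and race-free.

```python
import random
import threading
import time


def mutex_benchmark(num_threads: int, mutex_num: int, mutex_locks: int,
                    mutex_loops: int) -> dict:
    """Hypothetical model of the mutex test: one request per thread."""
    mutexes = [threading.Lock() for _ in range(mutex_num)]
    counters = [0] * mutex_num  # deviation: one counter per mutex instead of one global

    def request() -> None:
        rng = random.Random()
        for _ in range(mutex_locks):
            spin = 0
            for _ in range(mutex_loops):  # CPU load: a simple incrementing loop
                spin += 1
            idx = rng.randrange(mutex_num)  # pick a random mutex
            with mutexes[idx]:              # lock, increment, unlock
                counters[idx] += 1

    start = time.perf_counter()
    workers = [threading.Thread(target=request) for _ in range(num_threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    elapsed = time.perf_counter() - start
    return {"events": num_threads, "increments": sum(counters), "elapsed_s": elapsed}


result = mutex_benchmark(num_threads=8, mutex_num=4, mutex_locks=1000, mutex_loops=100)
print(result)
```

As in the sysbench run, the event count equals the thread count, because each thread performs exactly one (long) request.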

 

[root@localhost shiv]# sysbench --test=mutex --num-threads=130 run
WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
WARNING: --num-threads is deprecated, use --threads instead
sysbench 1.1.0-651e7fd (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 130
Initializing random number generator from current time


Initializing worker threads...

Threads started!


Throughput:
    events/s (eps):                      5.8047
    time elapsed:                        22.3956s
    total number of events:              130

Latency (ms):
         min:                                17566.82
         avg:                                20789.93
         max:                                22230.90
         95th percentile:                    21641.55
         sum:                              2702690.46

Threads fairness:
    events (avg/stddev):           1.0000/0.00
    execution time (avg/stddev):   20.7899/0.82

Throughput and average latency are the two metrics we consider when interpreting mutex workload performance. In this run, throughput was 5.8047 events/s over 22.3956 s of elapsed time. The average latency was 20789.93 ms, with a minimum of 17566.82 ms, a maximum of 22230.90 ms and a 95th percentile of 21641.55 ms.

 

Benchmarking the memory workload 

When benchmarking memory, sysbench allocates a memory buffer and then reads from or writes to it one pointer-sized word (32 or 64 bits) at a time, until the whole buffer has been read or written. This repeats until the requested volume (--memory-total-size) is reached. You can increase or reduce the load through the number of threads (--num-threads), the buffer size (--memory-block-size) and the request type: read or write (--memory-oper), sequential or random (--memory-access-mode).
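Below is a single-threaded Python sketch of this access pattern, assuming 64-bit words and sequential access. memory_benchmark and its parameters are hypothetical names that only echo --memory-total-size, --memory-block-size and the read/write operation. Interpreted Python is far slower than sysbench's native loop, so only the structure is comparable, not the MiB/sec figure.

```python
import array
import time


def memory_benchmark(total_size: int, block_size: int, oper: str = "write") -> dict:
    """Hypothetical single-threaded model: walk a buffer one 64-bit word at a time."""
    words_per_block = block_size // 8                # assume 8-byte (64-bit) pointers
    buffer = array.array("Q", [0]) * words_per_block  # the block-sized buffer
    blocks = total_size // block_size
    checksum = 0
    start = time.perf_counter()
    for _ in range(blocks):                     # one operation per block
        for i in range(words_per_block):        # sequential, word-sized accesses
            if oper == "write":
                buffer[i] = i
            else:
                checksum += buffer[i]
    elapsed = max(time.perf_counter() - start, 1e-9)
    mib = blocks * block_size / 2**20
    return {"operations": blocks, "mib": mib, "mib_per_s": mib / elapsed,
            "checksum": checksum}


result = memory_benchmark(total_size=4 * 2**20, block_size=1024)
print(f"{result['operations']} operations, {result['mib']:.2f} MiB "
      f"({result['mib_per_s']:.1f} MiB/sec)")
```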

[root@localhost shiv]# sysbench --test=memory --num-threads=140 --memory-total-size=10G run

sysbench 1.1.0-651e7fd (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 140
Initializing random number generator from current time


Running memory speed test with the following options:
  block size: 1KiB
  total size: 10240MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 10485720 (3351958.44 per second)

10239.96 MiB transferred (3273.40 MiB/sec)


Throughput:
    events/s (eps):                      3351958.4393
    time elapsed:                        3.1282s
    total number of events:              10485720

Latency (ms):
         min:                                    0.00
         avg:                                    0.01
         max:                                 2931.98
         95th percentile:                        0.00
         sum:                               123371.54

Threads fairness:
    events (avg/stddev):           74898.0000/0.00
    execution time (avg/stddev):   0.8812/0.93

Throughput and operations per second are the important metrics for memory workload benchmarking. In this run, sysbench performed 10485720 operations (3351958.44 per second) and transferred 10239.96 MiB (3273.40 MiB/sec).
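You can reproduce the operation count from the command-line options. The total volume is split into 1 KiB blocks. Judging by the per-thread average of 74898 events with zero standard deviation, the blocks are divided evenly across the 140 threads and the remainder is dropped. That split is an inference from the report, not documented behaviour.

```python
KiB, MiB, GiB = 1024, 1024**2, 1024**3

total_size = 10 * GiB   # --memory-total-size=10G
block_size = 1 * KiB    # "block size: 1KiB" in the report
threads = 140           # --num-threads=140
elapsed_s = 3.1282      # "time elapsed" in the report

total_blocks = total_size // block_size   # 10485760 one-KiB blocks
per_thread = total_blocks // threads      # inferred even split; the remainder is dropped
operations = per_thread * threads
mib_transferred = operations * block_size / MiB

print(f"{per_thread} ops per thread, {operations} total operations")
print(f"{mib_transferred:.2f} MiB transferred")
print(f"~{operations / elapsed_s:.0f} ops/s, ~{mib_transferred / elapsed_s:.1f} MiB/s")
```

The recomputed rates differ very slightly from the reported ones, most likely because the elapsed time is printed rounded to four decimals.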

Benchmarking file system I/O with Sysbench

Sysbench supports several file I/O test modes (--file-test-mode: seqwr, seqrewr, seqrd, rndrd, rndwr and rndrw). Here we use rndrw (combined random read/write) because it produces a more complex, production-like I/O pattern. The test runs in three steps:

  • Prepare – Creates the files for testing
  • Run – Performs the benchmark and reports the results
  • Cleanup – Deletes the test files to clean up the system
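The three phases can be sketched in Python on a small scale. This is a simplified model, and prepare, run and cleanup are our own hypothetical function names. It uses 16 KiB blocks, a 1.5 read/write ratio and a periodic fsync every 100 requests, matching the settings printed in the Run output. It assumes every open file is fsynced at each interval. It is also single-threaded, bounded by a request count rather than time, and omits the final fsync at the end of the test.

```python
import os
import random
import shutil
import tempfile

BLOCK = 16 * 1024  # 16 KiB, as in the "Block size 16KiB" line of the run output


def prepare(directory: str, num_files: int, file_size: int) -> list:
    """Prepare: create the test files."""
    paths = []
    for i in range(num_files):
        path = os.path.join(directory, f"test_file.{i}")
        with open(path, "wb") as f:
            f.write(bytes(file_size))
        paths.append(path)
    return paths


def run(paths: list, file_size: int, requests: int, rw_ratio: float = 1.5,
        fsync_freq: int = 100, seed: int = 0) -> dict:
    """Run: random block-aligned reads and writes with periodic fsync."""
    rng = random.Random(seed)
    counts = {"reads": 0, "writes": 0, "fsyncs": 0}
    handles = [open(p, "r+b") for p in paths]
    try:
        for n in range(1, requests + 1):
            f = rng.choice(handles)
            f.seek(rng.randrange(file_size // BLOCK) * BLOCK)
            # rw_ratio reads per write -> read with probability rw_ratio / (rw_ratio + 1)
            if rng.random() < rw_ratio / (rw_ratio + 1):
                f.read(BLOCK)
                counts["reads"] += 1
            else:
                f.write(b"x" * BLOCK)
                counts["writes"] += 1
            if n % fsync_freq == 0:
                for h in handles:  # assumption: every open file is fsynced
                    h.flush()
                    os.fsync(h.fileno())
                    counts["fsyncs"] += 1
    finally:
        for h in handles:
            h.close()
    return counts


def cleanup(directory: str) -> None:
    """Cleanup: delete the test files."""
    shutil.rmtree(directory)


workdir = tempfile.mkdtemp()
paths = prepare(workdir, num_files=4, file_size=4 * BLOCK)
stats = run(paths, file_size=4 * BLOCK, requests=300)
cleanup(workdir)
print(stats)
```

Fsyncing every file at each interval is why the fsync IOPS in the Run output can exceed the read and write IOPS.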

Prepare 

[root@localhost shiv]# sysbench --num-threads=16 --test=fileio --file-total-size=10G --file-test-mode=rndrw prepare

sysbench 1.1.0-651e7fd (using bundled LuaJIT 2.1.0-beta3)

128 files, 81920Kb each, 10240Mb total
Creating files for the test...
Extra file open flags: (none)
Reusing existing file test_file.0
Reusing existing file test_file.1
Reusing existing file test_file.2
Reusing existing file test_file.3
..................................
..................................

Reusing existing file test_file.122
Reusing existing file test_file.123
Reusing existing file test_file.124
Reusing existing file test_file.125
Reusing existing file test_file.126
Reusing existing file test_file.127

Run

[root@localhost shiv]# sysbench --num-threads=16 --test=fileio --file-total-size=10G --file-test-mode=rndrw run

sysbench 1.1.0-651e7fd (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 16
Initializing random number generator from current time


Extra file open flags: (none)
128 files, 80MiB each
10GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!


Throughput:
         read:  IOPS=2495.85 39.00 MiB/s (40.89 MB/s)
         write: IOPS=1663.70 26.00 MiB/s (27.26 MB/s)
         fsync: IOPS=5311.68

Latency (ms):
         min:                                  0.00
         avg:                                  1.69
         max:                                631.90
         95th percentile:                      5.00
         sum:                             159794.48

Cleanup 

[root@localhost shiv]# sysbench --num-threads=16 --test=fileio --file-total-size=10G --file-test-mode=rndrw cleanup 
WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
WARNING: --num-threads is deprecated, use --threads instead
sysbench 1.1.0-651e7fd (using bundled LuaJIT 2.1.0-beta3)

Removing test files...

In file system I/O benchmarking, we focus on interpreting throughput (both reads and writes) under varying loads. In the test above, read throughput is 40.89 MB/s (39.00 MiB/s) and write throughput is 27.26 MB/s (26.00 MiB/s).
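The MB/s figures follow directly from the IOPS and the 16 KiB block size, and the ratio of read to write IOPS matches the configured 1.50 read/write ratio. Here is a quick Python check using the numbers from the run; iops_to_rates is a hypothetical helper name.

```python
BLOCK_BYTES = 16 * 1024  # "Block size 16KiB" from the run output


def iops_to_rates(iops: float, block_bytes: int = BLOCK_BYTES) -> tuple:
    """Convert IOPS at a fixed block size into (MiB/s, MB/s)."""
    bytes_per_s = iops * block_bytes
    return bytes_per_s / 1024**2, bytes_per_s / 1000**2


read_mib, read_mb = iops_to_rates(2495.85)
write_mib, write_mb = iops_to_rates(1663.70)
ratio = 2495.85 / 1663.70

print(f"read:  {read_mib:.2f} MiB/s ({read_mb:.2f} MB/s)")
print(f"write: {write_mib:.2f} MiB/s ({write_mb:.2f} MB/s)")
print(f"read/write IOPS ratio: {ratio:.2f}")
```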

About MinervaDB Corporation
Independent and vendor-neutral consulting, support, remote DBA services and training for MySQL, MariaDB, Percona Server, PostgreSQL and ClickHouse, with core expertise in performance, scalability and high availability. We are a virtual corporation: all of us work from home across multiple time zones and stay connected via email, Skype, Google Hangouts, phone and IRC, supporting over 250 customers worldwide.