- Compact data storage – Ten billion UInt8 values should consume about 10 GB uncompressed; anything larger wastes CPU time on moving and scanning extra bytes. Storage that is compact even when uncompressed benefits both performance and resource management. ClickHouse is built to store data efficiently, without any garbage.
- CPU efficient – Whenever possible, ClickHouse operations are dispatched on arrays, rather than on individual values. This is called “vectorized query execution,” and it helps lower the cost of actual data processing.
- Data compression – ClickHouse supports two kinds of compression: LZ4 and ZSTD. LZ4 is faster than ZSTD, but its compression ratio is lower. ZSTD is faster and compresses better than traditional Zlib, but is slower than LZ4. We recommend LZ4 when I/O is fast enough that decompression speed becomes the bottleneck. With extremely fast disk subsystems, you also have the option to specify “none” for compression. ZSTD is recommended when I/O is the bottleneck, as in queries with large range scans.
- Can store data on disk – Some columnar database systems, such as SAP HANA and Google PowerDrill, can work only in RAM. ClickHouse, by contrast, can store data on disk.
- Massively Parallel Processing – ClickHouse can process very large or complex SQL queries in parallel, efficiently and cost-effectively.
- Built for web-scale data analytics – ClickHouse supports sharding and distributed processing, which makes it a preferred columnar database system for web-scale workloads. Each shard in ClickHouse can be a group of replicas, which provides reliability and fault tolerance.
- Supports a primary key – ClickHouse permits real-time data updates on tables with a primary key (no locks are taken when adding data). Data is sorted incrementally using the merge tree, so that queries on ranges of primary key values can be performed efficiently.
- Built for statistical analysis, with support for partial aggregation – ClickHouse is a columnar store designed for statistical queries. It provides aggregate functions for approximate calculation of the number of distinct values, medians, and quantiles. ClickHouse can also aggregate over a limited number of random keys instead of all keys. You can run a query on a part (sample) of the data and get an approximate result, which reduces disk I/O considerably.
- Supports SQL – ClickHouse supports SQL. Subqueries are supported in FROM, IN, and JOIN clauses, as well as scalar subqueries. Dependent subqueries are not supported.
- Supports data replication – ClickHouse supports asynchronous multi-master and master-slave replication.
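
Two of the points above, compact storage and approximate results from a sample, can be illustrated with a small sketch. This is plain Python and not ClickHouse code; the helper names `uint8_column`, `payload_bytes`, and `sampled_mean` are hypothetical and exist only for this illustration. It assumes a column of one-byte unsigned integers held in a contiguous buffer, and estimates an average from a random 10% of the rows rather than from all of them.

```python
import random
from array import array


def uint8_column(values):
    """Pack integers in 0..255 into a contiguous buffer, one byte per value."""
    return array("B", values)


def payload_bytes(column):
    """Size of the raw column data in bytes (excludes Python object overhead)."""
    return len(column) * column.itemsize


def sampled_mean(column, fraction, seed=0):
    """Estimate the mean by reading only a random fraction of the rows."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    k = max(1, int(len(column) * fraction))
    row_ids = rng.sample(range(len(column)), k)
    return sum(column[i] for i in row_ids) / k


# One million UInt8-like values occupy one million bytes of payload.
column = uint8_column(i % 256 for i in range(1_000_000))
exact_mean = sum(column) / len(column)
approx_mean = sampled_mean(column, fraction=0.1)

# Scaling the same arithmetic: 10 billion one-byte values -> 10**10 bytes = 10 GB.
ten_billion_bytes = 10_000_000_000 * column.itemsize
```

The sampled estimate touches a tenth of the rows, which mirrors why querying a sample reduces disk I/O; the price is a small error relative to the exact mean.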