Cassandra Interview Questions

[Interview Question] – What files are stored in each SSTable? How to find the SSTable?

Answer :

SSTables are the immutable data files that Cassandra uses for persisting data on disk.
Each SSTable is comprised of multiple components stored in separate files:

  • Data.db
  • Index.db
  • Summary.db
  • Filter.db
  • CompressionInfo.db
  • Statistics.db
  • Digest.crc32
  • TOC.txt

Within the Data.db file, rows are organized by partition. Partitions are sorted in token order (i.e. by a hash of the partition key when the default partitioner, Murmur3Partitioner, is used). Within a partition, rows are stored in the order of their clustering keys.
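The ordering described above can be sketched in a few lines of Python. This is only an illustration: the `token` function below is a hypothetical stand-in built on `hashlib`, not the Murmur3 hash that Cassandra's default partitioner actually uses, and the row tuples are made-up sample data.

```python
import hashlib


def token(partition_key: str) -> int:
    """Stand-in for the partitioner: NOT Murmur3, just a deterministic 64-bit hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def sstable_order(rows):
    """Sort (partition_key, clustering_key, value) rows by token, then clustering key."""
    return sorted(rows, key=lambda r: (token(r[0]), r[1]))


# Hypothetical rows, deliberately out of order.
rows = [
    ("user:42", 3, "c"),
    ("user:7", 1, "x"),
    ("user:42", 1, "a"),
    ("user:7", 2, "y"),
    ("user:42", 2, "b"),
]

ordered = sstable_order(rows)
for row in ordered:
    print(token(row[0]), row)
```

Note that the partitions end up ordered by their hash, not by the partition key itself, while the rows inside each partition stay in clustering-key order.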

The SSTable version can be read from the names of the SSTable files. All component files share a common naming convention, and most of them end in “.db”.
The name of the file that contains the SSTable data has the following format:

<version>-<generation>-<format>-Data.db

Example: aa-3444-bti-Data.db

  • version is an alphabetic string that identifies the SSTable storage format version, such as aa in the example above
  • generation is an index number that is incremented every time a new SSTable is created for a table, such as 3444 in the example above
  • format identifies the SSTable format, such as bti in the example above
  • component is the type of information stored in the file, such as Data, CompressionInfo, Digest, or Index
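As a rough illustration of the naming convention above, the following Python sketch splits a file name into its parts. The function name `parse_sstable_name`, the `SSTableName` type, and the regular expression are assumptions made for this example and are not part of Cassandra; the pattern assumes a numeric generation, as described above.

```python
import re
from typing import NamedTuple, Optional


class SSTableName(NamedTuple):
    version: str      # e.g. "aa"
    generation: int   # e.g. 3444
    fmt: str          # e.g. "bti"
    component: str    # e.g. "Data.db"


# <version>-<generation>-<format>-<component>
_PATTERN = re.compile(
    r"^(?P<version>[a-z]+)-(?P<generation>\d+)-(?P<fmt>[a-z]+)-(?P<component>.+)$"
)


def parse_sstable_name(file_name: str) -> Optional[SSTableName]:
    """Return the parts of an SSTable file name, or None if it does not match."""
    m = _PATTERN.match(file_name)
    if m is None:
        return None
    return SSTableName(
        version=m.group("version"),
        generation=int(m.group("generation")),
        fmt=m.group("fmt"),
        component=m.group("component"),
    )


parsed = parse_sstable_name("aa-3444-bti-Data.db")
print(parsed.version, parsed.generation, parsed.fmt, parsed.component)
```

For the example file name, the version field is `aa`, which is the value to compare against a table of SSTable versions.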


The components store the following information:

  • Data (Data.db): The SSTable data, i.e. the actual contents of the rows
  • Primary Index (Index.db): An index of the row keys (partition keys) with pointers to their positions in the data file
  • Bloom filter (Filter.db): A structure held in memory that checks whether a partition key may exist in the SSTable before the SSTable is read from disk
  • Compression Information (CompressionInfo.db): A file holding metadata about the uncompressed data length, chunk offsets, and other compression information
  • Statistics (Statistics.db): Statistical metadata about the content of the SSTable, including information about timestamps, tombstones, clustering keys, compaction, repair, compression, TTLs, and more
  • Digest (Digest.crc32, Digest.adler32, Digest.sha1): A file holding a checksum of the data file; the suffix names the algorithm (CRC32, Adler32, or SHA-1)
  • CRC (CRC.db): A file holding the CRC32 checksums for chunks in an uncompressed file
  • SSTable Index Summary (Summary.db): A sample of the partition index, stored in memory
  • SSTable Table of Contents (TOC.txt): A file that lists all components of the SSTable
  • Secondary Index (SI_.*.db): Built-in secondary index. Multiple SIs may exist per SSTable
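To make the role of the Bloom filter component more concrete, here is a minimal Bloom filter sketch in Python. It only illustrates the general technique: the class name, the sizes, and the use of SHA-256 are assumptions for this example, and Cassandra's own implementation and the on-disk layout of Filter.db are different.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: answers 'definitely absent' or 'possibly present'."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means the key was never added; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(key))


bloom = BloomFilter()
for partition_key in ("user:42", "user:7"):  # hypothetical partition keys
    bloom.add(partition_key)

print(bloom.might_contain("user:42"))
```

A key that was added always reports as possibly present, so a negative answer lets a read skip the SSTable entirely, while a positive answer still requires checking the index and data files.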

Check more about Storage Engine and SSTable Versions

See also: DataStax Enterprise, Apache Cassandra, CQL, and SSTable compatibility

