Cassandra Interview Questions

[Interview Question] – What is the difference between nodetool rebuild and nodetool repair? Why not use repair instead of rebuild?

Answer :

nodetool rebuild is similar to the bootstrapping process (when you add a new node to the cluster), but for a datacenter. It mainly streams data from the nodes that are already live to the new nodes, which start out empty. Defining the key ranges for the new nodes is very fast, so the rest can be seen as a copy operation.
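To make the "define the ranges, then copy" idea concrete, here is a minimal toy sketch in Python. It is not Cassandra's implementation: the integer tokens, the `owner_of` and `rebuild` helpers, and the in-memory `source_rows` dictionary are all hypothetical, and replication is ignored, so each token has exactly one owner.

```python
from bisect import bisect_left

def owner_of(token, ring_tokens):
    """Return the node token that owns `token`: the first node token
    greater than or equal to it, wrapping around the end of the ring."""
    idx = bisect_left(ring_tokens, token)
    return ring_tokens[idx % len(ring_tokens)]

def rebuild(new_node_token, ring_tokens, source_rows):
    """Copy every source row whose token falls in the new node's range.
    No comparison with existing data happens: the new node is empty."""
    ring = sorted(ring_tokens)
    return {
        token: value
        for token, value in source_rows.items()
        if owner_of(token, ring) == new_node_token
    }

# Hypothetical three-node ring and a handful of rows keyed by token.
ring = [0, 100, 200]
source_rows = {10: "a", 150: "b", 180: "c", 250: "d"}

copied = rebuild(200, ring, source_rows)
print(copied)  # {150: 'b', 180: 'c'}
```

The cheap step is working out which range the node owns; everything after that is a plain copy of the rows in that range.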

nodetool repair [-pr] is not a copy operation. The node being repaired is not empty; it already holds data, but when the replication factor is greater than 1 that data has to be compared with the copies on the other replicas, and any difference is corrected. The process exchanges a lot of information, but most of it is not the data itself: each replica builds a Merkle tree (basically a tree of hashes) over the range being repaired, and the trees are compared to check whether the replicas hold the same data. Where they differ, only the mismatched sections are streamed between the replicas, so that all of them end up with the same data. Exchanging these hashes is faster than streaming all the data before verification; this works under the assumption that most data is the same on both nodes, with only a few differences here and there. With -pr, the node repairs only its primary range, so the command has to be run on every node to cover the whole ring. Repair does not remove tombstones by itself; compaction purges them once gc_grace_seconds has passed. Repair matters for deletes because it propagates tombstones to every replica, which is why it should complete on each node at least once within gc_grace_seconds; otherwise deleted data can reappear.

This is also why repair is a poor substitute for rebuild: on an empty node there is nothing to compare, so building and comparing Merkle trees only adds work on top of streaming nearly everything anyway.
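The hash comparison can be illustrated with a small Python sketch. This is a toy model, not Cassandra's code: the `build_tree` and `differing_leaves` helpers are hypothetical, each leaf stands for one section of a replica's data, and the number of leaves is assumed to be a power of two.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Return the tree as a list of levels: levels[0] holds the leaf
    hashes and levels[-1] holds the single root hash.
    Assumes len(leaves) is a power of two."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def differing_leaves(tree_a, tree_b):
    """Walk down from the root, following only mismatched nodes, and
    return the indexes of the leaves that differ."""
    suspects = [0]
    for level in range(len(tree_a) - 1, 0, -1):
        suspects = [
            child
            for i in suspects
            if tree_a[level][i] != tree_b[level][i]
            for child in (2 * i, 2 * i + 1)
        ]
    return [i for i in suspects if tree_a[0][i] != tree_b[0][i]]

# Two hypothetical replicas that disagree on one section only.
replica_a = [b"k1=v1", b"k2=v2", b"k3=v3", b"k4=v4"]
replica_b = [b"k1=v1", b"k2=v2", b"k3=stale", b"k4=v4"]

to_stream = differing_leaves(build_tree(replica_a), build_tree(replica_b))
print(to_stream)  # [2]
```

Only section 2 would need to be streamed here; when the two roots match, nothing is streamed at all, which is the common case the answer relies on.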
