
[Interview Question]: What are the types of repair in Cassandra, and how do they differ? How do you start a repair for a particular token range? What is parallel repair in Cassandra?

Repair is used to make different copies (replicas) of data consistent by exchanging data with other replicas.

The nodetool repair command repairs one or more nodes in a cluster and provides options for restricting repair to a set of nodes (see Repairing nodes). Run anti-entropy repair on a regular basis, especially in an environment that deletes data frequently. Repair delivers deletion markers (tombstones) to replicas that missed them; otherwise, deleted data can reappear once the tombstones are purged.
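A common rule of thumb is to finish a repair on every node more often than the table's gc_grace_seconds. The default is 864000 seconds (10 days), and tombstones become eligible for purging after that window. The minimal sketch below flags nodes whose last repair is older than that window. It assumes you record repair completion times yourself. The function name `overdue_nodes` and the sample addresses are hypothetical.

```python
from datetime import datetime, timedelta

# Default gc_grace_seconds for a Cassandra table: 864000 s (10 days).
DEFAULT_GC_GRACE_SECONDS = 864_000


def overdue_nodes(last_repair, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Return the nodes whose last completed repair is older than gc_grace_seconds.

    `last_repair` maps a node address to the datetime its last repair finished.
    These timestamps are assumed to come from your own scheduling records.
    """
    window = timedelta(seconds=gc_grace_seconds)
    return sorted(node for node, finished in last_repair.items() if now - finished > window)


if __name__ == "__main__":
    now = datetime(2024, 1, 15)
    history = {
        "10.2.2.20": datetime(2024, 1, 10),  # repaired 5 days ago
        "10.2.2.21": datetime(2024, 1, 1),   # repaired 14 days ago
    }
    print(overdue_nodes(history, now))  # ['10.2.2.21']
```

A node repaired exactly gc_grace_seconds ago is not reported. Tighten the comparison if you want a safety margin.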

Important: Ensure that all involved replicas are up and accessible before running a repair. If repair encounters a down replica, an error occurs and the process halts. Re-run repair after bringing all replicas back online.
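Before starting a repair, you can check replica availability by reading `nodetool status`. In its usual layout, each node line begins with a two-letter code: U or D (Up or Down), followed by N, L, J or M (Normal, Leaving, Joining or Moving), and then the node address. The sketch below assumes that layout, which can vary between versions. It deliberately treats anything other than "UN" as not ready. The function names and sample output are illustrative only.

```python
import re
import subprocess

# In `nodetool status`, each node line starts with a two-letter code:
# U/D (Up/Down) followed by N/L/J/M (Normal/Leaving/Joining/Moving),
# and then the node's address.
NODE_LINE = re.compile(r"^([UD])([NLJM])\s+(\S+)")

# Trimmed sample output; real output has more columns (Load, Tokens, Owns, Host ID).
SAMPLE_STATUS = """\
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Rack
UN  10.2.2.20  RAC1
DN  10.2.2.21  RAC1
"""


def nodes_not_ready(status_output):
    """Return the addresses of nodes that are not Up/Normal ("UN")."""
    not_ready = []
    for line in status_output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match and match.group(1) + match.group(2) != "UN":
            not_ready.append(match.group(3))
    return not_ready


def fetch_status():
    """Run `nodetool status` (assumes nodetool is on PATH) and return its stdout."""
    result = subprocess.run(["nodetool", "status"], capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(nodes_not_ready(SAMPLE_STATUS))  # ['10.2.2.21']
```

On a live node, `nodes_not_ready(fetch_status())` returning an empty list is a reasonable precondition for starting the repair.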

Control how the repair runs:

  • Number of nodes performing a repair:
    • Parallel runs repair on all nodes that hold the same replica data at the same time. This is the default behavior in the DataStax Distribution of Apache Cassandra™ (DDAC).
    • Sequential (-seq, --sequential) runs repair on one node after another.
    • Datacenter parallel (-dcpar, --dc-parallel) combines sequential and parallel by running a sequential repair in all datacenters simultaneously. Within each datacenter, a single node runs repair at a time, one after another, until the repair is complete.
  • Amount of data that is repaired:
    • Full repair (default in DDAC) compares all replicas of the data stored on the node where the command runs and updates each replica to the newest version. It does not mark the data as repaired or unrepaired. To switch to incremental repairs, see Migrating to incremental repairs.
    • Full repair with partitioner range (-pr, --partitioner-range) repairs only the primary partition ranges of the node where the command runs. Recommended for routine maintenance.
    • Incremental repair (-inc) splits the data into repaired and unrepaired SSTables and repairs only the unrepaired data. It marks the data as repaired or unrepaired.
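The two choices above (how many nodes repair at once, and how much data is repaired) map onto a small set of flags. The sketch below assembles a `nodetool repair` argument list using only flags from the synopsis in this article. It assumes DDAC's defaults, where a parallel, full repair needs no extra flag. Other Cassandra versions may use different defaults. The helper name and the keyspace and table names (`ks1`, `t1`) are hypothetical.

```python
# Flags from the nodetool repair synopsis; the empty lists assume DDAC's
# defaults (parallel, full repair), which need no extra flag.
MODE_FLAGS = {
    "parallel": [],
    "sequential": ["-seq"],
    "dc-parallel": ["-dcpar"],
}
SCOPE_FLAGS = {
    "full": [],
    "partitioner-range": ["-pr"],
    "incremental": ["-inc"],
}


def build_repair_cmd(mode="parallel", scope="full", keyspace=None, tables=()):
    """Return a `nodetool repair` argument list for the given mode and scope."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode}")
    if scope not in SCOPE_FLAGS:
        raise ValueError(f"unknown scope: {scope}")
    if tables and not keyspace:
        raise ValueError("tables require a keyspace")
    cmd = ["nodetool", "repair", *MODE_FLAGS[mode], *SCOPE_FLAGS[scope]]
    if keyspace:
        # "--" separates the options from the keyspace and table names.
        cmd += ["--", keyspace, *tables]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_repair_cmd("sequential", "partitioner-range", "ks1", ["t1"])))
    # nodetool repair -seq -pr -- ks1 t1
```

Returning a list rather than a string lets you pass the result straight to `subprocess.run` without shell quoting issues.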

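Note: In DDAC, nodetool repair runs a full repair by default. Defaults differ between Cassandra versions; for example, Apache Cassandra 2.2 and later default to incremental repair.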

Synopsis

install_location/bin/nodetool [connection_options] repair 
     [(-dc specific_dc | --in-dc specific_dc)...] 
     [(-dcpar | --dc-parallel)] 
     [(-et end_token | --end-token end_token)]
     [(-hosts specific_host | --in-hosts specific_host)...]
     [-inc]
     [(-j job_threads | --job-threads job_threads)]
     [(-local | --in-local-dc)] 
     [(-pr | --partitioner-range)]
     [(-pl | --pull)]
     [(-seq | --sequential)]
     [(-st start_token | --start-token start_token)] 
     [(-tr | --trace)]
     [--] 
     [keyspace tables...]

-dcpar, --dc-parallel
Runs a datacenter parallel repair, which combines sequential and parallel by simultaneously running a sequential repair in all datacenters; a single node in each datacenter runs repair, one after another until the repair is complete. 
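The difference between parallel, sequential (-seq) and datacenter-parallel (-dcpar) repair is easiest to see as a schedule of which replicas repair together. The sketch below is a simplified model that assumes each step finishes before the next begins. It is not how Cassandra schedules repair sessions internally. The function name and node names are hypothetical.

```python
from itertools import zip_longest


def repair_steps(replicas_by_dc, mode):
    """Group replica nodes into steps that repair at the same time.

    Simplified model: each step finishes before the next one starts.
    """
    dcs = sorted(replicas_by_dc)
    nodes = [node for dc in dcs for node in replicas_by_dc[dc]]
    if mode == "parallel":
        # Every replica repairs at once.
        return [nodes]
    if mode == "sequential":
        # One replica at a time across the whole cluster.
        return [[node] for node in nodes]
    if mode == "dc-parallel":
        # One replica per datacenter at a time, all datacenters simultaneously.
        columns = zip_longest(*(replicas_by_dc[dc] for dc in dcs))
        return [[node for node in step if node is not None] for step in columns]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    replicas = {"DC1": ["dc1-a", "dc1-b", "dc1-c"], "DC2": ["dc2-a", "dc2-b"]}
    for mode in ("parallel", "sequential", "dc-parallel"):
        print(mode, repair_steps(replicas, mode))
```

With three replicas in DC1 and two in DC2, the model gives parallel repair one step, sequential repair five steps, and datacenter-parallel repair three steps.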

-et end_token, --end-token end_token
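Specify the token (end_token) at which the repair range ends. Tokens are not UUIDs; with the default Murmur3Partitioner they are signed 64-bit integers. Together with -st, this option repairs the token range from start_token to end_token. Use -hosts to specify neighbor nodes.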

-pr, --partitioner-range
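Repair only the primary partition ranges of the node. To avoid re-repairing each range RF (replication factor) times, DataStax recommends using this option during routine maintenance (nodetool repair -pr). Because each node repairs only its own primary ranges, run it on every node in the cluster so that the whole ring is covered.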
 
Note: -pr is not recommended with incremental repair, because incremental repair marks data as repaired as it goes and does not re-repair the same data multiple times.
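The "RF times" claim for -pr can be checked with a toy ring model. The model makes three assumptions: one token per node (no vnodes), a single datacenter, and SimpleStrategy-style placement. Under that placement, range r is stored on node r (its primary owner) and the next RF - 1 nodes clockwise. The sketch counts how often each range is repaired when every node runs repair once, with and without -pr. All names are hypothetical.

```python
from collections import Counter


def replicas_for_range(r, n_nodes, rf):
    """Nodes holding range r: its primary owner (node r) plus the next rf - 1 nodes clockwise."""
    return [(r + k) % n_nodes for k in range(rf)]


def repair_counts(n_nodes, rf, primary_range_only):
    """Count how many times each range is repaired when every node runs repair once."""
    if not 1 <= rf <= n_nodes:
        raise ValueError("rf must be between 1 and n_nodes")
    counts = Counter()
    for node in range(n_nodes):
        for r in range(n_nodes):
            holders = replicas_for_range(r, n_nodes, rf)
            if node not in holders:
                continue  # this node does not store range r
            if primary_range_only and holders[0] != node:
                continue  # with -pr, only the primary owner repairs range r
            counts[r] += 1
    return dict(counts)


if __name__ == "__main__":
    print(repair_counts(6, 3, primary_range_only=False))  # every range repaired 3 times
    print(repair_counts(6, 3, primary_range_only=True))   # every range repaired once
```

The model also shows why -pr must run on every node. If any node is skipped, its primary range gets a count of zero.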

-seq, --sequential
Runs a sequential repair, which runs repair on one node after another.

-st start_token, --start-token start_token
Specify the token (start_token) at which the repair range starts. 

Examples:

  • To do a sequential repair of all keyspaces on the current node:
    • $ nodetool repair -seq
  • To repair only a specific token range (a subrange repair):
    • $ nodetool repair -st -9223372036854775808 -et -3074457345618258603
  • To restrict the repair to the local datacenter, use the -dc option followed by the name of the datacenter, and issue the command from a node in that datacenter. Do not use -pr with this option when repairing only the local datacenter.
    • $ nodetool repair -dc DC1
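The token pair in the subrange example is exactly the first of three equal slices of the token ring for the default Murmur3Partitioner, whose tokens run from -2^63 to 2^63 - 1. The minimal sketch below splits the ring into n contiguous slices and emits one `nodetool repair -st ... -et ...` command per slice. It assumes Murmur3Partitioner and an even split. In a real cluster, you may prefer to split only the ranges a node actually owns. The function names are hypothetical.

```python
# Token bounds of Cassandra's default Murmur3Partitioner (signed 64-bit).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1
RING_SIZE = 2**64


def split_ring(n):
    """Split the Murmur3 token ring into n contiguous (start, end) subranges."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Evenly spaced boundaries; the last subrange ends at the largest token.
    bounds = [MIN_TOKEN + i * RING_SIZE // n for i in range(n)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))


def subrange_commands(n, keyspace=None):
    """Return one `nodetool repair -st ... -et ...` argument list per subrange."""
    cmds = []
    for start, end in split_ring(n):
        cmd = ["nodetool", "repair", "-st", str(start), "-et", str(end)]
        if keyspace:
            cmd += ["--", keyspace]
        cmds.append(cmd)
    return cmds


if __name__ == "__main__":
    for cmd in subrange_commands(3):
        print(" ".join(cmd))
```

Running the slices one after another keeps each repair session small. The first command printed is the same as the subrange example.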

Source: nodetool repair

