How to install Cassandra in Cluster Mode on CentOS 6.5

Introduction

This tutorial shows you how to set up Apache Cassandra in cluster mode, that is, across multiple systems, to get a scalable database system.

Prerequisite

  • CentOS 6.5 is installed on all nodes (systems).
  • Java is installed on all systems.
  • You have followed my previous tutorial on how to set up Cassandra in standalone mode. If not, please do that on every node before following this tutorial, and verify that Cassandra is up and running on all nodes.

First, stop Cassandra on all nodes and delete its data directory, as follows.

To stop Cassandra, find its process with the command below:

ps aux | grep cassandra

Note down the PID of the Cassandra process and stop it with the kill command:

kill -9 <pid>
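The manual lookup above can also be scripted. The sketch below is one possible way and not part of the original setup: cassandra_pids is a hypothetical helper name, and it assumes the Cassandra JVM shows its main class, CassandraDaemon, on its command line.

```shell
# Hypothetical helper: read `ps aux`-style text on stdin and print the PID
# (second column) of every line that looks like a Cassandra JVM.
# Assumption: the command line contains the main class name "CassandraDaemon".
# The [C] bracket keeps this awk process from matching itself in the ps output.
cassandra_pids() {
    awk '/[C]assandraDaemon/ { print $2 }'
}

# Usage (commented out so that pasting this block does not stop anything):
#   ps aux | cassandra_pids                   # show the PIDs
#   ps aux | cassandra_pids | xargs -r kill   # SIGTERM, a gentler stop than kill -9
```

Plain kill sends SIGTERM, which gives the process a chance to shut down; kill -9 remains available if the process does not exit.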

To delete the contents of the data directory, use the command below:

rm -rf /opt/apache-cassandra-2.1.9/data/*

Remove the logs as well:

rm -rf /opt/apache-cassandra-2.1.9/logs/*
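Because rm -rf is unforgiving, the two deletions can be wrapped in a small guard. This is only a sketch: reset_cassandra_dirs is a hypothetical helper name, and it assumes GNU find (as shipped with CentOS) and that the data and logs directories live directly under the install directory, as in the commands above.

```shell
# Hypothetical helper: empty the data and logs directories under a Cassandra
# install directory, refusing to run if that directory does not exist.
reset_cassandra_dirs() {
    home="$1"
    if [ -z "$home" ] || [ ! -d "$home" ]; then
        echo "no such directory: $home" >&2
        return 1
    fi
    for sub in data logs; do
        if [ -d "$home/$sub" ]; then
            # Delete everything inside (dotfiles included) but keep the directory.
            find "$home/$sub" -mindepth 1 -delete
        fi
    done
}

# Usage (commented out; run only after Cassandra has been stopped):
#   reset_cassandra_dirs /opt/apache-cassandra-2.1.9
```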

For this tutorial I'm assuming that we are setting up a cluster of 2 machines; let's name them

  • cassandra1
  • cassandra2

If you want to add more nodes, go ahead; the same steps apply.

Now you need to make some configuration changes on all nodes. Believe me, not too many.

First, let's make the changes on cassandra1.

The file to change is cassandra.yaml, under the conf directory.

Open cassandra.yaml in an editor; I'm using nano.

The commands are:

cd /opt/apache-cassandra-2.1.9/conf

nano cassandra.yaml

The file looks like the image below:

[Image: cassandra_cluster_config1]

Change the listen_address line to your machine's IP address. To find your IP address, use the ifconfig command.

Then scroll down and change rpc_address to your machine's IP address. Uncomment the broadcast_rpc_address line and leave its value blank; do not fill in anything, as in the image below.

[Image: cassandra_cluster_config2]
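If you would rather not edit the addresses by hand on every node, the same change can be sketched with sed. This is an illustration, not the method used in this tutorial: set_yaml_key is a hypothetical helper name, the path and IP address in the usage lines are placeholders, and it assumes GNU sed (for the -i option) and that each key appears uncommented at the start of a line.

```shell
# Hypothetical helper: set a top-level "key: value" line in a YAML file.
# Only touches lines where the key starts in column 0 and is not commented out.
set_yaml_key() {
    file="$1"
    key="$2"
    value="$3"
    sed -i "s|^${key}:.*|${key}: ${value}|" "$file"
}

# Usage (placeholders; use each node's own IP address):
#   conf=/path/to/cassandra.yaml
#   cp "$conf" "$conf.bak"                      # keep a backup first
#   set_yaml_key "$conf" listen_address 192.168.1.10
#   set_yaml_key "$conf" rpc_address    192.168.1.10
```

The helper leaves commented lines alone, so the broadcast_rpc_address line still has to be uncommented by hand.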

This is a very important step, so make sure you understand it correctly.

Change cluster_name to "Cassandra_Node". (This must be the same on all nodes; otherwise Cassandra throws an exception.)

Add the following lines under cluster_name:

num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "<seed machine>"
endpoint_snitch: GossipingPropertyFileSnitch

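Do not forget to comment out the original seed_provider block further down, as shown below, so that it is not defined twice. Likewise, if your file already defines num_tokens or endpoint_snitch elsewhere, edit or comment out those lines rather than keeping two definitions.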

#seed_provider:
# Addresses of hosts that are deemed contact points.
# Cassandra nodes use this list of hosts to find each other and learn
# the topology of the ring. You must change this if you are running
# multiple nodes!
# - class_name: org.apache.cassandra.locator.SimpleSeedProvider
# parameters:
# seeds is actually a comma-delimited list of addresses.
# Ex: "<ip1>,<ip2>,<ip3>"
# - seeds: "127.0.0.1"

As in the image below:

[Image: cassandra_cluster_config4]

Note: Repeat these steps on all nodes. Keep in mind that only listen_address and rpc_address should be set to the IP address of the machine you are editing. Do not change the seeds entry; it must be the same on all nodes.
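Since a mismatch in cluster_name or seeds is easy to miss, a quick check like the one below can help. It is a sketch under assumptions: show_cluster_settings is a hypothetical helper name, and it assumes the active seeds entry is an uncommented "- seeds:" line as in the configuration above.

```shell
# Hypothetical helper: print the cluster_name line and the active seeds line
# of a cassandra.yaml so the output can be compared between nodes.
show_cluster_settings() {
    grep -E '^cluster_name:' "$1"
    # Commented-out seeds lines start with '#', so they are skipped here.
    grep -E '^[[:space:]]*-[[:space:]]*seeds:' "$1"
}

# Usage (commented out; the path is a placeholder):
#   show_cluster_settings /path/to/cassandra.yaml
```

Run it on every node; the two printed lines should be identical everywhere.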

What is a seed machine?

Seeds are used during startup to discover the cluster.

The seed node designation has no purpose other than bootstrapping the gossip process for new nodes joining the cluster.

Choose one or two nodes as seed nodes. (Don't make every node a seed node; it's not good practice.)

After making the configuration changes on all nodes, it's time to start Cassandra on every node.

Use the same commands as in standalone mode:

cd /opt/apache-cassandra-2.1.9
./bin/cassandra

To verify the Cassandra cluster setup, run:

./bin/nodetool status

The command should produce output like the image below:

[Image: cassandra_cluster_config3]

In the image I got two rows because I used two nodes. If you are using more nodes, there should be an entry for each of them. This confirms that the Cassandra cluster is ready. :)
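The same check can be done without reading the table by eye. The sketch below counts the rows whose first column is UN (Up and Normal) in the nodetool status output; count_up_normal is a hypothetical helper name, and the expected node count of 2 in the usage line is just this tutorial's example.

```shell
# Hypothetical helper: count nodes reported as "UN" (Up/Normal) in
# `nodetool status` output read from stdin.
count_up_normal() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Usage (commented out), assuming a two-node cluster:
#   [ "$(./bin/nodetool status | count_up_normal)" -eq 2 ] && echo "cluster ready"
```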

 

 
