This tutorial teaches you how to set up Apache Cassandra in cluster mode, that is, across multiple machines, to get a scalable database system.
- I'm assuming you have installed CentOS 6.5 on all nodes (systems).
- Java is installed on all nodes.
- You have followed my previous tutorial on how to set up Cassandra in standalone mode. If not, please go and do it for all your nodes first before following this tutorial. Please verify that Cassandra is up and running on all the nodes.
First, stop Cassandra on all nodes and delete the data directory, using the following commands.
To stop Cassandra, use the command below.
Note down the PID of Cassandra and use the kill command to stop it.
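As a rough sketch of the stop step above: the helper below sends the default TERM signal to a PID and waits for the process to exit. The function name stop_by_pid and the 30-second default are my own choices for illustration. Looking up the PID with pgrep -f CassandraDaemon assumes that the Cassandra main class name shows up in the process command line.

```shell
#!/bin/sh
# stop_by_pid PID [SECONDS]
# Hypothetical helper: sends TERM to PID and waits up to SECONDS (default 30)
# for the process to exit.
stop_by_pid() {
    pid=${1:?pid required}
    remaining=${2:-30}

    # kill -0 only checks whether the process can be signalled.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is not running"
        return 0
    fi

    kill "$pid"    # default signal is TERM, which allows a clean shutdown

    while [ "$remaining" -gt 0 ] && kill -0 "$pid" 2>/dev/null; do
        sleep 1
        remaining=$((remaining - 1))
    done

    if kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is still running after the timeout" >&2
        return 1
    fi
    echo "process $pid stopped"
}

# Usage (assumed PID lookup, run on each node):
#   pgrep -f CassandraDaemon      # note down the PID it prints
#   stop_by_pid <that PID>
```

If the process refuses to exit, the helper only reports it; it does not escalate to kill -9 on its own.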
To delete the data directory, use the command below.
Then remove the logs as well, using the command below.
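A minimal sketch of the cleanup step above. Where the data and log directories live depends on how Cassandra was installed, so the helper takes the paths as arguments. The helper name remove_dirs and the paths in the usage comment are assumptions, not the exact paths from this setup.

```shell
#!/bin/sh
# remove_dirs DIR...
# Hypothetical helper: deletes each given directory if it exists.
# Run it only after Cassandra has been stopped on the node.
remove_dirs() {
    for dir in "$@"; do
        # Refuse empty arguments and the filesystem root.
        if [ -z "$dir" ] || [ "$dir" = "/" ]; then
            echo "refusing to remove '$dir'" >&2
            continue
        fi
        if [ -d "$dir" ]; then
            rm -rf -- "$dir"
            echo "removed $dir"
        else
            echo "skipped $dir (not found)"
        fi
    done
}

# Usage (assumed layout where data and logs sit under the install directory):
#   remove_dirs /path/to/cassandra/data /path/to/cassandra/logs
```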
For this tutorial, I'm assuming that we are setting up the cluster using 2 machines, named as follows:
If you want to add more nodes, go ahead and add them; that is no problem.
Now you need to make some configuration changes on all nodes; believe me, not too many.
First, let's make the changes on cassandra1.
The file to change is cassandra.yaml, under the conf directory.
Open cassandra.yaml in an editor; I'm using the nano editor.
It looks like the image below.
Change the listen_address line to your machine's IP address. To get your IP address, use the ifconfig command.
Then scroll down and change rpc_address to your machine's IP address. Uncomment the line that says broadcast_rpc_address and leave it blank; don't fill in anything, as in the image below.
This is a very important step, so make sure you understand it correctly.
Change the cluster_name to "Cassandra_Node". (This must be the same on all the nodes; if it is not, Cassandra throws an exception.)
Add the following lines under cluster_name:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
- seeds: "<seed machine>"
Do not forget to comment out the lines below:
# Addresses of hosts that are deemed contact points.
# Cassandra nodes use this list of hosts to find each other and learn
# the topology of the ring. You must change this if you are running
# multiple nodes!
# - class_name: org.apache.cassandra.locator.SimpleSeedProvider
# seeds is actually a comma-delimited list of addresses.
# Ex: "<ip1>,<ip2>,<ip3>"
# - seeds: "127.0.0.1"
As in the image below.
Note: Do the same steps on all the nodes. Keep in mind that you should put each machine's own IP address in listen_address and rpc_address only. Don't change the address in the seeds entry; it should be the same on all the nodes.
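If you have more than a couple of nodes, the edits above can be scripted. The sketch below is only one way to do it. It assumes a stock cassandra.yaml where cluster_name, listen_address, and rpc_address each start a line and there is one uncommented "- seeds:" line. Note that it changes that existing seeds line in place, rather than adding a new block and commenting out the old one, and it leaves broadcast_rpc_address alone. The function name configure_node and the IP addresses in the usage comment are made up for illustration.

```shell
#!/bin/sh
# configure_node YAML_FILE NODE_IP SEED_LIST [CLUSTER_NAME]
# Hypothetical helper: rewrites the cluster-related settings in a cassandra.yaml.
configure_node() {
    yaml=${1:?path to cassandra.yaml required}
    node_ip=${2:?IP address of this node required}
    seeds=${3:?comma-separated seed list required}
    cluster=${4:-Cassandra_Node}
    tmp="$yaml.tmp.$$"

    # Commented-out lines start with '#', so none of these patterns match them.
    sed \
        -e "s|^cluster_name:.*|cluster_name: '$cluster'|" \
        -e "s|^listen_address:.*|listen_address: $node_ip|" \
        -e "s|^rpc_address:.*|rpc_address: $node_ip|" \
        -e "s|^\([[:space:]]*\)- seeds:.*|\1- seeds: \"$seeds\"|" \
        "$yaml" > "$tmp" || { rm -f "$tmp"; return 1; }

    # Write back through the original file so its owner and permissions are kept.
    cat "$tmp" > "$yaml" && rm -f "$tmp"
}

# Usage (made-up addresses; the seed list is identical on every node):
#   on cassandra1: configure_node conf/cassandra.yaml 192.168.1.10 192.168.1.10
#   on the second node: configure_node conf/cassandra.yaml 192.168.1.11 192.168.1.10
```

Only the second argument differs from node to node, which matches the note above: each node gets its own address, and the seeds entry stays the same everywhere.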
What is a seed machine?
Seeds are used during startup to discover the cluster.
The seed node designation has no purpose other than bootstrapping the gossip process for new nodes joining the cluster.
Choose one or two nodes as seed nodes. (Don't make every node a seed node; it's not good practice.)
After making all the configuration changes on all the nodes, it's time to start Cassandra on all of them.
Use the same command as in standalone mode.
To verify the Cassandra cluster setup, you can use the following command:
The above command should give output like the image below.
In the image I got two rows because I used two nodes for the setup. If you are using more nodes, there should be an entry for every node; this verifies that the Cassandra cluster is ready.
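The row check above can also be done with a small script. This sketch assumes the verification command is nodetool status, which prints one row per node and marks a node that is up and in the normal state with UN in the first column. The helper name count_up_normal is my own.

```shell
#!/bin/sh
# count_up_normal
# Hypothetical helper: reads `nodetool status` output on stdin and prints
# how many rows have UN (Up/Normal) in the first column.
count_up_normal() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Usage (run on any node; the expected count is the number of nodes you set up):
#   nodetool status | count_up_normal
```

For the two-machine setup in this tutorial the count should be 2. A lower number means a node is down or has not joined the cluster yet.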