Frequently Asked Apache Cassandra Interview Questions
Explain What Is Cassandra?
Cassandra is an open source data storage system developed at Facebook for inbox search and designed for storing and managing large amounts of data across commodity servers. It can serve both as a real-time data store for online applications and as a read-intensive database for business intelligence systems.
List The Benefits Of Using Cassandra?
Unlike traditional databases, Apache Cassandra delivers near real-time performance, which simplifies the work of developers, administrators, data analysts and software engineers.
Instead of a master-slave architecture, Cassandra is built on a peer-to-peer architecture, so there is no single point of failure.
It is also highly flexible: nodes can be added to any Cassandra cluster in any datacenter, and any client can send its request to any server.
Cassandra scales up and down as requirements change, with high throughput for both read and write operations, and the cluster does not need to be restarted while scaling.
Cassandra is also revered for its strong data replication capability as it allows data storage at multiple locations enabling users to retrieve data from another location if one node fails. Users have the option to set up the number of replicas they want to create.
It performs well on massive datasets, which makes it a popular NoSQL choice for many organizations.
It operates on a column-oriented structure, which speeds up and simplifies slicing. Data access and retrieval also become more efficient with the column-based data model.
Further, Apache Cassandra supports a schema-free/schema-optional data model, so you do not have to declare up front all the columns your application requires.
What Is The Use Of Cassandra And Why To Use Cassandra?
Cassandra was designed to handle big data workloads across multiple nodes without any single point of failure. The main reasons for using Cassandra are:
It is fault tolerant and consistent
It scales from gigabytes to petabytes
It is a column-oriented database
No single point of failure
No need for separate caching layer
Flexible schema design
It has flexible data storage, easy data distribution, and fast writes
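It offers atomic, isolated and durable writes at the row level with tunable consistency, rather than full ACID (Atomicity, Consistency, Isolation, and Durability) transactions with rollback or locking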
Multi-data center and cloud capable
Data compression
Explain The Concept Of Tunable Consistency In Cassandra?
Tunable consistency is a characteristic that makes Cassandra a favored database choice of developers, analysts and big data architects.
Consistency refers to how up-to-date and synchronized the rows of data are on all their replicas. Cassandra’s tunable consistency allows users to select the consistency level best suited to their use case. It supports two kinds of consistency: eventual consistency and strong consistency.
The former guarantees that, when no new updates are made to a given data item, all accesses eventually return the last updated value. Systems with eventual consistency are said to have achieved replica convergence.
For Strong consistency, Cassandra supports the following condition:
R + W > N, where
N = number of replicas
W = number of nodes that need to agree for a successful write
R = number of nodes that need to agree for a successful read
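The condition above can be checked with a few lines of Python. This is only a minimal sketch: the function names `is_strongly_consistent` and `quorum` and the sample values are illustrative assumptions, not part of any Cassandra API.

```python
def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """Return True when R + W > N, i.e. every read set overlaps every write set."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n


def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


if __name__ == "__main__":
    n = 3
    # Quorum writes combined with quorum reads satisfy the condition.
    print(is_strongly_consistent(n, quorum(n), quorum(n)))
    # Writing to one replica and reading from one replica does not.
    print(is_strongly_consistent(n, 1, 1))
```

With N = 3, a quorum is 2 nodes, so R = W = 2 gives 4 > 3 and at least one replica that acknowledged the write is always part of the read.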
Explain What Is Composite Type In Cassandra?
In Cassandra, a composite type lets you define a key or a column name as a concatenation of values of different types. Composite types can be used in two places:
Row Key
Column Name
How Does Cassandra Write?
Cassandra performs a write in two steps: it first appends the write to a commit log on disk and then applies it to an in-memory structure known as a memtable. Once both steps succeed, the write is complete. When the memtable is flushed, its contents are written to disk as an SSTable (sorted string table). Because a write is only a sequential append plus an in-memory update, Cassandra offers fast write performance.
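The write path can be pictured with a toy model. The sketch below is a deliberately simplified illustration, not Cassandra's implementation: the class `ToyTable`, the `memtable_limit` parameter and the use of Python lists and dicts in place of files are all assumptions made for the example.

```python
class ToyTable:
    """Toy write path: commit log first, then memtable, then flush to an SSTable."""

    def __init__(self, memtable_limit: int = 3):
        self.commit_log = []   # stands in for the append-only log on disk
        self.memtable = {}     # in-memory structure, key -> value
        self.sstables = []     # each flush adds one immutable, key-sorted table
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # step 1: append to the commit log
        self.memtable[key] = value            # step 2: update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Sort by key and freeze as a tuple so the flushed table cannot change.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Simplification: flushed entries no longer need to be replayed from the log.
        self.commit_log.clear()


if __name__ == "__main__":
    table = ToyTable(memtable_limit=2)
    table.write("b", "2")
    table.write("a", "1")   # reaching the limit triggers a flush
    print(table.sstables)
```

Keeping flushed tables immutable is what lets the in-memory structure absorb all updates while the files on disk are only ever written once.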
How Does Cassandra Store Data?
All data is stored as bytes
When you specify a validator, Cassandra ensures those bytes are encoded as required
A comparator then orders the columns based on the ordering specific to that encoding
Composites are just byte arrays with a specific encoding: for each component, Cassandra stores a two-byte length, followed by the byte-encoded component, followed by a termination byte.
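The composite layout described in the last point can be sketched in Python. This is an illustration under stated assumptions: the helper names `encode_composite` and `decode_composite` are hypothetical, the length is assumed to be big-endian, and the terminator is written as a single zero byte.

```python
import struct

END_OF_COMPONENT = b"\x00"  # assumed terminator value for this sketch


def encode_composite(*components: bytes) -> bytes:
    """Encode each component as: 2-byte big-endian length + bytes + terminator."""
    out = bytearray()
    for comp in components:
        if len(comp) > 0xFFFF:
            raise ValueError("component too long for a two-byte length")
        out += struct.pack(">H", len(comp))
        out += comp
        out += END_OF_COMPONENT
    return bytes(out)


def decode_composite(data: bytes) -> list:
    """Split an encoded composite back into its components."""
    components = []
    pos = 0
    while pos < len(data):
        (length,) = struct.unpack_from(">H", data, pos)
        pos += 2
        components.append(data[pos:pos + length])
        pos += length + 1  # skip the component and its terminator
    return components


if __name__ == "__main__":
    encoded = encode_composite(b"user", b"42")
    print(encoded)
    print(decode_composite(encoded))
```

Because every component carries its own length, components of different types can be concatenated and still be split apart unambiguously.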
Define The Management Tools In Cassandra?
DataStax OpsCenter:
A web-based management and monitoring solution for Cassandra clusters and DataStax. It is free to download and includes an additional edition of OpsCenter.
SPM:
SPM primarily monitors Cassandra metrics along with various OS and JVM metrics. Besides Cassandra, SPM also monitors Hadoop, Spark, Solr, Storm, ZooKeeper and other big data platforms. The main features of SPM include correlation of events and metrics, distributed transaction tracing, real-time graphs with zooming, anomaly detection and heartbeat alerting.
Mention What Are The Main Components Of Cassandra Data Model?
The main components of the Cassandra data model are:
Cluster
Keyspace
Column
Column Family
Define Memtable?
Similar to a table, a memtable is an in-memory, write-back cache holding content in key and column format. The data in a memtable is sorted by key, and each column family has its own memtable, from which column data is retrieved by key. A memtable stores writes until it is full and is then flushed to disk.
Explain What Is A Column Family In Cassandra?
A column family in Cassandra is a collection of rows.
What Is SSTable? How Is It Different From Other Relational Tables?
SSTable stands for ‘Sorted String Table’. It is an important data file in Cassandra, to which memtables are regularly flushed.
SSTables are stored on disk and exist for each Cassandra table. Unlike relational tables, they are immutable: once written, no data items can be added to or removed from them. For each SSTable, Cassandra also creates supporting structures such as a partition index, a partition summary and a bloom filter.
Explain What Is A Cluster In Cassandra?
A cluster is a container for keyspaces. A Cassandra database is distributed over several machines that operate together. The cluster is the outermost container, which arranges the nodes in a ring and assigns data to them. Data on each node is replicated to other nodes, and a replica takes over if a node fails.
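The idea of nodes arranged in a ring with data assigned to them can be sketched as follows. This is a simplified illustration only: the class `ToyRing`, the node names, the use of MD5 as the hash and the single token per node are assumptions for the example, not a description of Cassandra's partitioner.

```python
import bisect
import hashlib


class ToyRing:
    """Place nodes on a hash ring and pick replicas by walking clockwise."""

    def __init__(self, nodes, replication_factor: int = 2):
        if not nodes:
            raise ValueError("at least one node is required")
        self.replication_factor = min(replication_factor, len(nodes))
        # One token per node, derived from hashing the node name.
        self.ring = sorted((self._token(node), node) for node in nodes)
        self.tokens = [token for token, _ in self.ring]

    @staticmethod
    def _token(value: str) -> int:
        digest = hashlib.md5(value.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def replicas(self, key: str) -> list:
        # First node whose token is >= the key's token, wrapping around the ring.
        start = bisect.bisect_left(self.tokens, self._token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(self.replication_factor)]


if __name__ == "__main__":
    ring = ToyRing(["node-a", "node-b", "node-c"], replication_factor=2)
    print(ring.replicas("user:42"))
```

Every node can compute the same answer from the key alone, which is why any node can accept a request and forward it to the right replicas.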
Explain The Concept Of Bloom Filter?
Associated with each SSTable, a Bloom filter is an off-heap data structure (held in native memory rather than on the Java heap) used to check whether the SSTable might contain the requested data before performing any disk I/O.
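A Bloom filter can be sketched in a few lines of Python. This is a generic, minimal version for illustration: the class name, the default sizes and the use of SHA-256 to derive bit positions are assumptions of the example and say nothing about how Cassandra sizes or hashes its own filters.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


if __name__ == "__main__":
    bloom = BloomFilter()
    bloom.add("row-1")
    print(bloom.might_contain("row-1"))  # an added key is always reported as present
```

A negative answer is definitive, so the read can skip that SSTable entirely; a positive answer only means the SSTable is worth checking on disk.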
List Out The Other Components Of Cassandra?
The other components of Cassandra are:
Node
Data Center
Cluster
Commit log
Memtable
SSTable
Bloom Filter
Explain CAP Theorem?
With a strong requirement to scale systems when additional resources are needed, the CAP theorem plays a major role in shaping the scaling strategy.
It is a useful way to reason about trade-offs in distributed systems. The Consistency, Availability and Partition tolerance (CAP) theorem states that a distributed system like Cassandra can provide only two of these three characteristics at the same time.
One of them needs to be sacrificed. Consistency guarantees that a client reads the most recent write, Availability means every request gets a reasonable response within a minimal time, and Partition tolerance means the system continues to operate when network partitions occur. Because partitions cannot be ruled out in a distributed system, the two options available are AP and CP.
Explain What Is A Keyspace In Cassandra?
In Cassandra, a keyspace is a namespace that determines how data is replicated on nodes. A cluster typically contains one keyspace per application.
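Since replication is configured per keyspace, it can help to see what such a definition looks like. The sketch below only builds a CQL statement as a string; the helper name `create_keyspace_cql`, the keyspace name `shop` and the data center names `dc1` and `dc2` are hypothetical, and the choice of NetworkTopologyStrategy is an assumption made for the example.

```python
def create_keyspace_cql(name: str, replicas_per_dc: dict) -> str:
    """Render a CREATE KEYSPACE statement with per-data-center replica counts."""
    if not name.isidentifier():
        raise ValueError("keyspace name must be a simple identifier")
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for dc, count in sorted(replicas_per_dc.items()):
        if "'" in dc:
            raise ValueError("data center names must not contain quotes")
        parts.append(f"'{dc}': {int(count)}")
    options = ", ".join(parts)
    return "CREATE KEYSPACE IF NOT EXISTS " + name + " WITH replication = {" + options + "};"


if __name__ == "__main__":
    # Hypothetical keyspace with three replicas in dc1 and two in dc2.
    print(create_keyspace_cql("shop", {"dc1": 3, "dc2": 2}))
```

The resulting statement could then be run through cqlsh or a client driver against a real cluster.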
State The Differences Between A Node, A Cluster And Datacenter In Cassandra?
A node is a single machine running Cassandra, while a cluster is a collection of nodes that hold similar types of data grouped together. Data centers are useful when serving customers in different geographical areas: you can group the nodes of a cluster into different data centers.
Mention What Are The Values Stored In The Cassandra Column?
A Cassandra column basically holds three values:
Column Name
Value
Time Stamp
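The three values can be modeled as a small record, and the timestamp is what lets two versions of the same column be reconciled by keeping the newer one. The sketch below is illustrative: the `Column` class and `reconcile` function are hypothetical names, and resolving ties in favor of the first argument is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: bytes
    timestamp: int  # supplied by the writer, e.g. microseconds since the epoch


def reconcile(a: Column, b: Column) -> Column:
    """Return the version with the newer timestamp (last write wins)."""
    if a.name != b.name:
        raise ValueError("can only reconcile versions of the same column")
    # Simplification: on equal timestamps the first argument is kept.
    return a if a.timestamp >= b.timestamp else b


if __name__ == "__main__":
    old = Column("email", b"old@example.com", timestamp=100)
    new = Column("email", b"new@example.com", timestamp=200)
    print(reconcile(old, new).value)
```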