Big Data in WebNMS
Overview
WebNMS uses Hadoop, which makes it well suited to large-scale service provider deployments. WebNMS addresses Big Data requirements by using Hadoop to store and process large volumes of collected performance data, which improves the scalability of the product.
www.webnms.com
Architecture Diagram
[Figure: WebNMS Big Data architecture. Labels shown: HADOOP, KPI & Report Jobs (Mapper, Reducer), DataNodes (each with an HBase Region Server and numbered data blocks 1 to 4), NameNodes (Primary, Standby) with HBase and Metadata, KPI & Report Engine, ZooKeeper, Database, APIs, PM Data, WebNMS Backend Servers (Primary, Standby) with API, TSDB and ASYNC, and Distributed Pollers, each with a Polling Engine, TSDB and ASYNC.]

KPI & Report Jobs
For each KPI and Report, a MapReduce job is created in which the map tasks collect the required data and the reduce task aggregates the collected data.

DataNode
Stores the actual data. A functional file system has more than one DataNode, with data replicated across them.

NameNode
The NameNode keeps the directory tree of all files (metadata) in the file system and tracks where across the cluster the file data is kept. It does not store the data of these files itself.

KPI & Report Engine
Schedules KPI calculation and Report aggregation based on the specified definitions.

Database
Inventory, Topology, Fault, Configuration, Provisioning, and Security data.

ZooKeeper
ZooKeeper is used to perform leader election when there are multiple Masters/NameNodes.

WebNMS Server
The BE server handles Discovery, Fault, Performance, Provisioning, Configuration, and Security management. It stores collected performance data in Hadoop and all other data in an RDBMS database.

APIs
NmsHadoopAPI, PollAPI, HadoopKPIAPI

Distributed Poller
Designed to collect data from its share of the Network Elements in the network and to store that data in Hadoop.

OpenTSDB
OpenTSDB is a distributed, scalable framework to effectively store, index, and retrieve time series values in HBase.

Async HBase
AsyncHBase is an asynchronous, non-blocking, thread-safe, high-performance HBase API.
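To make the idea of storing time series values as key-value pairs in HBase more concrete, here is a minimal Python sketch. It is not the OpenTSDB or AsyncHBase API. The key layout (metric name, hour-aligned base timestamp, sorted tags) is a simplified assumption, and the names `row_key`, `TimeSeriesStore`, and the metric `if.in.octets` are hypothetical.

```python
from collections import defaultdict

ROW_SPAN = 3600  # assumption: one row holds one hour of samples


def row_key(metric, timestamp, tags):
    """Build a sortable row key: metric, hour-aligned base time, sorted tags."""
    base = timestamp - (timestamp % ROW_SPAN)
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric}|{base:010d}|{tag_part}"


class TimeSeriesStore:
    """In-memory stand-in for an HBase table: row key -> {offset: value}."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def put(self, metric, timestamp, value, tags):
        key = row_key(metric, timestamp, tags)
        self.rows[key][timestamp % ROW_SPAN] = value

    def scan(self, metric, start, end, tags):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        points = []
        base = start - (start % ROW_SPAN)
        while base < end:
            key = row_key(metric, base, tags)
            for offset, value in self.rows.get(key, {}).items():
                ts = base + offset
                if start <= ts < end:
                    points.append((ts, value))
            base += ROW_SPAN
        return sorted(points)


store = TimeSeriesStore()
tags = {"host": "router1", "ifIndex": "3"}
store.put("if.in.octets", 7200, 100, tags)
store.put("if.in.octets", 7500, 160, tags)
store.put("if.in.octets", 10800, 220, tags)
print(store.scan("if.in.octets", 7200, 10800, tags))
# [(7200, 100), (7500, 160)]
```

Grouping an hour of samples under one key keeps a time-range query to a handful of row lookups, which is the kind of retrieval-friendly layout a time series store aims for.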
The Hadoop implementation in WebNMS has two main layers. The WebNMS layer handles enhanced performance data collection, and the Hadoop layer implements the Hadoop functions involved in data storage and retrieval.
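The KPI & Report Jobs in the architecture diagram follow the MapReduce pattern: map tasks collect the required data and a reduce task aggregates it. The Python sketch below imitates that flow in a single process. It does not use the Hadoop API, and the sample records, the average-CPU KPI, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw performance samples: (device, metric, value).
SAMPLES = [
    ("router1", "cpu", 40.0),
    ("router1", "cpu", 60.0),
    ("router2", "cpu", 10.0),
    ("router1", "mem", 70.0),
    ("router2", "cpu", 30.0),
]


def map_task(record, wanted_metric):
    """Map step: collect only the data the KPI needs, keyed by device."""
    device, metric, value = record
    if metric == wanted_metric:
        yield device, value


def shuffle(pairs):
    """Group mapped values by key, as the framework does between the steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce step: aggregate the collected values (here, an average)."""
    return key, sum(values) / len(values)


def run_kpi_job(records, wanted_metric):
    mapped = [pair for rec in records for pair in map_task(rec, wanted_metric)]
    grouped = shuffle(mapped)
    return dict(reduce_task(k, v) for k, v in grouped.items())


avg_cpu = run_kpi_job(SAMPLES, "cpu")
print(avg_cpu)  # {'router1': 50.0, 'router2': 20.0}
```

In a real cluster the map tasks would run in parallel on the nodes that hold the data, and only the grouped intermediate values would travel to the reducers.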
The WebNMS Layer
In the WebNMS layer, the polling engine of the performance module plays the key role in high-volume statistical data collection. WebNMS supports distributed poller functionality so that massive amounts of data can be collected. Multiple distributed pollers can be set up with the WebNMS server to ease large-scale data collection in high-end deployments. WebNMS introduces the Hadoop implementation at this layer to store this data. A Hadoop cluster is maintained to distribute and store the data. More servers can be added to the Hadoop cluster to increase the efficiency of the application, depending on the volume of data and the expected processing capabilities.

The collected data is stored in the Hadoop Distributed File System (HDFS) through HBase. The raw data is stored with the subsequent retrieval and aggregation processes in mind, so that the data can be easily retrieved and processed whenever required.

OpenTSDB is used to store, index, and retrieve the collected metrics effectively, and to make this data easily accessible and reportable. In WebNMS, the performance values generated over a time period are stored as key-value pairs using the asyncHBase library, which is an asynchronous, non-blocking, thread-safe, high-performance HBase API.

By default, WebNMS uses only the PollAPIs for data collection. Minimal configuration settings are needed in the application to enable the Hadoop implementation in the performance module. With Hadoop set up, WebNMS uses NmsHadoopAPI in addition to the PollAPI in the polling engine for performance data collection. In addition, HadoopKPIAPI is available for KPI calculation via WebNMS.

The Hadoop Layer
The Hadoop layer in WebNMS Performance consists of three functional entities: ZooKeeper, NameNodes, and DataNodes. Together, these three components select, store, retrieve, and process the required data. ZooKeeper plays the key role in coordination among the distributed services.
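As a rough picture of how NameNodes and DataNodes divide the work, the toy Python model below keeps only metadata (which DataNodes hold which block) in the NameNode, while the DataNodes hold the block contents, with each block replicated on more than one node. This is an illustration only, not HDFS code: the class names, the round-robin placement, the 4-character block size, and the replication factor of 2 are all assumptions.

```python
class DataNode:
    """Holds the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


class NameNode:
    """Holds only metadata: file path -> list of (block id, DataNode names)."""

    def __init__(self, datanodes, replication=2):
        # Assumes replication <= number of DataNodes, so that the
        # round-robin placement puts each replica on a different node.
        self.datanodes = datanodes
        self.replication = replication
        self.files = {}
        self._next = 0

    def write(self, path, data, block_size=4):
        placements = []
        for i in range(0, len(data), block_size):
            block_id = f"{path}#{i // block_size}"
            chunk = data[i:i + block_size]
            targets = []
            for _ in range(self.replication):
                node = self.datanodes[self._next % len(self.datanodes)]
                self._next += 1
                node.blocks[block_id] = chunk
                targets.append(node.name)
            placements.append((block_id, targets))
        self.files[path] = placements

    def read(self, path, failed=()):
        """Rebuild a file from any replica on a node that is still up."""
        by_name = {n.name: n for n in self.datanodes}
        out = ""
        for block_id, targets in self.files[path]:
            alive = [t for t in targets if t not in failed]
            out += by_name[alive[0]].blocks[block_id]
        return out


nodes = [DataNode(f"dn{i}") for i in (1, 2, 3)]
namenode = NameNode(nodes)
namenode.write("/pm/day1", "abcdefghij")
print(namenode.read("/pm/day1", failed={"dn1"}))  # abcdefghij
```

Because every block lives on two nodes, the file can still be read in full when one DataNode is unavailable.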
When multiple NameNodes are present in the cluster, ZooKeeper performs leader election. It receives heartbeats from the primary and standby servers and directs connections to the active server.

The NameNode maintains the references to the HDFS file system. It knows exactly where in the cluster the file data is kept, along with additional details such as the size and block address. The actual data, however, is stored in the local storage systems of the DataNodes. The NameNode decides how the data is split and allocated to the respective DataNodes.

The scheduling of KPI and Report generation is also done in this layer. For each KPI and Report, a MapReduce job is created. The MapReduce job is handled in two steps, the mapper and reducer tasks, and the aggregation of data for KPIs and reports is done through many mappers and reducers. The aggregation result is stored in HBase, from which WebNMS reads it to display as reports.

High Availability in Every Layer
WebNMS supports high availability in every layer. In WebNMS HA deployments, a secondary BE server is employed so that it takes over the functionality of the primary BE server when the primary fails. Likewise, WebNMS supports FE failover, in which case clients automatically switch over to the available FE. Similarly, the KPI and Reports engine supports failover: the engine connects to the available server when the primary fails.

At the Hadoop layer, ZooKeeper handles the failover mechanism. In a setup with multiple NameNodes, it monitors the primary and standby NameNodes and establishes a connection with the standby server if the primary server fails.
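The heartbeat-driven failover between a primary and a standby server can be pictured with a small Python sketch. It is a simplification and does not use the ZooKeeper API: the `Coordinator` class, the 5-second timeout, and the rule that the first listed live server is treated as active are assumptions made for illustration.

```python
class Coordinator:
    """Tracks heartbeats and points clients at the active server."""

    def __init__(self, servers, timeout=5.0):
        self.order = list(servers)      # assumed priority: first entry is primary
        self.timeout = timeout          # assumed heartbeat timeout in seconds
        self.last_seen = {s: None for s in self.order}

    def heartbeat(self, server, now):
        """Record that a server reported in at time `now`."""
        self.last_seen[server] = now

    def active(self, now):
        """Return the first server with a recent heartbeat, or None."""
        for server in self.order:
            seen = self.last_seen[server]
            if seen is not None and now - seen <= self.timeout:
                return server
        return None


coordinator = Coordinator(["primary", "standby"])
coordinator.heartbeat("primary", now=0.0)
coordinator.heartbeat("standby", now=0.0)
print(coordinator.active(now=1.0))   # primary

# The primary stops sending heartbeats; the standby keeps reporting in.
coordinator.heartbeat("standby", now=6.0)
print(coordinator.active(now=7.0))   # standby
```

Clients that always ask the coordinator for the active server switch over without any change on their side once the primary's heartbeats time out.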