Changes of Huawei Big Data Platform

Challenges of Big Data Platform
www.huawei.com
HUAWEI TECHNOLOGIES CO., LTD.
Contents

Categories of data in carrier network

Network insight

Customer behavior insight

Society activity insight

Challenges
Five Categories of Data
Business
Domain
Volume
Enterprise
Management
OSS
Network
Element
BSS
VAS
Generated Manually
Generated by
machine
Generated by
Machine
Generated Manually
Generated Manually
100TB~10TB
1TB~100TB
xxGB / Year
10PB / Year,1~3
years accumulation
100GB~10TB
xxTB / Year
100TB~10PB
100TB / Year
E-Learning
Source
HR
ERP
CMS
Account
Characteristics
NE Parameter
Structured (table)
Unstructured (graphics, text, video)
NE Config
NE Log
Alert
Perf
Structured (table)
Unstructured (time series data)
CDR CHR
SDR
MR
Counter
NE Log
Semi-structured (signaling call records)
Unstructured (time series data)
Billing
MKT
report
Order
User Profile
SCM
ERP
Structured(table)
Enterprise management domain
Probe or NE Integration
NodeB
RNC
GIS
SGSN
GGSN/DPI
ISP
Structured (table, point sets)
Semi-structured (column cluster)
Unstructured (graphics, text, video, time-series data)
BSS
FRM
HRM
Service Content
CRM
OSS
MRP
Order Usage
VAS
Evolutions of data analytic business in big data era
Past: Typical analytic business was operation analysis: statistics-based, offline, on isolated data.
Nowadays: New business, such as network optimization and customer experience: large volume, real-time, various kinds of data types.
VAS Data
order
Operational analytic
system:operation
reports/KPI reports
(statistics)
BSS
CRM/Billing
OSS
Performance
Network schedule
(statistics)
HR/FRM/SRM
Indicator
Business
NPM/SQM
NE data
Alerts
HR, Financial reports
Statistics, offline, isolated
AD
promotion
CEM
Stats of network
management performance
Evolution
Offer design
Enterprise management
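Requirement dimensions: Data Volume/Flow (data set; data flow rate), Velocity (accumulation rate), requirements on scale-out, Data Variety (data format and sources). Data set >100TB in >60% of scenarios.

Operation Report
• Data set: statistics data
• Data flow rate: offline
• Accumulation rate: low (statistic scenario)
• Requirements on scale-out: no
• Data format and sources: CRM, Billing; structured

Billing Verification
• Data set: <100T
• Data flow rate: offline
• Accumulation rate: fixed
• Requirements on scale-out: no
• Data format and sources: Billing; structured

Large volume, real-time, convergence of various data types:

Network optimization
• Data set: network equipment data, 10PB
• Requirements on scale-out: elastic data processing cluster of over 100 servers, handling 1PB of data
• Data format and sources: data from NEs, such as RAN, PS, etc.

Customer experience
• Data set: network data, 10PB
• Data flow rate: ~200Gbps
• Accumulation rate: archive 1 year's data
• Requirements on scale-out: elastic data processing cluster of over 100 servers, handling 1PB of data
• Data format and sources: network signaling, xDR, traffic statistics, NE configuration data; semi-structured data takes the majority

Precise marketing
• Data set: customer profile, 100GB~300GB
• Data flow rate: ~100,000 packages/s
• Accumulation rate: fixed volume
• Requirements on scale-out: in-memory computing
• Data format and sources: CRM, billing, xDR; structured and semi-structured data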
Data evolutions driven by carrier business
Business Evolution
Three categories of Big Data business

Network Insight
• Analytics based on network data, combined with user data, to adjust network layout;
• Focus on network status (location, equipment workload) to adjust the network dynamically.

Customer Insight
• Analytics based on user data, combined with network equipment data, to recognize the characteristics of customer behavior;
• To understand who is using the network and which services they consume, and to optimize business.

Society Insight
• Analytics based on the laws behind the data, to dig out data value;
• Based on these laws, guide carriers to develop new valuable business.
Categories and characteristics of carrier big data business
Business
Data
Capability
Data
representation
and query
Data storage
and
integration
ETL
Customer Insight
Network Insight
NE data
Summary Data
TS
MR
Log
xDR
DPI
Dial test
Traffic test
order
Account
UP
User account
Complaints
User consuming
Society Insight
Operational data
CRM
VAS
CBS
Network
VAS and External data
IPCC
LBS
Internet
Marketing
VAS usage
User profile
Archived data
xDR
Log
Traffic
statistics
Data visualization, rich and complex
models
Ad-Hoc Query
Real-time response
Multi-dimension
Query is not complex
High concurrency
Complex Query
Raw data
Low data volume
Moderate Volume
Large volume,10PB level,
Low cost
Summarized data
Mixed with raw data
and summarized data
High performance loading
Real time update
Data model complex
Cross domain data integration
Real time
High concurrency
Complex
Query
Complex models and
algorithms
High performance
Low cost
Complex data mining algorithms, needing guidance from data scientists and industry experts
Data volume varies by domain, 10PB level on average; requires low cost
Business requirements on Network Insight
Data processing procedure
③
For a carrier network serving 40M users, there are several challenges:
Volume: 120T -> 5.6P; integration: 33 nodes -> 6 nodes; query response time: 100s -> 15s; multi-dimension analytics
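The volume figures on this slide can be cross-checked with simple arithmetic. The sketch below is a back-of-the-envelope estimate, not a sizing method from the deck: it assumes decimal units (1TB = 10^12 bytes), a constant sustained ingest rate, a 365-day year, and that the 140k records/s rate belongs to the 60-day/120T case and the 354k records/s rate to the 1-year/5.6P case, as the slide layout suggests. All function and variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12  # decimal terabyte (assumption)
PB = 10**15  # decimal petabyte (assumption)


def implied_bytes_per_record(volume_bytes, records_per_s, retention_days):
    """Average raw record size implied by a volume, ingest rate and retention."""
    total_records = records_per_s * retention_days * SECONDS_PER_DAY
    return volume_bytes / total_records


# Current system: 60 days of raw data, 120T, 140k records/s.
current_record_size = implied_bytes_per_record(120 * TB, 140_000, 60)

# Target system: 1 year of raw data, 5.6P, 354k records/s.
target_record_size = implied_bytes_per_record(5.6 * PB, 354_000, 365)

# Growth in raw volume between the two cases.
volume_growth = (5.6 * PB) / (120 * TB)

# Physical footprint of the target volume at the 10:1 compression
# rate listed among the requirements on this slide.
compressed_target_tb = 5.6 * PB / 10 / TB

if __name__ == "__main__":
    print(f"implied record size, current: {current_record_size:.0f} bytes")
    print(f"implied record size, target:  {target_record_size:.0f} bytes")
    print(f"raw volume growth: {volume_growth:.1f}x")
    print(f"target footprint after 10:1 compression: {compressed_target_tb:.0f} TB")
```

Under these assumptions the implied record size differs between the two cases, which suggests the target also covers wider or additional record types, although the slide does not say so.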
Data representation
④
Data analytics
and processing
Multi-dimension
analytic
Target(40M users)
②①
③
Data
Management
Requirements
Summarize
Archive
DW
140k records/s -> 354k records/s
60 days, 120T -> 1 year, 5.6P
②: Raw data summarization
①: Archive and query raw data
Data summarization and storage
Data preprocessing
Data
PS
CS
NMS
EMS
ingress
• 20M users, 25Gbps, 60 days' raw data, 120TB
• 40M users, 200Gbps, 1 year's raw data, 5.6PB
Data analytics and processing
• Feeding rate: 90,000 rows/s
• Ensure stable query performance
• 1 year's data, 5.6P
• Compression rate: 10:1
• Support a few ad-hoc queries
③: Statistics/analysis libs
• Support complex queries involving 10 tables
• 20 concurrent reporting queries, responding within 15 seconds
④: Multi-dimension analytics
• Multi-dimension: 14 dimensions
• General analytics: combinations of 5 to 9 dimensions of SDR
• BKPI: combinations of 10 to 14 dimensions
• Second-level response time on 1.4 billion rows
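To see why multi-dimension analytics over 14 dimensions is demanding, it helps to count how many distinct dimension combinations the requirement could cover. The sketch below assumes that "combination of 5 to 9 dimensions" means any subset of that size drawn from the 14 dimensions; the slide does not state this explicitly, and the function name is illustrative.

```python
from math import comb

NUM_DIMENSIONS = 14


def combinations_in_range(n, k_min, k_max):
    """Number of ways to pick between k_min and k_max dimensions out of n."""
    return sum(comb(n, k) for k in range(k_min, k_max + 1))


# General analytics: any 5 to 9 of the 14 dimensions.
general_combinations = combinations_in_range(NUM_DIMENSIONS, 5, 9)

# BKPI: any 10 to 14 of the 14 dimensions.
bkpi_combinations = combinations_in_range(NUM_DIMENSIONS, 10, 14)

if __name__ == "__main__":
    print("general analytics combinations:", general_combinations)
    print("BKPI combinations:", bkpi_combinations)
```

If every such combination had to be pre-aggregated, the number of summary tables would run into the thousands, which is one reason a second-level response on 1.4 billion rows is hard to reach with a purely relational roll-up.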
Business requirements on Customer Insight
Precise AD promotion based on user behavior information, refined event content requirements from suppliers
Promote electronic magazines to people taking public transport
8 AM
Go to office
Promote Wi-Fi offers to people in coffee shops without Wi-Fi service
Working days
Big Data Platform
Weekends
Promote cosmetics vouchers to females in shopping malls
Holidays
Vacations
Get subscriber's location
Based on behavior, analyze users' consumption characteristics, favorite content and offers;
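The promotion scenarios above can be read as context-matching rules. The following is a minimal sketch of that idea, not the platform's actual rule engine: the field names, place labels and rule conditions are hypothetical, "without Wi-Fi service" is assumed to be a property of the subscriber, and the day-type labels (working days, weekends, holidays) are not modeled because the slide does not make clear which promotion each belongs to.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SubscriberContext:
    """Hypothetical snapshot of a subscriber's current situation."""
    hour: int                # local hour of day, 0-23
    place: str               # hypothetical place label derived from location
    gender: str
    has_wifi_service: bool   # assumption: a subscriber-level flag


@dataclass(frozen=True)
class Rule:
    offer: str
    matches: Callable[[SubscriberContext], bool]


RULES: List[Rule] = [
    Rule("electronic magazine",
         lambda c: c.place == "public_transport" and c.hour == 8),
    Rule("Wi-Fi offer",
         lambda c: c.place == "coffee_shop" and not c.has_wifi_service),
    Rule("cosmetics voucher",
         lambda c: c.place == "shopping_mall" and c.gender == "female"),
]


def offers_for(ctx: SubscriberContext) -> List[str]:
    """Return every offer whose rule matches the given context."""
    return [rule.offer for rule in RULES if rule.matches(ctx)]


if __name__ == "__main__":
    commuter = SubscriberContext(8, "public_transport", "male", True)
    print(offers_for(commuter))
```

In practice such rules would be driven by the behavior analysis described on the slide rather than hard-coded conditions.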
Business requirements on Customer Insight
Two general requirements on BI technologies: a high-performance, low-cost DW, and analysis & mining algorithms based on user behaviors and values
Requirements
Data processing procedure
Application
Item
inquiry
Traffic
analysis
Network
analysis
Customer
insight
Dynamic
policy
Performance
assess
Finance
analysis
Marketing
management
Service capabilities
(information archive,
process)
ingress
retrieve Text processing
Characteristic Content
visualization
profile
classification
Location
service
……
……
Graphic
service
aggregationclassification
Infrastructure
Distributed/Distributed
Statistics analysis
(Data mining, analysis)DBMS query engine
……
association predicates
Distributed platform
Hardware
Distributed file
system
Distributed
database
Distributed
computation
Pain point 1: Poor OLAP performance: minute-level response time on several hundred GB of data. The OLAP system is built on a ROLAP solution, such as Cognos, DB2, etc.
Pain point 2: Poor DW performance and high cost: raw data storage and computation take up more than 70% of the DW's capacity, reaching the maximum volume and capability of a traditional database.
Pain point 3: High software/hardware cost: the solution is composed of high-end servers, disk arrays and a commercial DBMS, with expensive licenses and hardware.
Query:
• Point query and analytic query from RTD
• Exploratory queries such as customer segmentation require full table scans and multi-table joins
• Query on 1024 predefined KPIs
• Tagging and labeling, 500+ indicators, 50+ graphic computations
Data mining:
• Customized model (User Modeling): User/Item/content/properties/similarity, MinHash (CF)
• Behavior targeting: customer profiling based on behavior and values
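The data mining list names MinHash as the similarity technique for collaborative filtering. The sketch below shows only the similarity-estimation step, under assumptions: each user is represented as a set of consumed item identifiers, and the seeded hash family is built from `hashlib.blake2b`, which is an illustrative choice and not taken from the slide.

```python
import hashlib
from typing import List, Set


def _seeded_hash(seed: int, item: str) -> int:
    """Deterministic 64-bit hash of an item under a given seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items: Set[str], num_hashes: int = 256) -> List[int]:
    """For each seeded hash function, keep the minimum hash over the set."""
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_seeded_hash(seed, item) for item in items)
            for seed in range(num_hashes)]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching signature slots approximates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a: Set[str], b: Set[str]) -> float:
    """Exact Jaccard similarity, for comparison on small sets."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    user_a = {"news", "music", "video", "games"}
    user_b = {"news", "music", "video", "maps"}
    estimate = estimate_jaccard(minhash_signature(user_a),
                                minhash_signature(user_b))
    print("exact:", exact_jaccard(user_a, user_b), "estimate:", estimate)
```

The appeal for a carrier-scale user base is that fixed-length signatures can be compared without touching the full item sets; a complete collaborative-filtering pipeline would add candidate grouping and recommendation steps that are not shown here.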
Business requirements on Society Insight
Focus on anonymous wireless users and location-based applications for government, industry and enterprise
Traffic application: congestion information made possible through telco signaling data
Population analytics: traffic planning, city resource distribution, abnormal events
Business requirements on Society Insight
To dig out the laws of group activity by applying data mining algorithms to maps and dimensional data. The core part is the data analysis layer.
Visualization (UniBI Reporting Tools): Population Density; OD Graph & Matrix; OD transport classification; Traffic congestion detection
Data Analysis (HDFS + Map/Reduce): Population Density; OD Table; OD Transportation Mode Classification; Traffic Congestion Detection
Data Preprocessing (HDFS + HQL): Data Cleaning; Data Integration; Data Exploration; Data Selection
Map preprocessing: District segmentation; Extract district coordinates; Road segmentation; Extract road coordinates
Data Sources: MR Data (Time, IMSI, Longitude, Latitude, RNCID, CellID)
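Two of the analysis outputs named above, population density and the OD (origin-destination) table, can be illustrated on a handful of MR records. This is a single-machine sketch under stated assumptions, whereas the slide places the real processing on HDFS + Map/Reduce: districts are approximated by a fixed longitude/latitude grid instead of the segmented map, and a user's origin and destination are taken to be the first and last observed district. All names are illustrative.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple

District = Tuple[int, int]


class MRRecord(NamedTuple):
    """Fields follow the MR data source listed on the slide."""
    time: int
    imsi: str        # would be anonymized in practice
    longitude: float
    latitude: float
    rnc_id: int
    cell_id: int


def district_of(rec: MRRecord, cell_size_deg: float = 0.01) -> District:
    """Hypothetical district id: index of a fixed-size lon/lat grid cell."""
    return (math.floor(rec.longitude / cell_size_deg),
            math.floor(rec.latitude / cell_size_deg))


def population_density(records: Iterable[MRRecord]) -> Dict[District, int]:
    """Count distinct subscribers observed in each district."""
    seen = defaultdict(set)
    for rec in records:
        seen[district_of(rec)].add(rec.imsi)
    return {district: len(users) for district, users in seen.items()}


def od_table(records: Iterable[MRRecord]) -> Counter:
    """Count users per (origin, destination) district pair."""
    by_user = defaultdict(list)
    for rec in records:
        by_user[rec.imsi].append(rec)
    counts = Counter()
    for recs in by_user.values():
        recs.sort(key=lambda r: r.time)
        origin, destination = district_of(recs[0]), district_of(recs[-1])
        if origin != destination:  # ignore users who did not change district
            counts[(origin, destination)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        MRRecord(1, "A", 116.405, 39.905, 1, 10),
        MRRecord(2, "A", 116.455, 39.955, 1, 11),
        MRRecord(1, "B", 116.405, 39.905, 1, 10),
        MRRecord(3, "B", 116.455, 39.955, 1, 11),
        MRRecord(1, "C", 116.405, 39.905, 1, 10),
    ]
    print(population_density(sample))
    print(od_table(sample))
```

Transport mode classification and congestion detection would build on the same per-user trajectories, adding speed and road matching, which this sketch leaves out.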
Summary of big data business requirements
• Huawei product lines are attempting to build new big data business.
• Huawei product lines have various requirements on big data components, mainly MPP DB, in-memory analytics DB, streaming computation, MOLAP, parallel computation, and analytics & mining algorithms.
Requirements
Data storage and computation
• MPP DB: support 10PB-level volume; 100+ node linear scalability; respond to queries on 0.1 billion rows in 1 minute; 10:1 compression ratio;
• Real-time analytics in-memory DB: 100TB, columnar, wide tables with 2000-5000 columns, 30,000 updates/s, ad-hoc queries respond in 3 seconds, to support real-time business policy adjustment and real-time KPI calculation
• Streaming processing: 1 million events per second; 1 microsecond latency for each event
Data analytics
• MOLAP: support SQL and MDX; <5s response time in 80~90% of scenarios; 1s response latency on TB data with hundreds of dimensions
• Real-time dashboard;
• Data mining: high accuracy, various algorithms, online data mining, quick response.
Thank you
Copyright©2011 Huawei Technologies Co., Ltd. All Rights Reserved.
The information in this document may contain predictive statements including, without limitation,
statements regarding the future financial and operating results, future product portfolio, new technology,
etc. There are a number of factors that could cause actual results and developments to differ materially
from those expressed or implied in the predictive statements. Therefore, such information is provided for
reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the
information at any time without notice.