Security Level:
Challenges of Big Data Platform
www.huawei.com
HUAWEI TECHNOLOGIES CO., LTD.

Contents
• Categories of data in carrier network
• Network insight
• Customer behavior insight
• Society activity insight
• Challenges

Five Categories of Data
Business domain: Enterprise Management; OSS; Network Element; BSS; VAS
Generation: generated manually; generated by machine; generated by machine; generated manually; generated manually
Volume (in slide order): 100TB~10TB; 1TB~100TB; xxGB / year; 10PB / year, 1~3 years' accumulation; 100GB~10TB; xxTB / year; 100TB~10PB; 100TB / year
Source and characteristics (in slide order):
• E-Learning, HR, ERP, CMS, Account: structured (table); unstructured (graphics, text, video)
• NE Parameter, NE Config, NE Log, Alert, Perf: structured (table); unstructured (time-series data)
• CDR, CHR, SDR, MR, Counter, NE Log: semi-structured (signaling call records); unstructured (time-series data)
• Billing, MKT report, Order, User Profile, SCM, ERP: structured (table)
• Structured (table, point sets); semi-structured (column cluster); unstructured (graphics, text, video, time-series data)
Diagram labels: Enterprise Management domain; Probe or NE Integration; NodeB, RNC, GIS, SGSN, GGSN/DPI, ISP; BSS; FRM; HRM; Service Content; CRM; OSS; MRP; Order; Usage; VAS

Evolution of the data analytics business in the big data era
• Past: the typical analytics business was operation analysis, based on statistics, offline, on isolated data.
• Nowadays: new businesses such as network optimization and customer experience, with large volume, real-time processing, and various kinds of data types.
Diagram labels (past: statistics, offline, isolated): operational analytic system: operation reports / KPI reports (statistics); VAS data, orders; BSS CRM/Billing; OSS performance, network schedule (statistics); HR/FRM/SRM; business indicators; NPM/SQM; NE data; alerts; HR and financial reports; statistics of network management performance; enterprise management
Diagram labels (evolution): AD promotion; CEM; offer design

Columns: data volume/flow (data set >100TB); velocity: data flow rate, accumulation rate (>60% scenarios); requirements on scale-out; data variety: data format and sources
• Operation Report: statistics data; offline; statistics scenario, low accumulation rate; no scale-out requirement; CRM, Billing, structured
• Billing Verification: <100T; offline; fixed; no scale-out requirement; Billing, structured
• Network optimization: network equipment data, 10PB; elastic data processing cluster of over 100 servers, handling 1PB of data; data from NEs such as RAN, PS, etc.
• Customer experience
• Precise marketing: network data, 10PB; ~200Gbps; archive 1 year's data; elastic data processing cluster of over 100 servers, handling 1PB of data; network signaling, xDR, traffic statistics, NE configuration data; semi-structured data makes up the majority
• Customer profile: 100GB~300GB; ~100,000 packages/s; fixed volume; in-memory computing; CRM, billing, xDR; structured and semi-structured data
Annotation on the new businesses: large volume, real-time, convergence of various data types

Data evolution driven by carrier business
Business evolution: three categories of big data business
Network Insight
• Analytics based on network data, combined with user data, to adjust the network layout
• Focus on network status (location, equipment workload) to adjust the network dynamically
Customer Insight
• Analytics based on user data, combined with network equipment data, to recognize the characteristics of customer behavior
• Understand who is using the network and which services they consume, in order to optimize the business
Society Insight
• Analytics based on the laws behind the data, to dig out the value of the data
• Based on these laws, guide the carrier in developing new valuable business

Categories and characteristics of carrier big data business
Business: Customer Insight; Network Insight; Society Insight
Data (in slide order):
• NE data and summary data: TS, MR, Log, xDR, DPI, dial test, traffic test, order, account, UP, user account, complaints, user consumption
• Operational data: CRM, VAS, CBS, network
• VAS and external data: IPCC, LBS, Internet, marketing, VAS usage, user profile
• Archived data: xDR, log, traffic statistics
Capability (in slide order):
• Data representation and query: data visualization with rich and complex models; ad-hoc query; real-time response; multi-dimension; query is not complex; high concurrency; complex query
• Data storage and integration: raw data; low data volume; moderate volume; large volume (10PB level) at low cost; summarized data; mix of raw and summarized data
• ETL: high-performance loading; real-time update; complex data model; cross-domain data integration
• Real time; high concurrency; complex query; complex models and algorithms; high performance; low cost
• Complex data mining algorithms need guidance from data scientists and industry experts
• Data volume varies by domain, averaging the 10PB level, and requires low cost

Business requirements on Network Insight
For a carrier network serving 40M users, there are several challenges:
• Volume: 120T -> 5.6P
• Integration: 33 nodes -> 6 nodes
• Query response time: 100s -> 15s
Data processing procedure (diagram): data ingress (PS, CS, NMS, EMS) -> data preprocessing -> data summarization and storage (DW: ① archive, ② summarize) -> data analytics and processing (③) -> multi-dimension analytics and data representation (④)
Data management requirements:
• 20M users: 25Gbps, 140k records/s, 60 days' raw data, 120TB
• Target (40M users): 200Gbps, 354k records/s, 1 year's raw data, 5.6PB
①: Archive and query raw data; ②: raw data summarization
• Feeding rate 90,000 rows/s
• Ensure stable query performance
• 1 year's data, 5.6P
• Compression ratio 10:1
• Support a few ad-hoc queries
③: Statistics/analysis libs
• Support complex queries involving 10 tables
• 20 concurrent reporting queries, responding within 15 seconds
④: Multi-dimension analytics
• Multi-dimension: 14 dimensions
• General analytics: combinations of 5 to 9 dimensions of SDR
• BKPI: combinations of 10 to 14 dimensions
• Second-level response time on 1.4 billion rows

Business requirements on Customer Insight
Precise ad promotion based on user behavior information, with refined event content requirements from suppliers.
• 8 AM, going to the office: promote electronic magazines to people taking public transport
• Promote Wi-Fi offers to people in coffee shops without Wi-Fi service
• Promote cosmetics vouchers to women in shopping malls
Time contexts: working days; weekends; holidays; vacations
Big Data Platform: get the subscriber's location; based on behavior, analyze users' consumption characteristics and their favorite content and offers.

Business requirements on Customer Insight
Two general requirements on BI technologies: a high-performance, low-cost DW, and analysis and mining algorithms based on user behavior and value.
Requirements and data processing procedure (architecture diagram, top to bottom):
• Application: item inquiry; traffic analysis; network analysis; customer insight; dynamic policy; performance assessment; finance analysis; marketing management
• Service capabilities (information archive, processing): ingress; retrieval; text processing; characteristic profile; content classification; visualization; location service; graphic service; ……
• Infrastructure (data mining, analysis): distributed DBMS; distributed query engine; statistics analysis; aggregation; classification; association; predicates; ……
• Distributed platform: distributed file system; distributed database; distributed computation
• Hardware
Pain point 1: Poor OLAP performance: minute-level response times on several hundred GB of data.
The OLAP system is built on a ROLAP solution such as Cognos, DB2, etc.
Pain point 2: Poor DW performance and high cost: raw data storage and computation consume more than 70% of the DW's capacity, reaching the maximum volume and capability of a traditional database.
Pain point 3: High software and hardware cost: the solution is composed of high-end servers, disk arrays, and a commercial DBMS, with expensive licenses and hardware.
Query:
• Point queries and analytic queries from RTD
• Exploratory queries such as customer segmentation require full table scans and multi-table joins
• Queries on 1024 predefined KPIs
• Tagging and labeling, 500+ indicators, 50+ graphic computations
Data mining:
• Customized models (user modeling): user/item/content/properties/similarity, MinHash (CF)
• Behavior targeting: customer profiling based on behavior and value

Business requirements on Society Insight
Focus on anonymous wireless users and location-based applications for government, industry, and enterprise.
• Traffic application: congestion information made possible through telco signaling data
• Population analytics: traffic planning, city resource distribution, abnormal events

Business requirements on Society Insight
The goal is to dig out the laws of group activity by applying data mining algorithms to maps and dimensional data. The core part is the data analysis layer.
Layers (diagram, top to bottom):
• Visualization (UniBI reporting tools): population density; OD graph and matrix; OD transport classification; traffic congestion detection
• Data analysis (HDFS + Map/Reduce): population density; OD table; OD transportation mode classification; traffic congestion detection
• Data preprocessing (HDFS + HQL): map preprocessing (district segmentation, extract district coordinates; road segmentation, extract road coordinates); data cleaning; data integration; data exploration; data selection
• Data sources: MR data (Time, IMSI, Longitude, Latitude, RNCID, CellID)

Summary of big data business requirements
• Huawei product lines are attempting to build new big data businesses.
• Huawei product lines have various requirements on big data components, mainly MPP DB, in-memory analytics DB, streaming computation, MOLAP, parallel computation, and analytics and mining algorithms.
Requirements: data storage and computation
• MPP DB: support 10PB-level volume; linear scalability to 100+ nodes; respond to queries on 0.1 billion rows within 1 minute; 10:1 compression ratio
• Real-time analytics in-memory DB: 100TB, columnar, wide tables with 2000-5000 columns, 30,000 updates/s, ad-hoc queries responding within 3 seconds, to support real-time business policy adjustment and real-time KPI calculation
• Streaming processing: 1 million events per second; 1 microsecond latency for each event
Requirements: data analytics
• MOLAP: support SQL and MDX; <5s response time in 80~90% of scenarios; 1s response latency on TB-scale data with hundreds of dimensions
• Real-time dashboard
• Data mining: high accuracy, various algorithms, online data mining, quick response

Thank you

Copyright©2011 Huawei Technologies Co., Ltd. All Rights Reserved.
The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.
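
Backup: arithmetic check on the Network Insight sizing figures
The Network Insight slide quotes 140k records/s retained for 60 days in 120TB, and 354k records/s retained for 1 year in 5.6PB, with a 10:1 compression ratio. The sketch below only derives what those figures imply. The function names are illustrative, decimal units (1TB = 10^12 bytes) are an assumption, and the slides do not say whether the quoted volumes are measured before or after compression.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12  # assumption: decimal terabytes
PB = 10**15  # assumption: decimal petabytes


def total_records(records_per_second, retention_days):
    """Number of records accumulated at a constant rate over the retention window."""
    return records_per_second * SECONDS_PER_DAY * retention_days


def implied_bytes_per_record(volume_bytes, records_per_second, retention_days):
    """Average record size implied by a quoted volume, ingest rate and retention."""
    return volume_bytes / total_records(records_per_second, retention_days)


def compressed_volume(volume_bytes, ratio=10):
    """Footprint after compression, if the quoted volume is the uncompressed size."""
    return volume_bytes / ratio


# Figures taken from the slide: 20M-user case and 40M-user target.
size_20m = implied_bytes_per_record(120 * TB, 140_000, 60)
size_40m = implied_bytes_per_record(5.6 * PB, 354_000, 365)

print(f"20M users: {size_20m:.0f} bytes per record")
print(f"40M users: {size_40m:.0f} bytes per record")
print(f"5.6PB at 10:1: {compressed_volume(5.6 * PB) / TB:.0f} TB")
```

Under these assumptions the two cases imply different average record sizes (roughly 165 bytes versus roughly 500 bytes), so the 120TB and 5.6PB figures probably do not share one record format or one set of stored fields; treat the result as a consistency check rather than a sizing rule.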