From Dolphins to Elephants: Real-Time MySQL to Hadoop Replication with Tungsten MC Brown, Director of Documentation Linas Virbalas, Senior Software Engineer ©Continuent 2014. About Tungsten Replicator • Open source drop-in replacement for MySQL replication, providing: • • • • • • Global transaction ID ©Continuent 2014 Multiple masters Multiple sources Flexible topologies Heterogeneous replication Parallel replication 2 Tungsten Replicator Master Download transactions via network DBMS Logs Slave Replicator (Transactions + Metadata) Replicator Apply using JDBC ©Continuent 2014 THL THL (Transactions + Metadata) 3 How Tungsten Replicator Works Pipeline Stage Extract Filter Master DBMS ©Continuent 2014 Stage Apply Extract Filter Transaction History Log Stage Apply Extract Filter In-Memory Queue 4 Apply Slave DBMS Where we replicate master-slave fan-in slave ©Continuent 2014 Heterogene MySQL Oracle Oracle MySQL all-masters 5 Direct slave Regular MySQL star-schema Why Hadoop • • • • • • ©Continuent 2014 Customer driven Change in the air Environments moving to heterogenous • • NoSQL was the first We already support MongoDB Hadoop used for big analytics More frequently a live resource Big datasets require Map/Reduce 6 Tungsten Replicator and Hadoop • • Extract from MySQL or Oracle • Automatic replication of incremental changes Customizable formatting • • • Hive Schema generation ©Continuent 2014 Base Hadoop and Commercial distributions; Cloudera, HortonWorks, Amazon Elastic MapReduce and IBM InfoSphere BigInsights compatible Materialized views in Hive for carbon-copy tables Sqoop and parallel extractor compatibility for provisioning 7 Applying Data into Hadoop Replicator Replicator Extract transactions from log THL CSV Hadoop DBMS Logs ©Continuent 2014 8 Applying Data into Hadoop Replicator Replicator Extract transactions from log THL CSV Hadoop DBMS Logs ©Continuent 2014 9 Applying Data into Hadoop Replicator Replicator Extract transactions from log THL CSV Hadoop DBMS Logs ©Continuent 2014 10 CSV (Staging) Materialized Views Hadoop ID Message Hive Table ©Continuent 2014 11 CSV (Staging) Materialised Views Hadoop ID Message Hive Table ©Continuent 2014 12 CSV (Staging) Materialised Views Hadoop ID Message Hive Table ©Continuent 2014 13 CSV (Staging) Materialized Views Hadoop ID Message Hive Table ©Continuent 2014 14 MySQL Configuration • • • ©Continuent 2014 Use Row-based replication Every table must have primary keys Replicator configured with: • • Filters for metadata and primary key optimisation Extracts to standard THL 15 Configure Hadoop • • Data is stored in CSV format on HDFS • • • Compatible with Hive, HBase, and others ©Continuent 2014 Cloudera, HortonWorks, Amazon Elastic Map Reduce (EMR) and IBM Infosphere BigInsights compatible Staging DDL can be automatically generated Live Table DDL can be automatically generated 16 DDL Generation • • Built-in Tool, part of Tungsten Replicator • Default mode is for default migrations to Hive types • Customizable for your needs • ©Continuent 2014 Handles staging and live table DDL generation • BigInts as Strings Data transformations possible through filters 17 Replicator Hadoop Configuration • • • ©Continuent 2014 Batch Commit interval • • By rows count By time interval CSV Format • • Predefined formats Customizable by field and row characters Parallelization Supported 18 Materialized Views • Merges Data from Staging CSV into Hive Tables • Processing separate from Replicator • ©Continuent 2014 • Allows individual table views to be generated independently • Allows for custom materialization intervals Views based on 'live' data, or by point-in-time from CSV staging 19 Demo ©Continuent 2014 20 Provisioning Data • • ©Continuent 2014 Sqoop • • • • Start the replicator Sqoop the data Materialized views are idempotent DDL generation is Hive compatible Parallel Extractor • • Currently Oracle only Will extract data in parallel and insert into THL 21 Replication Management • Replication can be stopped, started, restarted at any time • Enables MySQL or Hadoop maintenance windows • • • DDL customizable ©Continuent 2014 Views regenerated at any time Schema changes can be handled by reSqooping and dematerialising views 22 560 S. Winchester Blvd., Suite 500 San Jose, CA 95128 Tel +1 (866) 998-3642 Fax +1 (408) 668-1009 e-mail: [email protected] Our Blogs: http://scale-out-blog.blogspot.com http://mcslp.wordpress.com http://flyingclusters.blogspot.com http://www.continuent.com/news/blogs Continuent Web Page: http://www.continuent.com ! Tungsten 2.2Standby and 3.0 Failed Preview: Master Replicator Slave Hot http://code.google.com/p/tungsten-replicator ©Continuent 2014
© Copyright 2024 ExpyDoc