Migrating from PostgreSQL to MySQL at Cocolog Naoto Yokoyama, NIFTY Corporation Garth Webb, Six Apart Lisa Phillips, Six Apart Credits: Kenji Hirohama, Sumisho Computer Systems Corp. Agenda 1. What is Cocolog 2. History of Cocolog 3. DBP: Database Partitioning 4. Migration From PostgreSQL to MySQL 1. What is Cocolog What is Cocolog NIFTY Corporation Cocolog Established in 1986 A Fujitsu Group Company NIFTY-Serve (licensed and interconnected with CompuServe) One of the largest ISPs in Japan First blog community at a Japanese ISP Based on TypePad technology by SixApart Several hundred million PV/month History Dec/02/2003: Cocolog for ISP users launch Nov/24/2005: Cocolog Free for free launch April/05/2007: Cocolog for Mobile Phone launch Cocolog (Screenshot of home page) 2008/04 700 Thousand Users Cocolog (Screenshot of home page) Cocolog TypePad Cocolog template sets Cocolog Growth (User) ■Cocolog ■Cocolog Free phase1 phase2 phase3 phase4 Cocolog Growth (Entry) ■Cocolog ■Cocolog Free phase1 phase2 phase3 phase4 Technology at Cocolog Core System Linux 2.4/2.6 Apache 1.3/2.0/2.2 & mod_perl Perl 5.8+CPAN PostgreSQL 8.1 MySQL 5.0 memcached/TheSchwartz/cfengine Eco System LAMP,LAPP,Ruby+ActiveRecord, Capistrano Etc... Monitoring Management Tool APL response time of each post number of spam comments/trackbacks number of comments/trackbacks source IP address of spam number of entries number of comments via mobile devices page views via mobile devices time of batch completion amount of API usage bandwidth usage DB DB Hard Monitoring points (order of priority) Service Proprietary in-house development with PostgreSQL, PHP, and Perl Disk I/O Memory and CPU usage time of VACUUM analyze APP number of active processes CPU usage Memory usage 2. History of Cocolog Before DBP 10servers Phase1 2003/12~(Entry: 0.04Million) Postgre SQL Register TypePad Static contents Published NAS WEB Phase2 2004/12~ (Entry: 7Million) Publish Book Tel Operator Support Rich template Postgre SQL 2005/5~ Podcast Portal Profile Etc.. Before DBP 50servers Register 2004/12~ TypePad Static contents Published NAS WEB Phase2 - Problems The system is tightly coupled. Database server is receiving from multiple points. It is difficult to change the system design and database schema. Phase3 2006/3~ (Entry: 12Million) Publish Book Rich template Before DBP 200servers Web-API memcached Tel Operator Support Postgre SQL Register TypePad Podcast Portal Profile Etc.. Static contents Published NAS WEB Phase4 2007/4~ (Entry: 16Million) Publish Book Rich template Before DBP 300servers Web-API Typepad Tel Operator Support memcached Postgre SQL Register Static contents Published Atom Mobile WEB Now 2008/4~ Publish Book Rich template After DBP 150servers Web-API memcached Tel Operator Support Multi MySQL Register Typepad Static contents Published Atom Mobile WEB 3. TypePad Database Partitioning Steps for Transitioning • • • • • • • • Server Preparation Hardware and software setup Global Write Write user information to the global DB Global Read Read/write user information on the global DB Move Sequence Table sequences served by global DB User Data Move Move user data to user partitions New User Partition All new users saved directly to user partition 1 New User Strategy Decide on a strategy for the new user partition Non User Data Move Move all non-user owned data TypePad Overview (PreDBP) Internet Mobile Blog Readers Blog Owners https(443) Blog Readers smtp(25) / pop(110) http(80) Web Server Application Server TypeCast Server Mail Server smtp(25) / pop(110) memcached(11211) postgres(5432) nfs(2049) Storage Static Content (HTML, Images, etc) http(80) : atom api Database (Postgres) Data Caching servers to reduce DB load MEMCACHED Dedicated Server for TypeCast (via ATOM) ATOM Server ADMIN(CRON) Server Cron Server for periodic asynchronous tasks Why Partition? TypePad TypePad TypePad TypePad TypePad TypePad TypePad TypePad User Role (User1) User Role (User0) NonUser Role Current setup All inquires (access) go to one DB(Postgres) Global Role Non-User Role User Role (User2) User Role (User3) After DBP Inquiries (access) are divided among several DB(MySQL) Server Preparation Information that does not need to be partitioned (such as session information) TypePad Global Role User Role (User1) DB(PostgreSQL) User Role (User2) Non-User Role Schwartz DB Maintains user mapping and primary key generation Stores job details User information is partitioned Server for executing Jobs User Role (User0) NonUser Role Current Setup User Role (User3) DB(MySQL) for partitioned data Job Server + TypePad + Schwartz Asynchronous Job Server New expanded setup ※Grey areas are not used in current steps Global Write Creating the user map TypePad ① ② DB(PostgreSQL) User Role (User0) NonUser Role Global Role User Role (User1) Non-User Role Schwartz DB Maintains user mapping and primary key generation User Role (User2) User Role (User3) Job Server + TypePad + Schwartz DB(MySQL) for partitioned data Asynchronous Job Server Explanation ①:For new registrations only, uniquely identifying user data is written to the global DB ②:This same data continues to be written to the existing DB ※Grey areas are not used in current steps Global Read Use the user map to find the user partition TypePad ② Global Role ③ Non-User Role Schwartz DB ① Migrate existing user data DB(PostgreSQL) User Role (User0) NonUser Role User Role (User1) Maintains user mapping and primary key generation User Role (User2) User Role (User3) DB(MySQL) for partitioned data Job Server + TypePad + Schwartz Asynchronous Job Server Explanation ①:Migrate existing user data to the global DB ②:At start of the request, the application queries global DB for the location of user data ③:The application then talks to this DB for all queries about this user. At this stage the global DB points to ※Grey areas are not used in current steps the user0 partition in all cases. Move Sequence Migrating primary key generation TypePad ② Global Role Non-User Role Schwartz DB ① Migrate sequence management DB(PostgreSQL) User Role (User0) NonUser Role User Role (User1) Maintains user mapping and primary key generation User Role (User2) User Role (User3) DB(MySQL) for partitioned data Job Server + TypePad + Schwartz Asynchronous Job Server Explanation ①:Postgres sequences (for generating unique primary keys) are migrated to tables on the global DB that act as “pseudo-sequences”. ② Application requests new primary keys from global DB rather than the user partition. ※Grey areas are not used in current steps User Data Move Moving user data to the new user-role partitions ② TypePad Global Role ③ User Role (User1) Non-User Role Schwartz DB Maintains user mapping and primary key generation Stores job details ④ User Role (User2) DB(PostgreSQL) User information is partitioned Server for executing Jobs User Role (User0) NonUser Role Migrating each user data ① User Role (User3) DB(MySQL) for partitioned data Job Server + TypePad + Schwartz DB(MySQL) for partitioned data Explanation ①:Existing users that should be migrated by Job Server are submitted as new Schwartz jobs. User data is then migrated asynchronously ②:If a comment arrives while the user is being migrated, it is saved in the Schwartz DB to be published later. ③:After being migrated all user data will exist on the user-role DB partitions ※Grey areas are not used in current steps ④:Once all user data is migrated, only non-user data is on Postgres New User Partition New registrations are created on one user role partition TypePad ① ② DB(PostgreSQL) User Role (User0) NonUser Role Global Role User Role (User1) User Role (User2) Non-User Role Schwartz DB Maintains user mapping and primary key generation User information is partitioned User Role (User3) DB(MySQL) for partitioned data Explanation ①:When new users register, user data is written to a user role partition. ②:Non-user data continues to be served off Postgres Job Server + TypePad + Schwartz Asynchronous Job Server ※Grey areas are not used in current steps New User Strategy Pick a scheme for distributing new users TypePad ① ② DB(PostgreSQL) User Role (User0) NonUser Role Global Role User Role (User1) User Role (User2) Non-User Role Schwartz DB Maintains user mapping and primary key generation User information is partitioned User Role (User3) DB(MySQL) for partitioned data Job Server + TypePad + Schwartz Asynchronous Job Server Explanation ①:When new users register, user data is written to one of the user role partitions, depending on a set distribution method (round robin, random, etc) ②:Non-user data continues to be served off Postgres ※Grey areas are not used in current steps Non User Data Move Migrate data that cannot be partitioned by user Information that does not need to be partitioned (such as session information) TypePad Global Role User Role (User1) User Role (User2) DB(PostgreSQL) User Role (User0) NonUser Role Migrate non-User data Non-User Role Schwartz DB Maintains user mapping and primary key generation User information is partitioned User Role (User3) ① DB(MySQL) for partitioned data Job Server + TypePad + Schwartz Asynchronous Job Server Explanation ①:Migrate non-user role data left on PostgreSQL to the MySQL side. ※Grey areas are not used in current steps Data migration done TypePad ① Information that does not need to be partitioned (such as session information) ① Global Role User Role (User1) DB(Postgres) User Role (User2) Non-User Role Schwartz DB Maintains user mapping and primary key generation Stores job details User information is partitioned Server for executing Jobs User Role (User0) NonUser Role User Role (User3) Job Server + TypePad + Schwartz DB(MySQL) for partitioned data ② Explanation Asynchronous Job Server ①:All data access is now done through MySQL ②:Continue to use The Schwartz for asynchronous jobs ※Grey areas are not used in current steps The New TypePad configuration Internet Blog Readers Mobile Blog Readers Blog Owners (management interface) http(80) Web Server https(443) smtp(25) / pop(110) Application Server TypeCast Server Mail Server http(80) : atom api smtp(25) / pop(110) memcached(11211) MySQL(3306) nfs(2049) Storage Static Content (HTML, Images, etc) Database (MySQL) Data Caching servers to reduce DB load MEMCACHED Dedicated Server for TypeCast (via ATOM) ATOM Server ADMIN(CRON) Server Cron Server for periodic asynchronous tasks Job Server TheSchwartz server for running ad-hoc jobs asynchronously 4. Migration from PostgreSQL to MySQL History of scale up PostgreSQL server, Before DBP DB Node Spec History Time OS(RedHat) CPU Xeon MEM DiskArray 2003/12 7.4(2.4.9) 1.8GHz/512k×1 1GB No ES2.1(2.4.9) 3.2GHz/1M×2 4GB No ES2.1(2.4.9) 3.2GHz/1M×2 4GB Yes AS2.1(2.4.9) 3.2GHz/1M×4 12G B Yes AS4 (2.6.9) 3.2GHz/1M×4 12G B Yes AS4 (2.6.9) MP3.3GHz/1M×4 〔2Core×4〕 16G B Yes 2007/11 History of scale up PostgreSQL server, Before DBP DB DiskArray Spec [FUJITSU ETERNUS8000] http://www.computers.us.fujitsu.com/www/pro ducts_storage.shtml?products/storage/fujitsu/ e8000/e8000 Best I/O transaction performance in the world 146GB (15 krpm) * 32disk with RAID - 10 MultiPath FibreChannel 4Gbps QuickOPC (One Point Copy) OPC copy functions let you create a duplicate copy of any data from the original at any chosen time. Scale out MySQL servers, After DBP A role configuration Each role is configured as HA cluster HA Software: NEC ClusterPro Shared Storage Scale out MySQL servers, After DBP heart beat FibreChannel SAN DiskArray Postgre SQL MySQL Role1 MySQL Role2 MySQL Role3 … TypePad Application Scale out MySQL servers, After DBP Backup Replication w/ Hot backup Scale out MySQL servers, After DBP heart beat FibreChannel SAN DiskArray Postgre SQL MySQL Role1 MySQL Role2 MySQL Role3 mysqld mysqld mysqld … rep rep rep mysqld mysqld mysqld TypePad Application opc MySQL BackupRole Troubles with PostreSQL 7.4 – 8.1 Data size over 100 GB 40% is index Severe Data Fragmentation VACUUM “VACUUM analyze” cause the performance problem Takes too long to VACUUM large amounts of data dump/restore is the only solution for de-fragmentation Auto VACUUM We don’t use Auto VACUUM since we are worried about latent response time Troubles with PostgreSQL 7.4 – 8.1 Character set PostgreSQL allow the out of boundary UTF-8 Japanese extended character sets and multi bytes character sets which normally should come back with an error - instead of accepting them. “Cleaning” data Removing characters set that are out of the boundries UTF-8 character sets. Steps PostgreSQL.dumpALL Split for Piconv UTF8 -> UCS2 -> UTF8 & Merge PostgreSQL.restore dump Split restore UTF8->UCS2->UTF8 Merge Migration from PostgreSQL to MySQL using TypePad script Steps PostgreSQL -> PerlObject & tmp publish -> MySQL -> PerlObject & last publish diff tmp & last Object (data check) diff tmp & last publish (file check) data check Object Object TypePad TypePad PostgreSQL Document Document tmp File check last Troubles with MySQL convert_tz function doesn't support the input value outside the scope of Unix Time sort order different sort order without “order by” clause Cocolog Future Plans Dynamic Job queue Consulting by Sumisho Computer Systems Corp. System Integrator first and best partner of MySQL in Japan since 2003 provide MySQL consulting, support, training service HA Maintenance online backup Japanese character support Questions
© Copyright 2024 ExpyDoc