Scaling out Event Ticketing

Jens Bussmann, Sales Manager Cloud Platform, Google
Nacho Coloma, CTO & Founder, Koliseo

Google Cloud Platform at a Glance

Google Cloud Platform = Commercialization of Google Innovations

Google Cloud Platform
Compute: Compute Engine, App Engine
Storage: Cloud Storage, Cloud SQL, Cloud Datastore
Services: BigQuery, Cloud Endpoints

Selling tickets with love since 2012

Ticketing is a tough market, with lots of users hitting your system on weekends.

Reasons to build Koliseo on App Engine
Infinite-scaling NoSQL storage.
Flexible, automated server scaling under user load.
No DevOps team: we don’t wear pagers.
Low cost.

MapReduce and NoSQL
When all you have is a hammer, everything looks like a nail.
Photo: 0Four

Anyone here using Prototype...
...and considering migrating to jQuery?

Anyone using jQuery...
...and considering migrating to Angular?
Photo: Paul Oka

The HTTP Archive
Introduced in 1996
Registers the Alexa Top 1,000,000 Sites
About 400GB of raw CSV data
That holds answers to a lot of questions

Which sites are using Prototype and jQuery today?*
* as of June 1 2013 (not really today)

How do we know that?
    SELECT pages.pageid, url, cnt, libs, pages.rank rank
    FROM [httparchive:runs.2013_06_01_pages] as pages
    JOIN (
      SELECT pageid, count(distinct(type)) cnt, GROUP_CONCAT(type) libs
      FROM (
        SELECT REGEXP_EXTRACT(url, r'(jquery|prototype).*\.js') type, pageid
        FROM [httparchive:runs.2013_06_01_requests]
        WHERE REGEXP_MATCH(url, r'(jquery|prototype).*\.js')
        GROUP BY pageid, type
      )
      GROUP BY pageid
      HAVING cnt >= 2
    ) as lib
    ON lib.pageid = pages.pageid
    WHERE rank IS NOT NULL
    ORDER BY rank asc

We have a query to prove it
Source: Google BigQuery

Google Cloud Platform
Compute: App Engine (PaaS), Compute Engine (IaaS)
Storage: Cloud Storage, Cloud SQL, Cloud Datastore
Services: BigQuery (Big Data analysis tool), Cloud Endpoints

Google innovations in the last twelve years
Timeline, 2002 to 2013: GFS, MapReduce, Big Table, Dremel, Colossus, Spanner, Compute Engine
Awesomeness starts here

Google BigQuery
Analyze terabytes of data in seconds
Data imported in bulk or using streaming
Supports CSV and JSON
Use the browser tool, the command-line tool or the REST API

Who are our users?
[Slide text only partly recoverable: users complaining about ZERO support; Twitter; (behind Gmail, Hotmail and Yahoo!)]

BigQuery is a prototyping tool
Answers questions that you need to ask once in your life.
Has a flexible interface to launch queries interactively, thinking on your feet.
Processes terabytes of data in seconds.
It’s much cheaper than the alternative.

What are the top 100 most active Ruby repositories on GitHub?

    SELECT repository_name, count(repository_name) as pushes,
      repository_description, repository_url
    FROM [githubarchive:github.timeline]
    WHERE type="PushEvent"
      AND repository_language="Ruby"
      AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC('2012-04-01 00:00:00')
    GROUP BY repository_name, repository_description, repository_url
    ORDER BY pushes DESC
    LIMIT 100

Time to start testing these queries
Do we have a minute?
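Before spending query bytes, the library-detection logic of the HTTP Archive query can be tried locally. The sketch below is only an approximation: it uses Python's re module as a stand-in for BigQuery's regex engine, and the function name and sample URLs are made up for illustration.

```python
import re
from collections import defaultdict

# Same pattern as the REGEXP_EXTRACT call in the HTTP Archive query.
LIB_PATTERN = re.compile(r'(jquery|prototype).*\.js')


def pages_mixing_libraries(requests):
    """Return {pageid: sorted library names} for pages loading 2+ libraries.

    `requests` is an iterable of (pageid, url) pairs, a hypothetical local
    stand-in for the rows of the requests table.
    """
    libs_by_page = defaultdict(set)
    for pageid, url in requests:
        match = LIB_PATTERN.search(url)
        if match:
            # Equivalent of GROUP BY pageid, type: keep distinct types per page.
            libs_by_page[pageid].add(match.group(1))
    # Equivalent of HAVING cnt >= 2.
    return {
        pageid: sorted(libs)
        for pageid, libs in libs_by_page.items()
        if len(libs) >= 2
    }


# Made-up sample rows, not real HTTP Archive data.
sample = [
    (1, "http://example.com/js/jquery.min.js"),
    (1, "http://example.com/js/prototype.js"),
    (2, "http://example.com/js/jquery-1.9.js"),
    (3, "http://example.com/jquery/logo.png"),
]
result = pages_mixing_libraries(sample)
print(result)
```

Only page 1 loads both libraries, so it is the only page kept; page 3 is ignored because its URL never reaches a .js suffix.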
Photo: Alex Lomix

Much more flexible than SQL

Multi-valued attributes

    lived_in: [
      { city: 'La Laguna', since: '19750329' },
      { city: 'Madrid', since: '20010101' },
      { city: 'Cologne', since: '20130401' }
    ]

Correlation and nth percentile

    SELECT CORR(temperature, number_of_people)

Data manipulation: dates, urls, regex, IP...

What are the top cities contributing modifications to Wikipedia?

    SELECT COUNT(*) c, city, countryLabel,
      NTH(1, latitude) lat, NTH(1, longitude) lng
    FROM (
      SELECT INTEGER(PARSE_IP(contributor_ip)) AS clientIpNum,
        INTEGER(PARSE_IP(contributor_ip)/(256*256)) AS classB
      FROM [publicdata:samples.wikipedia]
      WHERE contributor_ip IS NOT NULL
    ) AS a
    JOIN EACH [fh-bigquery:geocode.geolite_city_bq_b2b] AS b
    ON a.classB = b.classB
    WHERE a.clientIpNum BETWEEN b.startIpNum AND b.endIpNum
      AND city != ''
    GROUP BY city, countryLabel
    ORDER BY 1 DESC

Source: Geoip geolocation with Google BigQuery

Cost of BigQuery

    Loading data          Free
    Exporting data        Free
    Storage               $80 per TB/month
    Interactive queries   $35 per TB processed
    Batch queries         $20 per TB processed

Not for dashboards: if you need to launch your query frequently, it’s more cost-effective to use MapReduce or SQL.

Questions?
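The prices listed above can be turned into a quick back-of-the-envelope estimate, which also shows why frequent queries get expensive. This is a minimal sketch: the function name is hypothetical, and it assumes every query is billed for the full amount of data it is given, which is a simplification.

```python
# Prices as listed on the cost slide, in USD.
STORAGE_PER_TB_MONTH = 80.0
INTERACTIVE_PER_TB = 35.0
BATCH_PER_TB = 20.0


def monthly_cost(stored_tb, tb_per_query, queries_per_month, batch=False):
    """Estimate one month of BigQuery cost (hypothetical helper).

    Loading and exporting are free, so only storage and processed
    terabytes contribute.
    """
    rate = BATCH_PER_TB if batch else INTERACTIVE_PER_TB
    storage = stored_tb * STORAGE_PER_TB_MONTH
    queries = tb_per_query * queries_per_month * rate
    return storage + queries


# Assumed scenario: a 0.4 TB dataset (about the size quoted for the
# HTTP Archive) scanned in full by each interactive query.
one_off = monthly_cost(0.4, 0.4, queries_per_month=1)
hourly_dashboard = monthly_cost(0.4, 0.4, queries_per_month=720)
print(f"one-off question: ${one_off:.2f}")
print(f"hourly dashboard: ${hourly_dashboard:.2f}")
```

Under these assumptions a single interactive query costs about $14 (0.4 TB at $35 per TB), while 720 of them in a month cost about $10,080 before storage.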