Scaling out Event Ticketing

Jens Bussmann
Sales Manager Cloud Platform
Google
Nacho Coloma
CTO & Founder
Koliseo
Google Cloud Platform at a Glance
Google Cloud Platform = Commercialization of Google Innovations
Google Cloud Platform
Compute: Compute Engine, App Engine
Storage: Cloud Storage, Cloud SQL, Cloud Datastore
Services: BigQuery, Cloud Endpoints
Selling tickets with love since 2012
Ticketing is a tough market
with lots of users hitting your system on weekends
Reasons to build Koliseo on App Engine
Infinite-scaling NoSQL storage.
Flexible, automated server scaling under user load.
No DevOps team: we don’t wear pagers.
Low cost.
MapReduce and NoSQL
when all you have is a hammer,
everything looks like a nail
Photo: 0Four
Anyone here using Prototype...
...and considering migrating to jQuery?
Anyone using jQuery...
...and considering migrating to Angular?
Photo: Paul Oka
The HTTP Archive
Introduced in 1996
Tracks the Alexa Top 1,000,000 sites
About 400GB of raw CSV data
That’s answers to a lot of questions
Which sites are using Prototype and jQuery today?
* as of June 1 2013 (not really today)
How do we know that?
SELECT pages.pageid, url, cnt, libs, pages.rank rank
FROM [httparchive:runs.2013_06_01_pages] as pages
JOIN (
  SELECT pageid, count(distinct(type)) cnt, GROUP_CONCAT(type) libs
  FROM (
    SELECT REGEXP_EXTRACT(url, r'(jquery|prototype).*\.js') type, pageid
    FROM [httparchive:runs.2013_06_01_requests]
    WHERE REGEXP_MATCH(url, r'jquery|prototype.*\.js')
    GROUP BY pageid, type
  )
  GROUP BY pageid
  HAVING cnt >= 2
) as lib ON lib.pageid = pages.pageid
WHERE rank IS NOT NULL
ORDER BY rank asc
We have a query to prove it
Source: http://www.igvita.com/2013/06/20/http-archive-bigquery-web-performance-answers/
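The query above extracts a library name from each request URL and keeps only the pages that load two or more distinct libraries. A minimal Python sketch of that logic follows, assuming made-up sample rows (the `REQUESTS` list and both function names are hypothetical, not part of the HTTP Archive schema). It also shows that the pattern in the WHERE clause, written without parentheses, is looser than the one passed to REGEXP_EXTRACT: it accepts any URL containing "jquery", script or not.

```python
import re
from collections import defaultdict

# Hypothetical sample of (pageid, request URL) rows; not real HTTP Archive data.
REQUESTS = [
    (1, "http://a.example/js/jquery-1.9.1.min.js"),
    (1, "http://a.example/js/prototype.js"),
    (2, "http://b.example/js/jquery.min.js"),
    (3, "http://c.example/img/jquery-logo.png"),
]

# Pattern as written in the WHERE clause: "jquery" OR "prototype...js".
LOOSE = re.compile(r"jquery|prototype.*\.js")
# Pattern as written in REGEXP_EXTRACT: either name, followed by ".js".
STRICT = re.compile(r"(jquery|prototype).*\.js")


def libraries_per_page(rows):
    """Map each pageid to the set of library names found in its script URLs."""
    libs = defaultdict(set)
    for pageid, url in rows:
        match = STRICT.search(url)
        if match:
            libs[pageid].add(match.group(1))
    return dict(libs)


def pages_with_both(rows):
    """Rough equivalent of HAVING cnt >= 2: pages loading two or more libraries."""
    found = libraries_per_page(rows)
    return sorted(page for page, names in found.items() if len(names) >= 2)
```

In the query itself the looser filter should only let extra rows through with a NULL library name, which the distinct count then ignores, so the result is the same; the stricter pattern just scans fewer candidates in this sketch.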
Google BigQuery
Google Cloud Platform
Compute: App Engine (PaaS), Compute Engine (IaaS)
Storage: Cloud Storage, Cloud SQL, Cloud Datastore
Services: BigQuery (Big Data analysis tool), Cloud Endpoints
Google innovations in the last twelve years
2002: GFS
2004: MapReduce
2006: Big Table
2008: Dremel
2010: Colossus
2012: Spanner
2013: Compute Engine
Awesomeness starts here
Google BigQuery
Analyze terabytes of data in seconds
Data imported in bulk or using streaming
Supports CSV and JSON
Use the browser tool, the command-line tool or the REST API
[Slide with rotated text, only partly recoverable: "Who are our users?", "users complaining about", "Twitter", "(behind Gmail, Hotmail and Yahoo!)"]
BigQuery is a prototyping tool
Answers questions that you need to ask once in your life.
Has a flexible interface for launching queries interactively, thinking on your feet.
Processes terabytes of data in seconds.
It’s much cheaper than the alternative.
What are the top 100 most active Ruby repositories on GitHub?
SELECT repository_name, count(repository_name) as pushes,
  repository_description, repository_url
FROM [githubarchive:github.timeline]
WHERE type="PushEvent"
  AND repository_language="Ruby"
  AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC('2012-04-01 00:00:00')
GROUP BY repository_name, repository_description, repository_url
ORDER BY pushes DESC
LIMIT 100
Source: http://bigqueri.es/t/what-are-the-top-100-most-active-ruby-repositories-on-github/9
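For readers less used to SQL, the same filter, group and rank steps can be sketched in a few lines of Python. This is only an illustration under assumptions: the `EVENTS` rows are invented, the timestamp format is assumed to be `YYYY-MM-DD hh:mm:ss` in UTC, and `parse_utc_usec` and `top_repositories` are hypothetical helpers named after what the query does.

```python
from collections import Counter
from datetime import datetime, timezone


def event(kind, language, name, created_at):
    """Build one hypothetical row shaped like the columns the query reads."""
    return {"type": kind, "repository_language": language,
            "repository_name": name, "created_at": created_at}


# Invented sample rows; not real GitHub Archive data.
EVENTS = [
    event("PushEvent", "Ruby", "alpha", "2012-05-01 10:00:00"),
    event("PushEvent", "Ruby", "alpha", "2012-06-11 09:30:00"),
    event("PushEvent", "Ruby", "alpha", "2012-01-15 08:00:00"),   # before the cutoff
    event("PushEvent", "Ruby", "beta", "2012-04-02 12:00:00"),
    event("PushEvent", "Python", "gamma", "2012-05-05 12:00:00"),  # wrong language
    event("WatchEvent", "Ruby", "delta", "2012-05-06 12:00:00"),   # not a push
]


def parse_utc_usec(text):
    """Microseconds since the Unix epoch for a UTC 'YYYY-MM-DD hh:mm:ss' string."""
    moment = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(moment.timestamp()) * 1_000_000


def top_repositories(events, language, since, limit):
    """Count pushes per repository after the cutoff and return the busiest ones."""
    cutoff = parse_utc_usec(since)
    pushes = Counter(
        row["repository_name"]
        for row in events
        if row["type"] == "PushEvent"
        and row["repository_language"] == language
        and parse_utc_usec(row["created_at"]) >= cutoff
    )
    return pushes.most_common(limit)
```

Calling `top_repositories(EVENTS, "Ruby", "2012-04-01 00:00:00", 100)` mirrors the WHERE, GROUP BY, ORDER BY and LIMIT clauses on the sample rows.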
Time to start testing these queries
Do we have a minute?
Photo: Alex Lomix
Much more flexible than SQL
Multi-valued attributes
lived_in: [
  { city: 'La Laguna', since: '19752903' },
  { city: 'Madrid', since: '20010101' },
  { city: 'Cologne', since: '20130401' }
]
Correlation and nth percentile
SELECT CORR(temperature, number_of_people)
Data manipulation: dates, urls, regex, IP...
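As a reminder of what a correlation aggregate such as CORR computes, here is a small sketch of the Pearson correlation coefficient in plain Python. It assumes that CORR returns the Pearson coefficient; the function name `corr` and the sample values in the usage note are made up for illustration.

```python
import math


def corr(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

For example, `corr([18, 22, 27, 31], [120, 150, 210, 260])` on invented temperature and attendance figures gives a value close to 1, meaning the two series rise together.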
What are the top cities contributing modifications to Wikipedia?
SELECT COUNT(*) c, city, countryLabel, NTH(1, latitude) lat, NTH(1, longitude) lng
FROM (
  SELECT
    INTEGER(PARSE_IP(contributor_ip)) AS clientIpNum,
    INTEGER(PARSE_IP(contributor_ip)/(256*256)) AS classB
  FROM [publicdata:samples.wikipedia]
  WHERE contributor_ip IS NOT NULL
) AS a
JOIN EACH [fh-bigquery:geocode.geolite_city_bq_b2b] AS b
ON a.classB = b.classB
WHERE a.clientIpNum BETWEEN b.startIpNum AND b.endIpNum AND city != ''
GROUP BY city, countryLabel
ORDER BY 1 DESC
Source: Geoip geolocation with Google BigQuery
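The trick in this query is the `classB` column: dividing the numeric IP by 256*256 yields a coarse bucket that can be joined with plain equality, and the exact BETWEEN range check then runs only within that bucket. A minimal sketch of the idea, assuming invented ranges on reserved documentation addresses (the `RANGES` data, the city assignments and all function names are hypothetical, and the layout of the real geolocation table is not shown in the slides):

```python
import ipaddress
from collections import defaultdict

# Invented ranges on reserved documentation networks; the cities are placeholders.
RANGES = [
    {"start": "192.0.2.0", "end": "192.0.2.255", "city": "Madrid"},
    {"start": "198.51.100.0", "end": "198.51.100.255", "city": "Cologne"},
]


def ip_to_int(text):
    """Numeric value of a dotted IPv4 address, like PARSE_IP in the query."""
    return int(ipaddress.IPv4Address(text))


def class_b(ip_num):
    """Coarse bucket: the first two octets of the address."""
    return ip_num // (256 * 256)


def build_index(ranges):
    """Group ranges by every class B bucket they touch (the equality join key)."""
    index = defaultdict(list)
    for item in ranges:
        start, end = ip_to_int(item["start"]), ip_to_int(item["end"])
        for bucket in range(class_b(start), class_b(end) + 1):
            index[bucket].append((start, end, item["city"]))
    return index


def locate(ip_text, index):
    """Look up the bucket first, then apply the exact BETWEEN-style range check."""
    ip_num = ip_to_int(ip_text)
    for start, end, city in index.get(class_b(ip_num), []):
        if start <= ip_num <= end:
            return city
    return None
```

With `index = build_index(RANGES)`, `locate("192.0.2.17", index)` only inspects the ranges sharing that address's bucket instead of scanning the whole table.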
Cost of BigQuery
Loading data: Free
Exporting data: Free
Storage: $80 per TB/month
Interactive queries: $35 per TB processed
Batch queries: $20 per TB processed
Not for dashboards: if you need to run the same query frequently, it's more cost-effective to use MapReduce or SQL.
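The dashboard warning becomes obvious with a little arithmetic. The sketch below uses the prices listed above; the function name, the data volumes and the hourly refresh rate are assumptions chosen only to illustrate the gap between a one-off question and a query that runs all month.

```python
# Prices from the slide, in US dollars.
STORAGE_PER_TB_MONTH = 80.0
INTERACTIVE_PER_TB = 35.0
BATCH_PER_TB = 20.0


def monthly_cost(stored_tb, tb_per_query, queries_per_month, batch=False):
    """Storage plus query charges for one month, given the TB each query processes."""
    rate = BATCH_PER_TB if batch else INTERACTIVE_PER_TB
    storage = stored_tb * STORAGE_PER_TB_MONTH
    queries = tb_per_query * queries_per_month * rate
    return storage + queries


# Assumed scenario: 1 TB stored, each query processes 0.5 TB.
one_off = monthly_cost(stored_tb=1.0, tb_per_query=0.5, queries_per_month=1)
# Assumed dashboard: the same query refreshed every hour for 30 days.
dashboard = monthly_cost(stored_tb=1.0, tb_per_query=0.5, queries_per_month=24 * 30)
```

Under these assumptions the one-off question costs under $100 for the month including storage, while the hourly dashboard runs into five figures, which is the point the slide makes.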
Questions?
Visit http://bigqueri.es