A Language for Grid File System

DGL: The Assembly Language for
Grid Computing
Arun Swaran Jagatheesan
[email protected]
San Diego Supercomputer Center (SDSC)
University of California, San Diego
GriPhyN All Hands Meeting
May 17, 2004, University of Chicago
National Partnership for Advanced Computational Infrastructure
University of Florida
San Diego Supercomputer Center
Acknowledgement
• Participants
• Jonathan Weinberg
• Allen Ding
• Dipti Borkar
• Erik Vandekieft
• Reena Mathew
• Marcio Faerman (SCEC)
• Lucas Gilbert (BIRN)
Also an out-sourced resource from
the Gator’s Physics department –
Thanks to Paul Avery for this
important resource
• Good-will Wishers
• Reagan Moore and SDSC SRB Team
• Kim Baldridge
• You !!!
Grid Physics Network (GriPhyN)
University of Florida
Talk Outline
• Problem: Gridflow Description and Querying
• Gridflow Description
• Gridflow Language Requirements
• Options
• Path we took
• Our success
• Our future
• Summary
SRB Data Grid Management Systems
• Southern California Earthquake Center
• NASA Data Grids
• NIH Biomedical Informatics Research Network
• National Science Digital Library
• Scripps Institution of Oceanography
This work is generic and not restricted to SRB alone
Gridflow in SCEC
(data → information pipeline)
• Metadata derivation
• Ingest Data
• Ingest Metadata
• Determine analysis pipeline
• Initiate automated analysis
• Organize result data into distributed data grid collections
Pipeline could be triggered by input at the data source or by a data request from a user
Use the optimal set of resources based on the task, on demand
All gridflow activities stored for data flow provenance
Data → Discovery
New data
Digital entities
Updates relationships among data in collections
Meta-data
Services invoked to analyze new relationships
Services
DGMS applications get notified of state updates
State
What do they want?
We know the business (scientific) process
Cyberinfrastructure is all we care about (why bother about colliding atoms?)
What do they want?
Use DGL to describe your process logic with abstract references to datagrid infrastructure dependencies
Describe resource, site, VO or grid policy dependencies independently (UPL, CVF??)
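The separation described on this slide can be sketched in a few lines of Python. This is only an illustration, not DGL syntax: the dictionary layout, the `bind` helper, and the resource names (`fast-cache`, `disk-pool-a`, and so on) are all hypothetical. The process logic refers to abstract resource names, and a separately maintained mapping supplies the concrete ones.

```python
def bind(steps, bindings):
    """Return concrete steps with each abstract resource name resolved.

    `steps` is the process logic; `bindings` is maintained independently
    (for example per site or per VO). Both layouts are hypothetical.
    """
    concrete = []
    for step in steps:
        resource = step["resource"]
        if resource not in bindings:
            raise KeyError(f"no binding for abstract resource {resource!r}")
        # Copy the step so the abstract description stays untouched.
        concrete.append({**step, "resource": bindings[resource]})
    return concrete


# Process logic with abstract references only (hypothetical names).
abstract_flow = [
    {"operation": "ingest", "resource": "fast-cache"},
    {"operation": "replicate", "resource": "archive"},
]

# Infrastructure dependencies described independently (hypothetical names).
site_bindings = {"fast-cache": "disk-pool-a", "archive": "tape-silo-b"}

concrete_flow = bind(abstract_flow, site_bindings)
```

Swapping `site_bindings` for another site's mapping changes where the flow runs without touching `abstract_flow`.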
Gridflows
• A Grid Workflow (Gridflow) is the automation of an execution pipeline in which data or tasks are processed through multiple autonomous grid resources according to a set of procedural rules
• Gridflows are executed on resources that are dynamically obtained through the confluence of one or more autonomous administrative domains (peers)
Gridflow Language and CS Domains
• Compiler Design
• Variable scope definition, Recursive Grammar, Execution Stack Management
• Data Modeling
• Schema definitions for gridflow patterns
• Grid Computing
• Data Grid data types, Virtual Organization, basic operations, …
• Other concepts and Standards
• Rules, W3C XQuery, GGF JSDL?
Gridflow Language Requirements
• High level Abstract descriptions
• Abstract description of cyberinfrastructure dependencies
• Simple yet flexible
• Flexible enough to describe complex requirements (no brute force)
• Gridflow dependency patterns
• Based on execution structure and data semantics
• (Parallel, Sequential, fork-new), (milestones, for-each, switch-case), ...
• Asynchronous execution
• For long-running requests
• Querying using an existing standard
• XQuery
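A few of the dependency patterns named above (sequential, parallel, for-each, switch-case) can be sketched as composable Python functions. This is a minimal illustration under assumptions, not how DGL expresses them: every function name here is hypothetical, a step is assumed to be a callable that takes a context dictionary, and threads stand in for distributed grid resources.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential(*steps):
    """Run steps one after another; results are returned in step order."""
    def run(ctx):
        return [step(ctx) for step in steps]
    return run


def parallel(*steps):
    """Run steps concurrently; results are still returned in step order."""
    def run(ctx):
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(step, ctx) for step in steps]
            return [future.result() for future in futures]
    return run


def for_each(key, step):
    """Run step once per element of ctx[key], exposed to the step as ctx['item']."""
    def run(ctx):
        return [step({**ctx, "item": item}) for item in ctx[key]]
    return run


def switch_case(selector, cases, default=None):
    """Run the branch chosen by selector(ctx); fall back to default if given."""
    def run(ctx):
        branch = cases.get(selector(ctx), default)
        return branch(ctx) if branch is not None else None
    return run


# Hypothetical steps used only for this example.
def ingest(ctx):
    return f"ingest {ctx['item']}"


def checksum(ctx):
    return f"checksum {ctx['item']}"


flow = for_each("files", parallel(ingest, checksum))
result = flow({"files": ["a.dat", "b.dat"]})
```

Because each pattern returns a callable with the same shape as a step, patterns nest freely, which is one way to read "simple yet flexible".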
Gridflow Language Requirements
• Process metadata and annotations
• Runtime definition, update and querying of metadata
• Runtime Management of Gridflows
• Stop a gridflow at run time
• Partitioning
• Facility in the language to divide a gridflow request into multiple requests
• Import descriptions
• Refer to other gridflows in execution
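As a rough illustration of the partitioning requirement, the sketch below divides one request into several smaller ones. It assumes the steps are independent of each other (as in a parallel flow); the `partition` function and the round-robin strategy are hypothetical choices, not something the slides specify.

```python
def partition(steps, parts):
    """Split a list of independent steps into at most `parts` sub-requests.

    Steps are dealt out round-robin; empty sub-requests are dropped.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    chunks = [steps[i::parts] for i in range(parts)]
    return [chunk for chunk in chunks if chunk]


sub_requests = partition(["copy-1", "copy-2", "copy-3", "copy-4", "copy-5"], 2)
```

Steps with ordering dependencies would need a partitioner that respects those dependencies; this sketch does not attempt that.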
Data Grid Language (DGL)
• DGL is just a language specification
• Can be used in any commercial or academic data grid software
• DGL describes gridflows and their dependencies
Gridflow Process I
End User using DGBuilder
Gridflow Description
Data Grid Language
Gridflow Process II
Abstract Gridflow using Data Grid Language
Planner
Concrete Gridflow using Data Grid Language
Gridflow Process III
Gridflow Processor
Concrete Gridflow using Data Grid Language
Gridflow P2P Network
DGL - Hypothetical Picture
DGL Compiler node? (at run time, late binding)
• SRB Operation
• GridFTP Operation
• Condor execution DAG
• TeraGrid Scheduler
• Capone?
• …
DGL Structure (data model)
Flow Logic Structure
Structure: parallel, sequential, etc.
Pre-Process
ECA rule-based definitions
Runnable
Recursive definition of runnables as either a data operation or an executable process (Job)
Post-Process
Meta-data
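The structure on this slide can be mirrored in a small Python data model. This is a sketch under assumptions, not the DGL schema: the class and field names are hypothetical, ECA rules are assumed to attach to both the pre-process and post-process hooks, and a flow's children are assumed to be either nested flows or leaf steps, which is one way to read the "recursive definition of runnables".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class Rule:
    """ECA rule: on `event`, if `condition(metadata)` holds, take `action`."""
    event: str
    condition: Callable[[Dict[str, str]], bool]
    action: str


@dataclass
class Step:
    """Leaf runnable: a data operation or an executable process (job)."""
    name: str
    kind: str  # "operation" or "job"


@dataclass
class Flow:
    """A flow has a logic structure and children that are flows or steps."""
    logic: str  # e.g. "sequential" or "parallel"
    children: List[Union["Flow", Step]] = field(default_factory=list)
    pre_process: List[Rule] = field(default_factory=list)
    post_process: List[Rule] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)


def leaf_steps(node):
    """Walk the recursive structure and list leaf step names in order."""
    if isinstance(node, Step):
        return [node.name]
    names = []
    for child in node.children:
        names.extend(leaf_steps(child))
    return names


def fired_actions(rules, event, metadata):
    """Return the actions of rules matching `event` whose condition holds."""
    return [r.action for r in rules if r.event == event and r.condition(metadata)]


pipeline = Flow(
    logic="sequential",
    metadata={"owner": "scec"},
    pre_process=[Rule("start", lambda m: m.get("owner") == "scec", "log-start")],
    children=[
        Step("ingest", "operation"),
        Flow(
            logic="parallel",
            children=[Step("analyze", "job"), Step("replicate", "operation")],
        ),
    ],
)
```

Here `leaf_steps(pipeline)` flattens the nested flows, and `fired_actions(pipeline.pre_process, "start", pipeline.metadata)` evaluates the pre-process rules against the flow's metadata.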
Operations in DGL
• Execute Process (DAG, java, WSDL, etc)
• Very generic Datagrid operations
• Copy directories/files
• Change Permissions (Chmod)
• Create directory/file/archive
• Delete directory/file/archive
• Ingest/download URL or any data source
• Replicate, Rename, List
• SeekNWrite, SeekNRead
• Ingest, Query Any type of Metadata
Components of DGL
• DGL document is either a request or a response
• Data Grid Request
• Could be a Flow (aggregation of operations)
• Or could be a Status Query
• Data Grid Response
• Could be a Flow Acknowledgement
• Or could be a Status Response
• Can be made Synchronous or Asynchronous
• Flexibility for any type of Implementation
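The request/response pairing above can be sketched as follows. This is a toy model, not the DGL document format: the class names, the `ToyProcessor`, the `req-N` identifiers, and the `accepted`/`unknown` states are all hypothetical. It shows the asynchronous style, where a flow request is acknowledged immediately and its status is queried later.

```python
import itertools
from dataclasses import dataclass
from typing import List


@dataclass
class FlowRequest:
    """Data grid request carrying a flow (an aggregation of operations)."""
    operations: List[str]


@dataclass
class StatusQuery:
    """Data grid request asking about an earlier flow."""
    request_id: str


@dataclass
class FlowAcknowledgement:
    """Data grid response confirming that a flow was accepted."""
    request_id: str


@dataclass
class StatusResponse:
    """Data grid response reporting the state of a flow."""
    request_id: str
    state: str


class ToyProcessor:
    """In-memory stand-in for a gridflow processor (hypothetical)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._state = {}

    def handle(self, request):
        if isinstance(request, FlowRequest):
            request_id = f"req-{next(self._ids)}"
            self._state[request_id] = "accepted"
            return FlowAcknowledgement(request_id)
        if isinstance(request, StatusQuery):
            state = self._state.get(request.request_id, "unknown")
            return StatusResponse(request.request_id, state)
        raise TypeError(f"unsupported request type: {type(request).__name__}")


processor = ToyProcessor()
ack = processor.handle(FlowRequest(["copy", "replicate"]))
status = processor.handle(StatusQuery(ack.request_id))
```

A synchronous implementation could instead return the final status directly from the flow request; the document types stay the same either way.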
Summary
• A standard description language is needed
• Requirements of the language
• Data Grid Language (DGL)
• Recursive definition of flows and steps
• Metadata or variable scopes
• Rules
• Can be partitioned (sub-divided)
• Components of Data Grid Language
• Next step: Talk to Scheduling or Heuristics people