Data Format Description Language (DFDL)

Data Format Description Language (DFDL) WG
Martin Westhead
EPCC, University of Edinburgh
[email protected]
Alan Chappell
PNNL
[email protected]
Agenda
• Introduction and welcome - Martin Westhead 10mins
• Binary Format Description Language (BFD) - Alan Chappell 10mins
• Binary XML (BinX) - Stephen Rutherford 10mins
• DFDL - Martin Westhead 15mins
– Big picture
– Structural Description Language
– Charter
(20 mins Discussion)
• Examples repository - Alan Chappell 10mins
– Bruce Barkstrom Examples at NASA
(15mins Discussion)
Motivation
• There will never be a standard data format
– E.g. XML: verbose, tree-based, explicit structure
– Legacy formats
– Application specific formats
– One size will never fit all
• But could we provide a language for describing formats?
– Transparency of physical representation
– Automatic format conversion
– Unambiguous description of data
There’s more…
Explicit structure enables:
• Standard transformation to/from XML representation
– Could allow application to read/write XML
– But provide underlying efficient binary representation
• Data stream/file becomes database
– Point to parts of the structure
– Extract parts of the structure
– Modify parts of the structure
– Integrate parts of different structures
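A minimal sketch of the XML bullet above, assuming a record of three little-endian 32-bit floats. The element names `record` and `float` and both helper functions are illustrative only; they are not taken from BinX, BFD, or any DFDL draft.

```python
import struct
import xml.etree.ElementTree as ET

# Assumed layout (illustrative): three little-endian 32-bit floats.
RECORD_FORMAT = "<3f"


def binary_to_xml(payload: bytes) -> str:
    """Render the packed floats as a small XML document."""
    values = struct.unpack(RECORD_FORMAT, payload)
    root = ET.Element("record")
    for value in values:
        ET.SubElement(root, "float").text = repr(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_binary(document: str) -> bytes:
    """Pack the XML view back into the compact binary form."""
    root = ET.fromstring(document)
    values = [float(child.text) for child in root.findall("float")]
    return struct.pack(RECORD_FORMAT, *values)


payload = struct.pack(RECORD_FORMAT, 1.0, 2.5, -4.0)
document = binary_to_xml(payload)
```

The application sees `document`, while storage and transport keep the 12-byte `payload`; `xml_to_binary(document)` recovers the original bytes.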
And more…
• Generic tools possible
– Browsing
– Conversion and transformation
• Annotation of data
– E.g. identify bits that depict hurricane in an image
• Enables general semantic labels; many ontologies could be developed, e.g.:
– S.I. units, SQL types, Time
– Community specific labels, “starClass = whiteDwarf”
– Application specific labels, “nodeColour = green”
• Could lead to a standard transformation language
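One way to picture annotation is as labels attached to byte ranges of the data. The sketch below assumes that model; the `Annotation` class and the `select` helper are hypothetical names, and the label keys are borrowed from the examples on this slide.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A labeled byte range; labels are free-form key/value pairs."""
    offset: int
    length: int
    labels: dict = field(default_factory=dict)


def select(data: bytes, annotations: list, key: str, value: str) -> list:
    """Return the byte ranges whose labels contain key == value."""
    return [
        data[a.offset:a.offset + a.length]
        for a in annotations
        if a.labels.get(key) == value
    ]


data = bytes(range(16))
annotations = [
    Annotation(0, 4, {"starClass": "whiteDwarf"}),
    Annotation(4, 8, {"nodeColour": "green"}),
]
white_dwarfs = select(data, annotations, "starClass", "whiteDwarf")
```

Because the labels are plain key/value pairs, a generic browsing tool could filter on them without knowing what any particular label means.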
Not fairy tales
• Based on implemented work
– BinX (http://www.epcc.ed.ac.uk/gridserve/WP5/Binx/)
– BFD, part of the Scientific Annotation Middleware project (http://www.scidac.org/SAM/)
• Generalized and extended a little
• Formal semantics
• Foundation for extensibility
Approach
• Separate out structure and semantics
• General structural language
– Repetition
– Pointers
– References to data
– New structures can be built (compositionality)
• Semantics
– Hard to express so…we don’t
– General labeling
– Label semantics defined elsewhere (ontologies)
– Labels can be added (extensibility)
Structure – arbitrary labels
[Figure: a tree of arbitrary labels (bunchThings, fooSet, fooPair, foo) layered over a sequence of thing values: 0 1 1 0 0 1 1 1 ...]
Structure – example labels
[Figure: a tree of example labels (Array, complex, float, byte) layered over a sequence of bit values: 0 1 1 0 0 1 1 1 ...]
Structural language
• Formal semantics
– Structured binary sequence
– Defines hierarchical structure over underlying sequence of binary values
• Language for describing hierarchical structure
– Repetition
  • Explicit number of repeats
  • Termination characters
– Data reference
  • Conditionals
  • Data size
– Pointers
• Scope
– As general as possible but
– Must be concise and implementable
• Draft language definition on web page (www.epcc.ed.ac.uk/dfdl)
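The constructs listed above can be sketched as three small readers over a byte buffer. This is an illustration under stated assumptions, not the draft language: the field widths (1-byte count, big-endian 16-bit values, 1-byte absolute offset) and all function names are invented for the example.

```python
import struct


def read_counted(buf: bytes, pos: int):
    """Data reference: a 1-byte count says how many uint16 values follow."""
    count = buf[pos]
    pos += 1
    values = list(struct.unpack_from(f">{count}H", buf, pos))
    return values, pos + 2 * count


def read_terminated(buf: bytes, pos: int, terminator: int = 0):
    """Termination character: read bytes up to (not including) the terminator."""
    end = buf.index(terminator, pos)
    return buf[pos:end], end + 1


def read_pointer(buf: bytes, pos: int):
    """Pointer: a 1-byte absolute offset to a single byte elsewhere in buf."""
    target = buf[pos]
    return buf[target], pos + 1


# count=2, two uint16 values, a zero-terminated string, then a pointer to offset 1
buf = bytes([2]) + struct.pack(">2H", 258, 7) + b"hi\x00" + bytes([1])
values, pos = read_counted(buf, 0)
name, pos = read_terminated(buf, pos)
pointee, pos = read_pointer(buf, pos)
```

Each reader returns the decoded value and the next position, so the readers compose in sequence in the same way that structural descriptions compose.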
CSV file example
char:=byte
data:=[(char - [',']).*]
field:=[data; [',']]
finalField:=[data; ['\n']]
row:=[field.*] :: [finalField]
table:=[row.*]
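A hand-written reader that follows the same rules might look like the sketch below. It is not generated from the description, and it makes one assumption the grammar leaves implicit: `data` stops at a newline as well as at a comma. The function name `parse_table` is illustrative.

```python
def parse_table(text: str) -> list:
    """Follow the grammar: table is row*, row is field* then finalField."""
    table, row, data = [], [], []
    for char in text:
        if char == ",":        # field := data followed by ','
            row.append("".join(data))
            data = []
        elif char == "\n":     # finalField := data followed by '\n'
            row.append("".join(data))
            table.append(row)
            row, data = [], []
        else:                  # data := any char other than a delimiter (assumed)
            data.append(char)
    if data or row:
        raise ValueError("input does not end with a complete row")
    return table


table = parse_table("a,b,c\n1,2,3\n")
```

Here `table` holds one list of strings per row, with the delimiters consumed by the structure and absent from the values.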
Semantic labels
• Many ontologies possible
• Initial scope probably:
– Basic types (floating point, integer, character)
– Simple structures (structs, arrays, tables)
• Obvious extensions:
– SQL types
– XML Schema types
• Key WG goal:
– Define form and requirements of new ontologies
What is an Ontology?
• XML Schema for new types
• Structural description of new types
• Definition of core API behaviour on new type
• API extensions
• Relationships to other types
WG goals
• Formal language for DFDL data structure
• Standard representation of this language in XML
• Requirements for DFDL ontology
• Basic types ontology
• Basic structures ontology
Currently under discussion
• Abstraction from the underlying binary
– Compression, encoding, encryption
– Physical vs. conceptual binary sequence
• Abstraction of description
– complex:=[foo; foo]
– Instantiate “foo:= float” or “foo:= double” at use time
• Filtering of results
– Getting to the data model and leaving the format behind
– CSV -> [[value; value; value]; [value; value; value]]
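The abstraction-of-description idea can be sketched by leaving `foo` unbound until the description is used. The mapping from the type names to Python `struct` codes, the little-endian byte order, and the helper names are assumptions made for this example.

```python
import struct

# Assumed mapping from the slide's type names to struct codes.
TYPE_CODES = {"float": "f", "double": "d"}


def complex_format(foo: str) -> str:
    """complex := [foo; foo], with foo bound only when the format is used."""
    code = TYPE_CODES[foo]
    return "<" + code + code


def read_complex(payload: bytes, foo: str) -> complex:
    """Decode one complex value using the chosen binding for foo."""
    real, imag = struct.unpack(complex_format(foo), payload)
    return complex(real, imag)


single = read_complex(struct.pack("<ff", 1.5, -2.0), "float")
double = read_complex(struct.pack("<dd", 1.5, -2.0), "double")
```

The same description serves both an 8-byte and a 16-byte physical layout; only the binding of `foo` changes.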
DFDL in the VO
• Generic tools
• Metadata possibilities
– Ontologies can define relationships between types
– E.g. polar to Cartesian
– Standard classes over data objects
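As a rough illustration of a relationship between types, the polar/Cartesian example could be recorded as a pair of conversion functions in a lookup table. The `Polar` and `Cartesian` classes, the `CONVERSIONS` table, and `convert` are hypothetical names, and angles are assumed to be in radians.

```python
import math
from typing import NamedTuple


class Polar(NamedTuple):
    r: float
    theta: float  # radians (assumed)


class Cartesian(NamedTuple):
    x: float
    y: float


def polar_to_cartesian(p: Polar) -> Cartesian:
    return Cartesian(p.r * math.cos(p.theta), p.r * math.sin(p.theta))


def cartesian_to_polar(c: Cartesian) -> Polar:
    return Polar(math.hypot(c.x, c.y), math.atan2(c.y, c.x))


# Relationships between types, keyed by (source type, target type).
CONVERSIONS = {
    (Polar, Cartesian): polar_to_cartesian,
    (Cartesian, Polar): cartesian_to_polar,
}


def convert(value, target):
    """Look up and apply the registered relationship, if one exists."""
    return CONVERSIONS[(type(value), target)](value)


point = convert(Polar(2.0, math.pi / 2), Cartesian)
```

A generic tool would only need the table of relationships, not built-in knowledge of either coordinate system.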
Getting involved
• Webpages:
http://www.epcc.ed.ac.uk/dfdl
• Mailing list ([email protected])
• My address:
[email protected]