Research Data Exchange (RDE) and Safety Pilot Model - FOT-Net

FOT-Net Final Event
First FOT-Net Data Workshop
Amsterdam, 18-19 March 2014
Research Data Exchange (RDE) and
Safety Pilot Model Deployment Data
Dale Thompson
USDOT / ITS Joint Program Office
Outline
 The Research Data Exchange (RDE)
□ Mission
□ Structure
□ Statistics and Usage
 Featured Data Environment: Safety Pilot Model Deployment (SPMD)
□ Overview of the SPMD
□ Hosting the SPMD Data
 Future Data Environment: SHRP2 Naturalistic Driving Study (NDS)
□ Overview of SHRP2 NDS
□ Anticipated Use Cases and Challenges of NDS Data
U.S. Department of Transportation
ITS Joint Program Office
The RDE: a central transportation data repository for
researchers and application developers
 The Data Capture and Management
program’s mission is to provide a
variety of data-related services that
support the development, testing, and
demonstration of multi-modal
transportation mobility applications
 The RDE is a transportation data
sharing system that promotes sharing
of archived and real-time data from
multiple sources and multiple modes
 The RDE provides the ability for users
to download data and appropriate
documentation, create research
projects and collaborate with other
users, and comment on data sets
Source: http://its-rde.net/
The RDE employs the concept of a Data Environment
to structure the various data sets
 RDE organizes data using a data
environment / data set / data file
hierarchy
 A Data Environment is a collection of
data sets which were obtained under
the same test / experiment
 Data Sets represent a logical
arrangement of files that convey a
central concept or idea about an
aspect of a data collection exercise
 Data sets contain Data Files that are
archived collections of data (elements)
and can be text, zip, binary, or other
file types
Source: http://its-rde.net/
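The data environment / data set / data file hierarchy described above can be pictured as three nested record types. The following is only a minimal sketch of that idea: the class names, fields, and example values are hypothetical and are not taken from the RDE's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    """An archived collection of data elements (text, zip, binary, ...)."""
    name: str
    file_type: str
    size_bytes: int


@dataclass
class DataSet:
    """A logical arrangement of files about one aspect of a collection exercise."""
    name: str
    files: List[DataFile] = field(default_factory=list)


@dataclass
class DataEnvironment:
    """A collection of data sets obtained under the same test or experiment."""
    name: str
    data_sets: List[DataSet] = field(default_factory=list)

    def file_count(self) -> int:
        # Count the files across every data set in this environment.
        return sum(len(ds.files) for ds in self.data_sets)

    def total_size_bytes(self) -> int:
        # Sum the sizes of all files across every data set.
        return sum(f.size_bytes for ds in self.data_sets for f in ds.files)


# Hypothetical example values, for illustration only.
example_env = DataEnvironment(
    name="Example Environment",
    data_sets=[
        DataSet("Example Set A", [DataFile("a1.csv", "text", 1_000),
                                  DataFile("a2.zip", "zip", 5_000)]),
        DataSet("Example Set B", [DataFile("b1.bin", "binary", 2_500)]),
    ],
)
```

Keeping the three levels as separate types mirrors the slide's definitions: an environment groups sets from one test, and a set groups the files for one aspect of it.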
The RDE currently houses 11 Data Environments with
data from different locations throughout the US
 States from which data have been
collected include California, Florida,
Michigan, Minnesota, Oregon,
Virginia, and Washington
 The number of data sets that are a
part of these 11 environments ranges
from 2 to 37
 And the number of data files per data
set ranges from a few to well over 100
Source: http://its-rde.net/
The Safety Pilot Model Deployment is a recently added
data environment to the RDE
 SPMD is an exploration of the real-world effectiveness of connected
vehicle safety applications in multimodal driving conditions
 A one-day sample of SPMD data is
captured in this environment
□ This provides users with a snapshot
of the output from the
implementation of connected
vehicle technology
 This environment contains
□ 5 data sets
□ mobility data elements collected
from approximately 3000 vehicles
□ weather and infrastructure related
data elements
Source: USDOT
Hyper-accurate, hyper-frequent data posed a series of
challenges in uploading the SPMD data
 Some of the challenges faced in
making the SPMD data publicly
available include:
□ Data governance
□ Distribution rights
□ Personally identifiable information
□ Size of the data sets and data files
 Understanding the data governance
structure amongst involved entities is
integral to acquiring data for
distribution
 Two of the more direct constraints
on data distribution are the inclusion
of data that may compromise the
initial goal of the exercise, and the
presence of PII
Source: http://www.safetypilot.us/
PII had to be removed from the SPMD data while
maintaining meaningfulness of the data
 To protect participants’ identity the
RDE team rid all data files of data
elements that contain PII
 Data elements that could be paired
with other publicly available data were
also deleted
 Vehicle trajectories, with points
collected at 10 Hz, revealed the
identity of participants, therefore
□ Sanitization algorithms were
developed to truncate trajectories to
mask trip origins and destinations
□ The algorithms were also applied to
dependent / related data elements
Figure: map of complete trajectories compared with truncated trajectories (Map Source: Google Maps)
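The slide does not describe the sanitization algorithms themselves, so the following is only an illustrative sketch of one way to truncate a trajectory. It assumes a fixed trim distance measured along the path from each end of the trip; the function names, the `trim_m` parameter, and the sample coordinates are all hypothetical.

```python
import math
from typing import List, Tuple

# (timestamp_s, latitude_deg, longitude_deg)
Point = Tuple[float, float, float]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6_371_000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def truncate_trajectory(points: List[Point], trim_m: float) -> List[Point]:
    """Drop every point within trim_m of path distance from either end of the trip.

    Trips shorter than 2 * trim_m are dropped entirely, because no point on them
    is far enough from both the origin and the destination.
    """
    if len(points) < 2:
        return []
    # Cumulative path distance from the first point to each point.
    cum = [0.0]
    for (_, la1, lo1), (_, la2, lo2) in zip(points, points[1:]):
        cum.append(cum[-1] + haversine_m(la1, lo1, la2, lo2))
    total = cum[-1]
    return [p for p, d in zip(points, cum) if trim_m <= d <= total - trim_m]


# Hypothetical 10 Hz trip heading due north (coordinates are made up).
trip = [(i * 0.1, 42.0 + i * 0.00001, -83.0) for i in range(1001)]
kept = truncate_trajectory(trip, trim_m=250.0)
```

A fixed trim distance is the simplest choice, but it is only a starting point. A production approach would also need to handle the dependent data elements mentioned above, and it would need validation that origins and destinations cannot be recovered.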
Connected vehicle data is an emerging area, subject to
“Big Data” opportunities and challenges
 The SPMD data environment was
structured in 5 data sets, with a total
sanitized volume of approximately 24
GB (largest file ~10 GB) for a 24-hour
period
 The original un-sanitized data set was
approximately 50 GB
 The challenge with working with such
large data sets is two-fold
□ Extracting and sanitizing the data is
computationally expensive
□ (Large) files had to be carefully
broken into more manageable
segments for easy download
Source: http://www.safetypilot.us/
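As an illustration of breaking a large file into more manageable segments, the sketch below splits a line-oriented file at line boundaries so that no record is cut in half. This is an assumption about what "carefully" might involve, not the RDE team's actual procedure; the function name, the `max_bytes` parameter, and the `.partNNNN` naming scheme are hypothetical.

```python
from pathlib import Path
from typing import List


def split_by_lines(src: Path, out_dir: Path, max_bytes: int) -> List[Path]:
    """Split a line-oriented file into segments of at most max_bytes each.

    Cuts happen only at line boundaries. A single line longer than max_bytes
    is written to its own segment rather than being split.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    segments: List[Path] = []
    out = None
    written = 0
    try:
        with src.open("rb") as f:
            for line in f:
                # Start a new segment for the first line, or when adding this
                # line would push a non-empty segment past the size limit.
                if out is None or (written > 0 and written + len(line) > max_bytes):
                    if out is not None:
                        out.close()
                    seg = out_dir / f"{src.name}.part{len(segments):04d}"
                    segments.append(seg)
                    out = seg.open("wb")
                    written = 0
                out.write(line)
                written += len(line)
    finally:
        if out is not None:
            out.close()
    return segments
```

Concatenating the returned segments in order reproduces the original file byte for byte, so a user who downloads every part can rebuild the full data file.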
The RDE team will continue to post additional data sets
while leveraging efforts of similar data sharing entities
 Data sets being pursued for RDE
hosting include data from:
□ Dynamic Mobility Applications
□ Applications for the Environment:
Real Time Information Synthesis
(AERIS)
□ Road Weather Management
Program
 The RDE team is looking to partner
with the following entities, not only to
share data but also to share strategies
and insights on distributing data:
□ FOT-Net Data
□ Research Data Alliance
The RDE team will be adding the SHRP2 Naturalistic
Driving Study (NDS) data in the coming months
 Designed to investigate ordinary
driving under real-world conditions,
with the aim of learning about driver
decisions
 Wide-ranging demographics among the
study’s 3100 participants
 Two-year timeframe for extensive data
collection
 Widespread geography of test sites
around the US: Tampa, FL;
Bloomington, IN; Durham, NC;
Buffalo, NY; State College, PA;
Seattle, WA.
Source: https://insight.shrp2nds.us/docs/shrp2_background.pdf
There is a wide variety of data available from the study
 Driver Assessment Data: visual perception, medical history, reaction time, driving
knowledge, etc.
 Vehicle Data: vehicle make and model, and how vehicle is equipped (with sensors,
for example)
 Driving Data: Video images from various perspectives in vehicle, vehicle kinematics,
and others such as seat belt use, steering wheel angle, alcohol presence, radar to
identify external near field objects
 Crash Data: interview Q&As, police crash reports
 Roadway Data: roadway geometry, speed limit signs, intersection location and
characteristics, etc. (these data are obtained in an effort separate from the collection
of the driving data)
In making NDS data accessible via the RDE, the
procedure followed will be informed by that of SPMD
 The challenges faced when
distributing SPMD data will also
arise when distributing
NDS data
 These challenges include:
□ Data governance
□ Distribution rights
□ Personally identifiable information
□ Size of the data sets and data files
 The RDE team will employ lessons learned
from posting the SPMD data to the
RDE, while being cognizant of the
nuances of the NDS data that will
lead to changes in the developed
approach
RDE Policy Issues
 The RDE is a public-facing research resource that hosts large volumes of potentially
sensitive data from multiple sources
 It required development of policies and procedures in a number of areas typical of
other websites:
□ Authorities and membership management
□ Accessibility
□ Terms of use
 To create the RDE, the team also confronted a range of unique policy issues in
these areas:
□ Data ownership
□ Data security
□ Data privacy
Data Ownership
 Issue: The RDE may host data that has been provided by different sources:
□ Federal contractors
□ State and other public agencies
□ Universities
□ Private individuals or businesses
 Relevant RDE Goal: Foster and support research in transportation operations by a
wide variety of stakeholders
 Challenge: Balance rights of various providers (with different institutional structures
and needs) against needs for wide access and use
 Response:
□ Sign agreements with each data contributor
□ Offer RDE content to the public under an open source license that requires attribution
(Creative Commons Attribution-ShareAlike 3.0 Unported)
Data Security
 Issue: The RDE contains several terabytes of data, scaling up to petabytes.
 Relevant RDE Goals:
□ Offer reliable and cost-effective access to huge data sets
□ Comply with Federal Information Security Management Act (FISMA)
 Challenge: Develop a business model for on-site Departmental hosting or certify
external server host
 Response:
□ Launch version 1.0 on website contractor servers
▪ Enforce security training
▪ (Insert additional info on IndraSoft certification here)
□ Transition to FedRAMP-certified cloud-based host (Amazon Web Services or
similar)
Data Privacy
 Issue: The RDE contains GPS traces from vehicles.
 Relevant RDE Goals:
□ Provide maximum research value from available data
□ Protect identity of vehicle users
 Challenge: Develop an approach to reliably de-identify GPS traces
 Response:
□ Launch with GPS traces only from public agency vehicles on agency business
□ Develop processes for:
▪ GPS trace de-identification by minimal truncation
▪ Validation of the de-identification methods
Near-Term Steps: Data Federation
 Issue: Data federation entails providing access through the RDE to data sets not
owned or managed by the RDE Team
 Relevant RDE Goals:
□ Protect data rights of providers
□ Protect privacy of vehicle users in data sets
□ Ensure overall system security
 Challenge: Develop a flexible system of agreements that can be instituted between
the RDE Team and federated sites
 Response: TBD
Questions
Dale Thompson
USDOT
ITS Joint Program Office
[email protected]
202-493-0259