ALMA data quality assurance (QA) and the "scriptForPI"

Dirk Petry (ESO/EU ARC) November 2014
Outline
→ Introduction
→ The ALMA pipelines and ALMA Data Quality Assurance (QA0 - QA3)
→ ALMA QA2 with “the Script Generator”
→ ALMA QA2 with “the Pipeline”
→ The scripts in the PI package
→ the scriptForPI
→ scriptForCalibration, scriptForFluxCalibration, scriptForImaging
D. Petry, ALMA data QA and the "scriptForPI", November 2014
Operational Features of ALMA
● service-observing only, PI not present to help or watch during observation
● dynamic scheduling: proposer does not set exact time or date (with exceptions)
● (approx.) one-year proposal and planning cycle, ToO programs, DDT
● large data volumes
  - typical dataset (few hours obs time) will consist of several 100 GB of data
● service data analysis:
  - full calibration and standard imaging performed by the project
  - in addition to raw data, PI obtains custom calibration and imaging scripts,
    and standard imaging products
● (expert) users are given full data analysis software: CASA
ALMA QA
● “The goal of ALMA Quality Assurance (QA) is to deliver to the PI a
  reliable final data product that has reached the desired control
  parameters outlined in the science goals, that is calibrated to the
  desired accuracy and free of calibration or imaging artifacts.”
● i.e. science-goal-oriented service data analysis
ALMA QA
Science-goal-oriented service data analysis
● PI defines science goals in the proposal using the Observing Tool (OT)
⇒ Scheduling Blocks (“SBs”) in Observation Unit Sets
● SB = prototype of an atomic (ca. 0.5 h) observation to reach a science goal
● Exec Block = actual execution of an SB (may need several to reach science goal)
[Diagram: a MemberOUS contains an SB; executing the SB creates ExecBlock 1, ExecBlock 2, ..., ExecBlock n, until the required sensitivity is reached.]
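As a rough illustration of why several ExecBlocks may be needed: if every execution is assumed to contribute the same noise level and the executions are combined with equal weight, the noise falls as 1/sqrt(n). The sketch below uses that idealised assumption only; the function name and the numbers are hypothetical, and the real decision is taken in QA2.

```python
import math


def exec_blocks_needed(sigma_single, sigma_required):
    """Number of equal executions needed to reach sigma_required.

    Assumes identical conditions for every ExecBlock, so that the
    combined noise after n executions is sigma_single / sqrt(n).
    """
    if sigma_single <= 0 or sigma_required <= 0:
        raise ValueError("sensitivities must be positive")
    # Solve sigma_single / sqrt(n) <= sigma_required for the smallest integer n.
    return max(1, math.ceil((sigma_single / sigma_required) ** 2))


# One execution reaches 1.0 (arbitrary units); the science goal asks for 0.5.
print(exec_blocks_needed(1.0, 0.5))
```

Halving the noise therefore costs about four times the observing time under this assumption.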
ALMA Data Flow
[Figure: ALMA data flow diagram. Labelled elements: bulk data (visibilities) from the Correlator and metadata from Control; TelCal (pointing, focus, delay, system health, ...) and QA0 (rapidly varying parameters) at the OSF; raw data and the Instrumental State & Calibration Database in the ALMA Archive; QA1 (slowly varying parameters); QA2 Calibration (Calibration Products) and QA2 Imaging and Spectroscopy (Science Products) at JAO and the ARCs; results are fed back.]
ALMA Data Analysis and Quality Assurance Teams
QA team (JAO + ARCs) - led by Baltasar Vila Vilaro and Eric Villard
+ 3 Data Reduction Managers at the ARCs (D. Petry, M. Lacy, H. Shinaga)
ca. 6 contributors + up to 40 analysts ... use CASA
At the EU ARC, the analysis work is
managed by ESO and distributed
over ESO and the 8 ARC nodes.
At any time, typically 10 analysts
are at work.
ALMA QA
QA consists of 3 (+1) steps
QA0: Checks at the time of data acquisition:
Atmosphere, Antennas, Front-Ends, Connectivity, Back-Ends
QA1: Monitor slowly varying array performance parameters:
Arrays, Antennas, Calibration Sources
completion of an Obs Unit Set member triggers
QA2: Confirm that the Science Goal was met;
request additional data and iterate if not
(implies full calibration + generation of standard science products)
after QA2 is passed (Science Goal met),
the OUSmember data is delivered to the PI
potential problem report by PI triggers
QA3: re-reduction of the data,
possibly replacement of products in the archive
ALMA QA2
● Ultimately, QA2 on all data from standard observing is supposed to be performed
  by the fully automated Science Pipeline.
● How can a fully automated pipeline be developed before the observatory
  is completely commissioned?
● Three modes of processing will coexist:
  1) Semi-automatic processing using the Script Generator
  2) Data calibration with automated pipeline + semi-automatic imaging
  3) Fully automated pipeline including imaging
● Status Nov 2014:
  - so far most processing has been done in mode (1)
  - first version of mode (2) has successfully been commissioned
    and used for simple observation modes since October 2014
ALMA QA2 - Script Generator assisted analysis
● Before a fully automated Pipeline can be commissioned,
  the manual data analysis has to be fully understood!
● A large team of ALMA scientists worked together to develop the
  best practices to perform a robust standard calibration of ALMA Cycle 0 data.
● Following an idea by Eric Villard, these best practices were then gradually
  automated using a system of Python scripts called the “Script Generator”.
[Diagram: raw data from archive → Script Generator creates draft → Analyst edits → calibrated data and QA2 science products]
● The Script Generator evaluates a raw dataset (imported MS) and writes
  a draft of a CASA data reduction script (one each for calibration and imaging).
● The data analyst then edits the draft scripts where necessary and runs them
  (typically in small steps), iterating until confident that the best calibration has been achieved.
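Running a reduction script "in small steps" can be pictured as a script organised into numbered steps, of which only a chosen subset is executed in each pass. The sketch below is plain Python and only illustrates that idea; the names run_steps, steps and selected are hypothetical and are not taken from the Script Generator output.

```python
def run_steps(steps, selected=None):
    """Run numbered steps in ascending order.

    steps    : dict mapping step number -> (title, callable)
    selected : optional collection of step numbers; None runs everything
    Returns the list of step numbers that were executed.
    """
    executed = []
    for number in sorted(steps):
        title, action = steps[number]
        if selected is not None and number not in selected:
            continue
        print("Step %d: %s" % (number, title))
        action()
        executed.append(number)
    return executed


# Stand-in actions that only record what was run.
log = []
steps = {
    0: ("import raw data", lambda: log.append("import")),
    1: ("a priori flagging", lambda: log.append("flag")),
    2: ("apply calibration", lambda: log.append("apply")),
}

run_steps(steps, selected=[0, 1])  # first pass: inspect the result, edit the draft
run_steps(steps, selected=[2])     # continue once satisfied
```

Keeping each step independent is what lets the analyst re-run a single step after editing it instead of repeating the whole reduction.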
ALMA QA2 - The Science Pipeline
- The ALMA Pipeline is based on CASA and distributed with CASA v4.2.2
- Processes interferometry and single-dish data
- Processing is (meta‐)data driven
- Pipeline team uses CASA tasks and tools to create heuristics tasks
- meant to run in batch mode on HPC clusters
- Pipeline commissioning and verification is performed by
comparing the results with those obtained from the
script-generator-assisted analysis.
The scripts in the PI data package
Structure of the data package directory tree
project
  science group
    group OUS
      member OUS
        README ........ read this first
        script ........ all calibration and imaging scripts
        calibration ... calibration tables
        log ........... calibration and imaging log files
        qa ............ diagnostic summary and plots
        product ....... the FITS cubes of all images
        raw ........... created when ASDMs are unpacked
        calibrated .... created when scriptForPI.py is run
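A quick way to see which parts of this tree are present in an unpacked member OUS directory is sketched below. It only checks directory names from the listing above; the function name check_member_ous is hypothetical, and the demonstration builds a throwaway directory rather than touching real data.

```python
import tempfile
from pathlib import Path

# Subdirectories that come with the delivery, and those created later.
DELIVERED = ("script", "calibration", "log", "qa", "product")
CREATED_LATER = ("raw", "calibrated")


def check_member_ous(member_dir):
    """Report which expected subdirectories are missing or already created."""
    member_dir = Path(member_dir)
    present = {p.name for p in member_dir.iterdir() if p.is_dir()}
    return {
        "missing": [d for d in DELIVERED if d not in present],
        "created": [d for d in CREATED_LATER if d in present],
    }


# Demonstration on a temporary directory that lacks "product".
with tempfile.TemporaryDirectory() as tmp:
    for name in ("script", "calibration", "log", "qa", "raw"):
        (Path(tmp) / name).mkdir()
    report = check_member_ous(tmp)

print(report)
```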
The scripts in the PI data package
(contained in the "script" directory)

Filename: uid*.ms.scriptForCalibration.py (optional)
  Origin:  Script Generator / analyst
  Purpose: calibrates a single EB (ASDM);
           results in one uid*.ms.split.cal

Filename: PPR*.xml (optional)
  Origin:  ALMA Pipeline
  Purpose: controlled the run of the ALMA Pipeline;
           contains the list of ASDMs

Filename: casa_piperestorescript.py (optional)
  Origin:  ALMA Pipeline
  Purpose: calibrates all pipeline-processed EBs;
           results in one uid*.ms.split.cal per EB

Filename: casa_pipescript.py (optional)
  Origin:  ALMA Pipeline
  Purpose: enables the user to rerun the Pipeline from scratch;
           results in one uid*.ms.split.cal per EB

Filename: scriptForFluxCalibration.py (optional)
  Origin:  Script Generator / analyst
  Purpose: adjusts the flux calibration of several EBs close
           in time which use the same phase calibrator;
           prepares imaging; result is calibrated.ms

Filename: scriptForImaging.py
  Origin:  Script Generator / analyst
  Purpose: creates all imaging products for the MOUS;
           results in (among others) *.fits files for all images

Filename: scriptForPI.py
  Origin:  added in packaging
  Purpose: performs all necessary steps to create all
           uid*.ms.split.cal MSs
The ScriptForPI
Standard Execution of the scriptForPI
● Make a work directory, e.g. "work":
    mkdir work
● Move your delivery package into "work".
● Move the *.asdm.sdm.tgz tarballs of the ASDMs into "work".
● Unpack the delivery package first:
    tar xvf 201*.1.*.S*.tar
● Then unpack the *.asdm.sdm.tgz tarballs (the *.asdm.sdm ASDMs end up in subdir "raw"):
    for NAME in *.asdm.sdm.tgz; do tar xvzf "$NAME"; done
● cd into subdir "script":
    cd 201*.S/*/*/*/script
● Start CASA with the pipeline and run the script:
    casapy --pipeline
    > execfile('scriptForPI.py')
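The unpacking steps above could also be scripted. The following is a minimal sketch using Python's standard tarfile module and the same file patterns as the shell commands; the function name unpack_delivery is hypothetical, and it should only be used on archives from a trusted source, since it extracts whatever paths the tarballs contain.

```python
import tarfile
from pathlib import Path


def unpack_delivery(work_dir):
    """Unpack the delivery package, then the ASDM tarballs, inside work_dir.

    Returns the list of "script" directories found afterwards.
    """
    work = Path(work_dir)
    # Order matters: delivery package first, ASDM tarballs second.
    for pattern in ("201*.1.*.S*.tar", "*.asdm.sdm.tgz"):
        for tarball in sorted(work.glob(pattern)):
            # Mode "r" lets tarfile detect plain or gzip-compressed archives.
            with tarfile.open(tarball, "r") as archive:
                archive.extractall(work)
    # Same pattern as "cd 201*.S/*/*/*/script" in the shell version.
    return sorted(work.glob("201*.S/*/*/*/script"))
```

Calling unpack_delivery("work") and changing into the returned directory would leave only the CASA start-up and the execfile call to be done by hand.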
The ScriptForPI
The SPACESAVING option
CASA calibration will take large amounts of disk space.
scriptForPI.py will try to estimate the required free disk space and warn you
if you don't have enough.
You can reduce the need for space by deleting the intermediate MSs.
Setting the CASA variable SPACESAVING to a value > 0 will delete them for you:
    SPACESAVING = N
    execfile('scriptForPI.py')
where N is an integer from 0 to 3 with the following meaning:
    SPACESAVING = 0   same as not set (all intermediate MSs are kept)
    SPACESAVING = 1   do not keep intermediate MSs named *.ms.split
    SPACESAVING = 2   do not keep intermediate MSs named *.ms and *.ms.split
    SPACESAVING = 3   do not keep intermediate MSs named *.ms, *.ms.split,
                      and *.ms.split.cal (if possible)
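The meaning of the four levels can be written down as a small lookup table. The sketch below only matches names against the patterns listed above; it is not how scriptForPI.py is implemented, it does not model the "if possible" condition of level 3, and the dataset name used is hypothetical.

```python
import fnmatch

# Name patterns of intermediate MSs that are not kept at each level.
INTERMEDIATE_PATTERNS = {
    0: [],
    1: ["*.ms.split"],
    2: ["*.ms", "*.ms.split"],
    3: ["*.ms", "*.ms.split", "*.ms.split.cal"],
}


def removable(names, spacesaving=0):
    """Return the names that would not be kept at the given SPACESAVING level."""
    patterns = INTERMEDIATE_PATTERNS[spacesaving]
    return [n for n in names
            if any(fnmatch.fnmatchcase(n, p) for p in patterns)]


# Hypothetical MS names for a single execution block.
names = [
    "uid___A002_X1_X2.ms",
    "uid___A002_X1_X2.ms.split",
    "uid___A002_X1_X2.ms.split.cal",
]
print(removable(names, spacesaving=1))
```

With SPACESAVING = 1 only the *.ms.split entry is selected; at level 2 the *.ms entry joins it, and at level 3 all three names are selected.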
The ScriptForPI
Additional ASDMs without calibration info, missing ASDMs.
If you have additional ASDMs for which there is no calibration info available,
you will get a warning:
    WARNING: Inconsistency between ASDMs and calibration scripts
    Calibration info available for: uid...
    ASDMs available in directory raw: uid...
    Only the ASDMs for which there is calibration information will be calibrated
If you have not downloaded and unpacked all ASDMs for which there is calibration info,
you will get this message:
    ERROR: the following ASDMs have calibration information but are absent from directory "raw": uid...
    Will try to proceed with the rest ...
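Both situations amount to comparing two sets of ASDM identifiers. The sketch below shows that comparison in plain Python; the function name check_asdms and the short identifiers are hypothetical, and the printed messages are abbreviated rather than copied from scriptForPI.py.

```python
def check_asdms(with_calibration_info, in_raw):
    """Compare ASDMs that have calibration info with those present in "raw".

    Returns (to_calibrate, extra, missing), each as a sorted list.
    """
    with_calibration_info = set(with_calibration_info)
    in_raw = set(in_raw)
    extra = sorted(in_raw - with_calibration_info)    # no calibration info
    missing = sorted(with_calibration_info - in_raw)  # not downloaded/unpacked
    to_calibrate = sorted(with_calibration_info & in_raw)
    if extra:
        print("WARNING: no calibration info for:", ", ".join(extra))
    if missing:
        print("ERROR: absent from directory raw:", ", ".join(missing))
    return to_calibrate, extra, missing


# Hypothetical identifiers: X3 lacks calibration info, X2 was not unpacked.
result = check_asdms(["uid_X1", "uid_X2"], ["uid_X1", "uid_X3"])
print(result[0])
```

In both cases processing continues with the intersection, which matches the behaviour described by the two messages.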
The calibration and imaging scripts
uid*.ms.scriptForCalibration.py (one for each EB)
raw data (ASDM)
→ import: produces the MS for one EB
→ a priori flagging (based on observatory information)
→ antenna position calibration (rarely necessary): produces the antpos cal. table
→ WVR correction (Water Vapour Radiometer based phase correction): produces a phase cal. table
→ Tsys calibration (atmospheric opacity correction): produces the Tsys cal. table
→ apply a priori cal. tables
→ detailed flagging
Result: a priori calibrated MS for one EB, named uid*.ms.split
(if problems are found later, re-iterate from here)
The calibration and imaging scripts
uid*.ms.scriptForCalibration.py (continued)
→ set model of flux cal. (set the model of the flux calibration source
  from Butler-JPL-Horizons or the ALMA cal database): produces the flux cal. model
→ phase-selfcal of bandpass cal. (fit time-dependence of phase of bandpass cal.;
  = selfcal on bandpass): produces a phase cal. table
→ bandpass calibration (fit freq-dependence of gain of bandpass cal.): produces the bandpass cal. table
→ gain calibration (fit time-dependence of gain and phase of phase cal.): produces a phase+amp cal. table
→ scale gain cal. result (scale the gain calibration result to the
  correct absolute flux): produces a scaled phase+amp cal. table
→ apply gain+band calib. to target
Result: calibrated MS for one EB, named uid*.ms.split.cal
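For readers who want to find these steps in a calibration script, the sketch below lists them next to the CASA task that would typically carry out each one. The step names follow the slides; the task assignment is an assumption made here for orientation, not something stated in the slides, and an individual script may use different or additional tasks.

```python
# (step as named on the slides, CASA task assumed for it)
CALIBRATION_STEPS = [
    ("import", "importasdm"),
    ("a priori flagging", "flagdata"),
    ("antenna position calibration", "gencal"),
    ("WVR correction", "wvrgcal"),
    ("Tsys calibration", "gencal"),
    ("apply a priori cal. tables", "applycal"),
    ("detailed flagging", "flagdata"),
    ("set model of flux cal.", "setjy"),
    ("phase-selfcal of bandpass cal.", "gaincal"),
    ("bandpass calibration", "bandpass"),
    ("gain calibration", "gaincal"),
    ("scale gain cal. result", "fluxscale"),
    ("apply gain+band calib. to target", "applycal"),
]


def tasks_in_order(steps):
    """Return the distinct task names in order of first use."""
    seen = []
    for _, task in steps:
        if task not in seen:
            seen.append(task)
    return seen


for number, (step, task) in enumerate(CALIBRATION_STEPS, start=1):
    print("%2d. %-34s %s" % (number, step, task))
```

Searching a script for these task names is a quick way to locate the part that corresponds to a given box in the flow above.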
The calibration and imaging scripts
calibrated MS for one EB (one per EB)
→ If necessary, flux equalisation and concatenation
  takes place: scriptForFluxCalibration.py
calibrated MS
→ scriptForImaging.py (imaging):
  image the science targets using clean:
  continuum and (where requested) line images;
  apply continuum subtraction and/or selfcal where needed;
  apply primary beam correction. Result: FITS image cube(s)
image cube(s)
→ viewing, plotting, numerical analysis, ...
plots and numerical results
All further analysis is left to the PI.
(Although sometimes moment maps are created.)
QA2 Products Documentation
https://almascience.eso.org/documents-and-tools/cycle-2/alma-qa2-products-v2.1
See also the recent SPIE paper by the DRMs:
Petry, D. et al., 2014, "ALMA service data analysis and level 2 quality assurance with CASA",
Proc. SPIE, Volume 9152, id. 91520J, 6 pp. (http://arxiv.org/abs/1407.7142)
and references therein.