ALMA data quality assurance (QA) and the "scriptForPI"
Dirk Petry (ESO/EU ARC)
November 2014

Outline
→ Introduction
→ The ALMA pipelines and ALMA Data Quality Assurance (QA0 - QA3)
→ ALMA QA2 with “the Script Generator”
→ ALMA QA2 with “the Pipeline”
→ The scripts in the PI package
   → the scriptForPI
   → scriptForCalibration, scriptForFluxCalibration, scriptForImaging

D. Petry, ALMA data QA and the "scriptForPI", November 2014

Operational Features of ALMA
● service observing only: the PI is not present to help or watch during the observation
● dynamic scheduling: the proposer does not set the exact time or date (with exceptions)
● (approx.) one-year proposal and planning cycle, ToO programs, DDT
● large data volumes
  - a typical dataset (a few hours of observing time) consists of several 100 GB of data
● service data analysis
  - full calibration and standard imaging are performed by the project
  - in addition to the raw data, the PI obtains custom calibration and imaging scripts, and standard imaging products
● (expert) users are given the full data analysis software: CASA

ALMA QA
● “The goal of ALMA Quality Assurance (QA) is to deliver to the PI a reliable final data product that has reached the desired control parameters outlined in the science goals, that is calibrated to the desired accuracy and free of calibration or imaging artifacts.”
● i.e. science-goal-oriented service data analysis

ALMA QA: science-goal-oriented service data analysis
● The PI defines science goals in the proposal using the Observing Tool (OT) ⇒ Scheduling Blocks (“SBs”) in Observation Unit Sets
● SB = prototype of an atomic (ca. 0.5 h) observation to reach a science goal
● Exec Block = actual execution of an SB (several may be needed to reach the science goal)
● Member OUS: each SB execution creates an ExecBlock (ExecBlock 1, ExecBlock 2, ..., ExecBlock n) until the required sensitivity is reached
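
The "execute the SB until the required sensitivity is reached" loop can be illustrated with a rough estimate of how many Exec Blocks are needed. This sketch is not taken from any ALMA software. It assumes that all executions are equivalent and dominated by thermal noise, so that the rms of the combined data falls as 1/sqrt(n). The function name and any numbers passed to it are hypothetical.

```python
import math


def exec_blocks_needed(rms_single, rms_target):
    """Estimate how many executions of an SB reach a target rms.

    Assumption (not from the slides): all Exec Blocks are equivalent and
    the combined rms after n executions is rms_single / sqrt(n).
    """
    if rms_single <= 0 or rms_target <= 0:
        raise ValueError("rms values must be positive")
    ratio = rms_single / rms_target
    # n must satisfy rms_single / sqrt(n) <= rms_target, i.e. n >= ratio**2
    return max(1, math.ceil(ratio ** 2))
```

For example, under these assumptions halving the rms of a single execution requires four executions. In practice the decision is made in QA2 from the actual data, not from such an estimate.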

ALMA Data Flow
[Diagram; labels: Bulk Data (visibilities) from Correlator; Metadata from Control; OSF TelCal: pointing, focus, delay, system health, ...; QA0 (rapidly varying parameters); QA1 (slowly varying parameters); Feed back results; ALMA Archive: Raw Data, Metadata, Instrumental State & Calibration Database; JAO and the ARCs: QA2 Calibration → Calibration Products, QA2 Imaging and Spectroscopy → Science Products]

ALMA Data Analysis and Quality Assurance Teams
● QA team (JAO + ARCs)
  - led by Baltasar Vila Vilaro and Eric Villard
  - plus 3 Data Reduction Managers at the ARCs (D. Petry, M. Lacy, H. Shinaga)
  - ca. 6 contributors + up to 40 analysts
  - ... use CASA
● At the EU ARC, the analysis work is managed by ESO and distributed over ESO and the 8 ARC nodes. At any time, typically 10 analysts are at work.

ALMA QA
QA consists of 3 (+1) steps:
● QA0: checks at the time of data acquisition: atmosphere, antennas, front-ends, connectivity, back-ends
● QA1: monitoring of slowly varying array performance parameters: arrays, antennas, calibration sources
● QA2 (triggered by the completion of an Obs Unit Set member): confirm that the Science Goal was met; request additional data and iterate if not (implies full calibration + generation of standard science products). After QA2 is passed (Science Goal met), the OUS member data is delivered to the PI.
● QA3 (triggered by a potential problem report from the PI): re-reduction of the data, possibly replacement of products in the archive

ALMA QA2
● Ultimately, QA2 on all data from standard observing is supposed to be performed by the fully automated Science Pipeline.
● How can a fully automated pipeline be developed before the observatory is completely commissioned?
● Three modes of processing will coexist:
  1) Semi-automatic processing using the Script Generator
  2) Data calibration with the automated pipeline + semi-automatic imaging
  3) Fully automated pipeline including imaging
● Status Nov 2014:
  - so far most processing has been done in mode (1)
  - a first version of mode (2) has been successfully commissioned and used for simple observation modes since October 2014

ALMA QA2 - Script Generator assisted analysis
● Before a fully automated Pipeline can be commissioned, the manual data analysis has to be fully understood!
● A large team of ALMA scientists worked together to develop the best practices to perform a robust standard calibration of ALMA Cycle 0 data.
● Following an idea by Eric Villard, these best practices were then gradually automated using a system of Python scripts called the “Script Generator”.
[Diagram: raw data from archive → Script Generator creates draft → analyst edits → calibrated data and QA2 science products]
● The Script Generator evaluates a raw dataset (imported MS) and writes a draft of a CASA data reduction script (one each for calibration and imaging).
● The data analyst then edits the draft scripts where necessary and runs them (typically in small steps), iterating until confident that the best calibration has been achieved.

ALMA QA2 - The Science Pipeline
- The ALMA Pipeline is based on CASA and distributed with CASA v4.2.2
- Processes interferometry and single-dish data
- Processing is (meta-)data driven
- The Pipeline team uses CASA tasks and tools to create heuristics tasks
- Meant to run in batch mode on HPC clusters
- Pipeline commissioning and verification is performed by comparing the results with those obtained from the script-generator-assisted analysis.

The scripts in the PI data package
Structure of the data package directory tree:
project
  science group OUS
    group OUS
      member OUS
        README ........... read this first
        script ........... all calibration and imaging scripts
        calibration ...... calibration tables
        log .............. calibration and imaging log files
        qa ............... diagnostic summary and plots
        product .......... the FITS cubes of all images
        raw .............. created when ASDMs are unpacked
        calibrated ....... created when scriptForPI.py is run
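
The directory tree of the data package can be checked with a few lines of Python before starting the calibration. This is a minimal sketch and not part of the delivered scripts. The function name is hypothetical, and the entry names are simply those listed in the tree, taken literally.

```python
import os

# Entries delivered in the member OUS directory, per the directory tree.
DELIVERED = ["README", "script", "calibration", "log", "qa", "product"]
# Entries that appear later: "raw" when the ASDMs are unpacked,
# "calibrated" when scriptForPI.py is run.
CREATED_LATER = ["raw", "calibrated"]


def check_package_layout(member_dir):
    """Return (missing, present_later) for a member OUS directory.

    missing: delivered entries that were not found
    present_later: which of "raw" and "calibrated" already exist
    """
    missing = [name for name in DELIVERED
               if not os.path.exists(os.path.join(member_dir, name))]
    present_later = [name for name in CREATED_LATER
                     if os.path.isdir(os.path.join(member_dir, name))]
    return missing, present_later
```

Calling `check_package_layout(".")` from inside the member OUS directory returns two lists. An empty first list means that all delivered entries are in place.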

The scripts in the PI data package (contained in the "script" directory)
Filename | Origin | Purpose
uid*.ms.scriptForCalibration.py (optional) | Script Generator / analyst | calibrates a single EB (ASDM); results in one uid*.ms.split.cal
PPR*.xml (optional) | ALMA Pipeline | controlled the run of the ALMA Pipeline; contains the list of ASDMs
casa_piperestorescript.py (optional) | ALMA Pipeline | calibrates all pipeline-processed EBs; results in one uid*.ms.split.cal per EB
casa_pipescript.py (optional) | ALMA Pipeline | enables the user to rerun the Pipeline from scratch; results in one uid*.ms.split.cal per EB
scriptForFluxCalibration.py (optional) | Script Generator / analyst | adjusts the flux calibration of several EBs close in time which use the same phase calibrator; prepares imaging; result is calibrated.ms
scriptForImaging.py | Script Generator / analyst | creates all imaging products for the MOUS; results in (among others) *.fits files for all images
scriptForPI.py | added in packaging | performs all necessary steps to create all uid*.ms.split.cal MSs

The scriptForPI
Standard execution of the scriptForPI
Make a work directory, e.g. "work":
    mkdir work
Move your delivery package into "work".
Move the *.asdm.sdm.tgz tarballs of the ASDMs into "work".
Unpack the delivery package first:
    tar xvf 201*.1.*.S*.tar
Then unpack the *.asdm.sdm.tgz tarballs (the *.asdm.sdm ASDMs end up in subdirectory "raw"):
    for NAME in *.asdm.sdm.tgz; do tar xvzf "$NAME"; done
cd into subdirectory "script":
    cd 201*.S/*/*/*/script
Start CASA with the pipeline and run the script:
    casapy --pipeline
    > execfile('scriptForPI.py')

The scriptForPI
The SPACESAVING option
CASA calibration takes large amounts of disk space. scriptForPI.py will try to estimate the required free disk space and warn you if you don't have enough. You can reduce the space needed by deleting the intermediate MSs. Setting the CASA variable SPACESAVING to a value > 0 will delete them for you:
    SPACESAVING = N
    execfile('scriptForPI.py')
where N is an integer from 0 to 3 with the following meaning:
    SPACESAVING = 0  same as not set (all intermediate MSs are kept)
                = 1  do not keep intermediate MSs named *.ms.split
                = 2  do not keep intermediate MSs named *.ms and *.ms.split
                = 3  do not keep intermediate MSs named *.ms, *.ms.split, and *.ms.split.cal (if possible)

The scriptForPI
Additional ASDMs without calibration info, missing ASDMs
If you have additional ASDMs for which there is no calibration info available, you will get a warning:
    WARNING: Inconsistency between ASDMs and calibration scripts
    Calibration info available for: uid...
    ASDMs available in directory raw: uid...
Only the ASDMs for which there is calibration information will be calibrated.
If you have not downloaded and unpacked all ASDMs for which there is calibration info, you will get this message:
    ERROR: the following ASDMs have calibration information but are absent from directory "raw": uid...
    Will try to proceed with the rest ...

The calibration and imaging scripts
uid*.scriptForCalibration.py (one for each EB), part 1: a priori calibration
raw data (ASDM) → import → MS for one EB
a priori flagging (based on observatory information)
antenna position calibration → antpos cal. table (rarely necessary)
WVR correction → phase cal. table (Water Vapour Radiometer based phase correction)
Tsys calibration → Tsys cal. table (atmospheric opacity correction)
apply a priori cal. tables
detailed flagging (if problems are found later, reiterate from here)
Result: a priori calibrated MS for one EB named uid*.ms.split

The calibration and imaging scripts
uid*.scriptForCalibration.py, part 2: bandpass, gain and flux calibration
set model of flux cal. → flux cal. model (set the model of the flux calibration source from Butler-JPL-Horizons or the ALMA cal database)
phase selfcal of bandpass cal. → phase cal. table (fit the time dependence of the phase of the bandpass cal.; = selfcal on bandpass)
bandpass calibration → bandpass cal. table (fit the frequency dependence of the gain of the bandpass cal.)
gain calibration → phase+amp cal. table (fit the time dependence of the gain and phase of the phase cal.)
scale gain cal. result → phase+amp cal. table (scale the gain calibration result to the correct absolute flux)
apply gain+band calib. to target
Result: calibrated MS for one EB named uid*.ms.split.cal

The calibration and imaging scripts
calibrated MS for one EB, calibrated MS for one EB, ...
→ scriptForFluxCalibration.py: if necessary, flux equalisation and concatenation take place → calibrated MS
→ scriptForImaging.py: imaging → image cube(s)
  Image the science targets using clean: continuum and (where requested) line images; apply continuum subtraction and/or selfcal where needed; apply primary beam correction. Result: FITS image cube(s)
→ viewing, plotting, numerical analysis ... → plots and numerical results
  All further analysis is left to the PI. (Although sometimes moment maps are created.)

QA2 Products Documentation
https://almascience.eso.org/documents-and-tools/cycle-2/alma-qa2-products-v2.1
See also the recent SPIE paper by the DRMs:
Petry, D. et al., 2014, "ALMA service data analysis and level 2 quality assurance with CASA", Proc. SPIE, Volume 9152, id. 91520J, 6 pp. (http://arxiv.org/abs/1407.7142) and references therein.
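
As an illustration of the SPACESAVING levels described for the scriptForPI, the table of levels can be written as a small lookup. This is only a sketch of the table and not the logic used inside scriptForPI.py. The names `SPACESAVING_PATTERNS` and `removable` are hypothetical, and the "(if possible)" condition of level 3 is not modelled.

```python
import fnmatch

# Patterns of intermediate MSs that are not kept at each SPACESAVING level,
# transcribed from the table of levels (0 = keep everything).
SPACESAVING_PATTERNS = {
    0: [],
    1: ["*.ms.split"],
    2: ["*.ms", "*.ms.split"],
    3: ["*.ms", "*.ms.split", "*.ms.split.cal"],
}


def removable(names, level):
    """Return the names that would not be kept at the given level."""
    if level not in SPACESAVING_PATTERNS:
        raise ValueError("SPACESAVING must be an integer from 0 to 3")
    patterns = SPACESAVING_PATTERNS[level]
    # fnmatchcase anchors the whole name, so "*.ms" does not match
    # "x.ms.split" and "*.ms.split" does not match "x.ms.split.cal".
    return [n for n in names
            if any(fnmatch.fnmatchcase(n, p) for p in patterns)]
```

Given the directory listing of "calibrated", `removable(listing, 2)` returns the *.ms and *.ms.split entries and leaves the final *.ms.split.cal MSs untouched.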