PEGA-Data-Processing-Tips-HOWTO
John P. McFarland
v1.1, 2002-06-09
v1.0, 2000-10-31

The following is a run-down of many of the ways I have made life easier for myself and for the rest of the PEGA team. It has been updated to include recommended procedures necessary to keep the PEGA data archive as uniform and consistent as possible, and to include scripts created since the last version. Most of the changes are in the first section, labeled "Changes". If you have any questions, please feel free to ask.

----------------------------------------------------------------------------

Table of Contents

1. Changes
   1.1 Naming Conventions
   1.2 Directory Names and Hierarchies
   1.3 Procedures
       1.3.1 FITS Headers
       1.3.2 ccdfname
       1.3.3 file.inf
       1.3.4 imagetyp
       1.3.5 wfits
2. Processing Tips
   2.1 Automated wfits Renaming
   2.2 Using hselect to Pull Out Information From FITS Headers
   2.3 Customizing IRAF
   2.4 Using the FITS Image Type for Processing
3. Pre- and Post-Processing Scripts and Configuration
   3.1 Shell Configuration
   3.2 The Scripts
       3.2.1 fits
       3.2.2 num
       3.2.3 zero
       3.2.4 chpega
       3.2.5 pega-combine
       3.2.6 pegasort
       3.2.7 pegaplot
       3.2.8 ccdfname
       3.2.9 Post-calibration
   3.3 Modification of Scripts
   3.4 IDL Setup
       3.4.1 Setup Scripts
       3.4.2 Running the Setup Scripts From Startup Files
4. Data Locations
   4.1 Data Stores
   4.2 Other Media
5. Future Plans

1. Changes

It has come to my attention that several specifics in the processing pipeline need to be updated. Below is a preliminary list of these updates.

1.1 Naming Conventions

Many of the older files that have been reprocessed or otherwise manipulated, and many of the newer files coming in, are still in the old yymmdd.xxx format. This is not desirable. In order to keep the entire archive in a uniform state, they must be transformed into the standard ccyymmdd.xxx format. Don't worry, there is a short script (add20) which will do just this. It will add 20 (or 19 for files acquired in the last century) to the beginning of all filenames. Just run "add20 directory" on a whole directory; a sketch of the idea follows.
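For reference, here is a minimal sketch of the idea behind add20, assuming bash; the installed add20 script is the authoritative version and may differ in detail:

    #!/bin/bash
    # add20 (sketch): prepend the century to yymmdd.xxx filenames so
    # they become ccyymmdd.xxx.  The "20" prefix is hard-coded here;
    # use "19" for data acquired in the last century.
    dir=${1:-.}
    for f in "$dir"/[0-9][0-9][0-9][0-9][0-9][0-9].*; do
        [ -e "$f" ] || continue              # skip if nothing matched
        mv "$f" "$dir/20$(basename "$f")"
    done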
1.2 Directory Names and Hierarchies

1) In addition to the filenames, the directories containing these files should be brought up to date. Since there are far fewer directories in a run than images in a night, the "mv" command can be used to manually change the directory names, even if there are files contained within. Simply type:

    ]$ mv source_dir dest_dir

or

    ]$ mv yymmdd ccyymmdd

in the case of PEGA data directories.

2) In addition to the naming conventions of the directories and files, a specific directory hierarchy needs to be used to streamline the calibration process. The general hierarchy is as follows:

    /nfs/compton2/pega_data/run_date/night/filter/object_name

This hierarchy is used by the "mv-object" script (detailed below) to create the proper objects directory hierarchy:

    /nfs/compton2/pega_data/objects/object_name/filter/run_date/night

After processing, the frames for a particular object are put into their correctly named object directory (to avoid duplication in the objects directory). By running the mv-object script, the files in these object directories are properly moved to their respective directories in the objects hierarchy.

A detailed directory listing for an example run follows:

    /nfs/compton2/pega_data/200206/
    /nfs/compton2/pega_data/200206/20020601/
    /nfs/compton2/pega_data/200206/20020601/zero/
    /nfs/compton2/pega_data/200206/20020601/flat/
    /nfs/compton2/pega_data/200206/20020601/r/
    /nfs/compton2/pega_data/200206/20020601/r/pks1510-089/
    /nfs/compton2/pega_data/200206/20020601/r/rawdata/

By running:

    ]$ mv-object pks1510-089 200206

from /nfs/compton2/pega_data/, the calibrated frames for pks1510-089 will be moved to:

    /nfs/compton2/pega_data/objects/pks1510-089/r/200206/20020601/

1.3 Procedures

Below are some revised procedures which will help with data analysis and post-processing.

1.3.1 FITS Headers

You must make sure that the FITS header in the raw data is as accurate as possible. This includes the file name, filters, and other keywords.

1) The standard keyword we use for the file name is CCDFNAME. To ensure that it is contained within the header, use the "ccdfname" script detailed below.

2) There are two filter name keywords. The current convention is that the FILTERS keyword holds a proprietary code, usually numerical, describing the filter wheel position of the filter in use. The FILTNAME keyword is more useful since it contains the letter code for a given filter. These keywords will hopefully already be present in the header; if not, they should be added. NOTE: the definitions of the keywords are subject to change, but use this convention for now.

1.3.2 ccdfname

The ccdfname script is used to help add the CCDFNAME keyword to the headers. It is a BASH script which creates an IRAF script which must currently be run in IRAF on the FITS images. The FITS images must have a .fits extension and IRAF must be set up to recognize FITS images as valid image types for this to work properly. For details, see the ccdfname description below. Hopefully soon, the ccdfname script will be updated to a fully-fledged IRAF script so that adding/correcting the CCDFNAME keyword will be a much simpler process.

1.3.3 file.inf

To allow the possibility of indexing all observations for immediate and future use, the file.inf file should be created. Once the headers are as complete and correct as possible (e.g. CCDFNAME in ccyymmdd.xxx format, etc.), this file should be made in IRAF with the following command:

    cl> hselect *.fits ccdfname,object,filters,filtname yes > file.inf

This is an abbreviated form of the command which can be found in /nfs/compton2/pega_data/file.inf.cmd

1.3.4 imagetyp

Contrary to what exists in most headers, the IMAGETYP keyword values should be lower case. They should be one of the following:

    dark
    flat
    object
    zero

The ccdtype parameter in IRAF parameter files and the IMAGETYP keyword in the image header should always match one of these.

1.3.5 wfits

In order to combat "data folding", in which high values (over 32767) in a 16-bit FITS image are folded to negative values (or some similar phenomenon), autoscaling can be used in the wfits task. This scaling is sometimes necessary since the data type used to process the images is capable of holding more information than a 16-bit FITS image. This scaling strives to keep the largest dynamic range in the data while squeezing it into a relatively small 16-bit package. If it is necessary to enable autoscaling, set both the scale and autoscale parameters to yes in the wfits parameter file.
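For example, the parameters can be set from the cl prompt before writing out a batch of images (a sketch; @ls1 and @proc are lists of the kind built in section 2.1 below):

    cl> wfits.scale = yes
    cl> wfits.autoscale = yes
    cl> wfits @ls1 @proc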
2. Processing Tips

2.1 Automated wfits Renaming

1) After the images are rfits'ed (rfits filename.ext "" ob), create a list of the original FITS images and .imh files. E.g.:

    cl> files 20001027.* > proc
    cl> files ob*.imh > ls1

This puts a list of the original FITS files in the file "proc" and the newly created .imh files into the file "ls1".

2) Modify the filenames in "proc" to the processed form in the following way. In vi (I know...), enter the command ":%s/27./27p./g" without the quotes to change the above examples from the form [cc]yymmdd.xxx to [cc]yymmddp.xxx (note the extra "p"). The general form is:

    :%s/old_string/new_string/g

This can also be done by hand in the text editor of your choice.

3) To rewrite the processed .imh files into the new file names, simply use the created lists with the wfits IRAF task:

    cl> wfits @ls1 @proc

These lists can be used in other IRAF tasks, such as "imarith".

2.2 Using hselect to Pull Out Information From FITS Headers

This couldn't be simpler. Just figure out which FITS keyword(s) you want listed and include them in the following command:

    cl> hselect filespec comma,separated,keywords yes

The filespec can be one file, a @list, or a template (*.*). The list will print to the standard output. It can also be redirected into a file:

    cl> hselect filespec comma,separated,keywords yes > file.inf

will put the output into the file "file.inf".

2.3 Customizing IRAF

These are some modifications to the standard IRAF setup that may make life a little easier.

1) In your login.cl, locate the line:

    #set imtype = "imh"

You can give this variable a comma-delimited list of the extensions IRAF should treat as valid image types. The first extension listed is the one used when an image file is rfits'ed. The line

    set imtype = "imh,fits,fit"

will cause IRAF to recognize *.imh, *.fits, and *.fit as valid image files while defaulting to the imh/pix file division. Note the removal of the "#" symbol, which is used to "comment out" a line in the login.cl file; your line may already be uncommented.

This can be very handy if you need to quickly view or change a header keyword, for example. All you need to do in this case is "fits add" a set of files, work with them in IRAF, then "fits rem" them (see the "fits" script in section 3.2.1). It saves on both disk space and file I/O time.

2) Since we changed over to the disk1, disk2, etc. hierarchy, you have no doubt been annoyed with the extra /nfs/ and disk?/ needed to access the directories. Here is a simple solution: environment variables.

    set bolton1  = "/nfs/bolton1/pega_data/"
    set bolton2  = "/nfs/bolton2/pega_data/"
    set compton2 = "/nfs/compton2/pega_data/"
    . . .

These lines in the "set" section of your login.cl will allow you to reference the pega_data directories far more simply, but there are some caveats. This is not a link; the variable will be expanded to its value (which means that "cd bolton1" goes to the directory /nfs/bolton1/pega_data/). This is the good part. The bad part is that if you want to add more to this path, you have to add a $ to cause it to expand properly ("cd bolton1$1308" changes directory to /nfs/bolton1/pega_data/1308). I also noticed that the command "ls" doesn't work this way, but "cd" does. Go figure. If the plain variable doesn't work, try the $ expansion as above or below.

This can also be expanded to include bash and csh (by adding it to your .bash_profile or .cshrc file, respectively). While in these shells, you need to *prefix* the variable with a $ like this:

    cd $compton2

to go to Compton's pega_data directory. Just look in my .bash_profile or .cshrc for examples. And feel free to look into my login.cl file for IRAF stuff too (i.e. "more ~jpm/iraf/login.cl").
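For the shell side, the definitions might look like this (a sketch; adapt the variable names and paths to your own setup):

    # bash -- in ~/.bash_profile
    export compton2=/nfs/compton2/pega_data

    # csh -- in ~/.cshrc
    setenv compton2 /nfs/compton2/pega_data

after which "cd $compton2" works in either shell.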
2.4 Using the FITS Image Type for Processing

Reading in images via rfits has the advantage that you never modify the original images. But when your images are 1024x1024 pixels or larger, the rfits process tends to be slow and can use up a tremendous amount of room. For example, I just reduced about a hundred 2048x2048 images, and the space used by all the images involved after the new ones were rfits'ed was about 3GB! I could have saved time and reduced disk usage by at least a third had I kept them in FITS format. The point of this section is just that: save as much time and disk space as possible while still maintaining data integrity. The trick is to use @lists in zerocombine, flatcombine, and ccdproc. It's quite simple actually.

Procedure still under testing . . .

3. Pre- and Post-Processing Scripts and Configuration

3.1 Shell Configuration

I decided to compile all my useful little scripts into one location so that we all can use them. They are located in /usr/local/PEGA/bin. For convenience, you can add this to your path as follows:

csh: add this after your PATH statement (if it exists) in your .cshrc:

    setenv PATH "${PATH}:/usr/local/PEGA/bin"

Or add /usr/local/PEGA/bin directly to your PATH statement.

bash: add this after your PATH statement (if it exists) in your .profile:

    PATH=$PATH:/usr/local/PEGA/bin

Or add /usr/local/PEGA/bin directly to your PATH statement.

3.2 The Scripts

Here is a list of the scripts and what they do:

3.2.1 fits

-- fits
usage: fits {add, rem, del} [dir_name or dir_list]
description: Adds or removes/deletes .fits extensions to/from filenames in one self-contained, easy-to-use script that handles multiple directories as easily as single ones.
examples:

    ]$ fits add

will add .fits extensions to all (*.???) files in the current directory.

    ]$ fits rem

will remove .fits extensions from all (*.fits) files in the current directory.

    ]$ fits del 20001027

will remove .fits extensions from all (*.fits) files in the directory "20001027". "del" is a synonym for "rem".

    ]$ fits add *

will add .fits extensions to all (*.???) files in all directories matching "*". It should not change files in the current directory.

3.2.2 num

-- num-add
usage: num-add dir_name
description: Replaces the non-contiguous numerical extension with a contiguous one which can then be reversed with num-rem. Useful in CCDPhot when the frame numbers jump around. Changes processed images only (e.g. 20000717p.001).

-- num-rem
usage: num-rem dir_name
description: Sister script to num-add. num-add creates a hidden file called .trans.tbl which stores the original extensions; num-rem restores those original extensions.

-- num-rep
usage: num-rep dir_name
description: Irrevocably replaces the non-contiguous image number extensions with a contiguous set. Changes processed images only (e.g. 20000717p.001).

3.2.3 zero

-- zero-rem
usage: to be run in the intended directory
description: Will convert those pesky 4-number extensioned files to 3-number extensions, e.g. 20000505.0001 -> 20000505.001

-- zero-add
usage: to be run in the intended directory
description: If, for some bizarre reason, you want to add that zero (for those times you have more than a thousand data files in one night ;), this converts them the other way and is included for completeness, e.g. 20000505.001 -> 20000505.0001

3.2.4 chpega

-- chpega
usage: chpega filespec
description: Changes group ownership and permissions recursively on the files/directories specified so that all PEGA group members can have full access to the files. It is a good idea to do this periodically on any directory tree you have added files to.
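In effect, it does something like the following (a sketch only; the actual group name used by the script is an assumption here):

    ]$ chgrp -R pega filespec    # give the PEGA group ownership
    ]$ chmod -R g+rwX filespec   # group read/write; X adds execute
                                 # only for directories (and files
                                 # that are already executable)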
3.2.5 pega-combine

-- pega-combine
usage: pega-combine filespec_of_.log_files
description: This script combines multiple .log files for use in creating an extended light curve. Concatenation of the files is insufficient since there is a chance the object name is different from file to file. This script overcomes that limitation. NOTE: changes in the naming of the check stars are not addressed.

3.2.6 pegasort

-- pegasort
usage: pegasort ccdphot_logfile_without_extension [no_header]
description: This script takes a CCDPhot output file and converts it to something more usable for the PEGA team. The logfile name (without the .log extension) is required, but the no_header argument is optional. Read the script for more information. NOTE: pegasort now handles a greater range of errors than previously; however, out-of-order check stars are still not handled properly.

3.2.7 pegaplot

-- pegaplot.gp
I have completed my plotting script for plotting PEGA-type data. It is in /usr/local/PEGA/gnuplot/pegaplot.gp. Just copy this file to your favorite location and it'll do the rest . . . (yeah, right!). Copy it over and read through it for instructions on what needs to be modified and how. All you will need is this and the data file to create good plots for transparencies and plots for papers in TeX and LaTeX. If you have problems that aren't addressed in the script itself, check the documentation, then check with me.

-- pegaplot
usage: pegaplot pegasorted_.txt_file
description: Unlike pegaplot.gp, pegaplot (the BASH script) analyzes the input file and determines ranges for both Julian Date and magnitude. It then plots object-checkN and checkN-checkM. The standard plotting mode is the native X format of GNUPlot. This (and many other behaviors) can be modified through the use of command-line options. Type "pegaplot -h" for details. NOTE: as the help screen says, any options to pegaplot MUST come before the input file, and the input file MUST ALWAYS be LAST on the command line.

3.2.8 ccdfname

-- ccdfname (the script, not the IRAF task)
usage: ccdfname [dir_name]
description: Will create an IRAF script to change the CCDFNAME keyword values in all the files in dir_name. This is desirable when the keyword value is of the form image0001.imh, includes a path, or is otherwise not in the standard form of [cc]yymmdd.xxx.
restrictions: This script will only work properly on FITS images with a .fits extension and when the IRAF imtype includes "fits". NOTE: THIS CHANGES THE ORIGINAL FITS FILE. MAKE SURE A BACKUP OF THE DATA EXISTS BEFORE RUNNING. Also, this is one of the least straightforward scripts I have written, due to its multi-part nature. This was necessary since IRAF scripting is currently in an immature stage. USE THIS ONLY IF YOU REALLY NEED TO. YOU HAVE BEEN WARNED.
procedure:
1) Add the .fits extensions to the files in question (if necessary).
2) Run the ccdfname script to produce the output file "ccdfname". If running it from within IRAF, make sure to prepend it with an exclamation point ("!ccdfname") so IRAF passes it to the shell instead of looking for a task named "ccdfname".
3) Run the just-created IRAF script "ccdfname" from within IRAF. Make sure you are in the correct directory and run the script like so:

    cl> cl < ccdfname

where "cl> " is the IRAF command prompt.
4) Delete the file "ccdfname" (if necessary).
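The generated "ccdfname" file presumably amounts to one header edit per image, along these lines (an illustration only, with a made-up filename; hedit is the standard IRAF header-editing task):

    cl> hedit 20001027.001.fits CCDFNAME "20001027.001" add+ ver- show+ update+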
3.2.9 Post-calibration

-- sort-object
usage: sort-object file.inf_name dir_name [file.inf_location]
description: From within the "filter" directory, will move processed files to their respective "object" directory. This assumes that the object directory already exists. Also, the file.inf file MUST exist. Its usual location is one directory up (the "night" directory), but its location can be specified on the command line. The file.inf_name is the name of the object in file.inf, dir_name is the object-named directory, and file.inf_location is the optional location of file.inf. If file.inf is in the current directory, the command line for moving the image files for 1510-089 to its proper directory named ./pks1510-089/ is:

    ]$ sort-object 1510-089 pks1510-089 ./file.inf

-- mv-object
usage: mv-object object run
description: Moves processed files from where they were calibrated:

    ./run_date/night/filter/object_name/

to

    ./objects/object_name/filter/run_date/night/

while sitting in the base directory of the calibration drive (currently /nfs/compton2/pega_data/). As an example, running:

    ..._data]$ mv-object pks1510-089 200206

would move all the processed files in:

    /nfs/compton2/pega_data/200206/20020601/r/pks1510-089/

to:

    /nfs/compton2/pega_data/objects/pks1510-089/r/200206/20020601/

provided data was taken for PKS 1510-089 on only the one night. If there were multiple nights, they would also be sorted properly.

3.3 Modification of Scripts

The above scripts are subject to change from time to time. Some of the more immediate changes will be the removal of the "fits-add" and "fits-rem" scripts, as the "fits" script does all they do. The same will eventually happen to the num* and zero* scripts. These changes will be noted here, so if you have any trouble with the scripts, check here first to see if they have been modified.

3.4 IDL Setup

3.4.1 Setup Scripts

There exists a pair of scripts in the global IDL directory for setting up IDL. They are:

    /usr/local/etc/idl_setup
    /usr/local/etc/idl_setup.ksh

"idl_setup" is specifically for csh and "idl_setup.ksh" is for bash. In order to use IDL, you will need to source one of those files depending on which shell you are currently in (type "echo $SHELL" to see which).

bash:

    ...]$ source /usr/local/etc/idl_setup.ksh
or
    ...]$ . /usr/local/etc/idl_setup.ksh

csh:

    ~]$ source /usr/local/etc/idl_setup

NOTE: If you will be using astromit under IDL, you will need to use the custom PEGA IDL setup script:

    /usr/local/PEGA/bin/idl_setup.ksh

This is the preferred setup script to use under bash as it corrects a file duplication that will cause complications for astromit. REMEMBER: CCDPHOT needs to be run under csh, while ASTROMIT needs to be run under bash with the custom setup file.

3.4.2 Running the Setup Scripts From Startup Files

The above scripts can be sourced from your startup files so that IDL is available upon login.

bash: Add the following to the .bashrc file located in your home directory:

    if [ -f /usr/local/PEGA/bin/idl_setup.ksh ] ; then
        . /usr/local/PEGA/bin/idl_setup.ksh
    fi

csh: Add the following to the .cshrc file located in your home directory:

    if ( -f /usr/local/etc/idl_setup ) then
        source /usr/local/etc/idl_setup
    endif

Once this is done, IDL will be accessible whenever you login.

4. Data Locations

From time to time, PEGA data locations are added or removed. Below is a current comprehensive list of these locations.
4.1 Data Stores

/nfs/bokN/pega_data/ (where N=1,2,3,4)
    General scratch space and temporary storage (disk2 currently contains the cdtemp directory).

/nfs/boltonN/pega_data/ (where N=1,2)
    General storage and work area. Most data being processed by object is here.

/nfs/compton1/pega_data/
    Raw data repository after processing on disk2 and before deletion. Eventual location of the cdtemp directory.

/nfs/compton2/pega_data/
    Processed data repository. Most data from the last 10+ years is here and is currently undergoing processing. This will eventually allow simple reduction of the data by object over the long term.

/usr/local/PEGA/data/
    A common location for all the above directories. The syntax is similar to the above directories, but not the same: the directory for each is the machine name followed immediately by the disk number (e.g. /usr/local/PEGA/data/compton2/ is /compton/disk2/pega_data/).

4.2 Other Media

The data also exists in other digital forms, from 9-track to 8mm EXA-Byte tapes, but not all of it is readily available. The primary backup for the raw data is on CD-ROM in my office (currently 712 1PPS).

5. Future Plans

This HOWTO contains only information that I have compiled myself, but it will eventually be expanded into a full-sized HOWTO by the addition of information from Elizabeth Ferrara. If anybody comes up with any good tips or important ideas, don't hesitate to bring them to my attention.

Added 20020609:

The reason for the recent changes is to try to keep all the data organized in a consistent fashion in preparation for a merging of many of the scripts mentioned here. I plan to construct a pipeline which will speed up the tedious aspects of the reduction process without losing sight of the individual steps necessary (e.g. zeroes, flats, etc.). This optimization will allow me to also streamline the analysis aspect of the reduction process and eventually create a web-based interface to access all the data. This is, of course, extremely ambitious, but the easier it is to organize the current database into something more useful, the easier it will be to realize the final goal.