PEGA-Data-Processing-Tips-HOWTO
John P. McFarland
v1.1, 2002-06-09
v1.0, 2000-10-31

The following is a run-down of many of the ways I have made life easier for myself and for the rest of the PEGA team. It has been updated to include recommended procedures necessary to keep the PEGA data archive as uniform and consistent as possible, and to include scripts created since the last version. Most of the changes are in the first section, labeled "Changes". If you have any questions, please feel free to ask.

----------------------------------------------------------------------------

Table of Contents

1. Changes
   1.1 Naming Conventions
   1.2 Directory Names and Hierarchies
   1.3 Procedures
       1.3.1 FITS Headers
       1.3.2 ccdfname
       1.3.3 file.inf
       1.3.4 imagetyp
       1.3.5 wfits
2. Processing Tips
   2.1 Automated wfits Renaming
   2.2 Using hselect to Pull Out Information From FITS Headers
   2.3 Customizing IRAF
   2.4 Using the FITS Image Type for Processing
3. Pre- and Post-Processing Scripts and Configuration
   3.1 Shell Configuration
   3.2 The Scripts
       3.2.1 fits
       3.2.2 num
       3.2.3 zero
       3.2.4 chpega
       3.2.5 pega-combine
       3.2.6 pegasort
       3.2.7 pegaplot
       3.2.8 ccdfname
       3.2.9 Post-calibration
   3.3 Modification of Scripts
   3.4 IDL Setup
       3.4.1 Setup Scripts
       3.4.2 Running the Setup Scripts From Startup Files
4. Data Locations
   4.1 Data Stores
   4.2 Other Media
5. Future Plans

1. Changes

It has come to my attention that several specifics in the processing pipeline need to be updated. Below is a preliminary list of these updates.

1.1 Naming Conventions

Many of the older files that have been reprocessed or otherwise manipulated, and many of the newer files coming in, are still in the old yymmdd.xxx format. This is not desirable. In order to keep the entire archive in a uniform state, they must be transformed into the standard ccyymmdd.xxx format. Don't worry, there is a short script (add20) which will do just this. It will add 20 (or 19 for files acquired in the last century) to the beginning of all filenames. Just run "add20 directory" on a whole directory; a sketch of the idea follows.
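For reference, here is a minimal sketch of the idea behind add20, assuming bash; the installed add20 script is the authoritative version and may differ in detail:

    #!/bin/bash
    # add20 (sketch): prepend the century to yymmdd.xxx filenames so
    # they become ccyymmdd.xxx.  The "20" prefix is hard-coded here;
    # use "19" for data acquired in the last century.
    dir=${1:-.}
    for f in "$dir"/[0-9][0-9][0-9][0-9][0-9][0-9].*; do
        [ -e "$f" ] || continue              # skip if nothing matched
        mv "$f" "$dir/20$(basename "$f")"
    done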
1.2 Directory Names and Hierarchies

1) In addition to the filenames, the directories containing these files should be brought up to date. Since there are far fewer directories in a run than images in a night, the "mv" command can be used to manually change the directory names, even if there are files contained within. Simply type:

    ]$ mv source_dir dest_dir

or

    ]$ mv yymmdd ccyymmdd

in the case of PEGA data directories.

2) In addition to the naming conventions of the directories and files, a specific directory hierarchy needs to be used to streamline the calibration process. The general hierarchy is as follows:

    /nfs/compton2/pega_data/run_date/night/filter/object_name

This hierarchy is used by the "mv-object" script (detailed below) to create the proper objects directory hierarchy:

    /nfs/compton2/pega_data/objects/object_name/filter/run_date/night

After processing, the frames for a particular object are put into their correctly named object directory (to avoid duplication in the objects directory). By running the mv-object script, the files in these object directories are properly moved to their respective directories in the objects hierarchy.

A detailed directory listing for an example run follows:

    /nfs/compton2/pega_data/200206/
    /nfs/compton2/pega_data/200206/20020601/
    /nfs/compton2/pega_data/200206/20020601/zero/
    /nfs/compton2/pega_data/200206/20020601/flat/
    /nfs/compton2/pega_data/200206/20020601/r/
    /nfs/compton2/pega_data/200206/20020601/r/pks1510-089/
    /nfs/compton2/pega_data/200206/20020601/r/rawdata/

By running:

    ]$ mv-object pks1510-089 200206

from /nfs/compton2/pega_data/, the calibrated frames for pks1510-089 will be moved to:

    /nfs/compton2/pega_data/objects/pks1510-089/r/200206/20020601/

1.3 Procedures

Below are some revised procedures which will help with data analysis and post-processing.

1.3.1 FITS Headers

You must make sure that the FITS header in the raw data is as accurate as possible. This includes the file name, filters, and other keywords.

1) The standard keyword we use for the file name is CCDFNAME. To ensure that it is contained within the header, use the "ccdfname" script detailed below.

2) There are two filter name keywords. The current convention is that the FILTERS keyword holds a proprietary code, usually numerical, describing the filter wheel position of the filter in use. The FILTNAME keyword is more useful since it contains the letter code for a given filter. These keywords will hopefully already be present in the header; if not, they should be added. NOTE: the definitions of the keywords are subject to change, but use this convention for now.

1.3.2 ccdfname

The ccdfname script is used to help add the CCDFNAME keyword to the headers. It is a BASH script which creates an IRAF script which must currently be run in IRAF on the FITS images. The FITS images must have a .fits extension and IRAF must be set up to recognize FITS images as valid image types for this to work properly. For details, see the ccdfname description below. Hopefully soon, the ccdfname script will be updated to a fully-fledged IRAF script so that adding/correcting the CCDFNAME keyword will be a much simpler process.

1.3.3 file.inf

To allow the possibility of indexing all observations for immediate and future use, the file.inf file should be created. Once the headers are as complete and correct as possible (e.g. CCDFNAME in ccyymmdd.xxx format, etc.), this file should be made in IRAF with the following command:

    cl> hselect *.fits ccdfname,object,filters,filtname yes > file.inf

This is an abbreviated form of the command which can be found in /nfs/compton2/pega_data/file.inf.cmd

1.3.4 imagetyp

Contrary to what exists in most headers, the IMAGETYP keyword values should be lower case. They should be one of the following:

    dark
    flat
    object
    zero

The ccdtype parameter in IRAF parameter files and the IMAGETYP keyword in the image header should always match one of these.

1.3.5 wfits

In order to combat "data folding", in which high values (over 32767) in a 16-bit FITS image are folded to negative values (or some similar phenomenon), autoscaling can be used in the wfits task. This scaling is sometimes necessary since the data type used to process the images is capable of holding more information than a 16-bit FITS image. This scaling strives to keep the largest dynamic range in the data while squeezing it into a relatively small 16-bit package. If it is necessary to enable autoscaling, set both the scale and autoscale parameters to yes in the wfits parameter file.
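For example, the parameters can be set from the cl prompt before writing out a batch of images (a sketch; @ls1 and @proc are lists of the kind built in section 2.1 below):

    cl> wfits.scale = yes
    cl> wfits.autoscale = yes
    cl> wfits @ls1 @proc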
2. Processing Tips

2.1 Automated wfits Renaming

1) After the images are rfits'ed (rfits filename.ext "" ob), create a list of the original FITS images and .imh files. E.g.:

    cl> files 20001027.* > proc
    cl> files ob*.imh > ls1

This puts a list of the original FITS files in the file "proc" and the newly created .imh files into the file "ls1".

2) Modify the filenames in "proc" to the processed form in the following way. In vi (I know...), enter the command ":%s/27./27p./g" without the quotes to change the above examples from the form [cc]yymmdd.xxx to [cc]yymmddp.xxx (note the extra "p"). The general form is:

    :%s/old_string/new_string/g

This can also be done by hand in the text editor of your choice.

3) To rewrite the processed .imh files into the new file names, simply use the created lists with the wfits IRAF task:

    cl> wfits @ls1 @proc

These lists can be used in other IRAF tasks, such as "imarith".

2.2 Using hselect to Pull Out Information From FITS Headers

This couldn't be simpler. Just figure out which FITS keyword(s) you want listed and include them in the following command:

    cl> hselect filespec comma,separated,keywords yes

The filespec can be one file, a @list, or a template (*.*). The list will print to the standard output. It can also be redirected into a file:

    cl> hselect filespec comma,separated,keywords yes > file.inf

will put the output into the file "file.inf".

2.3 Customizing IRAF

These are some modifications to the standard IRAF setup that may make life a little easier.

1) In your login.cl, locate the line:

    #set imtype = "imh"

You can give this variable a comma-delimited list of the extensions IRAF should treat as valid image types. The first extension listed is the one used when an image file is rfits'ed. The line

    set imtype = "imh,fits,fit"

will cause IRAF to recognize *.imh, *.fits, and *.fit as valid image files while defaulting to the imh/pix file division. Note the removal of the "#" symbol, which is used to "comment out" a line in the login.cl file; your line may already be uncommented.

This can be very handy if you need to quickly view or change a header keyword, for example. All you need to do in this case is "fits add" a set of files, work with them in IRAF, then "fits rem" them (see the "fits" script in section 3.2.1). It saves on both disk space and file I/O time.

2) Since we changed over to the disk1, disk2, etc. hierarchy, you have no doubt been annoyed with the extra /nfs/ and disk?/ needed to access the directories. Here is a simple solution: environment variables.

    set bolton1  = "/nfs/bolton1/pega_data/"
    set bolton2  = "/nfs/bolton2/pega_data/"
    set compton2 = "/nfs/compton2/pega_data/"
    . . .

These lines in the "set" section of your login.cl will allow you to reference the pega_data directories far more simply, but there are some caveats. This is not a link; the variable will be expanded to its value (which means that "cd bolton1" goes to the directory /nfs/bolton1/pega_data/). This is the good part. The bad part is that if you want to add more to this path, you have to add a $ to cause it to expand properly ("cd bolton1$1308" changes directory to /nfs/bolton1/pega_data/1308). I also noticed that the command "ls" doesn't work this way, but "cd" does. Go figure. If the plain variable doesn't work, try the $ expansion as above or below.

This can also be expanded to include bash and csh (by adding it to your .bash_profile or .cshrc file, respectively). While in these shells, you need to *prefix* the variable with a $ like this:

    cd $compton2

to go to Compton's pega_data directory. Just look in my .bash_profile or .cshrc for examples. And feel free to look into my login.cl file for IRAF stuff too (i.e. "more ~jpm/iraf/login.cl").
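For the shell side, the definitions might look like this (a sketch; adapt the variable names and paths to your own setup):

    # bash -- in ~/.bash_profile
    export compton2=/nfs/compton2/pega_data

    # csh -- in ~/.cshrc
    setenv compton2 /nfs/compton2/pega_data

after which "cd $compton2" works in either shell.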
2.4 Using the FITS Image Type for Processing

Reading in images via rfits has the advantage that you never modify the original images. But when your images are 1024x1024 pixels or larger, the rfits process tends to be slow and can use up a tremendous amount of room. For example, I just reduced about a hundred 2048x2048 images, and the space used by all the images involved after the new ones were rfits'ed was about 3GB! I could have saved time and reduced disk usage by at least a third had I kept them in FITS format. The point of this section is just that: save as much time and disk space as possible while still maintaining data integrity. The trick is to use @lists in zerocombine, flatcombine, and ccdproc. It's quite simple actually.

Procedure still under testing . . .

3. Pre- and Post-Processing Scripts and Configuration

3.1 Shell Configuration

I decided to compile all my useful little scripts into one location so that we all can use them. They are located in /usr/local/PEGA/bin. For convenience, you can add this to your path as follows:

csh: add this after your PATH statement (if it exists) in your .cshrc:

    setenv PATH "${PATH}:/usr/local/PEGA/bin"

Or add /usr/local/PEGA/bin directly to your PATH statement.

bash: add this after your PATH statement (if it exists) in your .profile:

    PATH=$PATH:/usr/local/PEGA/bin

Or add /usr/local/PEGA/bin directly to your PATH statement.

3.2 The Scripts

Here is a list of the scripts and what they do:

3.2.1 fits

-- fits
usage: fits {add, rem, del} [dir_name or dir_list]
description: Adds or removes/deletes .fits extensions to/from filenames in one self-contained, easy-to-use script that handles multiple directories as easily as single ones.
examples:

    ]$ fits add

will add .fits extensions to all (*.???) files in the current directory.

    ]$ fits rem

will remove .fits extensions from all (*.fits) files in the current directory.

    ]$ fits del 20001027

will remove .fits extensions from all (*.fits) files in the directory "20001027". "del" is a synonym for "rem".

    ]$ fits add *

will add .fits extensions to all (*.???) files in all directories matching "*". It should not change files in the current directory.

3.2.2 num

-- num-add
usage: num-add dir_name
description: Replaces the non-contiguous numerical extension with a contiguous one which can then be reversed with num-rem. Useful in CCDPhot when the frame numbers jump around. Changes processed images only (e.g. 20000717p.001).

-- num-rem
usage: num-rem dir_name
description: Sister script to num-add. num-add creates a hidden file called .trans.tbl which stores the original extensions; num-rem restores those original extensions.

-- num-rep
usage: num-rep dir_name
description: Irrevocably replaces the non-contiguous image number extensions with a contiguous set. Changes processed images only (e.g. 20000717p.001).

3.2.3 zero

-- zero-rem
usage: to be run in the intended directory
description: Will convert those pesky 4-number extensioned files to 3-number extensions, e.g. 20000505.0001 -> 20000505.001

-- zero-add
usage: to be run in the intended directory
description: If, for some bizarre reason, you want to add that zero (for those times you have more than a thousand data files in one night ;), this converts them the other way and is included for completeness, e.g. 20000505.001 -> 20000505.0001

3.2.4 chpega

-- chpega
usage: chpega filespec
description: Changes group ownership and permissions recursively on the files/directories specified so that all PEGA group members can have full access to the files. It is a good idea to do this periodically on any directory tree you have added files to.
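In effect, it does something like the following (a sketch only; the actual group name used by the script is an assumption here):

    ]$ chgrp -R pega filespec    # give the PEGA group ownership
    ]$ chmod -R g+rwX filespec   # group read/write; X adds execute
                                 # only for directories (and files
                                 # that are already executable)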
3.2.5 pega-combine

-- pega-combine
usage: pega-combine filespec_of_.log_files
description: This script combines multiple .log files for use in creating an extended light curve. Concatenation of the files is insufficient since there is a chance the object name is different from file to file. This script overcomes that limitation. NOTE: changes in the naming of the check stars are not addressed.

3.2.6 pegasort

-- pegasort
usage: pegasort ccdphot_logfile_without_extension [no_header]
description: This script takes a CCDPhot output file and converts it to something more usable for the PEGA team. The logfile name (without the .log extension) is required, but the no_header argument is optional. Read the script for more information. NOTE: pegasort now handles a greater range of errors than previously; however, out-of-order check stars are still not handled properly.

3.2.7 pegaplot

-- pegaplot.gp
I have completed my plotting script for plotting PEGA-type data. It is in /usr/local/PEGA/gnuplot/pegaplot.gp. Just copy this file to your favorite location and it'll do the rest . . . (yeah, right!). Copy it over and read through it for instructions on what needs to be modified and how. All you will need is this and the data file to create good plots for transparencies and plots for papers in TeX and LaTeX. If you have problems that aren't addressed in the script itself, check the documentation, then check with me.

-- pegaplot
usage: pegaplot pegasorted_.txt_file
description: Unlike pegaplot.gp, pegaplot (the BASH script) analyzes the input file and determines ranges for both Julian Date and magnitude. It then plots object-checkN and checkN-checkM. The standard plotting mode is the native X format of GNUPlot. This (and many other behaviors) can be modified through the use of command-line options. Type "pegaplot -h" for details. NOTE: as the help screen says, any options to pegaplot MUST come before the input file, and the input file MUST ALWAYS be LAST on the command line.

3.2.8 ccdfname

-- ccdfname (the script, not the IRAF task)
usage: ccdfname [dir_name]
description: Will create an IRAF script to change the CCDFNAME keyword values in all the files in dir_name. This is desirable when the keyword value is of the form image0001.imh, includes a path, or is otherwise not in the standard form of [cc]yymmdd.xxx.
restrictions: This script will only work properly on FITS images with a .fits extension and when the IRAF imtype includes "fits". NOTE: THIS CHANGES THE ORIGINAL FITS FILE. MAKE SURE A BACKUP OF THE DATA EXISTS BEFORE RUNNING. Also, this is one of the least straightforward scripts I have written, due to its multi-part nature. This was necessary since IRAF scripting is currently in an immature stage. USE THIS ONLY IF YOU REALLY NEED TO. YOU HAVE BEEN WARNED.
procedure:
1) Add the .fits extensions to the files in question (if necessary).
2) Run the ccdfname script to produce the output file "ccdfname". If running it from within IRAF, make sure to prepend it with an exclamation point ("!ccdfname") so IRAF passes it to the shell instead of looking for a task named "ccdfname".
3) Run the just-created IRAF script "ccdfname" from within IRAF. Make sure you are in the correct directory and run the script like so:

    cl> cl < ccdfname

where "cl> " is the IRAF command prompt.
4) Delete the file "ccdfname" (if necessary).
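The generated "ccdfname" file presumably amounts to one header edit per image, along these lines (an illustration only, with a made-up filename; hedit is the standard IRAF header-editing task):

    cl> hedit 20001027.001.fits CCDFNAME "20001027.001" add+ ver- show+ update+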
3.2.9 Post-calibration

-- sort-object
usage: sort-object file.inf_name dir_name [file.inf_location]
description: From within the "filter" directory, will move processed files to their respective "object" directory. This assumes that the object directory already exists. Also, the file.inf file MUST exist. Its usual location is one directory up (the "night" directory), but its location can be specified on the command line. The file.inf_name is the name of the object in file.inf, dir_name is the object-named directory, and file.inf_location is the optional location of file.inf. If file.inf is in the current directory, the command line for moving the image files for 1510-089 to its proper directory named ./pks1510-089/ is:

    ]$ sort-object 1510-089 pks1510-089 ./file.inf

-- mv-object
usage: mv-object object run
description: Moves processed files from where they were calibrated:

    ./run_date/night/filter/object_name/

to

    ./objects/object_name/filter/run_date/night/

while sitting in the base directory of the calibration drive (currently /nfs/compton2/pega_data/). As an example, running:

    ..._data]$ mv-object pks1510-089 200206

would move all the processed files in:

    /nfs/compton2/pega_data/200206/20020601/r/pks1510-089/

to:

    /nfs/compton2/pega_data/objects/pks1510-089/r/200206/20020601/

provided data was taken for PKS 1510-089 on only the one night. If there were multiple nights, they would also be sorted properly.

3.3 Modification of Scripts

The above scripts are subject to change from time to time. Some of the more immediate changes will be the removal of the "fits-add" and "fits-rem" scripts, as the "fits" script does all they do. The same will eventually happen to the num* and zero* scripts. These changes will be noted here, so if you have any trouble with the scripts, check here first to see if they have been modified.

3.4 IDL Setup

3.4.1 Setup Scripts

There exists a pair of scripts in the global IDL directory for setting up IDL. They are:

    /usr/local/etc/idl_setup
    /usr/local/etc/idl_setup.ksh

"idl_setup" is specifically for csh and "idl_setup.ksh" is for bash. In order to use IDL, you will need to source one of those files depending on which shell you are currently in (type "echo $SHELL" to see which).

bash:

    ...]$ source /usr/local/etc/idl_setup.ksh
or
    ...]$ . /usr/local/etc/idl_setup.ksh

csh:

    ~]$ source /usr/local/etc/idl_setup

NOTE: If you will be using astromit under IDL, you will need to use the custom PEGA IDL setup script:

    /usr/local/PEGA/bin/idl_setup.ksh

This is the preferred setup script to use under bash as it corrects a file duplication that will cause complications for astromit. REMEMBER: CCDPHOT needs to be run under csh, while ASTROMIT needs to be run under bash with the custom setup file.

3.4.2 Running the Setup Scripts From Startup Files

The above scripts can be sourced from your startup files so that IDL is available upon login.

bash: Add the following to the .bashrc file located in your home directory:

    if [ -f /usr/local/PEGA/bin/idl_setup.ksh ] ; then
        . /usr/local/PEGA/bin/idl_setup.ksh
    fi

csh: Add the following to the .cshrc file located in your home directory:

    if ( -f /usr/local/etc/idl_setup ) then
        source /usr/local/etc/idl_setup
    endif

Once this is done, IDL will be accessible whenever you login.

4. Data Locations

From time to time, PEGA data locations are added or removed. Below is a current comprehensive list of these locations.
4.1 Data Stores

/nfs/bokN/pega_data/ (where N=1,2,3,4)
    General scratch space and temporary storage (disk2 currently contains the cdtemp directory).

/nfs/boltonN/pega_data/ (where N=1,2)
    General storage and work area. Most data being processed by object is here.

/nfs/compton1/pega_data/
    Raw data repository after processing on disk2 and before deletion. Eventual location of the cdtemp directory.

/nfs/compton2/pega_data/
    Processed data repository. Most data from the last 10+ years is here and is currently undergoing processing. This will eventually allow simple reduction of the data by object over the long term.

/usr/local/PEGA/data/
    A common location for all the above directories. The syntax is similar to the above directories, but not the same: the directory for each is the machine name followed immediately by the disk number (e.g. /usr/local/PEGA/data/compton2/ is /compton/disk2/pega_data/).

4.2 Other Media

The data also exists in other digital forms, from 9-track to 8mm EXA-Byte tapes, but not all of it is readily available. The primary backup for the raw data is on CD-ROM in my office (currently 712 1PPS).

5. Future Plans

This HOWTO contains only information that I have compiled myself, but it will eventually be expanded into a full-sized HOWTO by the addition of information from Elizabeth Ferrara. If anybody comes up with any good tips or important ideas, don't hesitate to bring them to my attention.

Added 20020609:

The reason for the recent changes is to try to keep all the data organized in a consistent fashion in preparation for a merging of many of the scripts mentioned here. I plan to construct a pipeline which will speed up the tedious aspects of the reduction process without losing sight of the individual steps necessary (e.g. zeroes, flats, etc.). This optimization will allow me to also streamline the analysis aspect of the reduction process and eventually create a web-based interface to access all the data. This is, of course, extremely ambitious, but the easier it is to organize the current database into something more useful, the easier it will be to realize the final goal.