Removed the copy of README.md in the header of the .py file

jgagneastro · Feb 1, 2018 · b393bc0
1 parent 7bc44a4
Showing 1 changed file with 1 addition and 190 deletions.
diff --git a/banyan_sigma.py b/banyan_sigma.py
@@ -1,194 +1,5 @@
 """
- NAME:
-       BANYAN_SIGMA
-
- PURPOSE:
-       Calculate the membership probability that a given astrophysical object belongs to one of the currently
-       known 27 young associations within 150 pc of the Sun, using Bayesian inference. This tool uses the sky
-       position and proper motion measurements of an object, with optional radial velocity (RV) and distance (D)
-       measurements, to derive a Bayesian membership probability. By default, the priors are adjusted such that
-       a probability threshold of 90% will recover 50%, 68%, 82% or 90% of true association members depending on
-       what observables are input (only sky position and proper motion, with RV, with D, with both RV and D,
-       respectively).
-       
-       Please see Gagné et al. 2017 (ApJ, XX, XX) for more detail.
-       
-       An online version of this tool is available for 1-object queries at http://www.astro.umontreal.ca/~gagne/banyansigma.php.
-       
- REQUIREMENTS:
-       (1) A fits file containing the parameters of the multivariate Gaussian models of each Bayesian hypothesis must be included
-           at /data/banyan_sigma_parameters.fits in the directory where BANYAN_SIGMA() is compiled. 
-           The fits file can be written with the IDL MWRFITS.PRO function from an IDL array of structures of N elements, where N is the total
-           number of multivariate Gaussians used in the models of all Bayesian hypotheses. Each element of this structure contains
-           the following information:
-           NAME - The name of the model (scalar string).
-           CENTER_VEC - Central XYZUVW position of the model (6D vector, in units of pc and km/s).
-           COVARIANCE_MATRIX - Covariance matrix in XYZUVW associated with the model (6x6 matrix, in mixed units of pc and km/s).
-           PRECISION_MATRIX - (Optional) Matrix inverse of COVARIANCE_MATRIX, to avoid re-calculating it many times (6x6 matrix).
-           LN_NOBJ - (Optional) Natural logarithm of the number of objects used to build the synthetic model (scalar). This is not used in banyan_sigma().
-           COVARIANCE_DETERM - (Optional) Determinant of the covariance matrix, to avoid re-calculating it many times (scalar).
-           PRECISION_DETERM - (Optional) Determinant of the precision matrix, to avoid re-calculating it many times (scalar).
-           LN_ALPHA_K - (Optional) Natural logarithm of the alpha_k inflation factors that ensured a fixed rate of true positives
-                        at a given Bayesian probability threshold. See Gagné et al. 2017 (ApJ, XX, XX) for more detail (scalar or 4-elements vector). This is not used in BANYAN_SIGMA.
-           LN_PRIOR - Natural logarithm of the Bayesian prior (scalar or 4-elements vector). When this is a 4-elements vector,
-                      the cases with only proper motion, proper motion + radial velocity, proper motion + distance or proper motion + radial velocity + distance
-                      will be used with the corresponding element of the LN_PRIOR vector.
-           LN_PRIOR_OBSERVABLES - Scalar string or 4-elements vector describing the observing modes used for each element of ln_prior.
-                                  This is not used in banyan_sigma().
-           COEFFICIENT - Coefficient (or weight) for multivariate Gaussian mixture models. This will only be used if more than one element of the
-                         parameters array have the same model name (see below).  
-           
-           In Python, this fits file is read with the Astropy.Tables routine.
-           
-           When more than one elements have the same model name, BANYAN_SIGMA will use the COEFFICIENTs to merge its Bayesian probability,
-           therefore representing the hypothesis with a multivariate Gaussian model mixture.
-           
-       (2) (Optional) A fits file containing the various performance metrics (true positive rate, false positive rate, positive
-           predictive value) as a function of the Bayesian probability threshold, for each young association. Each element of this structure contains
-           the following information:
-           NAME - The name of the model (scalar string).
-           PROBS - N-elements array containing a list of Bayesian probabilities (%).
-           TPR - Nx4-elements array containing the rate of true positives that correspond to each of the Bayesian probability (lower) thresholds stored in PROBS.
-           FPR - Nx4-elements array containing the rate of false positives that correspond to each of the Bayesian probability (lower) thresholds stored in PROBS.
-           PPV - Nx4-elements array containing the Positive Predictive Values that correspond to each of the Bayesian probability (lower) thresholds stored in PROBS.
-           NFP - Number of expected false positives (FPR times the ~7 million stars in the Besancon simulation of the Solar neighborhood)
-           Each component of the 4-elements dimension of TPR, FPR and PPV corresponds to a different mode of input data,
-           see the description of "LN_PRIOR" above for more detail.
-           
-           When this fits file is used, the Bayesian probabilities of each star will be associated with a TPR, FPR, NFP and PPV values in the METRICS sub-structure of
-           the output structure.
-           
-           This file must be located at /data/banyan_sigma_metrics.fits in the directory where BANYAN_SIGMA.pro is compiled.
-           
- CALLING SEQUENCE:
- 
-       OUTPUT_STRUCTURE = BANYAN_SIGMA(stars_data=None, column_names=None, hypotheses=None, ln_priors=None, ntargets_max=None, 
-           ra=None, dec=None, pmra=None, pmdec=None, epmra=None, epmdec=None, dist=None, edist=None, rv=None, erv=None,
-           psira=None, psidec=None, epsira=None, epsidec=None, plx=None, eplx=None,
-           constraint_dist_per_hyp=None, constraint_edist_per_hyp=None,
-          unit_priors=True/False, lnp_only=True/False, no_xyz=True/False, use_rv=True/False, use_dist=True/False, 
-          use_plx=True/False, use_psi=True/False)
-
- OPTIONAL INPUTS:
-       stars_data - A pandas DataFrame that contains at least the following information:
-                    ra, dec, pmra, pmdec, epmra, and epmdec. It can also optionally contain information on
-                    rv, erv, dist, edist, plx, eplx, psira, psidec, epsira, epsidec. See the corresponding keyword
-                    descriptions for more information.
-                    If this input is not used, the keywords ra, dec, pmra, pmdec, epmra, and epmdec must all be specified.
-       column_names - A Python dictionary that contains the names of the "stars_data" columns that differ from the
-                     default values listed above. For example, column_names = {'RA':'ICRS_RA'} can be used to specify that
-                     the RA values are listed in the column of stars_data named ICRS_RA.
-       ra - Right ascension (decimal degrees). A N-elements array can be specified to calculate the Bayesian probability of several
-            stars at once, but then all mandatory inputs must also be N-elements arrays.
-       dec - Declination (decimal degrees).
-       pmra - Proper motion in the right ascension direction (mas/yr, must include the cos(dec) factor).
-       pmdec - Proper motion in the declination direction (mas/yr).
-       epmra - Measurement error on the proper motion in the right ascension direction (mas/yr, must not include the cos(dec) factor).
-       epmdec -  Measurement error on the proper motion in the declination direction (mas/yr).
-       rv - Radial velocity measurement to be included in the Bayesian probability (km/s).
-            If this keyword is set, erv must also be set.
-            A N-elements array must be used if N stars are analyzed at once.
-       erv - Measurement error on the radial velocity to be included in the Bayesian probability (km/s).
-             A N-elements array must be used if N stars are analyzed at once.
-       dist - Distance measurement to be included in the Bayesian probability (pc).
-              By default, the banyan_sigma() Bayesian priors are meant for this keyword to be used with trigonometric distances only.
-              Otherwise, the rate of true positives may be far from the nominal values described in Gagné et al. (ApJS, 2017, XX, XX).
-              If this keyword is set, edist must also be set.
-              A N-elements array must be used if N stars are analyzed at once.
-       edist - Measurement error on the distance to be included in the Bayesian probability (pc).
-               A N-elements array must be used if N stars are analyzed at once.
-       plx - Parallax measurement to be included in the Bayesian probability (mas). The distance will be approximated with dist = 1000/plx.
-             If this keyword is set, eplx must also be set.
-             A N-elements array must be used if N stars are analyzed at once.
-       eplx - Measurement error on the parallax to be included in the Bayesian probability (mas).
-              The distance error will be approximated with edist = 1000/plx**2*eplx.
-              A N-elements array must be used if N stars are analyzed at once.
-       psira - Parallax motion factor PSIRA described in Gagné et al. (ApJS, 2017, XX, XX), in units of 1/yr.
-                If this keyword is set, the corresponding psidec, epsira and epsidec keywords must also be set.
-                This measurement is only useful when proper motions are estimated from two single-epoch astrometric
-                measurements. It captures the dependence of parallax motion as a function of distance, and allows
-                banyan_sigma() to shift the UVW center of the moving group models, which is equivalent to
-                correctly treating the input "proper motion" pmra, pmdec, epmra, epmdec as a true apparent motion.
-                This keyword should *not* be used if proper motions were derived from more than two epochs, or if
-                they were obtained from a full parallax solution.
-                A N-elements array must be used if N stars are analyzed at once.
-       psidec - Parallax motion factor psidec described in Gagné et al. (ApJS, 2017, XX, XX), in units of 1/yr.
-                 A N-elements array must be used if N stars are analyzed at once.
-       epsira - Measurement error on the parallax motion factor psira described in Gagné et al. (ApJS, 2017, XX, XX),
-                 in units of 1/yr. A N-elements array must be used if N stars are analyzed at once.
-       epsidec - Measurement error on the parallax motion factor psidec described in Gagné et al. (ApJS, 2017, XX, XX),
-                  in units of 1/yr. A N-elements array must be used if N stars are analyzed at once.
-       ntargets_max - (default 10^6). Maximum number of objects to run at once in BANYAN_SIGMA to avoid saturating the RAM.
-                      If more targets are supplied, banyan_sigma() is run over a loop of several batches of ntargets_max objects. 
-       hypotheses - The list of Bayesian hypotheses to be considered. They must all be present in the parameters fits file
-                    (See REQUIREMENTS #1 above).
-       ln_priors - A dictionary that contains the natural logarithm of Bayesian priors that should be *multiplied with the
-                   default priors* (use unit_priors=True if you want only ln_priors to be considered). The structure must contain the name
-                   of each hypothesis as keys, and the associated scalar value of the natural logarithm of the Bayesian prior for each key. 
-       constraint_dist_per_hyp - A pandas DataFrame that contains a distance constraint (in pc).
-                   Each of the Bayesian hypotheses must be included as keys and the distance must be specified as its
-                   associated scalar value. constraint_edist_per_hyp must also be specified if constraint_dist_per_hyp is specified.
-                   This keyword is useful for including spectro-photometric distance constraints that depend on the age of the young association or field.
-       constraint_edist_per_hyp - A pandas DataFrame that contains a measurement
-                   error on the distance constraint (in pc). Each of the Bayesian hypotheses must be included as keys and the
-                   distance error must be specified as its associated scalar value.  
-
- OPTIONAL INPUT KEYWORD:
-       unit_priors - If this keyword is set, all default priors are set to 1 (but they are still overridden by manual priors input with the keyword ln_priors).
-       lnp_only - If this keyword is set, only Bayesian probabilities will be calculated and returned.
-       no_xyz - If this keyword is set, the width of the spatial components of the multivariate Gaussian will be widened by a large
-                 factor, so that the XYZ components are effectively ignored. This keyword must be used with extreme caution as it will
-                 generate a significant number of false-positives and confusion between the young associations.
-       use_rv - Use any radial velocity values found in the stars_data input structure.
-       use_dist - Use any distance values found in the stars_data input structure.
-       use_plx - Use any parallax values found in the stars_data input structure.
-       use_psi - Use any psira, psidec values found in the stars_data input structure.
-
- OUTPUT:
-      This routine outputs a pandas DataFrame, with the following keys:
-      NAME - The name of the object (as taken from the input structure).
-      ALL - A structure that contains the Bayesian probability (0 to 1) for each of the associations (as individual keys).
-      METRICS - A structure that contains the performance metrics associated with the global Bayesian probability of this target.
-                This sub-structure contains the following keys:
-        TPR - Rate of true positives expected in a sample of objects that have a Bayesian membership probability at least as large as that of the target.
-        FPR - Rate of false positives (from the field) expected in a sample of objects that have a Bayesian membership probability at least as large as that of the target.
-        PPV - Positive Predictive Value (sample contamination) expected in a sample of objects that have a Bayesian membership probability at least as large as that of the target.
-      [ASSOCIATION_1] - Sub-structure containing the relevant details for association [ASSOCIATION_1].
-      [ASSOCIATION_2] - (...).
-      (...).
-      [ASSOCIATION_N] - (...).
-                        These sub-structures contain the following keys:
-        HYPOTHESIS - Name of the association
-        PROB - Bayesian probability (0 to 1)
-        D_OPT - Optimal distance (pc) that maximizes the Bayesian likelihood for this hypothesis.
-        RV_OPT - Optimal radial velocity (km/s) that maximizes the Bayesian likelihood for this hypothesis.
-        ED_OPT - Error on the optimal distance (pc), which approximates the 68% width of how the likelihood varies with distance.
-        ERV_OPT - Error on the optimal radial velocity (km/s), which approximates the 68% width of how the likelihood varies with
-                  radial velocity.
-        XYZUVW - 6-dimensional array containing the XYZ and UVW position of the star at the measured radial velocity and/or
-                 distance, or the optimal radial velocity and/or distance when the first are not available (units of pc and km/s).
-        EXYZUVW - Errors on XYZUVW (units of pc and km/s).
-        XYZ_SEP - Separation between the optimal or measured XYZ position of the star and the center of the multivariate Gaussian
-                  model of this Bayesian hypothesis (pc).
-        UVW_SEP - Separation between the optimal or measured UVW position of the star and the center of the multivariate Gaussian
-                  model of this Bayesian hypothesis (km/s).
-        XYZ_SIG - N-sigma separation between the optimal or measured XYZ position of the star and the multivariate Gaussian model
-                  of this Bayesian hypothesis (no units).
-        UVW_SIG - N-sigma separation between the optimal or measured UVW position of the star and the multivariate Gaussian model
-                  of this Bayesian hypothesis (no units).
-        MAHALANOBIS - Mahalanobis distance between the optimal or measured XYZUVW position of the star and the multivariate Gaussian
-                      model. A Mahalanobis distance is a generalization of a 6D N-sigma distance that accounts for covariances. 
-      BESTYA_STR - A sub-structure similar to those described above for the most probable young association (ignoring the field possibility).
-      YA_PROB - The Bayesian probability (0 to 1) that this object belongs to any young association (i.e., excluding the field).
-      LIST_PROB_YAS - A list of young associations with at least 5% Bayesian probability. Their relative probabilities (%) are specified
-                      between parentheses.
-      BEST_HYP - Most probable Bayesian hypothesis (including the field)
-      BEST_YA - Most probable single young association.
-
- MODIFICATION HISTORY:
-       WRITTEN, Olivier Loubier, July 12, 2017
-       MODIFIED, Jonathan Gagne, October 25, 2017
-         Added several options, comments and header, performance and syntax modifications.
+View the README.md file for a full description of this code and how to use it.
 """
 
 #Import the necessary packages