Structuring your code

Why structure your code?

  • Organization
  • Code quality
  • Allows for easy collaboration
  • Makes your project reproducible!
  • Answers the question "What did I do???"

Ultimately...

other people will thank you...

and YOU will thank yourself!!

Getting started

You first need to install the requirements!


                        pip install cookiecutter
                        

cookiecutter is a command-line utility that creates projects from cookiecutters (project templates), e.g. Python package projects or jQuery plugin projects.
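To make the idea of a project template concrete, here is a toy sketch of what a tool like cookiecutter does: it walks a template folder and substitutes {{ cookiecutter.<key> }} placeholders in both file names and file contents. This is only an illustration, not cookiecutter's implementation (the real tool renders templates with Jinja2 and also handles prompts, defaults and hooks). The functions render() and generate() are hypothetical names.

```python
import re
import tempfile
from pathlib import Path

# Toy placeholder syntax, e.g. {{ cookiecutter.project_name }}
PLACEHOLDER = re.compile(r"\{\{\s*cookiecutter\.(\w+)\s*\}\}")


def render(text, context):
    """Replace every {{ cookiecutter.<key> }} in `text` with context[<key>]."""
    return PLACEHOLDER.sub(lambda match: str(context[match.group(1)]), text)


def generate(template_dir, output_dir, context):
    """Copy `template_dir` into `output_dir`, rendering paths and file contents."""
    template_dir, output_dir = Path(template_dir), Path(output_dir)
    for src in sorted(template_dir.rglob("*")):
        # Placeholders may also appear in folder and file names
        dst = output_dir / render(str(src.relative_to(template_dir)), context)
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
        else:
            dst.parent.mkdir(parents=True, exist_ok=True)
            dst.write_text(render(src.read_text(), context))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A one-file template whose top folder name is itself a placeholder
        project = Path(tmp, "template", "{{ cookiecutter.repo_name }}")
        project.mkdir(parents=True)
        (project / "README.md").write_text("# {{ cookiecutter.project_name }}\n")

        generate(Path(tmp, "template"), Path(tmp, "out"),
                 {"repo_name": "my_project", "project_name": "My Project"})
        print(Path(tmp, "out", "my_project", "README.md").read_text())
```

With the real tool you never write this yourself; the sketch only shows why one template can produce many projects with the same layout.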

Starting a new project

Starting a new project is really easy! You don't need to create a directory first!


                        cookiecutter https://github.com/drivendata/cookiecutter-data-science
                        

or use my version:


                        cookiecutter https://github.com/vcalderon2009/cookiecutter-data-science
                        

See:
http://vanderbilt-astro-starting-grad-school.readthedocs.io/en/latest/structuring_your_code.html

See also the template's documentation:
https://drivendata.github.io/cookiecutter-data-science/

Folder structure

[Figure: folder structure]
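Once a project has been generated, you can inspect its folder structure from Python with a small recursive walk over pathlib. This is a minimal sketch; tree_lines() is a hypothetical helper and is not part of cookiecutter. If the shell command tree is installed, it does the same job.

```python
from pathlib import Path


def tree_lines(folder, prefix="", depth=1, max_depth=3):
    """Return the lines of a `tree`-like listing of `folder` (directories first)."""
    folder = Path(folder)
    lines = []
    if depth > max_depth:
        return lines
    # Directories before files, each group sorted alphabetically
    entries = sorted(folder.iterdir(), key=lambda p: (not p.is_dir(), p.name))
    for i, entry in enumerate(entries):
        last = i == len(entries) - 1
        lines.append(prefix + ("└── " if last else "├── ") + entry.name)
        if entry.is_dir():
            # Continue the vertical bar only if more siblings follow
            lines += tree_lines(entry, prefix + ("    " if last else "│   "),
                                depth + 1, max_depth)
    return lines
```

For example, print("\n".join(tree_lines("path/to/your/project"))) prints the top three levels of your project.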

Good coding and project management

The guidelines below are opinions that grew out of experience with what works, and what doesn't, when collaborating on projects.

  • Data is immutable
  • Notebooks are for exploration and communication
  • Analysis is a DAG (directed acyclic graph) → use Makefiles
  • Build from the environment up
  • Keep secrets and configuration out of version control
  • Be conservative in changing the default folder structure
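Treating the analysis as a DAG means every step declares which earlier steps it depends on, and a tool such as make works out a valid order in which to run them. As a sketch of just the ordering part, Python's standard-library graphlib (Python 3.9+) can topologically sort a pipeline. The file names below are hypothetical, chosen to mirror the template's data/raw, data/interim, data/processed, models and reports folders.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each output maps to the inputs it depends on,
# much like targets and prerequisites in a Makefile.
pipeline = {
    "data/interim/clean.csv": {"data/raw/survey.csv"},
    "data/processed/features.csv": {"data/interim/clean.csv"},
    "models/model.pkl": {"data/processed/features.csv"},
    "reports/figures/results.pdf": {"models/model.pkl",
                                    "data/processed/features.csv"},
}

# static_order() yields every node after all of its dependencies and
# raises graphlib.CycleError if the graph is not actually acyclic.
order = list(TopologicalSorter(pipeline).static_order())
for step in order:
    print(step)
```

make adds what this sketch leaves out: it compares file timestamps and reruns only the steps whose inputs changed.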

And always use version control!
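Keeping secrets and configuration out of version control usually means reading them at run time from environment variables, optionally loaded from an untracked .env file listed in .gitignore, instead of hard-coding them in a script. A minimal standard-library sketch follows. load_env_file() is a hypothetical, simplified stand-in for the python-dotenv package (no quoting or escaping rules), and get_secret() is likewise a hypothetical helper.

```python
import os


def load_env_file(path=".env"):
    """Copy KEY=VALUE lines from an untracked file into os.environ.

    Simplified stand-in for python-dotenv: no quoting or escaping rules,
    and variables that are already set are NOT overridden.
    """
    if not os.path.exists(path):
        return
    with open(path) as env_file:
        for line in env_file:
            line = line.strip()
            # Skip blank lines, comments and lines without '='
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            os.environ.setdefault(key.strip(), value.strip())


def get_secret(name):
    """Return the value of environment variable `name`, failing loudly if unset."""
    value = os.environ.get(name)
    if value is None:
        # No hard-coded fallback: a default secret would end up in version control
        raise RuntimeError("Set the {0} environment variable first.".format(name))
    return value
```

For example, put DATABASE_PASSWORD=... (an example name) in a .env file that git ignores, call load_env_file() at the top of your script, and then call get_secret("DATABASE_PASSWORD").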

Writing a good script

Why?

  • Easy to read
  • Easy to execute
  • Easy to debug
  • Linting flags unnecessary portions of the code (e.g. unused imports)

Easy to understand
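Linters such as pyflakes or flake8 report problems like imports that are never used. Many of the imports in the template script below are not used yet. As a toy illustration of how such a check works, and not as a replacement for a real linter, the standard-library ast module can list imported names that never appear in the code. unused_imports() is a hypothetical helper.

```python
import ast
from typing import List


def unused_imports(source: str) -> List[str]:
    """Return imported names that are never referenced in `source` (toy check)."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds the name `a`; `import a.b as c` binds `c`
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module != "__future__":
            for alias in node.names:
                if alias.name != "*":
                    imported.add(alias.asname or alias.name)
    # Every bare name used anywhere, e.g. `sys` in `sys.argv`
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


example = """
import os
import sys
import numpy as np
from argparse import ArgumentParser

print(sys.argv, np.pi)
"""
print(unused_imports(example))  # → ['ArgumentParser', 'os']
```

A real linter also handles scoping, names exported through __all__, and many other cases that this toy ignores.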

Writing a good script


                        #! /usr/bin/env python
                        # -*- coding: utf-8 -*-

                        # AUTHOR
                        # Created      : DATE
                        # Last Modified: DATE
                        # UNIVERSITY OR PLACE
                        """
                        Short description of what this script does.
                        """
                        # The module docstring goes first; `from __future__` imports must precede all other code
                        from __future__ import absolute_import, division, print_function
                        __author__     = ['AUTHOR']
                        __copyright__  = ["Copyright 2018 AUTHOR, "]
                        __email__      = ['EMAIL ADDRESS']
                        __maintainer__ = ['AUTHOR']
                        # Importing Modules
                        from cosmo_utils       import mock_catalogues as cm
                        from cosmo_utils       import utils           as cu
                        from cosmo_utils.utils import file_utils      as cfutils
                        from cosmo_utils.utils import file_readers    as cfreaders
                        from cosmo_utils.utils import work_paths      as cwpaths
                        from cosmo_utils.utils import stats_funcs     as cstats
                        from cosmo_utils.utils import geometry        as cgeom
                        from cosmo_utils.mock_catalogues import catls_utils as cmcu

                        import numpy as np
                        import math
                        import os
                        import sys
                        import pandas as pd
                        import pickle
                        import matplotlib
                        matplotlib.use('Agg')  # non-interactive backend; set it before importing pyplot
                        import matplotlib.pyplot as plt
                        import matplotlib.ticker as ticker
                        plt.rc('text', usetex=True)  # requires a working LaTeX installation
                        import seaborn as sns
                        #sns.set()
                        from progressbar import (Bar, ETA, FileTransferSpeed, Percentage, ProgressBar,
                                                ReverseBar, RotatingMarker)
                        from tqdm import tqdm

                        # Extra-modules
                        import argparse
                        from argparse import ArgumentParser
                        from argparse import HelpFormatter
                        from operator import attrgetter

                        ## Functions
                        class SortingHelpFormatter(HelpFormatter):
                            def add_arguments(self, actions):
                                """
                                Modifier for `argparse` help parameters, that sorts them alphabetically
                                """
                                actions = sorted(actions, key=attrgetter('option_strings'))
                                super(SortingHelpFormatter, self).add_arguments(actions)

                        def _str2bool(v):
                            """Convert strings like 'yes'/'no' or '1'/'0' into booleans (for argparse)."""
                            if v.lower() in ('yes', 'true', 't', 'y', '1'):
                                return True
                            elif v.lower() in ('no', 'false', 'f', 'n', '0'):
                                return False
                            else:
                                raise argparse.ArgumentTypeError('Boolean value expected.')

                        def _check_pos_val(val, val_min=0):
                            """
                            Checks if value is larger than `val_min`

                            Parameters
                            ----------
                            val: int or float
                                value to be evaluated by `val_min`

                            val_min: float or int, optional (default = 0)
                                minimum value that `val` can be

                            Returns
                            -------
                            ival: float
                                value if `val` is larger than `val_min`

                            Raises
                            -------
                            ArgumentTypeError: Raised if `val` is NOT larger than `val_min`
                            """
                            ival = float(val)
                            if ival <= val_min:
                                msg  = '`{0}` is an invalid input! '.format(ival)
                                msg += '`val` must be larger than `{0}`!!'.format(val_min)
                                raise argparse.ArgumentTypeError(msg)

                            return ival

                        def get_parser():
                            """
                            Get parser object for this script and parse the command-line input.

                            Returns
                            -------
                            args: argparse.Namespace
                                input arguments to the script
                            """
                            ## Define parser object
                            description_msg = 'Description of Script'
                            parser = ArgumentParser(description=description_msg,
                                                    formatter_class=SortingHelpFormatter,)
                            ## 
                            parser.add_argument('--version', action='version', version='%(prog)s 1.0')
                            parser.add_argument('-namevar', '--long-name',
                                                dest='variable_name',
                                                help='Description of variable',
                                                type=float,
                                                default=0)
                            ##
                            parser.add_argument('-namevar1', '--long-name1',
                                                dest='variable_name1',
                                                help='Description of variable',
                                                type=_check_pos_val,
                                                default=0.1)
                            ## `Perfect Catalogue` Option
                            parser.add_argument('-namevar2', '--long-name2',
                                                dest='variable_name2',
                                                help='Description of variable',
                                                type=_str2bool,
                                                default=False)
                            ## Program message
                            parser.add_argument('-progmsg',
                                                dest='Prog_msg',
                                                help='Program message to use throughout the script',
                                                type=str,
                                                default=cfutils.Program_Msg(__file__))
                            ## Parsing Objects
                            args = parser.parse_args()

                            return args

                        def param_vals_test(param_dict):
                            """
                            Checks if values are consistent with each other.

                            Parameters
                            -----------
                            param_dict: python dictionary
                                dictionary with `project` variables

                            Raises
                            -----------
                            ValueError: Error
                                This function raises a `ValueError` error if one or more of the 
                                required criteria are not met
                            """
                            ##
                            ## This is where the tests for `param_dict` input parameters go.

                        def is_tool(name):
                            """Check whether `name` is on PATH and marked as executable."""

                            # from whichcraft import which
                            from shutil import which

                            return which(name) is not None

                        def add_to_dict(param_dict):
                            """
                            Aggregates extra variables to dictionary

                            Parameters
                            ----------
                            param_dict: python dictionary
                                dictionary with input parameters and values

                            Returns
                            ----------
                            param_dict: python dictionary
                                dictionary with old and new values added
                            """
                            # This is where you define `extra` parameters for adding to `param_dict`.

                            return param_dict

                        def directory_skeleton(param_dict, proj_dict):
                            """
                            Creates the directory skeleton for the current project

                            Parameters
                            ----------
                            param_dict: python dictionary
                                dictionary with `project` variables

                            proj_dict: python dictionary
                                dictionary with info of the project that uses the
                                `Data Science` Cookiecutter template.

                            Returns
                            ---------
                            proj_dict: python dictionary
                                Dictionary with current and new paths to project directories
                            """
                            ## In here, you define the directories of your project

                            return proj_dict



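                        def main(args):
                            """
                            Main function of the script: checks and extends the input
                            parameters, sets up the project paths, and prints the parameters.
                            """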
                            ## Reading all elements and converting to python dictionary
                            param_dict = vars(args)
                            ## Checking for correct input
                            param_vals_test(param_dict)
                            ## Adding extra variables
                            param_dict = add_to_dict(param_dict)
                            ## Program message
                            Prog_msg = param_dict['Prog_msg']
                            ##
                            ## Creating Folder Structure
                            # proj_dict  = directory_skeleton(param_dict, cwpaths.cookiecutter_paths(__file__))
                            proj_dict  = directory_skeleton(param_dict, cwpaths.cookiecutter_paths('./'))
                            ##
                            ## Printing out project variables
                            print('\n'+50*'='+'\n')
                            for key, key_val in sorted(param_dict.items()):
                                if key !='Prog_msg':
                                    print('{0} `{1}`: {2}'.format(Prog_msg, key, key_val))
                            print('\n'+50*'='+'\n')


                        # Main function
                        if __name__=='__main__':
                            ## Input arguments
                            args = get_parser()
                            # Main Function
                            main(args)

                        

See this file here

This file assumes that you have a fixed environment!!

This file includes:

  • argparse: a parser for command-line arguments
  • Folder structure
  • Testing of the input parameters
  • Functions for easy import
  • and more...

You can start using this structure for your future projects.
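One practical benefit of the argparse structure is that you can test the parser without a terminal, because parse_args() accepts an explicit list of strings. The sketch below is a trimmed, dependency-free variant of the template's get_parser(). It drops the cosmo_utils-based -progmsg option and takes argv as a parameter, both changes made only so that it runs standalone, and it shows how the type= validators reject bad input.

```python
import argparse
from argparse import ArgumentParser


def _str2bool(v):
    """Convert strings like 'yes'/'no' or '1'/'0' into booleans."""
    if v.lower() in ("yes", "true", "t", "y", "1"):
        return True
    if v.lower() in ("no", "false", "f", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("Boolean value expected.")


def _check_pos_val(val, val_min=0):
    """Return `val` as a float if it is larger than `val_min`."""
    ival = float(val)
    if ival <= val_min:
        raise argparse.ArgumentTypeError(
            "`{0}` is an invalid input! It must be larger than `{1}`!".format(ival, val_min))
    return ival


def get_parser(argv=None):
    """Parse `argv` (sys.argv[1:] when None) into an argparse.Namespace."""
    parser = ArgumentParser(description="Description of Script")
    parser.add_argument("-namevar1", "--long-name1", dest="variable_name1",
                        type=_check_pos_val, default=0.1,
                        help="Description of variable")
    parser.add_argument("-namevar2", "--long-name2", dest="variable_name2",
                        type=_str2bool, default=False,
                        help="Description of variable")
    return parser.parse_args(argv)


print(vars(get_parser(["--long-name1", "2.5", "-namevar2", "yes"])))
```

When argv is None, argparse falls back to sys.argv[1:], so the script still behaves normally when run from the command line. An invalid value such as --long-name1 0 makes argparse print a usage error and exit with status 2.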

Environment

A project must always have a defined environment!

environment.yml

You can create and activate the environment with:


                        # Install environment
                        conda env create -f environment.yml
                        
                        # Activate environment (older conda versions: `source activate <environment>`)
                        conda activate <environment>
                        

Environment


                        name: name_environment

                        channels:
                          - defaults

                        dependencies:
                          - python>=3.6
                          - ipython
                          - anaconda
                          - astropy
                          - h5py
                          - numpy
                          - pandas>=0.19.0
                          - scipy
                          - seaborn>=0.8.1
                          - scikit-learn>=0.19.1
                          - sphinx
                          - pip
                          - pip:
                            - GitPython
                            - progressbar2
                            - sphinx_rtd_theme
                            - tqdm
                            - tables
                        

This is an example environment.yml file.

You should put this in the main directory of your project.

See this file here!
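Before running a script, it can help to check that the active environment actually provides the modules your code imports. Here is a minimal sketch using the standard library's importlib.util.find_spec. Import names often differ from the package names in environment.yml: scikit-learn is imported as sklearn, GitPython as git, and progressbar2 as progressbar. For that reason, the list below uses import names. missing_modules() is a hypothetical helper, and REQUIRED is only an example list.

```python
import importlib.util

# Import names (not package names) for some of the dependencies in environment.yml
REQUIRED = ["numpy", "pandas", "scipy", "astropy", "h5py", "seaborn",
            "sklearn", "git", "progressbar", "tqdm", "tables"]


def missing_modules(names):
    """Return the names in `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

If anything is missing, the usual cause is that the project's environment is not activated.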

Exercise

Create a project repository with the following parameters:

  • Author - Your name
  • License - MIT
  • Project name - Fall_2018_Bootcamp_project

Do this in your 2018_Bootcamp folder.

Resources

For more information, go to:

Back to main website:

https://tinyurl.com/bcb18