exploRations
Creating your own project template

You often use the same tools and repeat the same steps across projects. Wouldn’t it be nice if much of this were set up by default? It would save you time and bring consistency to your projects. To address this problem I’ve created a script, which you can download here, that I’ve been using ever since. In this tutorial I’ll show how to use the script to set up a project template. I’ll also show you what is in the script so you can adapt it to suit your own needs.

What the script does

The main function in this script is open_project. When you call this function it will do the following in the file system:

  1. Create a base directory for the project
  2. Within this directory create sub directories for:
    • Input - a directory for all input data
    • Output data - a directory where you put the output of your analysis and data checks
    • Presentation - assuming you want to make a presentation, you can put all presentation related stuff in here
  3. Create a script called main.r, which you can use as the basis of your project.

In R itself it will:

  1. Create variables you can use when referring to in/output files or other scripts:
    • dir_input - Directory containing input files
    • dir_output_data - Directory containing script output
    • dir_project - Base directory of the project
  2. Load libraries, and install them when they’re not available on your system.
  3. Create some functions that I thought were handy:
    • df_to_clipboard() - this function allows you to copy data frames to your clipboard so they can easily be pasted into Excel.
    • format_euro_currency() - converts a number to a string formatted euro-style.
    • format_number() - converts a number to a presentable string, with dots and commas in the right places (at least when you live in the correct region).
    • ggchis_mosaic() - a mosaic plot incorporating a Chi-squared test, which I commonly use to express differences between groups. I want to thank Rick Scavetta of Science Craft for his wonderful course at DataCamp where I got this plot from.

Saving the script

To start with, you can save the project.r script in your default working directory. If you have no idea what I’m talking about, you can find this directory through the options screen.

Your default working directory is probably different from mine. The ~ sign refers to the home directory. In Windows the home directory is not a familiar concept; there, ~ refers to the My Documents directory instead.
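If you are unsure what these locations are on your own machine, a quick check from the R console can help. This is a small sketch using base R only; the variable names are just illustrative, and note that getwd() reports the current working directory, which equals the default only in a fresh session.

```r
# Expand "~" to see which directory R treats as the home directory.
home_dir <- path.expand("~")

# The current working directory (the default one in a fresh session).
current_wd <- getwd()

print(home_dir)
print(current_wd)
```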

Adjusting the script

If you’re a control freak like me, you want a different working directory than the default. If you have chosen a directory other than the default, you must also adjust the script to make it work. It’s not the most elegant way, but I haven’t found a way around it (if you have: pretty please let me know). Change the value of the variable this_file_location in the script to the directory you just found in your preferences dialog. This variable is the first thing that is defined in the open_project function.

Creating a project

To create a project you start by opening and executing the project.r file. By doing this you make the function open_project available. You can then execute this function by passing two values to it: the name of the project and the directory in which the project should be created.

Let’s say I want to create a project called ‘Test project’ in the sub directory ‘R Scripts’ of my default working directory. This would translate into the following function call:

open_project("Test project", "~/R Scripts")

Now the directories are created, and a script called ‘main.r’ is put in the project base directory ‘~/R Scripts/Test project’.

Using your project

You can start using your project by opening the main.r file. You can give the file any name you want. When you open the file you’ll see the following two statements:

source("~/R Scripts/project.r")
open_project("Test project", "~/R Scripts") # Creates standard variables and functions 

The open_project() call will not create the directories and files this time, but just reopen the existing project. Now you can start building your project within this file.

Reading and writing data

When you read input files, such as CSV files, you can use the paste0() function to create a file string like so:

paste0(dir_input, "/", "filename.csv")
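As a sketch of how this fits into a read/write round trip: base R's file.path() builds the same string as the paste0() call above and may read a little cleaner. The directory values below are temporary stand-ins (in a real project, dir_input and dir_output_data are set by open_project), and the output file name is made up for the example.

```r
# Stand-in directories so the example runs on its own;
# in a real project these variables are set by open_project().
dir_input       <- file.path(tempdir(), "Input")
dir_output_data <- file.path(tempdir(), "Output data")
dir.create(dir_input, showWarnings = FALSE)
dir.create(dir_output_data, showWarnings = FALSE)

# file.path() joins its arguments with "/".
input_file <- file.path(dir_input, "filename.csv")

# Write a tiny example file, then read it back the way you would read real input.
write.csv(data.frame(id = 1:3, value = c(10, 20, 30)), input_file, row.names = FALSE)
input_data <- read.csv(input_file)

# Write a result to the output directory (hypothetical file name).
output_file <- file.path(dir_output_data, "filename_checked.csv")
write.csv(input_data, output_file, row.names = FALSE)
```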

Explaining the script

Installing missing packages

The script first checks which of the listed packages are not yet installed, installs those, and then loads all of them:

# Packages the project template relies on
list_of_packages <- c("ggplot2", "dplyr", "magrittr", "purrr", "fst", "ggmap", "ggthemes", 
                      "reshape2", "scales", "xlsx", "stringr", "RColorBrewer", "qgraph", 
                      "Hmisc", "factoextra", "cluster", "kimisc", "ggrepel", "class",
                      "lubridate", "tidyr", "broom", "funr")
# Keep only the packages that are not installed yet
new_packages <- list_of_packages[!(list_of_packages %in% installed.packages()[,"Package"])]
# Install the missing packages, if any
if(length(new_packages)) install.packages(new_packages)
# Load every package in the list
lapply(list_of_packages, require, character.only = TRUE)
# Remove the helper variables from the workspace
rm(list_of_packages, new_packages)
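If you adapt this part, one possible variation is to test each package with requireNamespace() instead of scanning the full installed.packages() table, which can be slow on systems with many packages. The helper name missing_packages() is my own for this sketch and is not part of the script; the example only checks packages and does not install anything.

```r
# Hypothetical helper: return the packages from `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# "stats" and "utils" ship with R; the last name is deliberately bogus.
to_install <- missing_packages(c("stats", "utils", "thispackagedoesnotexist123"))
print(to_install)

# In the script you would then call: if (length(to_install)) install.packages(to_install)
```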

Create directory variables

The script creates variables that act as placeholders for the project directories. They start out empty:

dir_project <- NULL
dir_input <- NULL
dir_output_data <- NULL

Creating/initializing project structure

The function open_project creates/reopens the project.
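To give an idea of what such a function can look like, here is a minimal sketch based only on the behaviour described in this tutorial. It is not the code from project.r: the name open_project_sketch, the this_file_location argument (in the real script it is a variable set inside the function) and the exact directory names are assumptions made for illustration.

```r
# Hypothetical sketch of a create-or-reopen function; not the code from project.r.
open_project_sketch <- function(project_name, base_dir, this_file_location = "~") {
  dir_project      <- file.path(base_dir, project_name)
  dir_input        <- file.path(dir_project, "Input")
  dir_output_data  <- file.path(dir_project, "Output data")
  dir_presentation <- file.path(dir_project, "Presentation")

  # Create the directory tree; directories that already exist are left untouched.
  for (d in c(dir_input, dir_output_data, dir_presentation)) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }

  # Write main.r only if it is missing, so reopening never overwrites your work.
  main_file <- file.path(dir_project, "main.r")
  if (!file.exists(main_file)) {
    writeLines(c(
      sprintf('source("%s")', file.path(this_file_location, "project.r")),
      sprintf('open_project("%s", "%s")', project_name, base_dir)
    ), main_file)
  }

  # Fill the directory variables in the global environment.
  assign("dir_project", dir_project, envir = globalenv())
  assign("dir_input", dir_input, envir = globalenv())
  assign("dir_output_data", dir_output_data, envir = globalenv())
  invisible(dir_project)
}

# Try it in a temporary directory; the second call simply reopens the project.
demo_base <- file.path(tempdir(), "R Scripts")
open_project_sketch("Test project", demo_base)
open_project_sketch("Test project", demo_base)
```

Checking file.exists() before writing main.r is what makes the same call safe to use both for creating and for reopening a project.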

Miscellaneous functions

The rest of the script creates optional functions that I find useful in a lot of my own projects. Of course you can delete them or add your own as you please. The functions were described earlier in this tutorial.
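If you want to add helpers of your own, the two formatting functions could be approximated with base R's formatC(). The sketch below is an assumption about what "euro-style" means here (dots as thousands separators, a comma as decimal mark); the function names carry a _sketch suffix to make clear they are not the versions from the script.

```r
# Hypothetical stand-in for format_number(): dots for thousands, comma for decimals.
format_number_sketch <- function(x, digits = 0) {
  formatC(x, format = "f", digits = digits, big.mark = ".", decimal.mark = ",")
}

# Hypothetical stand-in for format_euro_currency(): euro sign plus two decimals.
format_euro_currency_sketch <- function(x) {
  paste0("\u20ac ", format_number_sketch(x, digits = 2))
}

print(format_number_sketch(1234567))
print(format_euro_currency_sketch(1234.5))
```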

Final thoughts

Although the script has some flaws, it has made my life a lot easier so far. If you have any improvements, please let me know. Happy R-ing!
