This guide is for those wishing to contribute code to the covidregionaldata package. For details on how the package works and its general functionality, as well as installation, see the main README.md file in this repository.
We are working to improve and expand the package: please see the Issues and feel free to comment. We are keen to standardise geocoding (issues #81 and #84) and include data on priority countries (#72). As our capacity is limited, we would very much appreciate any help on these and welcome new pull requests.
Set your working directory to the home directory of this project (or use the provided RStudio project). Install the package and all its dependencies with:
remotes::install_github("epiforecasts/covidregionaldata", dependencies = TRUE)
Render the documentation with the following:
Rscript inst/scripts/render_output.R
This package is developed in a Docker container based on the tidyverse Docker image.
To build the Docker image, run the following from the covidregionaldata directory:
docker build . -t covidregionaldata
To start a container from the image, run:
docker run -d -p 8787:8787 --name covidregionaldata -e USER=covidregionaldata -e PASSWORD=covidregionaldata covidregionaldata
The RStudio client can be reached on port 8787 at your local machine's IP address. The default username:password is covidregionaldata:covidregionaldata; set the user with -e USER=username and the password with -e PASSWORD=newpasswordhere. By default, the analysis files are saved into the user directory.
To mount a folder from your current working directory (here assumed to be tmp) into the Docker container, add the following to the docker run command above. As given, it mounts the whole covidregionaldata directory to tmp.
--mount type=bind,source=$(pwd)/tmp,target=/home/covidregionaldata
To access the command line run the following:
docker exec -ti covidregionaldata bash
Alternatively, the package environment can be accessed via binder.
covidregionaldata has one main data getter function for subregional data, get_regional_data(). The function calls country-specific data getters (depending on the country given by the user), and then mostly uses helpers from the helper.R file to clean and sanitise the data.
In general these are the steps it follows (see picture below for a diagram).
* Calls the get_<country>_regional_cases() function for the relevant country.
* Adds ISO codes by calling get_iso_codes(<country>) and left-joining them to the data.
* Totals the data where requested (using the totalise_data() helper function).
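The flow described above can be sketched in base R as follows. This is a minimal illustration, not the package's actual code: the country "exampleland", the function bodies, the column names and the `totals` argument are all assumptions made for the example, and `merge(..., all.x = TRUE)` stands in for the left join.

```r
# Hypothetical stand-ins for the package internals; the names follow the text,
# but the bodies and column names are illustrative assumptions.
get_exampleland_regional_cases <- function() {
  data.frame(
    date = as.Date(c("2020-03-01", "2020-03-02", "2020-03-01")),
    region = c("North", "North", "South"),
    cases_new = c(1, 2, 5),
    stringsAsFactors = FALSE
  )
}

get_iso_codes <- function(country) {
  switch(country,
    "exampleland" = data.frame(
      region = c("North", "South"),
      iso_code = c("EX-N", "EX-S"),
      stringsAsFactors = FALSE
    ),
    stop("No ISO codes available for: ", country)
  )
}

totalise_data <- function(data) {
  # One row per region with the summed case count.
  aggregate(cases_new ~ region + iso_code, data = data, FUN = sum)
}

get_regional_data_sketch <- function(country, totals = FALSE) {
  country <- tolower(country)

  # Call the country-specific getter.
  data <- switch(country,
    "exampleland" = get_exampleland_regional_cases(),
    stop("Country not supported: ", country)
  )

  # Left-join the ISO codes (all.x = TRUE keeps every row of the data).
  data <- merge(data, get_iso_codes(country), by = "region", all.x = TRUE)

  # Optionally collapse the time series to one total per region.
  if (totals) {
    data <- totalise_data(data)
  }
  data
}

regional <- get_regional_data_sketch("exampleland")
region_totals <- get_regional_data_sketch("exampleland", totals = TRUE)
```

Keeping each stage in its own function means a new country only needs a new getter and a new branch in each `switch()` call.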
This is the most common task for developers. You will need a source of raw data before starting. Note that for the data to be suitable it must be:
You then need to follow these steps:
Write a get_<country>_regional_cases() function. If level 2 regions are available then make two functions: get_<country>_regional_cases_only_level_1() and get_<country>_regional_cases_with_level_2(). While these will share code, this avoids writing one function with messy and confusing if statements. The function needs to do the following:
* load the raw data (the csv_reader() helper function is useful here if the format is CSV)
* select only the relevant columns and rename them to the package standards (found in the README)
For some countries there may be multiple data sources, in which case using the dplyr join functions is recommended. There is no need to deal with NA values at this point, or to calculate new data.
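As a rough sketch of such a getter, the example below loads a small inline CSV, keeps the relevant columns and renames them. The country "exampleland", the raw column names and the target names (`date`, `region`, `cases_total`, `deaths_total`) are assumptions for illustration; check the README for the actual package standards. Base R's `read.csv()` is used here in place of the package's csv_reader() helper so that the example runs on its own.

```r
# Illustrative raw data; a real getter would read from the data source instead.
raw_csv <- "Date,Province,Confirmed,Deaths,Notes
2020-03-01,North,1,0,a
2020-03-02,North,3,1,b
2020-03-01,South,5,0,c"

get_exampleland_regional_cases <- function(csv_text = raw_csv) {
  # Load the raw data (the package's csv_reader() helper would be used here).
  raw <- utils::read.csv(text = csv_text, stringsAsFactors = FALSE)

  # Keep only the relevant columns.
  data <- raw[, c("Date", "Province", "Confirmed", "Deaths")]

  # Rename to the (assumed) package-standard column names.
  names(data) <- c("date", "region", "cases_total", "deaths_total")
  data$date <- as.Date(data$date)

  # No NA handling and no derived columns at this stage.
  data
}

exampleland <- get_exampleland_regional_cases()
```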
Write documentation using #' roxygen styling and then call roxygen2::roxygenise() when done to update the manual and NAMESPACE. There is no need to export your function but be sure to import any functions you do use from other packages.
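A roxygen header for an unexported getter might look like the sketch below. The function and its parameter are hypothetical; the tags shown (`@param`, `@return`, `@importFrom`, `@keywords internal`) are standard roxygen2 tags, and the absence of `@export` keeps the function internal.

```r
#' Get regional case data for Exampleland
#'
#' Hypothetical getter, shown only to illustrate the roxygen layout.
#'
#' @param csv_text Character string holding the raw CSV data.
#' @return A data frame with one row per record in the raw data.
#' @importFrom utils read.csv
#' @keywords internal
get_exampleland_regional_cases <- function(csv_text) {
  read.csv(text = csv_text, stringsAsFactors = FALSE)
}
```

Running roxygen2::roxygenise() from the package root then regenerates the manual pages and adds the `importFrom(utils, read.csv)` directive to NAMESPACE.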
Add your function(s) to the switch() calls in the get_regional_data(), get_iso_codes() and rename_region_column() functions. ISO-3166-2 codes can be found using Wikipedia.
If you need to write new functionality for the package and the same functionality already exists in another function, it may make sense to extract it into a separate helper function to avoid duplicating code. This approach is also helpful for splitting long pieces of code into logical parts.
If writing a helper function, follow these steps:
Add your function to the helpers.R file.
Write unit tests (as above). At a minimum you should have a test which takes generic data (of a similar format as the data that the function is designed to use) and tests that the function handles it correctly. If there is a chance that user actions could cause the function to fail, then have tests to ensure that the function fails as expected in these instances.
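The two kinds of test described here could look like the following sketch, which assumes the tests are written with testthat. The helper `add_cumulative_cases()` and its column names are hypothetical and exist only to give the tests something to exercise.

```r
library(testthat)

# Hypothetical helper: adds a running total of new cases within each region.
add_cumulative_cases <- function(data) {
  if (!all(c("region", "cases_new") %in% names(data))) {
    stop("data must contain 'region' and 'cases_new' columns")
  }
  data$cases_total <- ave(data$cases_new, data$region, FUN = cumsum)
  data
}

# Generic data in the format the helper is designed for.
test_that("add_cumulative_cases accumulates within each region", {
  generic <- data.frame(
    region = c("North", "North", "South"),
    cases_new = c(1, 2, 5)
  )
  result <- add_cumulative_cases(generic)
  expect_equal(result$cases_total, c(1, 3, 5))
})

# Input a user could plausibly supply by mistake should fail loudly.
test_that("add_cumulative_cases fails when a required column is missing", {
  expect_error(
    add_cumulative_cases(data.frame(region = "North")),
    "must contain"
  )
})
```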
Write documentation using #' roxygen styling and then call roxygen2::roxygenise() when done to update the manual and NAMESPACE. There is no need to export your function but be sure to import any functions you do use from other packages.