Benchmarking gigs against other software packages
Source:vignettes/articles/benchmarking.Rmd
benchmarking.Rmd
Rationale
This short article describes the performance of gigs relative to a non-exhaustive group of R and non-R packages designed to implement child growth standards.
Package name | Language | On CRAN? |
---|---|---|
anthro | R | Yes |
childsds | R | Yes |
gigs | R | No |
ki-tools/growthstandards | R | No |
gigs | Stata | No |
zanthro | Stata | No |
Thus far, there is no comprehensive benchmark comparing these different packages. This short article will compare the speed of each package from 1 to 100000 inputs, checking how fast each package can convert weight values to z-scores in the WHO Child Growth standards.
We have performed these benchmarks on a Windows 10 system running a Ryzen 7 3700X processor and 16GB of DDR4 RAM, using R version 4.3.2. The Stata benchmarks have been run on the same system in Stata 18.0 (revision 15 May 2023), using the benchmark package for Stata.
Set up benchmark dataset
The benchmarking and package comparisons will all use the same
100,000-row dataset comprised of data which can be used to convert
between z-scores and centiles in the WHO weight-for-age standard. The
z-scores, x
variable and sexes for each row are generated
randomly with a pre-specified seed. As gigs is validated against the
published growth curve data from the WHO, we use it to generate weight
values in kg for each observation. This dataset can then be used to
compare speed and accuracy in conversion between values and
z-scores/centiles for the packages mentioned above.
who_gs_dataset <- function(n, acronym = "wfa", seed = 154237890) {
withr::with_seed(
seed,
code = {
# Random z-scores around 0
z <- rnorm(n = n)
# X variables are non-discrete but within bounds of the desired acronym
xvars <- gigs::who_gs_coeffs[[acronym]][[1]][, 1]
x <- sample(xvars, size = n, replace = TRUE)
x_jitter <- c(runif(n = 5, min = -1, max = 1), 0)
x <- x + sample(x_jitter, size = n, replace = TRUE)
x <- pmax(pmin(x, max(xvars)), min(xvars))
# Sexes randomly sampled from male and female
sex <- sample(c("M", "F"), size = n, replace = TRUE)
}
)
acronym <- rep_len(acronym, n)
out <- data.frame(z = z, x = x, sex = sex, acronym = acronym) |>
dplyr::mutate(
y = gigs::who_gs_zscore2value(z = z, x = x, sex = sex, acronym = acronym)
)
purrr::set_names(out, c("z", "x", "sex", "acronym", "y"))
}
# Generate 100,000-row dataset
bench_dataset <- who_gs_dataset(n = 100000)
The first 10 rows of this dataset look like this:
bench_dataset[1:10, ]
z | x | sex | acronym | y |
---|---|---|---|---|
0.1783916 | 1480.29104 | F | wfa | 16.593835 |
1.0806829 | 758.04165 | F | wfa | 13.377723 |
-0.7026929 | 1076.37115 | F | wfa | 12.558484 |
0.0950173 | 1425.04165 | M | wfa | 16.348601 |
-0.4482927 | 368.91555 | F | wfa | 8.497343 |
-0.3574745 | 792.04165 | M | wfa | 12.043674 |
1.1661293 | 1358.10063 | F | wfa | 18.188497 |
-0.0874651 | 54.29104 | M | wfa | 5.300828 |
-0.4768519 | 1176.10063 | F | wfa | 13.485699 |
0.8001542 | 784.29104 | M | wfa | 13.710762 |
Benchmark code
The mbench_pkg()
function is used to benchmark each
package over a range of input sizes. Each call to it produces a tabular
output containing the median time required for pkg_expr
to
operate on the data, ranging from 1 to 100,000 inputs.
mbench_pkg <- function(pkg_expr, pkg_name) {
n_inputs <- c(1, 10, 100, 500, 1000, 5000, 10000, 25000, 50000, 75000, 100000)
purrr::map_dfr(.x = n_inputs,
.f = \(no_of_inputs) {
dataset <- bench_dataset[seq_len(no_of_inputs), ]
mbench <- microbenchmark::microbenchmark(
dplyr::mutate(dataset, test = eval(pkg_expr)),
times = 25
)
c(n_inputs = no_of_inputs,
median_time = summary(mbench)$median,
time_units = attr(summary(mbench), which = "unit"))
}) |>
dplyr::mutate(n_inputs = as.integer(n_inputs),
median_time = as.numeric(median_time),
package = pkg_name)
}
anthro
anthro_timings <- mbench_pkg(
pkg_expr = quote(anthro::anthro_zscores(sex = sex,
age = x,
weight = y)$zwei),
pkg_name = paste("R: anthro", packageVersion(pkg = "anthro"))
)
childsds
childsds_timings <- mbench_pkg(
pkg_expr = quote(childsds::sds(value = y,
age = x / 365.25,
sex = sex,
male = "M",
female = "F",
item = "weight",
ref = childsds::who.ref)),
pkg_name = paste("R: childsds", packageVersion(pkg = "childsds"))
)
growthstandards
growthstandards_timings <- mbench_pkg(
pkg_expr = quote(
growthstandards::who_value2zscore(y = y, x = x,
sex = ifelse(sex == "M", "Male",
"Female"),
y_var = "wtkg")
),
pkg_name = paste("R: growthstandards",
packageVersion(pkg = "growthstandards"))
)
gigs
gigs_timings <- mbench_pkg(
pkg_expr = quote(gigs::who_gs_value2zscore(y = y, x = x, sex = sex,
acronym = acronym)),
pkg_name = paste("R: gigs", packageVersion(pkg = "gigs"))
)
Stata
# Save .dta file equivalent of benchmarking table. This can be used to benchmark
# Stata packages.
haven::write_dta(data = bench_dataset,
path = file.path("exclude", "statabench", "bench_dataset.dta"))
In Stata, the commands are run inside a do-file which utilises the benchmark
package for Stata. This script essentially does the same as
mbench_pkg()
, but for the Stata gigs
package and the zanthro package.
// This is Stata code
foreach i in 1 10 100 500 1000 5000 10000 25000 50000 75000 100000 {
use "benchmarking/bench_dataset.dta", clear
qui drop if _n > `i'
di "Number of inputs: `i'"
restore last: ///
bench, reps(25) qui egen double z_gigs = who_gs(y, "wfa", "v2z"), ///
m=M, f=F)
xvar(x) sex(sex) sexcode(
}
foreach i in 1 10 100 500 1000 5000 10000 25000 50000 75000 100000 {
use "benchmarking/bench_dataset.dta", clear
qui drop if _n > `i'
di "Number of inputs: `i'"
restore last: ///
bench, reps(25) qui egen z_anthro = zanthro(y, wa, WHO), xvar(x) gender(sex) ///
F) ageunit(day)
gencode(male=M, female= }
The outputs from this script give a table of timings that look like this:
stata_timings
## n_inputs median_time time_units package
## 1 1 0.009 seconds Stata: gigs 0.4.0
## 2 10 0.008 seconds Stata: gigs 0.4.0
## 3 100 0.009 seconds Stata: gigs 0.4.0
## 4 500 0.010 seconds Stata: gigs 0.4.0
## 5 1000 0.012 seconds Stata: gigs 0.4.0
## 6 5000 0.028 seconds Stata: gigs 0.4.0
## 7 10000 0.047 seconds Stata: gigs 0.4.0
## 8 25000 0.104 seconds Stata: gigs 0.4.0
## 9 50000 0.198 seconds Stata: gigs 0.4.0
## 10 75000 0.296 seconds Stata: gigs 0.4.0
## 11 100000 0.405 seconds Stata: gigs 0.4.0
## 12 1 0.007 seconds Stata: zanthro 1.0.2
## 13 10 0.008 seconds Stata: zanthro 1.0.2
## 14 100 0.009 seconds Stata: zanthro 1.0.2
## 15 500 0.017 seconds Stata: zanthro 1.0.2
## 16 1000 0.027 seconds Stata: zanthro 1.0.2
## 17 5000 0.105 seconds Stata: zanthro 1.0.2
## 18 10000 0.203 seconds Stata: zanthro 1.0.2
## 19 25000 0.479 seconds Stata: zanthro 1.0.2
## 20 50000 0.951 seconds Stata: zanthro 1.0.2
## 21 75000 1.445 seconds Stata: zanthro 1.0.2
## 22 100000 2.046 seconds Stata: zanthro 1.0.2
Plotting
Only by plotting the various timings tables can we see the trends for each package:
On the whole, anthro
is by far the slowest R package,
taking around 2.24 seconds to process 100,000 inputs. This is in part
because anthro
computes results in every WHO Child Growth
standard each time anthro::anthro_zscores()
is called, but
also due to a slower implementation of the WHO LMS process conversion
than the other packages.
Next slowest is the Stata package zanthro
, which takes
around 2.05 seconds to compute results in a single WHO standard. About 4
times faster than zanthro
is gigs
for Stata
0.4.0, which scales more efficiently than zanthro
and takes
around 0.41 seconds to convert 100,000 measurements to z-scores.
Leading the pack are three R implementations:
growthstandards
, gigs
, and
childsds
. The growthstandards
package was the
fastest at ~125 ms for 100,000 inputs, followed by gigs
(~128 ms) and childsds
(~126 ms).
Package output similarity
The packages also differ slightly in how they convert values to
centiles/z-scores, which can affect the z-scores computed by each
package even when inputs are the same. In our testing of the WHO
standards, we found that the tested packages mostly agreed with each
other, but that childsds
did not correctly perform
z-scoring in the WHO standards.
This is because the WHO Child Growth standards constrain z-scores in
the outer tails (i.e. past 3 z-scores), as there were fewer data
available at these extreme values. More information on this constraining
procedure can be found in the reports referenced in the
gigs::who_gs_value2zscore()
documentation. As
childsds
does not perform this constraining procedure,
extreme values are computed incorrectly:
discrepancies <- data.frame(z = c(-3.03, -2.97, 2.97, 3.03),
age_days = 0,
sex = "M") |>
dplyr::mutate(
weight_kg = gigs::who_gs_wfa_zscore2value(z, age_days, sex),
# GIGS z-score
z_gigs = gigs::who_gs_wfa_value2zscore(weight_kg, age_days, sex),
# growthstandards z-score
z_growthstandards = growthstandards::who_wtkg2zscore(
age_days, weight_kg, "Male"
),
# childsds z-score
z_childsds = childsds::sds(
value = weight_kg, age = age_days / 365.25,
sex = sex, male = "M", female = "F",
item = "weight", ref = childsds::who.ref
)
)
When we look at these z-scores, you can see that both
growthstandards
and gigs
correctly apply the
constraining procedure; childsds
does not. From looking at
the anthro
source code, they also apply the constraining
procedure.
z | age_days | sex | weight_kg | z_gigs | z_growthstandards | z_childsds |
---|---|---|---|---|---|---|
-3.03 | 0 | M | 2.068938 | -3.03 | -3.03 | -3.031770 |
-2.97 | 0 | M | 2.091082 | -2.97 | -2.97 | -2.970000 |
2.97 | 0 | M | 5.011546 | 2.97 | 2.97 | 2.970000 |
3.03 | 0 | M | 5.048978 | 3.03 | 3.03 | 3.028744 |
Session information
## R version 4.3.2 (2023-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 8 x64 (build 9200)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252
## [2] LC_CTYPE=English_United Kingdom.1252
## [3] LC_MONETARY=English_United Kingdom.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United Kingdom.1252
##
## time zone: Europe/London
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] withr_2.5.2 units_0.8-5 testthat_3.2.1 rmarkdown_2.25
## [5] knitr_1.45 vctrs_0.6.5 checkmate_2.3.1 gamlss.dist_6.1-1
##
## loaded via a namespace (and not attached):
## [1] sandwich_3.1-0 utf8_1.2.4 generics_0.1.3
## [4] tidyr_1.3.0 lattice_0.21-9 digest_0.6.34
## [7] magrittr_2.0.3 evaluate_0.23 grid_4.3.2
## [10] mvtnorm_1.2-4 fastmap_1.1.1 Matrix_1.6-5
## [13] backports_1.4.1 brio_1.1.4 DBI_1.2.1
## [16] survival_3.5-7 anthro_1.0.1 multcomp_1.4-25
## [19] gigs_0.4.1 purrr_1.0.2 fansi_1.0.6
## [22] scales_1.3.0 TH.data_1.1-2 codetools_0.2-19
## [25] microbenchmark_1.4.10 cli_3.6.2 mitools_2.4
## [28] rlang_1.1.3 childsds_0.8.0 munsell_0.5.0
## [31] splines_4.3.2 tools_4.3.2 growthstandards_0.1.6
## [34] dplyr_1.1.4 colorspace_2.1-0 ggplot2_3.4.4
## [37] R6_2.5.1 zoo_1.8-12 lifecycle_1.0.4
## [40] MASS_7.3-60 pkgconfig_2.0.3 pillar_1.9.0
## [43] gtable_0.3.4 glue_1.7.0 Rcpp_1.0.12
## [46] xfun_0.41 tibble_3.2.1 tidyselect_1.2.0
## [49] survey_4.2-1 htmltools_0.5.7 compiler_4.3.2
## [52] prettyunits_1.2.0