| Title: | Healthcare Analysis Methods |
|---|---|
| Description: | Conducts analyses for healthcare program evaluations or intervention studies. Calculates regression analyses for standard ordinary least squares (OLS or linear) or logistic models. Performs regression models used for causal modeling such as differences-in-differences (DID) and interrupted time series (ITS) models. Provides limited interpretations of model results and a ranking of variable importance in models. Performs propensity score models, top-coding of model outcome variables, and can return new data with the newly formed variables. Conducts Bayesian analysis summaries and graphs, decision curve analysis, and produces some Shewhart control charts. Also performs Cronbach's alpha for various scale items (e.g., survey questions). See Github URL for examples in the README file. For more details on the statistical methods, see Allen & Yen (1979, ISBN:0-8185-0283-5), Angrist & Pischke (2009, ISBN:9780691120355), Cohen (1988, ISBN:0-8058-0283-5), Gebski (2012) <doi:10.1017/S0950268812000179>, Gelman & Goodrich (2019) <doi:10.1080/00031305.2018.1549100>, Harrell (2016, ISBN:978-3-319-19424-0), Kline (1999, ISBN:9780415211581), Kruschke (2014, ISBN:9780124058880), Linden (2015) <doi:10.1177/1536867X1501500208>, Merlo (2006) <doi:10.1136/jech.2004.029454>, Muthen & Satorra (1995) <doi:10.2307/271070>, Rabe-Hesketh & Skrondal (2008, ISBN:978-1-59718-040-5), Ryan (2011, ISBN:978-0-470-59074-4), and Vickers & Elkin (2006) <doi:10.1177/0272989X06295361>. |
| Authors: | Stephen Zuniga [aut, cre, cph] (ORCID: <https://orcid.org/0000-0003-1458-3924>) |
| Maintainer: | Stephen Zuniga <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.2.0.9000 |
| Built: | 2026-06-09 07:48:12 UTC |
| Source: | https://github.com/szuniga07/ham |
Performs Cronbach's alpha of specified items from a data frame. Cronbach's Alpha is a formula for estimating the internal consistency reliability of a measurement instrument such as survey items (see Allen & Yang, 1979; Kline, 1999). Survey items can have 2 or more categories such as 5-point scales and contain 2 or more items.
alpha(items, data)alpha(items, data)
items |
Vector of item names that form a scale (e.g., 5-point Likert scales) |
data |
Data frame object. |
A list object with Cronbach's alpha summary statistics.
Allen, M. J., & Yen, W. M. (1979). Introduction to Measurement Theory. Brooks/Cole. ISBN: 0-8185-0283-5. Kline, Paul (1999). Handbook of Psychological Testing (2nd ed). Routledge, New York. ISBN: 9780415211581.
alpha(items=c("i1","i2","i3","i4","i5"), data=cas) # remove i1 as suggested in the previous example, returns higher alpha alpha(items=c("i2","i3","i4","i5"), data=cas)alpha(items=c("i1","i2","i3","i4","i5"), data=cas) # remove i1 as suggested in the previous example, returns higher alpha alpha(items=c("i2","i3","i4","i5"), data=cas)
Fit ordinary least squares (OLS) and logistic models. And fit models for causal inference such as differences-in-differences and interrupted time series. Run these models to evaluate program performance or test intervention effects (e.g., healthcare programs). Options are available for top coding the outcome variable as well as propensity scores. New data can optionally be returned that has these additional variables and constructed variables that are used for DID and ITS models.
assess( formula, data, regression = "none", did = "none", its = "none", intervention = NULL, int.time = NULL, treatment = NULL, interrupt = NULL, subset = NULL, stagger = NULL, topcode = NULL, propensity = NULL, trim = NULL, weights = NULL, offset = NULL, newdata = FALSE, family = NULL )assess( formula, data, regression = "none", did = "none", its = "none", intervention = NULL, int.time = NULL, treatment = NULL, interrupt = NULL, subset = NULL, stagger = NULL, topcode = NULL, propensity = NULL, trim = NULL, weights = NULL, offset = NULL, newdata = FALSE, family = NULL )
formula |
a formula object. Use 'Y ~ .' in DID and ITS models to only specify the constructed model variables (e.g., right side of the DID model: Y ~ Post.All + Int.Var + DID). If regression=ols or regression=logistic, 'Y ~ .' will use all variables in the data.frame as is standard in formulas. |
data |
a data.frame in which to interpret the variables named in the formula. |
regression |
Select a regression method for standard regression models (i.e., neither DID nor ITS). Options are regression="ols" (ordinary least squares AKA linear), regression="logistic", or regression='Poisson'. Default is regression="none" for no standard regression model. |
did |
option for Differences-in-Differences (DID) regression. Select did="two" for models with only 2 time points (e.g., pre/post-test). Select did="many" for >= 3 time points (e.g., monthly time points in 12 months of data). Default is did="none" for no DID. |
its |
option for Interrupted Time Series (ITS) regression. Select its="one" for one group (e.g., intervention only). Select its="two" for two groups (intervention and control). Default is did="none" for no ITS. |
intervention |
optional intervention variable name selected for DID, ITS, and propensity score models that indicate which cases are in the intervention or not when propensity is a character class (i.e., intervention can be NULL when propensity is a formula class for a non-DID or non-ITS model). |
int.time |
optional intervention time variable name selected for DID or ITS models. This indicates the duration of time relative to when the intervention started. |
treatment |
optional treatment start period variable name selected for DID models. Select 1 value from 'int.time' to indicate the start of the intervention. |
interrupt |
optional interruption (or intervention) period(s) variable name selected for ITS models. Select 1 or more values from 'int.time' to indicate the start and/or key intervention periods. There needs to be at least 2 time points per period, at least 3 is better. For example, interrupt= c(3, 5, 7) will suffice, especially if you want to isolate certain periods but interrupt= c(3, 6, 9) may provide more useful information. |
subset |
an expression defining a subset of the observations to use in the regression model. The default is NULL, thereby using all observations. Specify, for example, data$hospital == "NY" or c(1:100,200:300) respectively to use just those observations. This is helpful when doing a submodel for DID or ITS after identifying similar groups. DID and ITS models could be improved by limiting the choice of control groups to only those with similar values on the intervention indicator and baseline trend variable (e.g., 'ITS.Time' and 'ITS.Int') with p-values >= 0.10. |
stagger |
optional list to indicate staggered entry into the intervention or treatment group. Relevant model variables are re-coded to appropriate values and can be used for a form of 'stacked' DID or ITS. If a group of cases joins X months after the primary sample, model variables are adjusted X months. This three element list named: 'a' = a character vector for the name of the grouping column; 'b' = specific categories or levels that indicate which cases have a staggered entry; and 'c' = the time point values at staggered entry. Both 'b' and 'c' must have identical lengths. For ITS models, the staggered entry time must be: interrupt 1 < stagger time < interrupt 2. For example, a WHO health policy may have started in the 3rd year of the study period in NY and Toronto but Chicago and LA joined 6 and 12 months later, therefore stagger= list(a= 'city', b=c('Chicago', 'LA')), c=(30, 36) while interrupt= 25. Default is NULL. |
topcode |
optional value selected to top code Y (or left-hand side) of the formula. Analyses will be performed using the new top coded variable. |
propensity |
optional character vector of variable names or formula class to perform a propensity score model. This requires the 'intervention' option to be selected only when using a character vector. All models will include 'pscore' (propensity score) in the analysis as a covariate adjustment using the propensity score unless one of the weights options 'ipw' (Inverse Probability Weights for the ATE, Average Treatment Effect, estimator), 'nipw' (Normalized Inverse Probability Weights by Hajek estimator), or 'att' (Average Treatment Effect on the Treated) is selected for the analysis. |
trim |
an optional two-element numeric vector that sets limits between 0 and 1 for the propensity score. If NULL, the default values of >= 0 and <= 1 are used (i.e., c(0,1)). |
weights |
an optional 1-element character vector of the data frame column name or character vector of either 'ipw', 'nipw', or 'att' for Inverse Probability of Treatment Weighting Using the propensity score (see 'propensity' above) to be used in the fitting process. Should be NULL or a character vector. If non-NULL, weighting is used with weights; otherwise standard regression is used. |
offset |
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. |
newdata |
optional logical value that indicates if you want the new data returned. newdata=TRUE will return the data with any new columns created from the DID, ITS, propensity score, or top coding. The default is newdata=FALSE. No new data will be returned if none was created. |
family |
a description of the error distribution and link function to be used in the model. For glm this can be a character string naming a family function, a family function or the result of a call to a family function. For glm.fit only the third option is supported. See 'glm' function in Base R for more details. Currently available when regression= 'logistic' or 'Poisson'. |
a list of results from selected regression models. Will return new data if selected. And returns relevant model information such as variable names, type of analysis, formula, study information, and summary of ITS effects if analyzed.
Angrist, J. D., & Pischke, J. S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press. ISBN: 9780691120355.
Gebski, V., et al. (2012). Modelling Interrupted Time Series to Evaluate Prevention and Control of Infection in Healthcare. Epidemiology & Infections, 140, 2131–2141. https://doi.org/10.1017/S0950268812000179
Linden, A. (2015). Conducting Interrupted Time-series Analysis for Single- and Multiple-group Comparisons. The Stata Journal, 15, 2, 480-500. https://doi.org/10.1177/1536867X1501500208
# ordinary least squares R^2 summary(assess(hp ~ mpg+wt, data=mtcars, regression="ols")$model) # logistic summary(assess(formula=vs~mpg+wt+hp, data=mtcars, regression="logistic")$model) # OLS with a propensity score summary(assess(formula=los ~ month+program, data=hosprog, intervention = "program", regression="ols", propensity=c("female","age","risk"))$model) # OLS: top coding los at 8.27 and propensity score means (top.los and pscore) summary(assess(formula=los ~ month+program, data=hosprog, intervention = "program", regression="ols", topcode=8.27, propensity=c("female","age","risk"), newdata=TRUE)$newdata[, c("los", "top.los", "pscore")]) # differences-in-differences model: using 2 time periods, pre- and post-intervention summary(assess(formula=los ~ ., data=hosprog, intervention = "program", int.time="month", treatment = 5, did="two")$DID) # DID model: using time points summary(assess(formula=los ~ ., data=hosprog, intervention = "program", int.time="month", treatment = 5, did="many")$DID) #interrupted time series model: two groups and 1 interruption (interrupt= 5) summary(assess(formula=los ~ ., data=hosprog, intervention = "program", int.time="month", its="two", interrupt = 5)$ITS) #interrupted time series model: two groups and 2 interruptions (interrupt= c(5,9)) summary(assess(formula=los ~ ., data=hosprog, intervention = "program", int.time="month", its="two", interrupt = c(5,9))$ITS)# ordinary least squares R^2 summary(assess(hp ~ mpg+wt, data=mtcars, regression="ols")$model) # logistic summary(assess(formula=vs~mpg+wt+hp, data=mtcars, regression="logistic")$model) # OLS with a propensity score summary(assess(formula=los ~ month+program, data=hosprog, intervention = "program", regression="ols", propensity=c("female","age","risk"))$model) # OLS: top coding los at 8.27 and propensity score means (top.los and pscore) summary(assess(formula=los ~ month+program, data=hosprog, intervention = "program", regression="ols", topcode=8.27, propensity=c("female","age","risk"), newdata=TRUE)$newdata[, c("los", "top.los", "pscore")]) # differences-in-differences model: using 2 time periods, pre- and post-intervention summary(assess(formula=los ~ ., data=hosprog, intervention = "program", int.time="month", treatment = 5, did="two")$DID) # DID model: using time points summary(assess(formula=los ~ ., data=hosprog, intervention = "program", int.time="month", treatment = 5, did="many")$DID) #interrupted time series model: two groups and 1 interruption (interrupt= 5) summary(assess(formula=los ~ ., data=hosprog, intervention = "program", int.time="month", its="two", interrupt = 5)$ITS) #interrupted time series model: two groups and 2 interruptions (interrupt= c(5,9)) summary(assess(formula=los ~ ., data=hosprog, intervention = "program", int.time="month", its="two", interrupt = c(5,9))$ITS)
Convert a list of Bayesian analysis chains (e.g., coda package mcmc.list objects) into a data frame for analysis and creating plots. For example, models from JAGS or Stan that were converted into coda class objects can be used to create the data frames. Calculates a set of descriptive statistics that summarize MCMC parameters. And values can be calculated for use in descriptive graphs such as values associated with specific percentiles and vice-versa to help set targets, summaries on hierarchical or multilevel models (up to 3 levels), and the R2 for Bayesian regression models with metric level predictors.
Bayes( x, y = "mcmc", parameter = NULL, mass = 0.95, compare = NULL, rope = NULL, newdata = FALSE, type = NULL, center = "mode", data = NULL, dv = NULL, iv = NULL, expand = NULL, targets = NULL )Bayes( x, y = "mcmc", parameter = NULL, mass = 0.95, compare = NULL, rope = NULL, newdata = FALSE, type = NULL, center = "mode", data = NULL, dv = NULL, iv = NULL, expand = NULL, targets = NULL )
x |
list object of multiple MCMC chains (e.g., matrix class list elements or coda mcmc.list). |
y |
character vector for the type of analysis or output to perform. Select 'post', 'multi', 'target', 'r2', or 'mcmc' for a posterior summary, multilevel/hierarchical model summary (up to 3 levels), target summary, Gelman R-squared statistic, or list object of MCMC chains converted into a data frame. Default is generic 'mcmc'(no analysis, just MCMC creation). |
parameter |
single or multiple element character vector name of parameter(s) in MCMC chains to produce summary statistics. When y='target', use the generally 2 to 3 parameters that represent the distribution parameters (e.g., parameter= c('mean', 'sd')). When y='r2', use the regression parameters in order, ending with the residual or level-1 variance (e.g., parameter= c('intercept', 'beta1', 'beta2', 'standard_deviation')). Default is NULL. |
mass |
numeric vector that specifies the credible mass used in the Highest Density Interval (HDI). Default is 0.95. |
compare |
numeric vector with one comparison value to determine how much of the distribution is above or below the comparison value. Default is NULL. |
rope |
numeric vector with two values that define the Region of Practical Equivalence (ROPE). Test hypotheses by setting low and high values to determine if the Highest Density Interval (HDI) is within or outside of the ROPE. Parameter value declared not credible if the entire ROPE lies outside the HDI of the parameter’s posterior (i.e., we reject the null hypothesis). For example, the ROPE of a coin is set to 0.45 to 0.55 but the posterior 95% HDI is 0.61 - 0.69 so we reject the null hypothesis value of 0.50. We can accept the null hypothesis if the entire 95% HDI falls with the ROPE. Default is NULL. |
newdata |
optional logical vector that indicates if you want the new MCMC data returned. When newdata=TRUE, it will return the list object of MCMC chains, converted into a data frame. This data is used for analysis and all plots. Please select newdata=TRUE to produce any graphs but not needed when y='multi'. The default is newdata=FALSE. |
type |
character vector of length == 1 that indicates the likelihood function used in the model when y='multi' or y='target'. Select 'n', 'ln', 'w', 'g', 't', 'bern', and 'bin' for these respective options in Bayesian estimation (multilevel): 'Normal', 'Log-normal', 'Weibull', 'Gamma', 't', 'Bernoulli', or 'binomial'. Default is NULL. |
center |
character vector that selects the type of central tendency to use when reporting parameter values when y='post', y='target', or y='r2'. Choices include: 'mean', 'median', and 'mode' when y='post', or 'mean' and 'median' when y='r2'. Default is 'mode' when y='post' or 'target' and 'median' when y='r2'. |
data |
object name for the observed data when y='multi' or y='r2'. Default is NULL. |
dv |
character vector of length == 1 for the dependent variable name in the observed data frame when y='multi'. Default is NULL. |
iv |
character vector of length >= 1 for the independent variable name(s) in the observed data frame when y='multi' or y='r2'. When y='multi', enter the lower to higher level clustering or group names (e.g, for health data, iv=c("patient", "hospital"). When type='taov', enter the name of the test group variable. When y='r2', enter the observed data variable names for the hierarchical or multilevel groups. Default is NULL. |
expand |
a character vector of length == 1 indicating the variable name to expand aggregated data into non-aggregated data frames when y='multi'. This variable is the denominator that can be used to calculate a rate in the formula numerator/denominator. For example, when the 'numerator' column equals 4 and the 'denominator' column equals 10, then this single row of data is expanded to 10 rows with four values of 1 and six values of 0 when expand='denominator'. Default is NULL. |
targets |
list of one or two named elements (p, y) with numeric values that represent quantile values (p) in the distribution to return associated outcome values and/or specific outcome values (y) to retrieve associated probabilities. For example, a distribution of harmful hospital readmission rates has an estimated median value of 0.25. Staff are considering 2 types of targets, percentiles (p) of key interest and specific outcome rates (y). They want to know the readmission rate that is at the 40th percentile for a reduced readmission rate (below what is 'average' at the 50th percentile) and the probability greater than a readmission rate of 0.20. They get this information by entering targets=list(p=0.40, y=0.20); calculating 1 - prob(y) from returned results gives them an idea about the effort needed to meet this target of a reduced readmission rate. Select type= one of these options: 'n', 'ln', 'w', 'g', 't', 'bern', 'bin'. Also select parameter= the appropriate center, spread, and possible 3rd shape distribution parameter (e.g., parameter=c('mean', 'sd')). And option to select center= 'mean', 'median', 'mode'. Default is NULL. |
data frame of summary statistics for the MCMC parameter's distribution and/or MCMC data frame. Statistics include highest density interval, effective sample size, proportion of distribution within and outside of a ROPE, distribution compared with a set value, and the parameter's mean, median, and mode. And distribution summaries for multilevel models, target summaries, and regression model R2.
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences, Second Edition. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers. ISBN 0-8058-0283-5
Gelman, A., Goodrich, B., Gabry, J., & Vehtari, A. (2019). R-squared for Bayesian Regression Models. The American Statistician, 73, 3, 307–309. https://doi.org/10.1080/00031305.2018.1549100
Kruschke, J. (2014). Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan, Second Edition. New York: Academic Press. ISBN: 9780124058880
# Posterior estimates of length of stay (LOS) # # What is the 'average' LOS and are we beyond our goal of it being <= 3 days? blos1 <- Bayes(losmcmc, y="post", parameter="muOfY", newdata=TRUE, compare=4.5, rope=c(1,3)) #Where are we in relation to 4.5 days? print(blos1$Posterior.Summary) #looks like LOS is substantially higher than 3 days # Multilevel or hierarchical model summaries # # Code below does not run, no 'mcmc_sample' object # bmulti0 <- Bayes(x=mcmc_sample, parameter=c("theta", "omega","omegaO"), # y="multi", type="bern", data=mydf, dv="upbin", iv= c("Plant", "Group")) # Targets for length of stay (LOS) # # Our administrators ask how far are we from our goals, they ask about targets # in increments of 5 points of probability or specific days. We answer both. btarget1 <- Bayes(x=losmcmc, y="target", type="n", parameter=c("muOfY","sigmaOfY"), newdata=TRUE, target=list(p=c(.35,.4,.45, .5, .55), y=c(3,4))) # 'newdata' for plots print(btarget1$Target$Est.Quantile.P) # 5-point increments print(btarget1$Target$Est.Prob.GT.Y) # specific day values, 4 days is plausible # Gelamn's R^2 # # the regression model using Base R data, CO2: update ~ conc bR2 <- Bayes(x=co2mcmc, y='r2', data=CO2, iv="uptake", parameter=c("b0", "b1", "sigma"), newdata=TRUE) # bR2 returns various information print(bR2$R2.Summary$R2) # R^2 print(bR2$R2.Summary$Variance.Pred.Y) # Variance of predicted outcome print(bR2$R2.Summary$Variance.Residuals) # Variance of residuals print(head(bR2$R2.Summary$yPRED)) # Predicted outcomes# Posterior estimates of length of stay (LOS) # # What is the 'average' LOS and are we beyond our goal of it being <= 3 days? blos1 <- Bayes(losmcmc, y="post", parameter="muOfY", newdata=TRUE, compare=4.5, rope=c(1,3)) #Where are we in relation to 4.5 days? print(blos1$Posterior.Summary) #looks like LOS is substantially higher than 3 days # Multilevel or hierarchical model summaries # # Code below does not run, no 'mcmc_sample' object # bmulti0 <- Bayes(x=mcmc_sample, parameter=c("theta", "omega","omegaO"), # y="multi", type="bern", data=mydf, dv="upbin", iv= c("Plant", "Group")) # Targets for length of stay (LOS) # # Our administrators ask how far are we from our goals, they ask about targets # in increments of 5 points of probability or specific days. We answer both. btarget1 <- Bayes(x=losmcmc, y="target", type="n", parameter=c("muOfY","sigmaOfY"), newdata=TRUE, target=list(p=c(.35,.4,.45, .5, .55), y=c(3,4))) # 'newdata' for plots print(btarget1$Target$Est.Quantile.P) # 5-point increments print(btarget1$Target$Est.Prob.GT.Y) # specific day values, 4 days is plausible # Gelamn's R^2 # # the regression model using Base R data, CO2: update ~ conc bR2 <- Bayes(x=co2mcmc, y='r2', data=CO2, iv="uptake", parameter=c("b0", "b1", "sigma"), newdata=TRUE) # bR2 returns various information print(bR2$R2.Summary$R2) # R^2 print(bR2$R2.Summary$Variance.Pred.Y) # Variance of predicted outcome print(bR2$R2.Summary$Variance.Residuals) # Variance of residuals print(head(bR2$R2.Summary$yPRED)) # Predicted outcomes
Artificial data of a 5 item hospital satisfaction survey for a Cronbach's alpha scale (cas).
cascas
casAn artificial data frame with 100 rows and 5 columns:
5 survey items
...
Artificial dataset created with rbinom for 5 items. For example, rbinom(100, 5, .9) generates 1 item. The prob argument is modified to give more or less consistent ratings per item.
Markov Chain Monte Carlo linear regression estimates of plant's CO2 uptake regressed on ambient carbon dioxide concentrations
co2mcmcco2mcmc
co2mcmcMCMC list of 3 chains with 2000 rows and 4 columns in each of the list elements:
Regression intercept.
Regression coefficient of the carbon dioxide concentration effect.
Square root of model residual variance.
Total log unnormalized joint posterior density of the model parameters.
...
co2mcmc is a list of 3 MCMC simulations. It estimates a linear regression using the CO2 data frame from base R. A regression model was specified using a normal distribution maximum likelihood and normal and uniform distribution priors for the beta coefficients and sigma parameters.
Bayes class object with summarized Markov Chain Monte Carlo estimates of plant's CO2 uptake as a binary variable that is above or below the median level
co2multico2multi
co2multiBayes class list of 7 elements with varying rows and columns in each of the list elements:
Standard element that is NA in this object because it is not applicable to this analysis.
Dataframe of 4 combined MCMC chain simulations with 8000 rows and 18 columns. Column 1 is CHAIN indicates the 4 chains, columns 2-13 are 'theta' and indicates each plant's rate of having above the median uptake level, columns 14-17 are 'omega' group rates of 'Type' by 'Treatment', and column 18 is the overall group rate.
Posterior estimate summaries in alphabetical order of plant or group names and numerical order plant or group rates.
Standard element that is NA in this object because it is not applicable to this analysis.
Standard element that is NA in this object because it is not applicable to this analysis.
Standard element that is NA in this object because it is not applicable to this analysis.
3 parameter names in the multilevel analysis, 'theta', 'omega', and 'omegaO'.
...
co2multi is a list of summarized Markov Chain Monte Carlo estimates of plant's CO2 uptake as a binary variable that is above or below the median level. It summarizes measurements nested within plants nested within the 4 levels from a combination of 'Treatment' by 'Type'. An estimation used the CO2 data frame from base R. A hierarchical estimation model was specified using a gamma distribution maximum likelihood and normal and uniform distribution priors at various group levels.
Calculate statistics that can be used to produce X-bar charts, p-charts, and u-charts. This includes producing means for center lines, 3-sigma upper and lower control limits. Users can also calculate values before and after an intervention to see if a change in the control process happened. Values are returned in a data frame.
control( x, y = NULL, time, data, type = "x", subset = NULL, n.equal = TRUE, intervention = NULL )control( x, y = NULL, time, data, type = "x", subset = NULL, n.equal = TRUE, intervention = NULL )
x |
character outcome variable in X bar charts or numerator variable name for u-charts or p-charts (when p-chart data is aggregated). |
y |
character variable name for u-charts or p-charts (when p-chart data is aggregated). When y is present, it becomes the denominator for a rate calculated as x/y. Default is NULL. |
time |
time variable name. |
data |
name of data frame object. |
type |
indicate what type of control chart is needed. Options for the X-bar, p-, and u-charts should be listed as 'x', 'p', and 'u'. Default is the 'x' chart. |
subset |
an expression defining a subset of the observations to use in the grouping. The default is NULL, thereby using all observations. Specify, for example, data$hospital == "NY" or c(1:100,200:300) respectively to use just those observations. |
n.equal |
whether there are or we assume equal subgroup (sample) sizes. If n.equal=TRUE, control limits are calculated using the overall mean n value. If n.equal=FALSE, control limits are based on each subgroup's sample size. Default is TRUE. |
intervention |
a single numeric value for the time when an intervention begins (e.g., intervention=25; intervention begins on the 25th day). Separate means and control limits are calculated pre- and post-intervention. Default is NULL. |
data frame of control chart statistics for X-bar charts, p-charts, and u-charts. Includes means, standard deviations, and 3-sigma upper and lower control limit values.
Ryan, T. (2011). Statistical Methods for Quality Improvement, Third Edition. New Jersey: John Wiley & Sons, Inc. ISBN: 978-0-470-59074-4.
## Hospital LOS and readmissions ## # X-bar chart statistics spc_x <- control(x="los", time="month", data=hosprog, type="x", n.equal=TRUE) print(spc_x) # get data frame output # X-bar chart statistics not assuming equal sample sizes, subsetting for females spc_x <- control(x="los", time="month", data=hosprog, type="x", n.equal=FALSE, subset=hosprog$female==1) print(spc_x) # get data frame output # p-chart statistics, using only the numerator (i.e., y=NULL). Specify unequal sample sizes spc_p <- control(x="rdm30", time="month", data=hosprog, type="p", n.equal=FALSE) print(spc_p) # get data frame output # u-chart for infection rates with an intervention at the 22nd month spc_u <- control(x="HAI", y="PatientDays", time="Month", data=infections, type="u", n.equal=FALSE, intervention=22) print(spc_u) # get data frame output## Hospital LOS and readmissions ## # X-bar chart statistics spc_x <- control(x="los", time="month", data=hosprog, type="x", n.equal=TRUE) print(spc_x) # get data frame output # X-bar chart statistics not assuming equal sample sizes, subsetting for females spc_x <- control(x="los", time="month", data=hosprog, type="x", n.equal=FALSE, subset=hosprog$female==1) print(spc_x) # get data frame output # p-chart statistics, using only the numerator (i.e., y=NULL). Specify unequal sample sizes spc_p <- control(x="rdm30", time="month", data=hosprog, type="p", n.equal=FALSE) print(spc_p) # get data frame output # u-chart for infection rates with an intervention at the 22nd month spc_u <- control(x="HAI", y="PatientDays", time="Month", data=infections, type="u", n.equal=FALSE, intervention=22) print(spc_u) # get data frame output
Calculate statistics such as sensitivity, specificity, positive and negative predictive values, and net benefit and interventions saved from a decision curve analysis.
decide(x, threshold)decide(x, threshold)
x |
assess regression object, currently logistic regression. |
threshold |
numeric vector of length == 1 that sets the threshold used to calculate sensitivity, specificity, positive and negative predictive values, and net benefit and interventions saved from a decision curve analysis. Select thresholds on appropriate linear scale such as logits for logistic regression models. |
summary of model classification based on the selected threshold/cutoff, area under the curve (AUC)/c-statistic, predicted outcomes (transformed if applicable), decision curve analysis values at various percentiles, and sensitivity and specificity related statistics for the regression model at the specified threshold.
Vickers, A. & Elkin, E. (2006). Decision Curve Analysis: A Novel Method for Evaluating Prediction Models. Society for Medical Decision Making, 26, 6, 565-574. https://doi.org/10.1177/0272989X06295361
## Predicting car engine shape type, v or straight ## # run the model car_m1 <- assess(formula=vs ~ hp + am, data=mtcars, regression="logistic") # create a decide object, enter the model name and a threshold on the logit scale d1 <- decide(x=car_m1, threshold= -0.767) # View model classification related statistics print(d1$Model.Summary$Classification) # View decision curve analysis results like 'net benefit' at various thresholds print(d1$DCA)## Predicting car engine shape type, v or straight ## # run the model car_m1 <- assess(formula=vs ~ hp + am, data=mtcars, regression="logistic") # create a decide object, enter the model name and a threshold on the logit scale d1 <- decide(x=car_m1, threshold= -0.767) # View model classification related statistics print(d1$Model.Summary$Classification) # View decision curve analysis results like 'net benefit' at various thresholds print(d1$DCA)
Group level confidence intervals and between-group variation
group( x, y, z = NULL, data, dist = "t", conf.int = 0.95, increment = 1, rolling = NULL, quarts = FALSE, cluster = FALSE, subset = NULL, asis = FALSE, dataf = NULL )group( x, y, z = NULL, data, dist = "t", conf.int = 0.95, increment = 1, rolling = NULL, quarts = FALSE, cluster = FALSE, subset = NULL, asis = FALSE, dataf = NULL )
x |
group predictor variable name. |
y |
outcome variable name. |
z |
time period variable name. |
data |
name of data frame object. |
dist |
indicate the distribution used for confidence intervals. Options for the t, binomial, and exact Poisson distributions. Options are 't', 'b', and 'p'. Default is the 't'. |
conf.int |
select the confidence interval level. Default is 0.95. |
increment |
specify the increment in time periods. Selecting 3 if data uses the month as the unit of time will give confidence intervals, each based on 3 months. Default is 1. |
rolling |
indicate the number of time periods for the 'rolling average'. The rolling average consists of >1 time periods but subsequent point estimate increase by a unit of 1. For example, the common 12-month rolling average is based on months 1-12 of data, followed by the next estimate using months 2-13, 3-14, and so on until the last month in the data has been reached. Default is NULL. |
quarts |
logical TRUE or FALSE that indicates whether to convert continuous x into 4 groups based on quartiles of x. Default is FALSE. |
cluster |
logical TRUE or FALSE to generate measures of between-group variation such as the Intra-Class Correlation, Median Odds Ratio, or Design Effect. Default is FALSE. Uses binary outcome formula (between-group variance/(between-group variance + (3.14^2/3)) for ICC in Rabe-Hesketh which may be more appropriate for multilevel models. ICC, MOR, DE may be less reliable for binomial and Poisson distributions, use caution. |
subset |
an expression defining a subset of the observations to use in the grouping. The default is NULL, thereby using all observations. Specify, for example, data$hospital == "NY" or c(1:100,200:300) respectively to use just those observations. |
asis |
a logical vector that indicates if data will be processed as having only 1 unique observation per 'x' and 'z' combination (i.e., this is intended for use with aggregated data). Default is FALSE. This will allow the plot function to graph single observation data for groups over time. Only the t distribution is used for the overall trend line and confidence band (works in conjunction with 'ocol' and 'oband'). |
dataf |
not currently used, please use 'data' instead. |
list of confidence intervals for outcomes by groups, over time, and clustering measures. Some values returned in alphabetical and numerical order based on the group.
Merlo, J. (2006). A brief conceptual tutorial of multilevel analysis in social epidemiology: using measures of clustering in multilevel logistic regression to investigate contextual phenomena. Journal of Epidemiological Health, 60, 4, 290-297. https://doi.org/10.1136/jech.2004.029454.
Muthen, B. & Satorra, A. (1995). Complex Sample Data in Structural Equation Modeling. Sociological Methodology, 25, 267-316. https://doi.org/10.2307/271070.
Rabe-Hesketh, S. & Skrondal, A. (2008). Multilevel and Longitudinal Modeling Using Stata, Second Edition. ISBN: 978-1-59718-040-5.
#default t distribution results group(x="program", y="los", data=hosprog) #Rounding LOS to integers hp2 <- hosprog; hp2$los2 <- round(hp2$los, 0) #Exact Poisson confidence intervals group(x="program", y="los2", data=hp2, dist="p") #Rolling 6-months of data group(x="program", y="los", z="month", data=hosprog, dist="t", rolling=6) #Data returned separately for rolling 6-months of data and 3-month increments (e.g., quarters) group(x="program", y="los", z="month", data=hosprog, dist="t", increment=3, rolling=6) #Quartile groups for continuous risk score and returned clustering info group(x="risk", y="los", data=hosprog, quarts=TRUE, cluster=TRUE) #Binomial distribution with less conservative 90% confidence intervals group(x="risk", y="rdm30", data=hosprog, quarts=TRUE, dist="b", conf.int=0.90)#default t distribution results group(x="program", y="los", data=hosprog) #Rounding LOS to integers hp2 <- hosprog; hp2$los2 <- round(hp2$los, 0) #Exact Poisson confidence intervals group(x="program", y="los2", data=hp2, dist="p") #Rolling 6-months of data group(x="program", y="los", z="month", data=hosprog, dist="t", rolling=6) #Data returned separately for rolling 6-months of data and 3-month increments (e.g., quarters) group(x="program", y="los", z="month", data=hosprog, dist="t", increment=3, rolling=6) #Quartile groups for continuous risk score and returned clustering info group(x="risk", y="los", data=hosprog, quarts=TRUE, cluster=TRUE) #Binomial distribution with less conservative 90% confidence intervals group(x="risk", y="rdm30", data=hosprog, quarts=TRUE, dist="b", conf.int=0.90)
Patient hospital program/intervention data, intervention group only
hosp1hosp1
hosp1An artificial data frame with 352 rows and 10 columns, intervention patients only:
Patient satisfaction survey mean score.
Hospital length of stay (los)
Hospital stay cost
Patient readmission within 30 days of discharge
Patient death within 30 days of discharge
Patient sex, 1 indicates female, 0 otherwise
Patient age
Patient health risk score ranging from 0 to 1
12 month indicator (1 to 12)
Indicates patient program participation. 1='yes', 0='no'
...
hosp1 is a subset of the artificial dataset hosprog. It is the intervention group's data used for single group interrupted time series.
Patient hospital program/intervention data
hosproghosprog
hosprogAn artificial data frame with 720 rows and 10 columns:
Patient satisfaction survey mean score.
Hospital length of stay (los)
Hospital stay cost
Patient readmission within 30 days of discharge
Patient death within 30 days of discharge
Patient sex, 1 indicates female, 0 otherwise
Patient age
Patient health risk score ranging from 0 to 1
12 month indicator (1 to 12)
Indicates patient program participation. 1='yes', 0='no'
...
Artificial dataset created by using runif. The strength in the association between each variable is weighted by multiplying each subsequent predictor in increments of 1. For example, Y equals runif(720) multiplied by 1 plus runif(720) multiplied by 2 and so on. This allows some predictors to have stronger correlations with Y.
Calculates partial chi-square (Wald chi-square for individual coefficients) from assess class objects. The importance is the partial chi-square minus its degrees of freedom based on the regression coefficients (Harrell, 2015). A higher chi-square indicates a larger effect by the predictors. Therefore, the rank of the chi-square can indicate which predictors can contribute more in explaining the variation in the outcome variable.
importance(model)importance(model)
model |
an assess class object or models with lm or glm class. |
a data.frame object with partial X^2 summary statistics.
Harrell, F. E., Jr. (2016). Regression Modeling Strategies. Springer International Publishing. ISBN: 978-3-319-19424-0.
# OLS regression importance(assess(mpg ~ hp + wt + cyl, data=mtcars, regression= "ols")$model) # logistic regression importance(assess(vs~mpg+wt+hp, data=mtcars, regression= "logistic")$model)# OLS regression importance(assess(mpg ~ hp + wt + cyl, data=mtcars, regression= "ols")$model) # logistic regression importance(assess(vs~mpg+wt+hp, data=mtcars, regression= "logistic")$model)
Hospital acquired infections (HAI) during 41 months.
infectionsinfections
infectionsAn artificial data frame with 41 rows and 3 columns:
Calendar month as an integer, ranging from 1 to 41.
Count of hospital acquired infections (HAI) within 1 month.
Count of hospital patient days within 1 month (i.e., the sum of days in the hospital for all patients).
...
infections is an artificial data frame of reasonable hospital acquired infections (HAI) estimates made entirely for the purpose of demonstrating control charts.
Provides simple interpretations of regression coefficients and Cronbach's alpha from assess and alpha function classes. The interpretations describe coefficients and significance values as well as modifying item scales. The interpretations are text comments associated with specific parameters of the various analyses.
interpret(object)interpret(object)
object |
alpha and assess class objects: alpha, ITS, DID, linear (ols) or logistic models. |
a list with interpretations of Cronbach's alpha scales or regression model results.
# Interpret Cronbach's alpha interpret(alpha(items=c("i1","i2","i3","i4","i5"), data=cas)) # interpret a standard linear (OLS) regression hos1 <- assess(formula=survey ~ program + month, data=hosprog, regression= "ols") interpret(hos1)$model # interpret a differences-in-differences model hos2 <- assess(formula=survey ~ ., data=hosprog, intervention = "program", int.time="month", treatment = 5, did="two", newdata=TRUE) interpret(hos2)$did #interpret(hos2) also runs, returns ITS results if present # interpret an interrupted time series model hos3 <- assess(formula=survey ~ ., data=hosprog, intervention = "program", int.time="month", its="two", interrupt = 5) interpret(hos3)$its# Interpret Cronbach's alpha interpret(alpha(items=c("i1","i2","i3","i4","i5"), data=cas)) # interpret a standard linear (OLS) regression hos1 <- assess(formula=survey ~ program + month, data=hosprog, regression= "ols") interpret(hos1)$model # interpret a differences-in-differences model hos2 <- assess(formula=survey ~ ., data=hosprog, intervention = "program", int.time="month", treatment = 5, did="two", newdata=TRUE) interpret(hos2)$did #interpret(hos2) also runs, returns ITS results if present # interpret an interrupted time series model hos3 <- assess(formula=survey ~ ., data=hosprog, intervention = "program", int.time="month", its="two", interrupt = 5) interpret(hos3)$its
Calculates effects for intervention and control groups based on interrupted time series models from an assess class object. Within a period (or interruption), the effect that represents the trend during the period is calculated for both groups as well as the difference between the groups. Summary statistics are provided that include the effect sizes, t-statistic, standard errors, p-values, and 95% confidence intervals of the effect sizes. These values are provided for the intervention group, control group (when applicable), and the differences between the two groups (Linden, 2015). These values are automatically generated while running a model in assess.
itsEffect(model, type, interruptions)itsEffect(model, type, interruptions)
model |
an interrupted time series (ITS) model with the "lm" class, |
type |
analysis type for single or multiple groups and single or multiple time periods. If selected type="sgst", it is single-group single-time; type="sgmt", it is single-group multiple-time; type="mgst", it is multiple-group single-time; and type="mgmt", it is multiple-group multiple-time. |
interruptions |
ITS number of interruptions as numeric vector greater than 0. |
a data.frame object of ITS effects and summary statistics. Row names are indicated with first, second, and additional numbers when there are multiple interruptions (e.g., 'Intervention.2'). Generally run within the assess function.
Linden, Ariel. (2015). Conducting Interrupted Time-series Analysis for Single- and Multiple-group Comparisons. The Stata Journal, 2015, 15(2), 480-500, https://doi.org/10.1177/1536867X1501500208
i21 <- assess(formula=survey ~ ., data=hosprog, intervention = "program",topcode =NULL, int.time="month", regression="none", interrupt=5, its="two", newdata=TRUE, propensity=NULL) itsEffect(model= i21$ITS, type= "mgst", interruptions= 1)i21 <- assess(formula=survey ~ ., data=hosprog, intervention = "program",topcode =NULL, int.time="month", regression="none", interrupt=5, its="two", newdata=TRUE, propensity=NULL) itsEffect(model= i21$ITS, type= "mgst", interruptions= 1)
Markov Chain Monte Carlo estimates of hospital length of stay from the hosprog data frame
losmcmclosmcmc
losmcmcMCMC list of 4 chains with 5000 rows and 2 columns in each of the list elements:
Estimate of the mean parameter.
Estimate of the square root of variance parameter.
...
losmcmc is a list of 4 MCMC simulations. It estimates the mean and standard deviation using the artificial hosprog data frame from ham. An estimation was specified using a normal distribution maximum likelihood and normal and uniform distribution priors for the mean and sigma parameters.
Provides partial prediction plots for treatment and control groups from difference-in-difference (DID) and interrupted time series (ITS) models. The graph will produce lines for treatment/intervention and control groups to gain understanding through a visual representation of the regression coefficients. By default, the treatment/intervention group is represented with a blue line, the control group is represented with a red line, and the counterfactual line, when available, is a dashed line. There are many options to change the plot.
## S3 method for class 'assess' plot( x, y, xlim = NULL, ylim = NULL, xlab = NULL, ylab = NULL, main = NULL, lwd = NULL, col = NULL, tcol = NULL, cfact = FALSE, conf.int = FALSE, adj.alpha = NULL, add.means = FALSE, add.legend = NULL, legend = NULL, tgt = NULL, tgtcol = "gray", cex = 1, cex.axis = NULL, cex.lab = NULL, cex.main = NULL, cex.text = NULL, cex.legend = NULL, name = FALSE, coefs = FALSE, round.c = NULL, pos.text = NULL, arrow = FALSE, xshift = NULL, x.axis = NULL, ... )## S3 method for class 'assess' plot( x, y, xlim = NULL, ylim = NULL, xlab = NULL, ylab = NULL, main = NULL, lwd = NULL, col = NULL, tcol = NULL, cfact = FALSE, conf.int = FALSE, adj.alpha = NULL, add.means = FALSE, add.legend = NULL, legend = NULL, tgt = NULL, tgtcol = "gray", cex = 1, cex.axis = NULL, cex.lab = NULL, cex.main = NULL, cex.text = NULL, cex.legend = NULL, name = FALSE, coefs = FALSE, round.c = NULL, pos.text = NULL, arrow = FALSE, xshift = NULL, x.axis = NULL, ... )
x |
assess object. Either difference-in-difference or interrupted time series model with no covariate adjustment. |
y |
type of model, specify either 'DID' (difference-in-difference) or 'ITS' (interrupted time series). Will not accept other models. |
xlim |
specify plot's x-axis limits with a 2 value vector. |
ylim |
specify plot's y-axis limits with a 2 value vector. |
xlab |
a vector label for the x-axis. |
ylab |
a vector label for the y-axis. |
main |
the main title of the plot. |
lwd |
select the line width. |
col |
specify intervention and control group colors in a vector. Defaults to, if nothing selected, c("blue", "red") or "blue" for single-group Interrupted Time Series models. |
tcol |
specify treatment or interruption line color as a single character vector. Defaults to "gray" if nothing selected. |
cfact |
logical TRUE or FALSE that indicates whether a counterfactual line should be included. Defaults to FALSE. |
conf.int |
logical TRUE or FALSE that indicates whether a 95% confidence interval bands should be included. Defaults to FALSE. |
adj.alpha |
factor modifying the opacity alpha of the confidence interval bands, in the range of 0 to 1. Default is NULL; if conf.int=TRUE, defaults to 0.4. |
add.means |
adds group means by time period based on model data. Default is FALSE |
add.legend |
add a legend by selecting the location as "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right", "center". No legend if nothing selected. |
legend |
a character vector of length >= 1 to appear in the DID and ITS model legends to represent the Intervention and Control groups. |
tgt |
specify 1 or more values on the x-axis of where to add a target line when y='group'. Or 1 or more values on the y-axis of where to add a target line when y='time' or 'roll'. Default is NULL. |
tgtcol |
select a color for the target line. Default is 'gray'. |
cex |
A numerical value giving the amount by which plotting text and symbols should be magnified relative to the default of 1. |
cex.axis |
The magnification to be used for axis annotation relative to the current setting of cex. |
cex.lab |
The magnification to be used for x and y labels relative to the current setting of cex. |
cex.main |
The magnification to be used for main titles relative to the current setting of cex. |
cex.text |
The magnification to be used for the text added into the plot relative to the current setting of 1. |
cex.legend |
The magnification to be used for the legend added into the plot relative to the current setting of 1. |
name |
logical TRUE or FALSE that indicates whether coefficient names should be added to the plot. Default is FALSE. It is overridden if coefs = TRUE. |
coefs |
logical TRUE or FALSE that indicates whether coefficient names, values, and p-value significance symbols ('+' p<0.10; '' p<0.05; '' p<0.01; '' p<0.001) should be added to the plot. Default is FALSE. coefs = TRUE overrides name = FALSE. |
round.c |
an integer indicating the number of decimal places to be used for rounding coefficient values. Default is 2. |
pos.text |
a list of named integer value(s) between 1 to 4 indicating the position of the text added into the plot. List name(s) should use coefficient variable names. |
arrow |
logical TRUE or FALSE that indicates whether arrows and coefficient names should be added to visualize effects. Default is FALSE. |
xshift |
shifts one or two of some of the overlapping intervention associated arrows along the x-axis for a better view. Vector values of at least length 1 or 2 can be positive or negative. And xshift should be specified in the order of the coefficients. Only 1 or 2 of the furthest right, vertical lines for the intervention group is shifted (i.e., not left). One line is shifted when there is 1 treatment/interruption period and 2 shifts for 2 periods. (e.g., "DID" before "DID.Trend" for DID models with argument did="many"). |
x.axis |
a vector of unique character or numeric values that makes up x-axis values to replace the intervention time variable values. This will be most helpful if you prefer current calendar years instead of values starting at 1 (e.g., x.axis= sort(unique(data$Year)) for 1900-1999, not 1-100). Must have equal lengths for unique x.axis values and unique replaced values. Default is NULL. |
... |
additional arguments. |
plot of partial predictions for treatment and control groups.
am2 <- assess(formula= los ~ ., data=hosprog, intervention = "program", topcode =NULL, int.time="month", regression="none", treatment= 5, interrupt=c(5,9), did="two", its="two", newdata=TRUE, propensity=NULL) plot(am2, "DID", add.legend="bottomleft", ylim=c(2, 8)) #DID model, basic plot plot(am2, "ITS", add.legend="top", ylim=c(2, 8)) #ITS model, basic plot plot(am2, "DID", add.legend="topleft", main="DID study", col=c("dodgerblue","magenta"), ylim=c(2, 8), lwd=6, cex=3, cex.axis=2, cex.lab=1.5, cex.main=3, cex.text=2, arrow=TRUE, xshift=0.02, coefs=TRUE, round.c=2 ) plot(am2, "ITS", add.legend="top", xlim=c(-.5, 13), ylim=c(2, 8), main="ITS study", col=c("cyan","hotpink"), tcol="springgreen", lwd=7, cex=2, cex.axis=2, cex.lab=2, cex.main=3, cex.text=1.2, cex.legend=1.25, name=FALSE, coefs=TRUE, round.c=1, pos.text= list("txp5"=3, "post9"=4), arrow=TRUE, xshift=c(.5, 1.5), cfact=TRUE, conf.int=TRUE, adj.alpha=0.2) # ITS: USA's unemployment rate with a single group, 10 interruption model key_time <- c(5, 14, 17, 29, 42, 59, 69, 73, 80,92)# key interruption periods u110 <- assess(formula=rate ~ ., data=unemployment, intervention="usa",# model int.time="year", its="one", interrupt= key_time) # Graph the results, note shift in intercept and slope during 1933's New Deal plot(u110, "ITS", add.means = TRUE, coefs=TRUE, conf.int=TRUE, adj.alpha = .2, lwd=2, col="slategray", tcol= "orange", main="US unemployment rate", xlab="Years (1929-2024)", ylab="Proportion of labor market", cex.main=2, cex.axis = 1.5, cex.lab = 1.5, cex=2, cex.text = 1.25, pos.text=list("ITS.Time"=4, "post42"=1,"txp42"=3,"txp92"=3), x.axis=unemployment$Year) for(i in 1:length(key_time)) { text(key_time[i], .22-(.01*i), cex=1.25, labels = paste0(unemployment[ key_time[i], "Year"], ": ", unemployment[ key_time[i], "event"])) }am2 <- assess(formula= los ~ ., data=hosprog, intervention = "program", topcode =NULL, int.time="month", regression="none", treatment= 5, interrupt=c(5,9), did="two", its="two", newdata=TRUE, propensity=NULL) plot(am2, "DID", add.legend="bottomleft", ylim=c(2, 8)) #DID model, basic plot plot(am2, "ITS", add.legend="top", ylim=c(2, 8)) #ITS model, basic plot plot(am2, "DID", add.legend="topleft", main="DID study", col=c("dodgerblue","magenta"), ylim=c(2, 8), lwd=6, cex=3, cex.axis=2, cex.lab=1.5, cex.main=3, cex.text=2, arrow=TRUE, xshift=0.02, coefs=TRUE, round.c=2 ) plot(am2, "ITS", add.legend="top", xlim=c(-.5, 13), ylim=c(2, 8), main="ITS study", col=c("cyan","hotpink"), tcol="springgreen", lwd=7, cex=2, cex.axis=2, cex.lab=2, cex.main=3, cex.text=1.2, cex.legend=1.25, name=FALSE, coefs=TRUE, round.c=1, pos.text= list("txp5"=3, "post9"=4), arrow=TRUE, xshift=c(.5, 1.5), cfact=TRUE, conf.int=TRUE, adj.alpha=0.2) # ITS: USA's unemployment rate with a single group, 10 interruption model key_time <- c(5, 14, 17, 29, 42, 59, 69, 73, 80,92)# key interruption periods u110 <- assess(formula=rate ~ ., data=unemployment, intervention="usa",# model int.time="year", its="one", interrupt= key_time) # Graph the results, note shift in intercept and slope during 1933's New Deal plot(u110, "ITS", add.means = TRUE, coefs=TRUE, conf.int=TRUE, adj.alpha = .2, lwd=2, col="slategray", tcol= "orange", main="US unemployment rate", xlab="Years (1929-2024)", ylab="Proportion of labor market", cex.main=2, cex.axis = 1.5, cex.lab = 1.5, cex=2, cex.text = 1.25, pos.text=list("ITS.Time"=4, "post42"=1,"txp42"=3,"txp92"=3), x.axis=unemployment$Year) for(i in 1:length(key_time)) { text(key_time[i], .22-(.01*i), cex=1.25, labels = paste0(unemployment[ key_time[i], "Year"], ": ", unemployment[ key_time[i], "event"])) }
Graph Bayesian diagnostic traceplots, density plots on convergence, autocorrelation factor, calculate Monte Carlo Standard Errors, and effective sample sizes. And plot summaries of posterior distributions, posterior predictive checks, summaries on hierarchical or multilevel models (up to 3 levels), and summary graphs of values associated with specific percentiles (and vice-versa) that can be used to help set targets. Plots are developed from Bayes class objects that converted multiple chains into data frames.
## S3 method for class 'Bayes' plot( x, y = NULL, type = "n", parameter = NULL, center = "mode", mass = 0.95, compare = NULL, rope = NULL, data = NULL, dv = NULL, iv = NULL, add.data = "n", group = NULL, main = NULL, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL, vlim = NULL, curve = FALSE, lwd = NULL, breaks = 15, bcol = NULL, lcol = NULL, pcol = NULL, xpt = NULL, tgt = NULL, tgtcol = "gray", tpline = NULL, tpcol = NULL, pline = 20, pct = 95, add.legend = NULL, legend = NULL, cex = 1, cex.lab = NULL, cex.axis = NULL, cex.main = NULL, cex.text = NULL, cex.legend = NULL, HDItext = 0.7, math = "n", es = "n", subset = NULL, level = NULL, aorder = TRUE, round.c = 2, ... )## S3 method for class 'Bayes' plot( x, y = NULL, type = "n", parameter = NULL, center = "mode", mass = 0.95, compare = NULL, rope = NULL, data = NULL, dv = NULL, iv = NULL, add.data = "n", group = NULL, main = NULL, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL, vlim = NULL, curve = FALSE, lwd = NULL, breaks = 15, bcol = NULL, lcol = NULL, pcol = NULL, xpt = NULL, tgt = NULL, tgtcol = "gray", tpline = NULL, tpcol = NULL, pline = 20, pct = 95, add.legend = NULL, legend = NULL, cex = 1, cex.lab = NULL, cex.axis = NULL, cex.main = NULL, cex.text = NULL, cex.legend = NULL, HDItext = 0.7, math = "n", es = "n", subset = NULL, level = NULL, aorder = TRUE, round.c = 2, ... )
x |
Bayes class object. |
y |
character vector for the type of plot to graph. Select 'post', 'dxa', 'dxd', 'dxg', 'dxt', 'check', 'multi' (specialized version of 'check'), or 'target' for posterior summary, diagnostics (4 'dx' plots produced: autocorrelation factor, density plots on chain convergence, Gelman-Rubin statistic, and traceplot), posterior predictive check, multilevel or hierarchical model summary (up to 3 levels), or target summary plots. Default is 'post'. |
type |
character vector of length == 1 that indicates the likelihood function used in the model when y='check' or y='multi'. Posterior predictive checks allow us to see how well our estimates match the observed data. These checks are available for Bayesian estimation of outcomes and regression trend lines (with polynomial terms) using various distributions in the likelihood function. Select 'n', 'ln', 'sn', 'w', 'g', 't', 'taov', 'taov1', 'ol', 'oq','oc', 'lnl', 'lnq', 'lnc', 'logl', 'logq', 'logc', 'bern', and 'bin' for these respective options in Bayesian estimation (multilevel): 'Normal', 'Log-normal', 'Skew-normal', 'Weibull', 'Gamma', 't', 't: ANOVA, side view', 't: ANOVA 1 group, side view'; and for regression trend lines: 'OLS: Linear', 'OLS: Quadratic', 'OLS: Cubic', 'Log-normal: Linear', 'Log-normal: Quadratic', 'Log-normal: Cubic', 'Logistic: Linear', 'Logistic: Quadratic', and 'Logistic: Cubic', 'Bernoulli', and 'binomial'. The first 8 selections are for Bayesian estimation of outcomes, the next 9 options were developed to assess regression trend lines from ordinary least squares (OLS), log-normal, and logistic models and also for hierarchical model versions. And the remaining two ('bern' and 'bin') are for when y='multi'. Additional models analogous to 'Generalized Linear Models' can also be graphed on the logit scale using 'OLS' options. For example, plot a logistic model on the logit scale when type='ol' (i.e., view straight trend lines). Or if you prefer viewing results on the probability scale, select type='logl' (i.e., curved lines). And consider using type= 'lnl', 'lnq', 'lnc' for log-normal and Poisson models with lines based on exponentiated values. In general, it is important to note that the observed data may not be on the same scale as the parameter estimates and may not be automatically visible in the graph (i.e., x and y-axis limits may help). When graphing target summary plots that use posterior predictive checks rather than the basic posterior summary graph (plots target values over the parameter estimate of the center such as the mean), enter y='target' and other arguments relevant to y='check'. However, type= 'bern', 'bin', 'sn' are not available but type='n', 'ln', 'w', 'g', or 't' are available. Default is NULL. |
parameter |
a character vector of length >= 1 or a 2 element list with the name(s) of parameter in MCMC chains to produce
summary statistics. Use a 1 element vector to get posterior estimates of a single parameter. Use a 2 or more element vector
to estimate the average joint effects of multiple parameters (e.g., average infection rate for interventions A and B when
parameter= c('IntA', 'IntB')). Use a 2 element list to perform mathematical calculations of multiple parameters (see 'math' below).
For example, use parameter=list('hospital_A', 'hospital_Z') if you want to estimate the difference between the hospital's outcomes.
Use parameter= list(c('hospital_A','hospital_B'), ('hospital_Y','hospital_Z')) to estimate how different the combined hospitals A
and B values are from the combined Hospital Y and Z values. When y='check', use either a multiple element character vector that
represents center, spread, and additional distribution parameters in order of 1st, 2nd, and 3rd distribution parameters. For example,
mean and sd for a normal distribution; mean log and sd log of a log-normal dist.; xi, omega, and alpha of a skew-normal distribution;
shape, scale, and lambda of a Weibull distribution; shape and rate of a Gamma distribution; and mean, SD and nu (i.e., degrees
of freedom) of a t-distribution. Or indicate regression parameters in order (e.g., intercept, Beta 1, Beta 2, etc.). When y='multi',
use a multiple element character vector to list the parameter names of the hierarchy, in order of the nesting with the lowest level
first (e.g., exams nested in patients nested in hospital). When y='multi', for parameters from multiple groups such as various
hospitals, only enter the first unit's prefix of each parameter and the remaining groups will be set up for graphing. For example,
parameter=c('theta', 'omega') will plot data for |
center |
character vector that selects the type of central tendency to use when reporting parameter values. Choices include: 'mean', 'median', and 'mode'. Default is 'mode'. |
mass |
numeric vector that specifies the credible mass used in the Highest Density Interval (HDI). Default is 0.95. |
compare |
numeric vector with one comparison value to determine how much of the distribution is above or below the comparison value. Default is NULL. |
rope |
numeric vector with two values that define the Region of Practical Equivalence (ROPE). Test hypotheses by setting low and high values to determine if the Highest Density Interval (HDI) is within or outside of the ROPE. Parameter values are declared not credible if the entire ROPE lies outside the HDI of the parameter’s posterior (i.e., we reject the null hypothesis). For example, the ROPE of a coin is set to 0.45 to 0.55 but the posterior 95% HDI is 0.61 - 0.69 so we reject the null hypothesis value of the rate of a head is 0.50. We can accept the null hypothesis if the entire 95% HDI falls with the ROPE. Default is NULL. |
data |
object name for the observed data when y='check', y='multi' or y='target'. |
dv |
character vector of length == 1 for the dependent variable name in the observed data frame when y='check', y='multi' or y='target'. Default is NULL. |
iv |
character vector of length >= 1 for the independent variable name(s) in the observed data frame when y='check' or y='multi'. When y='multi', enter the lower to higher level clustering or group names (e.g, for health data, iv=c("patient", "hospital"). When type='taov', enter the name of the test group variable (e.g., 'intervention'). Default is NULL. |
add.data |
character vector of length == 1 to determine the type of observed data added to the plot (to show model fit to data) when y='check' and type= 'ol', 'oq','oc', 'lnl', 'lnq', 'lnc', 'logl', 'logq', or 'logc'. Select 'a', 'u', 'al', 'ul', 'n' for these observed data options: 'All', 'Unit', 'All: Lines', 'Unit: Lines' (unit specific lines are linked), 'none'. Default is 'n' for none or no observed data shown. |
group |
character list of length == 2 for 1) the grouping variable name and 2) specific group(s) in the observed data frame. This is primarily used for multilevel or hierarchical models when y='check' or y='multi' that the hierarchies are based on (e.g., hospitals nested within health systems). |
main |
the main title of the plot. |
xlab |
a character vector label for the x-axis. |
ylab |
a character vector label for the y-axis. |
xlim |
specify plot's x-axis limits with a 2 element numeric vector. |
ylim |
specify plot's y-axis limits with a 2 element numeric vector. |
vlim |
two element vector to specify limits for minimum and maximum values used to extrapolate posterior lines along the x-axis. For example, when drawing a log-normal distribution, we may want to have our posterior lines fit within a narrower range while having our graph's x-axis limits extend past those lines. If so, the value limits (vlim) help us keep our posterior predictive check lines within desired limits. Default is NULL. |
curve |
select a curve to display instead of a histogram when y='post'. Default is FALSE. |
lwd |
select the line width. |
breaks |
number of breaks in a histogram. Default is 15. |
bcol |
a single or multiple element character vector to specify the bar or band color(s). When Bayesian estimates and observed values are present, the first colors are for Bayesian estimates while the last colors are observed values. Defaults to, if nothing selected, 'gray', except when y = 'multi' and then no overall HDI is graphed until a color is selected. |
lcol |
a single or multiple element character vector to specify the line color(s). When Bayesian estimates and observed values are present, the first colors are Bayesian estimates while the last colors are observed values. When multiple lines are needed, single item lines precede multiple use lines. For example, a single comparison value line will be assigned the first lcol while both rope lines will be given the same color of the second lcol when y='post'. Defaults to 'gray' if nothing selected. |
pcol |
a single or multiple element character vector to specify the point color(s). When Bayesian estimates and observed values are present, the first colors are Bayesian estimates while the last colors are observed values. Defaults to, if nothing selected, 'gray'. |
xpt |
a numeric vector of single or multiple values that indicate placement of points (+) on the x-axis when y='check'. This is intended for the graphs with predictive checks on Bayesian estimation (i.e., not trend lines). Default is NULL. |
tgt |
specify 1 or more values on the x- or y-axis of where to add one or more target lines when applicable. Default is NULL. |
tgtcol |
select one or multiple colors for one or multiple target lines. Default is 'gray'. |
tpline |
add one or more time point vertical lines using x-axis values. Default is NULL (i.e., no lines). |
tpcol |
specify a color for the time point line, tpline. Default is NULL. |
pline |
a numeric vector of length == 1 for the number of random posterior predictive check lines when y='check'. Default is 20. |
pct |
a numeric integer vector of length == 1 for the percentage of the posterior predictive check heavy tail lines to be drawn when type= 'taov' or 'taov1'. Valid values are 0 < pct < 100. Default is 95 (e.g., 95%). |
add.legend |
add a legend by selecting the location as "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right", "center". Default is no legend produced if nothing is selected. |
legend |
a character vector of length >= 1 to appear when y='check', y='multi', and sometimes y='target'. Legends to represent hierarchical estimates and observed values. |
cex |
A numerical value giving the amount by which plotting text and symbols should be magnified relative to the default of 1. |
cex.lab |
The magnification to be used for x and y labels relative to the current setting of cex. |
cex.axis |
The magnification to be used for axis annotation relative to the current setting of cex. |
cex.main |
The magnification to be used for main titles relative to the current setting of cex. |
cex.text |
The magnification to be used for the text added into the plot relative to the current setting of 1. |
cex.legend |
The magnification to be used for the legend added into the plot relative to the current setting of 1. |
HDItext |
numeric vector of length == 1 that can be a negative or positive value. Identifies placement of HDI text near credible interval when y='post'. Values are relative to the x-axis values. Default is 0.7. |
math |
mathematics function performed between multiple parameters when y='post'. Available functions are: 'add', 'subtract', 'multiply', 'divide', or 'n' for none (i,e., no functions). Indicate parameters with parameter argument. For example, when math='subtract', use parameter=list('hospital_A', 'hospital_Z') if you want to estimate the difference between the hospital's outcomes. Use parameter=list(c('hospital_A','hospital_B'), ('hospital_Y','hospital_Z')) to estimate how different the combined hospitals A and B values are from the combined hospitals Y and Z. Additionally, compute statistics like the coefficient of variation when math='divide' and parameter= list('Standard_Deviation', 'Mean'). Default is 'n' for no math function. |
es |
one element vector that indicates which type of likelihood distribution is relevant in calculating Jacob Cohen's effect sizes between 2 parameters when y='post'. Options are 'bern', 'bin', and 'n' for the Bernoulli or binomial distributions for binary outcomes and none (i.e., no distribution, hence no effect size calculated). For example, to get the posterior distribution summary for the difference between the intervention and control groups on 30-day readmissions or not, use es='bern' or 'bin' when y='post', math='subtract', and parameter=list('intMean', 'ctlMean'). Default is 'n' which indicates not to calculate the effect size. |
subset |
a single or multiple element character or numeric vector of group names that are a subset of the observations to use in the grouping when y='multi'. The default is NULL, thereby using all observations. Specify, for example, enter c('NY', 'Toronto', 'LA', 'Vancouver') to view a graph with only these cities. Default is NULL. |
level |
a numeric integer of length == 1, either 1, 2, or 3 that indicates the level of the hierarchical/multilevel model when y='multi' and the type of graph to plot. For example, a multilevel model that estimates the proportion of successful exams by patients is considered level=2. And the successful exam rates by patients from various hospitals is level=3. Graphs can be created separately for both level=2 and level=3 when there is a three-level model. The graph when y='multi' can be produced when level=1 for non-hierarchical models if there are estimates for groups. For example, estimating the patient infection rate of hospitals without a hierarchical structure in the model. Default is NULL. |
aorder |
a logical indicator on whether the ordering of the group levels are in alphabetical order or not when y='multi'. If aorder=TRUE, results are displayed in an increasing alphabetical order based on level name (e.g., 'LA' before 'NY'). If aorder=FALSE, an increasing numeric order based on group parameter values is performed (e.g., 0.65 before 0.70). Default is TRUE. |
round.c |
an integer indicating the number of decimal places when rounding numbers y='multi' and y='target'. Default is 2. |
... |
additional arguments. |
plots assessing model quality, posterior distributions, posterior predictive checks, hierarchical or multilevel model summaries, and target summaries.
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences, Second Edition. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers. ISBN 0-8058-0283-5 Gelman, A., Goodrich, B., Gabry, J., & Vehtari, A. (2019). R-squared for Bayesian Regression Models. The American Statistician, 73, 3, 307–309. https://doi.org/10.1080/00031305.2018.1549100
Kruschke, J. (2014). Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan, Second Edition. New York: Academic Press. ISBN: 9780124058880
# Posterior estimates of length of stay # # Set up the Bayes object first, convert the chains blos1 <- Bayes(losmcmc, newdata=TRUE) #' # Examine Bayesian model diagnostics for estimated length of stay (LOS) # # Review traceplots, density plots, autocorrelation factor, and Gelman-Rubin statistic plot(blos1, y="dxt", parameter="muOfY") plot(blos1, y="dxd", parameter="muOfY") plot(blos1, y="dxa", parameter="muOfY") plot(blos1, y="dxg", parameter="muOfY") # And we can calculate statistics that combine multiple parameters (e.g., calculate # the mean difference between intervention and control groups). Here we get the # coefficient of variation by dividing the standard deviation by the mean. We use # the 'math' argument and arrange our parameters in the proper order. # We only need the first 4 arguments but we'll add a little more. plot(x=blos1, y="post", parameter=list("sigmaOfY", "muOfY" ),math="divide", bcol="cyan", HDItext=.3, main= "Coefficient of Variation") # Posterior Predictive Checks on how well our model fits the data # # Estimating center and spread for hospital length of stay # Generally, we only need the first 6 arguments but we'll modify the plot. # A model with a gamma likelihood would fit better but this is ok for now. plot(x=blos1, y="check", type="n", data=hosprog, dv="los", parameter=c("muOfY", "sigmaOfY"), breaks=30, cex.axis=1.3, lwd=3, xlab=NULL, pline=20, vlim=c(-2, 20), xlim=c(-2, 20), add.legend="topright", main="Length of Stay", cex.main=1.5, xpt=5, pcol="red", lcol="orange", cex.legend=1, bcol="cyan") # Estimating the regression trend line # Now lets look at the trend of conc on CO2 uptake from the CO2 data. # Using a quadratic model with conc^2 would help and an option in ham. # First, create the Bayes object bco2 <- Bayes(x=co2mcmc, newdata=TRUE ) # We generally only need the first 7 arguments below. plot(x=bco2, y="check", type="ol", data=CO2, dv="uptake", iv="conc", parameter=c("b0","b1"), add.data="al", cex.axis=1.3, lwd=1.5, pline=50, vlim=c(50, 1100), xlim=c(0, 1100), ylim=c(0, 50), cex=2, cex.lab=2, pcol="magenta", cex.main=2,cex.legend=1.2, add.legend="topleft", lcol="steelblue") #vlim lets me extrapolate a little # Hierarchical or Multilevel Model Summary # # We generally only need the first 3 arguments below. But we'll subset # on 8 of 12 plants in the level 2 model (observations nested in plants) # and modify other settings. plot(x=co2multi, y="multi", level=2, aorder=FALSE, subset= c("Qn2","Qn3","Qc3","Qc2","Mn3","Mn2","Mc2","Mc3"), lcol="blue", pcol= c("red", "skyblue"), round.c=1, bcol="yellow", xlim=c(-.1, 1), legend=NULL, add.legend="topright", lwd=3, cex.lab=1.2, cex= 2, cex.main=1.5, cex.axis=.75, cex.legend=1.5, X.Lab=NULL) # And now the level 3 plot (observation in plants in Treatment by type groups) plot(x=co2multi, y="multi", level=3, aorder=FALSE, lcol="blue", pcol= c("green", "pink"), round.c=1, bcol="lavender", xlim=c(-.1, 1), legend=NULL, add.legend="right", lwd=3, cex.lab =1.2, cex= 2, cex.main=1.5, cex.axis=.75, cex.legend=1.5, X.Lab=NULL) # Targets for length of stay (LOS) # # Our administrators ask how far are we from our goals, they ask about targets # in increments of 5 points of probability or specific days. We answer both. btarget1 <- Bayes(x=losmcmc, y="target", type="n", parameter=c("muOfY","sigmaOfY"), newdata=TRUE, target=list(p=c(.35,.4,.45, .5, .55), y=c(3,4))) # 'newdata' for plots # Target graph using the standard option, overlaying estimate of mode parameter plot(x=btarget1, y="target", type="n", lcol="purple", tgtcol="blue", xlim=c(3.5, 5)) # Target graph using a posterior predictive check, more intuitive plot(x=btarget1, y="target", type="n", data=hosprog, dv="los", breaks=30, cex.axis=1.3, lwd=3, pline=20, vlim=c(-1, 12), xlim=c(-1, 10), parameter=c("muOfY","sigmaOfY"), add.legend="topright", main="Length of Stay", cex.main=1.5, xpt=5, pcol="black", lcol="cyan", tgtcol="blue", bcol="orange", cex.legend=1.5, cex.text = 2)# Posterior estimates of length of stay # # Set up the Bayes object first, convert the chains blos1 <- Bayes(losmcmc, newdata=TRUE) #' # Examine Bayesian model diagnostics for estimated length of stay (LOS) # # Review traceplots, density plots, autocorrelation factor, and Gelman-Rubin statistic plot(blos1, y="dxt", parameter="muOfY") plot(blos1, y="dxd", parameter="muOfY") plot(blos1, y="dxa", parameter="muOfY") plot(blos1, y="dxg", parameter="muOfY") # And we can calculate statistics that combine multiple parameters (e.g., calculate # the mean difference between intervention and control groups). Here we get the # coefficient of variation by dividing the standard deviation by the mean. We use # the 'math' argument and arrange our parameters in the proper order. # We only need the first 4 arguments but we'll add a little more. plot(x=blos1, y="post", parameter=list("sigmaOfY", "muOfY" ),math="divide", bcol="cyan", HDItext=.3, main= "Coefficient of Variation") # Posterior Predictive Checks on how well our model fits the data # # Estimating center and spread for hospital length of stay # Generally, we only need the first 6 arguments but we'll modify the plot. # A model with a gamma likelihood would fit better but this is ok for now. plot(x=blos1, y="check", type="n", data=hosprog, dv="los", parameter=c("muOfY", "sigmaOfY"), breaks=30, cex.axis=1.3, lwd=3, xlab=NULL, pline=20, vlim=c(-2, 20), xlim=c(-2, 20), add.legend="topright", main="Length of Stay", cex.main=1.5, xpt=5, pcol="red", lcol="orange", cex.legend=1, bcol="cyan") # Estimating the regression trend line # Now lets look at the trend of conc on CO2 uptake from the CO2 data. # Using a quadratic model with conc^2 would help and an option in ham. # First, create the Bayes object bco2 <- Bayes(x=co2mcmc, newdata=TRUE ) # We generally only need the first 7 arguments below. plot(x=bco2, y="check", type="ol", data=CO2, dv="uptake", iv="conc", parameter=c("b0","b1"), add.data="al", cex.axis=1.3, lwd=1.5, pline=50, vlim=c(50, 1100), xlim=c(0, 1100), ylim=c(0, 50), cex=2, cex.lab=2, pcol="magenta", cex.main=2,cex.legend=1.2, add.legend="topleft", lcol="steelblue") #vlim lets me extrapolate a little # Hierarchical or Multilevel Model Summary # # We generally only need the first 3 arguments below. But we'll subset # on 8 of 12 plants in the level 2 model (observations nested in plants) # and modify other settings. plot(x=co2multi, y="multi", level=2, aorder=FALSE, subset= c("Qn2","Qn3","Qc3","Qc2","Mn3","Mn2","Mc2","Mc3"), lcol="blue", pcol= c("red", "skyblue"), round.c=1, bcol="yellow", xlim=c(-.1, 1), legend=NULL, add.legend="topright", lwd=3, cex.lab=1.2, cex= 2, cex.main=1.5, cex.axis=.75, cex.legend=1.5, X.Lab=NULL) # And now the level 3 plot (observation in plants in Treatment by type groups) plot(x=co2multi, y="multi", level=3, aorder=FALSE, lcol="blue", pcol= c("green", "pink"), round.c=1, bcol="lavender", xlim=c(-.1, 1), legend=NULL, add.legend="right", lwd=3, cex.lab =1.2, cex= 2, cex.main=1.5, cex.axis=.75, cex.legend=1.5, X.Lab=NULL) # Targets for length of stay (LOS) # # Our administrators ask how far are we from our goals, they ask about targets # in increments of 5 points of probability or specific days. We answer both. btarget1 <- Bayes(x=losmcmc, y="target", type="n", parameter=c("muOfY","sigmaOfY"), newdata=TRUE, target=list(p=c(.35,.4,.45, .5, .55), y=c(3,4))) # 'newdata' for plots # Target graph using the standard option, overlaying estimate of mode parameter plot(x=btarget1, y="target", type="n", lcol="purple", tgtcol="blue", xlim=c(3.5, 5)) # Target graph using a posterior predictive check, more intuitive plot(x=btarget1, y="target", type="n", data=hosprog, dv="los", breaks=30, cex.axis=1.3, lwd=3, pline=20, vlim=c(-1, 12), xlim=c(-1, 10), parameter=c("muOfY","sigmaOfY"), add.legend="topright", main="Length of Stay", cex.main=1.5, xpt=5, pcol="black", lcol="cyan", tgtcol="blue", bcol="orange", cex.legend=1.5, cex.text = 2)
Graph X-bar charts, p-charts, and u-charts. This includes producing means center lines, 3-sigma upper and lower control limits. Users can also calculate values before and after an intervention to see if a change in the control process happened. Values are returned in a data frame.
## S3 method for class 'control' plot( x, y = NULL, xlim = NULL, ylim = NULL, xlab = NULL, ylab = NULL, main = NULL, lwd = NULL, col = NULL, iname = NULL, icol = "black", trend = FALSE, trcol = "gray", tgt = NULL, tgtcol = "gray", tpline = NULL, tpcol = NULL, cex = 1, cex.lab = NULL, cex.axis = NULL, cex.main = NULL, cex.text = NULL, x.axis = NULL, y.axis = NULL, round.c = NULL, ... )## S3 method for class 'control' plot( x, y = NULL, xlim = NULL, ylim = NULL, xlab = NULL, ylab = NULL, main = NULL, lwd = NULL, col = NULL, iname = NULL, icol = "black", trend = FALSE, trcol = "gray", tgt = NULL, tgtcol = "gray", tpline = NULL, tpcol = NULL, cex = 1, cex.lab = NULL, cex.axis = NULL, cex.main = NULL, cex.text = NULL, x.axis = NULL, y.axis = NULL, round.c = NULL, ... )
x |
control object. |
y |
not currently used, default is NULL. |
xlim |
specify plot's x-axis limits with a 2 element numeric vector. |
ylim |
specify plot's y-axis limits with a 2 element numeric vector. |
xlab |
a character vector label for the x-axis. |
ylab |
a character vector label for the y-axis. |
main |
the main title of the plot. |
lwd |
select the line width. |
col |
a 2 element character vector to specify the 1) center line and 2) both control limit colors. Defaults to, if nothing selected, c("blue", "red"). |
iname |
intervention name text as a single character vector. Defaults to "Intervention" if nothing is selected. |
icol |
specify intervention line color as a single character vector. Defaults to "black" if nothing is selected. |
trend |
a logical vector that adds an ordinary least squares trend line (i.e., simple linear regression line) when trend=TRUE. Default is FALSE. |
trcol |
select a color for the trend line. Default is 'gray'. |
tgt |
specify 1 or more values on the y-axis of where to add one or more horizontal target lines. Default is NULL. |
tgtcol |
select one or multiple colors for one or multiple target lines. Default is 'gray'. |
tpline |
add one or more time point vertical lines using x-axis values. Default is NULL (i.e., no lines). |
tpcol |
specify a color for the time point line, tpline. Default is NULL. |
cex |
A numerical value giving the amount by which plotting text and symbols should be magnified relative to the default of 1. |
cex.lab |
The magnification to be used for x and y labels relative to the current setting of cex. |
cex.axis |
The magnification to be used for axis annotation relative to the current setting of cex. |
cex.main |
The magnification to be used for main titles relative to the current setting of cex. |
cex.text |
The magnification to be used for the iname text added into the plot relative to the current setting of 1. |
x.axis |
a vector of unique character or numeric values that makes up x-axis values to replace the time variable values. This will be most helpful if you prefer current calendar months/years instead of values starting at 1 (e.g., x.axis= sort(unique(data$Year)) for 1900-1999, not 1-100). Must have equal lengths for unique x.axis values and replaced values (i.e., nrow(x)). Default is NULL. |
y.axis |
a vector of unique character or numeric values that makes up y-axis values to replace the outcome variable values. This will be most helpful if your outcome needs to be converted such as rate per 1,000 patient days (e.g., y.axis= seq(min(x$HAI)*1000, max(x$HAI)*1000, length.out=nrow(x))). Must have equal lengths for unique y.axis values and replaced values (i.e., nrow(x)). Default is NULL. |
round.c |
an integer indicating the number of decimal places when rounding numbers such as for y.axis. Default is 2. |
... |
additional arguments. |
plot of Shewhart control charts: X-bar charts, p-charts, and u-charts with 3-sigma control limits.
Ryan, T. (2011). Statistical Methods for Quality Improvement, Third Edition. John Wiley & Sons, Inc. ISBN: 978-0-470-59074-4.
## Hospital LOS and readmissions ## # X-bar chart spc_x <- control(x="los", time="month", data=hosprog, type="x", n.equal=TRUE) # Basic X-bar chart plot(spc_x) # p-chart, using only the numerator (i.e., y=NULL). Specify unequal sample sizes spc_p <- control(x="rdm30", time="month", data=hosprog, type="p", n.equal=FALSE) # p-chart, adding target and time point lines plot(spc_p, tgt=c(0,.25), tgtcol="green", ylim=c(0,0.4), tpline=c(4,8), tpcol= c("yellow","black")) # u-chart for infection rates with an intervention spc_u <- control(x="HAI", y="PatientDays", time="Month", data=infections, type="u", n.equal=FALSE, intervention=22) # u-chart with trend lines, various graphing options, x.axis start at 2nd year # and y.axis changed to show HAIs per 1,000 patient days plot(spc_u, main="u-Chart: HAI per 1,000 Patient Days Pre/Post Intervention", col=c("green","dodgerblue"), trend=TRUE, trcol="red", x.axis=c((1:41+12)), round.c=1, y.axis=seq(min(spc_u$HAI)*1000, max(spc_u$HAI)*1000, length.out=nrow(spc_u)), xlab="Months (starting at year 2)", icol="gray", lwd=2, cex=2, cex.axis=1.1, cex.main=1.25, cex.text=1.25)## Hospital LOS and readmissions ## # X-bar chart spc_x <- control(x="los", time="month", data=hosprog, type="x", n.equal=TRUE) # Basic X-bar chart plot(spc_x) # p-chart, using only the numerator (i.e., y=NULL). Specify unequal sample sizes spc_p <- control(x="rdm30", time="month", data=hosprog, type="p", n.equal=FALSE) # p-chart, adding target and time point lines plot(spc_p, tgt=c(0,.25), tgtcol="green", ylim=c(0,0.4), tpline=c(4,8), tpcol= c("yellow","black")) # u-chart for infection rates with an intervention spc_u <- control(x="HAI", y="PatientDays", time="Month", data=infections, type="u", n.equal=FALSE, intervention=22) # u-chart with trend lines, various graphing options, x.axis start at 2nd year # and y.axis changed to show HAIs per 1,000 patient days plot(spc_u, main="u-Chart: HAI per 1,000 Patient Days Pre/Post Intervention", col=c("green","dodgerblue"), trend=TRUE, trcol="red", x.axis=c((1:41+12)), round.c=1, y.axis=seq(min(spc_u$HAI)*1000, max(spc_u$HAI)*1000, length.out=nrow(spc_u)), xlab="Months (starting at year 2)", icol="gray", lwd=2, cex=2, cex.axis=1.1, cex.main=1.25, cex.text=1.25)
Graph decide class object's model classification results as well as Decision Curve analysis output for net benefit and interventions saved.
## S3 method for class 'decide' plot( x, y = NULL, main = NULL, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL, lwd = NULL, bcol = NULL, lcol = NULL, add.legend = NULL, legend = NULL, cex = 1, cex.lab = NULL, cex.axis = NULL, cex.main = NULL, cex.legend = NULL, round.c = 2, ... )## S3 method for class 'decide' plot( x, y = NULL, main = NULL, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL, lwd = NULL, bcol = NULL, lcol = NULL, add.legend = NULL, legend = NULL, cex = 1, cex.lab = NULL, cex.axis = NULL, cex.main = NULL, cex.legend = NULL, round.c = 2, ... )
x |
decide object. |
y |
type of plot to display. Select either 'nb', 'is', or 'cl' for a decision curve analysis 'net benefit' and 'interventions saved', or a model classification (e.g., sensitivity and specificity) according to a selected threshold. Net benefit and interventions saved display results for specific percentiles found between the 1st and 99th percentiles. Default is 'nb'. |
main |
the main title of the plot. |
xlab |
a character vector label for the x-axis. |
ylab |
a character vector label for the y-axis. |
xlim |
specify plot's x-axis limits with a 2 element numeric vector. |
ylim |
specify plot's y-axis limits with a 2 element numeric vector. When y='is', the y-axis labels are in integers (e.g., 1 to 100) but the actual scale is in decimals (0.0 to 1.0), please note when making y-axis limits. |
lwd |
select the line width. |
bcol |
a multiple element character vector of length == 2 to specify the bar, band, or block colors that are the shading of the true and false classification regions of the plot (e.g., true-positive and false-negative). When y='cl', the first color represents 'true' and the second color is for 'negative'. Default is null, if none selected, the colors are c('green', 'red'). |
lcol |
a single or multiple element character vector to specify the line color(s). When y='nb', select up to 3 colors in this order for model, 'net benefit', 'all treated', and 'none treated' line colors. When y='is', select 1 color for the interventions saved line. And when y='cl', select 1 color to represent the selected threshold. |
add.legend |
add a legend by selecting the location as "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right", "center". Default is no legend produced if nothing is selected. |
legend |
a character vector of length >= 2 to appear when y='nb' and y='cl' with legend description. |
cex |
A numerical value giving the amount by which plotting text and symbols should be magnified relative to the default of 1. |
cex.lab |
The magnification to be used for x and y labels relative to the current setting of cex. |
cex.axis |
The magnification to be used for axis annotation relative to the current setting of cex. |
cex.main |
The magnification to be used for main titles relative to the current setting of cex. |
cex.legend |
The magnification to be used for the legend added into the plot relative to the current setting of 1. |
round.c |
an integer indicating the number of decimal places when rounding numbers y='multi' and y='target'. Default is 2. |
... |
additional arguments. |
plots of model classification, net benefit, and interventions saved.
Vickers, A. & Elkin, E. (2006). Decision Curve Analysis: A Novel Method for Evaluating Prediction Models. Society for Medical Decision Making, 26, 6, 565-574. https://doi.org/10.1177/0272989X06295361
## Predicting car engine shape type, v or straight ## # run the model car_m1 <- assess(formula=vs ~ hp + am, data=mtcars, regression="logistic") # create a decide object, enter the model name and a threshold on the logit scale d1 <- decide(x=car_m1, threshold= -0.767) #Plot the classification results plot(x=d1, y= "cl") #Plot the net benefit plot(x=d1, y= "nb") #Plot the interventions saved plot(x=d1, y= "is")## Predicting car engine shape type, v or straight ## # run the model car_m1 <- assess(formula=vs ~ hp + am, data=mtcars, regression="logistic") # create a decide object, enter the model name and a threshold on the logit scale d1 <- decide(x=car_m1, threshold= -0.767) #Plot the classification results plot(x=d1, y= "cl") #Plot the net benefit plot(x=d1, y= "nb") #Plot the interventions saved plot(x=d1, y= "is")
Confidence interval graphs for group class objects
## S3 method for class 'group' plot( x, y = "group", order = "alpha", gcol = "blue", gband = FALSE, pcol = "red", overall = FALSE, ocol = "gray", oband = FALSE, tgt = NULL, tcol = "gray", tpline = NULL, tpcol = "gray", xlim = NULL, ylim = NULL, main = NULL, xlab = NULL, ylab = NULL, lwd = 1, adj.alpha = 0.4, cex = 1, cex.axis = 1, cex.lab = 1, cex.main = 1, cex.text = 1, round.c = 2, name = FALSE, abbrv = 5, roll.x = "Stop", ... )## S3 method for class 'group' plot( x, y = "group", order = "alpha", gcol = "blue", gband = FALSE, pcol = "red", overall = FALSE, ocol = "gray", oband = FALSE, tgt = NULL, tcol = "gray", tpline = NULL, tpcol = "gray", xlim = NULL, ylim = NULL, main = NULL, xlab = NULL, ylab = NULL, lwd = 1, adj.alpha = 0.4, cex = 1, cex.axis = 1, cex.lab = 1, cex.main = 1, cex.text = 1, round.c = 2, name = FALSE, abbrv = 5, roll.x = "Stop", ... )
x |
group object. |
y |
type of confidence interval object, specify either 'group', 'time', or 'roll'. |
order |
specify confidence interval object order as 'alpha' or 'numeric' for alphabetical or numerical ordering in the 'group' graph. |
gcol |
pick confidence interval line colors for groups in the 'group' graph. Default is 'blue'. |
gband |
logical TRUE or FALSE that indicates whether group lines have confidence bands for trend over time results. Default is FALSE. |
pcol |
select point color for 'group' only confidence intervals. Default is 'red'. |
overall |
logical TRUE or FALSE that indicates whether to include the overall sample confidence intervals (i.e., not each group). Default is FALSE. |
ocol |
indicate the optional overall line color. Default is 'gray' when overall=TRUE. |
oband |
logical TRUE or FALSE that indicates whether to add an overall confidence band. Default is FALSE. |
tgt |
specify 1 or more values on the x-axis of where to add a target line when y='group'. Or 1 or more values on the y-axis of where to add a target line when y='time' or 'roll'. Default is NULL. |
tcol |
select a color for the target line. Default is 'gray'. |
tpline |
add one or more time point vertical line(s) using x-axis values when y='time' or y='roll'. Default is NULL. |
tpcol |
specify a color for the time point line, tpline. Default is NULL. |
xlim |
specify plot's x-axis limits with a 2 value vector. |
ylim |
specify plot's y-axis limits with a 2 value vector. |
main |
the main title of the plot. |
xlab |
a vector label for the x-axis. |
ylab |
a vector label for the y-axis. |
lwd |
select the line width. Default is 1. |
adj.alpha |
factor modifying the opacity alpha of the confidence interval bands, in the range of 0 to 1. Default is 0.4. |
cex |
A numerical value giving the amount by which plotting text and symbols should be magnified relative to the default of 1. |
cex.axis |
The magnification to be used for axis annotation relative to the current setting of cex. Default is 1. |
cex.lab |
The magnification to be used for x and y labels relative to the current setting of cex. Default is 1. |
cex.main |
The magnification to be used for main titles relative to the current setting of cex. Default is 1. |
cex.text |
The magnification to be used for the text added into the plot relative to the current setting of 1. |
round.c |
an integer indicating the number of decimal places to be used for rounding values. Default is 2. |
name |
logical TRUE or FALSE that indicates whether group names should be added to the 'time' or 'roll' plots. Default is FALSE. |
abbrv |
the minimum length of the abbreviations. Default is 5. |
roll.x |
one of 2 character options to graph rolling average x-axis values as either the 'Start' or 'Stop' of the time period. Selecting 'Start' indicates the time point on the x-axis as the first month (e.g., roll=12 will have a graph that starts at 1 on the x-axis). The default is 'Stop' (e.g., roll=12 will have a graph that starts at 12 on the x-axis). |
... |
additional arguments. |
plot of group level confidence intervals, including estimates over time periods.
#Simple graph for confidence intervals using the t-distribution gr1 <- group(x="program", y="los", z="month", data=hosprog, dist="t", increment=3, rolling=6) # Group level confidence intervals plot(x=gr1, y="group", order="numeric", lwd=4, gcol= "blue", pcol="red", overall=TRUE, oband=TRUE, ocol="gray", tcol="green", tgt=4.5, cex=1, cex.axis=1, cex.lab=1, cex.text=2, cex.main=1.25, name=TRUE, adj.alpha=.2) #Trend plots over time in the 3 month increments (i.e., quarters) plot(x=gr1, y="time", lwd=4, gcol=c("red", "blue"), gband=TRUE, overall=TRUE, oband=TRUE, ocol="gray", tcol="green", tgt=4, tpline=3, tpcol="yellow", name=TRUE, cex.axis=1, cex.lab=1, cex.text=2, cex.main=1.25, adj.alpha=.3) #Plot for rolling 6-month averages plot(x=gr1, y="roll", lwd=4, gcol=c("red", "blue"), gband=TRUE, overall=TRUE, oband=TRUE, ocol="gray", tcol="green", tgt=4, tpline=c(4,6), tpcol="yellow", name=TRUE, cex.axis=1, cex.lab=1, cex.text=2, cex.main=1.25, adj.alpha=.3) # Helpful note: You can plot aggregated data if you used the argument asis=TRUE in group(),#Simple graph for confidence intervals using the t-distribution gr1 <- group(x="program", y="los", z="month", data=hosprog, dist="t", increment=3, rolling=6) # Group level confidence intervals plot(x=gr1, y="group", order="numeric", lwd=4, gcol= "blue", pcol="red", overall=TRUE, oband=TRUE, ocol="gray", tcol="green", tgt=4.5, cex=1, cex.axis=1, cex.lab=1, cex.text=2, cex.main=1.25, name=TRUE, adj.alpha=.2) #Trend plots over time in the 3 month increments (i.e., quarters) plot(x=gr1, y="time", lwd=4, gcol=c("red", "blue"), gband=TRUE, overall=TRUE, oband=TRUE, ocol="gray", tcol="green", tgt=4, tpline=3, tpcol="yellow", name=TRUE, cex.axis=1, cex.lab=1, cex.text=2, cex.main=1.25, adj.alpha=.3) #Plot for rolling 6-month averages plot(x=gr1, y="roll", lwd=4, gcol=c("red", "blue"), gband=TRUE, overall=TRUE, oband=TRUE, ocol="gray", tcol="green", tgt=4, tpline=c(4,6), tpcol="yellow", name=TRUE, cex.axis=1, cex.lab=1, cex.text=2, cex.main=1.25, adj.alpha=.3) # Helpful note: You can plot aggregated data if you used the argument asis=TRUE in group(),
Plots an importance class object. Produces a dot chart that places the predictor variable with the highest partial chi-square (Wald chi-square for individual coefficients) at the bottom. It is a metric of the partial chi-square minus its degrees of freedom (Harrell, 2015). Predictor variables with significant p-values at the 0.05 alpha are highlighted red. Consider graphical parameters of mar=c(4.2, 2, 3.5, 3) and oma = c(0, 0, 0, 3).
## S3 method for class 'importance' plot( x, y, main = NULL, cex = NULL, pt.cex = NULL, pch = NULL, color = NULL, lcolor = NULL, ... )## S3 method for class 'importance' plot( x, y, main = NULL, cex = NULL, pt.cex = NULL, pch = NULL, color = NULL, lcolor = NULL, ... )
x |
importance object. |
y |
not currently used. |
main |
overall title for the plot, default is 'Variable Importance'. |
cex |
the character size to be used. Setting cex to a value smaller than 1 can be a useful way of avoiding label overlap. This sets the actual size, not a multiple of par('cex'). |
pt.cex |
the cex to be applied to plotting symbols, default is 2. |
pch |
the plotting character or symbol to be used, default is 19. |
color |
the color to be used for points and labels when there are significant results. Default is 'red'. |
lcolor |
the color(s) to be used for the horizontal lines. Default is 'gray'. |
... |
additional arguments. |
plot of variable importance, significant variables highlighted in red.
Harrell, F. E., Jr. (2016). Regression Modeling Strategies. Springer International Publishing. ISBN: 978-3-319-19424-0.
# OLS regression plot(importance(assess(mpg ~ hp + wt + cyl, data=mtcars, regression= "ols")$model)) # logistic regression plot(importance(assess(vs~mpg+wt+hp, data=mtcars, regression= "logistic")$model))# OLS regression plot(importance(assess(mpg ~ hp + wt + cyl, data=mtcars, regression= "ols")$model)) # logistic regression plot(importance(assess(vs~mpg+wt+hp, data=mtcars, regression= "logistic")$model))
Formats alpha class results to display summary statistics of scale information. These include the overall alpha, scale mean and standard deviation, item statistics, scale statistics if an item is removed from the scale, and the total sample size.
## S3 method for class 'alpha' print(x, ...)## S3 method for class 'alpha' print(x, ...)
x |
alpha object from Cronbach's alpha calculation. |
... |
Additional arguments. |
formatted alpha results.
print(alpha(items=c("i1","i2","i3","i4","i5"), data=cas))print(alpha(items=c("i1","i2","i3","i4","i5"), data=cas))
Formats interpretations from interpret class objects. Provides simple interpretations of regression coefficients and Cronbach's alpha. Print specific model interpretations (or run all), returned in sentence and paragraph formats.
## S3 method for class 'interpret' print(x, ...)## S3 method for class 'interpret' print(x, ...)
x |
interpret object. |
... |
Additional arguments. |
formatted interpret object results.
#Cronbach's alpha print(interpret(alpha(items=c("i1","i2","i3","i4","i5"), data=cas))) #' # interpret a standard linear (OLS) regression hos1 <- assess(formula=survey ~ program + month, data=hosprog, regression= "ols") print(interpret(hos1)$model) # interpret a differences-in-differences model hos2 <- assess(formula=survey ~ ., data=hosprog, intervention = "program", int.time="month", treatment = 5, did="two", newdata=TRUE) interpret(hos2)$did # interpret an interrupted time series model hos3 <- assess(formula=survey ~ ., data=hosprog, intervention = "program", int.time="month", its="two", interrupt = 5) interpret(hos3)$its#Cronbach's alpha print(interpret(alpha(items=c("i1","i2","i3","i4","i5"), data=cas))) #' # interpret a standard linear (OLS) regression hos1 <- assess(formula=survey ~ program + month, data=hosprog, regression= "ols") print(interpret(hos1)$model) # interpret a differences-in-differences model hos2 <- assess(formula=survey ~ ., data=hosprog, intervention = "program", int.time="month", treatment = 5, did="two", newdata=TRUE) interpret(hos2)$did # interpret an interrupted time series model hos3 <- assess(formula=survey ~ ., data=hosprog, intervention = "program", int.time="month", its="two", interrupt = 5) interpret(hos3)$its
USA's unemployment rate between 1929 and 2024
unemploymentunemployment
unemploymentAn artificial data frame with 96 rows and 5 columns:
Calendar year as an integer from 1929 to 2024.
Unemployment rate defined as the proportion of the US Labor Market that were unemployed.
Notable events in the United States that may have impacted the unemployment rate. Other than notable events (or unspecified) listed as 'other'.
Integer for the number of years for 1929 to 2024, ranging from 1 to 96.
Value of 1 to indicate data for the USA.
...
unemployment is an artificial data frame of reasonable estimates made entirely for the purpose of demonstrating interrupted time series with many interruptions. For precise rates, please see a reliable source.