The purpose of this vignette is to present a practical worked example
of how to design a discrete choice experiment without blocking. It
follows two examples both explained at greater lengths in the
complimentary vignette titled: “Theoretical introduction to
ExpertChoice
”
The process of choosing a design often involves iterating over steps
0 to 4. Some designs are more difficult to create than others. The
Theoretical Introduction to ExpertChoice
presents two
designs which are illustrated in this practical vignette.
Here are some practical suggestions as to what makes a good design. 1. In general avoid attributes with only two levels. The design such as the one below suffers because it is difficult to convert from the fractional factorial of this design into an efficient choice experiment design. The lack of efficiency is not from the methods of converting, but inherit in the fact that achieving the minimal overlap when there is only two levels is difficult.
First load the ExpertDesign
package into your R
environment.
Create a list object which specifies the name of the variables as
well as their respective levels.
NB: In ordinal data the levels should be integer sequential and
start from 0. NB: In categorical data the levels should
be integer sequential and start from 1. In an ordinal data
experiment the level at 0 is always the reference point. As the proposed
design above is a 55 I have
chosen to denote the object as attr55
:
attri55 <- list(
maker = c("0", "1", "2", "3", "4"),
technical = c("0", "1", "2", "3", "4"),
category_rarity = c("0", "1", "2", "3", "4"),
size = c("0", "1", "2", "3", "4"),
age = c("0", "1", "2", "3", "4")
)
Calling the list object something like this is advantageous because
you could have multiple competing designs still at this stage. The
following design is 4421 and then denoted
here as attri4521
:
attri4521 <- list(
maker = c("0", "1", "2", "3"),
technical = c("0", "1", "2", "3"),
category_rarity = c("0", "1", "2", "3"),
size = c("0", "1", "2", "3"),
age = c("0", "1", "2", "3"),
provenance = c("0", "1")
)
Create the full factorial object. Using the design specification as a suffix remains a handy way of keeping track of the design.
The full factorial will contain many rows. The first five rows and the last five are given below:
rbind(head(ff55, 5), tail(ff55, 5))
#> maker technical category_rarity size age
#> 1 0 0 0 0 0
#> 2 1 0 0 0 0
#> 3 2 0 0 0 0
#> 4 3 0 0 0 0
#> 5 4 0 0 0 0
#> 3121 0 4 4 4 4
#> 3122 1 4 4 4 4
#> 3123 2 4 4 4 4
#> 3124 3 4 4 4 4
#> 3125 4 4 4 4 4
Every variable in the full factorial has the standardised orthogonal contrast applied. These contrasts are very useful when evaluating the efficacy of a design. This is simply illustrative of what the contrasts look like for one of the variables:
Once the full factorial is constructed it is possible to augment it
with additional information. Many of these augmentations happen as
attributes. This includes adding the B-matrix for main effects, an
important matrix in DCE efficiency, as described by Street et al… The
prefix aff
is used to refer to the augmented (full)
factorial. (You could of course name the object whatever you
prefer.)
A console log will appear stating that the processes of applying the
B-matrix has started. If you do not get this message then the B-matrix
cannot be added and a warning will be given. (Please open a GitHub issue
if this is the case. I am not aware of instances where this should
happen.) The B-matrix plays an important role in the choice efficiency
of design. Below are ten random rows drawn from the augmented full
factorial. Notice the additional of the levels
column.
aff55[sample(nrow(aff55), 10), ]
#> maker technical category_rarity size age levels
#> 780 4 0 1 1 1 40111
#> 1250 4 4 4 4 1 44441
#> 1300 4 4 1 0 2 44102
#> 3055 4 0 2 4 4 40244
#> 2960 4 1 3 3 4 41334
#> 3040 4 2 1 4 4 42144
#> 2410 4 1 1 4 3 41143
#> 2920 4 3 1 3 4 43134
#> 1717 1 3 3 3 2 13332
#> 1193 2 3 2 4 1 23241
There are many ways to create a fractional factorial design.
Practically speaking though two methods are designed to be flawlessly
integrated into this package. These are the construction of a fractional
factorial design using an orthogonal array with either the
DoE.MIParray
or DoE.base
packages or using
D-optimal fractional factorial
designs from the AlgDesign
package. (The
AlgDesign
method could be better integrated. If you want to
use this please open a GitHub issue. Will be sorted quickly.)
DoE.MIParray
or
DoE.base
)The function oa_feasible()
from the
DoE.base
package (DoE.base::oa_feasible()
)
provides many methods for determining if a particular design can be
construed with ND rows. For the
silver research (by Jed Stephens) it was found that the following design
was feasible. It is possible to specify higher resolution designs. There
are reasons why higher resolution designs could be advantageous.
# Design: DF: 17, 32 OA (Resolution II), 64 OA (Resolution III)
nlevels <- unlist(purrr::map(ff55, function(x){length(levels(x))}))
oa_feasible(25, nlevels, strength = 2)
#> no violation of necessary criteria for strength 2 was found
#> [1] TRUE
Using the DoE.base
package it is possible to construct a
25 .
When constructing a design using the DoE.MIParray
… The
function mosek_MIParray()
was used to construct the example
64 run orthogonal array included with this package.
# Not run because it requires time as well as some setting up if this is your first time.
# See DoE.MIParray for more details.
#fractional_factorial_55_25 <- gurobi_MIParray(25, nlevels)
Note how the functions in DoE use a slightly different notation to index the factors. The base is given level 1 in this package. This is not really a concern because it will be seamlessly converted shortly.
Not yet discussed or though it should be achievable with minimal effort. If a reader wishes for this example to be completed before I have done so please open a GitHub issue and I shall happily oblige completing.
The ability to use multiple different packages to construct the fractional factorial design is ensured by this step. There can exist small differences between the different methods which require some fiddling.
The results of the mosek_MIParray
function are
orthogonal arrays without colnames. Hence in this instance the colnames
need to be added. This design clearly needed to be made with the full
factorial in mind. Hence the colnames from the ff4521
object are appropriate. Note: the colnames from the aff4521
would include the levels column – hence avoid these…
colnames(fractional_factorial_55_25) <- colnames(ff55)
fractional_f55_25 <- search_design(ff55, fractional_factorial_55_25)
The result is a fractional factorial design. Importantly though the fractional factorial design retains and inherits information from the full factorial such as the standardised orthogonal coding. To mark that many such attributes are held a special attribute is assigned to the object.
# Check to see if the searched attribute exists on the fractional_f4521_64 object.
attributes(fractional_f55_25)$searched
#> [1] TRUE
Once an object is search converted it is now easy to run diagnostics.
The generalised world length patterns gives a good overall summary of the design.
From this we can tell that this fractional factorial design is resolution IV i.e. strength of 3. Hence the all main effects are estimable free of each other, but some are confounded with two-attribute interactions.
The function fractional_factorial_efficiency
provides a
formula based method of investigating the proposed fractional factorial
design in more details. This function also includes in its list of
results the GWLP so there is no need to specify it.
Two examples are given which follow the two examples in the associated note.
# Test for main effects
main_effects <- fractional_factorial_efficiency(~ maker + technical + category_rarity + size + age, fractional_f55_25)
#> Your fractional factorial design has an A-efficiency of100%
#> Your fractional factorial design has a D-efficiency of100%
The resultant object has the following objects:
names(main_effects)
#> [1] "X" "information_mat" "inv_information_mat"
#> [4] "lamda_mat" "inv_diag" "GWLP"
#> [7] "A_eff" "D_eff"
Check the package help file for the
fractional_factorial_efficiency()
function for a full
description. Also see the associated note for a more technical
description.
# Test for main effects and interactions described in note.
#main_plus_interacts <- fractional_factorial_efficiency(~ maker * technical + category_rarity + size + age, fractional_f55_25)
This design supports only a single set of two-attribute interactions i.e. maker interact technical, or size interact age or age interact provenance etc. However it does not support more than two sets of two-attribute interactions: i.e. in this instance the maker interact technical and age interact provenance.
In instances where some of the stipulated effects cannot be estimated (such as above) then the D-efficiency would be NaN and similarly the A-efficiency is zero.
In general there are a few methods for converting between fractional
factorial to discrete choice experiments. The Theoretical introduction
to ExpertChoice
vignette gives more details. Although it is
intended to implement the Modulo Method, LMA, Rotation and mix-and-match,
currently only the Modulo method is implemented. Which is sensible
because as methods go it creates smaller and more efficient designs than
these others. Notwithstanding this, if it is of interest to implement
other methods these can be included in this package with limited
difficulty. Please raise a GitHub issue.
See the Theoretical introduction to ExpertChoice
vignette for more details on how this method works.
The modulators are supplied as vectors contained within a list. The number of vectors provided determines the number of choice cards within a given choice set.
Sometimes a particular choice may be Pareto dominate over all other choices. In such an instance there is no need to ask this question as it can be automatically answered and later augmented to the respondent data. It is important to remember to augment Pareto dominate choice sets if any exist.
checking_overshadow <- check_overshadow(dce_modulo)
# The matrix of indixes indicate that for row 1, 7, 13, 19 and 25 there are Pareto dominate solutions.
checking_overshadow
#> [,1] [,2]
#> [1,] TRUE TRUE
#> [2,] FALSE FALSE
#> [3,] FALSE FALSE
#> [4,] FALSE FALSE
#> [5,] FALSE FALSE
#> [6,] FALSE FALSE
#> [7,] TRUE TRUE
#> [8,] FALSE FALSE
#> [9,] FALSE FALSE
#> [10,] FALSE FALSE
#> [11,] FALSE FALSE
#> [12,] FALSE FALSE
#> [13,] TRUE TRUE
#> [14,] FALSE FALSE
#> [15,] FALSE FALSE
#> [16,] FALSE FALSE
#> [17,] FALSE FALSE
#> [18,] FALSE FALSE
#> [19,] TRUE TRUE
#> [20,] FALSE FALSE
#> [21,] FALSE FALSE
#> [22,] FALSE FALSE
#> [23,] FALSE FALSE
#> [24,] FALSE FALSE
#> [25,] TRUE TRUE
This calculates the D-efficiency of the design as per Street, D.J.,
Burgess, L. and Louviere, J.J., 2005. Quick and easy choice sets:
constructing optimal and nearly optimal stated choice experiments.
International Journal of Research in Marketing, 22(4), pp.459-470. The
pages of direct interest are 462 - 463 and these should be read in
conjunction with the Theoretical introduction to
ExpertChoice
vignette. All the cases are printed and you
can compare the calculation to the original paper.
dce_modulo_efficacy <- dce_efficiency(aff55, dce_modulo)
#>
#> q is1
#> L is5
#> Case 4
#> s is3
#>
#> q is2
#> L is5
#> Case 4
#> s is3
#>
#> q is3
#> L is5
#> Case 4
#> s is3
#>
#> q is4
#> L is5
#> Case 4
#> s is3
#>
#> q is5
#> L is5
#> Case 4
#> s is3
#>
#>
#> The D-efficiency of this discrete choice experiment is98.883%
The function construct_question_frame
is helpful with
the final stages. It consistently converts a choice_set
arrangement into a data.frame
.
It is now time to add some useful information back to the levels. Originally these were described in Step 0, but up until this point it has been necessary to work with only integer values. (Also just imagine if you had worked with these very long names up until this point…)
levels(question_table_f55$maker) <- c("common (bottom 50% of makers)", "known to specialists (50% to 65% of makers)", "recognised (65% to 80%)", "famous (80% to 90%)", "celebrated top 10%")
levels(question_table_f55$technical) <- c("below average (below 50% of craftmanship)", "good (50% to 65%)", "meritorious (65% to 80%)", "distinguished (80% to 90%)", "exquisite (top 10%)")
levels(question_table_f55$category_rarity) <- c("common (bottom 20%)", "uncommon (20% to 40%)", "rare (40% to 60%)", "very rare (60% to 80%)", "exceptional (top 20%)")
levels(question_table_f55$size) <- c("petite: under 125g", "small: between 126g and 275g", "medium: between 276g and 600g", "large: between 601g and 1200g", "extra large: exceeds 1200g")
levels(question_table_f55$age) <- c("1951-present", "1951-present", "1951-present", "1951-present", "before 1800")
# View(question_table_f4521)
Viola! Here is the completed question dataframe.
question_table_f55
#> # A tibble: 75 × 8
#> question choice levels maker technical category_rarity size age
#> <int> <int> <chr> <fct> <fct> <fct> <fct> <fct>
#> 1 1 1 23401 recognised (65%… distingu… exceptional (t… peti… 1951…
#> 2 1 2 40123 celebrated top … below av… uncommon (20% … medi… 1951…
#> 3 1 3 12340 known to specia… meritori… very rare (60%… extr… 1951…
#> 4 2 1 40123 celebrated top … below av… uncommon (20% … medi… 1951…
#> 5 2 2 12340 known to specia… meritori… very rare (60%… extr… 1951…
#> 6 2 3 34012 famous (80% to … exquisit… common (bottom… smal… 1951…
#> 7 3 1 11111 known to specia… good (50… uncommon (20% … smal… 1951…
#> 8 3 2 33333 famous (80% to … distingu… very rare (60%… larg… 1951…
#> 9 3 3 00000 common (bottom … below av… common (bottom… peti… 1951…
#> 10 4 1 01234 common (bottom … good (50… rare (40% to 6… larg… befo…
#> # ℹ 65 more rows
This DCE construction is described in the Theory Introduction to
ExpertChoice
vignette. It is intended to show more
succinctly (than the silver example above) how to design such an
experiment.
#Step 0
# Described in Theory
attri3261 <- list(
starter = c("1", "2", "3"),
main = c("1", "2", "3", "4", "5", "6"),
dessert = c("1", "2", "3")
)
# Step 1
ff_examp <- full_factorial(attri3261)
# Step 2
aff_examp <- augment_levels(ff_examp)
#> Applying B mat
#write.csv(ff_examp, "example.csv")
# Step 3
nlevels <- unlist(purrr::map(ff_examp, function(x){length(levels(x))}))
#oa_feasible(36, nlevels, strength = 3)
fractional_factorial_3261_18 <- oa.design(nlevels = nlevels, columns = "min34")
# The fractional_factorial design is generated using the DoE.MIParray package.
# The following is the command to run this generation.
# The result is saved in the package.
# Step 4
# Confirming that this is an efficient design.
colnames(fractional_factorial_3261_18) <- colnames(ff_examp)
fractional_factorial_3261_18 <- search_design(ff_examp, fractional_factorial_3261_18)
# Step 5.
# This table is reported as Table
# Confirm that this design supports all interactions.
row1_main_effects <- fractional_factorial_efficiency(~ starter + main + dessert, fractional_factorial_3261_18)
#> Your fractional factorial design has an A-efficiency of100%
#> Your fractional factorial design has a D-efficiency of100%
# Step 6.
# Two different card options
# Option 1
dce_modulo_examp1 <- modulo_method(
fractional_factorial_3261_18,
list(c(1, 0, 1), c(0,1,0))
)
# Option 2.
dce_modulo_examp2 <- modulo_method(
fractional_factorial_3261_18,
list(c(1, 0, 1), c(1,3,1), c(0,5,0))
)
# Step 7
# This experiment uses categorical data (not ordinal) hence there can be no pareto dominate solution.
# Each category is merely a choice.
# Step 8.
# Compare the efficiencies.
dce_efficency_menu_example1 <- dce_efficiency(aff_examp, dce_modulo_examp1)
#>
#> q is1
#> L is3
#> Case 3
#> The implied x is1and y is0
#> s is3
#>
#> q is2
#> L is6
#> Case 4
#> s is3
#>
#> q is3
#> L is3
#> Case 3
#> The implied x is1and y is0
#> s is3
#>
#>
#> The D-efficiency of this discrete choice experiment is61.038%
dce_efficency_menu_example2 <- dce_efficiency(aff_examp, dce_modulo_examp2)
#>
#> q is1
#> L is3
#> Case 3
#> The implied x is1and y is1
#> s is5
#>
#> q is2
#> L is6
#> Case 4
#> s is6
#>
#> q is3
#> L is3
#> Case 3
#> The implied x is1and y is1
#> s is5
#>
#>
#> The D-efficiency of this discrete choice experiment is80.683%
# Option 2 is much more efficient so let's use that version!
# Step 9
# Construct the question table
menu_question_table <- construct_question_frame(aff_examp, dce_modulo_examp2)
# Finally augment the question table. See Table 1 in the Theoretical Vignette.
levels(menu_question_table$starter) <- c("Tomato Soup", "Duck Rillettes", "Seafood Chowder")
levels(menu_question_table$main) <- c("Roast Pheasant", "Pan Fried Hake", "Pork Belly", "Mushroom Risotto", "Sirloin Steak", "Vegetable Bake")
levels(menu_question_table$dessert) <- c("Sticky Toffee Pudding", "Chocolate & Hazelnut Brownie", "Cheesecake")
#View(menu_question_table)
Finally here is the resulting question table.
menu_question_table
#> # A tibble: 72 × 6
#> question choice levels starter main dessert
#> <int> <int> <chr> <fct> <fct> <fct>
#> 1 1 1 163 Tomato Soup Vegetable Bake Cheesecake
#> 2 1 2 261 Duck Rillettes Vegetable Bake Sticky Toffee Pudding
#> 3 1 3 231 Duck Rillettes Pork Belly Sticky Toffee Pudding
#> 4 1 4 153 Tomato Soup Sirloin Steak Cheesecake
#> 5 2 1 142 Tomato Soup Mushroom Risotto Chocolate & Hazelnut …
#> 6 2 2 243 Duck Rillettes Mushroom Risotto Cheesecake
#> 7 2 3 213 Duck Rillettes Roast Pheasant Cheesecake
#> 8 2 4 132 Tomato Soup Pork Belly Chocolate & Hazelnut …
#> 9 3 1 132 Tomato Soup Pork Belly Chocolate & Hazelnut …
#> 10 3 2 233 Duck Rillettes Pork Belly Cheesecake
#> # ℹ 62 more rows
The purpose of this section is to replicate the running example in Street, D.J., Burgess, L. and Louviere, J.J., 2005. Quick and easy choice sets: constructing optimal and nearly optimal stated choice experiments. International Journal of Research in Marketing, 22(4), pp.459-470.
The steps commented in the code follow those of the tutorial.
# Step 0
atttravel <- list(
airfaire = c("0", "1"),
travel_time = c("0", "1", "2")
)
# Step 1
travel2131 <- full_factorial(atttravel)
# Step 2
aff_travel2131 <- augment_levels(travel2131)
#> Applying B mat
# Step 3.
# The full factorial is already so small that selecting a fraction of it would be silly.
# Therefore re-use the full factorial as the fractional factorial.
# Step 4.
# Confirming that this is an efficient design.
fractional_travel2131 <- search_design(travel2131, travel2131)
# Step 5.
# Confirm that this design supports all interactions.
full_factorial_efficiacy <- fractional_factorial_efficiency(~ (airfaire + travel_time)^2, fractional_travel2131)
#> Your fractional factorial design has an A-efficiency of100%
#> Your fractional factorial design has a D-efficiency of100%
# Step 6 & Step 7.
# Street gives two examples of choice sets.
travel_choice_set1 <- list(c("00", "11", "02"), c("10", "02", "12"))
class(travel_choice_set1) <- c(class(travel_choice_set1), "choice_set")
travel_example <- dce_efficiency(aff_travel2131, travel_choice_set1)
#>
#> q is1
#> L is2
#> Case 1
#> s is2
#>
#> q is2
#> L is3
#> Case 3
#> The implied x is1and y is0
#> s is3
#>
#>
#> The D-efficiency of this discrete choice experiment is62.996%
# Note, if you want to rearrange the columns of the lamda matrix so that they are the same as Street use the following:
# lamda_street_cols <- matrix(c(travel_example$Lamda$mat[,1],
# travel_example$Lamda$mat[,3],
# travel_example$Lamda$mat[,5],
# travel_example$Lamda$mat[,2],
# travel_example$Lamda$mat[,4],
# travel_example$Lamda$mat[,6]), ncol = 6)
# lamda_street_paper <- matrix(c(lamda_street_cols[1,],
# lamda_street_cols[3,],
# lamda_street_cols[5,],
# lamda_street_cols[2,],
# lamda_street_cols[4,],
# lamda_street_cols[6,]), ncol = 6)
# Street gives a second arrangement:
travel_choice_set2 <- list(c("00", "11", "02"), c("10", "01", "12"))
class(travel_choice_set2) <- c(class(travel_choice_set2), "choice_set")
# This version is 100% efficient.
travel_example2 <- dce_efficiency(aff_travel2131, travel_choice_set2)
#>
#> q is1
#> L is2
#> Case 1
#> s is2
#>
#> q is2
#> L is3
#> Case 3
#> The implied x is1and y is0
#> s is3
#>
#>
#> The D-efficiency of this discrete choice experiment is100%
# Step 8
travel_questions <- construct_question_frame(aff_travel2131, travel_choice_set2, randomise_choice_sets = FALSE)
levels(travel_questions$airfaire) <- c("$350", "$650")
levels(travel_questions$travel_time) <- c("4 hours", "5 hours", "6 hours")
#View(travel_questions)
Street, D.J., Burgess, L. and Louviere, J.J., 2005. Quick and easy choice sets: constructing optimal and nearly optimal stated choice experiments. International Journal of Research in Marketing, 22(4), pp.459-470.