Chapter 14 Alternative-Specific Multinomial Logit Models

14.1 Introduction: Why Alternative-Specific MNL?

In the last chapter, we modeled brand choice using standard multinomial logit (MNL) models, where all predictors were case-specific. That is, they described the consumer or choice situation and took the same value for all brands in a given choice set.

In many real marketing applications, however, the most important predictors vary by brand. Examples include:

Price of each brand
Package size
Sugar content or nutritional attributes
Promotional indicators
Brand-specific features

Alternative-specific multinomial logit (AS-MNL) models allow us to include these variables directly, providing richer managerial insight into how brand attributes drive choice.

In this chapter, you will learn how to:

Work with long-format choice data
Split alternative-specific data correctly into training and test samples
Estimate an alternative-specific MNL model
Evaluate model fit and classification performance
Interpret predicted probabilities and marginal effects in a marketing context

Throughout the chapter, we will use the yogurt dataset.

14.2 The Yogurt Choice Data

The yogurt dataset records consumer brand choices in repeated choice situations. Each row represents one alternative within one choice situation, not a single consumer.

Key implications:

Each choice situation appears multiple times (once per brand)
Exactly one alternative is chosen per choice set
Many predictors vary across brands within the same choice set

This “long” structure is required for alternative-specific MNL models and differs from the wide-format data used earlier in the course.
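As a minimal sketch of this long structure, the base R code below builds a small made-up data frame and checks that every choice set has one row per brand and exactly one chosen alternative. The values and the column names (id, brand, price, choice) are illustrative assumptions, not rows taken from the yogurt data.

```r
# Toy long-format choice data: 3 choice sets x 4 brands (values are made up)
brands <- c("Dannon", "Hiland", "Weight", "Yoplait")
toy <- data.frame(
  id     = rep(1:3, each = length(brands)),
  brand  = rep(brands, times = 3),
  price  = c(8.1, 6.1, 7.9, 10.8,
             9.8, 6.4, 7.9, 10.8,
             8.1, 5.0, 8.6, 12.5),
  choice = c(0, 0, 1, 0,
             1, 0, 0, 0,
             0, 0, 0, 1)
)

# One row per brand within every choice set
rows_per_set <- table(toy$id)

# Exactly one chosen alternative per choice set
chosen_per_set <- tapply(toy$choice, toy$id, sum)

print(rows_per_set)
print(chosen_per_set)
```

Note that price differs across the four rows of a single choice set, which is what makes it an alternative-specific variable.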

14.3 Preparing the Data for Modeling

14.3.1 Why Splitting Is Different for Choice Data

With alternative-specific data, we cannot randomly split rows into training and test sets. Doing so would break apart choice sets and contaminate model evaluation.

Instead, we must split at the choice-set level, ensuring that all rows belonging to the same choice situation stay together.
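The idea can be sketched in base R as follows. This is only an illustration of group-level splitting on made-up data, not the internal logic of any particular function; the data frame toy and its columns are hypothetical.

```r
set.seed(4320)

# Hypothetical long-format data: 20 choice sets, 4 alternatives each
toy <- data.frame(
  id    = rep(1:20, each = 4),
  brand = rep(c("A", "B", "C", "D"), times = 20)
)

# Sample choice-set ids (not rows) for the training set
ids       <- unique(toy$id)
train_ids <- sample(ids, size = round(0.75 * length(ids)))

# Keep all rows of a sampled choice set together
train_toy <- toy[toy$id %in% train_ids, ]
test_toy  <- toy[!(toy$id %in% train_ids), ]

# Check: the two samples share no choice sets, and each set keeps its 4 rows
shared_ids <- intersect(train_toy$id, test_toy$id)
print(length(shared_ids))
print(table(table(train_toy$id)))
```

Sampling ids rather than rows guarantees that every choice set in either sample still contains all of its alternatives and its one chosen brand.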

14.3.2 Creating Training and Test Samples

We again use the splitsample() function from the MKT4320BGSU package, which supports group-level splitting. Several arguments that we did not need before are now required for alternative-specific MNL.

Usage:

splitsample(data, outcome = NULL, group = NULL, choice = NULL, alt = NULL,
p = 0.75, seed = 4320)
where:
- data is the data frame to split, in long-format.
- outcome is usually not used for alternative-specific MNL.
- group is the grouping variable (e.g., choice situation id or respondent id). If provided, splitting is done at the group level. Required for alternative-specific MNL.
- choice is the 0/1 (or TRUE/FALSE) indicator for the chosen alternative. Used only when group is provided. Required for alternative-specific MNL.
- alt is the alternative label/ID (optional in general). Used with choice to stratify at the group level. Required for alternative-specific MNL.
- p is the proportion of observations to place in the training set. Must be strictly between 0 and 1. Default is 0.75.
- seed is the random seed for reproducibility. Default is 4320.

Before, we were interested in the $train and $test data frames. Now, we are interested in the train.mdata and test.mdata objects, which are in the format needed by mlogit (see below). However, to avoid a console error, you will access them in a slightly different way, using double brackets.

sp <- splitsample(data = yogurt, group = "id", choice = "choice", alt = "brand")

train <- sp[["train.mdata"]]
test  <- sp[["test.mdata"]]

At this point:

train contains complete choice sets for model estimation
test contains unseen choice sets for out-of-sample evaluation

14.4 Specifying an Alternative-Specific MNL Model

In an alternative-specific MNL model:

Case-specific variables enter once
Alternative-specific variables enter as brand-varying predictors

We use the mlogit() function from the mlogit package to estimate the model. In the formula, a | separates the alternative-specific variables from the case-specific variables: alternative-specific variables come first, then case-specific variables. We can use the base R summary() function to get the raw log-odds estimates.

library(mlogit)
as_mnl_fit <- mlogit(choice ~ price + feat | income, data = train)
summary(as_mnl_fit)


Call:
mlogit(formula = choice ~ price + feat | income, data = train, 
    method = "nr")

Frequencies of alternatives:choice
  Dannon   Hiland   Weight  Yoplait 
0.401988 0.029818 0.229155 0.339039 

nr method
8 iterations, 0h:0m:0s 
g'(-H)^-1g = 0.000171 
successive function values within tolerance limits 

Coefficients :
                      Estimate Std. Error  z-value  Pr(>|z|)    
(Intercept):Hiland   0.7587200  0.5677111   1.3365  0.181401    
(Intercept):Weight  -0.0263906  0.2078931  -0.1269  0.898986    
(Intercept):Yoplait -3.9886941  0.2679762 -14.8845 < 2.2e-16 ***
price               -0.4424450  0.0295572 -14.9691 < 2.2e-16 ***
feat                 0.4230830  0.1491240   2.8371  0.004552 ** 
income:Hiland       -0.1081164  0.0149201  -7.2464 4.281e-13 ***
income:Weight       -0.0114764  0.0037707  -3.0436  0.002338 ** 
income:Yoplait       0.0729207  0.0040281  18.1030 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log-Likelihood: -1618.4
McFadden R^2:  0.23972 
Likelihood ratio test : chisq = 1020.6 (p.value = < 2.22e-16)

Interpretation notes:

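Coefficients reflect changes in relative utility; the intercepts and income coefficients are measured relative to the base brand, Dannon, which has no terms of its own in the output
Signs and magnitudes should be interpreted in marketing terms; for example, the negative price coefficient means that a higher price lowers a brand's utility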
Alternative-specific variables capture within-choice substitution effects
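To make the link between coefficients and choice concrete, the sketch below plugs the rounded estimates from the output above into the standard multinomial logit formula, $P_j = \exp(V_j) / \sum_k \exp(V_k)$. The price and income inputs are illustrative values chosen to be close to the sample means, feat is set to 0 for every brand as an assumption, and Dannon is treated as the base brand, so its intercept and income coefficient are zero.

```r
brands <- c("Dannon", "Hiland", "Weight", "Yoplait")

# Rounded estimates from the summary() output; Dannon is the base brand
intercept <- c(0, 0.7587, -0.0264, -3.9887)
b_income  <- c(0, -0.1081, -0.0115, 0.0729)
b_price   <- -0.4424
b_feat    <- 0.4231

# Illustrative inputs (assumed values, not a row of the data)
price  <- c(8.16, 5.37, 7.94, 10.69)
feat   <- c(0, 0, 0, 0)
income <- 61.14

# Systematic utility of each brand, then the logit (softmax) transformation
v <- intercept + b_price * price + b_feat * feat + b_income * income
p <- exp(v) / sum(exp(v))
names(p) <- brands

print(round(p, 4))
```

Because only differences in utility matter, adding the same constant to every brand's utility leaves these probabilities unchanged. This is why the coefficients of case-specific variables such as income can only be estimated relative to a base brand.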

14.5 Evaluating Model Performance

14.5.1 Model Fit and Coefficients

We use the eval_as_mnl() function from the MKT4320BGSU package to obtain fit statistics, coefficients (both log-odds and odds ratios), and classification diagnostics.

Usage:

eval_as_mnl(model, digits = 4, ft = FALSE, newdata = NULL,
label_model = "Model data", label_newdata = "New data", class_digits = 3)
where:
- model is a fitted mlogit model.
- digits is an integer; decimals to round coefficient and fit results (default 4).
- ft is logical; if TRUE, return coefficient and classification tables as flextable objects (default FALSE).
- newdata is an optional dfidx object (e.g., test.mdata) for an additional classification matrix. If NULL, only the training-data matrix is produced.
- label_model is a character string label for the training-data classification matrix (default “Model data”).
- label_newdata is a character string label for the newdata classification matrix (default “New data”).
- class_digits is an integer; decimals to round classification results (default 3).

Key outputs include:

Likelihood-ratio $\chi^2$ test
McFadden’s pseudo $R^2$
Odds ratios for interpretation
Classification accuracy and diagnostics

as_eval <- eval_as_mnl(as_mnl_fit, ft = TRUE, newdata = test)
as_eval$coef_table

LR chi2 (5) = 1020.5649; p < 0.0001
McFadden's Pseudo R-square = 0.2397
term	logodds	OR	std.error	statistic	p.value
(Intercept):Hiland	0.7587	2.1355	0.5677	1.3365	0.1814
(Intercept):Weight	-0.0264	0.9740	0.2079	-0.1269	0.8990
(Intercept):Yoplait	-3.9887	0.0185	0.2680	-14.8845	0.0000
price	-0.4424	0.6425	0.0296	-14.9691	0.0000
feat	0.4231	1.5267	0.1491	2.8371	0.0046
income:Hiland	-0.1081	0.8975	0.0149	-7.2464	0.0000
income:Weight	-0.0115	0.9886	0.0038	-3.0436	0.0023
income:Yoplait	0.0729	1.0756	0.0040	18.1030	0.0000
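
The quantities in this table are linked by simple identities, sketched below with the rounded values reported above. The odds ratio is the exponentiated log-odds coefficient, and McFadden's pseudo R-square equals 1 minus the ratio of the fitted log-likelihood to the intercept-only log-likelihood. The intercept-only log-likelihood is not printed, so it is backed out here from the LR statistic; small rounding differences are to be expected.

```r
# Log-odds coefficients taken from the table above (rounded)
logodds <- c(price = -0.4424, feat = 0.4231)

# Odds ratio = exp(log-odds)
odds_ratio <- exp(logodds)
print(round(odds_ratio, 4))

# Fit statistics reported in the output
loglik_model <- -1618.4
lr_chisq     <- 1020.5649

# LR chi-square = 2 * (LL_model - LL_null), so LL_null can be recovered
loglik_null <- loglik_model - lr_chisq / 2

# McFadden's pseudo R-square = 1 - LL_model / LL_null
mcfadden_r2 <- 1 - loglik_model / loglik_null
print(round(mcfadden_r2, 4))
```

An odds ratio of about 0.64 for price means that a one-unit increase in a brand's price multiplies the odds of choosing that brand over any other brand by about 0.64, holding the other attributes fixed.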

as_eval$classify_model

Classification Matrix - Model data
Accuracy = 0.621
PCC = 0.330
	Reference
Predicted	Dannon	Hiland	Weight	Yoplait	Total
Dannon	577	39	324	97	1037
Hiland	1	12	0	2	15
Weight	18	2	38	18	76
Yoplait	132	1	53	497	683
Total	728	54	415	614	1811
Statistics by Class:
Sensitivity	0.793	0.222	0.092	0.809
Specificity	0.575	0.998	0.973	0.845
Precision	0.556	0.800	0.500	0.728

as_eval$classify_newdata

Classification Matrix - New data
Accuracy = 0.607
PCC = 0.331
	Reference
Predicted	Dannon	Hiland	Weight	Yoplait	Total
Dannon	199	14	104	38	355
Hiland	2	2	1	1	6
Weight	8	1	12	13	34
Yoplait	33	0	21	152	206
Total	242	17	138	204	601
Statistics by Class:
Sensitivity	0.822	0.118	0.087	0.745
Specificity	0.565	0.993	0.952	0.864
Precision	0.561	0.333	0.353	0.738

14.5.2 Classification Performance

Classification is evaluated at the choice-set level:

The predicted brand is the one with the highest predicted probability
Accuracy reflects correct brand predictions
PCC provides a baseline comparison

This approach mirrors how managers think about predicting actual consumer choices.
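
As a check on how these statistics are defined, the sketch below recomputes them in base R from the training-data classification matrix shown earlier. It assumes PCC is the proportional chance criterion (the sum of squared actual brand shares), which is consistent with the reported value.

```r
brands <- c("Dannon", "Hiland", "Weight", "Yoplait")

# Training-data classification matrix: rows = predicted, columns = actual
cm <- matrix(
  c(577, 39, 324,  97,
      1, 12,   0,   2,
     18,  2,  38,  18,
    132,  1,  53, 497),
  nrow = 4, byrow = TRUE,
  dimnames = list(Predicted = brands, Reference = brands)
)

n <- sum(cm)

# Accuracy: share of choice sets where the predicted brand was the chosen one
accuracy <- sum(diag(cm)) / n

# Proportional chance criterion: sum of squared actual brand shares
pcc <- sum((colSums(cm) / n)^2)

# Sensitivity (per actual brand) and precision (per predicted brand)
sensitivity <- diag(cm) / colSums(cm)
precision   <- diag(cm) / rowSums(cm)

print(round(c(accuracy = accuracy, pcc = pcc), 3))
print(round(rbind(sensitivity, precision), 3))
```

An accuracy of 0.621 against a PCC of 0.330 indicates that the model classifies training choices far better than proportional chance, although the low sensitivity for Hiland and Weight shows that most of the correct predictions come from Dannon and Yoplait.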

14.6 Predicted Probabilities and Marginal Effects

14.6.1 Why Predicted Probabilities Matter

Coefficients are not always intuitive. Predicted probabilities translate the model into outcomes managers care about:

Market shares
Brand switching
Competitive responses

14.6.2 Why Marginal Effects Are Useful

Marginal effects quantify how much choice probabilities change in response to a small change in an attribute, holding everything else constant. Marginal effects can be computed in two common ways:

At observed values (average marginal effects, AME): marginal effects are calculated for each observation using its actual attribute values and then averaged.
At means (marginal effects at the mean, MEM): marginal effects are calculated at a single “average” profile, where each attribute is set to its sample mean.

Both approaches summarize how sensitive choice probabilities are to changes in attributes, but they differ in interpretation.

Marginal effects at observed values:

Reflect the full distribution of the data
Avoid relying on a potentially unrealistic “average consumer”
Are often preferred for descriptive and policy interpretation

Marginal effects at means:

Are easier to reproduce by hand or with software defaults
Provide a clear, single reference point
Can be useful for illustrating model mechanics and comparing effects across variables

The marginal effects tables can therefore answer questions such as:

“On average, how does a $1 increase in price affect brand choice?”
“How would choice probabilities change for a typical consumer if an attribute increased slightly?”
“Which brands are most sensitive to changes in a specific attribute?”

In practice, the choice between observed values and means depends on the goal of the analysis. For interpretation and real-world impact, average marginal effects at observed values are often preferred. For teaching, demonstration, or simplified comparisons, marginal effects at means can be equally informative.
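
The difference between the two approaches can be sketched with a finite-difference calculation in base R. The code below uses the rounded coefficients from the fitted model but holds income at an assumed value of 61.14, ignores feat, and uses three made-up price profiles. It illustrates the mechanics only and does not reproduce the tables in this chapter. The helper names choice_prob() and dannon_price_effect() are hypothetical.

```r
# Brand constants: intercept + income effect at an assumed income of 61.14
const <- c(Dannon = 0, Hiland = 0.7587, Weight = -0.0264, Yoplait = -3.9887) +
  c(0, -0.1081, -0.0115, 0.0729) * 61.14
b_price <- -0.4424

# MNL choice probabilities for one vector of brand prices
choice_prob <- function(price) {
  v <- const + b_price * price
  exp(v) / sum(exp(v))
}

# Finite-difference effect of raising Dannon's price on all four brands
dannon_price_effect <- function(price, step = 0.01) {
  bumped <- price
  bumped[1] <- bumped[1] + step
  (choice_prob(bumped) - choice_prob(price)) / step
}

# Three made-up price profiles (rows = choice situations, columns = brands)
prices <- rbind(c(7.0, 5.0, 8.0, 10.5),
                c(8.0, 5.5, 8.0, 10.5),
                c(9.5, 5.5, 7.5, 11.0))

# At observed values: one effect per row, then average (AME-style)
ame <- colMeans(t(apply(prices, 1, dannon_price_effect)))

# At means: a single effect evaluated at the average price profile (MEM-style)
mem <- dannon_price_effect(colMeans(prices))

print(round(rbind(ame, mem), 4))
```

In both rows the own-price effect on Dannon is negative and the cross-price effects on the other brands are positive, and each row sums to zero because the four probabilities must always add to one.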

14.6.3 The `pp_as_mnl()` Function

For both case-specific and alternative-specific predictors, we use the pp_as_mnl() function from the MKT4320BGSU package to get both predicted probabilities and marginal effects.

Usage:

pp_as_mnl(model, focal_var, focal_type = c("auto", "alt", "case"),
grid_n = 25, digits = 4, ft = FALSE, marginal = TRUE,
me_method = c("observed", "means"), me_step = 1)
where:
- model is a fitted mlogit model.
- focal_var is a character string name of the focal variable.
- focal_type is a character string; one of “case”, “alt”, or “auto” (default = “auto”).
- grid_n is an integer; number of points used to construct the grid of focal values for predicted probability plots when the focal variable is continuous (default = 25).
- digits is an integer; rounding for numeric output (default = 4).
- ft is logical; if TRUE, return tables as flextable objects (default = FALSE).
- marginal is logical; if TRUE, compute marginal effects (default = TRUE).
- me_method is a character string; one of “observed” (average marginal effects at observed values, AME) or “means” (marginal effects at the mean, MEM) (default = “observed”).
- me_step is numeric; finite-difference step size for AME (default = 1).

14.6.4 Case-Specific Predictors

We first examine how a consumer-level variable affects brand choice probabilities.

pp_income <- pp_as_mnl(as_mnl_fit, focal_var = "income", ft = TRUE, me_method="means")
pp_income$me_table

Marginal effects for income (at means)
Dannon	Hiland	Weight	Yoplait
-0.0073	-0.0006	-0.0068	0.0147

pp_income$pp_table

Predicted Probability Table (income) - Model data
focal_value	Dannon	Hiland	Weight	Yoplait
60.1438	0.4788	0.0060	0.2608	0.2544
61.1438	0.4729	0.0053	0.2545	0.2672
62.1438	0.4667	0.0047	0.2482	0.2804
Because income is continuous, the values shown include the mean and +/- 1 unit.

pp_income$pp_plot

14.6.5 Alternative-Specific Predictors

Now we examine two brand-specific variables: price (a continuous variable) and feat (a binary variable).

pp_price <- pp_as_mnl(as_mnl_fit, focal_var = "price", ft=TRUE, me_method="means")
pp_price$me_table

Marginal effects for price (at means)
Alternative	Dannon	Hiland	Weight	Yoplait
Dannon	-0.1105	0.0010	0.0550	0.0545
Hiland	0.0010	-0.0021	0.0005	0.0005
Weight	0.0550	0.0005	-0.0845	0.0290
Yoplait	0.0545	0.0005	0.0290	-0.0840

pp_price$pp_table

Predicted Probability Table (price) - Model data
varied_alt	focal_value	Dannon	Hiland	Weight	Yoplait
Dannon	7.1628	0.4918	0.0250	0.1883	0.2949
Dannon	8.1628	0.3980	0.0313	0.2353	0.3355
Dannon	9.1628	0.3088	0.0375	0.2821	0.3716
Hiland	4.3663	0.3966	0.0394	0.2258	0.3382
Hiland	5.3663	0.4035	0.0268	0.2305	0.3392
Hiland	6.3663	0.4083	0.0179	0.2338	0.3399
Weight	6.9421	0.3516	0.0256	0.3060	0.3169
Weight	7.9421	0.4019	0.0301	0.2287	0.3392
Weight	8.9421	0.4446	0.0340	0.1648	0.3566
Yoplait	9.6874	0.3673	0.0302	0.2138	0.3886
Yoplait	10.6874	0.4069	0.0311	0.2350	0.3269
Yoplait	11.6874	0.4438	0.0318	0.2545	0.2699
Because price is continuous, the values shown include the mean and +/- 1 unit.

pp_price$pp_plot

pp_feat <- pp_as_mnl(as_mnl_fit, focal_var = "feat", ft=TRUE, me_method="means")
pp_feat$me_table

Marginal effects for feat (at means)
Alternative	Dannon	Hiland	Weight	Yoplait
Dannon	0.1057	-0.0010	-0.0526	-0.0521
Hiland	-0.0010	0.0020	-0.0005	-0.0005
Weight	-0.0526	-0.0005	0.0808	-0.0277
Yoplait	-0.0521	-0.0005	-0.0277	0.0804

pp_feat$pp_table

Predicted Probability Table (feat) - Model data
varied_alt	focal_value	Dannon	Hiland	Weight	Yoplait
Dannon	0	0.3988	0.0301	0.2308	0.3403
Dannon	1	0.4853	0.0244	0.1870	0.3033
Hiland	0	0.4027	0.0285	0.2297	0.3391
Hiland	1	0.3960	0.0407	0.2251	0.3381
Weight	0	0.4038	0.0300	0.2264	0.3398
Weight	1	0.3565	0.0257	0.2990	0.3188
Yoplait	0	0.4043	0.0299	0.2306	0.3352
Yoplait	1	0.3681	0.0289	0.2112	0.3918
Because feat is binary, only the two observed values are shown.

pp_feat$pp_plot

14.7 Managerial Insights

Alternative-specific MNL models allow managers to:

Evaluate pricing and promotion strategies
Understand competitive substitution patterns
Predict market share changes under different scenarios

Compared to standard MNL models, AS-MNL models provide more realistic insights when brand attributes vary within choice sets.
