Leon Shpaner

Table 1

Dataset Distribution

Feature	Feature options
Destination	Home (3,237); No urgent place (6,283); work (3,164)
Passenger	Alone (7,305); friends (3,298); kids (1,006); partner (1,075)
Weather	Sunny (10,069); snowy (1,405); rainy (1,210)
Temperature	80 (6,528); 55 (3,840); 30 (2,316)
Time	7am (3,164); 10am (2,275); 2pm (2,009); 6pm (3,230); 10pm (2,006)
Coupon	Take away (2,393); restaurant less than 20 (2,786); bar (2,017); coffee house (3,996); restaurant 20-50 (1,492)
Expiration	1-day (7,091); 2-hour (5,593)
Gender	Female (6,511); Male (6,173)
Age	Below 21 (547); 21 (2,653); 26 (2,559); 31 (2,039); 36 (1,319); 41 (1,093); 46 (686); over 50 (1,788)
Marital Status	Married (5,100); single (4,752); unmarried partner (2,186); divorced (516); widowed (130)
Has children	0/no (7,431); 1/yes (5,253)
Education	High school (88); High school graduate (905); associates (1,153); some college (4,351); bachelors (4,335); graduate degree (1,852)
Occupation	Twenty-five categories with 43 to 1,870 respondents for each
Income	Categories in $12,500 increments plus an over $100,000 category; responses are concentrated in the lower categories and in the over $100,000 category
Car	Car too old for OnStar (21); mazda5 (22); scooter/motorcycle (22); crossover (21); do not drive (22)
Bar	Never (5,197); less than 1 (3,482); 1-3 (2,473); 4-8 (1,076); over 8 (349)
Coffee House	Never (2,962); less than 1 (3,225); 1-3 (3,225); 4-8 (1,784); over 8 (1,111)
Carry Away	Never (153); less than 1 (1,856); 1-3 (4,672); 4-8 (4,258); over 8 (1,594)
Restaurantless20	Never (220); less than 1 (2,093); 1-3 (5,376); 4-8 (3,580); over 8 (1,285)
Restaurant20to50	Never (2,136); less than 1 (6,077); 1-3 (3,290); 4-8 (728); over 8 (264)
To coupon over 5	All responses 1/yes
To coupon over 15	0/No (5,562); 1/yes (7,122)
To coupon over 25	0/No (11,173); 1/yes (1,511)
Direction same	0/No (9,960); 1/yes (2,724)
Direction opp	0/No (2,724); 1/yes (9,960)

Note. Values in parentheses are the number of observations for each feature option.
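Per-feature counts like those in Table 1 can be produced by tallying each option of a feature and listing the results. The following is a minimal sketch in plain Python and is not the code used for this analysis. The rows, column names, and the helpers `feature_counts` and `format_counts` are hypothetical and serve only as illustration.

```python
from collections import Counter

# Hypothetical toy rows: the column names and values are illustrative only
# and are not taken from the actual dataset.
rows = [
    {"weather": "Sunny", "expiration": "1-day"},
    {"weather": "Sunny", "expiration": "2-hour"},
    {"weather": "Rainy", "expiration": "1-day"},
    {"weather": "Snowy", "expiration": "1-day"},
]


def feature_counts(rows, feature):
    """Count how often each option of `feature` occurs across `rows`."""
    return Counter(row[feature] for row in rows)


def format_counts(counts):
    """Render counts as 'Option (n); ...', most frequent option first."""
    return "; ".join(f"{option} ({n:,})" for option, n in counts.most_common())


# Print one tab-separated line per feature, mirroring the table layout.
for feature in ("weather", "expiration"):
    print(feature, format_counts(feature_counts(rows, feature)), sep="\t")
```

The `{n:,}` format specifier adds thousands separators, so larger counts would be rendered in the same style as the table (for example, 10,069).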

Table 2

Model Metrics

Model	Accuracy	ROC/AUC	Recall	F1-Score
Neural Network	71%	78%	70%	69%
Linear Discriminant Analysis	69%	73%	79%	74%
Quadratic Discriminant Analysis	67%	71%	69%	70%
Gradient Boosting	76%	83%	83%	80%
K-Nearest Neighbors (Euclidean Distance)	67%	65%	78%	72%
K-Nearest Neighbors (Manhattan Distance)	69%	67%	83%	75%
Random Forest	76%	74%	84%	79%
Naïve Bayes	62%	71%	65%	66%
Tuned Decision Tree	70%	69%	76%	74%
Tuned Logistic Regression	69%	67%	78%	74%
Support Vector Machines	76%	75%	78%	74%

Note. Although the seed value for the random state was fixed at 42, the Gradient Boosting, Neural Network, and Random Forest models each include a random component and showed variability in results. Re-running the code may therefore produce values that differ slightly from those reported here.
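For readers unfamiliar with the columns of Table 2, the sketch below computes accuracy, recall, F1-score, and ROC/AUC from their standard textbook definitions for a binary classifier. It is an illustration only. The source does not state how the reported values were calculated, the labels and scores are made up, and the function names `classification_metrics` and `roc_auc` are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, recall, precision, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a randomly chosen positive example
    scores higher than a randomly chosen negative one (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.7, 0.2, 0.55]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, recall, and F1 depend on the chosen classification threshold, whereas ROC/AUC is computed from the ranking of the scores. This is one reason a model can rank differently across the columns of the table.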