Table 1

Dataset Distribution

Feature            Feature options
Destination        Home (3,237); No urgent place (6,283); work (3,164)
Passenger          Alone (7,305); friends (3,298); kids (1,006); partner (1,075)
Weather            Sunny (10,069); snowy (1,405); rainy (1,210)
Temperature        80 (6,528); 55 (3,840); 30 (2,316)
Time               7am (3,164); 10am (2,275); 2pm (2,009); 6pm (3,230); 10pm (2,006)
Coupon             Take away (2,393); restaurant less than 20 (2,786); bar (2,017); coffee house (3,996); restaurant 20-50 (1,492)
Expiration         1-day (7,091); 2-hour (5,593)
Gender             Female (6,511); Male (6,173)
Age                Below 21 (547); 21 (2,653); 26 (2,559); 31 (2,039); 36 (1,319); 41 (1,093); 46 (686); over 50 (1,788)
Marital Status     Married (5,100); single (4,752); unmarried partner (2,186); divorced (516); widowed (130)
Has children       0/no (7,431); 1/yes (5,253)
Education          High school (88); High school graduate (905); associates (1,153); some college (4,351); bachelors (4,335); graduate degree (1,852)
Occupation         Twenty-five categories with 43 to 1,870 respondents each
Income             Categories in $12,500 increments plus an over-$100,000 category; responses skew toward the lower categories, with a second concentration in the over-$100,000 category
Car                Car too old for OnStar (21); mazda5 (22); scooter/motorcycle (22); crossover (21); do not drive (22)
Bar                Never (5,197); less than 1 (3,482); 1-3 (2,473); 4-8 (1,076); over 8 (349)
Coffee House       Never (2,962); less than 1 (3,225); 1-3 (3,225); 4-8 (1,784); over 8 (1,111)
Carry Away         Never (153); less than 1 (1,856); 1-3 (4,672); 4-8 (4,258); over 8 (1,594)
Restaurantless20   Never (220); less than 1 (2,093); 1-3 (5,376); 4-8 (3,580); over 8 (1,285)
Restaurant20to50   Never (2,136); less than 1 (6,077); 1-3 (3,290); 4-8 (728); over 8 (264)
To coupon over 5   All responses 1/yes
To coupon over 15  0/no (5,562); 1/yes (7,122)
To coupon over 25  0/no (11,173); 1/yes (1,511)
Direction same     0/no (9,960); 1/yes (2,724)
Direction opp      0/no (2,724); 1/yes (9,960)

Note. Values in parentheses are the number of responses for each feature option.
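
Counts of the kind listed in Table 1 can be tallied per feature with a few lines of Python. The following is a minimal sketch, not the code used to build the table: the helper names `feature_counts` and `format_counts` are hypothetical, and the six toy rows are invented for illustration and are not taken from the dataset.

```python
from collections import Counter

def feature_counts(rows, feature):
    """Count how often each option of `feature` appears in `rows`."""
    return Counter(row[feature] for row in rows)

def format_counts(counts):
    """Render counts in the 'option (n); option (n)' style of Table 1."""
    return "; ".join(f"{value} ({n:,})" for value, n in counts.most_common())

# Toy rows, invented for illustration only (not the real responses).
rows = [
    {"weather": "Sunny", "expiration": "1-day"},
    {"weather": "Sunny", "expiration": "2-hour"},
    {"weather": "Rainy", "expiration": "1-day"},
    {"weather": "Sunny", "expiration": "1-day"},
    {"weather": "Snowy", "expiration": "2-hour"},
    {"weather": "Rainy", "expiration": "1-day"},
]

weather_summary = format_counts(feature_counts(rows, "weather"))
print(weather_summary)  # Sunny (3); Rainy (2); Snowy (1)
```

The same two helpers apply unchanged to any categorical column, so one loop over the feature names would reproduce the full layout of the table.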

Table 2

Model Metrics

Model                                     Accuracy  ROC/AUC  Recall  F1-Score
Neural Network                            71%       78%      70%     69%
Linear Discriminant Analysis              69%       73%      79%     74%
Quadratic Discriminant Analysis           67%       71%      69%     70%
Gradient Boosting                         76%       83%      83%     80%
K-Nearest Neighbors (Euclidean Distance)  67%       65%      78%     72%
K-Nearest Neighbors (Manhattan Distance)  69%       67%      83%     75%
Random Forest                             76%       74%      84%     79%
Naïve Bayes                               62%       71%      65%     66%
Tuned Decision Tree                       70%       69%      76%     74%
Tuned Logistic Regression                 69%       67%      78%     74%
Support Vector Machines                   76%       75%      78%     74%

Note. Although the random state seed was fixed at 42, Gradient Boosting, Neural Network, and Random Forest each include a random component and still showed variability in results. Re-running the code may therefore yield values that differ from those reported here.
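
For reference, the four metrics in Table 2 can be computed from true labels, predicted labels, and predicted scores as sketched below. This is a minimal standard-library illustration of the usual definitions, not the code that produced the table. The function names, the 0.5 decision threshold, and the eight toy labels and scores are assumptions made for the example.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall."""
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    return 2 * p * r / (p + r) if (p + r) else 0.0

def roc_auc(y_true, y_score):
    """Probability that a random positive outscores a random negative.

    Ties count as one half. This pairwise form is quadratic in the
    sample size, which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and scores, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred), roc_auc(y_true, y_score),
      recall(y_true, y_pred), f1_score(y_true, y_pred))
```

Accuracy, recall, and F1 depend on the thresholded predictions, whereas ROC/AUC depends only on the ranking of the scores, which is why a model can rank well on one and poorly on another.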