Table 1
Dataset Distribution
Feature | Feature options |
---|---|
Destination | Home (3,237); No urgent place (6,283); work (3,164) |
Passenger | Alone (7,305); friends (3,298); kids (1,006); partner (1,075) |
Weather | Sunny (10,069); snowy (1,405); rainy (1,210) |
Temperature | 80 (6,528); 55 (3,840); 30 (2,316) |
Time | 7am (3,164); 10am (2,275); 2pm (2,009); 6pm (3,230); 10pm (2,006) |
Coupon | Take away (2,393); restaurant less than 20 (2,786); bar (2,017); coffee house (3,996); restaurant 20-50 (1,492) |
Expiration | 1-day (7,091); 2-hour (5,593) |
Gender | Female (6,511); Male (6,173) |
Age | Below 21 (547); 21 (2,653); 26 (2,559); 31 (2,039); 36 (1,319); 41 (1,093); 46 (686); over 50 (1,788) |
Marital Status | Married (5,100); single (4,752); unmarried partner (2,186); divorced (516); widowed (130) |
Has children | 0/no (7,431); 1/yes (5,253) |
Education | High school (88); High school graduate (905); associates (1,153); some college (4,351); bachelors (4,335); graduate degree (1,852) |
Occupation | Twenty-five categories with 43 to 1,870 respondents for each |
Income | Categories in $12,500 increments and over $100,000, skewed toward lower values and then $100,000 |
Car | Car too old for OnStar (21); mazda5 (22); scooter/motorcycle (22); crossover (21); do not drive (22) |
Bar | Never (5,197); less than 1 (3,482); 1-3 (2,473); 4-8 (1,076); over 8 (349) |
Coffee House | Never (2,962); less than 1 (3,225); 1-3 (3,225); 4-8 (1,784); over 8 (1,111) |
Carry Away | Never (153); less than 1 (1,856); 1-3 (4,672); 4-8 (4,258); over 8 (1,594) |
Restaurantless20 | Never (220); less than 1 (2,093); 1-3 (5,376); 4-8 (3,580); over 8 (1,285) |
Restaurant20to50 | Never (2,136); less than 1 (6,077); 1-3 (3,290); 4-8 (728); over 8 (264) |
To coupon over 5 | All responses 1/yes |
To coupon over 15 | 0/No (5,562); 1/yes (7,122) |
To coupon over 25 | 0/No (11,173); 1/yes (1,511) |
Direction same | 0/No (9,960); 1/yes (2,724) |
Direction opp | 0/No (2,724); 1/yes (9,960) |
Note. Values in parentheses are the number of records in each category of the feature.
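As a rough consistency check on Table 1, the category counts for each feature can be summed and compared with the dataset size. The sketch below is a minimal illustration, not the authors' code: the dictionary `table1_counts` and the helper `count_gaps` are hypothetical names, only a subset of features is copied in, and the dataset size is assumed to equal the Gender total, which matches the totals of other fully populated features such as Destination and Expiration.

```python
# Counts copied from Table 1 (subset of features only).
# The dictionary layout and names are illustrative assumptions.
table1_counts = {
    "Gender": [6511, 6173],
    "Destination": [3237, 6283, 3164],
    "Expiration": [7091, 5593],
    "Car": [21, 22, 22, 21, 22],
    "Bar": [5197, 3482, 2473, 1076, 349],
    "Coffee House": [2962, 3225, 3225, 1784, 1111],
    "Carry Away": [153, 1856, 4672, 4258, 1594],
}


def count_gaps(counts, reference_feature="Gender"):
    """Return, per feature, how many records are not covered by its listed categories.

    Assumption: the reference feature has no uncounted records, so the sum of
    its categories is taken as the dataset size.
    """
    total = sum(counts[reference_feature])
    return {feature: total - sum(values) for feature, values in counts.items()}


gaps = count_gaps(table1_counts)
for feature, gap in gaps.items():
    print(f"{feature}: {gap} records without a listed category")
```

A gap of zero means the listed categories account for every record. A nonzero gap, as for Car, Bar, or Coffee House, means some records fall outside the listed categories; the table does not say whether these are missing responses, so that reading is an assumption.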
Table 2
Model Metrics
Model | Accuracy | ROC/AUC | Recall | F1-Score |
---|---|---|---|---|
Neural Network | 71% | 78% | 70% | 69% |
Linear Discriminant Analysis | 69% | 73% | 79% | 74% |
Quadratic Discriminant Analysis | 67% | 71% | 69% | 70% |
Gradient Boosting | 76% | 83% | 83% | 80% |
K-Nearest Neighbors (Euclidean Distance) | 67% | 65% | 78% | 72% |
K-Nearest Neighbors (Manhattan Distance) | 69% | 67% | 83% | 75% |
Random Forest | 76% | 74% | 84% | 79% |
Naïve Bayes | 62% | 71% | 65% | 66% |
Tuned Decision Tree | 70% | 69% | 76% | 74% |
Tuned Logistic Regression | 69% | 67% | 78% | 74% |
Support Vector Machines | 76% | 75% | 78% | 74% |
Note. Although we fixed the random-state seed at 42, Gradient Boosting, Neural Network, and Random Forest models include a random component and showed variability across runs. Re-running the code may therefore produce values that differ slightly from those reported here.
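For readers less familiar with the columns in Table 2, the sketch below shows how accuracy, recall, F1-score, and ROC/AUC are conventionally defined for a binary classifier. It is a minimal pure-Python illustration under stated assumptions, not the code used in this study: the function names `classification_metrics` and `roc_auc` are hypothetical, and the confusion-matrix counts, labels, and scores are invented solely for demonstration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical inputs for illustration only; not taken from the study.
metrics = classification_metrics(tp=80, fp=30, fn=20, tn=70)
auc = roc_auc(labels=[1, 1, 0, 0], scores=[0.9, 0.4, 0.6, 0.2])
print(metrics, auc)
```

Because recall and F1 both ignore true negatives while accuracy does not, a model can rank differently on these columns, which is consistent with the mixed ordering of models across Table 2.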