Team Members: Leonid Shpaner, Christopher Robinson, and Jose Luis Estrada
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
%cd /content/drive/Shared drives/Capstone - Best Group/GitHub Repository/navigating_crime/Code Library
####################################
## import the requisite libraries ##
####################################
import os
import csv
import pandas as pd
import numpy as np
# import the data_types function from the functions.py file created for
# inspecting data types
from functions import data_types
# library for one-hot encoding
from sklearn.preprocessing import OneHotEncoder
# import the function from scikit-learn for splitting the data
from sklearn.model_selection import train_test_split
# check current working directory
current_directory = os.getcwd()
current_directory
# path to the data file
data_file = '/content/drive/Shareddrives/Capstone - Best Group/' \
+ 'Final_Data_20220719/LA_Streets_with_Crimes_By_Division.csv'
# path to data folder
data_folder = '/content/drive/Shareddrives/Capstone - Best Group/' \
+ 'GitHub Repository/navigating_crime/Data Folder/'
# path to the image library
eda_image_path = '/content/drive/Shareddrives/Capstone - Best Group/' \
+ 'GitHub Repository/navigating_crime/Image Folder/EDA Images'
# read in the csv file to a dataframe using pandas
df = pd.read_csv(data_file, low_memory=False).set_index('OBJECTID')
df.head()
# show the columns of the dataframe for inspection
df.columns
# re-inspect the shape of the dataframe; this is also done in the EDA file
print('There are', df.shape[0], 'rows and', df.shape[1],
      'columns in the dataframe.')
The data is ingested from a shared Google Drive path and, in its raw form, contains 183,362 rows and 80 columns. The preprocessing stage begins by inspecting each column's data type along with its respective null count.
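The data_types helper imported from functions.py is not reproduced in this notebook. Below is a minimal sketch of what it presumably returns, assuming a summary dataframe with the 'Column/Variable', 'Data Type', '# of Nulls', and 'Percent Null' columns referenced in the cells that follow.
# a minimal sketch of the data_types helper (assumed implementation); the
# actual version lives in functions.py and may differ in detail
import pandas as pd

def data_types_sketch(frame):
    # summarize each column's dtype, null count, and null percentage
    return pd.DataFrame({
        'Column/Variable': frame.columns,
        'Data Type': frame.dtypes.astype(str).values,
        '# of Nulls': frame.isnull().sum().values,
        'Percent Null': (frame.isnull().mean() * 100).round(2).values,
    })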
# run the python file created for showing columns, data types, and their
# respective null counts (this was imported as a library at the top of this nb)
dtypes = data_types(df)
pd.set_option('display.max_rows', None)
dtypes = dtypes.sort_values(by='# of Nulls', ascending=False)
# subset only the columns that contain nulls (and are less than 84% null);
# many of these columns are subsequently removed
dat_typ = dtypes[(dtypes['Percent Null']<84) & (dtypes['# of Nulls']>0)]
# any column with '_L' or '_R' in its name is of no interest and will be dropped
dat_typ = dat_typ[dat_typ['Column/Variable'].str.contains('_L')==False]
dat_typ = dat_typ[dat_typ['Column/Variable'].str.contains('_R')==False]
dat_typ
# object (string) columns are typically removed before ML processing, but some
# imputation may be necessary, which is why they are examined first
For example, there are 3,274 missing values in the weapon description column. Imputation is necessary to assign the proper numerical value to the weapon used code column, which is missing the same 3,274 observations.
The victim descent column has only 171 missing values. The corresponding rows could be dropped, but they are instead imputed with 'Unknown' since the ethnicities simply remain unknown.
The same holds true for victim's sex, which has 167 missing values.
Ethnicity and age are important features that will be converted to numeric form for use with machine learning. Jurisdiction and the other string columns presented herein are subsequently omitted from the development (modeling) set.
Weapons used in crimes contain useful information, so the missing rows are imputed with a value of 500, which corresponds to 'unknown' in the data dictionary. Similarly, the missing values in victim's sex and ethnicity are each imputed with the categorical value "X" for "unknown." The 211 missing values in the zip code column are dropped since they cannot be logically imputed.
# Since the code for unknown weapons is 500, missing values for 'Weapon_Used_Cd'
# where 'Weapon_Desc' is 'UNKNOWN WEAPON/OTHER WEAPON' will be 500.
# However, since missing values in one column correspond to the other,
# missing values in 'Weapon_Desc' will need to be imputed first with
# 'UNKNOWN WEAPON/OTHER WEAPON'
# impute missing weapon descriptions
df['Weapon_Desc'] = df['Weapon_Desc'].fillna('UNKNOWN WEAPON/OTHER WEAPON')
# impute missing 'Weapon_Used_Cd' with 500 for reasons discussed above.
df['Weapon_Used_Cd'] = df['Weapon_Used_Cd'].fillna(500)
# impute missing values for victim's gender with 'X,' since this corresponds to
# 'Unknown' in the data dictionary
df['Vict_Sex'] = df['Vict_Sex'].fillna('X')
# impute missing values for victim's race with 'X,' since this corresponds to
# 'Unknown' in the data dictionary
df['Vict_Descent'] = df['Vict_Descent'].fillna('X')
# Since there are only 211 rows with missing zip codes, those rows are dropped
df.dropna(subset=['Zip_R'], inplace=True)
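As a quick sanity check (not part of the original notebook), it is worth confirming that the imputations and row drops above left no missing values behind in the affected columns.
# verify that the imputed/dropped columns no longer contain missing values
imputed_cols = ['Weapon_Desc', 'Weapon_Used_Cd', 'Vict_Sex', 'Vict_Descent', 'Zip_R']
print(df[imputed_cols].isnull().sum())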
There are two date columns, date reported and date occurred; only the latter is used, with the year and month extracted into two new columns, since a raw date object cannot be fed directly into a machine learning algorithm. In addition, descriptions of street premises are reformatted (title case, with underscores in place of spaces) in a new column and subsequently encoded into separate binarized columns. Similarly, new columns for time of day (derived from military time) and city neighborhoods (derived from the area name) are encoded into binarized columns.
More importantly, the target column (Crime_Code) is derived from the values in the 'Crm_Cd' column, where 1 denotes any crime code in the range 1-499 (more severe) and 0 denotes any code of 500 or above (less severe).
# create new column only for year of 'Date_Occurred'
df['Year'] = pd.DatetimeIndex(df['DATE_OCC']).year
# create new numeric column for month of year
df['Month_No'] = pd.DatetimeIndex(df['DATE_OCC']).month
# create new string column for month of year
df['Month'] = df['Month_No'].map({1: 'January', 2: 'February', 3: 'March',
4: 'April', 5: 'May', 6: 'June', 7: 'July',
8: 'August', 9: 'September', 10: 'October',
11: 'November', 12: 'December'})
# bin the ages using pd.cut() function
#
df['age_bin'] = pd.cut(df['Vict_Age'],
bins=[-1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55,
60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110,
115, 120],
labels=[' 0-5', ' 5-10', '10-15', '15-20', '20-25',
'25-30', '30-35', '35-40', '40-45', '45-50',
'50-55', '55-60', '60-65', '65-70', '70-75',
'75-80', '80-85', '85-90', '90-95', '95-100',
'100-105', '105-110', '110-115', '115-120'])
# binarize the 'Crm_Cd' column; this is the target column
# 1 is for any crime code in the range 1-499 (more severe)
# 0 is for any code of 500 and above (less severe)
df['Crime_Code'] = df['Crm_Cd'].apply(lambda value: 1 if value <= 499 else \
                                     (0 if value >= 500 else 2))
# re-categorize ground truth as serious vs. not serious crimes where serious
# crimes are '1' and less serious crimes are '0'
df['crime_severity'] = df['Crime_Code'].map({0:'Less Serious', 1:'More Serious'})
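Because this binarized column drives the rest of the analysis, a quick look at the resulting class balance is worthwhile; a brief sketch (the counts themselves are not reported here):
# inspect the class balance of the binarized target (sketch)
print(df['Crime_Code'].value_counts())
print(df['crime_severity'].value_counts(normalize=True).round(3))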
# create a new column of premises types by converting the all-caps premises
# descriptions to title case and replacing spaces with underscores
df['Premises'] = df['Premis_Desc'].replace({'STREET': 'Street',
'SIDEWALK': 'Sidewalk',
'PARKING LOT': 'Parking_Lot',
'ALLEY': 'Alley',
'DRIVEWAY': 'Driveway',
'PARK/PLAYGROUND': 'Park_Playground',
'VACANT LOT': 'Vacant_Lot',
'TUNNEL': 'Tunnel',
'PEDESTRIAN OVERCROSSING': \
'Pedestrian_Overcrossing'})
# replace strings with spaces of neighborhoods within 'AREA_NAME' to
# strings without spaces by replacing the space with an underscore since
# this is a more acceptable format for a final dataframe
df['AREA_NAME'] = df['AREA_NAME'].replace({'77th Street': '77th_Street',
'N Hollywood': 'N_Hollywood',
'Van Nuys': 'Van_Nuys',
'West Valley': 'West_Valley',
'West LA': 'West_LA'})
# perform one-hot encoding on the 'Premises' column using pd.get_dummies
premise_encode = pd.get_dummies(df['Premises'])
# join these new columns back to the original dataframe
df = premise_encode.join(df)
# helper function to bin military time into times of day
def applyFunc(s):
    if s == 0:
        return 'Midnight'
    elif s <= 1159:
        return 'Morning'
    elif s >= 1200 and s <= 1800:
        return 'Afternoon'
    elif s >= 1800 and s <= 2200:
        return 'Evening'
    elif s > 2200:
        return 'Night'

# map each military time in 'TIME_OCC' to its time-of-day label
df['Time_of_Day'] = df['TIME_OCC'].apply(applyFunc)
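A few sample military-time values (hypothetical inputs, chosen only for illustration) show how the boundaries above resolve; note that 1800 maps to 'Afternoon' because the first matching branch wins.
# illustrate the time-of-day boundaries with a few sample military times
for t in [0, 600, 1159, 1200, 1800, 1930, 2200, 2300]:
    print(t, '->', applyFunc(t))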
# dummy encode military time into 'Morning', 'Afternoon', 'Evening', 'Night',
# and 'Midnight' time of day strings, respectively, making them their own
# separate and distinct columns
military_encode = pd.get_dummies(df['Time_of_Day'])
df = military_encode.join(df)
# dummy encode area names into their own separate and distinct columns
area_encode = pd.get_dummies(df['AREA_NAME'])
df = area_encode.join(df)
Gender and ethnicity are critical factors in making informed decisions about crime. New columns are created for both variables to better represent their characteristics. For example, unidentified genders are replaced with unknowns, and one-letter abbreviations for victim descent are re-categorized into full-text ethnicity descriptions from the city of Los Angeles' data dictionary.
# narrow the victim sex column down to three categories (M, F, X=Unknown).
# Currently there are four, including "H". "H" is re-categorized to "X" since it
# is not in the data dictionary and thus remains unknown.
df['Victim_Sex'] = df['Vict_Sex'].map({'F':'F', 'H':'X', 'M':'M', 'X':'X'})
df['Gender'] = df['Victim_Sex'].map({'F':1, 'M':2, 'X':3})
# reclassify letters in victim description back to full race description from
# data dictionary nomenclature and add as new column; this will be helpful for
# additional EDA post pre-processing.
df['Victim_Desc'] = df['Vict_Descent'].map({'A':'Other Asian', 'B':'Black', 'C':
'Chinese', 'D': 'Cambodian', 'F':
'Filipino', 'G': 'Guamanian', 'H':
'Hispanic/Latin/Mexican', 'I':
'American Indian/Alaskan Native', 'J':
'Japanese', 'K': 'Korean', 'L': 'Laotian',
'O': 'Other', 'P': 'Pacific Islander',
'S': 'Samoan', 'U': 'Hawaiian', 'V':
'Vietnamese', 'W': 'White', 'X': 'Unknown',
'Z': 'Asian Indian'})
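Since .map() returns NaN for any code that is absent from the supplied dictionary, a quick check that the recoding introduced no new missing values is a reasonable safeguard (a sketch, not in the original notebook):
# .map() leaves NaN for unmapped codes; confirm no new missing values appeared
print(df[['Victim_Sex', 'Gender', 'Victim_Desc']].isnull().sum())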
# write out the preliminarily preprocessed df to new .csv file
df.to_csv('/content/drive/Shareddrives/Capstone - Best Group/Final_Data_20220719'
+'/df.csv')
Any columns that are entirely null, as well as object (string) columns, are dropped from the dataset; the result is cast into a new dataframe (df_prep) so the original is retained for posterity. Furthermore, only data from the year 2022 is kept, since older retrospective data does not accurately depict current crime in the city of Los Angeles. Longitude and latitude columns are dropped because the encoded city neighborhood columns carry the geographic information in their place. Moreover, columns such as retained join keys, identification numbers, report numbers, and the categorical columns that were encoded onto the original dataframe are omitted. Columns that present only one unique value (Pedestrian Overcrossing and Tunnel) are removed. Lastly, any column (military time) exceeding a Pearson correlation threshold of r = 0.75 is dropped to avoid between-predictor collinearity, and all remaining numeric values in the dataframe are cast to integer type.
# make a unique list of cols to remove that are completely missing
cols_remove_null = data_types(df)[(data_types(df)["Percent Null"]==100)] \
['Column/Variable'].unique()
# make a unique list of cols to remove that are non-numeric
cols_remove_string = data_types(df)[(data_types(df)["Data Type"]=='object')] \
['Column/Variable'].unique()
intersect = np.intersect1d(cols_remove_null, cols_remove_string)
print('The following columns intersect as being objects and being null:',
intersect, '\n')
# Since these columns exist as objects too, they can be removed from the
# list of nulls such that they are not doubled up between the two.
refined_null = [x for x in cols_remove_null if x not in cols_remove_string]
# set-up new df without dropped columns (dataframe used later for modeling)
df_prep = df.drop(columns=refined_null)
# Subset dataframe for observations only taking place in year 2022
df_prep = df_prep[df_prep['Year']==2022]
print('Removed Fully Null columns:')
for c in cols_remove_null:
print(f'\t{c}')
print()
# these columns are to be removed b/c they are object datatypes
# (i.e., free text, strings)
df_prep = df_prep.drop(columns=cols_remove_string)
print('Removed string and object columns:')
for c in cols_remove_string:
print(f'\t{c}')
# drop latitude & longitude columns; they will not be used to inform ML model(s)
print()
print('Removed Latitude and Longitude Columns:')
for col in df_prep.columns:
if 'LAT' in col:
df_prep = df_prep.drop(columns=col)
print(f'\t{col}')
elif 'LON' in col:
df_prep = df_prep.drop(columns=col)
print(f'\t{col}')
# drop the raw crime code columns ('Crm_Cd' and any numbered variants), since
# the binarized 'Crime_Code' column serves as the target
print()
print('Removed Extraneous Crm_Cd Columns:')
for col in df_prep.columns:
if 'Crm_Cd' in col:
df_prep = df_prep.drop(columns=col)
print(f'\t{col}')
print()
# drop any columns that persisted after join or feature engineering
print()
print('Removed Columns Persisting After JOIN:')
for col in df_prep.columns:
if 'JOIN_' in col:
df_prep = df_prep.drop(columns=col)
print(f'\t{col}')
elif 'Join_' in col:
df_prep = df_prep.drop(columns=col)
print(f'\t{col}')
elif 'OBJECTID' in col:
df_prep = df_prep.drop(columns=col)
print(f'\t{col}')
elif 'Premis' in col:
df_prep = df_prep.drop(columns=col)
print(f'\t{col}')
elif 'Time' in col:
df_prep = df_prep.drop(columns=col)
print(f'\t{col}')
# removed 'Victim_Sex' column because numericized as new 'Gender' column
# removed 'Victim_Desc' (ethnicity) since these cannot be numerically
# classified without introducing bias
# drop any columns with '_L' or '_R' nomenclature since these are positional
# street indexes which are used to compile other location-based information
# drop the 'DR_NO' column since these are associated with records
df_prep = df_prep.drop(columns=['From_L', 'From_R', 'To_L', 'To_R', 'Zip_L',
'Zip_R', 'SHAPE_Leng', 'Shape_Length_1',
'Shape_Length', 'DR_NO', 'TARGET_FID', 'age_bin',
'TARGET_FID_1', 'Accuracy', 'Crime_Cost',
'StreetOID', 'Field1', 'AREA', 'AREA_1',
'Part_1_2', 'Weapon_Used_Cd', 'PREC', 'Year',
'PERIMETER', 'Rpt_Dist_No'])
print()
print('Removed columns with only one unique value:')
for col in df_prep:
# remove any columns that have only one unique value in the data
if df_prep[col].nunique() == 1:
df_prep = df_prep.drop(columns=col)
print(f'\t{col}')
# identify and drop any columns exceeding a correlation threshold of 0.75
corr_df = df_prep.reset_index(drop=True) # omit index by subsetting in new df
corr_matrix = corr_df.corr().abs()
# Select upper triangle of correlation matrix
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
# Find feature columns with an absolute correlation greater than 0.75
to_drop = [column for column in upper.columns if any(upper[column] > 0.75)]
# Drop features exceeding the correlation threshold of 0.75
df_prep = df_prep.drop(columns=to_drop)
print()
print('Columns dropped for exceeding the correlation threshold: %s' % to_drop)
# cast all numeric variables in preprocessed dataset to integers
df_prep = df_prep.astype(int)
# describe the new preprocessed dataset as the development set and list its cols
development_cols = df_prep.columns.to_list()
print('Development Data Column List:', '\n')
for x in development_cols:
print(x)
print()
print('There are', df_prep.shape[0], 'rows and', df_prep.shape[1],
'columns in the development set.')
Data types are examined once more on this preprocessed dataset to ensure that all data types are integers and no columns remain null. At this stage, there are 42,072 rows and 38 columns.
# run the python file for checking data types again
# this time, for the development set
data_types(df_prep)
df_prep = df_prep.copy()
df_prep.shape
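A minimal programmatic check of the integer-only, fully populated claim above (a sketch):
# confirm every remaining column is integer-typed and fully populated (sketch)
print(df_prep.dtypes.value_counts())
print('Total nulls remaining:', df_prep.isnull().sum().sum())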
The data is split into training (fifty percent), validation (twenty-five percent), and holdout test (twenty-five percent) sets, and three separate files are saved to the data folder path.
# set random state for reproducibility
rstate = 222
# allocate 50% of the data to the train set, 25% to the validation set,
# and 25% to the test set
train_prop = 0.50
valid_prop = 0.25
test_prop = 0.25
size_train = np.int64(round(train_prop*len(df_prep),2))
size_valid = np.int64(round(valid_prop*len(df_prep),2))
size_test = np.int64(round(test_prop*len(df_prep),2))
size_total = size_test + size_valid + size_train
# split the data into the development (train and validation)
# and test set, respectively
train, test = train_test_split(df_prep, train_size=size_train,\
random_state=rstate)
valid, test = train_test_split(test, train_size=size_valid,\
random_state=rstate)
print('Training size:', size_train)
print('Validation size:', size_valid)
print('Test size:', size_test)
print('Total size:', size_train + size_valid + size_test, '\n')
print('Training percentage:', round(size_train/(size_total),2))
print('Validation percentage:', round(size_valid/(size_total),2))
print('Test percentage:', round(size_test/(size_total),2))
print('Total percentage:', (size_train/size_total + size_valid/size_total \
+ size_test/size_total)*100)
# assign the train, validation, and test splits to their own respective dataframes
train_set = train
valid_set = valid
test_set = test
# write out the train, validation, and test sets to .csv files to the data folder
# such that they are contained in their own unique files
# the train and validation sets can be used without prejudice; however, models
# may ONLY be evaluated on the test set once
train_set.to_csv(data_folder + '/train_set.csv')
valid_set.to_csv(data_folder + '/valid_set.csv')
test_set.to_csv(data_folder + '/test_set.csv')
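When the downstream modeling notebooks pick these files back up, they would presumably be re-read from the same data folder and separated into features and the Crime_Code target along these lines (a sketch only; index_col=0 assumes the default index written out above, and the exact loading code belongs to the modeling stage):
# sketch of how a modeling notebook might reload the saved splits;
# index_col=0 assumes the default pandas index written out above
train_set = pd.read_csv(data_folder + '/train_set.csv', index_col=0)
valid_set = pd.read_csv(data_folder + '/valid_set.csv', index_col=0)

# separate predictors from the binarized target
X_train = train_set.drop(columns=['Crime_Code'])
y_train = train_set['Crime_Code']
X_valid = valid_set.drop(columns=['Crime_Code'])
y_valid = valid_set['Crime_Code']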