Chapter 3 Data

3.1 Sources

As described in the proposal, we look at three types of data over time:

3.1.1 Crime

Crime data is sourced from the FBI's Uniform Crime Reporting (UCR) program. UCR data is widely used by criminal justice researchers and students.

Yearly UCR data is accessed through the index of UCR publications. In the proposal, we had planned to extract data from 1995 to 1998 as well, but these years are in PDF format and proved rather difficult to extract. Besides, starting from 1999 still lets us roughly compare the Bush administration with the Obama administration in terms of crime. From 1999 to 2019, crime data is directly available in Table 5 (except for 2016, where the data is available in Table 3) in xls format, categorized by state, nature of offense, and kind of area. In the proposal, we had also planned on downloading total crime rates separately, but we then figured that we could sum up the state-level figures and make do with that.

3.1.2 Imprisonment

Imprisonment data is sourced from the National Prisoner Statistics (NPS) program. The Bureau of Justice Statistics has compiled data from NPS as quick tables. We use the total number of prison admissions and the total number of prison releases, both from 1978 to 2019. The proposal covered only these two datasets; we are now also exploring the imprisonment rate of sentenced prisoners from 1978 to 2019. This rate is the number of prisoners under state or federal jurisdiction with a sentence of more than 1 year per 100,000 U.S. residents.

3.2 Cleaning / transformation

3.2.1 Crime

Raw crime data for every year is provided as Excel files that do not have a clear table structure. For example,

Snapshot of data

Data from 1999-2002 have a similar format, so they are extracted using data_collection/1999-2002.R. Data from 2003 and 2004 each have their own peculiar format, so they are extracted using data_collection/2003.R and data_collection/2004.R. Data from 2013-2016 have a similar format, so they are extracted using data_collection/2013-2016.R. The rest of the data is extracted using data_collection/2005-2012, 2017-2019.R. The xls links for these years are saved in metadata/crime_data_links.csv so that we don’t have to hardcode URLs.

Year  State    Area                               Population  Violent  Property  Murder  Rape  Robbery  Assault  Burglary  Theft   Motor  Arson
1999  ALABAMA  Metropolitan Statistical Area      2960883     15835    134045    273     1128  4602     9832     29432     93614   10999  NA
1999  ALABAMA  Cities outside metropolitan areas  597141      4017     27620     41      230   576      3170     5595      20631   1394   NA
1999  ALABAMA  Rural                              811976      1569     9733      31      155   119      1264     3621      5371    741    NA
1999  ALABAMA  State Total                        4370000     21421    171398    345     1513  5297     14266    38648     119616  13134  NA
1999  ALASKA   Metropolitan Statistical Area      257762      1685     11265     19      162   398      1106     1543      8471    1251   NA

Some of the state names were read with stray whitespace or a comma, so we clean those up.

Year  State    Area                               Population  Violent  Property  Murder  Rape  Robbery  Assault  Burglary  Theft   Motor  Arson
1999  ALABAMA  Metropolitan Statistical Area      2960883     15835    134045    273     1128  4602     9832     29432     93614   10999  NA
1999  ALABAMA  Cities outside metropolitan areas  597141      4017     27620     41      230   576      3170     5595      20631   1394   NA
1999  ALABAMA  Rural                              811976      1569     9733      31      155   119      1264     3621      5371    741    NA
1999  ALABAMA  State Total                        4370000     21421    171398    345     1513  5297     14266    38648     119616  13134  NA
1999  ALASKA   Metropolitan Statistical Area      257762      1685     11265     19      162   398      1106     1543      8471    1251   NA
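
The state-name cleanup can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper name clean_state and the sample labels are made up for the example, and the real extraction scripts may handle this differently.

```r
# Hypothetical raw state labels with stray whitespace and commas
raw_states <- c(" ALABAMA", "ALASKA,", "ARIZONA ", "NEW  YORK")

# Illustrative helper (not from the project scripts)
clean_state <- function(x) {
  x <- gsub(",", "", x, fixed = TRUE)  # drop stray commas
  x <- gsub("\\s+", " ", x)            # collapse repeated whitespace
  trimws(x)                            # strip leading/trailing whitespace
}

cleaned_states <- clean_state(raw_states)
print(cleaned_states)
```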

As stated in the 2003 crime report summary, the UCR program started referring to rural counties as nonmetropolitan counties, so we change the area name in the earlier years for one-to-one correspondence. For the District of Columbia, the report labels the district-wide crime numbers as “Total” instead of “State Total”, since DC is not technically a state. We change that label to “State Total” as well, again for one-to-one correspondence.

Year  State    Area                               Population  Violent  Property  Murder  Rape  Robbery  Assault  Burglary  Theft   Motor  Arson
1999  ALABAMA  Metropolitan Statistical Area      2960883     15835    134045    273     1128  4602     9832     29432     93614   10999  NA
1999  ALABAMA  Cities outside metropolitan areas  597141      4017     27620     41      230   576      3170     5595      20631   1394   NA
1999  ALABAMA  Nonmetropolitan counties           811976      1569     9733      31      155   119      1264     3621      5371    741    NA
1999  ALABAMA  State Total                        4370000     21421    171398    345     1513  5297     14266    38648     119616  13134  NA
1999  ALASKA   Metropolitan Statistical Area      257762      1685     11265     19      162   398      1106     1543      8471    1251   NA
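
A small sketch of the two relabeling steps, assuming the crime table is a data frame with State and Area columns as shown above. The toy rows and the spelling “DISTRICT OF COLUMBIA” are assumptions made for illustration.

```r
# Toy rows standing in for the crime table (values are illustrative)
crime <- data.frame(
  State = c("ALABAMA", "ALABAMA", "DISTRICT OF COLUMBIA"),
  Area  = c("Rural", "State Total", "Total"),
  stringsAsFactors = FALSE
)

# Map the pre-2003 label onto the label used from 2003 onward
crime$Area[crime$Area == "Rural"] <- "Nonmetropolitan counties"

# DC reports its district-wide row as "Total"; align it with the states
is_dc_total <- crime$State == "DISTRICT OF COLUMBIA" & crime$Area == "Total"
crime$Area[is_dc_total] <- "State Total"

print(crime)
```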

Finally, we also convert the year to a factor and the remaining numeric columns to integers.
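
One way this type conversion might look in base R is sketched below. It assumes the values were read in as character strings, and it uses only two of the count columns to keep the example short.

```r
# Toy table with everything read in as character (an assumption)
crime <- data.frame(
  Year       = c("1999", "1999"),
  State      = c("ALABAMA", "ALASKA"),
  Area       = c("State Total", "Metropolitan Statistical Area"),
  Population = c("4370000", "257762"),
  Violent    = c("21421", "1685"),
  stringsAsFactors = FALSE
)

# Every column except the identifiers holds counts
count_cols <- setdiff(names(crime), c("Year", "State", "Area"))

crime$Year <- factor(crime$Year)                           # year as a factor
crime[count_cols] <- lapply(crime[count_cols], as.integer) # counts as integers

str(crime)
```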

3.2.2 Imprisonment

We begin exploring the imprisonment data by reading in the files with the read_excel() function. We then drop extra columns such as “Jurisdiction”, since they are not needed for our analysis. We also rename the columns we keep to State, Year, Admissions, Releases, and Rate.

  1. Admissions - number of prisoners admitted into prison
  2. Releases - number of prisoners released from prison
  3. Rate - imprisonment rate per 100,000 U.S. residents

Admissions:
State 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013/b 2014 2015 2016 2017 2018 2019
U.S. total 152039 161280 171956 199943 218087 237925 234293 258514 291903 326228 365724 447388 460769 466285 480676 500335 523577 549313 542863 572281 603510 611676 654534 638978 660576 686471 697066 730141 747031 742875 738649 728686 703798 671551 608442 629962 626096 608318 606000 606596 596384 576956
Alabama 2572 2597 3766 4025 4425 4605 4701 4370 3962 4543 5101 6510 7031 7683 7967 8454 8287 8692 9465 9301 7492 NA 6296 7428 7033 9524 8278 9723 10039 10708 11037 13093 11881 11387 11203 11265 10912 10451 10749 12170 13160 13267
Alaska/c 258 311 459 461 541 711 727 875 1097 952 1026 1062 1389 1341 1483 2411 NA 1996 2336 2646 2605 2405 2427 2142 2142 2805 NA NA NA NA NA NA 2650 3789 3906 3906 3846 4271 1804 1580 1765 1560
Arizona 1620 1641 2082 2759 2910 3288 3386 3989 4515 5370 5304 6055 6518 7427 7351 8050 9218 8662 9019 9172 10108 9021 9560 10000 11468 11957 11343 12440 13954 14046 14867 14526 13249 13030 12970 13538 14439 14670 13663 13423 13753 13440
Arkansas 1958 2189 2311 2419 2323 2173 2179 2301 2280 3152 2831 3517 4255 4553 4580 3818 4345 5248 5158 5705 6189 6045 6941 6977 7080 7132 8035 8053 5992 6651 7017 7383 7603 7059 5782 8987 9435 9351 9911 8971 9572 10268

Releases:
State 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013/b,c 2014/c 2015/c 2016 2017 2018 2019
U.S. total 142665 154958 158331 163085 175607 213198 209655 220485 248909 290301 320460 369032 405374 421687 430198 436684 437777 477654 492069 517432 549634 574624 635094 628626 633947 656574 672202 701632 709874 721161 735651 729749 708677 691072 636716 623990 636346 641027 626019 622377 614851 608026
Alabama 2726 2744 3207 2908 2830 3225 3861 3694 3197 3480 5317 5344 5308 6645 7404 7244 7371 7618 8432 8682 7016 8194 7136 7905 7472 10167 9156 10472 11283 11079 11556 12231 12070 11052 11253 11488 11585 11446 12711 13624 14015 12251
Alaska/d 235 216 268 271 358 505 501 620 960 892 936 1002 1442 1348 1379 1824 NA 1894 2043 2393 2615 2504 2599 2041 2041 2736 2726 2702 2719 3286 3741 3196 3068 3599 3774 3774 3774 4085 2159 1941 1735 1717
Arizona 1352 1638 1469 1874 2027 2243 2506 3354 3647 3795 4219 4869 5501 6312 6557 6834 7402 7430 7837 8386 8559 8982 9100 9053 10056 10391 10190 11932 12209 12560 13192 13854 13500 13149 13000 12931 13513 14092 13857 14075 13683 13034
Arkansas 1878 1872 2366 2045 1724 1893 1953 2168 2189 2411 2755 3174 4090 4085 4078 4007 4362 4465 4690 4719 5524 5403 6308 6613 7640 7120 7457 9093 5668 6045 6610 6990 6664 7252 6298 6541 8812 9702 10370 8443 9805 9768

Rate:
State 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
U.S. total 130.81 133.07 138.31 153.34 169.89 178.55 187.14 200.98 216.39 230.35 245.71 274.40 295.14 311.34 329.91 359.69 388.62 410.90 426.79 443.80 463.00 475.90 470.26 469.55 477.35 482.67 486.88 492.19 501.24 505.74 506.14 503.75 500.00 491.9319 479.7198 477.3716 470.7563 458.5564 449.6894 441.3848 431.5106 419.4105
Alabama 144.21 141.21 163.30 183.72 218.61 245.06 259.27 270.58 288.21 313.85 307.09 336.83 379.38 400.08 407.75 431.14 447.72 468.49 487.36 496.34 504.00 544.20 584.79 585.52 615.60 607.31 559.75 594.12 598.69 616.77 634.83 652.47 642.35 649.5930 650.4946 647.4109 633.7387 612.4664 570.9180 485.9035 418.1007 419.2635
Alaska/d 121.88 131.89 141.10 170.38 193.95 219.49 251.70 287.33 306.10 327.64 343.56 348.72 334.54 322.70 330.20 450.93 320.57 337.85 383.69 419.43 409.88 372.13 339.12 346.75 400.97 403.91 397.84 415.39 460.05 450.24 431.03 359.07 388.58 398.6128 405.2312 363.8283 281.1676 305.5893 281.9837 257.8795 264.3555 244.1103
Arizona 137.02 141.63 159.37 185.01 209.28 227.12 249.29 259.87 273.19 307.18 327.51 351.34 374.07 391.78 404.78 422.09 447.69 457.78 469.22 471.88 481.23 476.61 491.84 498.89 513.71 531.58 540.09 525.72 541.93 557.82 572.18 584.15 599.13 589.2271 582.7972 584.3090 592.4293 595.6010 586.5044 566.1356 559.8836 557.8603
Arkansas 115.04 131.33 127.18 145.13 170.95 184.15 193.21 198.15 201.59 232.29 235.59 278.98 308.67 324.03 339.20 325.04 354.09 336.04 349.60 381.99 402.13 427.47 442.48 464.35 480.60 486.50 497.71 482.06 487.12 503.48 511.20 524.11 552.68 544.6450 494.2330 579.0428 599.2124 591.7824 583.0859 599.1407 590.3151 586.0233

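Next, we clean up the State and Year labels by removing extra characters, such as the footnote markers “/b” and “/c”, that carry no meaning in our visualizations. State names are also converted to upper case, and “U.S. total” is relabeled as “TOTAL”.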

Admissions:
State 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
TOTAL 152039 161280 171956 199943 218087 237925 234293 258514 291903 326228 365724 447388 460769 466285 480676 500335 523577 549313 542863 572281 603510 611676 654534 638978 660576 686471 697066 730141 747031 742875 738649 728686 703798 671551 608442 629962 626096 608318 606000 606596 596384 576956
ALABAMA 2572 2597 3766 4025 4425 4605 4701 4370 3962 4543 5101 6510 7031 7683 7967 8454 8287 8692 9465 9301 7492 NA 6296 7428 7033 9524 8278 9723 10039 10708 11037 13093 11881 11387 11203 11265 10912 10451 10749 12170 13160 13267
ALASKA 258 311 459 461 541 711 727 875 1097 952 1026 1062 1389 1341 1483 2411 NA 1996 2336 2646 2605 2405 2427 2142 2142 2805 NA NA NA NA NA NA 2650 3789 3906 3906 3846 4271 1804 1580 1765 1560
ARIZONA 1620 1641 2082 2759 2910 3288 3386 3989 4515 5370 5304 6055 6518 7427 7351 8050 9218 8662 9019 9172 10108 9021 9560 10000 11468 11957 11343 12440 13954 14046 14867 14526 13249 13030 12970 13538 14439 14670 13663 13423 13753 13440
ARKANSAS 1958 2189 2311 2419 2323 2173 2179 2301 2280 3152 2831 3517 4255 4553 4580 3818 4345 5248 5158 5705 6189 6045 6941 6977 7080 7132 8035 8053 5992 6651 7017 7383 7603 7059 5782 8987 9435 9351 9911 8971 9572 10268

Releases:
State 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
TOTAL 142665 154958 158331 163085 175607 213198 209655 220485 248909 290301 320460 369032 405374 421687 430198 436684 437777 477654 492069 517432 549634 574624 635094 628626 633947 656574 672202 701632 709874 721161 735651 729749 708677 691072 636716 623990 636346 641027 626019 622377 614851 608026
ALABAMA 2726 2744 3207 2908 2830 3225 3861 3694 3197 3480 5317 5344 5308 6645 7404 7244 7371 7618 8432 8682 7016 8194 7136 7905 7472 10167 9156 10472 11283 11079 11556 12231 12070 11052 11253 11488 11585 11446 12711 13624 14015 12251
ALASKA 235 216 268 271 358 505 501 620 960 892 936 1002 1442 1348 1379 1824 NA 1894 2043 2393 2615 2504 2599 2041 2041 2736 2726 2702 2719 3286 3741 3196 3068 3599 3774 3774 3774 4085 2159 1941 1735 1717
ARIZONA 1352 1638 1469 1874 2027 2243 2506 3354 3647 3795 4219 4869 5501 6312 6557 6834 7402 7430 7837 8386 8559 8982 9100 9053 10056 10391 10190 11932 12209 12560 13192 13854 13500 13149 13000 12931 13513 14092 13857 14075 13683 13034
ARKANSAS 1878 1872 2366 2045 1724 1893 1953 2168 2189 2411 2755 3174 4090 4085 4078 4007 4362 4465 4690 4719 5524 5403 6308 6613 7640 7120 7457 9093 5668 6045 6610 6990 6664 7252 6298 6541 8812 9702 10370 8443 9805 9768

Rate:
State 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
TOTAL 130.81 133.07 138.31 153.34 169.89 178.55 187.14 200.98 216.39 230.35 245.71 274.40 295.14 311.34 329.91 359.69 388.62 410.90 426.79 443.80 463.00 475.90 470.26 469.55 477.35 482.67 486.88 492.19 501.24 505.74 506.14 503.75 500.00 491.9319 479.7198 477.3716 470.7563 458.5564 449.6894 441.3848 431.5106 419.4105
ALABAMA 144.21 141.21 163.30 183.72 218.61 245.06 259.27 270.58 288.21 313.85 307.09 336.83 379.38 400.08 407.75 431.14 447.72 468.49 487.36 496.34 504.00 544.20 584.79 585.52 615.60 607.31 559.75 594.12 598.69 616.77 634.83 652.47 642.35 649.5930 650.4946 647.4109 633.7387 612.4664 570.9180 485.9035 418.1007 419.2635
ALASKA 121.88 131.89 141.10 170.38 193.95 219.49 251.70 287.33 306.10 327.64 343.56 348.72 334.54 322.70 330.20 450.93 320.57 337.85 383.69 419.43 409.88 372.13 339.12 346.75 400.97 403.91 397.84 415.39 460.05 450.24 431.03 359.07 388.58 398.6128 405.2312 363.8283 281.1676 305.5893 281.9837 257.8795 264.3555 244.1103
ARIZONA 137.02 141.63 159.37 185.01 209.28 227.12 249.29 259.87 273.19 307.18 327.51 351.34 374.07 391.78 404.78 422.09 447.69 457.78 469.22 471.88 481.23 476.61 491.84 498.89 513.71 531.58 540.09 525.72 541.93 557.82 572.18 584.15 599.13 589.2271 582.7972 584.3090 592.4293 595.6010 586.5044 566.1356 559.8836 557.8603
ARKANSAS 115.04 131.33 127.18 145.13 170.95 184.15 193.21 198.15 201.59 232.29 235.59 278.98 308.67 324.03 339.20 325.04 354.09 336.04 349.60 381.99 402.13 427.47 442.48 464.35 480.60 486.50 497.71 482.06 487.12 503.48 511.20 524.11 552.68 544.6450 494.2330 579.0428 599.2124 591.7824 583.0859 599.1407 590.3151 586.0233
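
The label cleanup shown in these tables could be done along the following lines. This is a sketch under assumptions: the helper strip_footnote and its regular expression are illustrative, and they rely on footnote markers always starting with a slash, as in “Alaska/c” or “2013/b,c”.

```r
# Illustrative helper: drop everything from the first "/" onward
strip_footnote <- function(x) sub("/.*$", "", x)

raw_states <- c("U.S. total", "Alabama", "Alaska/c")
raw_years  <- c("2012", "2013/b,c", "2014/c")

# Upper-case the state names and relabel the national row
states <- toupper(strip_footnote(raw_states))
states[states == "U.S. TOTAL"] <- "TOTAL"

years <- strip_footnote(raw_years)

print(states)
print(years)
```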

All three tables are in xlsx format, so we use the readxl package to import them into R. These tables hold the total number of prison admissions, releases, and imprisonment rates by year and by state. The states run down a column and the years run along a row, so we use the pivot_longer() function so that the final table has the state, the year, and the number of prisoners admitted or released (or the rate) as columns.
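
As a rough illustration of the reshaping step, the sketch below pivots a two-state, two-year excerpt of the admissions table into long format. The data frame name admissions_data is an assumption; the call for the releases and rate tables would differ only in the values_to name.

```r
library(tidyr)

# Small excerpt of the wide admissions table (years as columns)
admissions_data <- data.frame(
  State  = c("ALABAMA", "ALASKA"),
  `1978` = c(2572, 258),
  `1979` = c(2597, 311),
  check.names = FALSE  # keep the year column names as-is
)

# One row per state-year combination
admissions_data_long <- pivot_longer(
  admissions_data,
  cols      = -State,
  names_to  = "Year",
  values_to = "Admissions"
)

print(admissions_data_long)
```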

Now that all three data frames are in the desired long format, we apply an inner join on admissions_data_long and releases_data_long by State and Year, and then a second inner join of the result with rate_data_long. This gives us one clean table with five columns (State, Year, Admissions, Releases, and Rate) and 1,512 rows.

State Year Admissions Releases Rate
ALABAMA 1978 2572 2726 144.21
ALABAMA 1979 2597 2744 141.21
ALABAMA 1980 3766 3207 163.30
ALABAMA 1981 4025 2908 183.72
ALABAMA 1982 4425 2830 218.61
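
The two joins can be sketched with dplyr as below, using the first two Alabama rows from the table above. The long data frames are rebuilt by hand here so that the example is self-contained; in the project they come from the pivoting step.

```r
library(dplyr)

admissions_data_long <- data.frame(
  State = "ALABAMA", Year = c("1978", "1979"), Admissions = c(2572, 2597)
)
releases_data_long <- data.frame(
  State = "ALABAMA", Year = c("1978", "1979"), Releases = c(2726, 2744)
)
rate_data_long <- data.frame(
  State = "ALABAMA", Year = c("1978", "1979"), Rate = c(144.21, 141.21)
)

# Keep only state-year pairs present in all three tables
total_imprisonment_data <- admissions_data_long %>%
  inner_join(releases_data_long, by = c("State", "Year")) %>%
  inner_join(rate_data_long, by = c("State", "Year"))

print(total_imprisonment_data)
```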

3.3 Missing value analysis

3.3.1 Crime

We visualize the missing data in the transformed crime data table.

There are no missing values in Year, State, Area, and Population. There are some missing values in crime categories other than arson, which we verified are also blank in the raw data. We will fill these with 0. Arson is blank or missing in the raw data for every year, so we will drop that column.
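
A minimal sketch of these two steps in base R follows. The toy values and the pattern of missing entries are invented for illustration, and only two offense columns are included.

```r
# Toy table with invented missing entries
crime <- data.frame(
  Year   = factor(c(1999, 1999)),
  State  = c("ALABAMA", "ALASKA"),
  Murder = c(345L, NA),
  Rape   = c(NA, 162L),
  Arson  = c(NA, NA)
)

# Arson is never reported, so drop the column entirely
crime$Arson <- NULL

# Replace missing counts with 0 in the remaining offense columns
offense_cols <- setdiff(names(crime), c("Year", "State"))
crime[offense_cols] <- lapply(crime[offense_cols], function(col) {
  col[is.na(col)] <- 0L
  col
})

print(crime)
```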

3.3.2 Imprisonment

There are missing values in the Admissions, Releases, and Rate columns across states. However, they are few compared with the data that is available in the total_imprisonment_data data frame.

According to the bar chart, only 1.2% of the Releases and Rate data are missing. Admissions has the most missing values, at 1.39%. We also plotted the missing values by row in order to see if there is a pattern across different features. It seems that the missing values originate from a single row, indicating that they are related to a specific state.
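
The percentages behind such a bar chart can be computed directly, as in the sketch below. The toy data frame reuses the name total_imprisonment_data, but its values are invented, so the resulting percentages do not match the ones reported above.

```r
# Toy stand-in for the joined imprisonment table (values are invented)
total_imprisonment_data <- data.frame(
  State      = c("A", "A", "B", "B"),
  Year       = c("1978", "1979", "1978", "1979"),
  Admissions = c(10, NA, 5, 7),
  Releases   = c(9, 8, NA, 6),
  Rate       = c(1.5, 1.4, NA, 1.2)
)

# Share of missing values per column, in percent
pct_missing <- round(100 * colMeans(is.na(total_imprisonment_data)), 2)

# Number of missing values in each row, to spot row-level patterns
missing_per_row <- rowSums(is.na(total_imprisonment_data))

print(pct_missing)
print(missing_per_row)
```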

Now we highlight the missing values for specific states. There are some missing values for Alabama and New Hampshire in the Admissions data. There are also missing values across all variables for the District of Columbia from 2001-2019, since sentenced felons were the responsibility of the Federal Bureau of Prisons during these years. We will keep these entries, since we are simply visualizing information provided by the National Prisoner Statistics (NPS) program.