Chapter 3 Data
3.1 Sources
As described in the proposal, we look at three types of data over time:
3.1.1 Crime
Crime data is sourced from the FBI's Uniform Crime Reporting (UCR) program, a standard source of crime statistics for criminal justice researchers and students.
Each year's UCR data is accessed through the index of UCR publications. In the proposal, we had also planned to extract data for 1995 to 1998, but those years are published only as PDFs, which turned out to be difficult to extract. Data from 1999 onward is still enough to roughly compare crime under the Bush and Obama administrations. From 1999 to 2019, the crime data is available directly in xls format in Table 5 (Table 3 for 2016), broken down by state, offense type, and area type. The proposal also planned on downloading total crime rates separately, but we realized we could derive them by summing the state-level figures.
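Summing the state-level figures can be sketched in base R as below. This is a minimal illustration, not the project's code: the toy rows and their numbers are hypothetical, and it assumes each state has exactly one "State Total" row per year, so only those rows are summed and the area breakdowns are not double counted.

```r
# Toy rows in the shape of the extracted UCR table (numbers are hypothetical)
crime <- data.frame(
  Year     = c(1999, 1999, 1999, 2000),
  State    = c("ALABAMA", "ALABAMA", "ALASKA", "ALABAMA"),
  Area     = c("State Total", "Rural", "State Total", "State Total"),
  Violent  = c(100, 40, 50, 90),
  Property = c(1000, 300, 500, 950),
  stringsAsFactors = FALSE
)

# Keep only the state-wide rows so that area breakdowns are not double counted
state_totals <- subset(crime, Area == "State Total")

# Sum across states within each year to get national totals
national <- aggregate(cbind(Violent, Property) ~ Year, data = state_totals, FUN = sum)
national
```

Summing Population the same way would allow a national rate per 100,000 residents to be derived from these totals.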
3.1.2 Imprisonment
Imprisonment data is sourced from the National Prisoner Statistics (NPS) program; the Bureau of Justice Statistics (BJS) compiles NPS data as quick tables. We use the total number of prison admissions and the total number of prison releases from 1978 to 2019. The proposal only included these two datasets, but we are also exploring the imprisonment rate of sentenced prisoners from 1978 to 2019. This rate is the number of prisoners under state or federal jurisdiction with a sentence of more than 1 year per 100,000 U.S. residents.
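The rate definition above amounts to a simple scaling, sketched here as a hypothetical helper. BJS publishes the rate directly, so we do not compute it ourselves; the inputs below are made-up numbers used only to show the per-100,000 scaling.

```r
# Imprisonment rate: sentenced prisoners (sentence > 1 year) per 100,000 residents
imprisonment_rate <- function(sentenced_prisoners, residents) {
  100000 * sentenced_prisoners / residents
}

# Hypothetical inputs, not taken from the NPS tables
imprisonment_rate(4500, 1000000)
```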
3.2 Cleaning / transformation
3.2.1 Crime
The raw crime data for every year is stored as formatted Excel sheets without a clean table structure, and the layout changes across years, so we use separate extraction scripts:
- Data from 1999-2002 share a similar format and are extracted using data_collection/1999-2002.R.
- Data from 2003 and 2004 have their own peculiar layouts and are extracted using data_collection/2003.R and data_collection/2004.R.
- Data from 2013-2016 share a similar format and are extracted using data_collection/2013-2016.R.
- The remaining years are extracted using data_collection/2005-2012, 2017-2019.R.
The xls links for each year are saved in metadata/crime_data_links.csv so that we don’t have to hardcode URLs.
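The year-to-script mapping above can be captured in a small lookup. The helper script_for_year() is hypothetical and not one of the project's scripts; it only encodes the year ranges listed above and makes no assumption about the columns of metadata/crime_data_links.csv.

```r
# Hypothetical helper: return the extraction script that handles a given UCR year
script_for_year <- function(year) {
  if (year >= 1999 && year <= 2002) {
    "data_collection/1999-2002.R"
  } else if (year == 2003) {
    "data_collection/2003.R"
  } else if (year == 2004) {
    "data_collection/2004.R"
  } else if (year >= 2013 && year <= 2016) {
    "data_collection/2013-2016.R"
  } else if ((year >= 2005 && year <= 2012) || (year >= 2017 && year <= 2019)) {
    "data_collection/2005-2012, 2017-2019.R"
  } else {
    stop("No extraction script for year ", year)
  }
}

# Group the years 1999-2019 by the script that handles them
scripts <- vapply(1999:2019, script_for_year, character(1))
split(1999:2019, scripts)
```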
Year | State | Area | Population | Violent | Property | Murder | Rape | Robbery | Assault | Burglary | Theft | Motor | Arson |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1999 | ALABAMA | Metropolitan Statistical Area | 2960883 | 15835 | 134045 | 273 | 1128 | 4602 | 9832 | 29432 | 93614 | 10999 | NA |
1999 | ALABAMA | Cities outside metropolitan areas | 597141 | 4017 | 27620 | 41 | 230 | 576 | 3170 | 5595 | 20631 | 1394 | NA |
1999 | ALABAMA | Rural | 811976 | 1569 | 9733 | 31 | 155 | 119 | 1264 | 3621 | 5371 | 741 | NA |
1999 | ALABAMA | State Total | 4370000 | 21421 | 171398 | 345 | 1513 | 5297 | 14266 | 38648 | 119616 | 13134 | NA |
1999 | ALASKA | Metropolitan Statistical Area | 257762 | 1685 | 11265 | 19 | 162 | 398 | 1106 | 1543 | 8471 | 1251 | NA |
Some state names were read with extra whitespace or a comma, so we clean those up.
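One way to do this cleanup in base R is sketched below. The noisy input values are hypothetical, and the sketch assumes commas never belong in a state name, so it removes all of them before trimming whitespace.

```r
# Hypothetical raw values showing the kind of noise described above
states <- c("ALABAMA", " ALASKA", "ARIZONA ", "ARKANSAS,", "NEW  YORK")

clean_state <- function(x) {
  x <- gsub(",", "", x, fixed = TRUE)  # drop any commas
  x <- gsub("\\s+", " ", x)            # collapse runs of whitespace into one space
  trimws(x)                            # remove leading/trailing whitespace
}

clean_state(states)
```

On the extracted table this would be applied to the State column, e.g. `crime$State <- clean_state(crime$State)`, where `crime` is a hypothetical name for that data frame.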
Year | State | Area | Population | Violent | Property | Murder | Rape | Robbery | Assault | Burglary | Theft | Motor | Arson |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1999 | ALABAMA | Metropolitan Statistical Area | 2960883 | 15835 | 134045 | 273 | 1128 | 4602 | 9832 | 29432 | 93614 | 10999 | NA |
1999 | ALABAMA | Cities outside metropolitan areas | 597141 | 4017 | 27620 | 41 | 230 | 576 | 3170 | 5595 | 20631 | 1394 | NA |
1999 | ALABAMA | Rural | 811976 | 1569 | 9733 | 31 | 155 | 119 | 1264 | 3621 | 5371 | 741 | NA |
1999 | ALABAMA | State Total | 4370000 | 21421 | 171398 | 345 | 1513 | 5297 | 14266 | 38648 | 119616 | 13134 | NA |
1999 | ALASKA | Metropolitan Statistical Area | 257762 | 1685 | 11265 | 19 | 162 | 398 | 1106 | 1543 | 8471 | 1251 | NA |
According to the 2003 crime report summary, the UCR started referring to rural counties as nonmetropolitan counties that year, so we rename the “Rural” area in the earlier years to “Nonmetropolitan counties” for one-to-one correspondence. For the District of Columbia, the report labels the district-wide crime numbers as “Total” instead of “State Total”, since DC is not technically a state; we relabel these as “State Total” for the same reason.
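The two relabelings can be sketched in base R as follows. The toy rows are hypothetical, the DC state label "DISTRICT OF COLUMBIA" is an assumption, and the sketch assumes that only DC's row uses the plain “Total” label.

```r
# Toy rows mimicking the extracted table (labels only; values omitted)
crime <- data.frame(
  Year  = c(1999, 1999, 1999, 2003),
  State = c("ALABAMA", "ALABAMA", "DISTRICT OF COLUMBIA", "ALABAMA"),
  Area  = c("Rural", "State Total", "Total", "Nonmetropolitan counties"),
  stringsAsFactors = FALSE
)

# Before 2003 the UCR called these areas "Rural"; use the later label instead
pre_2003_rural <- crime$Year < 2003 & crime$Area == "Rural"
crime$Area[pre_2003_rural] <- "Nonmetropolitan counties"

# DC's district-wide row is labelled "Total"; align it with the states' "State Total"
crime$Area[crime$Area == "Total"] <- "State Total"
crime$Area
```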
Year | State | Area | Population | Violent | Property | Murder | Rape | Robbery | Assault | Burglary | Theft | Motor | Arson |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1999 | ALABAMA | Metropolitan Statistical Area | 2960883 | 15835 | 134045 | 273 | 1128 | 4602 | 9832 | 29432 | 93614 | 10999 | NA |
1999 | ALABAMA | Cities outside metropolitan areas | 597141 | 4017 | 27620 | 41 | 230 | 576 | 3170 | 5595 | 20631 | 1394 | NA |
1999 | ALABAMA | Nonmetropolitan counties | 811976 | 1569 | 9733 | 31 | 155 | 119 | 1264 | 3621 | 5371 | 741 | NA |
1999 | ALABAMA | State Total | 4370000 | 21421 | 171398 | 345 | 1513 | 5297 | 14266 | 38648 | 119616 | 13134 | NA |
1999 | ALASKA | Metropolitan Statistical Area | 257762 | 1685 | 11265 | 19 | 162 | 398 | 1106 | 1543 | 8471 | 1251 | NA |
Finally, we convert the year to a factor and the remaining numeric columns to integers.
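A minimal base R sketch of these conversions is shown below, using two rows from the table above and only a subset of the count columns. It assumes the counts were already read as numbers; `as.integer()` would turn text such as "4,370,000" into NA with a warning.

```r
crime <- data.frame(
  Year       = c(1999, 1999),
  State      = c("ALABAMA", "ALASKA"),
  Area       = c("State Total", "Metropolitan Statistical Area"),
  Population = c(4370000, 257762),
  Violent    = c(21421, 1685),
  stringsAsFactors = FALSE
)

# Year becomes a categorical variable
crime$Year <- factor(crime$Year)

# Every column other than the identifiers holds counts, stored as integers
count_cols <- setdiff(names(crime), c("Year", "State", "Area"))
crime[count_cols] <- lapply(crime[count_cols], as.integer)
str(crime)
```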
3.2.2 Imprisonment
We start preparing the imprisonment data by reading in the files with the read_excel function. We then drop columns that are not needed for our analysis, such as “Jurisdiction”, and rename the columns we keep to State, Year, Admissions, Releases, and Rate.
- Admissions - number of prisoners admitted into prison
- Releases - number of prisoners released from prison
- Rate - imprisonment rate of sentenced prisoners per 100,000 U.S. residents
State | 1978 | 1979 | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 | 1986 | 1987 | 1988 | 1989 | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013/b | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U.S. total | 152039 | 161280 | 171956 | 199943 | 218087 | 237925 | 234293 | 258514 | 291903 | 326228 | 365724 | 447388 | 460769 | 466285 | 480676 | 500335 | 523577 | 549313 | 542863 | 572281 | 603510 | 611676 | 654534 | 638978 | 660576 | 686471 | 697066 | 730141 | 747031 | 742875 | 738649 | 728686 | 703798 | 671551 | 608442 | 629962 | 626096 | 608318 | 606000 | 606596 | 596384 | 576956 |
Alabama | 2572 | 2597 | 3766 | 4025 | 4425 | 4605 | 4701 | 4370 | 3962 | 4543 | 5101 | 6510 | 7031 | 7683 | 7967 | 8454 | 8287 | 8692 | 9465 | 9301 | 7492 | NA | 6296 | 7428 | 7033 | 9524 | 8278 | 9723 | 10039 | 10708 | 11037 | 13093 | 11881 | 11387 | 11203 | 11265 | 10912 | 10451 | 10749 | 12170 | 13160 | 13267 |
Alaska/c | 258 | 311 | 459 | 461 | 541 | 711 | 727 | 875 | 1097 | 952 | 1026 | 1062 | 1389 | 1341 | 1483 | 2411 | NA | 1996 | 2336 | 2646 | 2605 | 2405 | 2427 | 2142 | 2142 | 2805 | NA | NA | NA | NA | NA | NA | 2650 | 3789 | 3906 | 3906 | 3846 | 4271 | 1804 | 1580 | 1765 | 1560 |
Arizona | 1620 | 1641 | 2082 | 2759 | 2910 | 3288 | 3386 | 3989 | 4515 | 5370 | 5304 | 6055 | 6518 | 7427 | 7351 | 8050 | 9218 | 8662 | 9019 | 9172 | 10108 | 9021 | 9560 | 10000 | 11468 | 11957 | 11343 | 12440 | 13954 | 14046 | 14867 | 14526 | 13249 | 13030 | 12970 | 13538 | 14439 | 14670 | 13663 | 13423 | 13753 | 13440 |
Arkansas | 1958 | 2189 | 2311 | 2419 | 2323 | 2173 | 2179 | 2301 | 2280 | 3152 | 2831 | 3517 | 4255 | 4553 | 4580 | 3818 | 4345 | 5248 | 5158 | 5705 | 6189 | 6045 | 6941 | 6977 | 7080 | 7132 | 8035 | 8053 | 5992 | 6651 | 7017 | 7383 | 7603 | 7059 | 5782 | 8987 | 9435 | 9351 | 9911 | 8971 | 9572 | 10268 |
State | 1978 | 1979 | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 | 1986 | 1987 | 1988 | 1989 | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013/b,c | 2014/c | 2015/c | 2016 | 2017 | 2018 | 2019 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U.S. total | 142665 | 154958 | 158331 | 163085 | 175607 | 213198 | 209655 | 220485 | 248909 | 290301 | 320460 | 369032 | 405374 | 421687 | 430198 | 436684 | 437777 | 477654 | 492069 | 517432 | 549634 | 574624 | 635094 | 628626 | 633947 | 656574 | 672202 | 701632 | 709874 | 721161 | 735651 | 729749 | 708677 | 691072 | 636716 | 623990 | 636346 | 641027 | 626019 | 622377 | 614851 | 608026 |
Alabama | 2726 | 2744 | 3207 | 2908 | 2830 | 3225 | 3861 | 3694 | 3197 | 3480 | 5317 | 5344 | 5308 | 6645 | 7404 | 7244 | 7371 | 7618 | 8432 | 8682 | 7016 | 8194 | 7136 | 7905 | 7472 | 10167 | 9156 | 10472 | 11283 | 11079 | 11556 | 12231 | 12070 | 11052 | 11253 | 11488 | 11585 | 11446 | 12711 | 13624 | 14015 | 12251 |
Alaska/d | 235 | 216 | 268 | 271 | 358 | 505 | 501 | 620 | 960 | 892 | 936 | 1002 | 1442 | 1348 | 1379 | 1824 | NA | 1894 | 2043 | 2393 | 2615 | 2504 | 2599 | 2041 | 2041 | 2736 | 2726 | 2702 | 2719 | 3286 | 3741 | 3196 | 3068 | 3599 | 3774 | 3774 | 3774 | 4085 | 2159 | 1941 | 1735 | 1717 |
Arizona | 1352 | 1638 | 1469 | 1874 | 2027 | 2243 | 2506 | 3354 | 3647 | 3795 | 4219 | 4869 | 5501 | 6312 | 6557 | 6834 | 7402 | 7430 | 7837 | 8386 | 8559 | 8982 | 9100 | 9053 | 10056 | 10391 | 10190 | 11932 | 12209 | 12560 | 13192 | 13854 | 13500 | 13149 | 13000 | 12931 | 13513 | 14092 | 13857 | 14075 | 13683 | 13034 |
Arkansas | 1878 | 1872 | 2366 | 2045 | 1724 | 1893 | 1953 | 2168 | 2189 | 2411 | 2755 | 3174 | 4090 | 4085 | 4078 | 4007 | 4362 | 4465 | 4690 | 4719 | 5524 | 5403 | 6308 | 6613 | 7640 | 7120 | 7457 | 9093 | 5668 | 6045 | 6610 | 6990 | 6664 | 7252 | 6298 | 6541 | 8812 | 9702 | 10370 | 8443 | 9805 | 9768 |
State | 1978 | 1979 | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 | 1986 | 1987 | 1988 | 1989 | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U.S. total | 130.81 | 133.07 | 138.31 | 153.34 | 169.89 | 178.55 | 187.14 | 200.98 | 216.39 | 230.35 | 245.71 | 274.40 | 295.14 | 311.34 | 329.91 | 359.69 | 388.62 | 410.90 | 426.79 | 443.80 | 463.00 | 475.90 | 470.26 | 469.55 | 477.35 | 482.67 | 486.88 | 492.19 | 501.24 | 505.74 | 506.14 | 503.75 | 500.00 | 491.9319 | 479.7198 | 477.3716 | 470.7563 | 458.5564 | 449.6894 | 441.3848 | 431.5106 | 419.4105 |
Alabama | 144.21 | 141.21 | 163.30 | 183.72 | 218.61 | 245.06 | 259.27 | 270.58 | 288.21 | 313.85 | 307.09 | 336.83 | 379.38 | 400.08 | 407.75 | 431.14 | 447.72 | 468.49 | 487.36 | 496.34 | 504.00 | 544.20 | 584.79 | 585.52 | 615.60 | 607.31 | 559.75 | 594.12 | 598.69 | 616.77 | 634.83 | 652.47 | 642.35 | 649.5930 | 650.4946 | 647.4109 | 633.7387 | 612.4664 | 570.9180 | 485.9035 | 418.1007 | 419.2635 |
Alaska/d | 121.88 | 131.89 | 141.10 | 170.38 | 193.95 | 219.49 | 251.70 | 287.33 | 306.10 | 327.64 | 343.56 | 348.72 | 334.54 | 322.70 | 330.20 | 450.93 | 320.57 | 337.85 | 383.69 | 419.43 | 409.88 | 372.13 | 339.12 | 346.75 | 400.97 | 403.91 | 397.84 | 415.39 | 460.05 | 450.24 | 431.03 | 359.07 | 388.58 | 398.6128 | 405.2312 | 363.8283 | 281.1676 | 305.5893 | 281.9837 | 257.8795 | 264.3555 | 244.1103 |
Arizona | 137.02 | 141.63 | 159.37 | 185.01 | 209.28 | 227.12 | 249.29 | 259.87 | 273.19 | 307.18 | 327.51 | 351.34 | 374.07 | 391.78 | 404.78 | 422.09 | 447.69 | 457.78 | 469.22 | 471.88 | 481.23 | 476.61 | 491.84 | 498.89 | 513.71 | 531.58 | 540.09 | 525.72 | 541.93 | 557.82 | 572.18 | 584.15 | 599.13 | 589.2271 | 582.7972 | 584.3090 | 592.4293 | 595.6010 | 586.5044 | 566.1356 | 559.8836 | 557.8603 |
Arkansas | 115.04 | 131.33 | 127.18 | 145.13 | 170.95 | 184.15 | 193.21 | 198.15 | 201.59 | 232.29 | 235.59 | 278.98 | 308.67 | 324.03 | 339.20 | 325.04 | 354.09 | 336.04 | 349.60 | 381.99 | 402.13 | 427.47 | 442.48 | 464.35 | 480.60 | 486.50 | 497.71 | 482.06 | 487.12 | 503.48 | 511.20 | 524.11 | 552.68 | 544.6450 | 494.2330 | 579.0428 | 599.2124 | 591.7824 | 583.0859 | 599.1407 | 590.3151 | 586.0233 |
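Next, we clean the State and Year labels by removing extra characters, such as the footnote markers “/c” and “/b,c”, that carry no meaning in our visualizations. We also convert state names to upper case and relabel “U.S. total” as “TOTAL”.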
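This cleanup can be sketched in base R. The helper clean_label() is hypothetical, and it assumes, as in the raw tables, that every footnote marker starts with "/" and runs to the end of the label. The input values are taken from the tables' State column and year headers.

```r
# Hypothetical helper: drop a footnote suffix such as "/c" or "/b,c"
clean_label <- function(x) sub("/.*$", "", x)

raw_states <- c("U.S. total", "Alabama", "Alaska/c", "Alaska/d", "Arizona")
raw_years  <- c("2012", "2013/b", "2013/b,c", "2014/c", "2016")

states <- toupper(clean_label(raw_states))
states[states == "U.S. TOTAL"] <- "TOTAL"
years <- clean_label(raw_years)

states
years
```

On a wide table, the year headers would be cleaned with `names(df) <- clean_label(names(df))`, where `df` is a hypothetical name for one of the imported data frames.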
State | 1978 | 1979 | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 | 1986 | 1987 | 1988 | 1989 | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TOTAL | 152039 | 161280 | 171956 | 199943 | 218087 | 237925 | 234293 | 258514 | 291903 | 326228 | 365724 | 447388 | 460769 | 466285 | 480676 | 500335 | 523577 | 549313 | 542863 | 572281 | 603510 | 611676 | 654534 | 638978 | 660576 | 686471 | 697066 | 730141 | 747031 | 742875 | 738649 | 728686 | 703798 | 671551 | 608442 | 629962 | 626096 | 608318 | 606000 | 606596 | 596384 | 576956 |
ALABAMA | 2572 | 2597 | 3766 | 4025 | 4425 | 4605 | 4701 | 4370 | 3962 | 4543 | 5101 | 6510 | 7031 | 7683 | 7967 | 8454 | 8287 | 8692 | 9465 | 9301 | 7492 | NA | 6296 | 7428 | 7033 | 9524 | 8278 | 9723 | 10039 | 10708 | 11037 | 13093 | 11881 | 11387 | 11203 | 11265 | 10912 | 10451 | 10749 | 12170 | 13160 | 13267 |
ALASKA | 258 | 311 | 459 | 461 | 541 | 711 | 727 | 875 | 1097 | 952 | 1026 | 1062 | 1389 | 1341 | 1483 | 2411 | NA | 1996 | 2336 | 2646 | 2605 | 2405 | 2427 | 2142 | 2142 | 2805 | NA | NA | NA | NA | NA | NA | 2650 | 3789 | 3906 | 3906 | 3846 | 4271 | 1804 | 1580 | 1765 | 1560 |
ARIZONA | 1620 | 1641 | 2082 | 2759 | 2910 | 3288 | 3386 | 3989 | 4515 | 5370 | 5304 | 6055 | 6518 | 7427 | 7351 | 8050 | 9218 | 8662 | 9019 | 9172 | 10108 | 9021 | 9560 | 10000 | 11468 | 11957 | 11343 | 12440 | 13954 | 14046 | 14867 | 14526 | 13249 | 13030 | 12970 | 13538 | 14439 | 14670 | 13663 | 13423 | 13753 | 13440 |
ARKANSAS | 1958 | 2189 | 2311 | 2419 | 2323 | 2173 | 2179 | 2301 | 2280 | 3152 | 2831 | 3517 | 4255 | 4553 | 4580 | 3818 | 4345 | 5248 | 5158 | 5705 | 6189 | 6045 | 6941 | 6977 | 7080 | 7132 | 8035 | 8053 | 5992 | 6651 | 7017 | 7383 | 7603 | 7059 | 5782 | 8987 | 9435 | 9351 | 9911 | 8971 | 9572 | 10268 |
State | 1978 | 1979 | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 | 1986 | 1987 | 1988 | 1989 | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TOTAL | 142665 | 154958 | 158331 | 163085 | 175607 | 213198 | 209655 | 220485 | 248909 | 290301 | 320460 | 369032 | 405374 | 421687 | 430198 | 436684 | 437777 | 477654 | 492069 | 517432 | 549634 | 574624 | 635094 | 628626 | 633947 | 656574 | 672202 | 701632 | 709874 | 721161 | 735651 | 729749 | 708677 | 691072 | 636716 | 623990 | 636346 | 641027 | 626019 | 622377 | 614851 | 608026 |
ALABAMA | 2726 | 2744 | 3207 | 2908 | 2830 | 3225 | 3861 | 3694 | 3197 | 3480 | 5317 | 5344 | 5308 | 6645 | 7404 | 7244 | 7371 | 7618 | 8432 | 8682 | 7016 | 8194 | 7136 | 7905 | 7472 | 10167 | 9156 | 10472 | 11283 | 11079 | 11556 | 12231 | 12070 | 11052 | 11253 | 11488 | 11585 | 11446 | 12711 | 13624 | 14015 | 12251 |
ALASKA | 235 | 216 | 268 | 271 | 358 | 505 | 501 | 620 | 960 | 892 | 936 | 1002 | 1442 | 1348 | 1379 | 1824 | NA | 1894 | 2043 | 2393 | 2615 | 2504 | 2599 | 2041 | 2041 | 2736 | 2726 | 2702 | 2719 | 3286 | 3741 | 3196 | 3068 | 3599 | 3774 | 3774 | 3774 | 4085 | 2159 | 1941 | 1735 | 1717 |
ARIZONA | 1352 | 1638 | 1469 | 1874 | 2027 | 2243 | 2506 | 3354 | 3647 | 3795 | 4219 | 4869 | 5501 | 6312 | 6557 | 6834 | 7402 | 7430 | 7837 | 8386 | 8559 | 8982 | 9100 | 9053 | 10056 | 10391 | 10190 | 11932 | 12209 | 12560 | 13192 | 13854 | 13500 | 13149 | 13000 | 12931 | 13513 | 14092 | 13857 | 14075 | 13683 | 13034 |
ARKANSAS | 1878 | 1872 | 2366 | 2045 | 1724 | 1893 | 1953 | 2168 | 2189 | 2411 | 2755 | 3174 | 4090 | 4085 | 4078 | 4007 | 4362 | 4465 | 4690 | 4719 | 5524 | 5403 | 6308 | 6613 | 7640 | 7120 | 7457 | 9093 | 5668 | 6045 | 6610 | 6990 | 6664 | 7252 | 6298 | 6541 | 8812 | 9702 | 10370 | 8443 | 9805 | 9768 |
State | 1978 | 1979 | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 | 1986 | 1987 | 1988 | 1989 | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TOTAL | 130.81 | 133.07 | 138.31 | 153.34 | 169.89 | 178.55 | 187.14 | 200.98 | 216.39 | 230.35 | 245.71 | 274.40 | 295.14 | 311.34 | 329.91 | 359.69 | 388.62 | 410.90 | 426.79 | 443.80 | 463.00 | 475.90 | 470.26 | 469.55 | 477.35 | 482.67 | 486.88 | 492.19 | 501.24 | 505.74 | 506.14 | 503.75 | 500.00 | 491.9319 | 479.7198 | 477.3716 | 470.7563 | 458.5564 | 449.6894 | 441.3848 | 431.5106 | 419.4105 |
ALABAMA | 144.21 | 141.21 | 163.30 | 183.72 | 218.61 | 245.06 | 259.27 | 270.58 | 288.21 | 313.85 | 307.09 | 336.83 | 379.38 | 400.08 | 407.75 | 431.14 | 447.72 | 468.49 | 487.36 | 496.34 | 504.00 | 544.20 | 584.79 | 585.52 | 615.60 | 607.31 | 559.75 | 594.12 | 598.69 | 616.77 | 634.83 | 652.47 | 642.35 | 649.5930 | 650.4946 | 647.4109 | 633.7387 | 612.4664 | 570.9180 | 485.9035 | 418.1007 | 419.2635 |
ALASKA | 121.88 | 131.89 | 141.10 | 170.38 | 193.95 | 219.49 | 251.70 | 287.33 | 306.10 | 327.64 | 343.56 | 348.72 | 334.54 | 322.70 | 330.20 | 450.93 | 320.57 | 337.85 | 383.69 | 419.43 | 409.88 | 372.13 | 339.12 | 346.75 | 400.97 | 403.91 | 397.84 | 415.39 | 460.05 | 450.24 | 431.03 | 359.07 | 388.58 | 398.6128 | 405.2312 | 363.8283 | 281.1676 | 305.5893 | 281.9837 | 257.8795 | 264.3555 | 244.1103 |
ARIZONA | 137.02 | 141.63 | 159.37 | 185.01 | 209.28 | 227.12 | 249.29 | 259.87 | 273.19 | 307.18 | 327.51 | 351.34 | 374.07 | 391.78 | 404.78 | 422.09 | 447.69 | 457.78 | 469.22 | 471.88 | 481.23 | 476.61 | 491.84 | 498.89 | 513.71 | 531.58 | 540.09 | 525.72 | 541.93 | 557.82 | 572.18 | 584.15 | 599.13 | 589.2271 | 582.7972 | 584.3090 | 592.4293 | 595.6010 | 586.5044 | 566.1356 | 559.8836 | 557.8603 |
ARKANSAS | 115.04 | 131.33 | 127.18 | 145.13 | 170.95 | 184.15 | 193.21 | 198.15 | 201.59 | 232.29 | 235.59 | 278.98 | 308.67 | 324.03 | 339.20 | 325.04 | 354.09 | 336.04 | 349.60 | 381.99 | 402.13 | 427.47 | 442.48 | 464.35 | 480.60 | 486.50 | 497.71 | 482.06 | 487.12 | 503.48 | 511.20 | 524.11 | 552.68 | 544.6450 | 494.2330 | 579.0428 | 599.2124 | 591.7824 | 583.0859 | 599.1407 | 590.3151 | 586.0233 |
All three tables are in xlsx format, so we use the readxl package to import them into R. They contain the total number of prison admissions, the total number of releases, and the imprisonment rates by year and by state. The states run down the first column and the years run across the columns, so we use the pivot_longer() function to reshape each table so that state, year, and the number of prisoners admitted or released (or the rate) become columns.
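The reshaping step can be sketched with tidyr on a two-state, two-year slice of the cleaned admissions table. This is a minimal illustration that assumes the tidyr package is installed. The year headers come out of pivot_longer() as text, so they are converted to integers afterwards.

```r
library(tidyr)

# Slice of the cleaned wide admissions table (values from the table above)
admissions_wide <- data.frame(
  State  = c("ALABAMA", "ALASKA"),
  `1978` = c(2572, 258),
  `1979` = c(2597, 311),
  check.names = FALSE
)

# One row per State-Year pair instead of one column per year
admissions_data_long <- pivot_longer(
  admissions_wide,
  cols = -State,
  names_to = "Year",
  values_to = "Admissions"
)
admissions_data_long$Year <- as.integer(admissions_data_long$Year)
admissions_data_long
```

The releases and rate tables would be reshaped the same way, with `values_to` set to "Releases" and "Rate".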
With all three data frames in long format, we apply an inner join to admissions_data_long and releases_data_long by State and Year, and then another inner join between the result and rate_data_long. This gives one clean table with five columns (State, Year, Admissions, Releases, and Rate) and 1,512 rows.
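The two joins can be sketched with dplyr, assuming the package is installed. The three small long tables below use the first Alabama rows shown in this section. An inner join keeps only State-Year pairs present in both inputs, while NA values inside matched rows are carried through unchanged.

```r
library(dplyr)

admissions_data_long <- data.frame(State = "ALABAMA", Year = 1978:1980, Admissions = c(2572, 2597, 3766))
releases_data_long   <- data.frame(State = "ALABAMA", Year = 1978:1980, Releases = c(2726, 2744, 3207))
rate_data_long       <- data.frame(State = "ALABAMA", Year = 1978:1980, Rate = c(144.21, 141.21, 163.30))

# Join admissions with releases, then add the rate, matching on State and Year
total_imprisonment_data <- admissions_data_long %>%
  inner_join(releases_data_long, by = c("State", "Year")) %>%
  inner_join(rate_data_long, by = c("State", "Year"))

total_imprisonment_data
```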
State | Year | Admissions | Releases | Rate |
---|---|---|---|---|
ALABAMA | 1978 | 2572 | 2726 | 144.21 |
ALABAMA | 1979 | 2597 | 2744 | 141.21 |
ALABAMA | 1980 | 3766 | 3207 | 163.30 |
ALABAMA | 1981 | 4025 | 2908 | 183.72 |
ALABAMA | 1982 | 4425 | 2830 | 218.61 |
3.3 Missing value analysis
3.3.1 Crime
We visualize the missing values in the transformed crime data table.
There are no missing values in Year, State, Area, and Population. Some crime categories other than arson do have missing values; we checked that these cells are blank in the raw data, so we fill them with 0. Arson is blank or missing in the raw data for every year, so we drop that column.
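Both steps can be sketched in base R. The toy rows and their counts are hypothetical. The sketch drops Arson before filling, so the all-missing column is not turned into zeros, and it treats every column except Year, State, Area, and Population as an offense count.

```r
# Toy rows with blanks in some offense columns (counts are hypothetical)
crime <- data.frame(
  Year       = factor(c(1999, 1999)),
  State      = c("ALABAMA", "ALASKA"),
  Area       = c("State Total", "State Total"),
  Population = c(1000L, 2000L),
  Murder     = c(10L, NA),
  Rape       = c(NA, 5L),
  Arson      = c(NA_integer_, NA_integer_),
  stringsAsFactors = FALSE
)

# Arson is blank in every year of the raw data, so drop the column entirely
crime$Arson <- NULL

# Remaining blanks in the offense columns are treated as zero counts
offense_cols <- setdiff(names(crime), c("Year", "State", "Area", "Population"))
crime[offense_cols] <- lapply(crime[offense_cols], function(x) {
  x[is.na(x)] <- 0L
  x
})
crime
```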
3.3.2 Imprisonment
Admissions, Releases, and Rate all contain some missing values across states. However, the number of missing values is small compared with the data available in the total_imprisonment_data data frame.
According to the bar chart, only 1.2% of the Releases and Rate values are missing, while Admissions has the most, at 1.39%. We also plotted the missing-value patterns by row to see whether values tend to be missing together across features. The missing values appear to come from a single pattern, which suggests they are tied to a specific state.
Next, we highlight the missing values for specific states. Alabama and New Hampshire have some missing values in the Admissions data. The District of Columbia has missing values across all variables from 2001 to 2019, since sentenced felons there were the responsibility of the Federal Bureau of Prisons during these years. We keep these entries, since we are simply visualizing the information provided by the National Prisoner Statistics (NPS) program.
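The percentages and the list of affected states can be reproduced with a few base R calls. The helper pct_missing() is hypothetical. The toy table below contains four rows taken from the Alabama and Alaska columns of the imprisonment tables above, so its percentages are not the ones reported for the full 1,512-row table.

```r
# Four rows from the imprisonment tables, two of which contain missing values
toy <- data.frame(
  State      = c("ALABAMA", "ALABAMA", "ALASKA", "ALASKA"),
  Year       = c(1998L, 1999L, 1993L, 1994L),
  Admissions = c(7492, NA, 2411, NA),
  Releases   = c(7016, 8194, 1824, NA),
  Rate       = c(504.00, 544.20, 450.93, 320.57),
  stringsAsFactors = FALSE
)

# Percentage of missing values in each column
pct_missing <- function(df) round(100 * colMeans(is.na(df)), 2)
pct_missing(toy)

# States that have at least one incomplete row
states_with_missing <- unique(toy$State[!complete.cases(toy)])
states_with_missing
```

Applied to the joined table, `pct_missing(total_imprisonment_data)` would give the per-column percentages shown in the bar chart.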