Journal 3

My research question is: Do racial disparities in fatal police shootings persist after controlling for whether the individual was armed? 

In class we discussed how there is a 7-year age gap between White and Black victims in fatal police shootings. We also noted that the data cannot fully determine whether bias exists, because it does not directly measure bias. However, I think one way to approach the issue is to look for patterns that might suggest bias, even if they cannot prove it directly. That is why I am focusing on my question about racial disparities and armed status. 

The outcome variable is already fixed because the dataset only contains fatal police shootings. My main variable of interest is race, and my key control variable is whether the individual was armed or unarmed. I started by looking at the number of armed versus unarmed individuals in the dataset. From there, I planned to calculate percentages of armed versus unarmed across racial groups, which helps show whether disparities appear different when weapon status is considered. 
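A minimal sketch of how I might compute these percentages with pandas. The rows below are made-up toy data, and the column names `race` and `armed_group` are my own assumptions rather than confirmed names from the dataset.

```python
import pandas as pd

# Toy rows standing in for the real file; the column names "race" and
# "armed_group" are assumptions, not confirmed names from the dataset.
toy = pd.DataFrame({
    "race": ["W", "W", "W", "W", "B", "B", "B", "H", "H", "H"],
    "armed_group": ["Armed", "Armed", "Armed", "Unarmed",
                    "Armed", "Armed", "Unarmed",
                    "Armed", "Armed", "Unarmed"],
})

# Raw counts of armed versus unarmed individuals within each racial group.
counts = pd.crosstab(toy["race"], toy["armed_group"])

# normalize="index" divides each row by its own total, so each row
# becomes the armed/unarmed split within one racial group (in percent).
row_pct = pd.crosstab(toy["race"], toy["armed_group"], normalize="index") * 100

print(counts)
print(row_pct.round(1))
```

Row percentages (rather than column percentages) answer the question "within each racial group, what share was unarmed?", which is the comparison I care about here.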

To go deeper, I will use logistic regression in Python. The dependent variable will be whether someone was unarmed (1 = unarmed, 0 = armed), with race as the main predictor. Because every case in the dataset is a fatal shooting, armed status has to serve as the outcome of the model rather than as a separate control variable. This will allow me to see whether the likelihood of being unarmed differs significantly by race, which is how I can test whether racial disparities remain once armed status is taken into account. In addition, I plan to explore age patterns, since earlier discussion in class suggested there may be meaningful differences between White and Black victims in terms of age. 
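To make the model concrete, here is a rough sketch of a logistic regression with a 0/1 `unarmed` outcome and a single 0/1 race indicator. It uses plain NumPy and Newton's method so the mechanics are visible. The helper `fit_logit`, the variable names, and the toy numbers are all hypothetical and are not results from the real dataset.

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Fit a logistic regression by Newton's method.

    Returns (coefficients, standard errors).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))          # predicted P(unarmed)
        grad = X.T @ (y - p)                           # gradient of log-likelihood
        info = X.T @ (X * (p * (1.0 - p))[:, None])    # information matrix
        beta = beta + np.linalg.solve(info, grad)      # Newton update
    # Standard errors come from the inverse information matrix at the estimate.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se

# Made-up toy data: 20 White cases (2 unarmed) and 20 Black cases (4 unarmed).
is_black = np.array([0] * 20 + [1] * 20)
unarmed = np.array([1] * 2 + [0] * 18 + [1] * 4 + [0] * 16)

# Design matrix: a column of ones for the intercept plus the race indicator.
X = np.column_stack([np.ones(len(unarmed)), is_black])
beta, se = fit_logit(X, unarmed)

odds_ratio = np.exp(beta[1])   # odds of being unarmed, Black relative to White
z_score = beta[1] / se[1]      # Wald z statistic for the race coefficient

# With one binary predictor, the fitted odds ratio should agree with the
# odds ratio computed directly from the 2x2 table of counts.
table_or = (4 / 16) / (2 / 18)
print(round(odds_ratio, 3), round(table_or, 3), round(z_score, 2))
```

Other predictors such as age could be added as extra columns of `X`. In practice I would probably let a statistics library such as statsmodels fit the model, since it also reports p-values and confidence intervals.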

While setting this up, I installed Python and VS Code to run my analysis. I ran into technical challenges when trying to install pandas, the main library I plan on using. Pip was not being recognized correctly, which prevented me from completing the setup. I used ChatGPT to help me format my Python code and troubleshoot the installation error. Once I finish fixing this issue, I will be able to implement my analysis. 
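One workaround I can try is calling pip through the Python interpreter itself instead of relying on the bare `pip` command. This is only a sketch, and it assumes the problem is that `pip` is not on my PATH, which I have not confirmed.

```python
import subprocess
import sys

# The interpreter that is actually running this script.
print("Python executable:", sys.executable)

# Running pip as a module of that interpreter avoids depending on a
# separate `pip` command being found on PATH.
pip_cmd = [sys.executable, "-m", "pip", "--version"]
result = subprocess.run(pip_cmd, capture_output=True, text=True)
print(result.stdout or result.stderr)

# The matching install command; left commented out so that running this
# check does not change the environment by accident.
install_cmd = [sys.executable, "-m", "pip", "install", "pandas"]
# subprocess.run(install_cmd, check=True)
```

The same idea works directly in a terminal as `python -m pip install pandas`.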

At this stage, I am working backwards to get my environment set up properly so that I can install pandas and finish preparing the dataset. Once that is fixed, I will clean the armed variable into a simple “Armed vs. Unarmed” grouping, run descriptive statistics, and then build the logistic regression to directly test my research question. 
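A rough sketch of the cleaning step, assuming the weapon information sits in a text column. The column name `armed_with`, the raw labels, and the helper `to_armed_group` are all assumptions for illustration. Keeping a separate "Unknown" group is also my own choice, so that missing or undetermined cases are not silently counted as armed.

```python
import pandas as pd

# Hypothetical raw values; the column name "armed_with" and these labels
# are assumptions about how the file codes weapon status.
raw = pd.DataFrame({
    "armed_with": ["gun", "knife", "unarmed", None, "undetermined", "Unarmed "],
})

def to_armed_group(value):
    """Collapse a raw weapon label into Armed, Unarmed, or Unknown."""
    if pd.isna(value):
        return "Unknown"
    label = str(value).strip().lower()
    if label == "unarmed":
        return "Unarmed"
    if label in {"unknown", "undetermined", ""}:
        return "Unknown"
    return "Armed"

raw["armed_group"] = raw["armed_with"].map(to_armed_group)

# Drop the unknown cases, then build the 0/1 outcome for the regression.
analysis = raw[raw["armed_group"] != "Unknown"].copy()
analysis["unarmed"] = (analysis["armed_group"] == "Unarmed").astype(int)

print(raw["armed_group"].value_counts())
```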

 

Journal 2

Last week, I focused on conducting my analysis in Excel. This week, I expanded my work by exploring the same analysis across three different programs: R, Mathematica, and Python. I compared how each handled descriptive statistics for the variables in my dataset to get a sense of their functionality. After experimenting, I chose Python as my primary tool moving forward, since it feels the most effective for deeper analysis. 

To build my skills, I also began watching beginner tutorials on Python so I can apply more advanced techniques. Using descriptive statistics, I started comparing variables against one another, which helped me generate several research questions I want to explore further. I was also curious about the timing of incidents, such as days or seasons, but wasn’t yet sure how to approach that in the data. 
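One possible way to approach timing is to derive the weekday and season from each incident's date. This is a sketch on made-up dates. The column name `date` is an assumption, and the season mapping uses Northern Hemisphere meteorological seasons, which is just one way to define them.

```python
import pandas as pd

# Made-up dates; "date" is an assumed column name.
toy = pd.DataFrame({
    "date": ["2015-01-02", "2015-07-04", "2016-10-31", "2017-04-15"],
})
toy["date"] = pd.to_datetime(toy["date"])

# Day of the week and calendar month for each incident.
toy["weekday"] = toy["date"].dt.day_name()
toy["month"] = toy["date"].dt.month

# Meteorological seasons (Northern Hemisphere); this definition is a choice.
season_by_month = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}
toy["season"] = toy["month"].map(season_by_month)

print(toy["weekday"].value_counts())
print(toy["season"].value_counts())
```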

Questions 

  • Has the frequency of fatal police shootings changed from 2015 to recent years? 
  • Why do certain states stand out more than others despite differences in population size and density? For example, Arizona and Kansas counties have some of the highest rates despite being relatively small. 
  • In which locations were individuals armed or unarmed, and has that changed over time? I’m curious whether areas with more guns also had more shootings. 
  • Does mental illness status vary significantly by race or age group? 
  • Do racial disparities in fatal police shootings persist after controlling for whether the individual was armed? 
  • Are there differences in outcomes based on which police agency is involved? 

I’m still trying to figure out what the different agencies mean. 
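For the first question in the list above, a simple starting point would be counting incidents per calendar year. This is a sketch on made-up dates, and the column name `date` is an assumption.

```python
import pandas as pd

# Made-up dates standing in for the real incident dates.
toy = pd.DataFrame({
    "date": ["2015-01-02", "2015-06-30", "2016-03-15",
             "2016-11-01", "2016-12-25", "2017-07-04"],
})
toy["date"] = pd.to_datetime(toy["date"])

# Count incidents in each year, ordered chronologically.
per_year = toy["date"].dt.year.value_counts().sort_index()
print(per_year)
```

If the most recent year in the file is incomplete, its count would not be directly comparable to full years.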

 

Journal 1

The first step I took was analyzing each variable using descriptive statistics. I began with the variables I understood best: flee status, city, county, state, age, gender, race, mental illness status, and whether a body camera was turned on.

I noticed that in more than 8,000 out of 10,000 cases, the officer’s body camera was turned off.

When examining counties and states with the highest number of shootings, I was surprised that New York and its counties were not at the top. Instead, California, Florida, and Texas had the highest numbers. Given that these are large states, it makes sense that their totals are higher. In class, we discussed the possibility that the proximity and number of police stations could increase the likelihood of shootings. While New York does have many stations and one of the highest population densities in the country, its number of shootings was still relatively small compared to other states. 

Arizona and Kansas, although smaller states, ranked among the highest for police shootings.

In class, we also looked at Wikipedia and confirmed New York’s high population density, which raised further questions. Why do certain states stand out more than others despite differences in population size and density? To better understand this, I would like to examine state gun laws, the rates of gun ownership, and individuals who were killed while holding guns.

Looking at demographics, I noticed that most people shot were men in their late 20s to early 30s. This made me wonder whether social and economic challenges at that age, such as difficulty finding stable work, could play a role. Since the dataset includes names, it might also be possible to examine whether criminal history is a contributing factor.

I also explored the variable for “threat type.” I searched for definitions on the Washington Post site but could not find a clear explanation. I plan to do further research to better interpret these categories.

Another area I want to focus on is dates. I am curious whether shootings cluster around certain seasons or events, such as Halloween, or other times of the year.

Finally, I am interested in analyzing agency IDs. Identifying which agencies had the highest number of shootings could help highlight patterns and serve as a way to hold certain departments more accountable.