Project#1 – How does medical insurance coverage vary across different age groups in the U.S. marketplace?

Written by:

Viktoriia Malyshkina – Data Analytics with R

Health Insurance in the U.S.

The U.S. healthcare system includes the Affordable Care Act (ACA), enacted in 2010. The ACA aims to reduce the number of uninsured citizens and make healthcare more affordable through subsidies for low- and middle-income individuals and expanded Medicaid eligibility in participating states.

There are still millions of Americans without health insurance. Coverage gaps are often due to factors like high costs, lack of awareness, or living in states that did not expand Medicaid.

Key Points

The client is a healthcare consultancy firm specializing in market research for the U.S. health insurance sector.

The primary factors to investigate include age demographics, types of medical insurance plans available, enrollment rates across different age groups, and variables influencing these rates like income levels, employment status, health conditions, and regional differences. 

The analysis will require demographic data, insurance plan details, enrollment statistics, and socioeconomic indicators. 

Data Sources: American Community Survey Tables For Health Insurance Coverage.
This data is sourced from government databases such as the U.S. Census Bureau, the Centers for Medicare & Medicaid Services (CMS), and the National Center for Health Statistics. 

Data Overview

The data meets the ROCCC criteria (Reliable, Original, Comprehensive, Current, and Cited) based on its source. 

The ACS is a well-established and credible source of data. However, there can be potential biases or limitations in the survey methodology.

A good way to reduce the effect of biases in ACS data is cross-verification with other data sources:

  • Cross-check ACS data with other data sources like the decennial census, health records, or industry-specific surveys.
  • Use these comparisons to identify and understand any discrepancies or anomalies.
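
A minimal sketch of such a cross-check in R. All data frames, column names, and numbers below are hypothetical; they only illustrate the idea of flagging estimates that differ from a second source by more than the ACS margin of error.

```r
# Hypothetical ACS estimates with margins of error (illustrative values only)
acs <- data.frame(
  Age_Group     = c("Under 19 years", "19 – 64 years", "65 years and over"),
  Uninsured     = c(1000, 5000, 200),
  Uninsured_MOE = c(50, 120, 30)
)

# Hypothetical figures from a second source for the same age groups
other <- data.frame(
  Age_Group       = c("Under 19 years", "19 – 64 years", "65 years and over"),
  Uninsured_Other = c(1030, 5400, 210)
)

# Join on age group and flag gaps larger than the ACS margin of error
check <- merge(acs, other, by = "Age_Group")
check$Difference <- check$Uninsured_Other - check$Uninsured
check$Flag <- abs(check$Difference) > check$Uninsured_MOE

print(check[, c("Age_Group", "Difference", "Flag")])
```

Rows where Flag is TRUE are the ones worth a closer look; a real comparison would also need to account for the other source's own uncertainty.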

Since this is public data from a government source, licensing and privacy concerns are minimal. 

The data is organized in a tabular format with row and column headers.

Steps before analysis

R is a powerful tool for statistical analysis and data visualization. It has extensive libraries for data manipulation (like dplyr, tidyr) and visualization (like ggplot2). R is particularly well-suited for this task due to its capabilities in handling large datasets and performing complex statistical analyses.

Step #1 Loading the data and inspecting its structure, missing values, and data types.
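
As a rough illustration of this inspection step, the sketch below uses a small made-up data frame in place of the real sheet (in practice the table would come from the downloaded ACS file, for example via readxl). The column names and values are assumptions.

```r
# Hypothetical stand-in for the loaded sheet (values are invented)
raw <- data.frame(
  Age_Group = c("Under 19 years", "19 – 64 years", "65 years and over"),
  Medicaid  = c("1,200", "900", NA),   # numbers stored as text, one value missing
  Uninsured = c(300, 1500, 40),
  stringsAsFactors = FALSE
)

str(raw)                           # structure: column names, types, first values
col_types <- sapply(raw, class)    # data type of each column
na_counts <- colSums(is.na(raw))   # number of missing values per column

print(col_types)
print(na_counts)
```

A text-typed column that should hold counts, like Medicaid here, is a typical sign that the sheet needs cleaning before analysis.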

Step #2 Reloading the data after revising and inspecting the structure again. 

Step #3 The data has been successfully loaded, but it’s clear that the dataset is quite complex, with multiple columns for various statistics and their corresponding margins of error.

Step #4 The data was opened in a spreadsheet program to get a full view of all columns and headers. The data was cleaned and prepared for the final loading to R for analysis.
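
The cleanup for this project was done in a spreadsheet. For comparison, a rough R equivalent might look like the sketch below. The "_MOE" suffix, the column names, and the values are assumptions for illustration, not the actual ACS headers.

```r
library(dplyr)

# Hypothetical raw table: each estimate is paired with a margin-of-error column
raw <- data.frame(
  Age_Group     = c("Under 19 years", "65 years and over"),
  Medicaid      = c("1,200", "350"),
  Medicaid_MOE  = c("45", "20"),
  Uninsured     = c("300", "40"),
  Uninsured_MOE = c("25", "8"),
  stringsAsFactors = FALSE
)

clean <- raw %>%
  select(-ends_with("_MOE")) %>%                                # keep the estimates only
  mutate(across(-Age_Group, ~ as.numeric(gsub(",", "", .x))))   # "1,200" becomes 1200

str(clean)
```

Doing this step in code rather than by hand makes the cleanup repeatable when a newer ACS release is downloaded.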

The Code

R Console > New Project

# Packages: install once, then load
install.packages("readxl"); library(readxl)   # Read Excel files
install.packages("dplyr"); library(dplyr)     # Data manipulation and transformation
library(ggplot2)                              # Charts; assumes ggplot2 and tidyr are already installed

# Work on a copy of the imported table
data_charts <- census_data_med_insurance_state_age_v06
colnames(data_charts)   # See column names

# Reshape to long format: one row per age group and insurance type
long_data <- tidyr::pivot_longer(data_charts,
                                 cols = c("Employment_Based", "Direct_Purchase", "Tricare", "Medicaid", "Medicaid_Private", "Private", "VA_care", "Uninsured"),
                                 names_to = "Insurance_Type",
                                 values_to = "Count")

# The chart builder code
ggplot(long_data, aes(x = Age_Group, y = Count, fill = Insurance_Type)) +
    geom_bar(stat = "identity", position = "stack") +
    labs(title = "Stacked Bar Chart of Medical Insurance Coverage Across Age Groups",
         x = "Age Group",
         y = "Count of People with Insurance",
         fill = "Insurance Type") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Put the age groups in a fixed order instead of the default alphabetical one
long_data$Age_Group <- factor(long_data$Age_Group,
                              levels = c("All people", "Under 19 years", "19 – 64 years", "Under 65 years", "65 years and over"))

# The chart with grouping
ggplot(long_data, aes(x = Age_Group, y = Count, fill = Insurance_Type)) +
    geom_bar(stat = "identity", position = "stack") +
    labs(title = "Stacked Bar Chart of Medical Insurance Coverage Across Age Groups",
         x = "Age Group",
         y = "Count of People with Insurance",
         fill = "Insurance Type") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
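
To make the reshaping and ordering steps in the code above more concrete, here is a minimal sketch on a toy table. The counts are invented; only the column names and age-group labels follow the project code.

```r
library(tidyr)

# Toy wide table: one row per age group, one column per insurance type
wide <- data.frame(
  Age_Group = c("65 years and over", "Under 19 years"),
  Medicaid  = c(35, 120),
  Uninsured = c(4, 30)
)

# Each insurance column becomes its own row
long <- pivot_longer(wide,
                     cols = c("Medicaid", "Uninsured"),
                     names_to = "Insurance_Type",
                     values_to = "Count")

# Explicit factor levels control the order in which groups appear on the x axis
long$Age_Group <- factor(long$Age_Group,
                         levels = c("Under 19 years", "65 years and over"))

print(long)                     # 4 rows: 2 age groups x 2 insurance types
print(levels(long$Age_Group))
```

The long format is what lets ggplot map Insurance_Type to the fill colour, and the factor levels are why the second chart shows the age groups in a deliberate order.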

Analysis: Chart

Interpretation

The chart underscores the influence of age on the type and extent of medical insurance coverage. Employment-Based insurance is prevalent in the working-age population, Medicaid coverage is vital for the youngest and oldest age groups, and a significant uninsured segment exists among the “19 – 64 years” age group. This analysis could inform targeted interventions and policy-making to address the observed disparities in insurance coverage across age groups.
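
Raw counts can hide differences between age groups of very different sizes. One possible follow-up is to compare shares within each age group, as sketched below. The numbers are invented, and the sketch assumes the insurance categories are mutually exclusive, which may not hold for the ACS tables, where a person can report more than one type of coverage.

```r
library(dplyr)

# Toy long-format data in the same shape as long_data (counts are invented)
toy <- data.frame(
  Age_Group      = rep(c("Under 19 years", "19 – 64 years"), each = 2),
  Insurance_Type = rep(c("Employment_Based", "Uninsured"), times = 2),
  Count          = c(60, 40, 150, 50)
)

shares <- toy %>%
  group_by(Age_Group) %>%
  mutate(Share = Count / sum(Count)) %>%   # fraction of the age group's total
  ungroup()

print(shares)
```

In ggplot2, using position = "fill" instead of position = "stack" in geom_bar draws the same within-group proportions directly.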
