Because we had many categorical variables with a large number of levels, we used association rule learning to explore the relationship between our variables of interest. From this method, we found that the following factors had the most meaningful associations:
However, we decided not to include medical school and primary specialty in our final model. The two variables are highly correlated with each other (consider specialty-specific schools such as chiropractic schools), and each has too many categorical levels to yield meaningful insights.
Using other exploratory methods such as plots and correlation matrices, we ruled out other candidate predictors of EHR usage. We decided to study the effect of gender and years since graduation on EHR adoption among physicians, since there was adequate data on these variables and they were evenly distributed between the two outcome groups (use EHR/don’t use EHR).
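
As an illustration of the kind of output association rule learning produces, here is a minimal sketch that scores pairwise rules by support, confidence, and lift. It is not the analysis pipeline used here: the `pairwise_rules` helper, the `min_support` threshold, and the example records are all hypothetical, and a full analysis would typically use an Apriori-style search over larger itemsets.

```python
from collections import Counter
from itertools import combinations


def pairwise_rules(records, min_support=0.2):
    """Return (lhs, rhs, support, confidence, lift) for every item pair.

    Each record is a set of "variable=level" strings (hypothetical encoding).
    """
    n = len(records)
    item_counts = Counter()
    pair_counts = Counter()
    for record in records:
        items = sorted(set(record))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of records containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]      # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # P(rhs | lhs) / P(rhs)
            rules.append((lhs, rhs, support, confidence, lift))
    # Strongest associations (highest lift) first.
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


# Hypothetical practitioner records, made up for illustration only.
records = [
    {"gender=M", "specialty=chiropractic", "ehr=no"},
    {"gender=F", "specialty=internal medicine", "ehr=yes"},
    {"gender=M", "specialty=internal medicine", "ehr=yes"},
    {"gender=F", "specialty=chiropractic", "ehr=no"},
    {"gender=M", "specialty=internal medicine", "ehr=yes"},
]

for lhs, rhs, support, confidence, lift in pairwise_rules(records)[:3]:
    print(f"{lhs} -> {rhs}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift well above 1 means the two levels co-occur more often than independence would predict, which is why variables with many levels can still be screened this way.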
Using logistic regression, we found that physician gender and years since graduation have statistically significant effects on the use of EHR among practitioners in the incentive program. The third term in the model, Locations, was not statistically significant, since its confidence interval includes 1. The results of our model are presented and summarized below.
Variable | Odds Ratio | Confidence Interval (lower) | Confidence Interval (upper) |
---|---|---|---|
Gender (Male) | 2.0266 | 1.9732 | 2.0815 |
Years Since Graduation | 1.0172 | 1.0163 | 1.01819 |
Locations | 1.0024 | 0.9978 | 1.0060 |
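
To help read the table, the sketch below shows how a logistic regression coefficient maps to an odds ratio and confidence interval, and how a per-year odds ratio compounds over several years. It assumes the reported intervals are 95% Wald intervals (`z = 1.96`), which the write-up does not state; the function names are hypothetical.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a logit coefficient and standard error to (OR, lower, upper).

    Assumes a Wald interval; z=1.96 corresponds to 95% (an assumption here).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coefficient_from_ci(odds_ratio, lower, upper, z=1.96):
    """Recover the coefficient and approximate standard error from a reported row."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


def is_significant(lower, upper):
    """An odds ratio differs from 1 at the chosen level if its interval excludes 1."""
    return lower > 1.0 or upper < 1.0


# "Years Since Graduation" row from the table above.
beta_years, se_years = coefficient_from_ci(1.0172, 1.0163, 1.01819)

# Odds ratios multiply, so a 10-year difference scales the odds by OR ** 10.
or_10_years = math.exp(10 * beta_years)
print(f"Odds ratio for a 10-year difference: {or_10_years:.3f}")

# "Locations" row: the interval spans 1.
print("Locations significant:", is_significant(0.9978, 1.0060))
```

Under these assumptions, a small per-year odds ratio still adds up to a noticeable difference across a career-length gap in graduation year.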
For hospital demographic information, we used correlation matrices to study the relationships between hospital demographics such as:
However, we found that these variables were highly correlated with each other and followed the same distributions by EHR use. This makes sense because these demographics are all related to hospital size.
After observing strong associations between gross patient revenue and the other variables, we decided to stratify our analysis on revenue to account for confounding. We used more correlation matrices and fitted multiple models to find the best combination of hospital demographics that influenced the adoption of EHRs. Additionally, we log-transformed the variables to help normalize their distributions.
We then fit a logistic regression that included an interaction term between gross patient revenue and total discharges. We found that hospitals’ gross patient revenue and total discharges have no statistically significant effect on the use of EHR among practitioners in the incentive program, despite being highly correlated with EHR usage. The results of our model are presented and summarized below.
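
The log transform and correlation matrix described above can be sketched as follows. The hospital values are invented for illustration, `staffed_beds` is a hypothetical third variable, and the use of the natural log and Pearson correlation is an assumption, since the write-up does not specify either.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Build a nested dict {row_name: {col_name: r}} from named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical hospital demographics (right-skewed, as size measures tend to be).
hospitals = {
    "gross_patient_revenue": [2.1e6, 8.5e6, 4.0e7, 1.2e8, 9.6e8],
    "total_discharges": [150, 420, 2300, 6100, 41000],
    "staffed_beds": [12, 25, 90, 240, 1100],
}

# Log-transform every variable to reduce skew before comparing them.
logged = {name: [math.log(v) for v in values] for name, values in hospitals.items()}

matrix = correlation_matrix(logged)
for row_name, row in matrix.items():
    print(row_name, {col: round(r, 3) for col, r in row.items()})
```

Off-diagonal values close to 1 are the pattern that signals the variables are all proxies for hospital size.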
Variable (logged) | Odds Ratio | Confidence Interval (lower) | Confidence Interval (upper) |
---|---|---|---|
Gross Patient Revenue | 0.8679 | 0.14765 | 3.53518 |
Total Discharges | 0.8066 | 0.07675 | 4.58805 |
Interaction Term | 0.9967 | 0.87087 | 1.1898 |
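
One point worth keeping in mind when reading this table: in a model with an interaction term, the odds ratio for revenue is not a single number but depends on the level of discharges. The sketch below illustrates this using the point estimates from the table. It assumes the natural log was used for the transform (the base is not stated), and `revenue_odds_ratio` is a hypothetical helper.

```python
import math

# Coefficients recovered from the reported odds ratios (beta = ln(OR)).
BETA_REVENUE = math.log(0.8679)
BETA_INTERACTION = math.log(0.9967)


def revenue_odds_ratio(log_discharges):
    """Odds ratio for a one-unit increase in log revenue,
    evaluated at a given value of log(total discharges)."""
    return math.exp(BETA_REVENUE + BETA_INTERACTION * log_discharges)


# The tabulated 0.8679 applies only where log(discharges) = 0, i.e. one discharge.
for discharges in (100, 1_000, 10_000):
    ratio = revenue_odds_ratio(math.log(discharges))
    print(f"discharges={discharges}: revenue odds ratio = {ratio:.4f}")
```

Because all three confidence intervals in the table include 1, these conditional odds ratios are point estimates only and should not be read as evidence of an effect.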
Of the 265 vendors used by practitioners in our dataset, we found that over 50% of the products used were from the top 10 most popular vendors.
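
A concentration figure like this can be computed with a simple frequency count. The sketch below is an assumption about how such a share might be calculated, not the original code; `top_n_share` and the vendor labels are hypothetical.

```python
from collections import Counter


def top_n_share(vendors, n=10):
    """Fraction of all product records that belong to the n most common vendors."""
    counts = Counter(vendors)
    top_total = sum(count for _, count in counts.most_common(n))
    return top_total / sum(counts.values())


# Hypothetical records: one vendor label per product in use.
products = ["Vendor A"] * 6 + ["Vendor B"] * 3 + ["Vendor C"]
print(f"Share held by the top 2 vendors: {top_n_share(products, n=2):.0%}")
```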
We compared the most common vendors to the most common primary specialties and found some interesting patterns:
Across the country, the most popular vendors are widely used in Minnesota, but not in New York, Tennessee, Rhode Island, or Nevada.
Interestingly, practitioners in Minnesota who use EHR tend to be younger (more recently trained) than those in the rest of the United States.
Taking a closer look at the graduation year data in Minnesota, we can see that most physicians live near Minneapolis. More recently graduated physicians also appear to live near the city, while older physicians tend to live farther away in suburban or rural areas.