State Summaries From Plant Level Electricity Data

In this analysis, we will explore the preprocessed electricity dataset from the EIA bulk file download website. This dataset contains information on the electrical generation, retail, and quality statistics for U.S. plants. With this formatted data, we can easily aggregate the data and provide some general summaries. Here we will focus on the net generation by fuel and state, and try to extract some meaningful insights from those attributes.

Initial Setup

There are a number of metrics that can be analyzed in this data set, but for this report we will focus on just a few. Specifically, we will look at the net generation from fuel sources and customer account data to provide overall, and normalized, metrics.

Net Generation

Filter and Join Data

Format Data

State Summaries

Net Generation with State Subcategories

For a first look, we can visualize how much generated electricity came from each fuel type. In the following plot, we visualize the 2019 consumption quantities for every plant in the U.S., aggregated by state. Right away, we can see that there are some major fuel sources (e.g. natural gas, nuclear, coal) as well as many smaller categories. Note that you can click on individual boxes to zoom in on the data, and click on the outer box to zoom out.

In order to simplify the dataset, we will collect all the coal and renewable categories into individual groups, as well as all the minor categories into an 'other' category. Here we see a cleaner picture of the general fuel consumption. Natural gas, coal, and nuclear are the predominant sources, and the primary consumers appear to be the larger states, like California and Texas. Later on we will take a look at normalized consumption, but for now we will continue with the raw totals.
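The grouping step described above can be sketched in plain Python. This is a minimal illustration, not the code used to produce the plots: the fuel labels in `FUEL_GROUPS`, the helper names, and the sample rows are all assumptions, and the real EIA category names may differ.

```python
from collections import defaultdict

# Hypothetical fuel labels; the real EIA category names may differ.
FUEL_GROUPS = {
    "bituminous coal": "coal",
    "subbituminous coal": "coal",
    "lignite coal": "coal",
    "wind": "renewables",
    "solar": "renewables",
    "conventional hydroelectric": "renewables",
    "natural gas": "natural gas",
    "nuclear": "nuclear",
}


def group_fuel(fuel):
    """Map a detailed fuel label to a broad group; unknown labels become 'other'."""
    return FUEL_GROUPS.get(fuel, "other")


def aggregate_by_group(rows):
    """Sum net generation per (state, fuel group) from (state, fuel, value) rows."""
    totals = defaultdict(float)
    for state, fuel, value in rows:
        totals[(state, group_fuel(fuel))] += value
    return dict(totals)


# Made-up illustrative values, not EIA data.
rows = [
    ("TX", "bituminous coal", 10.0),
    ("TX", "lignite coal", 5.0),
    ("TX", "wind", 8.0),
    ("TX", "petroleum coke", 1.0),
    ("CA", "solar", 4.0),
    ("CA", "natural gas", 9.0),
]
grouped = aggregate_by_group(rows)
```

Using a lookup table with an 'other' fallback means any category that is not explicitly listed is still counted, so the grouped totals sum to the same value as the ungrouped ones.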

State Categories with Fuel Source Subcategories

We can take a quick look at the converse treemap, which groups the consumption by state, then by fuel source. One interesting observation here is that most states have a mixture of the 'big three' fuel sources (natural gas, coal, nuclear), and almost no states rely totally on one type of fuel.

Evaluating the totals is useful; however, it is also useful to see how fuel consumption changes over time. For this next plot, we will compare the current period (ending in 2019) to averages of previous years. In order to smooth the data, and avoid comparing any off-years, we will take the current five-year average (i.e. 2015 - 2019) and subtract the previous five-year average (i.e. 2010 - 2014).
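The five-year comparison can be written as a small function. This is a sketch under stated assumptions: the function names and the input shape (a dictionary of year to net generation for one state and fuel) are hypothetical, and the sample values are invented.

```python
from statistics import mean


def period_average(yearly, start, end):
    """Average the values for the years start..end (inclusive) that are present."""
    values = [yearly[year] for year in range(start, end + 1) if year in yearly]
    return mean(values)  # raises StatisticsError if no year in the range is present


def five_year_change(yearly, current_end=2019):
    """Current five-year average minus the previous five-year average."""
    recent = period_average(yearly, current_end - 4, current_end)
    previous = period_average(yearly, current_end - 9, current_end - 5)
    return recent - previous


# Made-up illustrative series: flat at 100.0 for 2010-2014, flat at 80.0 for 2015-2019.
coal = {year: 100.0 for year in range(2010, 2015)}
coal.update({year: 80.0 for year in range(2015, 2020)})
change = five_year_change(coal)
```

A positive result means generation from that fuel increased between the two periods, and a negative result means it decreased, which is how the two plots that follow are split.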

First, we can take a look at the net generation that increased from the previous period to now and see how that relates to the fuel sources used to create the energy. It is evident that there has been a large increase in natural gas and renewable consumption, with nuclear power following behind.

In the plot of decreasing consumption values, the major fuel source being reduced is coal.

Normalized Consumption by Customer Account

Join And Clean Customer Data

First, we will check that the customer account data is complete enough to join to the net generation data above. From the preliminary check we can see that the latest year, for any state, containing nulls is 2016. In fact, only one state, Arkansas, has null data for 2016. Given the small amount of missing data, we can fill the gaps ourselves by drawing a straight line between the two known endpoints on either side of the missing cells, without writing any complex functions. This is not strictly necessary for the next set of analyses, but it will be useful for future time series studies.
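The straight-line fill between two known endpoints can be sketched as follows. This is an illustration only: `fill_linear` is a hypothetical helper, the values are assumed to be ordered by year with `None` marking missing cells, and the numbers are invented.

```python
def fill_linear(values):
    """Fill runs of None with points on a straight line between the known neighbors.

    Leading or trailing None values have only one neighbor and are left unchanged.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        if gap > 1:
            step = (filled[right] - filled[left]) / gap
            for i in range(left + 1, right):
                filled[i] = filled[left] + step * (i - left)
    return filled


# Made-up customer counts for consecutive years, with two missing cells.
customers_by_year = [100.0, None, None, 130.0]
imputed = fill_linear(customers_by_year)
```

Because the function only fills between two known values, it never extrapolates past the ends of a series.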

Now, we can use a simple when statement to replace the missing values. A quick output check shows that sensible values have been imputed into the null fields and that the dataframe, in its original format, now shows no missing data (2007 and previous years have been removed).

So now we can normalize the metrics by dividing the state-wide generation values by the total number of customers for each fuel. This will provide values expressed in net generation (kWH) per total state customers and provide a clearer picture when it comes to state-to-state comparisons. Note, this metric is not the net generation for each customer account served by an individual plant, which is unfortunately not available. Therefore, larger plant values correspond to larger net generation values in the state.
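As a rough sketch of this normalization, assuming generation is keyed by (state, fuel) and customer totals are keyed by state (both shapes, the function name, and the values are hypothetical):

```python
def per_customer(generation, customers):
    """Divide each (state, fuel) generation value by that state's total customers.

    States with a missing or zero customer count are skipped rather than guessed.
    """
    normalized = {}
    for (state, fuel), value in generation.items():
        count = customers.get(state)
        if not count:
            continue
        normalized[(state, fuel)] = value / count
    return normalized


# Made-up illustrative values, not EIA data.
generation = {
    ("WY", "coal"): 900.0,
    ("WY", "wind"): 100.0,
    ("CA", "natural gas"): 500.0,
}
customers = {"WY": 10, "CA": 100}
normalized = per_customer(generation, customers)
```

Note that every fuel in a state is divided by the same state-wide customer total, which matches the caveat above: the result is not a per-plant customer metric.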

Before we get too far ahead of ourselves, we should check that the transformation is actually correct. Some simple aggregations reveal that the overall summed kWH_per_total_state_accounts multiplied by the total state number of customers for 2019 is approximately 3.587e+09 MWH, which is close to the 3.588e+09 MWH reported in the state summaries above.
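This kind of round-trip check can be sketched as follows. The helper name, the data shapes, and the sample values are assumptions, and the relative tolerance of 1e-3 is an arbitrary choice for illustration rather than a threshold taken from the analysis.

```python
import math


def round_trip_total(normalized, customers):
    """Multiply each per-customer value back by its state's customer count and sum."""
    return sum(
        value * customers[state]
        for (state, _fuel), value in normalized.items()
    )


# Made-up illustrative values, not EIA data.
customers = {"WY": 10, "CA": 100}
generation = {
    ("WY", "coal"): 900.0,
    ("WY", "wind"): 100.0,
    ("CA", "natural gas"): 500.0,
}
normalized = {key: value / customers[key[0]] for key, value in generation.items()}

original_total = sum(generation.values())
recovered_total = round_trip_total(normalized, customers)

# rel_tol=1e-3 is an assumed tolerance, chosen only for illustration.
totals_match = math.isclose(recovered_total, original_total, rel_tol=1e-3)
```

Under that assumed tolerance, the two totals quoted above (3.587e+09 and 3.588e+09) would also count as matching, since they differ by roughly 0.03%.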

While not all customer accounts are the same, this is a first attempt at normalization. In fact, we can see some interesting results in the outputs below. First, net generation per customer account from coal is the largest group in the fuel type summary. In the state grouping summary, we see that the largest net generation per customer account belongs to Wyoming, West Virginia, and North Dakota, all of which are predominantly coal consumers.

Additionally, it is interesting to find out that only three (WY, WV, KY) of the five biggest states that mine coal (WY, WV, PA, IL, KY) use it as the primary fuel for generation.

With a little additional work, we can visualize the states which use coal as the primary source of generated electricity, and by how much. Pennsylvania doesn't use coal as the primary source of net generation, but the remaining major coal mining states do. It appears that whether or not a state mainly uses coal is a regional factor, but further analysis would be required.
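Finding each state's primary fuel, and the share of the state's net generation it accounts for, can be sketched like this. The function name, data shape, and values are hypothetical and are not EIA figures.

```python
def primary_fuel(generation):
    """Return {state: (fuel, share)} for the fuel with the largest value in each state."""
    by_state = {}
    for (state, fuel), value in generation.items():
        by_state.setdefault(state, {})[fuel] = value

    result = {}
    for state, fuels in by_state.items():
        total = sum(fuels.values())
        top = max(fuels, key=fuels.get)
        share = fuels[top] / total if total else 0.0
        result[state] = (top, share)
    return result


# Made-up illustrative values for two hypothetical states, not EIA data.
generation = {
    ("AA", "coal"): 80.0,
    ("AA", "wind"): 20.0,
    ("BB", "natural gas"): 50.0,
    ("BB", "nuclear"): 30.0,
    ("BB", "coal"): 20.0,
}
primary = primary_fuel(generation)
coal_states = {s: share for s, (fuel, share) in primary.items() if fuel == "coal"}
```

Filtering the result by fuel, as in `coal_states`, gives the set of states and shares that a map of coal-dominant states would need; the same filter works for natural gas or nuclear.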

For fun, let's also plot the states where the primary generation came from natural gas and nuclear power.

Summary

We have learned that natural gas is the primary fuel source for the U.S., and that larger states, in terms of net generation, do not typically rely on a single fuel source. Also, coal is the fuel source with the largest decline over the last five years, and it appears to be replaced mostly by natural gas and renewables. When we normalize the data by customer accounts, we see that coal is the primary fuel source for the states with the largest net generation per customer, which indicates it might be used by heavier consumers or feed grids with higher levels of transmission loss. Some plots have been provided which visualize the percentage of fuel used for net generation for the three major sources (coal, natural gas, and nuclear) to expose any possible geographical clues.

An important note is that this is summary data, which means that further detail can be uncovered by drilling down to the plant level data, including the monthly time series data. This general summary provides some insight, but the really useful information has yet to be found at the finer grain of this source data.