Mining the 2022, Census of Agriculture – What the Function

The first federal census of agriculture was conducted in 1840. It has been conducted for 184 years. The oldest hardcopy of the US Census of Agriculture in the Biomass Rules collection is from 1982. We have been reading and analyzing this data set for the last 40 years. Over that time, the data in each census dataset has become increasingly available electronically in digital formats.

The first complete data download Biomass Rules conducted was from the 2012 Census of Agriculture. Every Ag Census cycle Biomass Rules gets a little closer to creating a set of user-friendly files. These public access files have had a different format with each 5-year census. But the 2022 Census of Ag is the most accessible census dataset. By transposing and cleaning data from 2012, 2017, and now 2022 Census of Ag files, Biomass Rules is getting more efficient.

A goal is to create more easily accessible ag census data files for clients to use. We are not quite ready, but the journey is worth sharing.

While the Census of Ag makes the census data available by different categories related to land area, economic class, and years of production, the numeric value of these discrete categories is locked in the labels. The USDA delivery is very appropriate, but Biomass Rules wants more. Over Labor Day Weekend, Biomass Rules ‘celebrated’ by mining text labels in the public USDA files. Mining may sound unethical, but it is simply liberating useful information from data.

Step 1: USDA Census of Ag data labels

The numeric category information is imbedded in the downloadable data label. It is inappropriate to use these categories as anything other than categories, but the breaks within each category are not equidistant. Accessing the numeric qualifiers can add value to the associated data values, which can be liberated from the text string. This post identifies the formula used within a single sell to pull out the starting point of the range. In an adjacent cell a similar text manipulating string will pull out the end of the range. Affectively, this function coding picks a text reference – in this case ” TO ” with a space before and after the TO – and counts the characters before that reference.

The code to separate out this range is: =IFNA(IF(ISBLANK(O84)=FALSE,TRIM(RIGHT(TEXTBEFORE(O84,” TO “), LEN(TEXTBEFORE(O84,” TO “))-2)),””),””)

Yes, it is a lot. But SOOO powerful.

Breaking it down, the tools are listed in the upper right corner of the attached image. There are several steps that are not included in this note. One is using the colon (:) character in the label to split the USDA Label into two parts using the Excel text-to-column tool. This creates some cells in the column of label parts that are empty, or blank. There is other data in the USDA data label besides the category information. When this text function cannot find the target text, ” TO “, it returns a #N/A code. Excel functions allow the use of this knowledge to extract the useful range start and finish (Begin/End). But this code is just for the starting part.

Step 2: Nested Excel Functions Used

“” – This is an empty text string. Strings differ from code instructions by being separated by double quotation marks: one for the start of the string, the other for the end.
IFNA(if,then) – An if then function for the error code, #N/A, in this case it is set up to do something if it is not #N/A (=FALSE). If it is not #N/A, the code parses the starting range. When it really is an #N/A code, then the last element of the equation is a ,””). The ‘then’ instruction is if it is not false, or true, insert an empty string.
IF(if, then, else) – If, then, else. This is the meat of the instruction. If the string, ” TO ” is present, get to work. If it is not blank, look at the characters before the target string, then count them and keep that number of characters. (The characters must be counted/calculated to get the length of the string to keep).
ISBLANK(cell) – This returns a TRUE or FALSE.
LEN(length of string) – Counts the characters in the string.
TEXTBEFORE(characters before, text) – Looks only at the characters BEFORE the target string.
RIGHT(cell, character count) – Starts counting characters from the right.
TRIM(text) – This function removes extra spaces. It is cheap insurance since it is difficult to see spaces, but in coding they are a legitimate character.

Step 3 and 4: The inside-one-cell function code by purpose

As complicated as the initial string appears, when it is broken down into its components, it is more intuitive. The very linear part of coding is that each beginning function element, must have an ending element. All the paratheses need to have matching sets. And the ending parenthesis must be where the function is expecting it to be. Building these strings is an art that requires lots and lots of troubleshooting.

The code below the line is the money-maker in pulling out the numeric labels. The code above the line are the conditions that take over if the parsing of the desired numbers cannot take place.

It may take a few tries to make this new series of blog post/papers work, but I like this emerging, “What the Function” series. It is more about applying Excel (and other software) tools to real life datasets. This 2022 Census of Ag download of USDA ag census data is the more intuitive presentation so far. Previous Census data tools and download from USDA, NASS, QuickStats, have provided the same data in less intuitive formats. Access is improving.

Public data is there for our benefit. And as my students were told, better to go recreate the chart presented in the news yourself, than take the interpretation on face value. As an applied economist, even my brilliant economist colleagues present findings I need to verify myself. I hope to add to this in these nerdy posts.

Biomass Rules

Adding Clarity to All Things Bio!

Mining the 2022, Census of Agriculture – What the Function

Comments

Mining the 2022, Census of Agriculture – What the Function — No Comments

Leave a Reply Cancel reply