I was recently on a flight to Nashville and sat next to an Analysts working for Maricopa County, Arizona. The County I happen to live in. I am always interested in learning what others in related careers are up to so listed with great interest as she starts to talk about turnover the last few years.
Little did I know, it’s not just organizations going through the lean and agile revolution, entire counties are also subscribing to this model. Or at least trying, bureaucracy is a thorn in the side of anyone close to sensitive data.
A recent push to publicize public data is great for people like me as all I had to do was Google “Maricopa County Turnover” to be led in the right direction. The county has released yearly turnover by department and title stats every year since 2015.
Of course, these are in the hated .PDF format. As we are only dealing with 4 small files, I opted to use a PDF-Excel online service to transform the documents into something I can start to play with. In general, I stray away from these services for security reasons. There are other ways. In this case, as the information is public, I felt OK pushing them through. What I was left with were 4 readable Excel files, I spot checked them to ensure no garbled data before Ingress into Alteryx.
Once in Alteryx I created a year field for each file and unionized. I was fortunate to receive normalized fields and mostly normalized data. I removed erroneous header lines, Null Lines, Total lines and anything that was not pure data. The raw data was then run through a cleanse tool to further normalize by capitalizing and converting Nulls into 0’s for numeric fields. A small amount of find and replace logic was used to correct things such as “Flood Control” being called simply “Flood” one year.
Data was pushed into an extract for Tableau. A few tables here, vizzes there and we have a working product. This was meant to be a quick hit, fun personal project. Further development could be done on the UI and enrichments.