Friday, November 6, 2015

Data Normalization, Geocoding, and Error Assessment

Goals and Background 

The objective of this exercise is to further our understanding of geocoding.  Proper geocoding requires the analyst to normalize the input data, because the geocoding tool within ArcMap cannot function properly without it.  After geocoding, the analyst must examine the errors inherently associated with the process.

This exercise had me geocode the locations of 20 of 129 sand mines in Wisconsin, using a list given to us by the Wisconsin Department of Natural Resources (DNR).  The entire list of mines was split among the members of our class so that each mine would be geocoded by 3 different students.  Having multiple students geocode the same locations reveals whether one or more of them made an error in the process.  Our professor kindly removed the x,y coordinates from the file to simulate the real-world condition of acquired data.  In the following Methods section I discuss the various ways I geocoded the list of addresses I was given.

Methods
The first step in the process was to copy the information for my 20 mines from the original Excel file into a separate Excel file. As you can see in (Fig. 1), there are multiple address columns and multiple formats within those columns.


(Fig. 1) Portion of the Excel file received from the Wisconsin DNR.

After copying my 20 mines into my own Excel file, I created new columns to separate the street addresses from the city, zip code, state, and Public Land Survey System (PLSS) information.  Many of the mines in my group had no address and only had the PLSS information.  I also eliminated the fields not pertinent to geocoding. Looking at (Fig. 2), you can see I created a separate field for each portion of the address.

(Fig. 2) Portion of the table I normalized from the original data.
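
To illustrate the kind of normalization described above, here is a minimal Python sketch. It is not the Excel workflow used in this exercise, and the input format ("street, city, ST zip") and the function name normalize_address are assumptions made for illustration; the DNR file contained several formats that this does not cover.

```python
import re

def normalize_address(raw):
    """Split an assumed 'street, city, ST zip' string into separate fields.

    Returns None when the string does not fit that pattern, so the record
    can be reviewed by hand instead of being guessed at.
    """
    parts = [p.strip() for p in raw.split(",")]
    if len(parts) != 3:
        return None
    street, city, state_zip = parts
    # Two-letter state, whitespace, five-digit zip, optional +4 suffix.
    match = re.fullmatch(r"([A-Za-z]{2})\s+(\d{5})(?:-\d{4})?", state_zip)
    if match is None:
        return None
    return {
        "street": street,
        "city": city,
        "state": match.group(1).upper(),
        "zip": match.group(2),
    }

# Placeholder address, not a record from the DNR file.
print(normalize_address("123 Main St, Anytown, wi 54000"))
```

Records that return None, such as the mines with only PLSS information, would still need to be located manually.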

After normalizing my data I ran the geocoding tool within ArcMap.  My instructions were to use the World Geocode Service (ArcGIS Online) when geocoding in ArcMap.  After the tool ran, my results showed 15 (75%) matched and 5 (25%) unmatched addresses.  These results are a little misleading.  After using Zoom to Candidates in the Geocoding Review/Rematch Address screen, I found that only one of my mines was placed on the map at its actual location.  This meant I would have to locate all of the other mines manually using the Pick Address from Map feature.

Because the tool only approximates locations, many of the mines that had actual addresses in the table were placed in the right general area but needed to be adjusted to the precise location.  The precise location is desired for the most accurate analysis possible when we use this data in later exercises.

Additionally, I was verifying the locations using an ArcMap basemap which was not as current as I would have preferred.  Many of the mines were not actually depicted on the basemap.  I used Google Earth, which has more recent imagery, to see whether each mine was depicted and to compare against my basemap.  I also used Google Earth to check addresses which were not found by the geocoding tool.  The majority of the addresses I entered into Google Earth brought me directly to the location I was looking for.  Using Google Earth for reference, I would locate the same area on my basemap and adjust the geocoded point to the correct location using the Pick Address from Map feature.

For the mines with no address, I was provided PLSS data.  Using feature classes of the Townships, Sections, Quarter Sections, and Quarter Quarter Sections from the DNR geodatabase on our campus servers, I found the approximate locations of the mines.  After narrowing the location down using the PLSS, I used the basemap image to find the actual location of the mine.  Because the basemap was not current, not all of the mines were depicted.  As before, I used Google Earth to locate the mines I could.
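
To give a sense of how much each PLSS level narrows the search area, the short Python sketch below converts the nominal side length of each subdivision to meters. These are nominal survey sizes (a township is 6 miles on a side, a section 1 mile, a quarter section 1/2 mile, a quarter-quarter section 1/4 mile); real sections vary, and the names plss_side_m and NOMINAL_SIDE_MILES are made up for this illustration.

```python
MILE_M = 1609.344  # meters per statute mile

# Nominal side lengths in miles; actual parcels differ because of
# survey corrections and irregular sections.
NOMINAL_SIDE_MILES = {
    "township": 6.0,
    "section": 1.0,
    "quarter": 0.5,
    "quarter_quarter": 0.25,
}

def plss_side_m(level):
    """Return the nominal side length, in meters, of a PLSS subdivision."""
    return NOMINAL_SIDE_MILES[level] * MILE_M

for level in NOMINAL_SIDE_MILES:
    print(f"{level}: about {plss_side_m(level):.0f} m on a side")
```

Even at the quarter-quarter level the search square is roughly 400 m wide, which is why imagery was still needed to pin down the mine itself.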

After manually checking and locating the 20 mines I was assigned, my results table finally showed 100% matched (Fig. 3).  This does not mean all of the mines are depicted in the correct location.  A few of the mines were not visible on the basemap or in Google Earth.  For those, I chose the location on the map using a combination of the address (if available) and the PLSS information (if available).

(Fig. 3) Geocoding Result chart from ArcMap.

Results

The final task for this assignment was to compare my results against those of my classmates who geocoded the same mines I did.  However, 2 of the 4 class members did not turn in their shapefile for me to compare against.  Using the mines from the other 2 people, I was only able to compare the accuracy of 7 mines.  Comparing only 7 of 20 mines doesn't give a good assessment of the agreement between my points and the other people's points.  Even with only 7 mines I could see something was off with one of the mine locations (Fig. 4).  All of the points from my classmates were very close to where my points were located except for one.  You can see in (Fig. 4) that the points along the west edge of Wisconsin do not line up.  After further investigation, I found it was my point which was incorrect and my classmate's which was correct.

(Fig. 4) Comparison between my geocoded locations (green triangles) and the 7 mines from other classmates (blue dots)

Due to the lack of information to compare against, I didn't calculate an average distance error, but I used the Near tool in ArcMap to complete a quick calculation of the distance between my points and those from my classmates (Fig. 5).  The row in the chart with Maiden Rock is the point on which I made a mistake, and you can see from the last column that its distance error is the largest.

(Fig. 5) Error chart for the distances between my points and my classmates' points.
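
For readers without ArcMap, the distance between two geocoded points can be approximated in a few lines of Python. This sketch uses the haversine great-circle formula on latitude/longitude pairs, which is a swapped-in technique and not necessarily how the Near tool computed the values in Fig. 5; the function name and the sample coordinates are placeholders, not data from this exercise.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Placeholder coordinates: my point versus a classmate's point for the same mine.
my_point = (44.50, -92.30)
classmate_point = (44.51, -92.30)
print(round(haversine_m(*my_point, *classmate_point)), "m apart")
```

Because the Earth is treated as a sphere, the result is an approximation; for the distances in this exercise the difference from a projected measurement should be small compared with the errors being measured.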

After geocoding, we were given the shapefile with the actual locations of the mines to compare to our points.  Since the shapefile had point locations for all of the mines in Wisconsin, I selected my mines by their unique id and created a new feature class of just those mines.  I created a map comparing my geocoded point locations with the point locations of the actual mines (Fig. 6).  Looking at the map, you can see the same point from my previous analysis is off, verifying that my point is the one which is wrong. As you peruse the map you will find a few other points of mine which are not exactly where they should be.  The scale of the map does not precisely show the distance variation between the points.  I will examine the reasons for these errors in the Discussion section.

(Fig. 6) Comparison of my mine locations with the actual locations of the mines.

After plotting the points on the map, I again used the Near tool in ArcMap to calculate the distance between my points and the actual locations of the mines.  I then added each value to a new column in Excel beside the corresponding mine and calculated the average error distance across all of the mines.  As you can see from (Fig. 7), the average error distance was ~1713 meters.

(Fig. 7) Error distance of each mine along with the average calculation.
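
The averaging step done in Excel can be sketched in Python as below. The distances are placeholder values, not the numbers from Fig. 7, and the function name summarize_errors is hypothetical. The median and maximum are included only as an optional extra, since a single badly placed point can pull the mean up considerably.

```python
from statistics import mean, median

def summarize_errors(distances_m):
    """Summarize a list of per-mine error distances given in meters."""
    return {
        "n": len(distances_m),
        "mean_m": mean(distances_m),
        "median_m": median(distances_m),
        "max_m": max(distances_m),
    }

# Placeholder distances in meters; the last value stands in for one misplaced point.
example = [10.0, 20.0, 30.0, 5000.0]
print(summarize_errors(example))
```

With these placeholder values the mean is far larger than the median, which shows how one misplaced point can dominate an average.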

Discussion 

There are a number of factors which contribute to the error of the mine locations.  Digitization of the location was an inherent and operational cause of error.  The points from the DNR were centrally located within the mine area.  I was instructed to locate my point at the driveway entrance of the mine for future roadway analysis.  You can see the variance in location from the driveway entrance and the central point of the mine (Fig. 8).

(Fig. 8) Large scale image of a mine location and the variance of my point and the DNR location.

Inherent errors are very typical in geographic data.  A location is represented on the map with a small point even though the actual object (in this case a sand mine) is neither point shaped nor the same size. Also, each person creating a map will choose a different point style and will place that point based on their own purpose.  If the map designer needs highly accurate locations, geocoding is definitely not the proper way to locate points on a map.

The locations which were a great distance off were errors on my part.  Somewhere along the process I either missed relocating the point or never even looked at it.  I felt I went through the complete list more than once, but I was obviously wrong.  I never went back through to double check that all of my locations were correct.  This would have been a simple error to catch had I gone back through the list.

I would have liked to compare my locations with those of my classmates further.  However, their failure to complete their task left me unable to fully complete mine.  I feel this is a good reminder of how depending on others can go wrong.  In the real world there will be people who don't get their information in on time, which could jeopardize the entire job.  I feel this should be taken into account when planning the time frame for completing a job.

Conclusion

Overall this lab was a great learning experience in many respects.  Data from outside sources may not always be organized in the best format for your use.  Understanding your platform allows you to best prepare your data for analysis.  Geocoding (and maps in general) is a generalization, and no two people will map the same locations in exactly the same way.  There will always be variance in locations on a map unless you use an accurate GPS location of your desired point.  Preparing the locations according to the task is the best way to achieve accurate results.

