Goals and objectives
The goal of this assignment was to geocode addresses from a table provided by the Wisconsin DNR in order to locate frac sand mines and frac sand transportation facilities on a map of Wisconsin. After geocoding those sites, the results were compared to actual site location data, and any discrepancies between the data sets were analyzed. This may sound like an easy task; however, it proved to be quite the opposite.
Methods
In order to normalize the data and have the geocoder work correctly, the Address column needs to be split into the individual components the geocoder uses to find each site's specific location.
As seen in Fig 4 (below), the individual components each require their own column with their own heading in order to be interpreted by the geocoder.
As you can see, the address has been split into sections such as street address, city/township, ZIP code, and county. Now that the geocoder can tell the individual address components apart, the site can be located.
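The splitting step can be sketched in a few lines of Python. This is only an illustration, not the procedure used for the assignment: it assumes an address has already been cleaned up into a hypothetical `street, city, ST zip` form, the column names are placeholders, and the county is supplied separately because it cannot be read from the address string itself.

```python
import re


def split_address(raw, county=""):
    """Split 'street, city, ST zip' into separate geocoder columns.

    The input layout and the output column names are assumptions made
    for this sketch, not the layout of the DNR table.
    """
    parts = [p.strip() for p in raw.split(",")]
    if len(parts) != 3:
        # Anything that does not fit the assumed layout needs a human.
        raise ValueError(f"needs manual review: {raw!r}")
    street, city, state_zip = parts
    # Two-letter state, five-digit ZIP, optional +4 suffix (discarded).
    match = re.fullmatch(r"([A-Za-z]{2})\s+(\d{5})(?:-\d{4})?", state_zip)
    if match is None:
        raise ValueError(f"needs manual review: {raw!r}")
    return {
        "Street": street,
        "City": city,
        "State": match.group(1).upper(),
        "ZIP": match.group(2),
        "County": county,
    }


# Hypothetical address, used only to show the shape of the result.
row = split_address("123 Main St, Chippewa Falls, WI 54729", county="Chippewa")
```

Rows that raise `ValueError` are the ones that would still need the manual research described in this post.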
Once located, these geocoded sites were compared to the actual locations of the sites, taken from GPS data provided by the DNR (Fig 5 and Fig 6, below).
After mapping these sites, other users' geocoded locations were added to the map to compare the differences in distance among all the users' geocoded sites (Fig 7, below).
Once all the sites were added to the map, the "Near Tool" was run to determine the distances between the geocoded sites and the actual mine locations. In order to evaluate these data, an error table was generated (Fig 8, below).
The information provided by the Wisconsin DNR was not as robust as one would hope. The facility records captured the basic name and property status of each facility, along with the site type and whether the site was active, but the location information, specifically the site address, was in shambles. Some locations were recorded in PLSS notation, other sites had incomplete or missing information in their addresses, and some had no address information at all.
| Fig 1. Example of the information given to us by the Wisconsin DNR. The Address column (yellow) shows the jumble of formats that site addresses came in. |
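A quick way to see how much manual research a table like this needs is to sort the Address cells into rough categories first. The sketch below is an assumption-heavy illustration: the PLSS pattern (for example `T26N R10W S12`) and the street pattern are guesses at common spellings, not the formats actually used in the DNR table.

```python
import re

# Assumed PLSS spelling: township then range, e.g. "T26N R10W".
PLSS_PATTERN = re.compile(r"\bT\s*\d+\s*[NS]\b.*\bR\s*\d+\s*[EW]\b", re.IGNORECASE)
# Assumed street spelling: a house number, optionally with a leading
# direction letter (as in rural grid addresses such as "N1234 ...").
STREET_PATTERN = re.compile(r"^\s*[NSEW]?\d+\s+\S+")


def classify_address(raw):
    """Label one Address cell as 'missing', 'plss', 'street', or 'incomplete'."""
    if raw is None or not raw.strip():
        return "missing"
    if PLSS_PATTERN.search(raw):
        return "plss"
    if STREET_PATTERN.match(raw):
        return "street"
    # Something was typed in, but it has no house number or PLSS reference.
    return "incomplete"
```

Only the `street` rows are candidates for geocoding as they stand; the `plss`, `incomplete`, and `missing` rows are the ones that need to be looked up by hand.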
So the first step naturally became normalizing the data in the table. Complete address information was required in order to geocode the sites. To find it, the site locations and their addresses were researched by searching the PLSS records via the PLSS finder on the Wisconsin State Cartographer’s Office website (see References). Once the correct PLSS section was located, the correct individual subsection was identified, which narrowed down the zone in which the site would be located.
| Fig 2. PLSS Finder via the Wisconsin State Cartographer’s Office. The information for the zoomed-in PLSS zone is listed under the Township/Range/Section Search, on the left-hand side of the picture. |
After attempting to locate some of the sites that did have actual address information, more research had to be done to relocate them, as the address was not specific or accurate enough to geocode the site correctly. These addresses were usually associated with a highway or county road that could have multiple names. When such an address was put into the geocoding tool in ArcMap, the correct road was found, but the address was not geolocated in the correct place.
Once the sites were located and geocoded, we received a shapefile from other students who had geocoded the same mines and facilities. We also received a shapefile with the actual locations of the mines. After running the "Near Tool" in ArcMap, we were able to compare the distances of our geocoded mines to those of our classmates, in an attempt to determine how accurate and precise the geocoding of the mines actually was, as well as how accurate we were in normalizing the table provided by the Wisconsin DNR.
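For readers without ArcMap, the idea behind the distance comparison can be approximated in plain Python. This is a rough stand-in, not the Near tool itself: it assumes the points are latitude/longitude pairs in decimal degrees and uses the haversine great-circle formula, so its numbers will not exactly match planar distances measured in a projected coordinate system. The site names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest(point, candidates):
    """Return (name, distance_m) for the candidate closest to point.

    candidates maps a site name to its (lat, lon) pair.
    """
    lat, lon = point
    return min(
        ((name, haversine_m(lat, lon, clat, clon)) for name, (clat, clon) in candidates.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical actual mine locations and one geocoded point.
actual_mines = {"Mine A": (44.0, -91.0), "Mine B": (45.0, -91.0)}
name, distance_m = nearest((44.1, -91.0), actual_mines)
```

Running `nearest` once per geocoded site gives one distance per site, which is the kind of column an error table is built from.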
Results
As the results below show, even if the data is normalized in a way that is standard, concise, and regulated, the geocoded locations can differ drastically depending on the individual who geocoded them. The best practice would be to enter the information in a standard format as the table is being created.
In talking with Dr. Hupy about this issue, she pointed out that this table may not have been intended for outside use, or that it was created as a record rather than tailored for GIS users. This explanation does make sense of why the table is not GIS friendly, but even as a record for internal use, standard entry of addresses would make this table much easier to understand for every user, not just GIS users. Standardizing practices is just as important outside of GIS, as it allows anyone to look at and evaluate the work. This includes information only one person would look at. I am sure that everyone has had the experience of looking back at something they wrote and wondering, "What did I write or mean by that?" If the information is recorded in a consistent way, that does not happen.
For GIS specifically, the addresses need to be in a standard format for the geocoder to work. In the first table (Fig 3), the addresses (much like the first example in Fig 1) are not entered in a standard format, even when they contain the same information.
| Fig 3. Wisconsin DNR Excel table with site information before normalization. |
| Fig 4. Normalized table. |
| Fig 5. A map of the actual frac sand processing sites in Wisconsin. The orange squares represent individual sites. |
| Fig 6. A map of the locations that I geocoded as compared to their actual locations. |
| Fig 7. A map of geocoded locations of classmates as compared to the actual mine locations. |
| Fig 8. Error table, comparing the distances of geocoded locations among users for these specific mines. |
The columns on the left report the distance (in meters) between each geocoded site and its actual location, with one column for each user. The statistics on the right are from the Excel statistical analysis tool and show the mean, median, and standard deviation for each user, as well as the minimum and maximum distance by which a site was misplaced. I had one of the largest maximum distances from an actual mine location at 63,106 meters, but every user had a mean distance of over 2,000 meters from the actual sites.
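The same summary statistics can be reproduced outside Excel with Python's standard `statistics` module. The sketch below uses made-up distances rather than the values in Fig 8, and it assumes the Excel output reports the sample standard deviation, which is what `statistics.stdev` computes.

```python
import statistics


def summarize(distances_m):
    """Summary statistics for one user's geocoding errors, in meters."""
    return {
        "mean": statistics.mean(distances_m),
        "median": statistics.median(distances_m),
        "stdev": statistics.stdev(distances_m),  # sample standard deviation
        "min": min(distances_m),
        "max": max(distances_m),
    }


# Hypothetical distances for one user: four close sites and one far outlier.
sample = [50.0, 120.0, 300.0, 900.0, 60000.0]
summary = summarize(sample)
```

With these made-up numbers the median is 300 meters while the mean is 12,274 meters, which illustrates how a single badly placed site can pull a user's mean far above the typical error.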
Discussion
Both Figure 8 (above) and Figure 9 (below) show the differences in the users' geocoding. Everyone had at least one site that was within 100 meters, and for the most part everyone was in the same county. But in actual situations where decisions need to be based on specific locations, the information provided to us would NOT have been accurate enough to base a decision on. Without the actual mine locations (which were provided to us), there would be no way to determine whether the locations we geocoded were correct. With standardized input and data on how accurate geocoding is, we could determine some type of standard error to include in the data analysis, but at this point we don't know those figures.
| Fig 9. A close up of the geolocated sites for each user as well as the actual mine sites. |
There are two main sources of error in geographic data: operational errors and inherent errors.
Operational errors occur mostly during the operation of the procedures for collecting, managing, and using geographic data.
Inherent Errors occur as a result of the special nature of geographic data. Geographic data, as representations of the real world in a certain data model, are necessarily incomplete and generalized.
In this assignment we encountered both types of errors. From the start of the project, the non-standardized format of the table resulted in operational errors. Not having a set format or procedure for entering geographic data resulted in end users attempting to interpret information that they had previously seen, which at best can push the accuracy and precision of the data off at the onset of the assignment and may propagate throughout the rest of the operations.
Inherent errors occurred because geocoding addresses depended partly on the operational error of normalizing the data, but also on the inherent errors of the geocoding tool and the user using it. It is up to the user to determine whether the geocoding tool has selected the right address based on the information at hand. If that information is missing or wrong, the user may still tell the tool that the selected location is correct.
| Fig 10. Sources of Error in Geographic data. |
Conclusion
If you ever have to record information that could be used in an analysis, standardize the data entry and format, and be consistent when recording that information.
References
http://www.sco.wisc.edu/plssfinder/plssfinder.html




