Tuesday, May 17, 2016

Raster Analysis and Risk Modeling

Introduction

The purpose of this exercise is to continue our analysis of the impacts of frac sand mining in Western Wisconsin, this time focusing on Trempealeau County. We will use various raster geoprocessing tools to build models of sand mining suitability, as well as models of the impacts of sand mining on local communities and the environment.

The data for this exercise come from the Trempealeau County Geodatabase, which we downloaded for an earlier exercise.

First, we will assess potential sand mine locations based on known criteria for the logistical and physical conditions that are best for establishing a new mine in Trempealeau County.

Next, we will use features of Trempealeau County to measure the impacts a new sand mine could have on local communities and on the county in general.

Methods

To assess sand mining suitability and determine where in Trempealeau County new sand mines could be located, we examined the following factors:

Geology:

The specific type of sand used for hydraulic fracturing comes from the Jordan and Wonewoc geological formations.

Land Use Land Cover:

Agricultural (herbaceous planted/cultivated) land, hay, cultivated crops, forest, and human development will all play a role in determining suitable mining locations.

Distance to railroad terminals:

Just as in exercise 7, the distance to railroad terminals will have logistical impacts on the county, as well as financial costs for both the county and the mining corporation.

Slope:

The slope of the land makes mining easier or harder depending on how steep it is.

Water table criteria:

Sand mining requires the use of water, and as such prefers locations that have a water table close to the surface.

Each of these features had to be converted to a raster and then reclassified using a ranking of our choosing, based on information about the mining process. The classes were ranked as follows: 3 = high (most desirable), 2 = medium, and 1 = low (least desirable).


Feature | Attributes | Rank | Justification for Ranking
Geology suitability | Jordan/Wonewoc geological formations | 3 (Most) | Geology that is desired
Geology suitability | All other geological formations in Trempealeau County | 1 (Least) | Non-desired geology
Land cover suitability | Open Space, Barren Land, Shrub, Scrub, Herbaceous, Hay, Pasture | 3 (Most) | Non-developed space that can easily be replaced
Land cover suitability | Low Intensity Development/Cultivated Crops | 2 (Mid) | Can be replaced with some cost or effort to set back to normal (other than time and natural processes)
Land cover suitability | Open Water, Medium to High Intensity Developed, Deciduous Forest, Evergreen Forest, Mixed Forest, Woody Wetlands, Emergent Herbaceous Wetlands | 1 (Least) | Would require substantial effort to return to pre-mine condition, or would otherwise be impossible to mine
Land cover exclusion | Open Space, Barren Land, Shrub, Scrub, Herbaceous, Hay, Pasture, Low Intensity Development/Cultivated Crops, Forest, Medium to High Intensity Developed, Woody Wetlands | 1 |
Land cover exclusion | Open Water, Emergent Herbaceous Wetlands, Woody Wetlands | 0 | Physically impossible locations
Distance from closest rail terminal (meters) | 30-12,000 | 3 (Most) | Closest distance to rail terminals
Distance from closest rail terminal (meters) | 12,001-20,770 | 2 (Mid) |
Distance from closest rail terminal (meters) | 20,771-32,877 | 1 (Least) | Farthest distance to rail terminals
Distance from closest rail terminal (meters) | | 0 | Mines should not be located
Slope | 0%-5.1% | 3 (Most) | Least amount of slope, easiest to mine
Slope | 5.2%-12.4% | 2 (Mid) |
Slope | 12.5%-37.3% | 1 (Least) | Most amount of slope, hardest to mine
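The reclassification itself was done with ArcMap's tools. As a rough, tool-independent illustration of what that step does, the sketch below applies the slope and rail-distance breaks from the table to a small grid of cell values. It is plain Python with nested lists standing in for rasters (an assumption for readability, not how ArcMap stores data), and the names `rank_value`, `reclassify`, `SLOPE_BREAKS`, and `RAIL_DISTANCE_BREAKS` are hypothetical.

```python
# Reclassify cell values into suitability ranks (3 = most, 1 = least).
# Each class is given as (upper_bound, rank); a value receives the rank of the
# first class whose upper bound it does not exceed. Values above the last
# bound are returned as None, standing in for NoData.

SLOPE_BREAKS = [(5.1, 3), (12.4, 2), (37.3, 1)]  # percent slope
RAIL_DISTANCE_BREAKS = [(12000, 3), (20770, 2), (32877, 1)]  # meters


def rank_value(value, breaks):
    """Return the rank for one cell value, or None if it is out of range."""
    for upper, rank in breaks:
        if value <= upper:
            return rank
    return None


def reclassify(grid, breaks):
    """Apply rank_value to every cell of a 2-D grid (a list of rows)."""
    return [[rank_value(v, breaks) for v in row] for row in grid]


if __name__ == "__main__":
    slope_grid = [[2.0, 7.5], [15.0, 40.0]]  # hypothetical slope values (%)
    print(reclassify(slope_grid, SLOPE_BREAKS))  # [[3, 2], [1, None]]
```

Matching on upper bounds only is a design choice of this sketch: a value that falls in the small gaps between the listed ranges (for example a 5.15% slope) goes to the next class instead of becoming NoData.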

Results



Figure 1 (left) and Figure 2 (right). Figure 1 shows the distance to the closest rail terminals across Trempealeau County. Looking back at exercise 7, we have already begun to understand the impacts logistics and transportation can have on an area. The farther the distance traveled, the higher the cost for mining companies and the greater the damage to local infrastructure. In this map, a rank of 3 denotes the area where the least amount of travel would occur, maximizing efficiency for a mining company and minimizing damage to infrastructure; this would be the best area to locate a new mine. Figure 2 shows the land cover of Trempealeau County that would be excluded in the final model because it is physically impossible to mine. Areas such as open water, wetlands, and highly developed areas were all excluded in the final model.

Figure 3 (left) and Figure 4 (right). Figure 3 shows the geological formations of interest in Trempealeau County: the Jordan/Wonewoc formations received a rank of 3 (most desired), while all other formations received a rank of 1 (least desired). Figure 4 shows the land cover classes and their suitability rankings, from most to least desired. For the reasons those features were chosen and the rankings assigned to each, please see the table above.

Figure 5 (left) and Figure 6 (right). Figure 5 shows the slopes of Trempealeau County, ranked from easiest to mine (rank 3) to most difficult to mine (rank 1). Figure 6 shows classes of water table depth based on how easily the water table could be accessed for the mining process, again with a rank of 3 marking the areas with the easiest access.



Figure 7. This map shows all the criteria from Figures 1-6 combined using the Raster Calculator to give an index of suitability for the county. Yellow indicates areas that are least desirable to mine, while blue shows areas that would be best to mine based on all of the above criteria. This map shows which areas would likely be chosen if a new mine were to be created.
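The exact Raster Calculator expression is not given here. A minimal sketch, assuming (hypothetically) that the ranked layers were added together and the total multiplied by the 0/1 land cover exclusion layer, could look like this. Nested lists stand in for rasters and None stands in for NoData; note how a missing value in any input leaves a gap in the output.

```python
# Combine ranked layers into a suitability index, assuming (hypothetically)
# that the ranks are added cell by cell and the total is multiplied by a
# 0/1 exclusion layer (0 = physically impossible to mine).
# None stands in for NoData: if any input is missing, the output is missing.


def suitability_index(ranked_layers, exclusion):
    """ranked_layers: list of 2-D grids of ranks; exclusion: 2-D grid of 0/1."""
    rows, cols = len(exclusion), len(exclusion[0])
    index = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            values = [layer[r][c] for layer in ranked_layers]
            if None in values or exclusion[r][c] is None:
                out_row.append(None)  # a gap in any input -> a gap in the output
            else:
                out_row.append(sum(values) * exclusion[r][c])
        index.append(out_row)
    return index


if __name__ == "__main__":
    geology = [[3, 1], [3, 3]]  # hypothetical ranks
    land_cover = [[3, 2], [1, None]]
    exclusion = [[1, 1], [0, 1]]
    print(suitability_index([geology, land_cover], exclusion))  # [[6, 3], [0, None]]
```

Higher totals mean more suitable cells, and multiplying by the exclusion layer forces physically impossible cells to 0 regardless of their other ranks.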

Sand Mining Impact Criteria

As with the suitability model above, knowing a bit about the makeup of the county lets us predict how sand mining will impact certain areas. We wanted to look at the impact not only on the environment but also on people, so we constructed a risk model that includes a noise/dust shed as well.

Feature | Attributes | Rank | Justification
Streams | 1st- through 3rd-order streams | 3 | Smaller streams are more susceptible to the impacts of sand mining, as they discharge less water and may not be able to handle an increased sediment load
Streams | 4th- through 6th-order streams | 2 |
Streams | Greater than 6th order | 1 | The Mississippi is a 9 and the Amazon is a 12 on this scale. These rivers would be less affected by added sediment.
Prime farmland | Highly erodible | 3 | Erodible land should not be mined, as mining will lead to greater erosion of the landscape, which may affect vegetation, agriculture, road networks, and more
Prime farmland | Potentially highly erodible | 2 | An acceptable place to mine, but may still cause environmental damage if erosion does occur
Prime farmland | Not highly erodible | 1 | Best place to mine, least impact
Zoning | Residential, any type (30-2,000 meters) | 3 | People would notice the impact of mining most in their daily lives
Zoning | Agricultural | 2 | The impact may fall more on animals, but finding sand in your corn is not good either
Zoning | Industrial/Commercial/Utility/Major road | 1 | Noise and dust won't add much to factories or major roadways
Schools (meters) | 0-4,000 | 3 | Children exposed to dust and noise that disrupt learning. For perspective, 4,000 meters is about 2.5 miles.
Schools (meters) | 4,000-8,000 | 2 | 8,000 meters is about 5 miles.
Schools (meters) | 8,000-11,582 | 1 | About 7 miles. While this is the best of the three distance classes, it is important to note that any mine built in the county will be at most about 7 miles from a school, no matter where it is placed.
Wildlife areas (meters) | 30-5,000 | 3 | Similar breakdown to schools, with close proximity having the most impact. Wildlife areas use larger distances than schools because there are not as many of them.
Wildlife areas (meters) | 5,000-10,000 | 2 |
Wildlife areas (meters) | 10,000-15,700 | 1 | 15,700 meters is just under 10 miles.


Figure 8 (left) and Figure 9 (right). Figure 8 shows the areas of Trempealeau County that are zoned residential, which would experience the greatest impact if mines were to open in those locations. In this map, a rank of 3 designates areas of highest impact risk. Residential zoning was used instead of land cover because land cover is somewhat ambiguous: it only designates development and does not state whether that development is residential. Figure 9 shows the distance from schools out toward the edges of the county. The Euclidean Distance tool did not run to the full extent of the county; the implications of this are noted in the discussion below.

Figure 10 (left) and Figure 11 (right). Figure 10 shows distance classes around wildlife areas, from least to greatest potential impact on those areas. As with the other distance maps, the closer to a wildlife area a mine is built, the higher the impact on that area. Figure 11 shows the streams of Trempealeau County. The model was supposed to use distance from the streams most susceptible to impact, but due to issues with the Feature to Raster tool (noted in the discussion), the streams themselves had to be used.




Figure 12 (left) and Figure 7 (right, from above). Figure 12 shows the areas of Trempealeau County that would be most affected by a new mine. Yellow and low numbers designate areas of least impact, while blue designates areas of highest impact. Comparing the two maps, the areas best suited for mining in Figure 7 (blue) generally coincide with the areas of highest impact in Figure 12. A new mine would most likely be located where blue areas in Figure 7 overlap yellow areas in Figure 12.



For the weighted risk model, a Python script was supposed to be used, but unfortunately it would not run (most likely due to my error), so the model was run directly in ArcMap using the same equation. The factor chosen to weigh more heavily in the model was wildlife areas. Below are the Model Builder models for both the weighted model and the viewshed tool.








Figure 13 (left) and Figure 12 (right). Figure 13 is the weighted impact model, which emphasizes one of the criteria used in calculating the impact model over the others. The weight can be chosen by the user; in this case it is a factor of 1.5, applied to wildlife areas. Wildlife areas were chosen because of potential legal issues at the federal level: if any of these areas contain migratory birds, it would be a nightmare for a mining company to establish a mine near that location due to the Migratory Bird Treaty Act of 1918. It is important to note that at this point it is unknown whether migratory birds use these areas. I chose wildlife areas because I felt this point might be overlooked: residential and farm areas inherently have advocates in the people who live there, while wildlife areas may not.
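As a hedged illustration of the weighting, assuming the weighted model is the same cell-by-cell sum as the unweighted impact model with only the wildlife layer multiplied by 1.5, the calculation could look like the sketch below. The function name and layer keys are hypothetical, and this is not the script that failed to run.

```python
# Weighted impact index. Every layer counts once except those listed in
# `weights`, whose ranks are multiplied by the given factor (here,
# hypothetically, wildlife areas x 1.5). None stands in for NoData.


def weighted_impact(layers, weights):
    """layers: dict of layer name -> 2-D grid of ranks (None = NoData).
    weights: dict of layer name -> multiplier; unlisted layers use 1.0."""
    names = list(layers)
    first = layers[names[0]]
    result = []
    for r in range(len(first)):
        row = []
        for c in range(len(first[0])):
            cell = {n: layers[n][r][c] for n in names}
            if None in cell.values():
                row.append(None)  # missing input -> missing output
            else:
                row.append(sum(v * weights.get(n, 1.0) for n, v in cell.items()))
        result.append(row)
    return result


if __name__ == "__main__":
    one_cell = {  # hypothetical ranks for a single cell
        "streams": [[3]], "farmland": [[2]], "zoning": [[1]],
        "schools": [[3]], "wildlife": [[2]],
    }
    print(weighted_impact(one_cell, {}))                 # [[11.0]]
    print(weighted_impact(one_cell, {"wildlife": 1.5}))  # [[12.0]]
```

With a factor of 1.5, a wildlife rank of 3 contributes 4.5 instead of 3, so cells near wildlife areas rise in the impact index relative to otherwise identical cells.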


The viewshed tool calculates which areas are visible from a chosen observer point. For this model we were looking at the impact a new mine might have on something a little more intangible: the scenic view of the area.
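To make the idea concrete, here is a simplified line-of-sight test along a single elevation profile, which is the basic calculation a viewshed repeats in every direction from the observer. It is a sketch under several assumptions: a 1.7 m observer eye height, equally spaced cells, and no Earth curvature, refraction, vegetation, or buildings. ArcMap's Viewshed tool has its own options for observer offsets and curvature correction that are not modeled here.

```python
# Line-of-sight along one elevation profile: the core test a viewshed repeats
# in every direction from the observer. A cell is visible if the slope from
# the observer's eye to that cell is at least as steep as the steepest slope
# to any cell in between. Assumes equally spaced cells and ignores Earth
# curvature, refraction, vegetation and buildings.


def visible_cells(profile, cell_size, observer_height=1.7):
    """profile: ground elevations (m) starting at the observer's cell.
    Returns a list of booleans, one per cell."""
    eye = profile[0] + observer_height
    visible = [True]  # the observer's own cell
    steepest = float("-inf")
    for i in range(1, len(profile)):
        slope = (profile[i] - eye) / (i * cell_size)
        visible.append(slope >= steepest)
        steepest = max(steepest, slope)
    return visible


if __name__ == "__main__":
    # hypothetical profile sampled every 10 m: the 105 m rise hides the dip
    # behind it, but the 120 m hilltop is still visible
    print(visible_cells([100, 100, 105, 100, 120], 10))
    # [True, True, True, False, True]
```

Because only ground elevations enter the calculation, a stand of trees or a building between the observer and a mine would not block the view in this sketch, which mirrors the limitation discussed below.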


Figure 14. The viewshed tool. This map shows what would be visible from a chosen scenic point; for the scenic point I chose Tamarack Creek Wildlife Area in Trempealeau County. It is important to note that the viewshed tool has limitations, which are discussed below.

Discussion:

In full disclosure, some issues that occurred while building this model did affect the outcome. This does not mean the outcome is biased, but it should be understood in light of these issues. First, the Raster Calculator in ArcMap did not run the model to the full extent of the county. The issue seems to stem from the school distance layer, which was truncated to a smaller extent than the county for an unknown reason. Because the school layer had no values in these places, the Raster Calculator produced the gaps seen in the maps. The fix for this issue is unknown at this time, but it is possible that the mask for this raster was set to different limits than for all of the other layers, which resulted in the truncation. Rerunning the tool many times with different masks did not fix the issue.
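One way to test the mask/extent explanation is to compare each input layer's extent against the county's before running the Raster Calculator. The sketch below does this with bounding boxes given as (xmin, ymin, xmax, ymax) in the same projected units; the layer names and coordinates are hypothetical. In ArcMap, the environment settings that control the output area of Euclidean Distance are the Processing Extent and Mask, so setting the extent to the county boundary is a reasonable thing to try, although whether that was the cause here is an assumption.

```python
# Check whether each input layer covers the full county before combining
# layers. Extents are (xmin, ymin, xmax, ymax) in the same projected units.


def layers_short_of_extent(reference, layers):
    """Return the names of layers whose extent does not fully cover reference."""
    rx0, ry0, rx1, ry1 = reference
    short = []
    for name, (x0, y0, x1, y1) in layers.items():
        if x0 > rx0 or y0 > ry0 or x1 < rx1 or y1 < ry1:
            short.append(name)
    return short


if __name__ == "__main__":
    county = (0, 0, 100, 100)  # hypothetical extents
    inputs = {
        "streams": (0, 0, 100, 100),
        "school_distance": (0, 0, 80, 100),  # truncated on the east side
        "wildlife_distance": (-5, -5, 105, 105),
    }
    print(layers_short_of_extent(county, inputs))  # ['school_distance']
```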

Second, the Feature to Raster tool caused much anguish while constructing the model. The tool would run for 15-20 minutes, often failing to complete and crashing the program. During one of these crashes, the first map document was corrupted and would no longer open, taking the Model Builder model for the first section with it. Because this tool did not work well, the model changed in two ways. First, the distance from residential areas could not be calculated, so the residential areas themselves were used. Second, and similarly, the streams layer was supposed to consist of the streams that would be most heavily impacted by a nearby mine. This did not happen, so the model's results differ from what an analysis of distance from the streams would have produced.

For map interpretation, this means that the zoning and streams layers should have been different: the Euclidean Distance tool should have been run on residential areas and on streams of certain sizes only, to measure impact as a function of distance from both of these layers.

Lastly, there is an issue with the viewshed tool, although this issue lies in how the tool is designed. The viewshed tool is limited in a few ways. First, without Z data giving actual elevations, the model's predictions will be off. Second, even where elevation can be estimated, the tool does not take vegetation or structures into account; it considers only the elevation of the point the viewshed is run from, the elevation of the surrounding terrain, and the curvature of the Earth.

Again, the results of the impact model are limited, but it is important to understand those limitations, how the issues arose in the model, and how they can be fixed in the future.

Understanding the limitations has an ethical component: as map makers in the GIS industry, we must not mislead people who view our content or misrepresent our results.

Letting the GIS community, and people in general, know the limitations of the model accomplishes two things. First, as stated above, it ensures that the results are not misrepresented. Second, it allows others in the GIS community to build on the model or help correct any errors that have occurred.

Conclusion:

While not perfect, these maps show off the modeling potential of GIS. We can take data and, using only known information, attempt to understand or explain geospatial phenomena of all kinds. This extends to other disciplines as well as everyday problems, and shows why GIS is becoming such an important tool in our everyday lives.

While they show only a partial picture, these maps help shine a light on some of the impacts that may occur if new frac sand mines were built in Trempealeau County. Knowing the impacts, people at both the local and county levels can make more informed decisions about where a mine should be located and about its potential impacts on the general population.

Sources:
Wisconsin DNR
Trempealeau County Geodatabase (from the Trempealeau County website)



Thursday, April 21, 2016

Network Analysis

Introduction

The goal of this project is to perform network analysis in order to calculate the impact of trucks on local roads as they travel between mines and rail terminals in Western Wisconsin.

As one could imagine, the increased traffic of trucks moving from mines to railroad terminals has sped up the degradation of local roads in Western Wisconsin. Whether the trucks are full of sand or empty, the routes from mines to rail terminals will be traversed many times per day, causing increased wear and tear on local infrastructure that may not have been designed for that level of traffic. The network analysis tools built into ArcMap allow us to determine the fastest and most likely routes between the mines and the rail terminals. With this information, we can model and estimate the amount of increased truck traffic along specific routes, and then attempt to capture the true cost of the routes and the upkeep that local municipalities will pay for.

One may think that the impacts of increased traffic would be negligible. However, according to current industry analysis in the white paper Transportation Impacts of Frac Sand Mining in the MAFC Region: Chippewa County Case Study, "At full build-out, the frac sand mining industry will be characterized by mining twenty-four hours a day, five days a week, heavy truck moves over rural roads, and unit or manifest trains moving approximately 40 million tons of sand a year..." (5). The report also notes that current investment levels point to a "20 to 30 year life span of the emerging frac sand industry" (5).

As you can see, the impact on the road system will be neither small nor short-lived, with trucks moving back and forth across these roads almost constantly. The white paper also notes that "Wisconsin serves as a model of how local government is using road use or road upgrade maintenance agreements (RUMA) to recover road damages, fund maintenance, and grade crossing improvement" (5).

But we cannot consider only the transportation of frac sand. The construction of the mines, the hauling of heavy mining equipment and waste, and other needs, since "well construction, cement, steel pipes, rig infrastructure, as well as mobile offices are needed" (7), also contribute to road impacts through truck transport.

As an example of the true impact of trucking, the white paper predicts "a conservative estimate of truck moves associated with a single well consist of 1,340 one-way truck trips to establish the well, or 2,680 round trip truck movements" (8). The white paper also estimates the number of truckloads for various mine activities (Fig 1) and the truck impacts on local infrastructure by type of truck movement (Fig 2).


Fig 1. Low and high estimates of typical number of truck loads to complete various activities involved with mining. These estimations are cited in the Transportation Impacts of Frac Sand Mining in the MAFC region: Chippewa County Case Study (8). See sources below.
Again, truck transport involves more than just frac sand. Each facility requires site-specific equipment and infrastructure to keep the site running, in addition to the movement of replacement parts and heavy mining equipment. Fig 2 illustrates some of what these additional trips may be.


Fig 2. Types of sand mining operations and transportation impacts based on the movement of trucks inside or outside of the mine. These estimations are cited in the Transportation Impacts of Frac Sand Mining in the MAFC region: Chippewa County Case Study (8). See sources below.
With all of the different needs of the mine, as well as the movement of frac sand, one could imagine a substantial amount of wear and tear on local roads. Once the frac sand is trucked from a mine site, it is driven to a rail terminal, loaded onto a rail car, and sent to where it is needed for hydraulic fracturing. The map below illustrates this movement in Chippewa County, where rail terminals can only be reached by frac sand trucks traveling on State Highways and local roads (Fig 3).

Fig 3. Chippewa County Frac Sand Facilities. It is important to note that almost all facilities, rail or mine, are situated on a State Highway. Rail terminals tend to be located more centrally, near a city, while mines tend to be farther away from that city. See sources below.


In this analysis we will begin to look at what this impact may be using a hypothetical model that focuses only on the impact of frac sand transportation. While a mine is being established and while it is active, the amount of equipment coming in is relatively small compared to the amount of frac sand going out. Most of the heavy equipment is brought in when the mine opens, and it probably won't be removed or moved unless it is replaced or until the mine becomes inactive and shuts down. Frac sand trucking, by contrast, as stated previously, never stops and occurs at a much greater rate than the movement of equipment. It will therefore have a greater impact on local infrastructure and is easier to demonstrate.
 
It is important to note that our analysis of trips and cost is hypothetical and does not reflect real-world data. Rather, it demonstrates the process of building such a project and analyzing this set of information, which can then be applied to real-world data and examples.

Methods

For our hypothetical model, we will use the data we have gathered over the semester (published in previous blog posts) as well as a new analysis we had not previously been exposed to: network analysis.

For our model we will use the following information: the trucks transport frac sand from the mines to the terminals 50 times per year (100 trips total, counting the return trip), and the cost of the trucks' impact on the road is $0.022 per mile.

Using network analysis, ArcMap can calculate the routes most likely to be used by the frac sand mining industry to transport sand from mine to rail terminal. This essentially works on the premise that the most direct routes are the most cost-effective, and will therefore be the routes taken by the trucks when the frac sand is moved.
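Under the hood, a closest facility solve is a shortest-path search over the street network. The sketch below is a multi-source Dijkstra search in plain Python that assigns each mine to its nearest rail terminal by network distance. The tiny graph, node names, and edge lengths are hypothetical, and ArcMap's Network Analyst uses its own solver and impedance settings (for example, travel time instead of distance), so this only illustrates the idea.

```python
import heapq


def closest_facility(graph, incidents, facilities):
    """Assign each incident (mine) to its nearest facility (rail terminal).

    graph: dict mapping node -> list of (neighbor, edge_length); two-way
    streets must be listed in both directions, because the search runs
    outward from the facilities.
    Returns {incident: (nearest_facility, network_distance)}; unreachable
    incidents get (None, None).
    """
    # Multi-source Dijkstra: start from every facility at distance 0 and
    # remember which facility each node was first reached from.
    dist = {f: 0.0 for f in facilities}
    nearest = {f: f for f in facilities}
    heap = [(0.0, f) for f in facilities]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry; a shorter path was already found
        for neighbor, length in graph.get(node, []):
            new_d = d + length
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                nearest[neighbor] = nearest[node]
                heapq.heappush(heap, (new_d, neighbor))
    return {m: (nearest.get(m), dist.get(m)) for m in incidents}


if __name__ == "__main__":
    streets = {  # hypothetical network, edge lengths in miles
        "M1": [("X", 2), ("T2", 10)],
        "X": [("M1", 2), ("T1", 5), ("M2", 6)],
        "T1": [("X", 5)],
        "T2": [("M1", 10), ("M2", 1)],
        "M2": [("T2", 1), ("X", 6)],
    }
    print(closest_facility(streets, ["M1", "M2"], ["T1", "T2"]))
    # {'M1': ('T1', 7.0), 'M2': ('T2', 1.0)}
```

Mine M1 has a direct 10-mile road to terminal T2, but the route through X to T1 is only 7 miles, so the search assigns it to T1: the "most direct route" premise is about network distance, not about which terminal is connected most simply.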

To automate the process and complete the analysis faster, we developed this project in two parts: first, a Python script that selects the mines of interest in Western Wisconsin; and second, a network analysis model in Model Builder that automates the process of determining truck routes.

With both the Python script and the model, new sites could be added to this analysis to reflect changes in mines over time, allowing the analysis to continue with new information. Additionally, this analysis could be validated by sharing either the Python script or the model with someone else to undertake their own analysis.


For the first portion of this project we developed a Python script that would:
  1. Select facilities that were in active status
  2. From the list of total facilities, select the sites that were actually mines
  3. Create a feature class of facilities that were both active and mines, which we could use for our analysis
  4. Select mines that are not within 1.5 km of their own rail terminal

For a picture of the actual script that was used, please see the Python Code page.

The first steps of the script narrowed a larger list of facilities down to mines in active status and created a feature class of those facilities for use in our later analysis.

The second part of the Python script was necessary because we are interested only in mines that are not within 1.5 km of a rail terminal: mines within that distance have developed their own rail spur. Having a rail spur allows those mines to load frac sand directly onto rail cars, so they do not need trucks to transport the sand off site and do not impact local road infrastructure. We have therefore taken them out of our model.
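The actual script is on the Python Code page and is not reproduced here. As a stand-in, the sketch below shows the same filtering logic in plain Python: keep facilities that are active and are mines, then drop any mine within 1.5 km (straight-line) of a rail terminal. The field names (`name`, `status`, `type`, `x`, `y`) are hypothetical, coordinates are assumed to be projected in meters, and the original script presumably used ArcPy selection tools rather than this pure-Python loop.

```python
import math


def mines_needing_trucks(facilities, rail_terminals, buffer_m=1500):
    """Return names of active mines farther than buffer_m from every terminal.

    facilities: list of dicts with hypothetical keys 'name', 'status',
    'type', 'x', 'y' (projected coordinates in meters).
    rail_terminals: non-empty list of (x, y) tuples in the same system.
    The returned list plays the role of the feature class from step 3.
    """
    selected = []
    for f in facilities:
        if f["status"].strip().lower() != "active":  # step 1: active only
            continue
        if "mine" not in f["type"].lower():  # step 2: mines only
            continue
        # step 4: straight-line distance to the closest rail terminal
        nearest = min(math.dist((f["x"], f["y"]), t) for t in rail_terminals)
        if nearest > buffer_m:  # mines within 1.5 km are assumed to have a spur
            selected.append(f["name"])
    return selected


if __name__ == "__main__":
    sites = [  # hypothetical records
        {"name": "A", "status": "Active", "type": "Mine", "x": 0, "y": 0},
        {"name": "B", "status": "Active", "type": "Mine/Rail", "x": 5000, "y": 0},
        {"name": "C", "status": "Inactive", "type": "Mine", "x": 10000, "y": 0},
        {"name": "D", "status": "Active", "type": "Processing", "x": 20000, "y": 0},
    ]
    print(mines_needing_trucks(sites, [(6000, 0)]))  # ['A']
```

Site B is an active mine only 1,000 m from the terminal, so it is treated as having its own spur and dropped; C is inactive and D is not a mine.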

Once our sites were chosen, we added a network dataset of streets to ArcMap. The network dataset allows us to determine the routes from mines to rail terminals using the Make Closest Facility Layer tool.

For this step of the analysis we used Model Builder (Fig 4), which runs all the tools added to the model in sequence, much faster than running each tool individually.
Fig 4. Model Builder of the Network Analysis for Part 2. 

The first step in the model was to select all of the facilities that had 'rail' in their listed type, so that we would know this set of facilities was indeed rail terminals, and then to make a feature class of those rail facilities.

Then we began the network analysis by creating a closest facility layer. After running the Make Closest Facility Layer tool, we had to run the Add Locations tool twice: once for the facilities (rail terminals) and once for the incidents (mines). Then we could use the Solve tool to get a solved Routes feature class. By projecting the Routes feature class into WGS 1984 UTM Zone 15N, we could express route lengths in meters, a unit that is easily converted to miles, rather than in decimal degrees.

By intersecting the projected Routes feature class with a counties feature class, we could then run the Summary Statistics tool to determine the total route length in each county.

The output of the Summary Statistics tool was a table giving the summed meters traveled by trucks in each county. Using the Add Field tool in Model Builder, we added a field that converted this distance from meters to miles. We then added two more fields that calculated the cost of the miles traversed in each county and how much the trucks' wear and tear would amount to in dollars over the course of a year. The equation we used in the Calculate Field tool was:
  • Cost of Travel = Route Miles * Cost to Infrastructure * Truck Trips per year * 2 (round trip!)
  • or: Cost of Travel = Route Miles * 0.022 * 50 * 2
The table is shown below (Fig 6).
Fig 6. Table with added fields for conversion of km to miles and calculation of cost and total costs per county.
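The field calculations above can be reproduced outside ArcMap as a quick check. This sketch converts summed route meters to miles (1 mile = 1,609.344 m) and applies the cost equation, defaulting to the model's values from this post ($0.022 per mile, 50 trips per year). The function name is hypothetical, and the county name and route length in the usage line are made up for illustration.

```python
METERS_PER_MILE = 1609.344


def county_costs(route_meters, cost_per_mile=0.022, trips_per_year=50):
    """route_meters: dict of county -> summed route length in meters.
    Returns dict of county -> (route_miles, yearly_cost_dollars), using
    cost = miles * cost_per_mile * trips_per_year * 2 (round trip)."""
    table = {}
    for county, meters in route_meters.items():
        miles = meters / METERS_PER_MILE
        table[county] = (miles, miles * cost_per_mile * trips_per_year * 2)
    return table


if __name__ == "__main__":
    # hypothetical county with 10 miles (16,093.44 m) of truck route
    miles, cost = county_costs({"Example": 16093.44})["Example"]
    print(round(miles, 2), round(cost, 2))  # 10.0 22.0
```

For example, 10 route miles gives 10 × 0.022 × 50 × 2 = $22 per year, which shows how strongly the total depends on route length once the per-mile cost and trip count are fixed.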

Results and discussion

While the table in ArcMap works and has all of the information, cleaning it up and using it to make some graphs will help us understand how local infrastructure is being impacted. To clean up the table, the information was copied from ArcMap into Excel (Fig 7).

Fig 7. The table from Fig 6, copied into Excel and cleaned.
After creating the table in Excel, we made graphs to visualize the data and better understand what the calculations done in Model Builder were actually telling us.

In the Frequency of Facilities Per County graph, we can see that most counties have fewer than 10 facilities, some with under 5, while a few counties have more than 10. The counties with the most facilities are Barron, Chippewa, Trempealeau, and Wood.


Fig 8. Frequency of Facilities Per County. Interestingly, most counties have under 10 facilities (including both mines and rail terminals), while Wood and Barron Counties have between 10 and 20, and Chippewa and Trempealeau Counties have the most. Trempealeau County has almost 1.5 times as many facilities as the next-highest county, Chippewa.
From the Total Route Miles Per County graph, we can see that the counties with the most facilities in Fig 8 also have the most route miles. However, although counties with more facilities generally have more route miles than counties with fewer facilities, a higher number of facilities does not necessarily mean more route miles. Chippewa and Trempealeau Counties exemplify this: both are high-facility counties, but Trempealeau, which has more facilities, has fewer route miles than Chippewa.

Fig 9. Total Route Miles Per County. Here we again see Chippewa and Trempealeau Counties with a higher number of route miles, but interestingly Trempealeau County has about half the route miles of Chippewa County. This indicates that while Trempealeau has more facilities, they are located much closer to rail terminals than those in Chippewa County. Barron and Wood Counties also have a higher number of route miles.

However, compared with the route miles and facility frequency graphs, the Cost of Truck Trips Per Year graph drives home the point that the more route length a county has, the higher the cost, in dollars, of the frac sand industry's impact on its local roads.

Fig 10. Cost of Truck Trips Per Year. Following a trend similar to the route miles per county graph (Fig 9), we can see that the county most impacted by frac sand mining, in terms of road wear and tear predicted by our model, is Chippewa County. Trempealeau, Barron, and Wood Counties also have higher costs than the other counties in Western Wisconsin.
Fig 11. Map of Trucking Impacts on Wisconsin Counties. The counties in green are low impact counties and will thus have the lowest wear and tear costs on roads, while counties in red have the highest wear and tear and thus the highest cost. 

As we can see from the graphs (Figs 8, 9, and 10), the number of facilities a county has does not directly translate into increased road maintenance costs from frac sand transportation. It is perhaps more important to realize that the location of the rail terminals dictates the amount of damage: in the case of Trempealeau County, a centrally located rail terminal limits the amount of road use. The takeaway is really about planning. If a county is planning to expand or create mining operations and frac sand transportation to rail terminals, carefully choosing the location of the rail terminal will limit the cost of maintaining the roads the trucks frequent.

With the addition of the network analysis map, we can visually highlight which counties have the highest costs and where the mines are placed relative to the rail terminals. The graphs and the map reinforce each other and drive home the point that the spatial distribution of mines and terminals, not the number of facilities, is the main determinant of road cost.

Conclusions

This exercise serves as an example of the power of ArcMap for analyzing geospatial phenomena. The user's understanding of ArcMap functions is paramount in solving real-world problems.

Again, this is still a hypothetical model of potential impacts, which demonstrates only a portion of the traffic these roads receive annually from the mining industry. But even with this limited information, we can use ArcMap as a tool to understand the impacts of future decisions and evaluate future plans.

Another outcome of this analysis concerns the road upgrade maintenance agreements (RUMA) between counties and mines, which recover the cost of road damage while a mine is in operation. Knowing where the rail terminals in each county are located can help counties negotiate contracts with mining companies that recover the true cost of damage to county roads.

Additionally, we can use this model to determine where a new terminal could be placed to limit the amount of damage caused to roads. This would make both the mining companies happy, by decreasing their expenses under the RUMA, and the county happy, because it would not have to pay as much for road repair.



Sources:

  1. http://midamericafreight.org/wp-content/uploads/FracSandWhitePaperDRAFT.pdf
  2. ESRI street map USA is the source for the Network Dataset.
  3. The number of truck trips and cost of truck traffic on county roads was provided by Dr. Hupy

Friday, April 8, 2016

Data Normalization, Geocoding, and Error Assessment

Goals and objectives

The goal of this assignment was to geocode addresses from a table provided by the Wisconsin DNR to locate frac sand mines and frac sand transportation facilities on a map of Wisconsin. After geocoding those sites, we compared the geocoding results to actual site location data and analyzed any discrepancies between the data sets. This may sound like an easy task; however, it proved to be quite the opposite.

Methods

The information provided by the Wisconsin DNR was not as robust as one would hope. The table captured the basic name and property status of each facility, along with the site type and whether the site was active, but the location information, specifically the site address, was in shambles. Some locations were recorded in PLSS notation, other sites had incomplete or missing address information, and some had no address information at all.

 
Fig 1. Example of the information provided by the Wisconsin DNR. The Address column (yellow) shows the jumble of formats that site addresses came in.




So the first step naturally became normalizing the data in the table. Geocoding the sites required complete address information, so research was done to find that information by locating the sites and their addresses through the PLSS records, using the PLSS Finder on the Wisconsin State Cartographer's Office website (see References). Once the correct PLSS township and range were located, the individual section and subsection could be identified, which narrows down the zone in which the site would be located.
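To illustrate what normalizing a PLSS entry involves, here is a minimal sketch that pulls the township, range, section, and quarter-section calls out of a free-text description. The input format (something like "NE1/4 SW1/4 Sec 12, T23N, R10W") is my assumption about how such entries might look, not the DNR's actual schema, and parse_plss is a hypothetical helper. The result is only a lookup key for a tool like the PLSS Finder, not a street address.

```python
import re
from typing import Optional

# Hypothetical patterns: assumes PLSS text like "NE1/4 SW1/4 Sec 12, T23N, R10W".
# Real DNR entries may use other spellings, so these would need extending.
QUARTER = re.compile(r"\b(NE|NW|SE|SW)\s*1/4", re.IGNORECASE)
SECTION = re.compile(r"\bSEC(?:TION)?\.?\s*(\d{1,2})\b", re.IGNORECASE)
TOWNSHIP = re.compile(r"\bT\.?\s*(\d{1,3})\s*N\b", re.IGNORECASE)
RANGE = re.compile(r"\bR\.?\s*(\d{1,2})\s*([EW])\b", re.IGNORECASE)


def parse_plss(text: str) -> Optional[dict]:
    """Return township/range/section/quarters, or None if a core part is missing."""
    sec = SECTION.search(text)
    twp = TOWNSHIP.search(text)
    rng = RANGE.search(text)
    if not (sec and twp and rng):
        return None  # incomplete description: set aside for manual research
    return {
        # All Wisconsin townships lie north of the PLSS baseline, so only "N" is matched
        "township": int(twp.group(1)),
        "range": int(rng.group(1)),
        "range_dir": rng.group(2).upper(),
        "section": int(sec.group(1)),
        # Quarter calls in the order written (conventionally smallest division first)
        "quarters": [q.upper() for q in QUARTER.findall(text)],
    }
```

For example, parse_plss("NE1/4 SW1/4 Sec 12, T23N, R10W") yields township 23, range 10 W, section 12, quarters ["NE", "SW"], while an entry with no PLSS parts returns None and gets flagged for hand research.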

Fig 2. PLSS Finder via the Wisconsin State Cartographer's Office. The information for the zoomed-in PLSS zone is listed under the Township/Range/Section Search on the left-hand side of the picture.


Some of the sites that did have address information still required more research to relocate, because the address was not specific or accurate enough to geocode the site correctly. These addresses were usually on a highway or county road that could have multiple names. When entered into the geocoding tool in ArcMap, the correct road was found, but the address was not geolocated in the correct place.
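Because one highway or county road can go by several names, one way to reduce this problem is to map the variants onto a single canonical form before geocoding. The sketch below assumes a few common Wisconsin-style spellings (CTH, County Rd, STH, WIS, USH); the alias table and the canonical_road function are hypothetical and would need to be built from the actual data and from whatever names the geocoder's reference data uses.

```python
import re

# Hypothetical alias table: (prefix pattern, route identifier pattern, canonical prefix).
ALIASES = [
    # Wisconsin county trunk highways are lettered (e.g. "K", "KK")
    (r"(?:CTH|CTY\s+(?:RD|HWY)|COUNTY\s+(?:ROAD|RD|HIGHWAY|HWY|TRUNK(?:\s+HIGHWAY)?))",
     r"[A-Z]{1,3}", "COUNTY ROAD"),
    # State and US highways are numbered
    (r"(?:STH|STATE\s+(?:HIGHWAY|HWY|ROAD|RD)|WIS(?:CONSIN)?(?:\s+HWY)?)",
     r"\d{1,3}", "STATE HIGHWAY"),
    (r"(?:USH|US\s+(?:HIGHWAY|HWY)|US)", r"\d{1,3}", "US HIGHWAY"),
]


def canonical_road(name: str) -> str:
    """Rewrite a known road-name variant to one canonical spelling."""
    cleaned = " ".join(name.upper().split())  # uppercase and collapse whitespace
    for prefix, ident, canon in ALIASES:
        m = re.fullmatch(rf"{prefix}\s+({ident})", cleaned)
        if m:
            return f"{canon} {m.group(1)}"
    return cleaned  # unrecognized form: leave for manual review
```

A bare "Hwy 95" is deliberately left alone, since it could be a state or a US highway; those ambiguous cases are exactly the ones that needed manual research.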

Once the sites were geocoded, we received a shapefile from other students who had geocoded the same mines and facilities, as well as a shapefile with the actual locations of the mines. After running the Near tool in ArcMap, we were able to compare the distances from our geocoded mines and our classmates' geocoded mines to the actual locations. This let us assess how accurate and precise the geocoding of the mines actually was, as well as how accurate we were in normalizing the table provided by the Wisconsin DNR.
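Conceptually, the Near tool finds, for each input feature, the closest feature in another layer and records the distance to it. A simplified stand-alone version for points is sketched below. It assumes coordinates already projected into a planar system measured in meters (for example a UTM or state plane projection), so straight-line Euclidean distance is meaningful; the near function and the sample coordinates are made up for illustration and are not the class data.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in a projected coordinate system, meters (assumed)


def near(inputs: Dict[str, Point],
         near_features: Dict[str, Point]) -> Dict[str, Tuple[Optional[str], float]]:
    """For each input point, return (id of nearest near-feature, planar distance in meters)."""
    result = {}
    for in_id, (x, y) in inputs.items():
        best_id, best_dist = None, math.inf
        for near_id, (nx, ny) in near_features.items():
            d = math.hypot(nx - x, ny - y)  # Euclidean distance in the projected plane
            if d < best_dist:
                best_id, best_dist = near_id, d
        result[in_id] = (best_id, best_dist)
    return result


geocoded = {"G1": (0.0, 0.0), "G2": (1000.0, 0.0)}
actual = {"A": (3.0, 4.0), "B": (1000.0, 600.0)}
print(near(geocoded, actual))  # {'G1': ('A', 5.0), 'G2': ('B', 600.0)}
```

The brute-force double loop is fine for a few dozen sites. One caveat worth keeping in mind: "nearest" is not necessarily the mine a record was supposed to represent, so a geocode that lands near a different mine can look more accurate than it really is.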






Results

As the results below show, even if the data are normalized in a way that is standard, concise, and regulated, the geocoding of that information can differ drastically depending on who geocoded it. The best practice would be to enter the information in a standard format as the table is being created.

When we discussed this issue, Dr. Hupy pointed out that this table may not have been intended for outside use; it was likely made as a record rather than tailored for GIS users. That explains why the table is not GIS friendly, but even as a record for internal use, standard entry of addresses would make the table much easier to understand for every user, not just GIS users. It is as important to standardize practices outside of GIS as inside it, so that anyone can look at and evaluate the work. This includes information only one person will look at. I am sure everyone has had the experience of looking back at something they wrote and wondering, "What did I write or mean by that?" If the information is recorded in a consistent way, that does not happen.

For GIS specifically, addresses need to be in a standard format for the geocoder to work. In the first table (Fig 3), the addresses (much like the first example in Fig 1) are not entered in a standard format, even when they contain the same information.

Fig 3. Wisconsin DNR Excel table with site information before normalization.

In order to normalize the data and have the geocoder work correctly, the Address column needs to be split into the individual components the geocoder uses to find each site's specific location.

As seen in Fig 4 (below), each component requires its own column with its own heading in order to be interpreted by the geocoder.


Fig 4. Normalized Table



As you can see, the address has been split into fields such as street address, city/township, ZIP code, and county. Now that the geocoder can tease the individual address components apart, the site can be located.
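A rough sketch of that splitting step is below, assuming an address string with comma-separated parts in the order street, city/township, county, ZIP. That ordering, the column names, and the split_address function are assumptions for illustration; the actual DNR entries varied too much for any single pattern to cover, which is why much of the normalization had to be done by hand.

```python
import re
from typing import Optional

ZIP_RE = re.compile(r"^\d{5}(?:-\d{4})?$")  # 5-digit ZIP, optional ZIP+4


def split_address(raw: str) -> Optional[dict]:
    """Split 'street, city/township, county, ZIP' into separate geocoder columns.

    Returns None when the string does not have that (assumed) shape, so the
    record can be set aside for manual normalization.
    """
    parts = [p.strip() for p in raw.split(",")]
    if len(parts) != 4 or not ZIP_RE.match(parts[3]):
        return None
    street, city, county, zip_code = parts
    # Drop a trailing "County" so the column holds just the county name
    county = re.sub(r"\s+County$", "", county, flags=re.IGNORECASE)
    return {"Street": street, "City": city, "County": county, "ZIP": zip_code}
```

For a made-up record such as "N1234 Example Rd, Town of Sample, Example County, 54000", this returns one value per column; a PLSS-only entry returns None and stays in the manual pile.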

Once located, these geocoded sites were compared to the actual locations of the sites, based on GPS data from the DNR (Fig 5 and Fig 6, below).

After mapping these sites, other users' geocoded locations were added to the map to compare the distances between all users' geocoded sites (Fig 7, below).


Fig 5. A map of the actual frac sand processing sites in Wisconsin. The orange squares represent individual sites.

Fig 6. A map of the locations that I geocoded as compared to their actual locations.


Fig 7. A map of geocoded locations of classmates as compared to the actual mine locations. 
Once all the sites were added to the map, the Near tool was run and distances were determined between each site and the mine location. To evaluate these data, an error table was generated (Fig 8, below).

Fig 8. Error table comparing distances of geocoded locations among users for these specific mines.


The columns on the left report the distance (in meters) between each geocoded site and its actual location, with one column for each user. The statistics on the right come from the Excel statistical analysis tool and show the mean, median, and standard deviation for each user, as well as the minimum and maximum distance by which a site was misplaced. I had one of the largest maximum distances from an actual mine location, at 63,106 meters, but every user had a mean distance of over 2,000 meters from the actual sites.


Discussion 

Figure 8 (above) and Figure 9 (below) both show the differences in geocoding among users. Everyone had at least one site within 100 meters, and for the most part everyone's locations fell in the same county. But in real situations where decisions need to be based on specific locations, the information provided to us would NOT have been accurate enough to base a decision on. Without information on the actual locations of the mines (which was provided to us here), there would be no way to determine whether the locations we geocoded were correct. With standardized input and data on how accurate geocoding is, we could determine some type of standard error to include in the data analysis, but at this point we don't know those figures.
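One common way to turn a set of distance errors into a single positional accuracy figure is the root-mean-square error (RMSE). The sketch below assumes we had error distances from a standardized geocoding run checked against known locations; it is one possible summary statistic, not necessarily the standard error the paragraph has in mind, and the rmse function is a hypothetical helper.

```python
import math
from typing import Sequence


def rmse(errors_m: Sequence[float]) -> float:
    """Root-mean-square of distance errors (meters) between geocoded and true locations."""
    if not errors_m:
        raise ValueError("need at least one error distance")
    return math.sqrt(sum(e * e for e in errors_m) / len(errors_m))


print(rmse([5.0, 5.0]))  # 5.0
```

Squaring the errors before averaging weights large misses heavily, so a single badly placed site will dominate the figure, which is usually what you want when judging whether locations are fit for decision making.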

Fig 9. A close up of the geolocated sites for each user as well as the actual mine sites. 

There are two main sources of error in geographic data: operational errors and inherent errors.

Operational errors occur mostly during the procedures for collecting, managing, and using geographic data.

Inherent Errors occur as a result of the special nature of geographic data. Geographic data, as representations of the real world in a certain data model, are necessarily incomplete and generalized.

In this assignment we encountered both types of errors. From the start of the project, the non-standardized format of the table resulted in operational errors. Without a set format or procedure for entering geographic data, end users had to interpret the information themselves, which at best shifts the accuracy and precision of the data at the onset of the assignment, and those errors may propagate throughout the rest of the operations.

Inherent errors occurred because geocoding the addresses depended partly on the operational errors from normalizing the data, and partly on the inherent errors of the geocoding tool and of the user operating it. It is up to the user to determine whether the geocoding tool has selected the right address based on the information at hand; if that information is missing or wrong, the user may tell the tool that an incorrect location is correct.
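One way to keep the user from silently accepting a bad match is to flag results for manual review rather than taking the top candidate automatically. The sketch assumes each candidate carries a 0 to 100 match score like the one shown in ArcMap's candidate list; the Candidate class, needs_review function, and both thresholds are hypothetical choices for illustration, not ArcMap's API or settings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    address: str
    score: float  # 0-100 match score from the geocoder (assumed scale)


def needs_review(cands: List[Candidate], min_score: float = 85.0,
                 tie_margin: float = 2.0) -> bool:
    """Return True if a geocode should be checked by hand instead of accepted."""
    if not cands:
        return True  # nothing matched at all
    ranked = sorted(cands, key=lambda c: c.score, reverse=True)
    if ranked[0].score < min_score:
        return True  # best match is weak
    # Two near-equal candidates, e.g. the same highway listed under two names
    return len(ranked) > 1 and ranked[0].score - ranked[1].score < tie_margin
```

This does not remove the judgment call, but it concentrates the user's attention on the records most likely to be wrong, such as roads with multiple names.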


Fig 10. Sources of Error in Geographic data.
The errors we specifically encountered fall under data automation and compilation, and data processing: digitizing the mine sites, attribute data input, and format translation. These errors propagate through to the analysis stage and result in the differences seen in the Fig 8 error table.

Conclusion

If you ever have to record information that could be used in an analysis, standardize the data entry and format, and be consistent when recording that information.


References

http://www.sco.wisc.edu/plssfinder/plssfinder.html