Geek Hiking – Page 3 – Sagebrush Trails

Trails and Graph Theory 18: Water Sources

Now that we have been able to create GPX files and a databook for our trip, a logical improvement would be to include water sources. OpenStreetMaps has several tags for identifying water, which seem to have been added over time in an ad-hoc manner. Let us include as many relevant options as we can find.

water_tags = {'natural':['spring','water'],
              'water':['pond','lake','reservoir'],
              'landuse':['reservoir','basin'],
              'amenity':['watering_place','drinking_water','water'],
              'waterway':'stream',
              'man_made':'water_tank'
        }

### ###########  FIND WATER SOURCES    #############
bbox_gila = (34.35704, 32.77935, -107.65983, -108.94574) # rough box of Gila NF
wgdf = ox.features.features_from_bbox(bbox=bbox_gila, tags=water_tags)

The water features are imported as polygons. That is probably more detail than we need, so let’s reduce the polygon to a point, with the view of being able to import a list of waypoints.

# I really did not want to learn about GeoPandas, but there you go...
water_points = wgdf.copy()
water_points['geometry'] = water_points['geometry'].centroid

We understand that reducing from polygons to points is an approximation, and might not work so well for rivers and streams and large lakes. Still, we approximate for convenience, and see if the results are close enough to be useful.

We should be able to convert the GeoDataFrame back to a “network” with only nodes and no edges, but our OSMnx library functions do not work well with “networks” with no edges. We could add one dummy edge, or just stay with GeoDataFrames. Eventually we are able to plot results on our map.

Zoom in…

With more than 9k water sources in the Gila imported, this is too many to be practical. Some waypoints seem to be multiple locations along a creek. Other points are so numerous in areas, that we begin to doubt their veracity.

Let us filter out all water sources that are more than X distance from our route.

WJ = WG.copy()

E,D = ox.distance.nearest_edges(JD,X=water_points['x'],Y=water_points['y'],return_dist=True)

max_distance = 2000 #meters
EARTH_RADIUS_M = 6_371_009 # new convenience notation for big numbers 
max_distance = max_distance / EARTH_RADIUS_M

# Not clear from documentation, but nearest_edges returns distance in units of
# earth radius if projection is lat/long.
for node,d in zip(WG.nodes(),D):
    if d > max_distance:
        WJ.remove_node(node)

With the distance filter, our water sources are down to 77. Here they are on the map, and when you hover over a source, it shows some description of the water.

We can also plot the water sources with our route. Water sources appear as darker blue dots.

We could try to add these water sources to our databook, but perhaps the easiest approach is just to import water source waypoints to our favorite GPX app.

Download source code here.

Trails and Graph Theory: The Series

Trails and Graph Theory 17: Dog Food

In the tech world, “eating your own dog food”, or “dogfooding“, is the practice of developers using their own products, as a form of quality control, and for insight whether one is building the right tool.

In this series, we have been discussing the graph theory problem of finding a long continuous loop in the Gila that does not repeat segments, and finally we have a path that is worthy of a long hike. This Spring (2024) I will attempt to hike this route, and see what problems and adventures are encountered.

[link to map full screen]

In the next few weeks I will post a few more entries about my trails graph theory effort, in preparation for the hike. In April, you will find me on the trail, provisionally named the Gila x Loop: GxL.

I recently learned that the Gila National Forest plans to designate their own loop trail to commemorate the Centennial of the Gila Wilderness. So my first naming choice, GCL Gila Centennial Loop, may be preempted.

Friends and family have helped suggest other possible names: GTL Gila Tangle Loop, GGL Great Gila Loop, GEL Gila Euler Loop, GGT Great Gila Tangle, GIL Gila Infinite Loop, GLL Gila Loop-de-Loop, GLL Gila Long Loop, GSL Gila Strange Loop, GRL Gila Random Loop, GAL Gila Algorithmic Loop, GNL Gila Nerd Loop, Gila Geek Loop, GCL Gila Crazy Loop, GWL Gila Weird Loop. What would you suggest? Perhaps we should stick with GxL, where x is an iterator over all name variants…

As with all my annual long hikes since 2012, you can follow my hiking journal on this blog.

Source code may be downloaded here:

‘max_trail_gap_road.py‘ adds another iteration algorithm for finding a maximum loop length.
‘output_gpx.py‘ takes a graph and outputs a gpx file for use with a navigation device.
‘add_towns.py‘ manually edits the maximum loop found to add in towns for resupply. Might better be done with a map editor.

Additional thoughts:

Yes, this trail does borrow heavily from the GET and CDT. Those trails already worked to establish long tracks, and my algorithm randomly found their results. I do hope this route offers attractions for hikers who already completed those trails.
I did add Pie Town to the loop, by including a rather long road walk, for sentimental reasons, and partly because my adopted section of the CDT is on this route. The town is also included to emphasize that the Gila really does extend almost as far north as Pie. Hikers should consider Pie Town as optional, and feel free to shortcut across.
The Gila may have high water crossings this year, which may change my route.
It is not at all obvious that long-distance hikers will be attracted to a trail that does not cover a long distance between start and end points. I think the Gila will offer plenty of variety, but we shall see how the trail is experienced and endured. The route is certainly different in form from most long trails.
With each inner loop, a hiker has a choice of which direction to go. Do I follow the default path randomly chosen by the algorithm, or make my own choice at each 4-way intersection not previously visited? It would be useful to add tools to define my own choices ahead-of-time to make the best route to space out supply points more equally and modify a databook. Alas, these tools have not been developed yet.
Or perhaps the novelty of choosing somewhat randomly a direction at each fresh 4-way, and not in advance, will have some attraction to other hikers, and distinguish this trail in some way? Could this be a feature / gimmick, so each hiker can traverse the route in a unique way, a choose-your-own-adventure hike?
Perhaps an app-of-the-future could handle the choose-your-own-adventure hike by automagically reconfiguring each time you made a choice at a fresh intersection?
An earlier version included Kingston, and I really wanted to keep that community on-route. But a later routing bypassed that place. I really need to learn a OSM editor app, to make small corrections and add towns back in.
Getting ready for the hike shows the need for several other tools, besides a trail editor. The databook needs to be able to be generated starting at any location, and either direction. Also, the Euler path has more than one solution, and could change depending on small variations in the graph. It might be best to enforce the option to go straight through 4-way intersections as default.
Another tool that I wanted to get working before starting the hike, but ran out of time, is to import all water (tank, creek, spring, etc) features and add them to the GPX and the databook.
I know several people that have deep knowledge of Gila trails. By selecting a route via an algorithm, I fully acknowledge that some of the trails included might not be the best choices.
Hike the GxL at your own risk. Do not try unless you have done several other long trails, alone.

Trails and Graph Theory: The Series

Trails and Graph Theory 16: Edge Cases

After adding in a few short sections of road to my trail graph, which should make it more connected and allow a longer route, I was noticing poor performance. The longest route was not getting particularly long, and typically a matching would not be complete– there would be leftover odd nodes, and also unmatched nodes, so eliminating matched edges would not result in a Eulerian graph.

length of min_weight_matching =  163
is perfect matching?  False
node  2510903794  not in matching
node  2834068719  not in matching

Let us remind ourselves of the algorithm:

Simplify graph (remove all 2nodes)
Remove ends (remove all 1nodes)
Make odd-node graph by deleting 4nodes
For all 4nodes in original graph, add temporary bypass edges to odd-node graph, which should still be odd-node. If it is not all odd-node, you did something wrong.
Match odd nodes in pairs, using a matching algorithm
Delete edge between matched pairs of nodes
Replace any temporary bypass edges that still exist with the original edges
Resulting graph should be all even nodes. If not you did something wrong!!!
Graph is an Euler graph, which means you can find a continuous loop that includes all remaining edges.

My previous code is not entirely correct. Here are some situations that are (or may be) breaking this procedure:

Parallel Edges

In NetworkX a simple Graph object allows only one edge between any two nodes, but we are using a Multigraph, which can allow more than one edge between two nodes. The parallel edges are identified with their own key value, which is an integer in the case of the OSMnx library. Instead of iterating through edges as with a simple Graph:

for u,v in Q.edges():

we include the key in our iteration.

for u,v,k in Q.edges(keys=True):

My early code was fairly sloppy about this, and often forgot about parallel edges, so several functions needed to be updated. Also, code has been added to count the number of parallel edges in our report() function, which increases as part of the simplification function.

84539  nodes  in largest subgraph
84730  edges  in largest subgraph
total length 1573.54 miles  in largest subgraph
number of odd intersections in largest subgraph : 606
number of even intersections in largest subgraph : 83933
number of 2-way intersections in largest subgraph : 83902
number of 4-way intersections in largest subgraph : 31
number of trailhead/terminus in largest subgraph : 145
number of parallel edges in largest subgraph : 0
number of 4-way adjacent pairs in largest subgraph : 1

354  nodes  simplified graph
554  edges  simplified graph
total length 1285.20 miles  simplified graph
number of odd intersections simplified graph : 328
number of even intersections simplified graph : 26
number of 2-way intersections simplified graph : 0
number of 4-way intersections simplified graph : 26
number of trailhead/terminus simplified graph : 0
number of parallel edges simplified graph : 67
number of 4-way adjacent pairs simplified graph : 11

Adjacent 4nodes

My original code for making temporary bypasses is not entirely correct. It does not handle the case of a 4node connecting to a 4node neighbor. How many of those situations exist? From the above report:

number of 4-way adjacent pairs simplified graph : 11

I could not patch the existing code to handle 4node neighbors, even after attempting recursion, and had to start from scratch 2 or 3 times to get something that partly works. And in the image below there are 3 adjacent 4nodes connected in a chain, which my code still does not deal with well.

To debug my code, it is best to go back to simple small graphs to use for testing, as we did in Chapter 6. Let us create a graph with two adjacent 4nodes:

My earlier algorithm gives incorrect results when trying to restore 4nodes, which is not an Euler loop.

After a lengthy amount of hair-pulling during debugging, my new algorithm gives the correct results.

The way we get the correct result in graphs with adjacent 4nodes is to create temporary edges between each of the neighbors of both 4nodes, which becomes something of a hairball.

I am not entirely convinced that my code handles all cases of adjacent 4nodes correctly. Errors are lurking. The related Chinese Postman problem does not have these issues, because you are adding edges, not deleting edges. Deleting edges often breaks things.

4node Neighbors with Parallel Edges

Playing around with my small graph, I discovered another configuration that broke my algorithm. Two 4node neighbors that each share two edges between them seems to be a problem.

One solution is to delete one of the parallel edges, converting the neighboring 4nodes into neighboring 3nodes, which is easy for the algorithm. Somehow, I suspect that is not the optimal solution, but it is easy to code, so I will use it for now, until I see a problem with performance.

Later, I found that a 3node connected to a 4node with parallel edges also has this same problem, so I fixed it in the same inelegant way. Sigh.

Self Loops

Multigraphs also allow self-loops. I added code to count how many exist.

def check_self_loops(Q,txt=''):
    self_loops = 0
    self_loop_distance = 0
    max_self_loop_distance = 0
    for u,v,k,dic in Q.edges(keys=True,data=True):
        if u==v:
            d = dic['length']
            self_loops += 1
            self_loop_distance += d
            if d>max_self_loop_distance:
                max_self_loop_distance = d
    print('number of self-loops ' + txt + ' : ', self_loops )
    print(f'total self-loop distance {self_loop_distance*0.0006213712:.2f} miles ', txt)    
    print(f'maximum self-loop distance {max_self_loop_distance*0.0006213712:.2f} miles ', txt)
    print('')

Here are the results:

number of self-loops simplified graph :  18
total self-loop distance 86.05 miles  simplified graph
maximum self-loop distance 16.78 miles  simplified graph

Compared to the graph mileage (1285.20 miles), the total mileage of self-loops is small. My code has trouble dealing with self-loops, especially calculating the number of edges on a node, so it is easiest to delete self-loops, and add them in later if we need to.

Difficult Nodes !!

A small number of 3nodes tend to be difficult to match with other 3nodes using the built-in matching algorithms in the NetworkX library, such as min_weight_matching().

Here are some nodes that are not matching, through several iterations:

length of max_weight_matching =  163
is perfect matching?  False
node  2509005592  not in matching
node  2834068719  not in matching

length of min_weight_matching =  163
is perfect matching?  False
node  2510903794  not in matching
node  2834068719  not in matching

length of no-weight min_weight_matching =  163
is no-weight matching perfect?  False
node  2509003421  not in matching
node  2834068719  not in matching

node  6226321975  not in matching
node  2834068719  not in matching

node  2510903794  not in matching
node  2834068719  not in matching

node  2510903794  not in matching
node  2834068719  not in matching

node  2510903794  not in matching
node  2834068719  not in matching

node  2510903794  not in matching
node  2834068719  not in matching

To do a fully optimized matching, we should create a complete graph, where all nodes are connected to all other nodes using temporary edges with the trail distances between any two nodes on the graph. I am taking the short-cut of only calculating the distances between nearby nodes, and only adding temporary edges to skip over 4nodes. This may leave a few nodes in isolation, that almost never get matched. On earlier simulations these nodes might get eliminated when a subgraph got disconnected, so we could still find a perfect matching. And many subgraphs got disconnected. But now we have added a few roads to make a more connected graph, which gives us the advantage of finding a longer route, but increases the chance that we never find a perfect matching, which is necessary to find an Euler Loop. Additionally, at least one of the short roads added, to connect the Aldo to the rest of the Gila, tended to be one of those hard-to-match nodes.

How do you find the location of a node with its node number (aka node id). I wrote code to colorize, but in general you can edit this URL and substitute the node you are searching for, and it will display on a map.

https://www.openstreetmap.org/node/2834068719

(Do not try to put the node id in the search box for OpenStreepMap– it just doesn’t work for now.)

I really not want to eliminate this node, or any of the edges connecting it, since it bridges the Aldo to the Gila. Looking at my input map, I was actually missing a few good connections around “Me Own Admin Area” (which I am familiar with because the Grand Enchantment Trail passes through).

Adding a few extra short roads across the gap between the Aldo and Gila solved the problem, and missing trails re-appeared. I do not know if the previous graph had a data integrity problem, or a 3node was really isolated from potential matches, but anyway the problem disappeared, so on to the next adventure.

After all these fixes, I am still getting a high percentage of non-Euler graphs after matching. After many weeks of debugging, I needed a temporary fix to proceed.

A FINAL HACK…

Eventually I realized that, with a recently upgraded computer, I could get around the problem of a high proportion of non-Euler graph attempted solutions: Each iteration only takes seconds to compute, so if I need to run 10,000 iterations to get 500 Euler graph solution candidates, that still did not take too long, and I could easily find loop trails more than 700 miles long, plenty long for a good hike. We may be stepping back from the original goal of finding the longest continuous non-repeating loop in the Gila, to finding one long enough. We may be missing better solutions with our buggy code, but those improvements can be made at a later time.

Candidate graph, massaged to include towns, 800 miles

Inelegant and buggy source code may be downloaded here.

Trails and Graph Theory: The Series

Category: Geek Hiking

Trails and Graph Theory 18: Water Sources

Related Posts:

Trails and Graph Theory 17: Dog Food

Related Posts:

Trails and Graph Theory 16: Edge Cases

Parallel Edges

Adjacent 4nodes

4node Neighbors with Parallel Edges

Other Problem Topologies

Self Loops

Difficult Nodes !!

A FINAL HACK…

Related Posts: