Trails and Graph Theory 10: Databook

Long-distance hikers, especially old-school hikers from before hiking apps, have been accustomed to relying on “databooks”, brief text summaries of the important intersections, water sources, features, and distances of a long hike. Here is a portion of a page from the Appalachian Trail 1987 databook, a slim pocket-size saddle-stapled booklet.

And this is an image from a portion of the databook for the Northern New Mexico Loop (by Blisterfree, the creator of the Grand Enchantment Trail).

Not all backpackers use a databook during their hike, even pre-app, but it is still useful for planning prior to the trip. Some hikers do find an advantage in reading a text summary on trail, rather than squinting at a topo map representation on their smartphone screen. Hike your own hike; your mileage may vary; use tools and do not let them use you.

In our previous post, we found a long loop route in the Gila National Forest (one that does not repeat any segment), in the form of a numeric list of nodes. Our graph, imported via OSMnx from OpenStreetMap, includes extra information as node and edge attributes that we can extract in a Pythonic way to make our own databook of sorts. Here is an example of the text information in one edge of our graph.

{'osmid': 243511168, 'ref': '165', 'name': 'Pine Flat Trail #165', 'highway': 'path', 'oneway': False, 'reversed': [False, True], 'length': 4374.8279999999995, 'from': [2509057410, 2509594627, 2509057291, 2509594252, 2509057424, 2509594385, 2509594643, 2509594263, 2509594657, 2509594529, 2509057314, 2509594279, 2509594664, 2509594284, 2509594543, 2509594673, 2509594294, 2509057334, 2509594680, 2509594556, 2509594176, 2509594306, 2509594691, 2509594435, 2509057350, 2509594568,
<truncated>

From that information we can generate a summary of our Euler graph.

  MILE   ELEVATION  TRAIL                         INTERSECTION                      
...
  105.0  6509       Hells Hole Trail #268         West Fork Trail #151          
  105.5  6529       Trail #785                    Hells Hole Trail #268         
  106.0  6483       West Fork Trail #151          Trail #785                    
  109.7  6834       Turkey Creek Trail #155       West Fork Trail #151          
  111.8  7851       Mogollon Creek Trail #153     Turkey Creek Trail #155     
...

The Python code is fairly short (excluding the altitude lookup, since elevation is not included in our OSM data). The column format is handled by the columnar library, which is exceedingly easy to use.

import networkx as nx
from columnar import columnar

def report_path(Q, txt='', freedom_units=False, lat_long=False):
    meters_to_mile = 0.0006213712
    meters_to_feet = 3.28084
    meters_to_km = 0.001

    if not nx.is_eulerian(Q):
        print('The graph is not eulerian')
        return

    total_length = 0
    data = []
    for u, v, k in nx.eulerian_circuit(Q, keys=True):
        data_line = []
        # cumulative distance hiked so far, in the units requested
        if freedom_units:
            data_line.append(str(round(total_length*meters_to_mile, 1)))
        else:
            data_line.append(str(round(total_length*meters_to_km, 1)))

        d = Q.get_edge_data(u, v)[k]
        if 'length' in d:
            total_length += d['length']  # meters

        node_attr = Q.nodes[u]
        long = node_attr['x']
        lat = node_attr['y']
        if lat_long:
            data_line.append(lat)
            data_line.append(long)
        # get_elevation() is a separate helper, since elevation is not in our OSM data
        elevation = get_elevation(lat, long)  # in meters
        if freedom_units:
            data_line.append(str(round(elevation*meters_to_feet)))
        else:
            data_line.append(str(round(elevation)))

        trail_name = ''
        if 'name' in d:
            trail_name = d['name']
            if not isinstance(trail_name, str):
                trail_name = ','.join(trail_name)  # in case it is a list
        data_line.append(trail_name)

        # any differently-named trail incident to this node is a crossing
        crossing = set()
        for _, neighbor, k2, d2 in Q.edges(u, data=True, keys=True):
            if 'name' in d2:
                name = d2['name']
                if not isinstance(name, str):
                    name = ','.join(name)  # in case it is a list
                if name != trail_name:
                    crossing.add(name)
        data_line.append(','.join(crossing))

        data.append(data_line)

    header = []
    if freedom_units:
        header.append('MILE')
    else:
        header.append('KM')
    if lat_long:
        header.append('LAT')
        header.append('LONG')
    header.append('ELEVATION')
    header.append('TRAIL')
    header.append('INTERSECTION')
    table = columnar(data, header, no_borders=True)
    print(table)
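
As a usage sketch, assuming T2 holds the Euler graph found in the previous post:

# Print the databook table in miles and feet.
report_path(T2, freedom_units=True)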

One alternative, which only occurred to me after I had written this, is to output the route as CSV (comma-separated values), import it into a spreadsheet, make any changes to presentation and format there, and eventually export as PDF.
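
A minimal sketch of that idea, using Python's standard csv module (write_csv and its filename are hypothetical; data and header would be the same lists report_path() builds):

import csv

# Write the databook rows to a CSV file, ready for spreadsheet import.
def write_csv(data, header, filename='databook.csv'):
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(data)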

We still do not have water sources, towns, fence or road crossings, etc. I will go off and have a long think on how to include those features automagically.

Download source code here.

Related Posts:

Trails and Graph Theory 9: Gradient Fuzz

In our last episode we added random “noise” to the length values of each segment of our trail graph, and for each iteration of random values we ran our normal routine to make an Euler graph, and thereby find a long hiking loop without repeating any trail sections:

  • networkx.algorithms.min_weight_matching() on the fuzzed length odd-nodes graph
  • Delete trail segments between matched nodes to make an all-even node graph
  • Restore 4-nodes that we saved previously
  • Take the largest subgraph, since deleting bits of trails may create disconnected islands
  • If the graph is Eulerian, see if the total length is larger than anything we have tried before (using the actual length values, not the fuzzed lengths).

With our fuzzing routine, we increased our best try at an Euler graph from an initial solution of 154 miles to a respectable 228 miles after 200 or so iterations.

Now, let us alter our approach, using a technique I am calling “gradient fuzz”. We fuzz our original graph like before, but if the resulting solution is the longest so far, we save the fuzzed graph as our next input, and fuzz the fuzzed graph instead of fuzzing our original graph.

In other words, if we find a better solution, try to improve that solution instead of continuing to try to improve the original input.

The code needs only minor changes.

import random
import networkx as nx

def fuzz_min_weight_matching(Q):
    # copy the graph with every edge length scaled by a random
    # factor in [1, 2), then match on the fuzzed lengths
    T = nx.Graph()
    for u, v, d in Q.edges(data=True):
        length = d['length'] * (1 + random.random())
        T.add_edge(u, v, length=length)
    return T, nx.min_weight_matching(T, weight='length')

# K, all_matchings, max_length, max_T2, bounds, fix_4_list,
# restore_4node_graph(), graph_length() and report() come from the full script
j = 0
max_fuzzed_Graph = None
while j < 1000:
    if max_fuzzed_Graph is None:
        # no improved solution yet: keep fuzzing the original graph
        fuzzed_Graph, matching = fuzz_min_weight_matching(K)
    else:
        # otherwise fuzz the best fuzzed graph found so far
        fuzzed_Graph, matching = fuzz_min_weight_matching(max_fuzzed_Graph)
    j += 1
    T = K.copy()
    T.remove_edges_from(matching)
    restore_4node_graph(T, fix_4_list)
    # deleting edges can disconnect the graph; keep the largest piece
    nodes = max(nx.connected_components(T), key=len)
    T2 = nx.subgraph(T, nodes).copy()
    if nx.is_eulerian(T2):
        all_matchings.append(matching)
        print('found eulerian on fuzz attempt ', j)
        index = len(all_matchings) - 1
        print('length of successful matching: ', len(matching))
        print('matching ', index, ' is eulerian')
        report(T2, 'matching ' + str(index))
        length = graph_length(T2)
        save_png(T2, max_T2, bounds, index)
        if length > max_length:
            max_length = length
            max_T2 = T2.copy()
            max_fuzzed_Graph = fuzzed_Graph.copy()

Results after 1000 iterations:

...

current max length:  270
found eulerian on fuzz attempt  999
length of successful matching:  68
matching  834  is eulerian
65  nodes  matching 834
66  edges  matching 834
total length 189.19 miles  matching 834
number of odd intersections matching 834 : 0
number of even intersections matching 834 : 65
number of 2-way intersections matching 834 : 64
number of trailhead/terminus matching 834 : 0

current max length:  270
found eulerian on fuzz attempt  1000
length of successful matching:  68
matching  835  is eulerian
65  nodes  matching 835
66  edges  matching 835
total length 161.67 miles  matching 835
number of odd intersections matching 835 : 0
number of even intersections matching 835 : 65
number of 2-way intersections matching 835 : 64
number of trailhead/terminus matching 835 : 0

current max length:  270
1:33:15
finished finding some possible matchings
max length =  270.4589642970273  miles
1:33:16
finished script max_trail3e.py

The color schemes chosen for the output graphs have been tweaked from the last article to be a bit easier to follow. (I am still having trouble finding a good combination of colors and styles, as the current graph and maximum graph often overlap for much of their length.)

We have improved the maximum known circuit length from 228 to 270 miles, with modest effort.

Download source code here.

Related Posts:

Trails and Graph Theory 8: Fuzz

In our last post, we found a single Euler graph that is a subset of the Gila National Forest trail system, but we were left with the question of how to find other, possibly better, solutions. Since we might try several alternatives, the first step is to save the input graph we already massaged, by removing 1-nodes and collapsing 2-nodes from the full Gila trail graph, so we do not have to repeat that work each time we try something new. The Python library pickle is just what we need: it saves any arbitrary Python object to disk, so we only need to compute this initial graph once.

import pickle

# Save the massaged graph once, so later runs can reload it from disk.
with open('h_graph.pkl', 'wb') as f:
    pickle.dump(H, f)
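
Reloading the graph in a later run is the mirror image; a short sketch, assuming the same filename:

import pickle

# Restore the massaged graph without recomputing it.
with open('h_graph.pkl', 'rb') as f:
    H = pickle.load(f)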

Recall that in a previous post we had found a function all_maximal_matchings(G) that should help us find candidates for longer Euler graphs than the one we found last time. Unfortunately, for a graph of our size the function is unbearably slow, so we have added a timing function to better gauge our progress.

import time, datetime

start_time = time.time()

def elapsed_time(start):
    # print wall-clock time since 'start' as h:mm:ss
    end_time = time.time()
    duration = round(end_time - start)
    timestamp = str(datetime.timedelta(seconds = duration))
    print(timestamp)

We add an elapsed_time call to each iteration of all_maximal_matchings(…) to see what is happening.
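
The per-iteration call is just this one-liner, placed inside the matching loop:

# inside the loop of all_maximal_matchings(...):
elapsed_time(start_time)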

...
matching  805
not eulerian matching, length  60
1 day, 5:14:26
matching  806
not eulerian matching, length  60
1 day, 5:14:26
matching  807
not eulerian matching, length  60
1 day, 5:14:27
matching  808
not eulerian matching, length  60
1 day, 5:14:27
matching  809
not eulerian matching, length  60
1 day, 5:14:27
...

Oof, over a day of computation, and no helpful results yet.

For n nodes, all_maximal_matchings() appears to run in worse than O(n³) time, far too slow for a graph with 128 odd nodes, and in practice the “worst” maximal matchings are discovered first. We need an alternative approach.

Recall that we used a library function networkx.algorithms.min_weight_matching() to find a single solution last time, and that function is rather fast, using the Blossom algorithm. As a first attempt, let us introduce some “noise”, or “fuzz”, in the weights of the graph, and see if that results in alternate solutions that might be longer.

import random
import networkx as nx

def fuzz_min_weight_matching(Q):
    # copy the graph with every edge length scaled by a random
    # factor in [1, 2), then match on the fuzzed lengths
    T = nx.Graph()
    for u, v, d in Q.edges(data=True):
        length = d['length'] * (1 + random.random())
        T.add_edge(u, v, length=length)
    return nx.min_weight_matching(T, weight='length')

We run this function many times, and see if we can find longer Euler loops. The results are displayed in an animated GIF file, using techniques adapted from the article Graph Optimization with NetworkX in Python. Each iteration, we draw the current graph and compare it to the maximum-length graph so far, using the function save_png(). Plotting onto map tiles is unnecessary, so let us simplify matters by drawing onto a white background.

import osmnx as ox
import matplotlib.pyplot as plt

def save_png(Q, max_Q=None, bounds=None, index=0, iteration_label='', max_label=''):
    # convert to a directed graph with a CRS so OSMnx can build GeoDataFrames
    QD = Q.to_directed()
    QD.graph['crs'] = ox.settings.default_crs

    nodes, edges = ox.graph_to_gdfs(QD)
    ax = edges.plot(color='#ff00ff', label=iteration_label)

    if max_Q is not None:
        # draw the best graph so far underneath, wide and translucent
        QD = max_Q.to_directed()
        QD.graph['crs'] = ox.settings.default_crs
        nodes, edges = ox.graph_to_gdfs(QD)
        edges.plot(ax=ax, color='y', linewidth=3, alpha=0.2, label=max_label)
    if bounds is not None:
        # pin the axes so every frame covers the same map extent
        sw = bounds[0]
        ne = bounds[1]
        ax.set_xlim(sw[1], ne[1])
        ax.set_ylim(sw[0], ne[0])
    ax.legend(loc="upper left")
    plt.axis('off')
    plt.savefig('fig/img{}.png'.format(index), dpi=120)
    plt.close()

(The bounds code is necessary to ensure that each PNG file is the same size and displays exactly the same map coordinates, otherwise the graphs jump around in the movie.)
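
The bounds themselves can be computed once, up front. A minimal sketch, assuming H is the full trail graph with the usual OSMnx node attributes x (longitude) and y (latitude):

# Fixed (south-west, north-east) corners, each as (lat, long), taken from
# the full graph so every frame of the animation shares the same extent.
ys = [d['y'] for _, d in H.nodes(data=True)]
xs = [d['x'] for _, d in H.nodes(data=True)]
bounds = [(min(ys), min(xs)), (max(ys), max(xs))]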

Now, combine all those .PNG files into an animated GIF.

import glob
import numpy as np
import imageio.v3 as iio
import os

def make_circuit_video(image_path, movie_filename, fps=5):
    # sorting filenames in order
    filenames = glob.glob(image_path + 'img*.png')
    filenames_sort_indices = np.argsort([int(os.path.basename(filename).split('.')[0][3:]) for filename in filenames])
    filenames = [filenames[i] for i in filenames_sort_indices]

    # make movie
    frames = []
    for filename in filenames:
        frames.append(iio.imread(filename))
    iio.imwrite(movie_filename, frames, duration = 1000/fps, loop=1)
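
Assuming save_png() has been writing frames into a fig/ directory, a call might look like this (the output filename is just an example):

# Combine the saved frames into a 5-frame-per-second GIF.
make_circuit_video('fig/', 'movie.gif', fps=5)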

The resulting animated GIF is optimized and displayed below:

Download the complete source code for this article here.

What is next to try? Gradient fuzz? A genetic search? Fixing some fundamental error I missed? Stay tuned (subscribed) until next time…
