This is certainly one of the dull bits to do, because, you know, everything else so far has been so exciting.
I've reduced the 270 stations of the real network to 150 stations (nodes) on my simplified network, with 180 links connecting them together.
At the same time I'm also going to capture the frequency of trains at each station and I'll let that vary by time of day. This might be too much detail at this stage, but it won't be too much incremental work if I do it now. It will be useful for all the places where I change lines, or enter the station from outside.
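As a rough idea of how the per-station frequencies might be stored, here is a minimal Python sketch. The time band names, the station name and every number in it are hypothetical placeholders rather than anything taken from the TFL data; the only point is that each station carries one trains-per-hour figure per time band.

```python
from typing import Dict

# Hypothetical time bands; the real boundaries would come from the timetable data.
TIME_BANDS = ("early", "am_peak", "midday", "pm_peak", "evening")

# Trains per hour, keyed by station name and then by time band.
# All numbers are placeholders, not real timetable figures.
station_frequency: Dict[str, Dict[str, float]] = {
    "Example Station": {
        "early": 6,
        "am_peak": 12,
        "midday": 8,
        "pm_peak": 12,
        "evening": 6,
    },
}


def trains_per_hour(station: str, band: str) -> float:
    """Look up the trains-per-hour figure for a station in a given time band."""
    if band not in TIME_BANDS:
        raise ValueError(f"unknown time band: {band}")
    return station_frequency[station][band]


print(trains_per_hour("Example Station", "am_peak"))  # 12
```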
A typical output from the TFL site is shown in the screenshot to the left, and an example .pdf file is in the attachments at the bottom of the page (file '029 Dis.pdf').
I'll make a few assumptions about how I'll later use the data: that train frequencies and travel times are the same in either direction between two nodes, and that trains arrive equally spaced over the course of a time interval. Real-life experience suggests that last assumption is laughably optimistic.
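To see what the equal-spacing assumption buys, here is a small Python sketch. The function name is my own invention, and it further assumes the passenger turns up at a random moment: if trains are evenly spaced, the gap between them is 60 minutes divided by the trains per hour, and the average wait is half that gap.

```python
def mean_wait_minutes(trains_per_hour: float) -> float:
    """Average wait for a passenger arriving at a random moment,
    assuming trains are evenly spaced across the hour."""
    if trains_per_hour <= 0:
        raise ValueError("trains_per_hour must be positive")
    headway = 60.0 / trains_per_hour  # minutes between consecutive trains
    return headway / 2.0


# 12 trains per hour means a train every 5 minutes, so a 2.5 minute average wait.
print(mean_wait_minutes(12))  # 2.5
```

With bunched trains the real average wait would be longer than this figure, which is why the assumption is an optimistic one.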
This data collection
Take, for example, one of the links in my list, between the nodes of North Acton and West Acton. In my file the link is entered in that specific direction: starting at North Acton and heading to West Acton. This is a case where someone blindly entering data would make a mistake. The TFL output shows twice as many trains departing from North Acton as from West Acton, because from North Acton they can travel in two directions. This is the kind of thing I can catch if I'm the one entering the data, in this case by looking up West Acton departures towards North Acton and applying my assumption that train frequencies in either direction should be the same.
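One way to bake the same-in-both-directions assumption into the data is to store a single frequency per unordered pair of stations. This is only a sketch of that idea, not a description of how my file is actually laid out; the function names are made up and the figure of 9 trains per hour is a placeholder, not a number read off the TFL output.

```python
from typing import Dict, FrozenSet

# Trains per hour for each link, keyed by the unordered pair of station names,
# so the order in which the two ends are given does not matter.
link_frequency: Dict[FrozenSet[str], float] = {}


def set_link_frequency(a: str, b: str, trains_per_hour: float) -> None:
    """Record one frequency for the link between stations a and b."""
    link_frequency[frozenset((a, b))] = trains_per_hour


def get_link_frequency(a: str, b: str) -> float:
    """Return the frequency for the link, whichever direction is asked for."""
    return link_frequency[frozenset((a, b))]


# Placeholder value: the figure to enter is the one for departures from
# West Acton towards North Acton, not the total departures from North Acton.
set_link_frequency("North Acton", "West Acton", 9)
print(get_link_frequency("West Acton", "North Acton"))  # 9
```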
Another thing I'm taking care over is where two lines run in parallel. For the section between Uxbridge and Rayners Lane, for example, I'm combining the frequencies of the Piccadilly and Metropolitan services.
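Combining parallel services amounts to adding up the trains per hour of every line that runs over the section. A minimal Python sketch, with a hypothetical function name and placeholder numbers that are not real timetable figures:

```python
from typing import Dict


def combined_trains_per_hour(per_line: Dict[str, float]) -> float:
    """Total trains per hour on a section shared by several lines."""
    return sum(per_line.values())


# Placeholder frequencies for the Uxbridge to Rayners Lane section.
uxbridge_rayners_lane = {"Piccadilly": 4, "Metropolitan": 8}
print(combined_trains_per_hour(uxbridge_rayners_lane))  # 12
```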