# docs tree support

## Docs Tree Feature

I just pushed an update to http://hg.hatta-wiki.org/hatta-randy that adds support for page hierarchies in directory trees. Without this update, all pages must be in one directory and they all reside in a flat name space. With this update, pages can be created and linked anywhere below the docs tree, e.g. docs/foo-dir/bar-page can be linked by [[foo-dir/bar-page|Bar-BQ page]].

The docs tree may contain symbolic links (the name space seen by Hatta can be glued together from other sub-trees). For example, pages from an "end user manual" tree and a "programmer guide" tree can be glued together in a third "docs" tree (with "site overview" pages) and served by Hatta.

Wiki links that contain a ".." component are rejected for security reasons. For example, someone editing a page can't snoop around the file system by creating a link like [[../../../etc/passwd]].
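A minimal sketch of the kind of check described above, not the code from the patch itself. The function name `has_parent_reference` is hypothetical; it only looks at the link target component by component and rejects any target that contains a ".." component.

```python
def has_parent_reference(link_target):
    """Return True if any path component of the link target is '..'.

    Hypothetical helper for illustration; not taken from Hatta.
    """
    # Treat backslashes as separators too, so 'a\\..\\b' is also caught.
    components = link_target.replace("\\", "/").split("/")
    return ".." in components


print(has_parent_reference("../../../etc/passwd"))  # True
print(has_parent_reference("foo-dir/bar-page"))     # False
print(has_parent_reference("notes..old/page"))      # False
```

Only whole components are compared, so a page name that merely contains two dots (like the last example) is still allowed.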

I'm a newbie to both Python and Mercurial, so my contributions should be reviewed very carefully before acceptance in Hatta. The docs tree feature is very important to my project, so I sincerely hope it will be accepted.

If you're interested, a preliminary version of my site can be temporarily browsed at http://pseis.org:8090/ (on DreamHost) and (sporadically) at http://72.192.119.192:8090 (workstation). To say it's still "under construction" is a gross understatement. It has a couple hundred pages of content (geeky programmer design documents).

These two sites reveal some mystery/bug that needs investigation. One will display the "Menu" on each page, but not the other. When displayed, a link to the "Locked" page is shown, even though it isn't in the "Menu" page…

You mentioned that symbolic links within the page directory can have unpredictable results. I suspect you're correct, at least for some scenarios, but I haven't associated any particular problem with them myself over the last couple months of using them with Hatta for my own stuff.

I just never anticipated using symbolic links with Hatta, and can't even start thinking about all the possible use cases. The fact that Hatta might modify something outside of its pages directory scares me. I decided to make Hatta refuse using symbolic links, so that we can remove that warning (which is probably only confusing for most users). It may make Hatta less useful as a general purpose web file browser, but hopefully it will make it a more robust and safe wiki. How are you using symlinks? – Radomir Dopieralski

Would you consider a compromise regarding symbolic links?

Rather than "explicitly refuse dealing with symbolic links", could the pages be forced to read-only?

Also, what are the chances of incorporating docs tree support in the mainline? – Randy

I did think about making symbolic links readable, but there is this scenario that makes me wake up screaming in the night:

• clone Hatta's repository
• do some changes and push them back
• among the changes include a couple of symbolic links, in particular, a link to /etc/passwd and /etc/shadow, or maybe just ~/.ssh/id_rsa
• read those files using the wiki

Those are not the only files that are sensitive on your system, and at least the ssh key has to be readable by the user that runs the wiki. Sure, this security hole only opens with specific, non-default configuration (hg repo published and publicly writable), but it's still a dangerous attack.

That sends an icy chill down my back too. Thank you for being (justifiably) paranoid.

Does the absence of comment on docs tree bode ill for my (desperately wanted) functionality? – Randy

I definitely also like the idea of docs tree support: why would Hatta force a flat structure when all the underlying tools support hierarchical structure? (The only quirk I've found so far is about having a page with the same name as a directory.)

But I guess you would raise your chances of it being integrated if you set the symlink trouble aside. Polish your docs tree support, and remove the symlink-related parts. They are two different features after all …

As a side note, I agree with Radomir about those symlinks. As I understand it, you need them to point outside of the Mercurial repository. That's basically (sorry to express it like that) a no-go. If you need files outside of the repository to be served, just run another Hatta on that other repository! That's another location after all! I guess with a bit of web server tweaking, you can make it feel like it's the same Hatta serving all that.

Keep up the good work on that !

Ben

My Panic scream has faded into a distant memory. – Randy

Ahhhhhhhhh :0 (panic scream)

I'm using symbolic links rather extensively, have had little, if any trouble with them, and really like them for organizing a big project.

I mentioned a couple of quirks in the note you moved to Randy Page naming conventions, but they're just the logical consequences of using symbolic links that may not be obvious to the casual user.

I have several use cases and foresee the possibility for many more.

Use Case 1: my project has several logical dimensions that beg to be decomposed into their own sub trees.

• Target Audience (End User, Programmer, Sys Admin, …)
• License Category (Open Source, Proprietary, 3rd Party Foo, …)
• Source (Wiki Pages, program code, cached support documents, …)

For example, the source code and documentation for the proprietary portion or 3rd Party stuff absolutely can't be in the same repository as the open source !!! (Huge legal liability). However, the Proprietary and 3rd Party documentation needs deep links into the Open Source stuff. BTW, the open source that I'm talking about is either:

• An original work of myself, colleagues and associates (which allows me to dual license it).
• Contributed by someone that legally grants me sufficient rights to distribute it with proprietary works.

The only way to fund this project overall is dual licensing.

I'll rant more, if it needs clarifying… – Randy

I got around to looking at your changes, but I see no commits in the hatta-randy repo, except for some changes from hatta-dev. The problem that I have with directory structures is that I don't use or need them myself. I do see you guys want it, but I can't really get in your shoes and see how much it is really needed. If it's needed, I will definitely add it, but this may take a while. I'm very grateful for any work you do on Hatta, including this, because it saves me some time I would normally have to devote to researching this, finding all the corner cases and tricky bits, coding it and finally testing. Testing is actually what takes the most time, especially when I don't really have many automatic tests yet. Every bit of documentation, explanation, specification, example code and tests helps me and reduces the amount of work needed to include this feature in Hatta. In the meantime, I may need to devote more energy to other projects, so please forgive me if I'm less responsive. – Radomir Dopieralski

The problem may be my lack of Mercurial experience or perhaps something was lost in the year-end holidays. I'll try to reconstruct the time-line and that may shed some light.

21 Dec: Updated Issues with a new heading. It's mixed with a flurry of 18 other page updates the same day, which may have contributed to miscommunication. Quoting…

== Pushed Updates ==
List and summarize updates that have been submitted (pushed)
to Hatta's main repository by Ben, Randy and others.

* [[docs tree support]] for page hierarchies in directory tree and added an entry


21 Dec: Created docs tree support page. Quoting…

== Docs Tree Feature

I just pushed an update to http://hg.hatta-wiki.org/hatta-randy
that adds support for page hierarchies in directory trees.
... //plus a bunch more dribble//.


Before pushing, I think I cloned hatta-dev, hatta-ben and hatta-randy. Then merged changes from dev (and ben?) into randy. Then changed randy to include my docs tree hack. Then "pushed" hatta-randy, hoping that Radomir would pick it up. The push seemed to complete without error (if I remember correctly).

Question: Should I do something different?

I thought only the repository owner could commit changes, but you mentioned something about not finding my commit.

FYI, you should be able to download my hatta.py via wget 72.192.119.192:8090/+download/hatta-pseis-2010-01-04.py or the link at the bottom of http://72.192.119.192:8090/wiki/engine

This site may also provide insight into my usage of and desire for docs trees. It also illustrates my attempt at documenting Hatta, which might be of interest… my humble contribution.

– Randy "the confused newbie"

You did it all right, I guess Radomir was not expecting that you would play with my tests … (Your patch is there by the way …)

Ben

I merged changes from hatta-dev and hatta-ben into my hatta-randy push, because I thought it would make it easier for Radomir. Perhaps I made it worse, instead of better.

– Randy

[ moved from Radomir Dopieralski ]

Have you had a chance to look over the docs tree patch?

I understand if it's just a matter of time and priority… Is there anything I can do to make it easier? I've tried to figure out how to add a unit test for it, but I'm stumped.

Is there a better way to pass along the patch?

• remerge it into hatta-randy -r 146cf75290b5 (20 Dec, before my initial merge-ben-dev-mine and push attempt) and push hatta-randy?
• merge it into hatta-dev tip (-r 22b21a40da4e dated 10 Jan) and push hatta-dev?
• merge hatta-dev tip into hatta-randy, merge my docs tree patch and push hatta-randy?

– Thanks, Randy

I actually started to prepare for including it, except that I want to move the tests for valid page name (containing ".." and such) much closer to the filesystem, to the WikiStorage class – checking them at the level of URL parsing somehow doesn't sound right, especially when you can also provide page names in configuration and take them from index database (which may contain page names from before the patch). I'm also thinking about %-encoding the offending dots, so that you can still use them in page names, they just become harmless.

I also have a number of options for integrating it:

• make it a configuration option (default to old way)
• make it the default and require converting on upgrade
• make it the default and fall back to old behavior when the page is not found (new pages created as subdirectories)
• make the old behavior default, but fall back to new one when the page is not found (new pages created with %-escaped slashes, but if a subdirectory already exists, it is used)

The problem with autodetection/falling back is that it's an extra system call (and probably disk access) for each page lookup – for example each time a link is checked if the page exists. This can be several hundred lookups for a large page with lots of broken links.
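To make the cost concrete, here is a rough sketch of the last option in the list above: try the old flat, %-escaped file name first and fall back to a subdirectory path. The function `locate_page` and its return value are hypothetical, and Python 3's `urllib.parse.quote` stands in for whatever escaping Hatta actually uses. The second element of the result counts the existence checks, which is the per-lookup overhead being discussed.

```python
import os
import tempfile
from urllib.parse import quote


def locate_page(root, title):
    """Return (path, checks): the file backing `title` and the number of
    existence checks that were needed. Hypothetical helper, not Hatta code."""
    # Old behavior: one flat file whose name has the slashes %-escaped.
    flat = os.path.join(root, quote(title, safe=""))
    if os.path.exists(flat):
        return flat, 1
    # Fallback: the same title stored as a path inside subdirectories.
    nested = os.path.join(root, *title.split("/"))
    if os.path.exists(nested):
        return nested, 2
    return None, 2


def demo():
    """Create one flat page and one nested page, then look up three titles."""
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "a%2Fb"), "w").close()
        os.mkdir(os.path.join(root, "c"))
        open(os.path.join(root, "c", "d"), "w").close()
        return [locate_page(root, t)[1] for t in ("a/b", "c/d", "missing/page")]


print(demo())  # [1, 2, 2]
```

A page found the old way costs one check; a nested page or a broken link costs two, which is where the extra calls on a page full of broken links would come from.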

The problem with configuration option and upgrade requirement is that it's an extra hassle, and I would have to provide (and test) a script for that. It can also lead to strange errors when your repository is not in the same format as the configuration/version of Hatta. – Radomir Dopieralski

You make excellent points.

While tinkering with docs tree ideas, you may want to review the strategy for detecting and rejecting symbolic links.

The chilling security scenario you mentioned before may be exposed again. That being a Wiki page update to Mercurial that contains a symbolic link that isn't noticed by the web site admin… something like ln -s / docs/hole along with a page that contains [[+download/hole/etc/passwd]].

Filesystem privileges would prevent +edit/hole/etc/passwd, but just exposing valid user names is bad (as you mentioned before).

While merging my patch with your latest hatta-dev last week, I noticed the os.path.islink usage to detect and reject symbolic links. The Python documentation doesn't clearly state what happens when a symbolic link occurs within the path given to islink. I made a simple test and it appears that only the last component in the name is tested.

While your current test catches the problem with flat name spaces, it will miss the tree hole mentioned above (I think).
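The "simple test" is not shown in the thread, so the following is only a reconstruction of what such an experiment could look like, using Python 3 and made-up directory names (`docs`, `outside`, `hole`). It creates a symbolic link inside a temporary docs directory and asks `os.path.islink` about the link itself and about a file reached through it.

```python
import os
import tempfile


def islink_demo():
    """Return (islink(link), islink(file reached via link), exists(file))."""
    with tempfile.TemporaryDirectory() as root:
        docs = os.path.join(root, "docs")
        outside = os.path.join(root, "outside")
        os.mkdir(docs)
        os.mkdir(outside)
        with open(os.path.join(outside, "secret"), "w") as handle:
            handle.write("x")
        # docs/hole is a symlink pointing out of the docs tree
        # (requires a platform and user that can create symlinks).
        hole = os.path.join(docs, "hole")
        os.symlink(outside, hole)
        through = os.path.join(hole, "secret")
        return os.path.islink(hole), os.path.islink(through), os.path.exists(through)


print(islink_demo())  # (True, False, True)
```

The middle value is the interesting one: the file is reachable through the link, yet `islink` reports False for it because only the final path component is examined.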

You can probably think of a better solution, but here is one possible technique. I think the os.path.abspath returns the equivalent absolute path, with symbolic links and '..' expanded (normalized, canonical thingy). If the page's abspath begins with the same abspath as 'docs', then it's in the docs tree. If not, then the page could be rejected.

This abspath technique might also support symbolic links safely. For example, a list of allowed base paths could be accepted, instead of just the one for docs/. The list could be a configuration parameter, much like docs currently is.

This abspath technique would also catch "../" abuses, i.e. those that actually led outside the docs tree. If "../" traversal remained inside, they could be accepted.
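A hedged sketch of the containment idea, written for Python 3 and not taken from Hatta. One detail worth checking against the Python documentation: `os.path.abspath` only normalizes the path textually (collapsing ".." without touching the disk), whereas `os.path.realpath` is the call that also resolves symbolic links, so the sketch uses `realpath`. The function name `is_under_any` and the list-of-bases parameter are assumptions made for illustration.

```python
import os
import tempfile


def is_under_any(path, allowed_bases):
    """Return True if path, after resolving symlinks and '..', lies
    inside at least one of the allowed base directories."""
    real_path = os.path.realpath(path)
    for base in allowed_bases:
        real_base = os.path.realpath(base)
        try:
            if os.path.commonpath([real_base, real_path]) == real_base:
                return True
        except ValueError:
            # Raised on Windows when the two paths are on different drives.
            continue
    return False


def demo():
    """Build a throwaway tree and check a few candidate page paths."""
    with tempfile.TemporaryDirectory() as root:
        docs = os.path.join(root, "docs")
        outside = os.path.join(root, "outside")
        os.makedirs(os.path.join(docs, "sub"))
        os.mkdir(outside)
        for name in (os.path.join(docs, "page"), os.path.join(outside, "secret")):
            with open(name, "w") as handle:
                handle.write("text\n")
        # A symlink inside docs/ that points out of the tree.
        os.symlink(outside, os.path.join(docs, "hole"))
        hole_file = os.path.join(docs, "hole", "secret")
        return {
            "plain page": is_under_any(os.path.join(docs, "page"), [docs]),
            "dotdot inside": is_under_any(os.path.join(docs, "sub", "..", "page"), [docs]),
            "dotdot escaping": is_under_any(os.path.join(docs, "..", "outside", "secret"), [docs]),
            "through symlink": is_under_any(hole_file, [docs]),
            "symlink, target allowed": is_under_any(hole_file, [docs, outside]),
        }


print(demo())
```

With only the docs directory allowed, the page reached through the symlink is rejected; adding the link's target to the allowed list accepts it, which matches the "list of allowed base paths" variation.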

I considered implementing this originally to catch "../" for docs trees, but rejected it for two reasons:

• It's more complicated to implement and "sell" (less KISS).
• It requires more overhead for system calls

Now that I understand the potential for symbolic link abuse, the abspath technique is more appealing again. But, like I said, you probably have a better twist on this.

– Randy

I just realized that I forgot to push my changes from last week, where I actually implemented a _check_path method that does almost exactly what you described. And I just wiped my laptop's disk clean before sending it to repair. Argh. Anyways, this is exactly the direction I wanted to take. I will try to re-create that over the weekend, and also add some simple tests that could be later extended to cover the changes that are going to happen. – Radomir Dopieralski

I documented the tests/test_repo.py file a little better; there you can see examples of unit tests for the WikiStorage object. Let me know if you have any questions.

I also added a test.py file in the main directory of Hatta, you can run it to run all the tests, without having to install the py.test library.

I will be adding some tests for the current functionality, you could try adding some (failing now) tests for the docs tree support. You can get the directory in which the test repository is created from repo.path if you need to check the filenames or create symbolic links in it. I hope that helps. – Radomir Dopieralski

I just stumbled upon another problem with using directories: you can't have both page and subpage in the wiki. For example, you can't have "Bugs" and "Bugs/Bug135". Moreover, if you have one of those pages and try to create the other one, Hatta will crash. – Radomir Dopieralski

It wouldn't have to crash. It could detect the conflict and issue an appropriate message. My clunky implementation would do that, but only upon Save, which wasn't particularly nice if the initial text was extensive. Even then, the text wasn't lost entirely. It could be cut-n-pasted into a new Edit sequence that used a better name. – Randy
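As an illustration of detecting the conflict before saving, here is a small sketch that could run when the edit form is first opened rather than on Save. It is not the implementation from either patch; `find_conflict` and its messages are hypothetical, and it assumes pages are stored as plain files under a root directory with "/" in the title mapping to subdirectories.

```python
import os
import tempfile


def find_conflict(root, title):
    """Return a message if `title` cannot be stored under `root`, else None."""
    parts = title.split("/")
    # An ancestor that is already a page (a regular file) cannot become a directory.
    for depth in range(1, len(parts)):
        ancestor = os.path.join(root, *parts[:depth])
        if os.path.isfile(ancestor):
            return "page %r already exists, so it cannot have subpages" % "/".join(parts[:depth])
    # A title that is already a directory of subpages cannot become a page.
    if os.path.isdir(os.path.join(root, *parts)):
        return "%r already holds subpages, so it cannot be a page" % title
    return None


def demo():
    """Reproduce the Bugs versus Bugs/Bug135 situation in a temporary tree."""
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "Bugs"), "w").close()
        os.mkdir(os.path.join(root, "Features"))
        open(os.path.join(root, "Features", "F1"), "w").close()
        return [find_conflict(root, t) for t in ("Bugs/Bug135", "Features", "Home")]


for message in demo():
    print(message)
```

The first two titles produce a message and the third returns None, so the wiki could warn the user up front instead of crashing or failing at Save.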

I worked a little on the integration. I wrote some tests for the current storage, covering some corner cases we found there (and fixed one bug this way, yay), and prepared a place for the tests for the new storage. I also added a -D option for using the new storage, and a WikiSubdirectoryStorage class. For now it's identical to the default storage. The plan now is to write the tests and then fix them by overriding some methods. – Radomir Dopieralski

Thanks… I'm not sure how best to review it, based upon my limited understanding. The following is a dump of my thought process as I try, which might somehow be useful. The dump may reveal the wrong turns that a newbie makes.

Reviewing now (25 Jan)...
to discover changes I'm comparing hatta-dev (side-by-side diff of source code)
-r 53688022b592 (Sun Jan 24 16:11:12 2010 +0100)
-r 22b21a40da4e (Sun Jan 10 20:36:38 2010 +0100)
Okay, but my understanding is very superficial.

Now, executing the new hatta.py version with "-D" within an empty directory… starts okay… create Home okay… save and render Home, okay… create link [[subdir/foo]] and save Home again, okay…

No problems so far…

Your note above suggests that the new subdir functionality isn't actually ready yet, but I'll poke at it a little to see what subdir does anyhow…

Follow [[subdir/foo]] link, okay… new page title is subdir%2Ffoo (the "/" is encoded). Save page, okay…

List directory contents and I see a regular file named 'subdir%252Ffoo' which seems wrong. I guess it encoded the "%" as "%25" and kept the "2F", which seems odd.
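The file name is consistent with the title being percent-encoded twice. The following sketch uses Python 3's `urllib.parse.quote` purely to illustrate the effect; it makes no claim about where in Hatta the second encoding happens.

```python
from urllib.parse import quote, unquote

title = "subdir/foo"

# First pass: the slash becomes %2F.
once = quote(title, safe="")
# Second pass: the percent sign itself becomes %25, the "2F" is left alone.
twice = quote(once, safe="")

print(once)            # subdir%2Ffoo
print(twice)           # subdir%252Ffoo
print(unquote(twice))  # subdir%2Ffoo
```

Decoding the stored name once gives back the singly encoded title, not the original one, which is why the "%252F" form looks wrong on disk.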

Now shutting down the server and returning to the newest hatta-dev repo directory.

Executing ./test.py, okay (I guess)... 32 passed

cd to "tests" and executing py.test, okay (I guess)... 31 passed.

If you'd like me to poke at it more, let me know. If I hack at the code myself at this point, it would probably reduce your productivity (by being a distraction) toward a good implementation.

– Randy, the happy (newbie) camper

This is strange, I cannot reproduce that. Can you remind me if you are using cherrypy? Can you try running 'dev.py' instead of 'hatta.py' and see if it also happens? – Radomir Dopieralski

Here is a typescript log for "case 1", using hatta.py

Script started on Mon 25 Jan 2010 09:40:46 AM CST
bash$ export PS1="bash$ "
bash$ mkdir /tmp/hatta_test1
bash$ cd /tmp/hatta_test1
bash$ /home/randy/HATTA/hatta-dev-D/hatta.py
/usr/local/lib/python2.6/dist-packages/CherryPy-3.1.2-py2.6.egg/cherrypy/wsgiserver/__init__.py:1499: DeprecationWarning: The ability to pass multiple apps is deprecated and will be removed in 3.2. You should explicitly include a WSGIPathInfoDispatcher instead.
  DeprecationWarning)
### NEXT I BROWSED TO localhost:8080, SAVED 2 PAGES AND CTL_C THE HATTA SERVER
^C
bash$ ls -al *
cache:
total 16
drwxrwxr-x 2 randy randy 4096 2010-01-25 09:43 ./
drwxrwxr-x 4 randy randy 4096 2010-01-25 09:41 ../
-rw-r--r-- 1 randy randy 6144 2010-01-25 09:43 index.sqlite3

docs:
total 20
drwxrwxr-x 3 randy randy 4096 2010-01-25 09:43 ./
drwxrwxr-x 4 randy randy 4096 2010-01-25 09:41 ../
drwxrwxr-x 3 randy randy 4096 2010-01-25 09:43 .hg/
-rw-rw-r-- 1 randy randy   44 2010-01-25 09:42 Home
-rw-rw-r-- 1 randy randy   45 2010-01-25 09:43 subdir1%252Ffoo1
bash$ cat docs/Home
= Hello World 1

[[subdir1/foo1]]
Bye-bye
bash$ cat docs/subdir1%252Ffoo1
= Subdir1 Foo1 page

Hello again 1
bye-bye
bash$ exit
exit

Script done on Mon 25 Jan 2010 09:44:19 AM CST

Here is a typescript log for "case 2", using dev.py

Script started on Mon 25 Jan 2010 09:33:22 AM CST
bash$ export PS1="bash$ "
bash$ mkdir /tmp/hatta_test2
bash$ cd /tmp/hatta_test2
bash$ /home/randy/HATTA/hatta-dev-D/dev.py
### NEXT I BROWSED TO localhost:8080, SAVED 2 PAGES AND CTL_C THE HATTA SERVER
^C
bash$ ls -al *
cache:
total 16
drwxrwxr-x 2 randy randy 4096 2010-01-25 09:36 ./
drwxrwxr-x 4 randy randy 4096 2010-01-25 09:34 ../
-rw-r--r-- 1 randy randy 6144 2010-01-25 09:36 index.sqlite3

docs:
total 20
drwxrwxr-x 3 randy randy 4096 2010-01-25 09:36 ./
drwxrwxr-x 4 randy randy 4096 2010-01-25 09:34 ../
drwxrwxr-x 3 randy randy 4096 2010-01-25 09:36 .hg/
-rw-rw-r-- 1 randy randy   33 2010-01-25 09:35 Home
-rw-rw-r-- 1 randy randy   34 2010-01-25 09:36 subdir2%2Ffoo2
bash$ cat Home
cat: Home: No such file or directory
bash$ cat docs/Home
= Hello World

[[subdir2/foo2]]
bash$
bash$ cat docs/subdir2%2Ffoo2
= Subdir2 Foo2 page

Hello again
bash$
bash$ exit
exit

Script done on Mon 25 Jan 2010 09:38:19 AM CST


Let me know what else I might do

– Randy

Try adding the aforementioned -D option when starting hatta.py/dev.py – Radomir Dopieralski

Oooops…

Here is a typescript log for "case 1a", using hatta.py

Script started on Mon 25 Jan 2010 04:07:11 PM CST
bash$ mkdir /tmp/hatta_test1a
bash$ cd /tmp/hatta_test1a
bash$ /home/randy/HATTA/hatta-dev-D/hatta.py -D
/usr/local/lib/python2.6/dist-packages/CherryPy-3.1.2-py2.6.egg/cherrypy/wsgiserver/__init__.py:1499: DeprecationWarning: The ability to pass multiple apps is deprecated and will be removed in 3.2. You should explicitly include a WSGIPathInfoDispatcher instead.
  DeprecationWarning)
^C
bash$ echo BROWSED TO localhost:8080, SAVED 2 PAGES, THEN KILLED HATTA...
BROWSED TO localhost:8080, SAVED 2 PAGES, THEN KILLED HATTA...
bash$ ls -al *
cache:
total 16
drwxrwxr-x 2 randy randy 4096 2010-01-25 16:10 ./
drwxrwxr-x 4 randy randy 4096 2010-01-25 16:07 ../
-rw-r--r-- 1 randy randy 6144 2010-01-25 16:10 index.sqlite3

docs:
total 20
drwxrwxr-x 3 randy randy 4096 2010-01-25 16:10 ./
drwxrwxr-x 4 randy randy 4096 2010-01-25 16:07 ../
drwxrwxr-x 3 randy randy 4096 2010-01-25 16:10 .hg/
-rw-rw-r-- 1 randy randy   45 2010-01-25 16:09 Home
-rw-rw-r-- 1 randy randy   33 2010-01-25 16:10 subdir%252Ffoo1a
bash$ cat docs/Home
= Home 1a

[[subdir/foo1a]]
less caffiene?
bash$ cat docs/subdir%252Ffoo1a
= Subdir Foo1a

Less caffiene !
bash$ exit
exit

Script done on Mon 25 Jan 2010 04:12:52 PM CST

Here is a typescript log for "case 2a", using dev.py

Script started on Mon 25 Jan 2010 04:13:49 PM CST
bash$ mkdir /tmp/hatta_test2a
bash$ cd /tmp/hatta_test2a
bash$ /home/randy/HATTA/hatta-dev-D/dev.py -D
^C
bash$ echo BROWSED TO localhost:8080, SAVED 2 PAGES, THEN KILLED HATTA...
BROWSED TO localhost:8080, SAVED 2 PAGES, THEN KILLED HATTA...
bash$ ls -al *
cache:
total 16
drwxrwxr-x 2 randy randy 4096 2010-01-25 16:16 ./
drwxrwxr-x 4 randy randy 4096 2010-01-25 16:14 ../
-rw-r--r-- 1 randy randy 6144 2010-01-25 16:16 index.sqlite3

docs:
total 20
drwxrwxr-x 4 randy randy 4096 2010-01-25 16:16 ./
drwxrwxr-x 4 randy randy 4096 2010-01-25 16:14 ../
drwxrwxr-x 3 randy randy 4096 2010-01-25 16:16 .hg/
-rw-rw-r-- 1 randy randy   45 2010-01-25 16:15 Home
drwxrwxr-x 2 randy randy 4096 2010-01-25 16:16 subdir/
bash$ cat docs/Home
= Home 2a

[[subdir/foo2a]]

less beer?
bash$ ls -al docs/subdir/
total 12
drwxrwxr-x 2 randy randy 4096 2010-01-25 16:16 ./
drwxrwxr-x 4 randy randy 4096 2010-01-25 16:16 ../
-rw-rw-r-- 1 randy randy   29 2010-01-25 16:16 foo2a
bash$ cat docs/subdir/foo2a
= Subdir Foo2a

Less beer !
bash$ exit
exit

Script done on Mon 25 Jan 2010 04:18:21 PM CST


Conclusion: I need more caffeine and less beer.

dev.py appears to support subdirectories. Whoooopie! – Randy