Performance

RFC Test

This test used 500 RFC documents (RFC 1500-2000), totaling about 64MB of plain text. The machine has an Intel® Core™2 Duo CPU T5450, 1.66GHz, 3325.09 bogomips. The Hatta version is 1.3.2dev (changeset 463:74d02975ca4d).

• Indexing the pages:
real	0m27.771s
user	0m25.386s
sys	0m1.848s
• Searching for "green member":
user	0m1.3s
• Memory use: around 13MB
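To put the indexing times in perspective, the figures above can be turned into throughput numbers. The following is only a small arithmetic sketch: the helper `parse_time` and the constants are illustrative names, not part of Hatta, and the corpus size is the approximate 64MB quoted above.

```python
import re


def parse_time(value: str) -> float:
    """Convert a `time`-style duration such as '0m27.771s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


CORPUS_MB = 64  # approximate size of the test corpus
PAGES = 500     # number of RFC documents indexed

real = parse_time("0m27.771s")
user = parse_time("0m25.386s")
sys_time = parse_time("0m1.848s")

throughput_mb_s = CORPUS_MB / real     # megabytes indexed per wall-clock second
pages_per_s = PAGES / real             # pages indexed per wall-clock second
cpu_share = (user + sys_time) / real   # fraction of wall-clock time spent on CPU

print(f"{throughput_mb_s:.2f} MB/s, {pages_per_s:.1f} pages/s, "
      f"{cpu_share:.0%} CPU-bound")
```

With these inputs the indexer processes roughly 2.3MB (about 18 pages) per second, and almost all of the elapsed time is CPU time rather than waiting on disk.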

Old tests

I downloaded 4900 text documents containing recent RFCs, totaling 240MB of plain text, and used them as pages for Hatta (changeset d95504d99157 of the development version).

• Initially generating the word and link indexes took a long time: 54 minutes of CPU time on a 3324.95 bogomips CPU. There is a lot of room for improvement there, as the code hasn't been optimized for this. (This is with support for Japanese enabled. Disabling it saves a couple of minutes, but it's not worth it.)
• Viewing the pages is not affected – works as fast as with a single-page wiki.
• Searching and backlinks are fast, with no noticeable delay.
• The cache directory takes about 40MB of disk space, roughly 1/6 of the space taken by pages.
• Binary files don't affect those figures.
• Saving changes is a little slow – takes about 1 second.

Now, almost an hour of CPU time (which translates to several hours of work) is not a reasonable figure, but adding 4900 pages all at once is not reasonable either. Normally you'd add them one by one and they would be indexed as they go. The good news is that the rest of the wiki doesn't suffer.
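The point about adding pages one by one can be made concrete by spreading the bulk indexing cost over the individual pages. This is a back-of-the-envelope sketch using only the figures quoted in this section; the constant names are illustrative.

```python
CPU_MINUTES = 54   # CPU time for the initial index build
PAGES = 4900       # number of RFC documents
CORPUS_MB = 240    # total size of the pages
CACHE_MB = 40      # size of the cache directory

cpu_seconds = CPU_MINUTES * 60

seconds_per_page = cpu_seconds / PAGES   # amortized indexing cost per page
mb_per_second = CORPUS_MB / cpu_seconds  # indexing throughput in CPU time
cache_ratio = CACHE_MB / CORPUS_MB       # cache size relative to page data

print(f"{seconds_per_page:.2f} s/page, {mb_per_second:.3f} MB/s, "
      f"cache is {cache_ratio:.0%} of page data")
```

Averaged out, that is about 0.66 seconds of CPU time per page, which is in the same range as the roughly 1 second it takes to save a single change.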

The performance of the indexer has improved greatly since version 1.3.2. Indexing a huge database like the one above can still take several minutes (as opposed to almost an hour previously), but it is now done incrementally, only for the modified pages.
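The idea of indexing only the modified pages can be illustrated with a small, self-contained sketch. This is not Hatta's actual indexer: the class `IncrementalIndex`, its methods, and the use of content hashes to detect changes are all assumptions made for illustration (a real wiki would more likely rely on repository changesets to find modified pages).

```python
import hashlib
import re
from collections import defaultdict

WORD_RE = re.compile(r"\w+")


class IncrementalIndex:
    """Hypothetical word index that reindexes only changed pages."""

    def __init__(self):
        self.digests = {}              # title -> hash of the text last indexed
        self.page_words = {}           # title -> set of words indexed for it
        self.words = defaultdict(set)  # word -> set of titles containing it

    def _remove(self, title):
        # Drop all index entries that point at this page.
        for word in self.page_words.pop(title, ()):
            self.words[word].discard(title)
            if not self.words[word]:
                del self.words[word]

    def update(self, pages):
        """Reindex pages whose text changed; return the titles reindexed."""
        changed = []
        for title, text in pages.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if self.digests.get(title) == digest:
                continue  # unchanged since the last run, skip it
            self._remove(title)
            found = {word.lower() for word in WORD_RE.findall(text)}
            for word in found:
                self.words[word].add(title)
            self.page_words[title] = found
            self.digests[title] = digest
            changed.append(title)
        # Forget pages that no longer exist.
        for title in set(self.digests) - set(pages):
            self._remove(title)
            del self.digests[title]
        return changed

    def search(self, query):
        """Return the titles of pages containing every word of the query."""
        terms = [word.lower() for word in WORD_RE.findall(query)]
        if not terms:
            return set()
        results = set(self.words.get(terms[0], ()))
        for term in terms[1:]:
            results &= self.words.get(term, set())
        return results


index = IncrementalIndex()
pages = {"Home": "green member list", "About": "a member page"}
first = index.update(pages)    # both pages are new, so both are indexed
pages["About"] = "a green page"
second = index.update(pages)   # only the modified page is reindexed
```

After the second call only "About" has been processed again, so the cost of an update depends on how many pages changed rather than on the size of the whole wiki.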