pali_sort is now written in C++ and is about 400-500x faster than the previous R version.

Added a NEWS.md file to track changes to the package.

Numbering of Tipitaka volumes is a bit of a mess. This is due to CST4 and PTS using somewhat different systems. Here is where things stand:
The last of these is an annoyance and should be fixed in a future release. It requires some delicate editing of the underlying raw Pali files to avoid introducing new errors, and I have not undertaken this yet.
It would be hugely valuable to provide a stemming function for Pali. Right now, every sequence of letters surrounded by spaces or punctuation is treated as a distinct word. That is not precisely correct for Pali, where words can be joined together for phonetic reasons (a process called sandhi). There are also numerous declensions of most words (like dhamma, dhammo, etc.). On top of these two factors, there are also numerous compound words.
All of this makes a good stemming algorithm both more valuable and more difficult. I have not attempted one yet, but I hope to get to it eventually.
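
To make the current behaviour concrete, here is a minimal sketch of splitting text on spaces and punctuation and counting each surface form separately. It is an illustration only, not the package's actual code: split_words and count_words are hypothetical names, and the sketch assumes UTF-8 input in which every non-ASCII byte belongs to a letter (so non-ASCII punctuation would wrongly stay inside a word).

```cpp
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of the package): splits UTF-8 text into
// words. ASCII letters and all non-ASCII bytes count as word characters;
// everything else (spaces, ASCII punctuation, digits) ends the current word.
std::vector<std::string> split_words(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : text) {
        if (c >= 0x80 || std::isalpha(c)) {
            current.push_back(static_cast<char>(c));
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

// Hypothetical helper: counts how often each distinct surface form occurs.
// No stemming is applied, so inflected forms are tallied separately.
std::map<std::string, int> count_words(const std::string& text) {
    std::map<std::string, int> counts;
    for (const std::string& w : split_words(text)) ++counts[w];
    return counts;
}
```

Calling count_words("dhammo dhamma, dhammo") yields two separate entries, one for dhammo and one for dhamma. A stemmer would fold these declensions into a single entry, and a sandhi-aware splitter would additionally have to separate words that have been joined together.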