It has been almost a year since I started working at Madan Puraskar Pustakalaya (MPP) as a software developer. In practice, I take on whatever job I am assigned. Besides developing software, I have contributed to Nepali language computing through the PAN Localization project. Although MPP is primarily a pustakalaya (a library), it is at the forefront of Nepali language computing.
When I started at MPP, I was given the job of enabling spell checking in OOO (OpenOffice.org), which at the time used MySpell. Because MySpell did not support Unicode, it was of little help for our language. Then came Hunspell, a MySpell-based spell checker originally developed for Hungarian. It has many features that suit Nepali, including support for detecting compound words and two-level morphology.
Hunspell works by combining head words with affixes, much as ordinary grammar does. It relies on two files for this: the .dic file, which lists the head words, and the .aff file, which holds the affix rules. Hunspell combines the two so that the different forms of the same head word, such as happy and happiness, are recognized as correct. This spell checker opened the door to Nepali spell checking: it enabled spell checking in OOO and taught us how Unicode support could be enabled and how spell checking actually works.
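To make the .dic/.aff split concrete, here is a minimal Python sketch of how suffix rules can expand head words into their derived forms. It assumes a tiny subset of the Hunspell suffix syntax (SFX header and rule lines, single-character flags, "0" meaning an empty strip or add field) and, as a simplification, treats each rule's condition as a regular expression anchored at the end of the word. The sample file contents and the helper names `parse_aff`, `parse_dic`, `expand` and `build_wordlist` are hypothetical illustrations, not Hunspell's own code, and Hunspell's engine works differently internally.

```python
import re

# Toy .aff fragment in Hunspell-style SFX syntax (hypothetical sample):
#   header line: SFX <flag> <cross_product> <rule_count>
#   rule line:   SFX <flag> <strip> <add> <condition>
AFF_TEXT = """\
SFX A Y 2
SFX A y iness [^aeiou]y
SFX A y ier [^aeiou]y
SFX B Y 1
SFX B 0 s .
"""

# Toy .dic file: first line is the entry count, then head_word/FLAGS.
DIC_TEXT = """\
2
happy/A
book/B
"""


def parse_aff(text):
    """Map each suffix flag to a list of (strip, add, condition) rules."""
    rules = {}
    for line in text.splitlines():
        fields = line.split()
        # Header lines have 4 fields; rule lines have 5 (assumed simplification).
        if len(fields) == 5 and fields[0] == "SFX":
            _, flag, strip, add, cond = fields
            strip = "" if strip == "0" else strip
            add = "" if add == "0" else add
            rules.setdefault(flag, []).append((strip, add, cond))
    return rules


def parse_dic(text):
    """Yield (head_word, flags) pairs, skipping the leading count line."""
    for line in text.splitlines()[1:]:
        if not line.strip():
            continue
        word, _, flags = line.strip().partition("/")
        yield word, flags


def expand(word, flags, rules):
    """Return the head word plus every form produced by its suffix flags."""
    forms = [word]
    for flag in flags:  # one character per flag
        for strip, add, cond in rules.get(flag, []):
            # Simplification: condition treated as a regex matched at the word end.
            if not re.search(cond + "$", word):
                continue
            if strip and not word.endswith(strip):
                continue
            stem = word[: len(word) - len(strip)]  # remove the stripped suffix
            forms.append(stem + add)
    return forms


def build_wordlist(aff_text, dic_text):
    """Expand every dictionary entry using the affix rules."""
    rules = parse_aff(aff_text)
    words = []
    for head, flags in parse_dic(dic_text):
        words.extend(expand(head, flags, rules))
    return words


if __name__ == "__main__":
    print(build_wordlist(AFF_TEXT, DIC_TEXT))
```

Running the script prints `['happy', 'happiness', 'happier', 'book', 'books']`. Keeping the rules separate from the head words is what lets a single line in the .dic file stand for a whole family of forms, which is also why writing good affix rules matters so much.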
With Hunspell and other general-purpose spell checkers, we run into two problems: the way the affix rules have to be written, and the spell checker's inability to correct words automatically. Users generally do not know how affixes attach to head words; they just want to know whether the spell checker works properly. Automatic correction may not seem vital at present, but as we advance into computational linguistics and natural language processing, it will play a huge part.

To make this clear, consider a system that scans a book, converts the scanned image into text with OCR, and then passes that text to a text-to-speech program that reads it aloud, so that people who cannot read the book, for whatever reason, can listen to it instead. In such a scenario, when the OCR tries to work out which character it has scanned, it may confuse similar-looking letters and digits, such as 5 with S or 8 with B. An automatic spell corrector could fix these errors. Human intervention in the process could then be avoided, which would be a milestone in the field of computing.
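The OCR correction idea can be sketched as candidate generation followed by dictionary lookup. The minimal Python example below assumes a hypothetical confusion table built only from the pairs mentioned above (5/S and 8/B), tries every combination of swaps, and returns the first candidate that is a known word. The names `CONFUSABLE`, `candidates` and `correct`, and the sample words, are illustrative assumptions; a practical corrector would also need edit-distance suggestions and some way to rank competing candidates.

```python
from itertools import product

# Hypothetical confusion table from the examples in the text:
# OCR may read "S"/"s" as "5" and "B" as "8", and vice versa.
CONFUSABLE = {
    "5": "Ss",
    "S": "5",
    "s": "5",
    "8": "B",
    "B": "8",
}


def candidates(token):
    """Yield every string obtained by swapping any subset of confusable characters."""
    options = [(ch,) + tuple(CONFUSABLE.get(ch, "")) for ch in token]
    for combo in product(*options):
        yield "".join(combo)


def correct(token, dictionary):
    """Return the token if it is a known word, otherwise the first dictionary
    word reachable by confusable-character swaps, otherwise None."""
    if token in dictionary:
        return token
    for cand in candidates(token):
        if cand in dictionary:
            return cand
    return None


if __name__ == "__main__":
    words = {"STOP", "BOOK", "BASS", "best"}
    for scanned in ["5TOP", "8OOK", "8A55", "be5t", "XYZ"]:
        print(scanned, "->", correct(scanned, words))
```

The number of candidates multiplies with every confusable character, so this brute-force approach only suits short tokens; it does show, though, how a dictionary alone can resolve OCR confusions without a human in the loop.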