Tuesday, September 05, 2006

FOSS

As an open source programmer, I cannot skip writing about open source. Since the early 1990s, when Linus Torvalds developed the Linux kernel, GNU/Linux has become one of the fastest growing operating systems in use. It is open source, which means that the operating system can be used, redistributed, studied and modified; these freedoms are what define it as free. Free here means "free as in speech", not "free as in beer". You may have noticed that nothing so far has been said about the cost of open source. That is because cost really doesn't matter.
Open source is not just about software freedom; it is mainly a concept, one that can uplift people's living standards. This concept was the vision of the GNU project's creator, Richard Stallman. When people are given the source code, they can study it and change it according to their needs. The result is a community without boundaries that helps develop that source code, making it better and more useful for a larger community.
Open source has become the next big thing. Even CNN now covers open source in programmes like 'code breaker' and 'Global Office'. This shows how important open source has become and how much it could help society. In Africa, for example, costly proprietary software such as Microsoft Windows would make little sense for starting a project to educate people.
Another great example worth looking into is the Firefox project. As most of us know, Firefox is a web browser, and it is also an open source project. It grew out of the Mozilla project, which was started by Netscape, the company behind the proprietary Netscape browser. Since being given to the community, it has developed so fast that it has left its proprietary counterpart far behind. We should join the movement too, with the phrase: Open Yourself To Open Source.

Monday, September 04, 2006

Hunspell

I have been working at MPP, Madan Puraskar Pustakalaya, as a software developer for almost a year. In practice I do whatever job I am assigned. I have not only developed software but also made various contributions to Nepali language computing through the PAN Localization project. Although MPP is mainly a pustakalaya, a library, it is at the forefront of Nepali language computing.

At the start of my career at MPP I was given the job of enabling the spell checker in OpenOffice.org (OOo), which at the time used MySpell. MySpell could not support Unicode, so it was of no help for our language. Then came Hunspell, a MySpell-based spell checker developed for the Hungarian language. It had many features suitable for the Nepali language, including support for detecting compound words and two-level morphology.

Hunspell basically works by combining a head word with affixes, as in ordinary grammar. It uses two files to achieve this: the .dic file and the .aff file. The .dic file contains the head words and the .aff file contains the affix rules. Hunspell combines the two to form the different words derived from the same head word, such as happy and happiness. This spell checker opened the door to Nepali spell checking: it enabled spell checking in OpenOffice.org and taught us how Unicode could be enabled and how spell checking works.
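
To make the head word plus affix idea concrete, here is a rough Python sketch of how a word list could be expanded from the two files. It is only an illustration, not Hunspell's actual code: it handles suffix (SFX) rules only, and the flag N, the two rules, the sample words and the function names are all made up for this example.

```python
import re

# Made-up, simplified stand-ins for the contents of a .aff and a .dic file.
# Rule layout assumed here: SFX <flag> <strip> <add> <condition>
AFF_TEXT = """\
SFX N Y 2
SFX N y iness [^aeiou]y
SFX N 0 ness [^y]
"""

DIC_TEXT = """\
2
happy/N
kind/N
"""


def parse_aff(text):
    """Return {flag: [(strip, add, condition_regex), ...]} for SFX rules."""
    rules = {}
    for line in text.splitlines():
        parts = line.split()
        # Header lines such as "SFX N Y 2" have 4 fields; rule lines have 5.
        if len(parts) != 5 or parts[0] != "SFX":
            continue
        _, flag, strip, add, cond = parts
        strip = "" if strip == "0" else strip  # "0" means strip nothing
        add = "" if add == "0" else add        # "0" means add nothing
        rules.setdefault(flag, []).append((strip, add, re.compile(cond + "$")))
    return rules


def parse_dic(text):
    """Return [(head_word, flags), ...], skipping the leading word count."""
    entries = []
    for line in text.splitlines()[1:]:
        if line.strip():
            word, _, flags = line.strip().partition("/")
            entries.append((word, flags))
    return entries


def expand(word, flags, rules):
    """List the head word followed by every form its flags allow."""
    forms = [word]
    for flag in flags:
        for strip, add, cond in rules.get(flag, []):
            if cond.search(word) and word.endswith(strip):
                stem = word[:len(word) - len(strip)]
                forms.append(stem + add)
    return forms


rules = parse_aff(AFF_TEXT)
for head_word, flags in parse_dic(DIC_TEXT):
    print(head_word, "->", expand(head_word, flags, rules))
```

With these sample files, happy gives happy and happiness, and kind gives kind and kindness. The real .aff format has many more options (prefixes, compounding, cross products), so this only shows the basic combination step.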

In Hunspell and other general spell checkers, we come across two problems: the way the affix rules have to be written, and the inability of the spell checker to correct words automatically. Users generally do not know how affixes are attached to a head word; they just want to know whether the spell checker works properly. Automated spelling correction may not seem vital at present, but as we advance into computational linguistics and natural language processing it will play a huge part. To make this clear, take a system that scans a book, converts the scanned image into text with OCR, and then passes that text to a text-to-speech program that reads it aloud, so that people who cannot read the book, for whatever reason, can listen to it instead. In such a scenario, when the OCR tries to work out which character was scanned, it may confuse similar-looking letters and digits, such as 5 with S or 8 with B. These types of errors could be corrected with the help of an automated spell corrector. Human intervention between these processes could then be avoided, which would be a milestone in the field of computing.
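
As a small illustration of the OCR case, the sketch below swaps look-alike characters and keeps a candidate only if it appears in a word list. It is a toy under stated assumptions, not a real spell corrector: the confusion table holds just the two pairs mentioned above, and the lexicon, the sample tokens and the function names are invented for the example.

```python
from itertools import product

# Look-alike pairs taken from the text above (5/S and 8/B), in both directions.
CONFUSIONS = {"5": "S", "S": "5", "8": "B", "B": "8"}


def candidates(token):
    """Return every spelling reachable by swapping look-alike characters."""
    options = [
        (ch, CONFUSIONS[ch]) if ch in CONFUSIONS else (ch,)
        for ch in token
    ]
    return ["".join(chars) for chars in product(*options)]


def correct(token, lexicon):
    """Return the single lexicon word the token could be, else the token."""
    if token in lexicon:
        return token
    matches = [c for c in candidates(token) if c in lexicon]
    # Only auto-correct when exactly one candidate is a known word.
    return matches[0] if len(matches) == 1 else token


# Hypothetical word list standing in for a real dictionary.
lexicon = {"BOOKS", "SUN"}
for ocr_token in ["8OOK5", "5UN", "XYZ"]:
    print(ocr_token, "->", correct(ocr_token, lexicon))
```

The number of candidates doubles with every confusable character in a token, so a real system would need something smarter for long words; the sketch only shows why a dictionary lookup can remove the need for a human to fix each misread character.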