Sinhala Unicode: A Real Problem or Just Fabrications

May 7, 2008 by Harshadewa

This is an explanation and/or constructive criticism on some claims made on the already standardized Sinhala Unicode system.

Claims

1) Letters like “ඩු” (DU) are not stored in Unicode. Instead, Unicode builds ඩු (DU) as ඩ + පාපිල්ල = ඩු (DA + Papilla = DU), which is wrong!

2) All letters that can be created by adding පිලි (Pili) and යංශය (Yanshaya) should also be stored individually in UNICODE.

3) There’s an “IT security threat” in Sinhala Unicode! “Same strings gives two different characters in different browsers”.

Explanations

1) Yes, “ඩු” (DU) and similar letters do exist in Unicode: each is stored as a sequence of two code points, the base letter followed by the vowel sign. As we learnt in හෝඩිය පන්තිය (Nursery School), “ඩු” (DU) is created by adding a පාපිල්ල (Papilla) to the letter ඩ (DA).

Sinhala Unicode works in exactly the same way: ඩ + පාපිල්ල = ඩු (DA + Papilla = DU).

Anyone who has learnt the Sinhala language well enough knows this rule, and there is nothing wrong with representing the same thing in Unicode.
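
To make the composition concrete, here is a minimal Python sketch that inspects how “ඩු” is stored. It assumes a Python 3 interpreter and uses only the standard `unicodedata` module; the variable names are illustrative.

```python
import unicodedata

# "ඩු" (DU) as stored: the base letter ඩ followed by the vowel sign පාපිල්ල.
du = "\u0DA9\u0DD4"

# One visible letter on screen, but two code points in storage.
code_points = [f"U+{ord(ch):04X}" for ch in du]
categories = [unicodedata.category(ch) for ch in du]

for ch in du:
    # Print each code point with its official Unicode character name.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

print(len(du), code_points, categories)
```

The first code point is a letter (category `Lo`) and the second is a combining mark (category `Mn`), which mirrors the letter-plus-pilla rule taught in school.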

2) This could be called somewhat “ridiculous”! Does a calculator have a key for the number 10? No! Why?

Because you don’t need a key for 10 when you can make 10 by pressing 1 and 0.

In the same way, when “ඩු” (DU) can be created as ඩ + පාපිල්ල = ඩු (DA + Papilla = DU), there is no need to store a separate “ඩු” (DU) character.

3) Yes! There are some known problems in current versions of the Firefox web browser (FF 2.x and below) in displaying a very few Sinhala letters (e.g. Shri). But this has already been fixed in the next version of Firefox, which will become the mainstream browser of the Internet very soon. (Download Firefox 3)

What is not understandable is how this can be a security threat! I found a table of computer security threats and, after googling a bit, found a match there. It is called Human Error!

As per the claims, the word SRI can be written / displayed in different ways. I think this should be deliberately allowed anyway.

For instance, say a person really wants to write SRI in a different way. How can he write it if the language doesn’t allow him?

Users should be given the freedom to use characters / words in the way they want. It’s the user’s responsibility to do it in the correct way. It is comparable to this: “Everyone is free to write anything in a blog! But one has to be honest enough to choose whether or not to highlight things wrongly for his or her own benefit.”

Moreover, there are claims that Sinhala text cannot be copied from Internet Explorer web browser and pasted into MS Word correctly (Copy from IE to MS Word). This is false. See how I have copied some Sinhala text from IE to MS Word.

However there are some speculations saying that there are ulterior motives and hidden agendas behind these claims. I’m not discussing those here or I’m not targeting anyone. But when there are things that mislead people, someone should correct it.

Listed below are some more valuable posts from Anuradha containing technical details with good arguments regarding the same issue.

Is Sinhala Unicode incomplete

Mr Donald please correct alphabet first

Unicode and Sinhala alphabet

Posted in සිංහල, Criticism, Software, Sri Lanka | Tagged Sinhala, Unicode | 121 Comments

121 Responses

on May 7, 2008 at 5:32 pm | Reply Azrael

Great post. Good explanations for the arguments 🙂
on May 7, 2008 at 7:57 pm | Reply Kulendra

(Im sure this is gonna get at least one more comment Check, http://www.kulendra.net/index.php?option=com_content&task=view&id=97&Itemid=9, unfortunately in a DB port, I changed all data to ASCII and the UTF-8 data got corrupted. Scroll down to the comments sections and see a preview of the comments you ‘might’ get 😉 )

I totally agree with you on point 3. There’s no god damn threat. And can you also explain it a bit? Isn’t it rather a problem of the browser not being fully compliant with Sinhala Unicode than anything else? That’s how I always saw it.

On 1 and 2, the argument is that since each sound is a letter in Sinhalese, each should be given a place in the code map. I guess the real ‘issue’ in this is that when you are coding for sinhalese. If I remember correctly, the standard requires the modifier (i.e. pili) to be placed after the base letter when storing Unicode data. So when a letter like කො is stored, it is stored as ක + kombuwa + alapilla. Whoever displays this needs to know that kombuwas are placed in front of the letter, and so on. Again, I might be wrong on this. As far as data storage, retrieval and presentation are concerned, there aren’t any issues in representing Sinhalese in Unicode (or again, ‘at least as far as I know’); every Sinhala word can be written, stored and presented in Unicode.
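
The storage order described here can be checked with a short Python sketch. It assumes Python 3 and the standard `unicodedata` module; the variable names are illustrative. In practice කො is usually stored as ක plus a single two-part vowel sign (U+0DDC), and canonical decomposition (NFD) splits that sign into the kombuva and aela-pilla parts mentioned above.

```python
import unicodedata

# "කො" (KO) in its usual stored form: ක (U+0D9A) + two-part vowel sign U+0DDC.
ko = "\u0D9A\u0DDC"

# Canonical decomposition (NFD) splits the vowel sign into kombuva + aela-pilla.
ko_nfd = unicodedata.normalize("NFD", ko)
parts = [f"U+{ord(ch):04X}" for ch in ko_nfd]

# Canonical composition (NFC) folds the two parts back into the single sign.
ko_nfc = unicodedata.normalize("NFC", ko_nfd)

# In both forms the consonant is stored first, even though the kombuva is
# drawn to its left; reordering for display is left to the rendering engine.
print(parts)
print(ko_nfd[0] == "\u0D9A", ko_nfc == ko)
```

Either way the consonant comes first in memory, so a program that reads or writes Sinhala text only has to deal with this logical order and can leave the visual placement to the font and shaping engine.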
on May 7, 2008 at 8:30 pm | Reply harshadewa

Kulendra,
“Im sure this is gonna get at least one more comment”
He he.. I’m looking forward to reply back. But not over the phone.. lol!

I read the comments of your post. Interesting!

“I guess the real ‘issue’ in this is that when you are coding for sinhalese.”
Can you elaborate on this more?

As long as an API or a standard is published (for Sinhala Unicode), anyone can develop (code/systems) according to it, because standards are not hidden. They are open. But everything should be well documented, and the developer must read it well.

If by any chance you are talking about developing a Sinhala programming language, it could be suggested to use only the limited set of characters stored in Unicode rather than the ones with modifiers. Am I right?
on May 7, 2008 at 8:52 pm | Reply Kulendra

He he well, hopefully, you’ll get the comments only in here. Anyway about the ‘issue’ thing;

I was actually referring to the ‘ease’ with which a programmer can read and interpret Sinhala Unicode. Say, in an OCR program, English can be converted and stored as you scan (and recognise) it. But scanning and storing Sinhalese takes a bit more work, as the standard requires the modifiers to be placed after the base letter (thus the ‘ko’ example above). There is no issue as to ‘how’ this should be done; it is pretty clearly stated. However the programmer would have to do bit more than if he was scanning a Latin alphabet.

Of course this is from a person who has barely any coding background, so this is probably not a problem for you guys 🙂
on May 7, 2008 at 9:40 pm | Reply Anuradha Ratnaweera

Firefox 2 on GNU/Linux renders Sinhala Unicode text very well. All you need to do is to enable Pango shaper in it.

By the way (perhaps you may have seen it already) I also blogged about the topic. 🙂
on May 7, 2008 at 9:45 pm | Reply Anuradha Ratnaweera

I am really sorry, I didn’t notice that Harshadeva had already linked to my blog posts at the end of the post.
on May 7, 2008 at 10:54 pm | Reply Donald Gaminitillake

Unfortunately you guys have not seen the Unicode Consortium’s registrations.
Do you know that all the Latin characters are registered in Unicode? Take ü: whichever way you key it in, you see the umlaut u on any operating system. Likewise the Sinhala “DU” or “KU”, or any other Sinhala character, needs to be given an individual code point in SLSI 1134 (ISO or Unicode).

The security problems in characters are not human errors. The same string should not give two different characters in two different browsers. Accept this error. All this is because the registrations are inadequate to represent the Sinhala language.

visit
http://www.unicode.org/review/pr-96.html

quote
The use of format characters in identifiers is problematical because the formatting effects they represent are normally just stylistic or otherwise out of scope for identifiers. To make matters worse, it’s possible to misapply format characters such that users can create strings that look the same but actually contain different characters, which can create security problems
unquote

Unicode itself confirms the security problem. Please do not mislead people by calling it human error, sir.

Quote from unicode
The goal for such a restriction of format characters to particular contexts is to

1. allow the use of these characters where required in normal text
2. exclude as many cases as possible where no visible distinction results
3. be simple enough to be easily implemented with standard mechanisms such as regular expressions

unquote

We cannot allow the representation of words to be changed the way you like. Sinhala is Sinhala. Sinhala can be represented correctly if you use my encoding system.
Each and every character needs to be given an absolute code point.

If any reader needs more clarification, please visit me in my office:
290 DR Wijewardena mawatha
Ingrin Institute of printing and graphics

Donald Gaminitillake
I set the standard
on May 7, 2008 at 10:57 pm | Reply Donald Gaminitillake

Quote from unicode

http://www.unicode.org/reports/tr2.html

“There is a standard extant for Sinhala described in A Standard Code for
Information Interchange in Sinhalese by V.K. Samaranayake and S.T. Nandasara
(ISO-IEC JTC1/SCL/WG2 N 673, Oct. 1990). The coding proposed in it was found
to be an inadequate basis for a modern, computer-based interchange code,
though it is adequate to handle the capabilities of a Sinhala typewriter for
representing contemporary colloquial Sinhala. ”
Unquote

Your system is just a typewriter. As I have clearly proved, the Sinhala Unicode registration was done by a person who never went to school in Sri Lanka and has never even been to Sri Lanka. Worship the white skin, Anuradha.
Listen to me and correct the language issue.
I have copyrights because you guys made a wrong format. Sinhala is not correctly registered or represented in SLSI 1134 or in the Unicode Consortium.

Also see

same text seen on different browsers read different

Donald Gaminitillake
I set the standard
on May 7, 2008 at 11:03 pm | Reply Donald Gaminitillake

Hey Kulendra

quote
to do bit more than if he was scanning a Latin alphabet.
Unquote

For the Latin script all characters are given individual code points, over 600 of them.
If we do the same thing for Sinhala, we get the same results as for the Latin script, sir.

But only I have listed this, in ISBN 955-98975-0-0.

So get it from me and develop the OCR.

Donald
I set the standard
on May 8, 2008 at 7:14 am | Reply Donald Gaminitillake

I am sorry I am not permitted to make any comments on the
සිංහල යුනිකෝඩ් සමූහය – Sinhala Unicode Group
by the moderator

I read your comments and the comments made by others. I have proved that Sinhala is not compatible across all platforms the way the Latin script is. If you say it is, come over and show me, or come for a discussion.

I have given my contact numbers so that you can contact me and discuss the issues of using Sinhala on a computer.

Why are you so scared to meet me (by appointment) and talk openly?

Best

Donald Gaminitillake
I set the standard
on May 8, 2008 at 8:46 am | Reply Anuradha Ratnaweera

Mr Donald, can you please answer these questions in point form?

1. Unicode pr-96.html is about ZWJ which is used in Sinhala Unicode for joint letters. But you are using it to justify your argument about “du” which does *not* use ZWJ, how so?

2. Your system has only 1600+ glyphs. Where are the joint letters? And how about joint-joint letters (e.g.: “indriya” when written like this ඉන්‍ද්‍රිය). Please see the following screenshot taken years ago:

How do you address those in your system?

3. You have only quoted a part from the above PR. For example, you have conveniently ignored these sentences among others:

“For these reasons format characters are normally excluded from Unicode identifiers. However, visible distinctions created by certain format characters (particularly the joiner controls) are necessary and carry meaning in certain languages. A blanket exclusion of format characters makes it impossible to create identifiers based on certain words or phrases in those languages.”

The goal of the above PR is to identify the use of ZWJ etc in Unicode and to show how to *fix* any security problem arising, and not only to *expose* a security problem like you demonstrate.

The question no 3 is, how and why did you quote only a part of the above PR?

4. You say that installing anything is not needed for Latin scripts and it should be the same for Sinhala! So how does your “system” work without installing a font or a keyboard driver?
on May 8, 2008 at 11:00 am | Reply ඩිංගිරි මැණිකා

Looks like we’ll have to send this Donald Duck back to school to learn the alphabet…
on May 8, 2008 at 1:49 pm | Reply Donald Gaminitillake

First, I have listed all the Sinhala characters that I have identified in my published character allocation table (copyrighted).

Even if I have missed a character, it can be added easily.

First you have got to admit the errors in the present system. Once that has been accepted, we have got to correct it.

The process will comprise a font set with proper, open character coding for each Sinhala character, and a basic input method (Sinhala IME). Options are open for the public to improve it. A font is an artwork, and anyone else can develop one and sell it or give it away free with my encodings.

Copyright on industrial and commercial usage of my allocation table is reserved by me. This is a right given by the Govt of Sri Lanka and by you guys. When I offered it freely at the SLSI, you all rejected it. There is a problem with your system; the solution is my system. The solution to a problem is a copyright by LAW in SRI LANKA. I enjoy that right.

Donald Gaminitillake
I set the standard
on May 8, 2008 at 2:42 pm | Reply Anuradha Ratnaweera

Mr Donald, you are going round the mulberry bush.

Firstly, you did not answer all my 4 questions. You selected what you want to answer, just like you select what to quote.

So you are no event sure if you have listed all the characters in your patent-pending, ISBN registered allocation table!!!

I was not talking about a mere one or two characters you may have missed. I was talking about JOINT LETTERS. Your system with 1660 characters doesn’t have any!

Come on, fix your system before criticizing others.

I hope you’ll answer ALL my questions.
on May 8, 2008 at 2:43 pm | Reply Anuradha Ratnaweera

Correction: “So you are no event sure” should read “So you are not even sure”.
on May 8, 2008 at 3:02 pm | Reply Donald Gaminitillake

Anuradha, I have registered the joint characters in my ISBN: those I selected from Pali text books and those registered in the government education publications for pirivena education.

So far no one has made any attempt to see the table and comment.
It is far larger than the 1660 of my first count.

Donald Gaminitillake
I set the standard
on May 8, 2008 at 3:31 pm | Reply harshadewa

“All questions that are listed in point form should be answered in point form.” Not as a strict rule, but for the benefit of readers who struggle with long paragraphs. Otherwise it also leaves room for a game of hide and seek.
on May 8, 2008 at 3:44 pm | Reply Anuradha Ratnaweera

Let us, for the moment, assume that question 2 is answered. Thanks Mr Donald!

How about questions 1, 3 and 4?
on May 8, 2008 at 4:03 pm | Reply Donald Gaminitillake

Unicode pr-96.html
is given to show that Sinhala poses a security problem.
“DU”, “KU” and “ksha” are not registered in SLSI 1134.
They are hidden inside the Sinhala language kit.
When the language kit is working they show correctly, but when it is not working they show as the Unicode-registered parts (glyphs) or characters (junk).

If we had absolute character encodings like the Latin script, Sinhala would not have all these problems.

Admit your errors. Sinhala cannot be used in Excel, data sorting or OCR, or across all platforms, simply because we have not published a character allocation table. ONLY I have published it.

SLSI 1134 is a partial list, a fraction, a typewriter list.

Donald Gaminitillake
I set the standard
on May 8, 2008 at 5:15 pm | Reply harshadewa

Mr. Donald Gaminitillake,

May I know (cuz I dont know),
1) Why can’t we use Sinhala in Excel?
2) Why can’t we use Sinhala in data sorting?
3) Why can’t we use Sinhala in OCR?

With sound examples?
on May 8, 2008 at 7:21 pm | Reply Donald Gaminitillake

ඩු: copy this, paste it into Excel and see the result, sir.

Donald Gaminitillake
I set the standard
on May 8, 2008 at 7:29 pm | Reply harshadewa

Please see the below screenshot Sir.
on May 8, 2008 at 10:26 pm | Reply Donald Gaminitillake

This is how I see it in Excel:

http://www.rotarycolombocentral.org/web-data/Components/Private/index.html

The same string reads differently. The additional software is not responding in my Excel.
If Unicode Sinhala were correct we should see du in the cell as one unit, not in two parts.

This is the problem. Why not email me your du, and I will email you my du, and let us see how they look on each other’s computers?

Donald Gaminitillake
I set the standard
on May 8, 2008 at 10:44 pm | Reply Anuradha Ratnaweera

Mr Donald, you didn’t answer my question no 1.

I asked, how does pr-96.html in Unicode site relate to your argument about du?

Be specific. Just saying “your whole standard is incorrect” is bullshit.
on May 9, 2008 at 1:29 am | Reply රජිත් විදානආරච්චි

Mr. Donald,

My guess is that you have deliberately not set up Unicode on your computer, so that it won’t work. Or is it that you don’t know how to set it up? It is hard to believe that a person who doesn’t know how to take a simple screenshot can set up even something as simple as Unicode 😛 but I am sure that one of us will be able to help you set it up properly.

Please ask us to display any character you want. We will show and prove to you that Unicode can display it correctly.

And maybe it is your right to “enjoy” commercial benefits from your so-called correct standard. But it is not ethical to “sell” your language for your own benefit. All the people who are popularizing Unicode do so because they want to help their language into the modern age. Pebbles like you will not stop them on their journey towards that goal.

Maybe “you” set the standard 😛 But it is we who use it!! You and your standard can rot in garbage bins while Unicode is used everywhere, including Google services, Wikipedia, the Sri Lankan Government, and many other state/private organizations!! No one can stop Unicode on the Internet or anywhere else. Please give up your lost battle, and enjoy Unicode! It will open the gates to a Sinhala Internet / Sinhala-language information on a scale you have never seen before!!

P.S. – Taking a photo of a computer screen is equal to photocopying a computer monitor 😛
on May 9, 2008 at 7:10 am | Reply Donald Gaminitillake

To represent Sinhala correctly

You need a font set + additional software (Sinhala kit)

The font set is what is in SLSI 1134, or what is registered in Unicode as Sinhala.
That is what is seen on my site.

Do we set up computers to see German, Bahasa Malaysia or Swahili?
We can browse on any computer without any additional software.

BUT to see Sinhala we need a font set and additional software.
This additional software does not work well in all operating systems; that is why we see the raw form of Sinhala, or the typewriter concept.

Yes, I used a digital camera because I use an Intel iMac. In XP mode I need to follow that path.
Is it wrong to photocopy a monitor???

Since you guys are not willing to understand the problems in Sinhala SLSI 1134, there is no choice but to meet the top and give evidence.

You have got to understand the difference between the Unicode Consortium and what we registered in this consortium, or SLSI 1134.

I never say Unicode is incorrect, but what has been registered in the Unicode Consortium for the Sinhala language is an incorrect and incomplete set of Sinhala characters. This is the typewriter concept.

Donald Gaminitillake
I set the standard
on May 9, 2008 at 7:12 am | Reply Anuradha Ratnaweera

It seems, Mr Donald, that you either don’t understand English or are trying to play hide and seek while the whole world is watching.

You are yet to answer my questions:

How does pr-96.html in Unicode site relate to your argument about du?
on May 9, 2008 at 7:15 am | Reply Donald Gaminitillake

Anuradha you can see the ‘du’ on my excel

http://www.rotarycolombocentral.org/web-data/Components/Private/index.html

That is the problem referred to in PR-96.

The same string breaks sometimes; that is the problem with “SRI”, “PRA”, “KRI”
and many more.

Donald
on May 9, 2008 at 9:21 am | Reply Anuradha Ratnaweera

Wrong and false.

“SRI” and “DU” are two different things. You are totally mixed up, or pretend to be so.

PR-96 does NOT refer to “du” or similar stuff. It refers to the possible problems if (repeat *if*) somebody filters out ZWJ.

Others who read this blog, please check the following link (which Mr Donald himself brought up), and verify that it is NOT about da + papilla = du (or consonant + modifier = modified consonant in general), but about the filtering of ZWJ and ZWNJ.

http://www.unicode.org/review/pr-96.html
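
The distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `strip_zwj` is a hypothetical filter standing in for any software that drops format characters, and the code point sequence used for “sri” is the common spelling with an explicit joiner.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

# "sri" written with the rakaransaya requested explicitly:
# ශ + al-lakuna + ZWJ + ර + diga is-pilla
sri = "\u0DC1\u0DCA" + ZWJ + "\u0DBB\u0DD3"


def strip_zwj(text: str) -> str:
    """Hypothetical filter that removes every ZWJ from the text."""
    return text.replace(ZWJ, "")


filtered = strip_zwj(sri)

# "du" contains no joiner, so the same filter leaves it untouched.
du = "\u0DA9\u0DD4"

print(sri == filtered)      # compares the original and filtered "sri" strings
print(strip_zwj(du) == du)  # compares "du" before and after filtering
```

Filtering changes the stored “sri” string (five code points become four), whereas “du” passes through unchanged, which is why the joiner question and the consonant-plus-modifier question are separate issues.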
on May 9, 2008 at 9:35 am | Reply දෙඤ්ඤං බැටේ | Dennam Betey

Donald Gaminitillake has an amateur website at http://www.akuru.org/ He was a little-known man, but declaring war against Sinhala Unicode has made him a hero to a few and a villain to many. He was almost forgotten, but suddenly this man is in the limelight again because of a letter he has got from the President’s Office. See http://www.flickr.com/photos/8503406@N05/2461712604/

Now what we see is that Sinhala Unicode supporters have started an all-out attack on Gaminitillake. Why do the Unicode supporters panic? If Unicode is a perfect solution, or at least a decent solution, they do not have to panic.

ICTA is under the President’s Office, and if the President is convinced that Unicode is not the right solution, there is a good chance of Gaminitillake’s solution being tried out. From all I have seen on the net about the Sinhala Unicode issue, Sinhala Unicode supporters are very protective and are scared of a fair trial. Why???? There are some ‘hardcore supporters’ of Sinhala Unicode too. They will even die for Unicode. Nothing wrong with that. You have a right to admire what you love. All the same, Gaminitillake has a right to criticise it.

I see this issue like the Microsoft vs Apple issue. Microsoft is widely used, popular, and easily accessible to people from all walks of life, even for a mere 100-rupee note from the software pirates. So, people love it. Take Apple. It is scarcely used, less accessible, and expensive. But we all know it is good in quality. So, someday, can’t Gaminitillake’s solution beat Sinhala Unicode?? Let time decide.

So, my dear Unicode supporters, let Sinhala Unicode have a fair trial. Let the intelligent people listen to Gaminitillake and then take a decision. You don’t have a right to mount Taleban-like attacks on this man. After all, he is a single man without any political influence (as far as I know) and he is trying to prove a point. Let us listen.

I am neither a Gaminitillake sympathizer nor a Sinhala Unicode supporter. What we want to see is a fair trial. Even criminals deserve a fair trial. Gaminitillake cannot be a criminal for opposing Sinhala Unicode.

Some accuse him of trying to make money out of his solution. So what! He has every right to make money if he can come up with a superior solution. If you try to deprive him of a fair trial, that indicates you have an inferior solution. I could be wrong. What I don’t understand is why you all are scared of a letter from the President’s Office to this lonely man.
on May 9, 2008 at 9:35 am | Reply Anuradha Ratnaweera

In fact, Unicode PR-96 gives the following examples:

sha + hal kireema = sh
ra + diga ispilla = rii

which is similar to da + papilla = du (i.e., consonant + modifier = modified consonant).

So, PR-96 in fact gives examples *against* Mr Donald’s du claim.

Mr Donald, let me teach you how to argue. Use the PR-96 point to support arguments about joint letters… 😉
on May 9, 2008 at 11:13 am | Reply Donald Gaminitillake

Anuradha, thank you for teaching me how to argue. As you are a teacher at the university of Sri Lanka, it is your duty to teach me and others.

So again you have admitted that when certain parts get “FILTERED”, and “THEY DO GET FILTERED” often, you get into security problems, as they display different Sinhala characters (technically, what has been registered in Unicode or SLSI 1134).

Computers cannot run like this with errors, and we need to amend SLSI 1134 as soon as possible.

Donald Gaminitillake
I set the standard
on May 9, 2008 at 12:17 pm | Reply kalinga

Mr. Donald,

Why don’t you make your own software to do the things you claim, rather than trying to find what’s wrong with Unicode? If Unicode is bad, don’t use it. It’s your choice. If it’s bad, then come up with a solution that everyone can use: not just web pages and documents, but a product. We need something that works, not the fictions you’re talking about.

All in all, what I see is that you’re just a person who doesn’t know much about computers.

Whatever the matter, show us a working solution first, not documents and web pages.

“I set the standard”: what standard? And where is the product? How can we use your product? Tell me. OK, I have Linux & Windows; tell me how to use your so-called “I set the standard”.

We don’t need people who do “katin bathala hitiwima” (all talk and no action); we need actions.

Come up with a product that works in reality!!!!
on May 9, 2008 at 12:33 pm | Reply harshadewa

Dear දෙඤ්ඤං බැටේ | Dennam Betey,

Firstly, I’d like to say that I do not benefit from any party, organization or team for DISCUSSING these fabrications about Sinhala Unicode. But I honestly do not know who you are and whether you benefit from anyone.

I do not fear anyone’s standards or patents. I simply don’t care. What I do care about is a sound debate where actual practical problems are seen as they are. I’m not a worshiper or a blind believer.

Secondly, this post was written by me only to raise people’s awareness about the speculations. I suggest you “first read the post” before reading and posting comments.

See these two passages quoted from the post above.

“This is an EXPLANATION and/or CONSTRUCTIVE CRITICISM on some claims made on the already standardized Sinhala Unicode system.”

“However there are some speculations saying that there are ulterior motives and hidden agendas behind these claims. I’m not discussing those here or I’m not targeting anyone. But when there are things that mislead people, someone should correct it.”

Anyway, it’s up to you to decide whether we tried to start a DECENT DEBATE about the issues that only Mr. Donald raised, or to ATTACK him as you say.
As far as we know, we never tried to attack him. Maybe people who couldn’t control their emotions commented in a somewhat rough way.

But anyone who goes through the comments made here cannot, I think, come to the conclusion you did. I suspect that it’s you who is making blind, unfair judgments.

One might even see you as a sympathizer of Mr. Donald despite your preemptive denial.

If you honestly face practical problems with Sinhala Unicode, please discuss them here.

Many thanks,
Harshadewa Ariyasinghe.
on May 9, 2008 at 1:08 pm | Reply Anuradha Ratnaweera

Mr Donald, your logic is flawed again.

There are three pictures of Sinhala samples in PR-96. First two are almost the same (only a space is missing in the second).

First two are how we usually see “sri” and obviously it is not flawed.

So Mr Donald can use only the third picture for “bad Sinhala Unicode”.

Now that example has a “ree”. It is also, like “du”, not “registered” in Unicode, but made by combining “ra” and “diga ispilla”.

So, the only example Mr Donald can use in that PR (which is about rakaransaya), also has a counter example for his claims about “du”.

Here is the URL for the PR-96:

http://www.unicode.org/review/pr-96.html
on May 9, 2008 at 1:11 pm | Reply Anuradha Ratnaweera

Dear දෙඤ්ඤං බැටේ,

I don’t want to reply to Mr Donald. I gave up long ago. I wanted to add explanations on the web only for the benefit of newcomers.

He has every right to make money. He can implement his system, patent it, and if it is good everybody will like it and pay him royalties. But if he tries to discredit a working standard with false accusations in order to achieve that goal, therein lies the problem.
on May 9, 2008 at 1:51 pm | Reply tuxv || Yasith Vidanaarachchi

I don’t want to discriminate against anyone.. but Mr. Donald is acting in a rather foolish manner.. as we can see from his screen-shots. I don’t even wanna call them screen-shots.. because even a child can see that he used a camera to photograph his monitor… by doing this he sets a great example of using the ‘latest technology’.

But I can assure you that the claims he’s making are false.. because Sinhala Unicode works for anyone.. and.. when using the latest/popular operating systems.. you CAN see the Sinhala Unicode letters out of the box.. if you want an example you can try an Ubuntu live CD.. you don’t even have to install it to see the words in Unicode.

And it really seems obvious that he’s deliberately trying to ignore Anuradha’s questions; he only answers the questions that he CAN answer. But when it comes to the questions that he can’t beat.. he merely ignores them.

And.. දෙඤ්ඤං බැටේ, if you want to see a fair trial between Sinhala Unicode and Mr-Donald’s-Self-Standard-Sinhala.. it’s happening right now.. what do you see as unfair in this debate? It’s just that there are not so many people on his side.. but there are many Sinhala Unicode supporters.. as it’s popular, great and works well. If his standard is better.. no one will be able to stop it.. but remember that truth, honesty and good deeds always win.. and as far as I can see Sinhala Unicode is gaining the upper hand in this “standard-war”…. so isn’t it obvious that Unicode is better?

So Mr. Donald.. please don’t be a self-righteous ignorant git, and stop accusing Sinhala Unicode of flaws that it doesn’t even have. If your intentions are pure.. and you want to help spread Sinhala on the web, you can support the Sinhala Unicode standard.. and help to make it better.

Thank you
on May 9, 2008 at 3:16 pm | Reply Donald Gaminitillake

You guys are not talking about the incompatibility of Sinhala. We don’t have an e-dictionary, e-grammar or e-encyclopedias, nothing. We cannot copy and paste.

You guys do not even want to admit that characters get altered when ZWJ etc. get filtered. Nor do you admit that when the Sinhala kit is not functioning you see garbage Sinhala.

The Sinhala registered in Unicode is an incomplete and incorrect solution.

Donald Gaminitillake
I set the standard
on May 9, 2008 at 3:20 pm | Reply Donald Gaminitillake

I challenge you all

the code point for ayanna is specified in unicode consortium as

01 CODE POINT VALUE: : : : : 0D85
02 NAME (UNICODE NAME) : : : SINHALA LETTER AYANNA
03 GENERAL CATEGORY: : : : : Letter, Other
04 COMBINING CLASS : : : : : Spacing, split, enclosing, reordrant, and Tibetan subjoined
05 BIDIRECTIONAL CATEGORY: : Left-to-Right
06 DECOMPOSITION MAPPING : : –
07 DECIMAL DIGIT VALUE : : : –
08 DIGIT VALUE : : : : : : : –
09 NUMERIC VALUE : : : : : : –
10 MIRRORED: : : : : : : : : No
11 UNICODE 1.0 NAME: : : : : SINHALA LETTER AYANNA
12 ISO 10646 COMMENT FIELD : –

13 UPPERCASE MAPPING : : : : –
14 LOWERCASE MAPPING : : : : –
15 TITLECASE MAPPING : : : : –
16 DECIMAL VALUE : : : : : : 3461
17 UTF-8 HEX VALUE : : : : : 0xE0B685
18 UTF-16 HEX VALUE: : : : : 0x0D85
19 UTF-32 HEX VALUE: : : : : 0x00000D85
20 XHTML : : : : : : : : : : &#3461;
21 BLOCK : : : : : : : : : : Sinhala
22 PLANE : : : : : : : : : : Basic Multilingual Plane (BMP)
23 STROKE NUMBER : : : : : : –
24 RADICAL : : : : : : : : : –
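
The decimal and hexadecimal values in this listing follow mechanically from the code point itself. As a minimal sketch, assuming Python 3 and its built-in codecs (the variable names are illustrative), they can be recomputed like this:

```python
# U+0D85, the letter listed above as SINHALA LETTER AYANNA.
ayanna = "\u0D85"

# Decimal value of the code point (field 16 in the listing).
decimal_value = ord(ayanna)

# Encoded forms, as upper-case hex strings (fields 17 to 19).
utf8_hex = ayanna.encode("utf-8").hex().upper()
utf16_hex = ayanna.encode("utf-16-be").hex().upper()
utf32_hex = ayanna.encode("utf-32-be").hex().upper()

# Numeric character reference (field 20).
xhtml_ref = f"&#{decimal_value};"

print(decimal_value, utf8_hex, utf16_hex, utf32_hex, xhtml_ref)
```

The same calculation works for every code point in a sequence, so a letter stored as several code points has well-defined encoded values too, one group of bytes per code point.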

Likewise, give me the registered location for
“ksha” (Rajapaksha)
list the values
16 DECIMAL VALUE : : : : : :
17 UTF-8 HEX VALUE : : : : :
18 UTF-16 HEX VALUE: : : : :
19 UTF-32 HEX VALUE: : : : :
20 XHTML : : : : : : : : : :
21 BLOCK : : : : : : : : : :

I must also be able to see it in the Unicode Consortium registration.

Donald Gaminitillake
I set the standard
on May 9, 2008 at 5:30 pm | Reply රජිත් විදානආරච්චි

Mr Donald,

First of all, ZWJ and ZWNJ are used in other languages to display joint characters; the following link will lead you to one such example.
http://www.unicode.org/standard/where/

If you have studied the Sinhala language even up to grade 5, you should know that K and SHA (ක් and ෂ) are two different letters!
The code points for KSHA are as follows:

0D9A SINHALA LETTER ALPAPRAANA KAYANNA
= sinhala letter ka
0DCA SINHALA SIGN AL-LAKUNA
= virama
0DC2 SINHALA LETTER MUURDHAJA SAYANNA
= sinhala letter ssa
* retroflex

200D ZERO WIDTH JOINER
* commonly abbreviated ZWJ

So the sequence of code points 0D9A+0DCA+0DC2 will generate KSHA (ක්ෂ). If you want bandi akuru (joint letters), you can use the sequence 0D9A+0DCA+200D+0DC2, which displays as ක්‍ෂ.
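The two sequences above can be inspected with a minimal Python sketch. It assumes only the standard `unicodedata` module; the constant and function names are made up for illustration. It prints each code point with its registered Unicode name, showing that every element of both sequences, including ZWJ, is an individually named character.

```python
import unicodedata

# Code point sequences quoted in the comment above.
KSHA_PLAIN = "\u0D9A\u0DCA\u0DC2"          # ka + al-lakuna + ssa
KSHA_JOINED = "\u0D9A\u0DCA\u200D\u0DC2"   # same, with ZERO WIDTH JOINER inserted

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for label, seq in (("plain", KSHA_PLAIN), ("joined", KSHA_JOINED)):
    print(label)
    for code_point, name in describe(seq):
        print("  ", code_point, name)
```

Whether the joined form is drawn as a single conjunct shape depends on the font and rendering software in use; the code only shows what is stored.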

I hope that you will understand it, but my guess is that you don’t have the capability to understand it, because if you had understood that K and SHA are two different letters you wouldn’t have asked this question in the first place!

And do you know that none of the “hodi poth” for Sinhala show all the letters combined with all the signs? They don’t, because even a child is capable of understanding that KA is a letter, “al lakuna” is a sign and SHA is another letter! A very similar process is used in Unicode!

දෙඤ්ඤං බැටේ,

Since it seemed to me that you are able to read Unicode Sinhala, I wrote this to you in Sinhala.
As you said, there is nothing wrong with Mr. Donald earning money by using a method of his own… In truth, I too see nothing legally wrong in it. But as I see it, there is no act more base than selling one’s own language to make money! Nor is it proper, for that purpose, to spin baseless lies against Unicode (which established the Sinhala language on the Internet).
on May 9, 2008 at 6:32 pm | Reply Donald Gaminitillake

It is a character not registered in unicode.

For example, the Greek character
1F8F GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI

01 CODE POINT VALUE: : : : : 1F8F
02 NAME (UNICODE NAME) : : : GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI
03 GENERAL CATEGORY: : : : : Letter, Titlecase
04 COMBINING CLASS : : : : : Spacing, split, enclosing, reordrant, and Tibetan subjoined
05 BIDIRECTIONAL CATEGORY: : Left-to-Right
……
14 LOWERCASE MAPPING : : : : U+1F87
15 TITLECASE MAPPING : : : : –
16 DECIMAL VALUE : : : : : : 8079
17 UTF-8 HEX VALUE : : : : : 0xE1BE8F
18 UTF-16 HEX VALUE: : : : : 0x1F8F
19 UTF-32 HEX VALUE: : : : : 0x00001F8F
20 XHTML : : : : : : : : : : &#8079
21 BLOCK : : : : : : : : : : Greek Extended
22 PLANE : : : : : : : : : : Basic Multilingual Plane (BMP)
23 STROKE NUMBER : : : : : : –
24 RADICAL : : : : : : : : : –

So these have numbers: UTF values.

What you have given is the input sequence for “ksha”. There is no registration with the Unicode Consortium. It has no UTF value; it is not in the Unicode registration.

Every Sinhala character needs registration in Unicode for it to be called Unicode Sinhala.

Donald Gaminitillake
I set the standard
on May 9, 2008 at 6:55 pm | Reply harshadewa

“Every sinhala character needs registration in unicode for it to be called a unicode sinhala.”

OK! That’s it! Ultimately this is your RULE! It is neither a LAW nor a RULE written in a book or set by a government.
Anyone can say such things, but who cares? We don’t care, and neither do people with a little upstairs.

Again, if one says
“Every number in the world needs a KEY in the CALCULATOR!”
I don’t know what to call that person or that theory.

What you say is crystal clear, Mr. Donald. We can understand your suggestion. But when we can put up 10 by pressing 1 + 0, why the heck do we need all characters to be individually registered?

And we don’t need to accept each and every damn way that others have implemented their languages. We have to invent our own way. It has been invented and implemented successfully.

Your Excel and sorting examples have been proved wrong here. Please use the latest correct software to enable Sinhala 100%.

Problems shown in your screenshots are arising not because of a Sinhala Unicode problem but because of your inability to setup/use/implement software applications.

Talk wisely and choose honestly!

Thank you,
Harshadewa Ariyainghe
on May 9, 2008 at 7:58 pm | Reply රජිත් විදානආරච්චි

Donald,

Sinhala is registered in Unicode, as everyone except you knows! Each of the Unicode characters is mapped in UTF-8, so K has a UTF value, al kirima has a UTF value and sha has a UTF value!

You can get more information about it in the following links!!

http://www.unicode.org/charts/PDF/U0D80.pdf – Unicode sinhala, official document
http://en.wikipedia.org/wiki/Utf_8 – what is meant by UTF-8
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt – all the characters in Unicode (you can search the text for “sinhala” and you will find all the information about it). If you don’t know how to do a simple search: the Sinhala characters start from 0D82, so you can go manually up to that point and see all the Sinhala characters from there onwards.
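The same lookup can be sketched in Python without downloading the file. This is only an illustration: it assumes the copy of the Unicode Character Database bundled with Python's standard `unicodedata` module (which may be a newer Unicode version than the linked file), and the function name and default range are chosen here for the example.

```python
import unicodedata

def sinhala_characters(start=0x0D80, end=0x0DFF):
    """Yield (code point, name) for every assigned character in the Sinhala block."""
    for cp in range(start, end + 1):
        # unicodedata.name returns the default (None) for unassigned code points.
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            yield cp, name

for cp, name in sinhala_characters():
    print(f"{cp:04X} {name}")
```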

have fun with it!!!

We are the ones who use the standards!!!! (proper ones if you were wondering :P)
on May 9, 2008 at 8:03 pm | Reply Anuradha Ratnaweera

Devil is in the details, they say. Attention to detail is all about being specific.

Mr Donald, you don’t seem to know how to be specific when talking.

Would you like to look at my comments about the “ree” character in the third picture in Unicode PR-96 and show that you do pay attention to detail?

In the 1920s, there was a famous debate between Buddhist Atheists and Buddhist Theists in the සිංහල බෞද්ධයා (Sinhala Baudhdhaya) newspaper. It was later published as a book, දෙවි දේවතා පෙරහැර (devi devatha perahera).

With several personalities like Hemapala Munidasa, Polwatte Buddadaththa thero, Balangoda Ananda Maithriya thero in his youth, Yagirala thero etc. taking part, the language and witty discussion itself is a treat to read.

David Karunarathna (whom you should know well if you love the language and have read enough literature), young but already well read, and who led the Theists in the debate, asked five very specific questions of the opposition.

Katuwellegama Amarasiri Thero, who led the Theists, didn’t answer the questions. Instead he made some general comments and said “now it’s clear that my opponent’s arguments are false”.

David Karunarathna reminded him of the five questions and likened Amarasiri Thero’s arguments to “granny’s arguments”, because they were not at all specific.

I don’t want to draw parallels here, but we asked some specific questions, and the answers were always about “incompleteness and incorrectness”, but not specific. Grannies and Grandpas, all the same I suppose! 😉
on May 9, 2008 at 9:19 pm | Reply Donald Gaminitillake

Dear Harshadewa

The problem here is that you are calling what you use SINHALA UNICODE.

If anyone says it is a Unicode-registered character set, each and every character needs a registration and a UTF value.

“Ksha” has no UTF value, CODE POINT VALUE or NAME (UNICODE NAME); it is not a Unicode character.

Admit this, or give me the values for the above (if you say it is Sinhala Unicode).

If you say the Sinhala you are using is UCSC Anuruddha’s Sinhala or ICTA Wasantha’s Sinhala or VKS Sinhala, this question is not a problem. The moment you classify it as Unicode-registered Sinhala, you have got to give its code points. It has to be in the Unicode registry.

Donald Gaminitillake
I set the standard
on May 9, 2008 at 9:33 pm | Reply Donald Gaminitillake

Anuradha, you are incapable of giving the UTF values or code points for “ksha”.
This is just one sample.

Using joiners is not a problem; it is a string of inputs. But the UTF value, Unicode name etc. must be there in the Unicode Consortium registry to say that it is a Unicode-registered character.

The Sinhala that you are all using is partly registered in Unicode; the rest is under the plasters of ICTA and relevant groups that try to control Sinhala.

Donald Gaminitillake
I set the standard
on May 9, 2008 at 9:37 pm | Reply Donald Gaminitillake

quote
Problems shown in your screenshots are arising not because of a Sinhala Unicode problem but because of your inability to setup/use/implement software applications.
unquote

If you have a font, why do you need additional software?

Donald
on May 9, 2008 at 11:30 pm | Reply Anuradha Ratnaweera

Let me try to practice what I preach; to be specific and pay attention to detail.

Mr Donald asks what are the “utf values or code points of ksha”.

First, UTF is not a numbering system, but an encoding system. Numbers (code points) are assigned in Unicode; a UTF only encodes those numbers as bytes.

ksha = 0D9A 0DCA 0DC2 – not joint (ක්ෂ)
ksha = 0D9A 0DCA 200D 0DC2 – joint (ක්‍ෂ)
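To make the code point versus encoding distinction concrete, here is a small Python sketch, offered as an illustration only (the function and constant names are invented for the example). It takes the joint ksha sequence above and shows the same four code points serialized under three UTFs.

```python
KSHA_JOINED = "\u0D9A\u0DCA\u200D\u0DC2"  # ka + hal kireema + ZWJ + sha

def hex_bytes(data):
    """Format a bytes object as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)

def encodings(text):
    """Return the code points of text and its byte form in three UTFs."""
    return {
        "code points": [f"U+{ord(ch):04X}" for ch in text],
        "UTF-8": hex_bytes(text.encode("utf-8")),
        "UTF-16BE": hex_bytes(text.encode("utf-16-be")),
        "UTF-32BE": hex_bytes(text.encode("utf-32-be")),
    }

for form, value in encodings(KSHA_JOINED).items():
    print(f"{form}: {value}")
```

The code points stay the same in every row; only the byte representation changes with the chosen UTF.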

Hope you will also answer my specific point: about “ree” letter in third picture in PR-96, is it correct or not?
on May 10, 2008 at 9:34 am | Reply harshadewa

“Moment you clasify it as unicode registered sinhala you got to give its code points It has to be in the unicode registry.”

1. It’s in the UNICODE registry, and the values have been given over and over again.

2. You keep saying it is NOT, because you can’t understand the simple theory of “1 and 0 is 10”.

3. Your argument that “Every letter in Sinhala should be given an individual code in UNICODE” is NOT a hard written RULE, nor specified by any GOD.

4. Trying to give each Sinhala character a unique code in UNICODE looks hilarious to me.

Personally, I think there’s no problem in using, for computers, the exact same way a letter is created in Sinhala, i.e.:

Quoted from one of Anuradha’s comments above.

sha + hal kireema = sh
ra + diga ispilla = rii
which is similar to da + papilla = du (i.e., consonant + modifier = modified consonant).
on May 10, 2008 at 9:35 am | Reply Donald Gaminitillake

Dear Anuradha

Those are the input codes, 200D ZWJ.
The end result does not come from a Unicode-registered character, sir.
It is inside your kits and additional plaster software.

The problem is that you are mixing the typewriter input method with Unicode registrations.

Ksha needs to be registered in Unicode with a proper name and UTF numbers.

æ or Æ

01 CODE POINT VALUE: : : : : 00C6
02 NAME (UNICODE NAME) : : : LATIN CAPITAL LETTER AE
03 GENERAL CATEGORY: : : : : Letter, Uppercase
04 COMBINING CLASS : : : : : Spacing, split, enclosing, reordrant, and Tibetan subjoined
05 BIDIRECTIONAL CATEGORY: : Left-to-Right
……
16 DECIMAL VALUE : : : : : : 198
17 UTF-8 HEX VALUE : : : : : 0xC386
18 UTF-16 HEX VALUE: : : : : 0x00C6
19 UTF-32 HEX VALUE: : : : : 0x000000C6
20 XHTML : : : : : : : : : : &#198

The above is a joint character, but it has an individual registration in Unicode.
How we input it does not matter, because you can have several input methods depending on the input OS and drivers. The output value remains the same because the value has been given in the Unicode registration. We all see the same character. It won’t break into parts.
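As a side illustration of the distinction being argued here: Unicode contains both precomposed characters (such as Æ at 00C6) and characters that exist only as sequences of code points. The following is a minimal Python sketch, assuming only the standard `unicodedata` module; the names are made up for the example. It shows that A + COMBINING RING ABOVE composes to a single code point under NFC normalization, while the ksha sequence discussed in this thread remains a four-code-point sequence, and Æ stays a single code point.

```python
import unicodedata

A_RING_SEQUENCE = "A\u030A"                # LATIN CAPITAL LETTER A + COMBINING RING ABOVE
KSHA_JOINED = "\u0D9A\u0DCA\u200D\u0DC2"   # ka + al-lakuna + ZWJ + ssa
AE = "\u00C6"                              # LATIN CAPITAL LETTER AE

def nfc_code_points(text):
    """Return the code points of text after NFC normalization."""
    normalized = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in normalized]

for sample in (A_RING_SEQUENCE, KSHA_JOINED, AE):
    print(nfc_code_points(sample))
```

Both kinds of text are stored and exchanged as code points; what differs is only whether the standard also defines a single precomposed code point for the combination.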

So the characters registered in Sinhala Unicode are not enough to represent the Sinhala language. Therefore it is an incomplete solution that has to be amended.

Either you have got to accept my proposal or propose another new way.

You have no other way, because you all gulped Andy’s method and are now trying to justify it.

Another good example: the numeral 500 written as a Roman numeral is a “d”,

but in Unicode the “d” is registered again

01 CODE POINT VALUE: : : : : 217E
02 NAME (UNICODE NAME) : : : SMALL ROMAN NUMERAL FIVE HUNDRED
03 GENERAL CATEGORY: : : : : Number, Letter
—–

13 UPPERCASE MAPPING : : : : U+216E
14 LOWERCASE MAPPING : : : : –
15 TITLECASE MAPPING : : : : 216E
16 DECIMAL VALUE : : : : : : 8574
17 UTF-8 HEX VALUE : : : : : 0xE285BE
18 UTF-16 HEX VALUE: : : : : 0x217E
19 UTF-32 HEX VALUE: : : : : 0x0000217E
20 XHTML : : : : : : : : : : &#8574
21 BLOCK : : : : : : : : : : Number Forms
22 PLANE : : : : : : : : : : Basic Multilingual Plane (BMP)

BUT the same character is registered for text as latin

01 CODE POINT VALUE: : : : : 0064
02 NAME (UNICODE NAME) : : : LATIN SMALL LETTER D
03 GENERAL CATEGORY: : : : : Letter, Lowercase
—–
15 TITLECASE MAPPING : : : : 0044
16 DECIMAL VALUE : : : : : : 100
17 UTF-8 HEX VALUE : : : : : 0x64
18 UTF-16 HEX VALUE: : : : : 0x0064
19 UTF-32 HEX VALUE: : : : : 0x00000064
20 XHTML : : : : : : : : : : &#100
21 BLOCK : : : : : : : : : : Basic Latin
22 PLANE : : : : : : : : : : Basic Multilingual Plane (BMP)

We need proper UTF values for all Sinhala characters in Unicode, SIR.

Donald Gaminitillake
I set the standard
on May 10, 2008 at 6:19 pm | Reply Anuradha Ratnaweera

Mr Donald,

“Those are the input codes, 200D zwj
The end result comes not from unicode registered character sir.
It is inside your kits and additional plaster software”

Wrong, as usual. ZWJ is registered in Unicode to be shared among languages.

Do you use the full stop and comma? Do you want to have a separate full stop for Sinhala? Having common characters like the full stop and ZWJ is not a problem.

So how many times do you want me to remind my question about “ree” in PR-96?
on May 10, 2008 at 9:13 pm | Reply Donald Gaminitillake

I am not worried about the zwj

This is registered in Unicode as

01 CODE POINT VALUE: : : : : 200D
02 NAME (UNICODE NAME) : : : ZERO WIDTH JOINER
03 GENERAL CATEGORY: : : : : Other, Format
04 COMBINING CLASS : : : : : Spacing, split, enclosing, reordrant, and Tibetan subjoined
05 BIDIRECTIONAL CATEGORY: : Boundary Neutral
06 DECOMPOSITION MAPPING : : –
07 DECIMAL DIGIT VALUE : : : –
08 DIGIT VALUE : : : : : : : –
09 NUMERIC VALUE : : : : : : –
10 MIRRORED: : : : : : : : : No
11 UNICODE 1.0 NAME: : : : : ZERO WIDTH JOINER
12 ISO 10646 COMMENT FIELD : –

13 UPPERCASE MAPPING : : : : –
14 LOWERCASE MAPPING : : : : –
15 TITLECASE MAPPING : : : : –
16 DECIMAL VALUE : : : : : : 8205
17 UTF-8 HEX VALUE : : : : : 0xE2808D
18 UTF-16 HEX VALUE: : : : : 0x200D
19 UTF-32 HEX VALUE: : : : : 0x0000200D
20 XHTML : : : : : : : : : : &#8205
21 BLOCK : : : : : : : : : : General Punctuation
22 PLANE : : : : : : : : : : Basic Multilingual Plane (BMP)
23 STROKE NUMBER : : : : : : –
24 RADICAL : : : : : : : : : –

After this input code ZWJ, “Ksha” has to be represented from the Unicode registration.
So give me the UTF value of “KSHA”, sir. If you do not have one, say so.
Write to the public that “KSHA” is given by the “Sinhala kit”, not by the Unicode registration.
You are manipulating the few characters registered in Unicode through another piece of software, the “Sinhala kit”, and other handi plast (palastara) software made by ICTA and others.

“KSHA” does not have a value in Unicode, but this is hidden. When the palastara software does not respond, it goes back to the Unicode-registered level and we all see garbage Sinhala.

So the Sinhala characters registered with the Unicode Consortium are incapable of representing our language.
We cannot use Sinhala in any of the commercial applications such as Adobe CS3 Master Collection etc.

All this is because one needs to run other palastara software to represent Sinhala. The Unicode registrations are not enough to represent our language.

Therefore we need to amend SLSI 1134 to protect our language, Sinhala.

Donald Gaminitillake
I set the standard
on May 10, 2008 at 10:00 pm | Reply Donald Gaminitillake

Dear Anuradha

I have illustrated what you had written in No 48 (joint ksha)

http://www.rotarycolombocentral.org/web-data/Components/Private/ksha.html

I have given an explanation; can you give yours?

Donald Gaminitillake
I set the standard
on May 10, 2008 at 11:29 pm | Reply Anuradha Ratnaweera

Mr Donald,

ZWJ is registered in Unicode. Code point 200D.
on May 11, 2008 at 11:30 am | Reply Donald Gaminitillake

See my No 52; I have given the details of ZWJ from Unicode. In my illustration I too have given the ZWJ code point.

My question is: after you input those sequences, from where does the computer image the “KSHA”?

Is it from the Unicode-registered plane, or does it come from the “Sinhala kit” and its supporting plaster software?

If from the Unicode plane, give its UTF numbers etc.; if not, say it comes from the Sinhala kit and plaster software.

Also say that without this additional plaster software the Sinhala registered in Unicode and in SLSI 1134 cannot be imaged correctly.

Donald Gaminitillake
I set the standard
on May 11, 2008 at 12:58 pm | Reply Donald Gaminitillake

Regarding your question
“Do you want to have a separate full stop for Sinhala?”

Yes, we may need one for Sinhala due to kerning algorithms in Sinhala.
Why not have an exclusive set of comma, full stop and other General Punctuation for Sinhala?

Remember, Sinhala does have different ways of writing the language.
Kavi is written in one format etc.
2500 years of Sinhala development have to be preserved. I have to give all options to the people.

Having “plaster software” and kits won’t work. All the Sinhala characters need proper registration in SLSI and proper UTF values.

Donald Gaminitillake
I set the standard
on May 12, 2008 at 7:44 am | Reply Anuradha Ratnaweera

Modern fonts don’t need all the glyphs to have individual code points.

Some shapes (glyphs) in a font may come from individual characters (e.g.: අ).

Some glyphs will come from sequences (e.g.: ක්‍ෂ).

For example, the LKLUG font, the first free Sinhala Unicode font, which we developed, has ksha, but it is not given an individual code point. Rather, it is assigned to a sequence, which is ka + hal kireema + ZWJ + sha.

Even if we assign a code point in a “private area” in a font (if it is necessary), it is not a problem. Here is why:

In software design, each module can have its own implementation details, but only the interface matters to the rest. The best example is a subroutine. Once the subroutine is written, the other parts of the program have to think of it as a black box.

Similarly, as the standard defines the interface with input (ka + hal kireema + zwj + sha) and output (ksha shape), it doesn’t matter how the font does it, or even how an implementation does it.
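The interface idea can be sketched as a toy substitution table in Python. This is a simplified illustration, not how LKLUG or any real rendering engine is implemented: the glyph names and the `LIGATURES` table are hypothetical, and real fonts express such rules in their own internal tables. The point is only that the input is a sequence of code points and the output is a glyph, with the mapping kept as an implementation detail.

```python
# Hypothetical rule table: code point sequence -> glyph name inside a font.
LIGATURES = {
    ("\u0D9A", "\u0DCA", "\u200D", "\u0DC2"): "ksha_conjunct",
}

def shape(text, ligatures=LIGATURES):
    """Greedy longest-match substitution of code point sequences by glyph names."""
    glyphs = []
    max_len = max(len(seq) for seq in ligatures)
    i = 0
    while i < len(text):
        # Try the longest candidate sequence first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            seq = tuple(text[i:i + n])
            if seq in ligatures:
                glyphs.append(ligatures[seq])
                i += n
                break
        else:
            # No rule matched: fall back to one glyph per code point.
            glyphs.append(f"uni{ord(text[i]):04X}")
            i += 1
    return glyphs

print(shape("\u0D9A\u0DCA\u200D\u0DC2"))  # joint sequence
print(shape("\u0D9A\u0DCA\u0DC2"))        # sequence without ZWJ
```

A caller of `shape` never needs to know whether the conjunct glyph has a private slot inside the font; it only relies on the sequence going in and the glyph coming out.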

Read any standard like POSIX, SVID, or even ANSI/ISO C. They all define the interfaces, and not implementation details.

Anyway, it seems Mr Donald is following this principle:

“When you can’t win an argument, confuse”.

That’s why he wants to write long answers instead of answering in point form, and also repeats the same old statements in between to add to the confusion. 😉

So I think it is going to be a waste of time to argue like this. But I have thought of a better mechanism to present matters. Will get back in a few days time.
on May 12, 2008 at 11:21 am | Reply Donald Gaminitillake

Quote
For example, in LKLUG font, the first free Sinhala Unicode font which we developed, has ksha, but it is not given an individual code point
Unquote

By not giving it a code point in the Unicode registration, it is a character not registered in Unicode. You can have all the sequences of inputs, but the output character needs to be registered in Unicode and have a UTF value.

Your point proves that not all Sinhala characters are registered with the Unicode Consortium; they are hidden under a carpet of plaster software.

That is why you always need various types of fonts and additional software to represent Sinhala.

We have got to exit from typewriter technology to save the Sinhala language.

Also, you are using the typewriter concept.
Andy Daniels has proved this to you.

I quote again

http://www.unicode.org/reports/tr2.html

“There is a standard extant for Sinhala described in A Standard Code for
Information Interchange in Sinhalese by V.K. Samaranayake and S.T. Nandasara
(ISO-IEC JTC1/SCL/WG2 N 673, Oct. 1990). The coding proposed in it was found
to be an inadequate basis for a modern, computer-based interchange code,
though it is adequate to handle the capabilities of a Sinhala typewriter for
representing contemporary colloquial Sinhala. ”
Unquote

You say this is old, but when you read the allocations of Sinhala, only a few shifts of locations have taken place. You and the VKS group copied this and made SLSI 1134 hurriedly to show the public that SLSI is the same as Unicode.

It should have been the other way round: the SLSI had to come first, and then the content of the SLSI had to be registered in the international arena.

I was the only person who made representations and said publicly that SLSI 1134 is incorrect and incomplete.

Now all the problems have come up with Sinhala, all because you have not registered the SINHALA AKURU in ISO or Unicode or SLSI and given them code points.

Donald Gaminitillake
I set the standard
on May 12, 2008 at 11:59 am | Reply Donald Gaminitillake

Also, Anuradha, you further confirm that you are talking of the LKLUG font, and even there you have a KSHA location only in the first three versions.

I talk of a standard common to the SINHALA language. For people to do development, they need to know the proper absolute code points for the whole Sinhala language.

Unconditionally, the present system has serious flaws. To correct it, all characters need UTF values; register them in SLSI 1134 by amending it as soon as possible.

I am not confusing anyone. You are confusing the public by classifying the Sinhala used in computers as Unicode Sinhala, when not all Sinhala characters are represented in it, only a few typewriter-based characters.

Donald Gaminitillake
I set the standard
on May 12, 2008 at 1:25 pm | Reply harshadewa

Mr. Donald,
From all your arguments, a simple statement can be made.
“All letters that can be created by adding modifiers should also be stored individually (with individual code points) in UNICODE.”

This is exactly what has been mentioned as the 2nd point under Claims in the main article above.

We can clearly understand your law! But the problem here is that we do not see any PRACTICAL issues in using the current Sinhala UNICODE, as all Sinhala characters can already be represented using the existing Unicode system.

While you keep saying there are problems, security issues, serious flaws etc. in Sinhala Unicode, you have not given any examples to prove what you say. Instead you either post a previous comment with some modifications or post a long comment with some UTF, Unicode and hex values.

Therefore, I suggest that you:

1. Pick a question asked of you that hasn’t been answered yet (guess you won’t find it difficult to find one).
2. In point form, give us the steps to re-create the problem/error that you’re getting.
3. Answer it in simple point form, but don’t forget to include your operating system and version, browser type and version, and the other applications and their versions that you use in your environment.

So, when you have free time, can you do this? In this way we can easily re-create the error and agree with you happily.

There are known problems in some applications with Sinhala Unicode. The fixes and how-to-avoids have already been given.

When there are ways to avoid problems, there’s no need to repeat the history!
on May 12, 2008 at 3:03 pm | Reply Donald Gaminitillake

Quote
“All letters that can be created by adding modifiers, should also be stored individually (with individual Unicode’s) in UNICODE.”
Unquote

This is a wrong statement, sir.

If the created letter is not stored in the Unicode registration (no UTF value), it will not display correctly without additional software.

This is the issue I am addressing.

Quote from your reply
sha + hal kireema = sh
ra + diga ispilla = rii
which is similar to da + papilla = du (i.e., consonant + modifier = modified consonant).
Unquote

This is the exact input method, but the final result has to be in the Unicode registration, NOT FROM A “KIT” OR PLASTER SOFTWARE.

You all avoid these facts: that to represent Sinhala correctly you need additional software beyond what has been registered with the Unicode Consortium.

Admit this: yes or no.
If yes, give me the UTF value and plane in Unicode for KSHA.
If no, say that it is not registered in Unicode but that you get it from additional software made by whom?

Donald Gaminitillake
I set the standard
on May 12, 2008 at 3:10 pm | Reply Donald Gaminitillake

By the way, I use 4 operating systems:

Windows XP Professional version 2002 Service Pack 2
iMac on OS X 10.4
iMac on Windows XP
eMac on OS X 10.3

Donald
on May 12, 2008 at 3:16 pm | Reply Donald Gaminitillake

Quote
can be already represented by using the existing Unicode system.
Unquote

It cannot be represented correctly without additional software.

Donald Gaminitillake
I set the standard
on May 12, 2008 at 4:24 pm | Reply harshadewa

Mr. Donald,

First, you did not answer my last question, which asked for a practical problem and the details of your computer environment so that we can reproduce the error.

Instead you asked me a (rather modified old) question and gave the names of 4 operating systems that (you say) you are using.

Answer this (if possible).
Why should we give you the UTF values?

This is not a joke and I expect serious answers.
on May 12, 2008 at 6:26 pm | Reply Donald Gaminitillake

Quote
Why should we give you the UTF values?
Unquote

I quote from unicode consortium itself

http://www.unicode.org

What is Unicode?

Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.

Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one.

unquote

In Sinhala, one side of the equation is registered with Unicode, but not all the answers are registered with Unicode. Some are inside the kit or plaster software.

Take other languages,
e.g.
Latin script, which spans several pages of Unicode.
The key-in method differs with the OS, but the final answer is registered with Unicode.
Therefore all characters are represented correctly across any platform as per Unicode,
because all have UTF values.

quote from unicode
http://unicode.org/faq/utf_bom.html#14

Q: What is a UTF?

A: A Unicode transformation format (UTF) is an algorithmic mapping from every Unicode code point (except surrogate code points) to a unique byte sequence. The ISO/IEC 10646 standard uses the term “UCS transformation format” for UTF; the two terms are merely synonyms for the same concept.

Q: Which of the UTFs do I need to support?

A: UTF-8 is most common on the web. UTF-16 is used by Java and Windows. UTF-32 is used by various Unix systems. The conversions between all of them are algorithmically based, fast and lossless. This makes it easy to support data input or output in multiple formats, while using a particular UTF for internal storage or processing.

Q: Are there any byte sequences that are not generated by a UTF? How should I interpret them?

A: None of the UTFs can generate every arbitrary byte sequence. For example, in UTF-8 every byte of the form 110xxxxx₂ must be followed with a byte of the form 10xxxxxx₂. A sequence such as <110xxxxx₂ 0xxxxxxx₂> is illegal, and must never be generated. When faced with this illegal byte sequence while transforming or interpreting, a UTF-8 conformant process must treat the first byte 110xxxxx₂ as an illegal termination error: for example, either signaling an error, filtering the byte out, or representing the byte with a marker such as FFFD (REPLACEMENT CHARACTER). In the latter two cases, it will continue processing at the second byte 0xxxxxxx₂.

A conformant process must not interpret illegal or ill-formed byte sequences as characters, however, it may take error recovery actions. No conformant process may use irregular byte sequences to encode out-of-band information.
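The behaviour described in the quoted FAQ answer can be tried out with Python's built-in UTF-8 codec. This is a small sketch for illustration; the byte values and function names are chosen here as examples and are not taken from the FAQ. The byte 0xC3 has the form 110xxxxx and 0x41 ("A") has the form 0xxxxxxx, so together they form an ill-formed sequence, while 0xC3 0x86 is the well-formed encoding of Æ (00C6).

```python
ILL_FORMED = bytes([0xC3, 0x41])   # lead byte 110xxxxx followed by 0xxxxxxx
WELL_FORMED = bytes([0xC3, 0x86])  # lead byte followed by a 10xxxxxx byte

def decode_strict(data):
    """Decode UTF-8, signaling an error on ill-formed input."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"error at byte {exc.start}"

def decode_with_marker(data):
    """Decode UTF-8, replacing each ill-formed byte with U+FFFD."""
    return data.decode("utf-8", errors="replace")

print(decode_strict(ILL_FORMED))
print([f"U+{ord(ch):04X}" for ch in decode_with_marker(ILL_FORMED)])
print([f"U+{ord(ch):04X}" for ch in decode_with_marker(WELL_FORMED)])
```

With the replacement option, decoding resumes at the second byte, so the "A" survives after the marker.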

————-

I hope readers understood

sinhala errors

All these are characters registered in Unicode,
but they are unreadable because the additional software is not working.

To represent Sinhala we need additional software beyond the Unicode registration.

This is the problem I am addressing.

Donald Gaminitillake
I set the standard
on May 12, 2008 at 6:28 pm | Reply Donald Gaminitillake

Quote
Why should we give you the UTF values?
Unquote

I quote from the unicode consortium

What is Unicode?

Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.

Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one.

Unquote

Donald Gaminitillake
I Set the standard
on May 12, 2008 at 7:06 pm | Reply harshadewa

Wow! You did exactly what I thought! Thanks for the brilliant answer. But I found a better one here.

“Unicode is an industry standard allowing computers to consistently represent and manipulate text expressed in most of the world’s writing systems.” – Wikipedia
[http://en.wikipedia.org/wiki/Unicode]

My question was not
Why should we use UTF values?
or
Why are we using Unicode?

I asked “Why should we give YOU the UTF values?”, which you didn’t answer.

Since you are not-so-up-to-standard at answering the questions, let me make it simpler, so that you can select from MCQs (Multiple Choice Questions). Easy as a piece of cake!

Q: Why should we give you the UTF values?

A 1. IN MY VIEW, every character in Sinhala should have a Unique UTF. No matter what others say. I SET THIS STANDARD.

A 2. Existing Sinhala Unicode does not work well and I don’t want to say how.

A 3 . All of the above.

A 4. None of the above and I want my own system to be implemented as soon as possible.
on May 12, 2008 at 9:27 pm | Reply Donald Gaminitillake

Quote

Why should we give YOU the UTF values?

Unquote

Because you say the Sinhala that you are using is Sinhala Unicode and that it is registered in Unicode.

If you say the Sinhala that you are using is “ICTA Sinhala”, then you need not give any code point. That becomes a monopoly. I do not ask for the code points for Helawadana or Thibus.

But you are talking of a public-domain Unicode.

Unconditionally, you have to give the code point for ksha in Unicode.
Else you can say it is inside the kit and not in Unicode.

My multiple-choice answer is 1 and 2.

The other option is
my system, or you can have your own system, so that we can use it across all platforms like Latin script or Korean or Japanese etc.

interesting article for you to read (in sinhala)

http://www.lankaenews.com/Sinhala/news.php?id=2526

Donald Gaminitillake
I set the standard
on May 12, 2008 at 9:48 pm | Reply Donald Gaminitillake

Dear Anuradha and Harshadewa

Can you just copy a few words from the Lanka e News Sinhala page and paste them here for us to see?

Donald Gaminitillake
I set the standard
on May 12, 2008 at 10:18 pm | Reply රජිත් විදානආරච්චි

Donald,

I guess that you didn’t read the Unicode character list I gave you the link to earlier. Characters are registered in Unicode under their code points! UTF-8 is just an encoding; there are many UTF encodings.

So, this time, can you “think” and answer the question “Why should we give you the utf-8 values?” And don’t forget to use your brain…

Sinhala characters are (all of them) registered in Unicode; that’s why they have individual code points! Can’t you understand something as simple as that?

Quoting from the Wikipedia article about UTF-8 (a good guess is that you didn’t read this article when I sent you the link, or that you merely ignored its contents and pretended that you didn’t read it):

UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode.

unquote

A user of “good” standards
on May 13, 2008 at 2:55 am | Reply ඔහේ

Let it go.
on May 13, 2008 at 7:30 am | Reply Donald Gaminitillake

quote
so, this time, can you “think” and answer the question “Why should we give you the utf-8 values?”
unquote

“Ksha” is not registered in Unicode.
If you cannot give UTF-8, give UTF-16 or UTF-32.
Else give the location in the Unicode plane.

For the Sinhala letter ayanna, the following are the UTF values:

01 CODE POINT VALUE: : : : : 0D85
02 NAME (UNICODE NAME) : : : SINHALA LETTER AYANNA
03 GENERAL CATEGORY: : : : : Letter, Other

17 UTF-8 HEX VALUE : : : : : 0xE0B685
18 UTF-16 HEX VALUE: : : : : 0x0D85
19 UTF-32 HEX VALUE: : : : : 0x00000D85
20 XHTML : : : : : : : : : : &#3461
21 BLOCK : : : : : : : : : : Sinhala
22 PLANE : : : : : : : : : : Basic Multilingual Plane (BMP)

Why can’t you give these values for “KSHA”?
Then we can check it in the Unicode registry.

Donald Gaminitillake
I set the standard
on May 13, 2008 at 10:40 am | Reply harshadewa

Question:
“Why cant you give these values to “KSHA”
Then we can check it in the unicode registry”

Answer:
(ක් + ෂ) = ක්‍ෂ
(k + sha) = ක්‍ෂ (UNICODE: 0D9A 0DCA 200D 0DC2)

I’m surprised for the [N]th time, that you cannot understand the similarities of the implementation between Sinhala language and Sinhala Unicode; and also we have to continue copying the same answer for the same question asked in various ways.
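
The sequence quoted above can be inspected directly. A small Python sketch, assuming only the four code points given in the answer, that prints the Unicode name of each one and the UTF-8 bytes of the whole sequence:

```python
import unicodedata

# The four code points quoted above for the joint letter "ksha"
ksha = "\u0d9a\u0dca\u200d\u0dc2"

names = [unicodedata.name(ch) for ch in ksha]
for ch, name in zip(ksha, names):
    print(f"U+{ord(ch):04X}  {name}")

# Each code point has a UTF-8 form, so the whole sequence has one too
utf8_hex = " ".join(f"{b:02X}" for b in ksha.encode("utf-8"))
print(utf8_hex)
```

Whether the four code points are drawn as one joint glyph is then up to the font and the rendering engine, not the stored text.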
on May 13, 2008 at 10:55 am | Reply Donald Gaminitillake

Sorry sir those codes are NOT KSHA but

http://www.rotarycolombocentral.org/web-data/Components/Private/ksha.html

You can see it yourself – just the input sequence only —

Donald Gaminitillake
on May 13, 2008 at 11:31 am | Reply harshadewa

1. You say that ක් + ෂ is NOT ක්‍ෂ. This means that either you don’t know Sinhala or you don’t accept Sinhala. Either way, you have to first learn Sinhala and then reply.

2. You agree that there are original Unicode characters like ක් and ෂ. If ක් and ෂ are acceptable, why should ක්‍ෂ, which is a mixture of these two characters, be unacceptable? It is your LAW that ක්‍ෂ should also be there in Unicode.

Take the good old calculator example and think twice. You don’t store 10 separately when you can use 1 and 0 to get 10.

ක්‍ෂ is not a character that came from Mars, but just a mixture of ක් and ෂ.

You say that ක්‍ෂ displays awkwardly sometimes. Well, give an example as I requested many times before. You don’t do it honestly with real examples, because you know that we can prove it works fine.

By the way, data duplication is not a very good thing!
on May 13, 2008 at 11:46 am | Reply Donald Gaminitillake

I am talking of KSHA (kayanna badhi shayanna) joint one

Kayanna -0D9A SINHALA LETTER ALPAPRAANA KAYANNA
Shayanna -0DC2 SINHALA LETTER MUURDHAJA SAYANNA
0DCA is SINHALA SIGN AL-LAKUNA

all these are input sequence that is it.

If you talk of kayanna and shayanna I have no problem Both these two have UTF values and a proper location in unicode.

not the alkayanna or joint “ksha”

because there are no location for alkayanna in unicode IT COMES FROM THE KIT OR THE PLASTER SOFTWARE same as joint “ksha”

Donald Gaminitillake
I set the standard
on May 13, 2008 at 12:19 pm | Reply Donald Gaminitillake

Keep the sinhala font and uninstall the sinhala kit and other plasters software

you will see ක kayanna and SINHALA SIGN AL-LAKUNA separately (this is what is in the unicode consortium)

For it to join and see correctly as ක්‍ෂ you need the Sinhala kit and plaster software.

If you copy ක්‍ෂ and delete it piece by piece using backspace, it gets deleted in four moves: sha, space, al lakuna and then the ka. This also proves that it is in parts, not one letter. ක් (alka) and ක්‍ too have no UTF value in unicode.

Donald Gaminitillake
I set the standard
on May 13, 2008 at 12:58 pm | Reply Donald Gaminitillake

quote
sinhala characters are (all of them) registered in unicode.. that’s why they have individual code points!! can’t you understand something simple as that one!!!
unquote

Ranjith you know nothing!!!!
Only a few are registered in unicode. This is the issue I am addressing

Donald Gaminitillake
I set the standard
on May 13, 2008 at 2:16 pm | Reply harshadewa

Are you trying a 1 : 3 formula (intentionally) in replying to comments? 😐

You’ve said the exact same thing that I’ve explained (k + sha = ක්‍ෂ [UNICODE:
0D9A 0DCA 200D 0DC2]) in a much longer way.

What I believe is that ක්‍ෂ is as same as [ක් + ෂ].

So, I don’t see any problem in the way Sinhala Unicode is implemented, since the Sinhala language itself represents the same thing.

So far, I haven’t come across a single problem that you are trying to show us saying that Sinhala Unicode has problems.

As I have been continuously requesting, we would like to see an example with all the correct details. Then we can re-create the same thing and agree with you happily ever after.

Won’t it be a good idea, without going around the problem always?
on May 13, 2008 at 2:54 pm | Reply Donald Gaminitillake

ක්‍ is not in unicode, same as “KSHA”
Input sequence is not what I am talking about
You have all the input sequences but the problem is the output of characters
“KSHA” joint is not in unicode registration, ක්‍ is not in unicode registration
you see these only through the Sinhala kit or plaster software

You always avoid the sinhala kit and the plaster software issue.

Without this additional software sinhala cannot be represented correctly

Therefore unicode sinhala is an incomplete and incorrect solution

This is only a typewriter concept.

If you do not talk about the kit and plaster software you are just looping

Donald Gaminitillake
I set the standard
on May 13, 2008 at 3:46 pm | Reply harshadewa

he he!
What if your so called plaster software is no more in new operating systems?
on May 13, 2008 at 6:10 pm | Reply Donald Gaminitillake

You mean we do not have to download the sinhala kit and the plaster software to read sinhala?

Harshadewa you are contradicting what you wrote in No 3

quote
By any chance if you are talking about developing a Sinhala programming language, it can be suggested that only to use limited number of characters stored in Unicode rather than going for one’s with modifiers. Am I right?
Unquote

So I am 100% correct: unicode does have only a limited number of sinhala registrations

Donald Gaminitillake
I set the standard
on May 13, 2008 at 6:36 pm | Reply Donald Gaminitillake

Enjoy seeing MTV news in the year 2003

http://www.flickr.com/photos/8503406@N05/2489614348/

you need a very good adsl connection

Donald Gaminitillake
I set the standard
on May 13, 2008 at 7:00 pm | Reply harshadewa

I challenge you to answer my question first and I bet you won’t! 😉

I asked “What if your so called plaster software is no more in new operating systems?”

Anyway, I suggest you visit http://groups.google.com/group/Sinhala-Unicode?hl=si and see how to use Sinhala Unicode (in case you hate the Sinhala Kit like hell, you can do without it as well). I guess you won’t try to do that as you have Sinhala-Unicode-Phobia. 😀

No Sir, you can confuse neither me nor the readers here by showing quoted text from above. The quoted text was a suggestion of mine, and I can explain why I made it. It’s shameless that you make such attempts to prove right what is already proved wrong!

The problem here is, you’re trying to hide from all the questions we ask by asking us to admit a childish fact.

To finish this off, I’d like to say that;

– You have failed to prove that Sinhala Unicode is incomplete.

– You have failed to prove that Sinhala Unicode has practical problems.

– You have failed to answer at least one major question that is asked here.

– You have failed to provide examples with detailed steps on how to re-create the alleged errors.

– You have failed to continue or support a decent debate here. What you did was somewhat similar to Google Bombing and I personally do not like to continue on this further.

Any problem can be solved by discussing. My belief is that when one has no facts to prove, he keeps blaming and repeats the same thing, thinking that it’ll work!

But No! Not this time Mr. Donald. Maybe you don’t have enough luck this time or maybe people are more intelligent than you think.

I kindly ask you not to stuff the comment section any further. But you are free to give us the examples with steps and relevant information that we have asked for repeatedly.

A Big thank for all of the people who contributed their precious time and efforts here!

Harshadewa Ariyasinghe
on May 13, 2008 at 9:40 pm | Reply Donald Gaminitillake

You too got bowled out sir. including anuradha

Let the public decide who had won

“What if your so called plaster software is no more in new operating systems?”

This is not an issue at this moment. Today we cannot reproduce all sinhala characters without the sinhala kit and the plaster software. We have no e-dictionary, no e-encyclopedia, no e-grammar, all because we do not have all sinhala characters with proper utf values (except for a few)

Sri Lanka has spent over 50 million US$ of World Bank funds to develop this typewriter concept and naturally the stakeholders will have to defend it.

I hold copyright over the list of sinhala characters until 50 years after my death, sir

Donald Gaminitillake
I set the standard
on May 13, 2008 at 10:17 pm | Reply රජිත් විදානආරච්චි

Donald,

I think you are someone who is craving money and/or respect over the sinhala alphabet. But first you have to understand that those who keep wanting them never get them.

No one can copyright the sinhala alphabet; it is something that was developed over more than a thousand years. If you think that you have the copyright, you are welcome to take it with you to the grave.. it will remain with you, and we won’t miss it, … sir.
on May 13, 2008 at 10:26 pm | Reply harshadewa

Since you keep posting without answering;

You know that the Sinhala Unicode works with ZERO issues in Linux and Windows Operating Systems.

Moreover, the Sinhala Kit software (that you scream as plaster software) is only needed in current Windows XP operating systems that will be outdated very soon.

Sinhala Unicode comes (in other words, factory fitted) with Windows Vista OS (and in all future Windows OS’s too) and most of the current Linux OS’s.

Next time,
1. Get yourself a licensed copy of Windows Vista, or Linux OS such as Fedora.
2. See whether Sinhala Unicode works there.
3. If not please report.

At least, take this as a lesson and try not to comment on outdated, already answered, fabricated issues.

Mark my words,
“within 3-5 years time, Sinhala Unicode will be all over the Internet, Public / Private organizations, Operating Systems, Mobile Devices and in any place you name.”
on May 14, 2008 at 10:22 am | Reply Donald Gaminitillake

Ranjith I have done the sinhala characters after Attaragedara rajaguru Bandara (Wadan kavi potha) Dr Senarath Paranavithana and the group did the evolution of sinhala (Parinamaya) NOT the Current Sinhala Akuru.I am the only person who has done and published under ISBN the rules of the country. Therefore I have the copyrights of all individual sinhala akuru relating to the computer.(industrial and commercial acceptability and usage) ((Ranjith can write love letters but he cannot sell these love letter collection and make money – that is wh))This was offered to the SLSI and they refused to accept it. If they had taken it up I would not have these rights.

Harshadewa is boasting about Vista and linux but even on linux you cannot cut and paste a simple sinhala text to Vista or windows XP. Text compatibility will never be there unless we have UTF values for all sinhala characters.

All my comments will be archived in internet so that when some one does research they will know there was one person who had tried to save the SINHALA AKURU.

Donald Gaminitillake
I set the standard
on May 14, 2008 at 6:25 pm | Reply harshadewa

It’s not good to do Donald-Bombing about fabricated issues!
And to then say that you tried to save Sinhala Akuru!
Funny! 😀

Donald-Bomb: Intentional commenting in 1:3 ratio with indirect information to fool the public, thinking that only Quantity matters in debates rather than the Quality

“but even on linux you cannot cut and paste a simple sinhala text to Vista or windows XP.”

Examples, Steps and SW Versions please… 😛
on May 15, 2008 at 6:44 am | Reply Donald Gaminitillake

Come to a public arena and show how Sinhala SLSI 1134 works with Linux, Windows, Apple and unix.

Prove the Sinhala TEXT compatibility across all platforms and on Microsoft, Adobe Applications, Helawadena, Thibus and some other Sinhala only applications

Including the word “Rajapaksha” and many others that I would like to ask

You guys would never have the guts to show this in public and in front of Hon President.

Donald Gaminitillake
I Set the Standard
on May 15, 2008 at 11:29 am | Reply Donald Gaminitillake

Very interesting comment on fonts in parliament web site

For every borwser you need this pack to see sinhala unicode. This contradicts the terms of unicode consortium
Also this proves my comments that unicode sinhala cannot be represented correctly without additional plaster software.

You lose again Harshadewa and Anuradha

http://www.fonts.lk/download/SinhalaIE.html

Quote
Sinhala for Internet Explorer 6
This pack enables Sinhala in Microsoft Internet Explorer 6.

It does not work with older versions of Internet Explorer, or with other browsers.

Download the software by selecting the link below, and then run the file with Administrator privileges.

If a page does not display properly in IE6, select:

View -> Encoding -> Unicode (UTF-8)

You can select your default font by going to:

Tools -> Internet Options -> Fonts

selecting “Language Script” as Sinhala, and then clicking on the font you want to use as default.

Once you have installed this pack, you may download additional Unicode Sinhala fonts which will work with any Unicode Sinhala web page.

Download Sinhala for IE 6

——————————————————————————–
If you do not have Internet Explorer 6, you can download it from Microsoft.

ICTA Language Group – 041113
on May 15, 2008 at 5:36 pm | Reply harshadewa

Ohh.. we got scared.. Don’t say that again. We are afraid and scared of public places.. we also have Sociophobia.

But I must say that we do not have Agoraphobia, Neophobia, Technophobia, Ephebiphobia or Autophobia. 😉

(People, I think you deserve a search on Wiki on all these phobias)

Internet is the Ultimate Public Place in the known universe!
You can’t find a better public place than this.

“For every ~~borwser~~ browser you need this pack to see sinhala unicode?”

Oh.. It’s such a bad thing isn’t it? and it says IE6 (specifically) as well. What a waste! We should stop using Sinhala Unicode now… I just installed IE 6 after uninstalling all my latest browsers.

Happy? 😀 😀 😀
on May 16, 2008 at 10:59 am | Reply Donald Gaminitillake

You never answered the question of TEXT compatibility across platforms

Since it is not happening I have proved Sinhala unicode is an incompatible and incomplete solution

Donald Gaminitillake
I set the standard
on May 16, 2008 at 12:39 pm | Reply harshadewa

Mr. Donald,

Steps to follow,

1. Read from the top and write down all the unanswered questions on a piece of paper.
2. Also write down all the answers that we have provided on the other side of the paper.
3. Read both sides of the paper and try to understand the questions and answers by repeating several times.
4. If you’re done, start writing answers to the questions left.

You think that newcomers who read this would only read the last comment that’s posted. So by putting something interesting as the last comment here, do you think that it will help your false propaganda?

If you think that’d help your false propaganda and continue in this way, I will restrict your future comments, since I’m concerned about the good conduct of the comments section of this blog.
on May 17, 2008 at 9:44 am | Reply Donald Gaminitillake

You are just going round the loop

When there is no SINHALA TEXT compatibility across all operating systems, what are you talking about?

Your solution is an incomplete and incorrect solution, including the LINUX Sinhala

I always challenge you to come forward for public demonstration. You avoid even talking to me.

Donald Gaminitillake
I set the standard
on May 18, 2008 at 1:05 pm | Reply harshadewa

This is the mail I got from you on May 12, 2008 7:27 PM

Dear Sir

If you have free time pls call me
0777-xxx-xxx

Since you are an IT guy and say a free thinker — I think we can talk

Donald

This is the reply I sent you on May 12, 2008 7:49 PM

Dear Sir,

I think this issue of Sinhala Unicode should be spoken openly.

If I agree with you after calling you, there won’t be any effect on this problem because you can’t repeat it with all the people (hundreds of thousands of people ?) who accept Sinhala Unicode.

I neither have any personal interest in the so called Unicode problem nor do I have any problem with you.

I believe what I see as truth, and only sound arguments and facts backed by practical examples can change my opinion. Furthermore, I think the kind of explanations needed to break Unicode can be expressed more flexibly over the web.

Therefore, thanks for the interest shown in this subject, but No! I would rather not call you in this regard.

Many thanks,
Harshadewa
on May 18, 2008 at 10:49 pm | Reply Anuradha Ratnaweera

I don’t want to continue this discussion with Mr Donald.

Why?

He avoids answering questions asked in point form.

He also avoids the question: “isn’t Sinhala hodiya also incomplete?” 😉

He talks very vaguely. Here is an example:

“Best example is Sinhala text is not compatible across all platforms. No UTF values”

By “Sinhala text”, he probably means “Unicode sequences representing Sinhala text”.

The comment itself is a lie (if it is true, how can I read email sent by my Windoze friends), and of course, it is too vague.

And Mr Donald doesn’t seem to know the difference between Unicode code points and encoding schemes such as UTF-8.

Mr Donald says “No UTF values”. What are “UTF values”? There isn’t anything called “UTF values” in this context. You are probably referring to UTF-8 encoding scheme, which has got nothing to do with “values”.

See, you are confused. If you want to fight, know your enemy first. If Sinhala Unicode is your enemy, first take some time to study your enemy. 😉

Now I can predict what Mr Donald is going to do. He will send another shower of comments, so a newcomer may find it hard to find logical arguments amongst the mess.

If you can’t win an argument, confuse. A “draw” by “confusion” is better than losing… 😉 I know it’s Mr Donald’s theory. And writing here is only going to help him achieve such malicious ends.

So, please await a better “reply”!
on May 19, 2008 at 1:26 pm | Reply harshadewa

Mr. Donald,

Thanks for the interest shown in this regard!

You will not be able to post vague comments on this Blog anymore. All comments from you will be waiting in the moderation queue.

Moreover, I’ll be filtering out any comment that is missed by WordPress filters, if any.

By any chance, if you post comment(s), that answer questions asked from you repeatedly, I’ll happily accept them to display here.

I strongly think that almost 100 comments are more than enough for one who has even a little bit of upstairs, to understand your problem on Sinhala Unicode.
on May 21, 2008 at 2:14 pm | Reply Dharma Gamage

I do not agree with Donald that we should give up Unicode and adopt his CAT, but at the same time I do not think Unicode in its present state is complete. We are talking about a standard here, not a font set. There are people who do not know the difference.

I have been a part of the ‘Sinhala Unicode’ debate long back, and thought I would keep out of it because both parties are too adamant to admit their own mistakes. At least if one party listens we can think of moving forward.

It is a pity that even seemingly reasonable people like Anuradha compare the Sinhala hodiya with the Unicode chart. There is no one to one correlation between the Sinhala hodiya and the Unicode chart. Assuming both to be the same is the fundamental mistake made by Prof. J. B. Disanayake, and it is pathetic that we continue that mistake.

Also, what Anuradha refers to as ‘hodiya’ is not the correct and complete Sinhala hodiya, but a chart used to teach basic Sinhala. Not all Sinhala letters appear in the hodiya.

Finally, I think moderating Donald is not fair, especially as he represents one party in the discussion. Why shut others’ mouths if you are so sure what you say is correct?
on May 21, 2008 at 3:30 pm | Reply harshadewa

Dharma Gamage,
“I think moderating Donald is not fair, specially he represents one party in discussion.”

The decision to moderate Mr. Donald’s comments, and the reasons that led to it, can be found in the comments above.

Furthermore, I believe it is undoubtedly unfair to block someone who makes sound arguments with a real intention to uplift the standards.

But it’s useless and cumbersome to read, store and manage the same comments and ideas in different words. If you read at least 50% of the comments made by Mr. Donald, you might be able to understand this simple fact.
on May 22, 2008 at 1:56 pm | Reply Dharma Gamage

Guys,

Let me tell you one thing. I don’t know Donald personally, but I have enough encounters with him on the web. He is adamant that he is right and whatever others say he will do what he likes. He has no sense where to draw the line.

So knowing that either you guys just ignore him (best policy) or if you want to interact give him a fair chance to express himself. Otherwise you will never know what he will do.

I see now he is having an interesting discussion at http://bandaragama.wordpress.com/2008/05/08/is-there-anything-wrong-with-sinhala-unicode/#comment-342

At least the discussion here was more decent.
on May 25, 2008 at 1:08 am | Reply Anuradha Ratnaweera

Dear Dharma,

The analogy is the correlation between basic characters and modifiers. Basic characters and modifiers are combined to get more characters. Even in the Hodiya, “ku” IS the combination of “ka” and the “u” modifier (papilla). (Reference among many: සිංහල හෝඩිය, පැරණි අකුරු කරවන පොත් පෙළ by බළන්ගොඩ ආනන්‍දමෛත්‍රෙය තෙර).

Anuradha
on May 27, 2008 at 5:58 pm | Reply Dharma Gamage

Anuradha,

I did NOT say Unicode is wrong. I accept the Unicode approach. (I think Donald is not the only one who doesn’t.) All I said is that the Sinhala Unicode chart is still *incomplete*. The current Unicode chart lacks widely used characters like yansaya and repaya, while keeping space for the never-used ‘ilu’ (0D8F) and ‘iluu’ (0D90). Sannaka ‘ja’ (0DA6) is another unused character.

This is purely because J. B. Disanayake made the stupid mistake of building Sinhala Unicode on the hodiya. (There is no ilu or iluu in any of the other South Asian language charts.)

In addition to yansaya and repaya, we can include some of the widely used joint letters too, because there is enough space. That will save so many other issues.

Unfortunately, when somebody suggests even a minor change in Unicode, you people become so defensive and start shouting like mad. This is the biggest obstacle to Unicode.

We once saw somebody called Anandawardhana go to the length of suggesting that yansaya and repaya be removed from the Sinhala language instead of making a simple change.

Now I know your response. You will try to defend JB. That is your problem. None of you ever have the guts to admit a mistake and correct it. You are dead scared of change. So you continue with mistakes.
on May 28, 2008 at 1:45 pm | Reply Dharma Gamage

I see an edited version of the last post I made here has been cut and paste at http://bandaragama.wordpress.com/2008/05/08/is-there-anything-wrong-with-sinhala-unicode/#comment-375

I do not know who did it, but whoever did it that is an unethical thing to do.

I am here not to personally attack anyone.
on May 29, 2008 at 3:19 pm | Reply Anuradha Ratnaweera

Dharma,

[In addition to yansaya and repaya, we can include some of the widely used joint letters too, because there is enough space. That will save so many other issues.]

Can you give an example of an issue?
on May 30, 2008 at 8:24 am | Reply Dharma Gamage

Anuradha,

Take Piruvana poth vahanse, select any page and reproduce the same (as they appear in the book) here and I will show you the issues.
on May 30, 2008 at 1:26 pm | Reply Anuradha Ratnaweera

Thanks Dharma. In fact, we recently started converting old texts into Unicode [only in our free time, so progress is not going to be VERY fast], so it will be a good exercise to figure out issues from a developer’s as well as a user’s point of view.

Please have a look at our first attempt here:

http://ar-si.blogspot.com/2008/05/blog-post_29.html

I found some problems with our font with this exercise. Should be able to fix in the next release.

I would like to have this with the old “ddha” in “Namo buddhaya”. Will see how it goes.
on May 30, 2008 at 5:57 pm | Reply Dharma Gamage

Anuradha,

Ok, here are some issues from ‘Magul Lakuna’

10. ශ්‍වෙතඡත්‍ර දෙකය (NOT Shevatha should be Shvetha. Sha and Va should be jointed)

19. රක්‍තෝත්පල දෙකය (should be Rakthothpala. Ka and Tha should be jointed)

21. ශ්‍වෙතෝත්පල දෙකය

34. දක්‍ෂිණාවෘත්‍ත ශ්‍වෙතශංඛ දෙකය

57. චතුර්‍මුඛ සවර්‍ණ නෞකා දෙකය (Should be Svarna, NOt Savarna)

Go on typing. When you reach ‘Sakas Kata’ I will show the other issues.
on May 31, 2008 at 6:42 am | Reply Anuradha Ratnaweera

Dear Dharma,

Thanks for having a look at mangul lakuna.

57 was a typo. Fixed it. Thanks.

If you analyze the rest with a suitable tool, you will notice that there are joiners between the joint characters. Unfortunately, not all systems/tools have suitable fonts to support them.
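
One such tool can be a few lines of Python. The sketch below assumes the text was typed as al-lakuna followed by a zero width joiner (U+200D), as in the ksha sequence quoted earlier in this thread; the helper name `joined_pairs` and the sample string are illustrative only.

```python
ZWJ = "\u200d"        # ZERO WIDTH JOINER
AL_LAKUNA = "\u0dca"  # SINHALA SIGN AL-LAKUNA

def joined_pairs(text):
    """Return (left, right) letters linked by al-lakuna + ZWJ."""
    pairs = []
    for i in range(1, len(text) - 2):
        if text[i] == AL_LAKUNA and text[i + 1] == ZWJ:
            pairs.append((text[i - 1], text[i + 2]))
    return pairs

# "shve" typed as sha + al-lakuna + ZWJ + va + kombuva
sample = "\u0dc1\u0dca\u200d\u0dc0\u0dd9"
print(joined_pairs(sample))
```

If the pairs are reported but the letters still appear separately on screen, the text is stored as intended and it is the font or renderer that lacks the joint form.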

Even with our LKLUG font, ක්‍ෂ in 34 is shown properly as a bandi akura, but not ක්‍ත or ත්‍ප.

Let me show a good analogy. The CSS standard tells how a browser should display web pages. But not all browsers support all of the CSS standard properly.

There are some good tests that check browsers for standards compliance. These “acid tests” show a reference image, and how a browser should display it with CSS. There are in fact three acid tests now, in increasing degree of complexity.

http://www.acidtests.org/

Old browsers fail even the first test, but the new ones seem to perform better. To get old browsers working with CSS, we need various tricks.

What we need for Sinhala Unicode are some similar tests for fonts/systems. I would like to see these levels (we can refine them later):

Level 0: basic vowels and consonants: අ, ක etc
Level 1: consonants modified with a single simple modifier: කැ, ක් etc
Level 2: consonants modified with a “kombuwa” modifier: කෙ, කෞ etc
Level 3: Mandatory joint letters and modified forms: ක්‍ර, ක්‍ය, ක්‍රි, ක්‍රෝ
Level 4: Used joint letters and forms: ක්‍ෂ, න්‍ද, ර්‍ම, ක්‍ෂෝ, ශ්‍වෙ
Level 5: All possible joint letters and forms

What we need is a set of web pages with two columns. First column is the Unicode test, second column is an image that shows how it should be rendered.

If an operating system / font can match all characters in the first column with the second column up to a particular level, we should say it’s “Level x” compliant.

A font/system with Level 4 compliance is what we need.

We have to refine Level 4 whenever we find a new joint form by moving it from Level 5.
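
The first column of such a test page could be generated from a table like the following Python sketch. The sample strings are a small subset of the levels listed above, written as escapes so that the joiners are visible; the `LEVELS` table, the `build_rows` helper and the image file names are assumptions made for illustration, not part of any existing test suite.

```python
# A few sample strings per level, written as escapes
LEVELS = {
    0: ["\u0d85", "\u0d9a"],              # a, ka
    1: ["\u0d9a\u0dd0", "\u0d9a\u0dca"],  # kae, k
    2: ["\u0d9a\u0dd9"],                  # ke
    3: ["\u0d9a\u0dca\u200d\u0dbb"],      # kra
    4: ["\u0d9a\u0dca\u200d\u0dc2"],      # ksha
}

def code_points(text):
    """Format a string as space-separated hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)

def build_rows(max_level):
    """Yield (level, text, code points, reference image) up to max_level."""
    for level in sorted(LEVELS):
        if level > max_level:
            break
        for n, text in enumerate(LEVELS[level]):
            # Hypothetical file name of the hand-checked reference rendering
            yield level, text, code_points(text), f"level{level}_{n}.png"

for row in build_rows(4):
    print(row)
```

Moving a newly found joint form from Level 5 to Level 4 then means adding one entry to the table and one reference image.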

This discussion is now getting very productive. Thanks!

Anuradha
on May 31, 2008 at 12:03 pm | Reply Dharma Gamage

Why make things complicated when simple straightforward solutions are available?

Number of practically used ‘joint letters’ in Sinhala/Pali/Sanskrit are less than 20. (Believe me, I have counted) There are 49 empty spaces in Unicode chart. Why not add them? (As Indians have done)

Tell me, why are you people so reluctant to change the Unicode chart? I do not think any standard should be static. It has to be dynamic. Periodic revisions are essential in any standard.

Even the constitution is amended when needed. Why not Unicode chart?
Is it because every one of you are dead scared with the ‘reputation’ of the two ‘luminaries’ JBD and VKS?

As someone said do you think the avatar of VKS will squeeze your neck at night if you change Sinhala Unicode chart? 🙂
on May 31, 2008 at 3:36 pm | Reply Anuradha Ratnaweera

Dharma,

Buddha decided to change to the middle path only after trying out all the other systems that existed in India at the time. I want to try the present system as it is, as a user and a developer, before proposing amendments.

As you may have already noticed, we have already started entering old text to check problems.

I did come across problems, the ones you correctly pointed out.

Although the text was entered according to SLS1134/Unicode, fonts and systems were incapable of displaying all the characters as expected.

As a means of addressing the issue, I have created an “ACID Test” equivalent for Sinhala Unicode.

http://www.sayura.net/anuradha/sinhala/unicode/test/

It would be great if you can also contribute by telling me any missing joint letters when I get to class 4.

Right now, I have done only class 1 and 2. When we get to class 3 and 4, we will get to the fine points of joint letters. If I can’t get my system to pass all four classes, THEN I will start a campaign to amend the standard.

Anuradha
on August 12, 2008 at 7:44 am | Reply Interested party

This is a tip for Mr Donald: you could use Image Capture to take a screenshot of your Parallels screen running on a Mac.

http://www.apple.com/pro/tips/secretcapture.html
on March 11, 2009 at 10:35 pm | Reply bandara

I’m a student. I want to write a web page in Sinhala letters. How do I write the tags, or is there another system?
Please reply
on June 17, 2010 at 5:09 pm | Reply Indrajith Kumara

Dear All,

Now you can Free download all Sinha Fonts & Sinhala Fonts Related Softwares free of charge. This site very useful for Sri Lankans living around world.

Visit –

http://www.videshasewa.com/home/free_sinhala_fonts.html

or

http://www.videshasewa.com

Thanks..
on November 12, 2014 at 9:11 am | Reply Muheeth Cassim

I have installed Sinhala so many times on my Windows 7 but no luck. I can type Sinhala Unicode characters perfectly in Microsoft Word. The only problem is that I cannot see Sinhala Unicode characters in any browser, on sites like FACEBOOK and other Sinhala online newspapers and blogs.

What may be the reason?

email: sensaconcept@yahoo.com
on June 19, 2016 at 7:14 pm | Reply dilshan

When I copy Unicode text from a web page into PowerPoint, the letters show up separately, with the al-pili and papili scattered here and there. How do I fix that?
on January 5, 2017 at 4:00 pm | Reply VH

If I copy the “ශ්‍රී ලංකාව” text and paste it to Microsoft Word (I have Microsoft® Word MSO (16.0.7571.7063) 32 bit on Windows 10 Home), it won’t look right. The zero width joiner has vanished if I copy that back to this comment: ශ්රී ලංකාව. This is a disgusting feature of Microsoft Word. In Excel the ligatures work fine.

The only way I’ve found to get a ligature like ශ්‍රී to Word is by making a plain text file and opening it with Microsoft Word, and then copying and pasting that. Resizing the font of the document will be impossible using the buttons that normally make the font bigger or smaller. That will only affect parts of the document that have never had the zero width joiner, not even all of that. I have to resize by defining the new font size as a number.

If I copy a ligature to an open plain text file in Microsoft Word, save it and reopen the file with Word, the ligature will look correct even though before saving and reopening the file the ligature won’t be shown.
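
The loss described here can be checked without relying on how the text looks. A minimal Python sketch, assuming the original spelling of “Sri” is sha + al-lakuna + zero width joiner + ra + diga is-pilla; the function name `lost_joiners` is made up for this example:

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

def lost_joiners(original, pasted):
    """True if pasted equals original with one or more joiners stripped."""
    return (
        pasted != original
        and original.count(ZWJ) > pasted.count(ZWJ)
        and original.replace(ZWJ, "") == pasted.replace(ZWJ, "")
    )

sri = "\u0dc1\u0dca\u200d\u0dbb\u0dd3"   # with the joiner
round_trip = "\u0dc1\u0dca\u0dbb\u0dd3"  # as it came back from the paste

print(lost_joiners(sri, round_trip))
```

A check like this separates two cases: the joiner really being removed from the text, and the joiner still being present but not honoured by the display.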
on October 19, 2017 at 8:18 am | Reply Jayantha Peiris

I have a problem in typing bandi akuru when I type Pali text. as you all know ‘hal lakuna’ is not used when writing pali in sinhala.

සත්ථා is written avoiding ‘hal lakuna’ and making last two letters ‘bandi akuru’
I am not talking about complex characters like සත්‍ථා .
on October 19, 2017 at 8:23 am | Reply Jayantha Peiris

Can’t we include ‘dakaranshaya’ in Sinhala Unicode? In Sinhala, බෞද්ධ used to be written by omitting ‘ද්’ and adding the dakaranshaya to ධ.
