This is an explanation and/or constructive criticism on some claims made on the already standardized Sinhala Unicode system.
Claims
1) Letters like “ඩු” (DU) are not stored in Unicode. Instead, ඩු (DU) is created this way: ඩ + පාපිල්ල = ඩු (DA + Papilla = DU), which is wrong!
2) All letters that can be created by adding පිලි (Pili) and යංශය (Yanshaya) should also be stored individually in Unicode.
3) There’s an “IT security threat” in Sinhala Unicode! “The same string gives two different characters in different browsers”.
Explanations
1) Yes, “ඩු” (DU) and other similar letters can be represented in Unicode. As we learnt in හෝඩිය පන්තිය (Nursery School), “ඩු” (DU) is created by adding a පාපිල්ල (Papilla) to the letter ඩ (DA).
The Sinhala Unicode system works in just the same way: ඩ + පාපිල්ල = ඩු (DA + Papilla = DU).
Anyone who has learnt the Sinhala language well enough knows this, and there’s nothing wrong with representing the same thing in Unicode.
2) This could be called somewhat “ridiculous”! Does a calculator have a key for the number 10? No! Why?
Because you don’t ask for a key for 10 when you can create 10 by pressing 1 and 0.
So when “ඩු” (DU) can be created as ඩ + පාපිල්ල = ඩු (DA + Papilla = DU), you don’t ask for a separate “ඩු” (DU) character to be stored. It is unnecessary.
3) Yes! There are some known problems in current versions of the Firefox web browser (FF 2.x and below) in displaying a few Sinhala letters (e.g. Shri). But this has already been fixed in the next version of Firefox, which will become the mainstream browser of the Internet very soon. (Download Firefox 3)
What is not understandable is how this can be a security threat! I found a table of computer security threats and, after googling a bit, found a match there. It is called Human Error!
As per the claims, the word SRI can be written / displayed in different ways. I think this should deliberately be allowed anyway.
For instance, say a person really wants to write SRI in a different way. How can he write it if the language doesn’t allow him to?
People should be given the freedom to use characters / words the way they want. It’s the user’s responsibility to do it correctly. It is comparable to this: “Everyone is free to write anything in a blog! But one has to be honest enough to choose whether or not to highlight things wrongly for one’s own benefit.”
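To make explanations 1 and 2 concrete, here is a minimal Python sketch that lists the code points behind ඩු. It assumes the usual encoding of the letter DA followed by the papilla vowel sign; the helper name `describe` is made up for this illustration and is not part of any standard.

```python
import unicodedata

# ඩු written the way the post describes: base letter DA + papilla (vowel sign)
du = "\u0da9\u0dd4"

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

for code_point, name in describe(du):
    print(code_point, name)

# One visible letter, two stored code points
print(len(du))
```

Like 1 and 0 on the calculator, the two stored code points together make up the one letter the reader sees.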
Moreover, there are claims that Sinhala text cannot be copied from Internet Explorer web browser and pasted into MS Word correctly (Copy from IE to MS Word). This is false. See how I have copied some Sinhala text from IE to MS Word.
However there are some speculations saying that there are ulterior motives and hidden agendas behind these claims. I’m not discussing those here or I’m not targeting anyone. But when there are things that mislead people, someone should correct it.
Listed below are some more valuable posts from Anuradha containing technical details with good arguments regarding the same issue.
Great post. Good explanations for the arguments 🙂
(Im sure this is gonna get at least one more comment. Check http://www.kulendra.net/index.php?option=com_content&task=view&id=97&Itemid=9; unfortunately, in a DB port I changed all the data to ASCII and the UTF-8 data got corrupted. Scroll down to the comments section and see a preview of the comments you ‘might’ get 😉 )
I totally agree with you on point 3. There’s no god damn threat. Can you also explain it a bit more; isn’t it rather a problem of the browser not being fully compliant with Sinhala Unicode than anything else? That’s how I always saw it.
On 1 and 2, the argument is that since each sound is a letter in Sinhalese, each should be given a place in the code map. I guess the real ‘issue’ in this is that when you are coding for sinhalese. If I remember correctly, the standard requires the modifier (i.e. pili) to be placed after the base letter when storing Unicode data. So when a letter like කො is stored, it is stored as ක + kombuwa + alapilla. Whoever displays it needs to know that the kombuwa is drawn in front of the letter, and so on. Again, I might be wrong on this. As far as data storage, retrieval and presentation are concerned, there aren’t any issues in representing Sinhalese in Unicode (again, ‘at least as far as I know’); every Sinhala word can be written, stored and presented in Unicode.
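The storage order described above can be checked with Python's standard `unicodedata` module. This is only a sketch: it assumes කො is typed as KA followed by the single combined vowel sign U+0DDC, and uses Unicode normalization to show the kombuwa + alapilla form.

```python
import unicodedata

# කො as KA + the single "kombuva haa aela-pilla" vowel sign
ko = "\u0d9a\u0ddc"

# Canonical decomposition splits the vowel sign into kombuwa + alapilla
decomposed = unicodedata.normalize("NFD", ko)
# Canonical composition joins them back into the single vowel sign
recomposed = unicodedata.normalize("NFC", decomposed)

print([f"U+{ord(ch):04X}" for ch in ko])
print([f"U+{ord(ch):04X}" for ch in decomposed])
print(recomposed == ko)
```

In both forms the consonant is stored first and the signs follow it, even though the kombuwa is drawn to the left of the letter. Putting it in the right place on screen is the job of the font and rendering engine.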
Kulendra,
“Im sure this is gonna get at least one more comment”
He he.. I’m looking forward to reply back. But not over the phone.. lol!
I read the comments of your post. Interesting!
“I guess the real ‘issue’ in this is that when you are coding for sinhalese.”
Can you elaborate on this more?
As long as an API or a standard is published (for Sinhala Unicode), anyone can develop code or systems according to it, because standards are not hidden; they are open. But everything should be well documented, and the developer must read the documentation well.
If by any chance you are talking about developing a Sinhala programming language, one could suggest using only the limited number of characters stored in Unicode rather than the ones with modifiers. Am I right?
He he well, hopefully, you’ll get the comments only in here. Anyway about the ‘issue’ thing;
I was actually referring to the ‘ease’ with which a programmer can read and interpret Sinhala Unicode. Say, in an OCR program, English can be converted and stored as you scan (and recognise) the letters. But scanning and storing Sinhalese takes a bit more work, as the standard requires the modifiers to be placed after the base letter (thus the ‘ko’ example above). There is no issue as to ‘how’ this should be done; it is pretty clearly stated. However the programmer would have to do bit more than if he was scanning a Latin alphabet.
Of course this is from a person who has barely any coding background, so this is probably not a problem for you guys 🙂
Firefox 2 on GNU/Linux renders Sinhala Unicode text very well. All you need to do is to enable Pango shaper in it.
By the way (perhaps you may have seen it already) I also blogged about the topic. 🙂
I am really sorry, I didn’t notice that Harshadeva had already linked to my blog posts at the end of the post.
Unfortunately you guys have not seen the Unicode Consortium’s registrations.
Do you know that all the Latin characters are registered in Unicode? Take ü: whichever way you key it in, you can see the umlaut u on any operating system. Likewise the Sinhala “DU” or “KU” or any Sinhala character needs to be given an individual code point in SLSI 1134 (ISO or Unicode).
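For reference, the ü comparison can be tried in Python. The sketch below relies only on the standard `unicodedata` module; the variable names are illustrative. It shows that ü exists both as one code point and as u plus a combining mark, and that Unicode treats the two as equivalent.

```python
import unicodedata

u_umlaut = "\u00fc"      # precomposed ü
u_plus_mark = "u\u0308"  # u followed by COMBINING DIAERESIS

# Different stored sequences...
print(u_umlaut == u_plus_mark)
# ...that normalize to each other
print(unicodedata.normalize("NFC", u_plus_mark) == u_umlaut)
print(unicodedata.normalize("NFD", u_umlaut) == u_plus_mark)

# ඩු has no precomposed form, so NFC leaves it as letter + vowel sign
du = "\u0da9\u0dd4"
print(unicodedata.normalize("NFC", du) == du, len(du))
```

So Latin text is not limited to one code point per visible letter either; base letter plus combining mark is a normal Unicode representation.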
The security problem in characters is not a human error. The same string cannot give two different characters in two different browsers. Accept this error. All this is because the registrations are inadequate to represent the Sinhala language.
visit
http://www.unicode.org/review/pr-96.html
quote
The use of format characters in identifiers is problematical because the formatting effects they represent are normally just stylistic or otherwise out of scope for identifiers. To make matters worse, it’s possible to misapply format characters such that users can create strings that look the same but actually contain different characters, which can create security problems
unquote
Unicode itself confirms the security problem. Please do not mislead people by calling it human error, sir.
Quote from unicode
The goal for such a restriction of format characters to particular contexts is to
1. allow the use of these characters where required in normal text
2. exclude as many cases as possible where no visible distinction results
3. be simple enough to be easily implemented with standard mechanisms such as regular expressions
unquote
We cannot allow the representation of words to be changed any way you like. Sinhala is Sinhala. Sinhala can be represented correctly if you use my encoding system.
Each and every character needs to be given an absolute code point.
If any reader needs more clarification, please visit me in my office:
290 DR Wijewardena mawatha
Ingrin Institute of printing and graphics
Donald Gaminitillake
I set the standard
Quote from unicode
http://www.unicode.org/reports/tr2.html
“There is a standard extant for Sinhala described in A Standard Code for
Information Interchange in Sinhalese by V.K. Samaranayake and S.T. Nandasara
(ISO-IEC JTC1/SCL/WG2 N 673, Oct. 1990). The coding proposed in it was found
to be an inadequate basis for a modern, computer-based interchange code,
though it is adequate to handle the capabilities of a Sinhala typewriter for
representing contemporary colloquial Sinhala. ”
Unquote
Your system is just a typewriter. As I have clearly proved, the Sinhala Unicode registration was done by a person who had not gone to school in Sri Lanka and had never been to Sri Lanka. Worship the white skin, Anuradha.
Listen to me and correct the language issue.
I have copyrights because you guys made a wrong format. Sinhala is not correctly registered or represented in SLSI 1134 or in the Unicode Consortium.
Also see
same text seen on different browsers read different
Donald Gaminitillake
I set the standard
Hey Kulendra
quote
to do bit more than if he was scanning a Latin alphabet.
Unquote
For the Latin script all characters are given individual code points, over 600 of them.
If we do the same thing for Sinhala, you get the same results as for the Latin script, sir.
But only I have listed this, in ISBN 955-98975-0-0.
So get it from me and develop the OCR.
Donald
I set the standard
I am sorry, I am not permitted by the moderator to make any comments on the සිංහල යුනිකෝඩ් සමූහය – Sinhala Unicode Group.
I read your comments and the comments made by others. I have proved that Sinhala is not compatible across all platforms the way the Latin script is. If you say it is, come over and show it to me, or come for a discussion.
I have given my contact numbers so that you can contact me and discuss the issues of using Sinhala on a computer.
Why are you so scared to meet me (by appointment) and talk openly?
Best
Donald Gaminitillake
I set the standard
Mr Donald, can you please answer these questions in point form?
1. Unicode pr-96.html is about ZWJ which is used in Sinhala Unicode for joint letters. But you are using it to justify your argument about “du” which does *not* use ZWJ, how so?
2. Your system has only 1600+ glyphs. Where are the joint letters? And how about joint-joint letters (e.g.: “indriya” when written like this ඉන්ද්රිය). Please see the following screenshot taken years ago:
How do you address those in your system?
3. You have only quoted a part from the above PR. For example, you have conveniently ignored these sentences among others:
“For these reasons format characters are normally excluded from Unicode identifiers. However, visible distinctions created by certain format characters (particularly the joiner controls) are necessary and carry meaning in certain languages. A blanket exclusion of format characters makes it impossible to create identifiers based on certain words or phrases in those languages.”
The goal of the above PR is to identify the use of ZWJ etc in Unicode and to show how to *fix* any security problem arising, and not only to *expose* a security problem like you demonstrate.
The question no 3 is, how and why did you quote only a part of the above PR?
4. You say that installing anything is not needed for Latin scripts and it should be the same for Sinhala! So how does your “system” work without installing a font or a keyboard driver?
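Since questions 1 and 3 turn on what ZWJ does, here is a small Python sketch of the situation PR-96 discusses: a filter that strips format characters from a string. The spelling of “sri” used here (SHA + al-lakuna + ZWJ + RA + diga is-pilla) is an assumption about how the word is usually typed, and `strip_format_chars` is a hypothetical helper, not anything defined by Unicode.

```python
import unicodedata

ZWJ = "\u200d"

# Assumed spelling of "sri": SHA + al-lakuna + ZWJ + RA + diga is-pilla
sri_joined = "\u0dc1\u0dca" + ZWJ + "\u0dbb\u0dd3"

def strip_format_chars(text):
    """Drop every character in general category Cf (ZWJ, ZWNJ and similar)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

sri_filtered = strip_format_chars(sri_joined)

print(len(sri_joined), len(sri_filtered))
print(sri_joined == sri_filtered)
```

The filtered string is a different sequence that may be drawn differently, which is the case the PR is concerned with. A plain consonant + vowel sign pair such as ඩු contains no format character, so this filter leaves it unchanged.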
Looks like this Donald Duck will have to be sent back to school to learn the alphabet…
First, I have listed all the Sinhala characters that I have identified in my published character allocation table (copyrighted).
Even if I have missed any character, it can be added easily.
First you have got to admit the errors in the present system. Once that has been accepted, we have got to correct it.
The process will consist of a font set with proper open character coding for each Sinhala character and a basic input method (Sinhala IME); options are open for the public to improve it. A font is an artwork, and anyone else can develop one and sell it or give it away free with my encodings.
Copyright on industrial and commercial usage of my allocation table is reserved by me. This is a right given by the Govt of Sri Lanka and by you guys. When I offered it freely at the SLSI, you all got it and rejected it. There is a problem with your system; the solution is my system. A solution to a problem is a copyright by LAW in SRI LANKA. I enjoy that right.
Donald Gaminitillake
I set the standard
Mr Donald, you are going round the mulberry bush.
Firstly, you did not answer all 4 of my questions. You selected what you wanted to answer, just like you select what to quote.
So you are no event sure if you have listed all the characters in your patent-pending, ISBN registered allocation table!!!
I was not talking about a mere one or two characters you may have missed. I was talking about JOINT LETTERS. Your system with 1660 characters doesn’t have any!
Come on, fix your system before criticizing others.
I hope you’ll answer ALL my questions.
Correction: “So you are no event sure” should read “So you are not even sure”.
Anuradha, I have registered the joint characters in my ISBN: what I have selected from Pali text books and what has been registered in the govt education publications for pirivena education.
So far no one has made any attempt to see the table and comment.
It is far larger than the 1660 of my first count.
Donald Gaminitillake
I set the standard
“All questions that are listed in point form should be answered in point form.” Not strictly as a rule, but for the sake of readers who struggle to read long paragraphs. Otherwise it also leaves room for playing hide and seek.
Let us, for the moment, assume that question 2 is answered. Thanks Mr Donald!
How about questions 1, 3 and 4?
Unicode pr-96.html is given to show that Sinhala poses a security problem.
“DU”, “KU” and “ksha” are not registered in SLSI 1134.
These are hidden inside the Sinhala language kit.
When the language kit is working they show, but when it is not working the text shows as the Unicode-registered parts (glyphs) or characters (junk).
If we had absolute character encodings like the Latin script, Sinhala would not have all these problems.
Admit your errors. Sinhala cannot be used in Excel, data sorting or OCR across all platforms; all this is simply because we have not published a character allocation table. ONLY I have published it.
SLSI 1134 is a part, a fraction of the list; a typewriter list.
Donald Gaminitillake
I set the standard
Mr. Donald Gaminitillake,
May I know (cuz I don’t know):
1) Why can’t we use Sinhala in Excel?
2) Why can’t we use Sinhala in data sorting?
3) Why can’t we use Sinhala in OCR?
With sound examples, please?
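On question 2, a plain code point sort is easy to try in Python. The sample strings below are arbitrary and chosen only so that each starts with a different letter. The sketch sorts by raw code point order, which is an assumption made for simplicity; a proper dictionary order would need a collation library.

```python
# Arbitrary sample strings; ඩු is included to show that a
# letter + vowel sign sequence sorts like any other string
words = ["ගම", "කල", "අම", "ඩු"]

# Python compares strings code point by code point
by_code_point = sorted(words)
print(by_code_point)

# The first code point of each string, in sorted order
print([f"U+{ord(w[0]):04X}" for w in by_code_point])
```

Because the Sinhala block lists its letters roughly in alphabetical order, even this naive sort gives a usable result for simple cases.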
Use this ඩු: paste it into Excel and see the result, sir.
Donald Gaminitillake
I set the standard
Please see the below screenshot Sir.
This is how I see it in Excel:
http://www.rotarycolombocentral.org/web-data/Components/Private/index.html
The same string reads differently. The additional software is not responding in my Excel.
If Unicode Sinhala is correct, we should see du in the cell as one unit, not in two parts.
This is the problem. Why not email me your du, and I will email you my du; let us see how they are seen on each other’s computers.
Donald Gaminitillake
I set the standard
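The “one unit, not two parts” point above is about how stored code points are grouped for display. The sketch below is a deliberately rough approximation written for illustration: it attaches combining marks and ZWJ to the preceding character. It is not the full Unicode text segmentation algorithm, and `rough_clusters` is a hypothetical helper.

```python
import unicodedata

def rough_clusters(text):
    """Group each base character with the combining marks (and ZWJ) that follow it."""
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M") or ch == "\u200d"
        if clusters and is_mark:
            clusters[-1] += ch   # attach the mark to the previous cluster
        else:
            clusters.append(ch)  # start a new cluster at a base character
    return clusters

du = "\u0da9\u0dd4"
print(len(du), len(rough_clusters(du)))
```

Software that supports the script shows such a cluster as one unit. Seeing the two parts separately suggests that the rendering support is missing on that machine, while the stored data is the same either way.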
Mr Donald, you didn’t answer my question no 1.
I asked, how does pr-96.html in Unicode site relate to your argument about du?
Be specific. Just saying “your whole standard is incorrect” is bullshit.
Mr. Donald,
My guess is that you have deliberately not set up Unicode on your computer, so that it won’t work. Or is it that you don’t know how to set it up? It is hard to believe that a person who doesn’t understand how to take a simple screenshot can understand how to set up even something as simple as Unicode 😛 but I am sure that one of us will be able to help you set it up properly.
Please ask us to display any character you want. We will show and prove it to you, that Unicode can display it correctly.
And maybe it is your right to “enjoy” commercial benefits from your so-called correct standard. But it is not ethical to “sell” your language for your own benefit. All of the people who are popularizing Unicode do so because they want to help their language into the modern age. Pebbles like you will not stop them on their journey towards their goal.
Maybe “you” set the standard 😛 But it is us who use it!! You and your standard can rot in garbage bins while Unicode is used everywhere, including Google services, Wikipedia, the Sri Lankan Government, and many other state/private organizations!! No one can stop Unicode on the internet, or in other places.. Please give up your lost battle, and enjoy Unicode! It will open the gates for a Sinhala Internet / Sinhala Language Information, by amounts you haven’t ever seen before!!
P.S. – Taking a photo of a computer screen is equal to photocopying a computer monitor 😛
To represent Sinhala correctly
you need a font set + additional software (Sinhala kit).
The font set is what is in SLSI 1134, or what is registered in Unicode as Sinhala.
That is what is seen on my site.
Do we set up computers to see German, Bahasa Malaysia or Swahili?
We can browse those on any computer without any additional software,
BUT to see Sinhala we need a font set and additional software.
This additional software does not work well in all operating systems; that is why we see the raw form of Sinhala, or the typewriter concept.
Yes, I used a digital camera because I use an Intel iMac. In XP mode I need to follow that path.
Is it wrong to photocopy a monitor???
Since you guys are not willing to understand the problems in Sinhala SLSI 1134, there is no choice but to meet the top and give evidence.
You have got to understand the difference between the Unicode Consortium and what we registered in this consortium, or SLSI 1134.
I never say Unicode is incorrect, but what has been registered in the Unicode Consortium for the Sinhala language is an incorrect and incomplete set of Sinhala characters. This is the typewriter concept.
Donald Gaminitillake
I set the standard
It seems, Mr Donald, that you don’t understand English, or you are trying to play hide and seek while the whole world is watching.
You are yet to answer my questions:
How does pr-96.html in Unicode site relate to your argument about du?
Anuradha, you can see the ‘du’ in my Excel:
http://www.rotarycolombocentral.org/web-data/Components/Private/index.html
That is the problem referred to in PR-96.
The same string breaks sometimes; that is the problem with “SRI”, “PRA”, “KRI”
and many more.
Donald
Wrong and false.
“SRI” and “DU” are two different things. You are totally mixed up, or pretend to be so.
PR-96 does NOT refer to “du” or similar stuff. It refers to the possible problems if (repeat *if*) somebody filters out ZWJ.
Others who read this blog, please check the following link (which Mr Donald himself brought up), and verify that it is NOT about da + papilla = du (or consonant + modifier = modified consonant in general), but it is about the filtering of ZWJ and ZWNJ.
http://www.unicode.org/review/pr-96.html
Donald Gaminitillake has an amateur website (http://www.akuru.org/). He was a little-known man, but declaring war against Sinhala Unicode has made him a hero to a few and a villain to many. He was almost forgotten, but suddenly this man is in the limelight again because of a letter he has got from the President’s Office. See http://www.flickr.com/photos/8503406@N05/2461712604/
Now what we see is that Sinhala Unicode supporters have started an all-out attack on Gaminitillake. Why do the Unicode supporters panic? If Unicode is a perfect solution, or at least a decent solution, they do not have to panic.
ICTA is under the President’s Office, and if the President is convinced that Unicode is not the right solution, there is a good chance of Gaminitillake’s solution being tried out. From all that I have seen on the net about the Sinhala Unicode issue, Sinhala Unicode supporters are very protective and are scared of a fair trial. Why???? There are some ‘hardcore supporters’ of Sinhala Unicode too. They would even die for Unicode. Nothing wrong with that. You have a right to admire what you love. All the same, Gaminitillake has a right to criticise it.
I see this issue like the Microsoft vs Apple issue. Microsoft is widely used, popular, and easily accessible to people from all walks of life, even for a mere 100 rupee note from the software pirates. So, people love it. Take Apple. It is scarcely used, less accessible, and expensive. But we all know it is good in quality. So, someday, can’t Gaminitillake’s solution beat Sinhala Unicode?? Let time decide.
So, my dear Unicode supporters, let Sinhala Unicode have a fair trial. Let the intelligent people listen to Gaminitillake and then take a decision. You don’t have a right to mount Taleban-like attacks on this man. After all, he is a single man without any political influence (as far as I know) and he is trying to prove a point. Let us listen.
I am neither a Gaminitillake sympathizer nor a Sinhala Unicode supporter. What we want to see is a fair trial. Even criminals deserve a fair trial. Gaminitillake cannot be a criminal for opposing Sinhala Unicode.
Some accuse him of trying to make money out of his solution. So what! He has every right to make money if he can come up with a superior solution. If you try to deprive him of a fair trial, that indicates you have an inferior solution. I could be wrong. What I don’t understand is why you are all scared of a letter from the President’s Office to this lonely man.
In fact, Unicode PR-96 gives the following examples:
sha + hal kireema = sh
ra + diga ispilla = rii
which is similar to da + papilla = du (i.e., consonant + modifier = modified consonant).
So, PR-96 in fact gives examples *against* Mr Donald’s du claim.
Mr Donald, let me teach you how to argue. Use the PR-96 point to support arguments about joint letters… 😉
Anuradha, thank you for teaching me how to argue. As you are a teacher at the university of Sri Lanka, it is your duty to teach me and others.
So again you have admitted that when certain parts get “FILTERED”, and “THEY DO GET FILTERED” often, you get into security problems, as they display different Sinhala characters (technically, what has been registered in Unicode or SLSI 1134).
Computers cannot run like this, with errors, and we need to amend SLSI 1134 as soon as possible.
Donald Gaminitillake
I set the standard
Mr. Donald,
Why don’t you make your own software to do the things you claim, rather than trying to find what’s wrong with Unicode? If Unicode is bad, don’t use it. It’s your choice. If it’s bad, then come up with a solution that everyone can use; not just web pages and documents, but a product. We need something that works, not the fictions you’re talking about.
All in all, what I see is that you’re just a person who doesn’t know much about computers.
Whatever the matter, show us a working solution first, not documents and web pages.
“I set the standard”: what standard? And where is the product? How can we use your product? Tell me. OK, I have Linux & Windows; tell me how to use your so-called “I set the standard”.
We don’t need people who do “katin bathala hitiwima” (planting sweet potatoes with the mouth, i.e. all talk); we need actions.
Come up with a product that works in reality!!!!
Dear දෙඤ්ඤං බැටේ | Dennam Betey,
Firstly, I’d like to say that I do not benefit from any party, organization or team for DISCUSSING these fabrications about Sinhala Unicode. But I honestly do not know who you are or whether you benefit from anyone.
I’m not afraid of anyone’s standards or patents. I simply don’t care. What I do care about is a sound debate where actual practical problems are seen as they are. I’m not a Worshiper or a Blind Believer.
Secondly, I wrote this post only to raise people’s awareness of the speculations. I suggest you “first read the post” before reading and posting comments.
See the 2 texts quoted from above post.
“This is an EXPLANATION and/or CONSTRUCTIVE CRITICISM on some claims made on the already standardized Sinhala Unicode system.”
“However there are some speculations saying that there are ulterior motives and hidden agendas behind these claims. I’m not discussing those here or I’m not targeting anyone. But when there are things that mislead people, someone should correct it.”
Anyway, it’s up to you to decide whether we tried to start a DECENT DEBATE about the issues that only Mr. Donald raised, or to ATTACK him as you say.
As far as we know, we never tried to attack him. Maybe people who couldn’t control their emotions commented in a rough way.
But anyone who goes through the comments made here cannot, I think, come to the conclusion you did. I suspect that it’s you who makes blind, unfair judgments.
One could even see you as a sympathizer of Mr. Donald despite your proactive denial.
If you honestly face practical problems with Sinhala Unicode, please discuss them here.
Many thanks,
Harshadewa Ariyasinghe.
Mr Donald, your logic is flawed again.
There are three pictures of Sinhala samples in PR-96. First two are almost the same (only a space is missing in the second).
First two are how we usually see “sri” and obviously it is not flawed.
So Mr Donald can use only the third picture for “bad Sinhala Unicode”.
Now that example has a “ree”. It is also, like “du”, not “registered” in Unicode, but made by combining “ra” and “diga ispilla”.
So, the only example Mr Donald can use in that PR (which is about rakaransaya), also has a counter example for his claims about “du”.
Here is the URL for the PR-96:
http://www.unicode.org/review/pr-96.html
Dear දෙඤ්ඤං බැටේ,
I don’t want to reply to Mr Donald. I gave up long ago. I wanted to add explanations on the web only for the benefit of newcomers.
He has every right to make money. He can implement his system, patent it, and if it is good everybody will like it and pay him royalties. But if he tries to falsely accuse a working standard in order to achieve that goal, therein lies the problem.
I don’t want to discriminate against anyone.. but Mr. Donald is acting in a rather foolish manner.. as we can see from his screen-shots. I don’t even wanna call them screen-shots.. because even a child can see that he used a camera to photograph his monitor… by doing this he sets a great example of using the ‘latest technology’.
But I can assure you that the claims he’s making are false.. because Sinhala Unicode works for anyone.. and.. when using the latest/popular operating systems.. you CAN see Sinhala Unicode letters out of the box.. if you want an example you can try using an Ubuntu live CD.. you don’t even have to install it to see the words in Unicode.
And it really seems obvious that he’s deliberately trying to ignore Anuradha’s questions, he only answers the questions that he CAN answer. But when it comes to the questions that he can’t beat.. he merely ignores them.
And.. දෙඤ්ඤං බැටේ if you want to see a fair trial between the sinhala unicode and Mr-Donald’s-Self-Standard-Sinhala.. it’s happening right now.. what do you see as unfair in this debate.. it’s just that there are not so many people on his side.. but there are many sinhala unicode supporters.. as it’s popular, great and works well. If his standard is better.. no one will be able to stop it.. but remember the truth, honesty and good deeds always win.. and as I can see sinhala unicode is gaining an upper-hand in this “standard-war”…. so isn’t it obvious that unicode is better.
So Mr. Donald.. please don’t be a self-righteous ignorant git and stop accusing sinhala unicode about flaws that it doesn’t even have. If your intentions are pure.. and you want to help spread sinhala in the web, you can support the sinhala unicode standard.. and help to make it better.
Thank you
You guys are not talking about the incompatibility of Sinhala. We don’t have e-dictionaries, e-grammars, e-encyclopedias, nothing. We cannot copy and paste.
You guys do not even want to admit that characters get altered when ZWJ etc. get filtered. Nor do you admit that when the Sinhala kit is not functioning you will see garbage Sinhala.
The Sinhala registered in Unicode is an incomplete and incorrect solution.
Donald Gaminitillake
I set the standard
I challenge you all
the code point for ayanna is specified in unicode consortium as
01 CODE POINT VALUE: : : : : 0D85
02 NAME (UNICODE NAME) : : : SINHALA LETTER AYANNA
03 GENERAL CATEGORY: : : : : Letter, Other
04 COMBINING CLASS : : : : : Spacing, split, enclosing, reordrant, and Tibetan subjoined
05 BIDIRECTIONAL CATEGORY: : Left-to-Right
06 DECOMPOSITION MAPPING : : –
07 DECIMAL DIGIT VALUE : : : –
08 DIGIT VALUE : : : : : : : –
09 NUMERIC VALUE : : : : : : –
10 MIRRORED: : : : : : : : : No
11 UNICODE 1.0 NAME: : : : : SINHALA LETTER AYANNA
12 ISO 10646 COMMENT FIELD : –
13 UPPERCASE MAPPING : : : : –
14 LOWERCASE MAPPING : : : : –
15 TITLECASE MAPPING : : : : –
16 DECIMAL VALUE : : : : : : 3461
17 UTF-8 HEX VALUE : : : : : 0xE0B685
18 UTF-16 HEX VALUE: : : : : 0x0D85
19 UTF-32 HEX VALUE: : : : : 0x00000D85
20 XHTML : : : : : : : : : : අ
21 BLOCK : : : : : : : : : : Sinhala
22 PLANE : : : : : : : : : : Basic Multilingual Plane (BMP)
23 STROKE NUMBER : : : : : : –
24 RADICAL : : : : : : : : : –
Likewise, give me the registered location for
“ksha” (Rajapaksha)
list the values
16 DECIMAL VALUE : : : : : :
17 UTF-8 HEX VALUE : : : : :
18 UTF-16 HEX VALUE: : : : :
19 UTF-32 HEX VALUE: : : : :
20 XHTML : : : : : : : : : :
21 BLOCK : : : : : : : : : :
Also, I must be able to see it in the Unicode Consortium registration.
Donald Gaminitillake
I set the standard
Mr Donald,
First of all, ZWJ and ZWNJ are used in other languages to display joint characters; the following link will lead you to one such example.
http://www.unicode.org/standard/where/
If you have studied the Sinhala language even up to grade 5, you should know that K and SHA (ක් and ෂ) are two different letters!
The code points for KSHA are as follows:
0D9A SINHALA LETTER ALPAPRAANA KAYANNA
= sinhala letter ka
0DCA SINHALA SIGN AL-LAKUNA
= virama
0DC2 SINHALA LETTER MUURDHAJA SAYANNA
= sinhala letter ssa
* retroflex
200D ZERO WIDTH JOINER
* commonly abbreviated ZWJ
So the sequence of code points 0D9A+0DCA+0DC2 will generate KSHA (ක්ෂ); if you want bandi akuru you can use the sequence 0D9A+0DCA+200D+0DC2, which would display (ක්‍ෂ).
I hope that you will understand it, but my guess is that you don’t have the capability to understand it, because if you had understood that K and SHA are two different letters you wouldn’t have asked this question in the first place!
And do you know that none of the “hodi poth” (alphabet books) in Sinhala show all the letters combined with all the signs? They don’t, because even a child is capable of understanding that KA is a letter, “al lakuna” is a sign and SHA is another letter!! A very similar process is used in Unicode!!!
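The two sequences given above can be turned into the values the challenge asked for (code points and UTF-8 bytes) with a few lines of Python. This is only a sketch; the variable names are illustrative.

```python
# The two sequences from the comment above
ksha = "\u0d9a\u0dca\u0dc2"               # KA + al-lakuna + SHA
ksha_joined = "\u0d9a\u0dca\u200d\u0dc2"  # same, with ZWJ after the al-lakuna

def hex_bytes(text):
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))

for label, text in (("plain", ksha), ("with ZWJ", ksha_joined)):
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(label, "|", code_points, "|", hex_bytes(text))
```

Each code point in the sequence has its own UTF-8 value, so the conjunct as a whole has a well-defined byte representation even without a code point of its own.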
දෙඤ්ඤං බැටේ,
[Translated from Sinhala] Since I sensed that you are able to read Unicode, I am writing in Sinhala.
As you said, there is nothing wrong with Mr Donald earning money using a system of his own… Indeed, I too see nothing legally wrong with it. But the way I see it, there is no act more base than selling one’s own language to make money! Nor is it right to spread baseless lies against Unicode (which has established the Sinhala language on the Internet) for that purpose.
It is a character not registered in unicode.
example the greek character
1F8F GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI
01 CODE POINT VALUE: : : : : 1F8F
02 NAME (UNICODE NAME) : : : GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI
03 GENERAL CATEGORY: : : : : Letter, Titlecase
04 COMBINING CLASS : : : : : Spacing, split, enclosing, reordrant, and Tibetan subjoined
05 BIDIRECTIONAL CATEGORY: : Left-to-Right
……
14 LOWERCASE MAPPING : : : : U+1F87
15 TITLECASE MAPPING : : : : –
16 DECIMAL VALUE : : : : : : 8079
17 UTF-8 HEX VALUE : : : : : 0xE1BE8F
18 UTF-16 HEX VALUE: : : : : 0x1F8F
19 UTF-32 HEX VALUE: : : : : 0x00001F8F
20 XHTML : : : : : : : : : : ᾏ
21 BLOCK : : : : : : : : : : Greek Extended
22 PLANE : : : : : : : : : : Basic Multilingual Plane (BMP)
23 STROKE NUMBER : : : : : : –
24 RADICAL : : : : : : : : : –
So these have numbers — UTF values
What you have given is the input sequence for “ksha”. It has no registration with the Unicode Consortium and no UTF value; it is not in the Unicode registry.
Every Sinhala character needs registration in Unicode for it to be called Unicode Sinhala.
Donald Gaminitillake
I set the standard
“Every sinhala character needs registration in unicode for it to be called a unicode sinhala.”
OK! That’s it! Ultimately this is your RULE! It is neither a LAW nor a RULE written in a book or set by a government.
Anyone can say such things, but who cares? We don’t, and neither do people with a little upstairs.
Again.. If one says
“Every number in the world needs a KEY in the CALCULATOR!”
I don’t know what to call that person or that theory.
What you say is crystal clear, Mr. Donald. We can understand your suggestion. But when we can put up 10 by pressing 1 + 0, why the heck do we need all characters to be individually registered?
And we don’t need to accept each and every damn way that others have implemented their languages. We have to invent our own way. It has been invented and implemented successfully.
Your Excel and sorting examples have been proved wrong here. Please use the latest correct software to enable Sinhala 100%.
The problems shown in your screenshots arise not from a Sinhala Unicode problem but from your inability to set up, use, and implement software applications.
Talk wisely and choose honestly!
Thank you,
Harshadewa Ariyainghe
Donald,
Sinhala is registered in Unicode, as everyone except you knows! And each of the Unicode characters is mapped in UTF-8, so K has a UTF value, al kirima has a UTF value, and sha has a UTF value!
You can get more information about it in the following links!!
http://www.unicode.org/charts/PDF/U0D80.pdf – Unicode sinhala, official document
http://en.wikipedia.org/wiki/Utf_8 – what is meant by UTF-8
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt – all the characters in Unicode (you can search the text for “sinhala” and you will find all the information about it). If you don’t know how to do a simple search, the Sinhala characters start from 0D82; you can go manually up to that point and see all the Sinhala characters from there onwards…
have fun with it!!!
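For readers who would rather not search UnicodeData.txt by hand, the same lookup can be sketched with Python's standard `unicodedata` module. This is only an approximation of the search described above: the result depends on the Unicode version bundled with the Python interpreter, and the function name is illustrative.

```python
import unicodedata

# The Sinhala block spans U+0D80..U+0DFF; not every position is assigned.
SINHALA_BLOCK = range(0x0D80, 0x0E00)


def sinhala_characters():
    """Return (code point, name) for each assigned character in the block."""
    found = []
    for cp in SINHALA_BLOCK:
        name = unicodedata.name(chr(cp), None)  # None for unassigned positions
        if name is not None:
            found.append((cp, name))
    return found


if __name__ == "__main__":
    for cp, name in sinhala_characters():
        print(f"{cp:04X}  {name}")
```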
We are the ones who use the standards!!!! (proper ones if you were wondering :P)
Devil is in the details, they say. Attention to detail is all about being specific.
Mr Donald, you don’t seem to know how to be specific when talking.
Would you like to look at my comments about the “ree” character in the third picture in Unicode PR-96 and show that you do pay attention to detail?
In the 1920s, there was a famous debate between Buddhist Atheists and Buddhist Theists in the සිංහල බෞද්ධයා (Sinhala Baudhdhaya) newspaper. It was later published as a book, දෙවි දේවතා පෙරහැර (devi devatha perahera).
With several personalities taking part, such as Hemapala Munidasa, Polwatte Buddadaththa thero, Balangoda Ananda Maithriya thero (at a young age), and Yagirala thero, the language and witty discussion is itself a treat to read.
David Karunarathna (whom you should know well if you love the language and have read enough literature), young but already well read, and who led the Theists in the debate, asked the opposition five very specific questions.
Katuwellegama Amarasiri Thero, who led the Theists, didn’t answer the questions. Instead he made some general comments and said “now it’s clear that my opponent’s arguments are false”.
David Karunarathna reminded him of the five questions and described Amarasiri Thero’s arguments as “granny’s arguments”, because they were not at all specific.
I don’t want to draw parallels here, but we asked some specific questions, and the answers were always about “incompleteness and incorrectness”, never specific. Grannies and Grandpas, all the same I suppose! 😉
Dear Harshadewa
The problem here is that you are calling what you use SINHALA UNICODE.
If anyone says it is a Unicode-registered character set, each and every character needs a registration and a UTF value.
“Ksha” has no UTF value, CODE POINT VALUE, or NAME (UNICODE NAME); it is not a Unicode character.
Admit this, or give me the values for the above (if you say it is Sinhala Unicode).
If you say the Sinhala you are using is UCSC Anuruddha’s Sinhala, or ICTA Wasantha Sinhala, or VKS Sinhala, this question is not a problem. The moment you classify it as Unicode-registered Sinhala, you have to give its code points. It has to be in the Unicode registry.
Donald Gaminitillake
I set the standard
Anuradha, you are incapable of giving the UTF values or code points for “ksha”.
This is just one sample.
Using joiners is not a problem; it is a string of inputs. But the UTF value, Unicode name, etc. must be there at the Unicode Consortium to say that it is a Unicode-registered character.
The Sinhala that you are all using is partly registered in Unicode; the rest is under the plasters of ICTA and the relevant groups that try to control Sinhala.
Donald Gaminitillake
I set the standard
quote
Problems shown in your screenshots are arising not because of a Sinhala Unicode problem but because of your inability to setup/use/implement software applications.
unquote
If you have a font, why do you need additional software?
Donald
Let me try to practice what I preach; to be specific and pay attention to detail.
Mr Donald asks what are the “utf values or code points of ksha”.
First, utf is not a numbering system, but an encoding system. Numbers are assigned in Unicode.
ksha = 0D9A 0DCA 0DC2 – not joint (ක්ෂ)
ksha = 0D9A 0DCA 200D 0DC2 – joint (ක්ෂ)
Hope you will also answer my specific point: about “ree” letter in third picture in PR-96, is it correct or not?
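The distinction drawn above, that numbers are assigned by Unicode while a UTF is only an encoding of those numbers, can be sketched in Python. The snippet takes the joined ksha sequence and derives its UTF-8, UTF-16, and UTF-32 byte forms from the code points. The function name is illustrative, and big-endian forms without a byte order mark are an arbitrary choice made here for readability.

```python
# ka + al-lakuna + ZWJ + ssa, as given in the comment above.
KSHA_JOINED = "\u0D9A\u0DCA\u200D\u0DC2"


def encodings(text):
    """Show the code points of text next to its UTF-8/16/32 byte sequences."""
    return {
        "code points": " ".join(f"U+{ord(ch):04X}" for ch in text),
        "UTF-8": text.encode("utf-8").hex(" "),
        "UTF-16BE": text.encode("utf-16-be").hex(" "),
        "UTF-32BE": text.encode("utf-32-be").hex(" "),
    }


if __name__ == "__main__":
    for form, value in encodings(KSHA_JOINED).items():
        print(f"{form:12} {value}")
```

A sequence of characters therefore has UTF byte values in every encoding form, whether or not it has a single code point of its own.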
“Moment you classify it as unicode registered sinhala you got to give its code points It has to be in the unicode registry.”
1. It’s in the UNICODE registry and the values have been given over and over again.
2. You keep saying it is NOT, because you can’t understand the simple theory of “1 and 0 is 10“.
3. Your argument that “Every letter in Sinhala should be given an individual code in UNICODE” is NOT a hard written RULE, nor is it specified by any GOD.
4. Trying to give each Sinhala character a unique code in UNICODE looks hilarious to me.
Personally, I think there’s no problem in using, for computers, the exact same representation that Sinhala itself uses for creating a letter, i.e.
(quoted from one of Anuradha’s comments above):
sha + hal kireema = sh
ra + diga ispilla = rii
which is similar to da + papilla = du (i.e., consonant + modifier = modified consonant).
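The “consonant + modifier” rule quoted above can be checked against the Unicode character database. The following is a small sketch using Python's standard `unicodedata` module; the constant and function names are illustrative. It shows that DU is stored as a letter followed by a combining mark, and that Unicode normalization (NFC) leaves the pair as two code points.

```python
import unicodedata

DA = "\u0DA9"        # SINHALA LETTER ALPAPRAANA DDAYANNA
PAPILLA = "\u0DD4"   # SINHALA VOWEL SIGN KETTI PAA-PILLA
DU = DA + PAPILLA    # da + papilla = du


def is_mark(ch):
    """True when the general category is a combining mark (Mn, Mc or Me)."""
    return unicodedata.category(ch).startswith("M")


if __name__ == "__main__":
    for ch in DU:
        kind = "modifier" if is_mark(ch) else "base letter"
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  ({kind})")
    # NFC composes wherever a precomposed character exists; here the pair
    # is returned unchanged.
    print(unicodedata.normalize("NFC", DU) == DU)
```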
Dear Anuradha
Those are the input codes, including 200D ZWJ.
The end result does not come from a Unicode-registered character, sir.
It is inside your kits and additional plaster software.
The problem is that you are mixing the typewriter input method with Unicode registrations.
Ksha needs to be registered in Unicode with a proper name and UTF numbers.
æ or Æ
01 CODE POINT VALUE: : : : : 00C6
02 NAME (UNICODE NAME) : : : LATIN CAPITAL LETTER AE
03 GENERAL CATEGORY: : : : : Letter, Uppercase
04 COMBINING CLASS : : : : : Spacing, split, enclosing, reordrant, and Tibetan subjoined
05 BIDIRECTIONAL CATEGORY: : Left-to-Right
……
16 DECIMAL VALUE : : : : : : 198
17 UTF-8 HEX VALUE : : : : : 0xC386
18 UTF-16 HEX VALUE: : : : : 0x00C6
19 UTF-32 HEX VALUE: : : : : 0x000000C6
20 XHTML : : : : : : : : : : Æ
The above is a joint character, but it has an individual registration in Unicode.
How we input it does not matter, because you can have several input methods depending on the OS and drivers. The output value remains the same because the value has been given in the Unicode registration. We all see the same character. It won’t break into parts.
So the characters registered in Sinhala Unicode are not enough to represent the Sinhala language. Therefore it is an incomplete solution that has to be amended.
Either you have to accept my proposal or propose another new way.
You have no other way, because you all gulped down Andy’s method and are now trying to justify it.
Another good example: the numeral 500, written as a Roman numeral, is a “d”,
but in Unicode the “d” is registered again:
01 CODE POINT VALUE: : : : : 217E
02 NAME (UNICODE NAME) : : : SMALL ROMAN NUMERAL FIVE HUNDRED
03 GENERAL CATEGORY: : : : : Number, Letter
—–
13 UPPERCASE MAPPING : : : : U+216E
14 LOWERCASE MAPPING : : : : –
15 TITLECASE MAPPING : : : : 216E
16 DECIMAL VALUE : : : : : : 8574
17 UTF-8 HEX VALUE : : : : : 0xE285BE
18 UTF-16 HEX VALUE: : : : : 0x217E
19 UTF-32 HEX VALUE: : : : : 0x0000217E
20 XHTML : : : : : : : : : : ⅾ
21 BLOCK : : : : : : : : : : Number Forms
22 PLANE : : : : : : : : : : Basic Multilingual Plane (BMP)
BUT the same character is registered for text as latin
01 CODE POINT VALUE: : : : : 0064
02 NAME (UNICODE NAME) : : : LATIN SMALL LETTER D
03 GENERAL CATEGORY: : : : : Letter, Lowercase
—–
15 TITLECASE MAPPING : : : : 0044
16 DECIMAL VALUE : : : : : : 100
17 UTF-8 HEX VALUE : : : : : 0x64
18 UTF-16 HEX VALUE: : : : : 0x0064
19 UTF-32 HEX VALUE: : : : : 0x00000064
20 XHTML : : : : : : : : : : d
21 BLOCK : : : : : : : : : : Basic Latin
22 PLANE : : : : : : : : : : Basic Multilingual Plane (BMP)
We need proper UTF values for all Sinhala characters in Unicode, SIR.
Donald Gaminitillake
I set the standard
Mr Donald,
“Those are the input codes, 200D zwj
The end result comes not from unicode registered character sir.
It is inside your kits and additional plaster software”
Wrong, as usual. ZWJ is registered in Unicode to be shared among languages.
Do you use the full stop and the comma? Do you want to have a separate full stop for Sinhala? Having common characters like the full stop and ZWJ is not a problem.
So how many times do you want me to repeat my question about “ree” in PR-96?
I am not worried about the zwj
This is registered in Unicode as
01 CODE POINT VALUE: : : : : 200D
02 NAME (UNICODE NAME) : : : ZERO WIDTH JOINER
03 GENERAL CATEGORY: : : : : Other, Format
04 COMBINING CLASS : : : : : Spacing, split, enclosing, reordrant, and Tibetan subjoined
05 BIDIRECTIONAL CATEGORY: : Boundary Neutral
06 DECOMPOSITION MAPPING : : –
07 DECIMAL DIGIT VALUE : : : –
08 DIGIT VALUE : : : : : : : –
09 NUMERIC VALUE : : : : : : –
10 MIRRORED: : : : : : : : : No
11 UNICODE 1.0 NAME: : : : : ZERO WIDTH JOINER
12 ISO 10646 COMMENT FIELD : –
13 UPPERCASE MAPPING : : : : –
14 LOWERCASE MAPPING : : : : –
15 TITLECASE MAPPING : : : : –
16 DECIMAL VALUE : : : : : : 8205
17 UTF-8 HEX VALUE : : : : : 0xE2808D
18 UTF-16 HEX VALUE: : : : : 0x200D
19 UTF-32 HEX VALUE: : : : : 0x0000200D
20 XHTML : : : : : : : : : : ‍
21 BLOCK : : : : : : : : : : General Punctuation
22 PLANE : : : : : : : : : : Basic Multilingual Plane (BMP)
23 STROKE NUMBER : : : : : : –
24 RADICAL : : : : : : : : :
After this input code (ZWJ), “Ksha” has to be represented from the Unicode registration.
So give me the UTF value of “KSHA”, sir. If you do not have one, say so.
Write to the public that “KSHA” is given by the “Sinhala kit”, not by the Unicode registration.
You are manipulating the few characters registered in Unicode through another piece of software, the “Sinhala kit”, and other Handyplast (palastara) software made by ICTA and others.
“KSHA” does not have a value in Unicode; it is hidden. When the palastara software does not respond, it goes back to the Unicode-registered level and we all see garbage Sinhala.
So the Sinhala characters registered with the Unicode Consortium are incapable of representing our language.
We cannot use Sinhala in any of the commercial applications, such as Adobe CS3 Master Collection, etc.
All this is because one needs to run other palastara software to represent Sinhala. The Unicode registrations are not enough to represent our language.
Therefore we need to amend SLSI 1134 to protect our language, Sinhala.
Donald Gaminitillake
I set the standard
Dear Anuradha
I have illustrated what you wrote in No. 48 (joint ksha):
http://www.rotarycolombocentral.org/web-data/Components/Private/ksha.html
I have given an explanation. Can you give yours?
Donald Gaminitillake
I set the standard
Mr Donald,
ZWJ is registered in Unicode. Code point 200D.
See my No. 52, where I have given the details of ZWJ from Unicode. In my illustration I too have given the ZWJ code point.
My question is: after you input those sequences, from where does the computer image the “KSHA”?
Is it from the Unicode registered plane, or does it come from the “Sinhala kit” and its supporting plaster software?
If it is from the Unicode plane, give its UTF numbers etc.; if not, say that it comes from the Sinhala kit and plaster software.
Also say that without this additional plaster software the Sinhala registered in Unicode and in SLSI 1134 cannot be imaged correctly.
Donald Gaminitillake
I set the standard
Regarding your question
“Do you want to have a separate full stop for Sinhala?”
Yes, we may need one for Sinhala, due to kerning algorithms in Sinhala.
Why not have an exclusive set of comma, full stop, and other general punctuation for Sinhala?
Remember that Sinhala does have different ways of writing the language.
Kavi (verse) is written in one format, etc.
2500 years of Sinhala development have to be preserved. I have to give all options to the people.
Having “plaster software” and kits won’t work. All the Sinhala characters need proper registration in SLSI and proper UTF values.
Donald Gaminitillake
I set the standard
Modern fonts don’t need all the glyphs to have individual code points.
Some shapes (glyphs) may come from individual characters (e.g.: අ).
Some glyphs will come from sequences (e.g.: ක්ෂ).
For example, the LKLUG font, the first free Sinhala Unicode font, which we developed, has ksha, but it is not given an individual code point. Rather, it is assigned to a sequence, which is ka + hal kireema + ZWJ + sha.
Even if we assign a code point in a “private area” in a font (if it is necessary), it is not a problem. Here is why:
In software design, each module can have its own implementation details, but only the interface matters to the rest. The best example is a subroutine. Once the subroutine is written, the other parts of the program have to think of it as a black box.
Similarly, as the standard defines the interface with input (ka + hal kireema + ZWJ + sha) and output (ksha shape), it doesn’t matter how the font does it, or even how an implementation does it.
Read any standard like POSIX, SVID, or even ANSI/ISO C. They all define the interfaces, and not implementation details.
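The interface idea above (a sequence goes in, a shape comes out) can be illustrated with a toy lookup. This is only a sketch: real fonts use OpenType substitution tables, not a Python dictionary, and the glyph names and table contents below are hypothetical, not taken from LKLUG or any other font.

```python
# Hypothetical glyph table: code point sequences mapped to made-up glyph names.
GLYPHS = {
    "\u0D85": "ayanna",
    "\u0D9A": "ka",
    "\u0DC2": "ssa",
    "\u0D9A\u0DCA": "ka.hal",
    "\u0D9A\u0DCA\u200D\u0DC2": "ksha.conjunct",  # ka + hal kireema + ZWJ + sha
}
MAX_SEQUENCE = max(len(sequence) for sequence in GLYPHS)


def shape(text):
    """Map text to glyph names, preferring the longest matching sequence."""
    glyphs = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_SEQUENCE, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in GLYPHS:
                glyphs.append(GLYPHS[chunk])
                i += size
                break
        else:
            # Nothing in the table covers this character.
            glyphs.append(f"missing(U+{ord(text[i]):04X})")
            i += 1
    return glyphs


if __name__ == "__main__":
    print(shape("\u0D9A\u0DCA\u200D\u0DC2"))  # joined sequence
    print(shape("\u0D9A\u0DCA\u0DC2"))        # same letters without ZWJ
```

Callers only see the text going in and the glyph list coming out; how the table is stored is an implementation detail of the font.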
Anyway, it seems Mr Donald is following this principle:
“When you can’t win an argument, confuse”.
That’s why he wants to write long answers instead of answering in point form, and also to repeat the same old statements in between to add to the confusion. 😉
So I think it is going to be a waste of time to argue like this. But I have thought of a better mechanism to present matters. Will get back in a few days time.
Quote
For example, in LKLUG font, the first free Sinhala Unicode font which we developed, has ksha, but it is not given an individual code point
Unquote
By not giving it a code point in the Unicode registration, it is a character not registered in Unicode. You can have all the sequences of inputs, but the output character needs to be registered in Unicode and have a UTF value.
Your point proves that not all Sinhala characters are registered with the Unicode Consortium, and that they are hidden under a carpet of plaster software.
That is why you always need various types of fonts and additional software to represent Sinhala.
We have to exit from the typewriter technology to save the Sinhala language.
You too are using the typewriter concept.
Andy Daniels has proved this to you.
I quote again
http://www.unicode.org/reports/tr2.html
“There is a standard extant for Sinhala described in A Standard Code for
Information Interchange in Sinhalese by V.K. Samaranayake and S.T. Nandasara
(ISO-IEC JTC1/SCL/WG2 N 673, Oct. 1990). The coding proposed in it was found
to be an inadequate basis for a modern, computer-based interchange code,
though it is adequate to handle the capabilities of a Sinhala typewriter for
representing contemporary colloquial Sinhala. ”
Unquote
You say this is old, but when you read the Sinhala allocations, only a few shifts of location have taken place. You and the VKS group copied this and made SLSI 1134 hurriedly to show the public that SLSI is the same as Unicode.
It should have been the other way round: the SLSI standard had to come first, and then the content of SLSI should have been registered in the international arena.
I was the only person who made representations and said publicly that SLSI 1134 is incorrect and incomplete.
Now all the problems have come up with Sinhala, all because you have not registered the SINHALA AKURU in ISO, Unicode, or SLSI and given them code points.
Donald Gaminitillake
I set the standard
Also, Anuradha, you further confirm that you are talking of the LKLUG font, and even there you have a KSHA location only in the first three versions.
I am talking of a standard common to the SINHALA language. For people to do development, they need to know the proper absolute code points for the whole Sinhala language.
Unconditionally, the present system has serious flaws. To correct it, all characters need UTF values; register them in SLSI 1134 by amending it as soon as possible.
I am not confusing anyone. You are confusing the public by classifying the Sinhala used in computers as Unicode Sinhala, when not all Sinhala characters are represented in it, only a few typewriter-based characters.
Donald Gaminitillake
I set the standard
Mr. Donald,
From all your arguments, a simple statement can be made.
“All letters that can be created by adding modifiers, should also be stored individually (with individual Unicode’s) in UNICODE.”
This is exactly what has been mentioned as 2nd point under Claims in the above main article.
We can clearly understand your law! But the problem here is that we do not see any PRACTICAL issues in using the current Sinhala UNICODE, as all Sinhala characters can already be represented using the existing Unicode system.
Although you keep saying there are problems, security issues, serious flaws, etc. in Sinhala Unicode, you have not given any examples to prove what you say. Instead you either re-post a previous comment with some modifications or post a long comment with some UTF, Unicode, and hex values.
Therefore, I suggest that you:
1. Pick a question you have been asked that hasn’t been answered yet (I guess you won’t find it difficult to find one).
2. In point form, give us the steps to re-create the problem/error that you’re getting.
3. Answer it in simple point form, but don’t forget to include your operating system and version, browser type and version, and the other applications (and their versions) that you use in your environment.
So, when you have free time, can you do this? That way we can easily re-create the error and agree with you happily.
There are known problems in some applications with Sinhala Unicode. The fixes and how-to-avoids have already been given.
When there are ways to avoid problems, there’s no need to repeat history!
Quote
“All letters that can be created by adding modifiers, should also be stored individually (with individual Unicode’s) in UNICODE.”
Unquote
This is a wrong statement, sir.
If the created letter is not stored in the Unicode registration (no UTF value), it will not be displayed correctly without additional software.
This is the issue I am addressing.
Quote from your reply
sha + hal kireema = sh
ra + diga ispilla = rii
which is similar to da + papilla = du (i.e., consonant + modifier = modified consonant).
Unquote
This is the exact input method, but the final result has to be in the Unicode registration, NOT FROM A “KIT” OR PLASTER SOFTWARE.
You all avoid these facts: that to represent Sinhala correctly you need additional software beyond what has been registered with the Unicode Consortium.
Admit this: yes or NO.
If yes, give me the UTF value and the Unicode plane for KSHA.
If no, say that it is not registered in Unicode, and that you get it from additional software made by whom??
Donald Gaminitillake
I set the standard
By the way I use 4 operating systems
Windows Xp professional version 2002 service pack 2
iMac OSX 10.4
iMac on Windows XP
eMac on OSX 10.3
Donald
Quote
can be already represented by using the existing Unicode system.
Unquote
It cannot be represented correctly without additional software.
Donald Gaminitillake
I set the standard
Mr. Donald,
First, you did not answer my last question of giving a practical problem and details of your computer environment where we can reproduce the error.
Instead you ask me a (rather modified old) question and names of 4 operating systems that (you say) you are using.
Answer this (if possible).
Why should we give you the UTF values?
This is not a joke and I expect serious answers.
Quote
Why should we give you the UTF values?
Unquote
I quote from unicode consortium itself
http://www.unicode.org
What is Unicode?
Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.
Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one.
unquote
In Sinhala, one side of the equation is registered with Unicode, but not all the answers are registered with Unicode. Some are inside the kit or plaster software.
Take other languages,
e.g.
Latin script, which falls into several pages of Unicode.
The key-in method differs with the OS, but the final answer is registered with Unicode.
Therefore all characters are represented correctly across any platform as per Unicode,
because all of them have UTF values.
quote from unicode
http://unicode.org/faq/utf_bom.html#14
Q: What is a UTF?
A: A Unicode transformation format (UTF) is an algorithmic mapping from every Unicode code point (except surrogate code points) to a unique byte sequence. The ISO/IEC 10646 standard uses the term “UCS transformation format” for UTF; the two terms are merely synonyms for the same concept.
Q: Which of the UTFs do I need to support?
A: UTF-8 is most common on the web. UTF-16 is used by Java and Windows. UTF-32 is used by various Unix systems. The conversions between all of them are algorithmically based, fast and lossless. This makes it easy to support data input or output in multiple formats, while using a particular UTF for internal storage or processing.
Q: Are there any byte sequences that are not generated by a UTF? How should I interpret them?
A: None of the UTFs can generate every arbitrary byte sequence. For example, in UTF-8 every byte of the form 110xxxxx2 must be followed with a byte of the form 10xxxxxx2. A sequence such as <110xxxxx2 0xxxxxxx2> is illegal, and must never be generated. When faced with this illegal byte sequence while transforming or interpreting, a UTF-8 conformant process must treat the first byte 110xxxxx2 as an illegal termination error: for example, either signaling an error, filtering the byte out, or representing the byte with a marker such as FFFD (REPLACEMENT CHARACTER). In the latter two cases, it will continue processing at the second byte 0xxxxxxx2.
A conformant process must not interpret illegal or ill-formed byte sequences as characters, however, it may take error recovery actions. No conformant process may use irregular byte sequences to encode out-of-band information.
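The two FAQ answers quoted above can be tried out directly. The following is a minimal Python sketch, with illustrative function names, that (a) round-trips the joined ksha sequence through UTF-8, UTF-16, and UTF-32 and (b) feeds an ill-formed UTF-8 byte pair of the kind the FAQ describes (a lead byte followed by a byte that is not a continuation byte) to a strict and to a lenient decoder.

```python
KSHA_JOINED = "\u0D9A\u0DCA\u200D\u0DC2"

# 0xC3 has the form 110xxxxx and must be followed by 10xxxxxx;
# 0x41 ("A") has the form 0xxxxxxx, so this pair is ill-formed.
ILL_FORMED = b"\xC3\x41"


def round_trips(text, codecs=("utf-8", "utf-16", "utf-32")):
    """True when encoding and decoding with each codec restores the text."""
    return all(text.encode(codec).decode(codec) == text for codec in codecs)


def decode_strict(data):
    """Signal an error on ill-formed input (raises UnicodeDecodeError)."""
    return data.decode("utf-8")


def decode_with_marker(data):
    """Replace each ill-formed byte with U+FFFD and continue decoding."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(round_trips(KSHA_JOINED))
    print(repr(decode_with_marker(ILL_FORMED)))
    try:
        decode_strict(ILL_FORMED)
    except UnicodeDecodeError as error:
        print("strict decoder:", error.reason)
```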
————-
I hope readers understood.
Sinhala errors:
all of these are characters registered in Unicode,
but they are unreadable because the additional software is not working.
To represent Sinhala we need additional software beyond the Unicode registration.
This is the problem I am addressing.
Donald Gaminitillake
I set the standard
Quote
Why should we give you the UTF values?
Unquote
I quote from the unicode consortium
What is Unicode?
Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.
Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one.
Unquote
Donald Gaminitillake
I Set the standard
Wow! You did exactly what I thought! Thanks for the brilliant answer. But I found a better one here.
“Unicode is an industry standard allowing computers to consistently represent and manipulate text expressed in most of the world’s writing systems.” – Wikipedia
[http://en.wikipedia.org/wiki/Unicode]
My question was not
Why should we use UTF values?
or
Why are we using Unicode?
I asked “Why should we give YOU the UTF values?”, which you didn’t answer.
Since you are not-so-up-to-standards at answering questions, let me make it simpler so that you can select from MCQs (multiple choice questions). Easy as a piece of cake!
Q: Why should we give you the UTF values?
A 1. IN MY VIEW, every character in Sinhala should have a unique UTF value, no matter what others say. I SET THIS STANDARD.
A 2. Existing Sinhala Unicode does not work well and I don’t want to say how.
A 3. All of the above.
A 4. None of the above, and I want my own system to be implemented as soon as possible.
Quote
Why should we give YOU the UTF values?
Unquote
Because you say the Sinhala that you are using is Sinhala Unicode and that it is registered in Unicode.
If you say the Sinhala that you are using is “ICTA Sinhala”, then you need not give any code point. That becomes a monopoly. I do not ask for the code points for Helawadana or Thibus.
But you are talking of a public-domain Unicode.
Unconditionally, you have to give the code point for ksha in Unicode.
Else you can say it is inside the kit and not in Unicode.
My multiple-choice answer is 1 and 2.
The other option is
my system, or you can have your own system, so that we can use it across all platforms like Latin script, Korean, Japanese, etc.
An interesting article for you to read (in Sinhala):
http://www.lankaenews.com/Sinhala/news.php?id=2526
Donald Gaminitillake
I set the standard
Dear Anuradha and Harshadewa
Can you just copy a few words from the Lanka e News Sinhala page and paste them here for us to see?
Donald Gaminitillake
I set the standard
Donald,
I guess that you didn’t read the Unicode character list I gave you the link to earlier. Characters are registered in Unicode under a code point! UTF-8 is just an encoding; there are many UTF encodings.
So, this time, can you “think” and answer the question “Why should we give you the utf-8 values?” And don’t forget to use your brain…
Sinhala characters (all of them) are registered in Unicode; that’s why they have individual code points! Can’t you understand something as simple as that?
Quoting from the Wikipedia article about UTF-8 (a good guess is that you didn’t read this article when I sent you the link, or that you merely ignored its contents and pretended that you didn’t read it):
UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode.
unquote
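The “variable-length” property mentioned in the quote can be seen by encoding a few characters of different ranges. This is a small Python sketch; the sample set and function name are illustrative.

```python
# One sample character per UTF-8 length; keys are the Unicode names.
SAMPLES = {
    "LATIN SMALL LETTER D": "\u0064",
    "LATIN CAPITAL LETTER AE": "\u00C6",
    "SINHALA LETTER AYANNA": "\u0D85",
    "GRINNING FACE": "\U0001F600",
}


def utf8_length(ch):
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    for name, ch in SAMPLES.items():
        print(f"U+{ord(ch):04X}  {utf8_length(ch)} byte(s)  {name}")
```

The code point stays the same in every case; only the number of bytes used to store it changes with the encoding.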
A user of “good” standards
Just let it go.
quote
so, this time, can you “think” and answer the question “Why should we give you the utf-8 values?”
unquote
“Ksha” is not registered in Unicode.
If you cannot give UTF-8, give UTF-16 or UTF-32.
Else give the location in the Unicode plane.
For the Sinhala letter ayanna, the following are the UTF values:
01 CODE POINT VALUE: : : : : 0D85
02 NAME (UNICODE NAME) : : : SINHALA LETTER AYANNA
03 GENERAL CATEGORY: : : : : Letter, Other
17 UTF-8 HEX VALUE : : : : : 0xE0B685
18 UTF-16 HEX VALUE: : : : : 0x0D85
19 UTF-32 HEX VALUE: : : : : 0x00000D85
20 XHTML : : : : : : : : : : අ
21 BLOCK : : : : : : : : : : Sinhala
22 PLANE : : : : : : : : : : Basic Multilingual Plane (BMP)
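A record like the one above does not have to be copied from a website; its main fields can be regenerated for any single character. The following is a rough Python sketch using the standard `unicodedata` module. The function name and the field labels are illustrative, loosely following the dump format used in this thread.

```python
import unicodedata


def record(ch):
    """Build a small property record for one character."""
    return {
        "CODE POINT": f"{ord(ch):04X}",
        "NAME": unicodedata.name(ch),
        "GENERAL CATEGORY": unicodedata.category(ch),  # e.g. "Lo" = Letter, Other
        "UTF-8": "0x" + ch.encode("utf-8").hex().upper(),
        "UTF-16": "0x" + ch.encode("utf-16-be").hex().upper(),
        "UTF-32": "0x" + ch.encode("utf-32-be").hex().upper(),
    }


if __name__ == "__main__":
    for field, value in record("\u0D85").items():
        print(f"{field:18} {value}")
```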
Why can’t you give these values for “KSHA”?
Then we can check it in the Unicode registry.
Donald Gaminitillake
I set the standard
Question:
“Why cant you give these values to “KSHA”
Then we can check it in the unicode registry”
Answer:
(ක් + ෂ) = ක්ෂ
(k + sha) = ක්ෂ (UNICODE: 0D9A 0DCA 200D 0DC2)
I’m surprised, for the [N]th time, that you cannot understand the similarity between the way the Sinhala language works and the way Sinhala Unicode is implemented, and also that we have to keep copying the same answer to the same question asked in various ways.
Sorry, sir, those codes are NOT KSHA. See
http://www.rotarycolombocentral.org/web-data/Components/Private/ksha.html
You can see it for yourself: it is just the input sequence.
Donald Gaminitillake
1. You say that ක් + ෂ is NOT ක්ෂ. This means that either you don’t know Sinhala or you don’t accept Sinhala. Either way, you have to learn Sinhala first and then reply.
2. You agree that there are original Unicode characters like ක් and ෂ. If ක් and ෂ are acceptable, why can’t ක්ෂ, which is a mixture of these two characters, be acceptable? It is only your LAW that ක්ෂ should also be there in Unicode.
Take the good old calculator example and think twice. You don’t store 10 separately when you can use 1 and 0 to get 10.
ක්ෂ is not a character that came from Mars, but just a mixture of ක් and ෂ.
You say that ක්ෂ sometimes displays awkwardly. Well, give an example, as I have requested many times before. You don’t do it honestly with real examples, because you know that we can prove it works fine.
By the way, data duplication is not a very good thing!
I am talking of KSHA (kayanna bandi shayanna), the joint one.
Kayanna: 0D9A SINHALA LETTER ALPAPRAANA KAYANNA
Shayanna: 0DC2 SINHALA LETTER MUURDHAJA SAYANNA
0DCA is SINHALA SIGN AL-LAKUNA
All of these are an input sequence, that is all.
If you talk of kayanna and shayanna, I have no problem. Both of these have UTF values and a proper location in Unicode.
Not so the alkayanna or the joint “ksha”,
because there is no location for alkayanna in Unicode. IT COMES FROM THE KIT OR THE PLASTER SOFTWARE, the same as the joint “ksha”.
Donald Gaminitillake
I set the standard
Keep the Sinhala font and uninstall the Sinhala kit and the other plaster software.
You will see ක kayanna and SINHALA SIGN AL-LAKUNA separately (this is what is registered with the Unicode Consortium).
For them to join and be seen correctly as ක්ෂ, you need the Sinhala kit and plaster software.
If you copy ක්ෂ and delete it step by step using backspace, it gets deleted in four moves: sha, space, al lakuna, and then ka. This also proves that it is in parts, not one unit. ක් (alka) too has no UTF value in Unicode.
Donald Gaminitillake
I set the standard
quote
sinhala characters are (all of them) registered in unicode.. that’s why they have individual code points!! can’t you understand something simple as that one!!!
unquote
Ranjith, you know nothing!
Only a few are registered in Unicode. This is the issue I am addressing.
Donald Gaminitillake
I set the standard
Are you trying a 1 : 3 formula (intentionally) in replying to comments? 😐
You’ve said the exact same thing that I explained (k + sha = ක්ෂ [UNICODE: 0D9A 0DCA 200D 0DC2]), only in a much longer way.
What I believe is that ක්ෂ is the same as [ක් + ෂ].
So I don’t see any problem in the way Sinhala Unicode is implemented, since the Sinhala language itself represents the same thing.
So far, I haven’t come across a single one of the problems that you say Sinhala Unicode has.
As I have been continuously requesting, we would like to see an example with all the correct details. Then we can re-create the same thing and agree with you happily ever after.
Wouldn’t that be a good idea, instead of always going around the problem?
ක් is not in Unicode, the same as “KSHA”.
The input sequence is not what I am talking about.
You have all the input sequences, but the problem is the output of characters.
Joint “KSHA” is not in the Unicode registration; ක් is not in the Unicode registration.
You see these only through the Sinhala kit or plaster software.
You always avoid the Sinhala kit and plaster software issue.
Without this additional software Sinhala cannot be represented correctly.
Therefore Unicode Sinhala is an incomplete and incorrect solution.
This is only a typewriter concept.
If you do not talk about the kit and plaster software, you are just looping.
Donald Gaminitillake
I set the standard
he he!
What if your so called plaster software is no more in new operating systems?
You mean we need not have to down load the sinhala kit and the plaster software to read sinhala?
Harshadewa you are contradicting what you wrote in No 3
quote
By any chance if you are talking about developing a Sinhala programming language, it can be suggested that only to use limited number of characters stored in Unicode rather than going for one’s with modifiers. Am I right?
Unquote
So I am 100% correct: Unicode does have only a limited number of Sinhala registrations.
Donald Gaminitillake
I set the standard
Enjoy seeing MTV news in the year 2003
http://www.flickr.com/photos/8503406@N05/2489614348/
you need a very good adsl connection
Donald Gaminitillake
I set the standard
I challenge you to answer my question first and I bet you won’t! 😉
I asked “What if your so called plaster software is no more in new operating systems?”
Anyway, I suggest you visit http://groups.google.com/group/Sinhala-Unicode?hl=si and see how to use Sinhala Unicode (in case you hate the Sinhala Kit like hell, you can do without it as well). I guess you won’t try that, as you have Sinhala-Unicode-Phobia. 😀
No Sir, you can confuse neither me nor the readers here by showing quoted text from above. The quoted text was a suggestion of mine, and I can explain why I suggested it. It’s shameless that you make such attempts to prove right what is already proved wrong!
The problem here is, you’re trying to hide from all the questions we ask by asking us to admit a childish fact.
To finish this off, I’d like to say that;
– You have failed to prove that Sinhala Unicode is incomplete.
– You have failed to prove that Sinhala Unicode has practical problems.
– You have failed to answer at least one major question that is asked here.
– You have failed to provide examples with detailed steps on how to re-create the alleged errors.
– You have failed to continue or support a decent debate here. What you did was somewhat similar to Google Bombing and I personally do not like to continue on this further.
Any problem can be solved by discussing. My belief is that when one has no facts to prove his case, he keeps blaming and repeats the same thing, thinking that it’ll work!
But No! Not this time Mr. Donald. Maybe you don’t have enough luck this time or maybe people are more intelligent than you think.
I kindly ask you not to stuff the comment section any further. But you are free to give us the examples with steps and relevant information that we have asked for repeatedly.
A big thank-you to all of the people who contributed their precious time and efforts here!
Harshadewa Ariyasinghe
You too got bowled out, sir, including Anuradha.
Let the public decide who has won.
“What if your so called plaster software is no more in new operating systems?”
This is not an issue at this moment. Today we cannot reproduce all Sinhala characters without the Sinhala kit and the plaster software. We have no e-dictionary, no e-encyclopedia, no e-grammar,
all because we do not have all Sinhala characters with proper UTF values (except for a few).
Sri Lanka has spent over 50 million US$ of World Bank funds to develop this typewriter concept and naturally the stakeholders will have to defend it.
I have copyrights over the list of Sinhala characters for 50 years after my death, sir.
Donald Gaminitillake
I set the standard
Donald,
As I see it, you are someone who is craving money and/or respect over the Sinhala alphabet. But first you have to understand that those who keep wanting them never get them.
No one can copyright the Sinhala alphabet; it is something that was developed over more than a thousand years. If you think that you have the copyright, you are welcome to take it with you to the grave.. it will remain with you, and we won’t miss it, … sir.
Since you keep posting without answering;
You know that the Sinhala Unicode works with ZERO issues in Linux and Windows Operating Systems.
Moreover, the Sinhala Kit software (that you decry as plaster software) is only needed in the current Windows XP operating systems, which will be outdated very soon.
Sinhala Unicode comes (in other words, factory fitted) with Windows Vista OS (and in all future Windows OS’s too) and most of the current Linux OS’s.
Next time,
1. Get yourself a licensed copy of Windows Vista, or Linux OS such as Fedora.
2. See whether Sinhala Unicode works there.
3. If not please report.
At least, take this as a lesson and try not to comment on outdated, already answered, fabricated issues.
Mark my words,
“within 3-5 years time, Sinhala Unicode will be all over the Internet, Public / Private organizations, Operating Systems, Mobiles Devices and in any place you name.”
Ranjith I have done the sinhala characters after Attaragedara rajaguru Bandara (Wadan kavi potha) Dr Senarath Paranavithana and the group did the evolution of sinhala (Parinamaya) NOT the Current Sinhala Akuru.I am the only person who has done and published under ISBN the rules of the country. Therefore I have the copyrights of all individual sinhala akuru relating to the computer.(industrial and commercial acceptability and usage) ((Ranjith can write love letters but he cannot sell these love letter collection and make money – that is wh))This was offered to the SLSI and they refused to accept it. If they had taken it up I would not have these rights.
Harshadewa is boasting about Vista and Linux, but even on Linux you cannot cut and paste a simple Sinhala text to Vista or Windows XP. Text compatibility will never be there unless we have UTF values for all Sinhala characters.
All my comments will be archived in internet so that when some one does research they will know there was one person who had tried to save the SINHALA AKURU.
Donald Gaminitillake
I set the standard
Donald Gaminitillake
It’s not good to do Donald-Bombing about fabricated issues!
And to then say that you tried to save Sinhala Akuru!
Funny! 😀
Donald-Bomb: Intentional commenting in 1:3 ratio with indirect information to fool the public, thinking that only Quantity matters in debates rather than the Quality
“but even on linux you cannot cut and paste a simple sinhala text to Vista or windows XP.”
Examples, Steps and SW Versions please… 😛
Come to a public arena and show how Sinhala SLSI 1134 works with Linux, Windows, Apple and Unix.
Prove the Sinhala TEXT compatibility across all platforms and on Microsoft and Adobe applications, Helawadena, Thibus and some other Sinhala-only applications
Including the word “Rajapaksha” and many other that I would like to ask
You guys would never have the guts to show this in public and in front of Hon President.
Donald Gaminitillake
I Set the Standard
Very interesting comment on fonts in parliament web site
For every browser you need this pack to see Sinhala Unicode. This contradicts the terms of the Unicode Consortium.
Also this proves my comment that Unicode Sinhala cannot be represented correctly without additional plaster software.
You lose again Harshadewa and Anuradha
http://www.fonts.lk/download/SinhalaIE.html
Quote
Sinhala for Internet Explorer 6
This pack enables Sinhala in Microsoft Internet Explorer 6.
It does not work with older versions of Internet Explorer, or with other browsers.
Download the software by selecting the link below, and then run the file with Administrator privileges.
If a page does not display properly in IE6, select:
View -> Encoding -> Unicode (UTF-8)
You can select your default font by going to:
Tools -> Internet Options -> Fonts
selecting “Language Script” as Sinhala, and then clicking on the font you want to use as default.
Once you have installed this pack, you may download additional Unicode Sinhala fonts which will work with any Unicode Sinhala web page.
Download Sinhala for IE 6
——————————————————————————–
If you do not have Internet Explorer 6, you can download it from Microsoft.
ICTA Language Group – 041113
Ohh.. we got scared.. Don’t say that again. We are afraid and scared of public places.. we also have Sociophobia.
But I must say that we do not have Agoraphobia, Neophobia, Technophobia, Ephebiphobia or Autophobia. 😉
(People, I think you deserve a search on Wiki for all these phobias)
Internet is the Ultimate Public Place in the known universe!
You can’t find a better public place than this.
“For every browser you need this pack to see sinhala unicode?”
Oh.. It’s such a bad thing isn’t it? and it says IE6 (specifically) as well. What a waste! We should stop using Sinhala Unicode now… I just installed IE 6 after uninstalling all my latest browsers.
Happy? 😀 😀 😀
You never answered the question of TEXT compatibility across platforms
Since it is not happening I have proved Sinhala unicode is an incompatible and incomplete solution
Donald Gaminitillake
I set the standard
Mr. Donald,
Steps to follow,
1. Read from the top and write down all the unanswered questions on a piece of paper.
2. Also write down all the answers that we have provided on the other side of the paper.
3. Read both sides of the paper and try to understand the questions and answers by repeating several times.
4. If you’re done, start writing answers to the questions left.
You think that newcomers who read this will only read the last comment that’s posted. So by putting something interesting as the last comment here, do you think that it will help your false propaganda?
If you think that’d help your false propaganda and continue in this way, I will restrict your future comments, since I’m concerned about the good conduct of the comments section of this blog.
You are just going round the loop
When there is no SINHALA TEXT compatibility across all operating systems, what are you talking about?
Your solution is an incomplete and incorrect solution, including the LINUX Sinhala.
I always challenge you to come forward for public demonstration. You avoid even talking to me.
Donald Gaminitillake
I set the standard
This is the mail I got from you on May 12, 2008 7:27 PM
Dear Sir
If you have free time pls call me
0777-xxx-xxx
Since you are an IT guy and say a free thinker — I think we can talk
Donald
This is the reply I sent you on May 12, 2008 7:49 PM
Dear Sir,
I think this issue of Sinhala Unicode should be spoken openly.
If I agree with you after calling you, there won’t be any effect on this problem because you can’t repeat it with all the people (hundreds of thousands of people ?) who accept Sinhala Unicode.
I have neither any personal interest in the so-called Unicode problem nor any problem with you.
I believe what I see as truth, and only sound arguments and facts backed by practical examples can change my opinion. Furthermore, I think the kind of explanations needed to break Unicode can be expressed over the web more flexibly.
Therefore, thanks for the interest shown in this subject, but No! I would rather not call you in this regard.
Many thanks,
Harshadewa
I don’t want to continue this discussion with Mr Donald.
Why?
He avoids answering questions asked in point form.
He also avoids the question: “isn’t Sinhala hodiya also incomplete?” 😉
He talks very vague. Here is an example:
“Best example is Sinhala text is not compatible across all platforms. No UTF values”
By “Sinhala text”, he probably means “Unicode sequences representing Sinhala text”.
The comment itself is a lie (if it is true, how can I read email sent by my Windoze friends?), and of course, it is too vague.
And Mr Donald doesn’t seem to know the difference between Unicode code points and encoding schemes such as UTF-8.
Mr Donald says “No UTF values”. What are “UTF values”? There isn’t anything called “UTF values” in this context. You are probably referring to UTF-8 encoding scheme, which has got nothing to do with “values”.
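To make the distinction concrete, here is a small Python sketch. It takes one Sinhala letter (ක, chosen only as an example), prints its code point, and then prints the bytes that two different encoding schemes produce for that same code point. The variable names are made up for the illustration.

```python
KA = "\u0D9A"  # the Sinhala letter ka, picked only as an example

code_point = ord(KA)                  # the number Unicode assigns to the character
utf8_bytes = KA.encode("utf-8")       # one way of storing that number as bytes
utf16_bytes = KA.encode("utf-16-be")  # another way of storing the same number


def as_hex(data):
    """Format a bytes object as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


print(f"code point : U+{code_point:04X}")
print(f"UTF-8      : {as_hex(utf8_bytes)}")
print(f"UTF-16BE   : {as_hex(utf16_bytes)}")
```

The code point stays the same whichever encoding is used; only the stored bytes differ. Decoding either byte string gives back the same character.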
See, you are confused. If you want to fight, know your enemy first. If Sinhala Unicode is your enemy, first take some time to study your enemy. 😉
Now I can predict what Mr Donald is going to do. He will send another shower of comments, so a newcomer may find it hard to find logical arguments amongst the mess.
If you can’t win an argument, confuse. A “draw” by “confusion” is better than losing… 😉 I know it’s Mr Donald’s theory. And writing here is only going to help him achieve such malicious ends.
So, please await a better “reply”!
Mr. Donald,
Thanks for the interest shown in this regard!
You will not be able to post vague comments on this Blog anymore. All comments from you will be waiting in the moderation queue.
Moreover, I’ll be filtering out any comment that is missed by WordPress filters, if any.
If, by any chance, you post comment(s) that answer the questions repeatedly asked of you, I’ll happily accept them for display here.
I strongly think that almost 100 comments are more than enough for one who has even a little bit of upstairs, to understand your problem on Sinhala Unicode.
I do not agree with Donald that we should give up Unicode and adopt his CAT, but at the same time I do not think Unicode in its present state is complete. We are talking about a standard here, not a font set. There are people who do not know the difference.
I have been a part of the ‘Sinhala Unicode’ debate long back, and thought I would keep out of it because both parties are too adamant to admit their own mistakes. At least if one party listens we can think of moving forward.
It is a pity that even seemingly reasonable people like Anuradha compare the Sinhala hodiya with the Unicode chart. There is no one-to-one correlation between the Sinhala hodiya and the Unicode chart. Assuming both to be the same is the fundamental mistake made by Prof. J. B. Disanayake, and it is pathetic that we continue that mistake.
Also, what Anuradha refers to as the ‘hodiya’ is not the correct and complete Sinhala hodiya, but a chart used to teach basic Sinhala. Not all Sinhala letters appear in the hodiya.
Finally, I think moderating Donald is not fair, especially as he represents one party in the discussion. Why shut others’ mouths if you are so sure what you say is correct?
Dharma Gamage,
“I think moderating Donald is not fair, specially he represents one party in discussion.”
The decision to moderate Mr. Donald’s comments, and the reasons that led to it, can be found in the comments above.
Furthermore, I believe it is undoubtedly unfair to block someone who makes sound arguments with a real intention to uplift the standards.
But it’s useless and needlessly bulky to read, store and manage the same comments and ideas in different words. If you read at least 50% of the comments made by Mr. Donald, you might be able to understand this simple fact.
Guys,
Let me tell you one thing. I don’t know Donald personally, but I have had enough encounters with him on the web. He is adamant that he is right, and whatever others say he will do what he likes. He has no sense of where to draw the line.
So, knowing that, either you guys just ignore him (best policy) or, if you want to interact, give him a fair chance to express himself. Otherwise you will never know what he will do.
I see now he is having an interesting discussion at http://bandaragama.wordpress.com/2008/05/08/is-there-anything-wrong-with-sinhala-unicode/#comment-342
At least the discussion here was more decent.
Dear Dharma,
The analogy is the correlation between basic characters and modifiers. Basic characters and modifiers are combined to get more characters. Even in the hodiya, “ku” IS the combination of “ka” and the papilla. (Reference among many: සිංහල හෝඩිය, පැරණි අකුරු කරවන පොත් පෙළ by බළන්ගොඩ ආනන්දමෛත්රෙය තෙර).
Anuradha
Anuradha,
I did NOT say Unicode is wrong. I accept the Unicode approach. (I think Donald is not the only one who doesn’t.) All I said is that the Sinhala Unicode chart is still *incomplete*. The current Unicode chart lacks widely used characters like yansaya and repaya, while it has kept space for the never used ‘ilu’ (0D8F) and ‘iluu’ (0D90). Sannaka ‘ja’ (0DA6) is another unused character.
This is purely because J. B. Disanayake made the stupid mistake of building the Sinhala Unicode based on the hodiya. (There is no ilu or iluu in any of the other South Asian language charts.)
In addition to yansaya and repaya, we can include some of the widely used joint letters too, because there is enough space. That will save so many other issues.
Unfortunately, when somebody suggests even a minor change in Unicode you people become so defensive and start shouting like mad. This is the biggest obstacle to Unicode.
We once saw somebody called Anandawardhana go to the length of suggesting removing yansaya and repaya from the Sinhala language instead of making a simple change.
Now I know your response. You will try to defend JB. That is your problem. None of you ever has the guts to admit a mistake and correct it. You are dead scared of change. So you continue with mistakes.
I see an edited version of the last post I made here has been cut and pasted at http://bandaragama.wordpress.com/2008/05/08/is-there-anything-wrong-with-sinhala-unicode/#comment-375
I do not know who did it, but whoever did it that is an unethical thing to do.
I am here not to personally attack anyone.
Dharma,
[In addition to yansaya and repaya, we call include some of he widely used joint letters too, because there is enough space. That will save so many other issues.]
Can you give an example of an issue?
Anuradha,
Take Piruvana poth vahanse, select any page and reproduce the same (as they appear in the book) here and I will show you the issues.
Thanks Dharma. In fact, we recently started converting old texts into Unicode [only in our free time, so progress is not going to be VERY fast], so it will be a good exercise to figure out issues from a developer’s as well as a user’s point of view.
Please have a look at our first attempt here:
http://ar-si.blogspot.com/2008/05/blog-post_29.html
I found some problems with our font with this exercise. Should be able to fix in the next release.
I like to have this with the old “ddha” in “Namo buddhaya”. Will see how it goes.
Anuradha,
Ok, here are some issues from ‘Magul Lakuna’
10. ශ්වෙතඡත්ර දෙකය (NOT Shevatha should be Shvetha. Sha and Va should be jointed)
19. රක්තෝත්පල දෙකය (should be Rakthothpala. Ka and Tha should be jointed)
21. ශ්වෙතෝත්පල දෙකය
34. දක්ෂිණාවෘත්ත ශ්වෙතශංඛ දෙකය
57. චතුර්මුඛ සවර්ණ නෞකා දෙකය (Should be Svarna, NOt Savarna)
Go on typing. When you reach ‘Sakas Kata’ I will show the other issues.
Dear Dharma,
Thanks for having a look at mangul lakuna.
57 was a typo. Fixed it. Thanks.
If you analyze the rest with a suitable tool, you will notice that there are joiners between the joint characters. Unfortunately, not all systems/tools have suitable fonts to support them.
Even with our LKLUG font, ක්ෂ in 34 is shown properly as a bandi akura, but not ක්ත or ත්ප.
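If no dedicated tool is at hand, a few lines of Python can do the same analysis. This is only a sketch: the function name `joint_positions` is made up here, and the two sample strings are written with explicit escapes (sha, al-lakuna, optional zero width joiner, va, kombuva) so that the joiner cannot silently get lost in copying.

```python
ZWJ = "\u200D"        # ZERO WIDTH JOINER
AL_LAKUNA = "\u0DCA"  # SINHALA SIGN AL-LAKUNA


def joint_positions(text):
    """Return the indexes where an al-lakuna is immediately followed by a ZWJ."""
    return [
        i
        for i in range(len(text) - 1)
        if text[i] == AL_LAKUNA and text[i + 1] == ZWJ
    ]


with_joiner = "\u0DC1\u0DCA\u200D\u0DC0\u0DD9"  # joiner present
without_joiner = "\u0DC1\u0DCA\u0DC0\u0DD9"     # same letters, joiner missing

print(joint_positions(with_joiner))     # positions where a joint form was requested
print(joint_positions(without_joiner))  # empty list: nothing asks for a joint form
```

If the list comes back non-empty but the letters still show up separately on screen, the text itself carries the joiner and the gap is in the font or the rendering system.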
Let me show a good analogy. The CSS standard tells how a browser should display web pages. But not all browsers support all of the CSS standard properly.
There are some good tests to test the browsers for standard. These “acid tests” show a reference image, and how a browser should display it with CSS. There are in fact three acid tests now, in increasing degree of complexity.
http://www.acidtests.org/
Old browsers fail even the first test, but the new ones seem to perform better. To get old browsers working with CSS, we need various tricks.
What we need for Sinhala Unicode are some similar tests for fonts/systems. I like to see these levels (we can refine them later):
Level 0: basic vowels and consonants: අ, ක etc
Level 1: consonants modified with a simple single modifier: කැ, ක් etc
Level 2: consonants modified with a “kombuwa” modifier: කෙ, කෞ etc
Level 3: Mandatory joint letters and modified forms: ක්ර, ක්ය, ක්රි, ක්රෝ
Level 4: Used joint letters and forms: ක්ෂ, න්ද, ර්ම, ක්ෂෝ, ශ්වෙ
Level 5: All possible joint letters and forms
What we need is a set of web pages with two columns. First column is the Unicode test, second column is an image that shows how it should be rendered.
If an operating system / font can match all characters in the first column with the second column up to a particular level, we should say it’s “Level x” compliant.
A font/system with Level 4 compliance is what we need.
We have to refine Level 4 whenever we find a new joint form by moving it from Level 5.
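The levels and the two-column pages described above could be sketched in Python roughly like this. Everything here is an assumption made for illustration: the sample lists are just a couple of entries per level, the `ref/...png` naming scheme for the reference images is hypothetical, and the Level 3 and 4 samples are written as consonant + al-lakuna + zero width joiner + consonant, following the ksha sequence quoted earlier in this thread.

```python
import html

# A few sample sequences per level, written as escapes so joiners cannot be lost.
LEVELS = {
    0: ["\u0D85", "\u0D9A"],                              # basic vowel and consonant
    1: ["\u0D9A\u0DD0", "\u0D9A\u0DCA"],                  # single simple modifier
    2: ["\u0D9A\u0DD9", "\u0D9A\u0DDE"],                  # kombuwa modifiers
    3: ["\u0D9A\u0DCA\u200D\u0DBB", "\u0D9A\u0DCA\u200D\u0DBA"],  # ka + ra, ka + ya
    4: ["\u0D9A\u0DCA\u200D\u0DC2", "\u0DB1\u0DCA\u200D\u0DAF"],  # ksha, nda
}


def reference_image(sample):
    """Hypothetical naming scheme: one PNG per sample, named by its code points."""
    return "ref/" + "-".join(f"{ord(ch):04X}" for ch in sample) + ".png"


def html_row(sample):
    """One table row: live Unicode text on the left, reference image on the right."""
    return (
        f"<tr><td>{html.escape(sample)}</td>"
        f'<td><img src="{reference_image(sample)}" alt="reference rendering"></td></tr>'
    )


def build_page(level):
    """Build the two-column table for one level."""
    rows = "\n".join(html_row(sample) for sample in LEVELS[level])
    return f'<table lang="si">\n{rows}\n</table>'


def compliance_level(results):
    """Return the highest level x such that levels 0..x all passed, else None.

    `results` maps a level number to True when every sample at that level
    matched its reference image on the system under test.
    """
    level = 0
    while results.get(level, False):
        level += 1
    return level - 1 if level > 0 else None


print(build_page(4))
print(compliance_level({0: True, 1: True, 2: True, 3: False, 4: True}))
```

A system that passes a higher level but fails a lower one is still rated by the lower one, which matches the "up to a particular level" wording above.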
This discussion is now getting very productive. Thanks!
Anuradha
Why make things complicated when simple straightforward solutions are available?
The number of practically used ‘joint letters’ in Sinhala/Pali/Sanskrit is less than 20. (Believe me, I have counted.) There are 49 empty spaces in the Unicode chart. Why not add them? (As the Indians have done.)
Tell me, why are you people so reluctant to change the Unicode chart? I do not think any standard should be static. It has to be dynamic. Periodic revisions are essential in any standard.
Even the constitution is amended when needed. Why not the Unicode chart?
Is it because every one of you is dead scared of the ‘reputation’ of the two ‘luminaries’ JBD and VKS?
As someone said do you think the avatar of VKS will squeeze your neck at night if you change Sinhala Unicode chart? 🙂
Dharma,
Buddha decided to change to the middle path only after trying out all the other systems that existed in India at the time. I want to try the present system as it is, as a user and a developer, before proposing amendments.
As you may have already noticed, we have already started entering old text to check problems.
I did come across problems, the ones you correctly pointed out.
Although the text was entered according to SLS1134/Unicode, fonts and systems were incapable of displaying all the characters as expected.
As a means of addressing the issue, I have created an “ACID Test” equivalent for Sinhala Unicode.
http://www.sayura.net/anuradha/sinhala/unicode/test/
It would be great if you can also contribute by telling me any missing joint letters when I get to class 4.
Right now, I have done only class 1 and 2. When we get to class 3 and 4, we will get to the fine points of joint letters. If I can’t get my system to pass all four classes, THEN I will start a campaign to amend the standard.
Anuradha
This is a tip for Mr Donald: you could use Image Capture to take a screenshot of your Parallels screen running on a Mac.
http://www.apple.com/pro/tips/secretcapture.html
I’m a student. I want to write a web page in Sinhala (letters). How do I write the tags, or is there some other system?
Please Reply
Dear All,
Now you can download all Sinhala fonts and Sinhala font related software free of charge. This site is very useful for Sri Lankans living around the world.
Visit –
http://www.videshasewa.com/home/free_sinhala_fonts.html
or
http://www.videshasewa.com
Thanks..
I have installed Sinhala so many times on my Windows 7 but no luck. I can type Sinhala Unicode characters perfectly in Microsoft Word. The only problem is that I cannot see Sinhala Unicode characters in any of the browsers, on sites like Facebook and on Sinhala online newspapers and blogs too.
What may be the reason?
email: sensaconcept@yahoo.com
When I copy Unicode text from a web page into PowerPoint, the letters show up separately. How do I fix that? The al-pili and pa-pili end up scattered here and there.
If I copy the “ශ්රී ලංකාව” text and paste it to Microsoft Word (I have Microsoft® Word MSO (16.0.7571.7063) 32 bit on Windows 10 Home), it won’t look right. The zero width joiner has vanished if I copy that back to this comment: ශ්රී ලංකාව. This is a disgusting feature of Microsoft Word. In Excel the ligatures work fine.
The only way I’ve found to get a ligature like ශ්රී to Word is by making a plain text file and opening it with Microsoft Word, and then copying and pasting that. Resizing the font of the document will be impossible using the buttons that normally make the font bigger or smaller. That will only affect parts of the document that have never had the zero width joiner, not even all of that. I have to resize by defining the new font size as a number.
If I copy a ligature to an open plain text file in Microsoft Word, save it and reopen the file with Word, the ligature will look correct even though before saving and reopening the file the ligature won’t be shown.
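One way to check whether a round trip through an application really dropped the joiner is to compare the text before and after. The Python sketch below is only an illustration: the helper names are made up, the sample is the "shri" ligature written with explicit escapes (sha, al-lakuna, zero width joiner, ra, diga is-pilla), and the loss is simulated with `replace` rather than by actually driving Word.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

# sha + al-lakuna + ZWJ + ra + diga is-pilla
SRI = "\u0DC1\u0DCA\u200D\u0DBB\u0DD3"


def lost_joiners(original, pasted):
    """Number of zero width joiners present in `original` but missing in `pasted`."""
    return original.count(ZWJ) - pasted.count(ZWJ)


def same_apart_from_joiners(original, pasted):
    """True when the two strings differ only in their zero width joiners."""
    return original.replace(ZWJ, "") == pasted.replace(ZWJ, "")


# Simulate the behaviour described above: the text comes back without its joiner.
pasted = SRI.replace(ZWJ, "")

print(lost_joiners(SRI, pasted))
print(same_apart_from_joiners(SRI, pasted))
```

In real use, `pasted` would be the text copied back out of the application, for example read from a plain text file saved as UTF-8.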
I have a problem typing bandi akuru when I type Pali text. As you all know, the ‘hal lakuna’ is not used when writing Pali in Sinhala.
සත්ථා is written avoiding the ‘hal lakuna’ and making the last two letters ‘bandi akuru’.
I am not talking about complex characters like සත්ථා .
Can’t we include ‘dakaranshaya’ in Sinhala Unicode? In Sinhala, බෞද්ධ was written avoiding ‘ද්’ and adding dakaranshaya to ධ.
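Following the pattern of the ksha sequence quoted earlier in this thread (consonant, al-lakuna, zero width joiner, consonant), the joint forms asked about here could at least be requested in the text as sketched below. This is an assumption carried over from that one example, not a statement that any given font draws these two pairs as bandi akuru; the helper name `join` is made up.

```python
ZWJ = "\u200D"        # ZERO WIDTH JOINER
AL_LAKUNA = "\u0DCA"  # SINHALA SIGN AL-LAKUNA


def join(first, second):
    """Request a joint form of two consonants: first + al-lakuna + ZWJ + second."""
    return first + AL_LAKUNA + ZWJ + second


DA, DHA = "\u0DAF", "\u0DB0"  # the two letters in the middle of "bauddha"
TA, THA = "\u0DAD", "\u0DAE"  # the two letters in the middle of "sattha"

ddha = join(DA, DHA)
ttha = join(TA, THA)

for label, seq in (("d + dha", ddha), ("t + tha", ttha)):
    print(label, " ".join(f"{ord(ch):04X}" for ch in seq))
```

Whether the result appears joined, or falls back to a visible hal lakuna, then depends on the font and system in use, which is the same situation described for ක්ත and ත්ප above.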