Bryan in Isaan
Posted September 20, 2005

Quote: "3. Learn how to use the Thai dictionary... alphabetical order in Thai dictionaries is a bit more difficult than how it's done in English..."

Just a little snippet from another thread. I have been making myself Thai word lists for self-study in Excel. When I sort them, Excel puts all the words beginning with แ เ ไ and โ at the head of the list, rather than sorting according to the consonants as in a dictionary. The alphabetical sort is just not smart enough (or maybe it's the operator). Has anyone else had this problem and solved it? I have an early version of Office XP on WinMe.

Thanks,
Bryan

Mr. Farang
Posted September 20, 2005 (edited)

Quote (Bryan in Isaan): "When I sort them it puts all the แ เ ไ and โ at the head of the list, rather than sorting according to the consonants as in a dictionary."

Dear Bryan,

I am not sure, but I am guessing that the order is based on the Unicode numbers, and I venture to guess, based on your (interesting) post, that the Thai vowels are sorted (in MS Excel) in Unicode as they are written, with the consonant first and the vowels second.

One technical problem is that the sort routine would have to work roughly like this: check the first character; if it is a vowel, check the next character and sort on the first consonant; if the consonant is หอ หีบ, then ... if and then ... and so on.

Good post, BTW. Technically interesting.

I would also be interested to review your idea of a good sort algorithm. Would you like to virtually develop one (or two) sort algorithms here, in collaboration, to (perhaps) submit to Microsoft?

Yours sincerely,
Mr. Farang

Richard W
Posted September 20, 2005

Quote: "3. Learn how to use the Thai dictionary... alphabetical order in Thai dictionaries is a bit more difficult than how it's done in English..."

Actually, Thai alphabetical order is easier to remember than English alphabetical order. The only complications are (1) remembering to swap each preposed vowel with the consonant that follows it and (2) comparing the tone marks and similar marks only if you get a tie on the consonants and basic vowels, working left to right, not right to left as in French.

Quote (Mr. Farang): "I am guessing that the order is based on the unicode numbers and I venture to guess [...] that the Thai vowels are sorted (in MS EXCEL) in unicode as they are written, with the consonant first and the vowels second."

Bizarrely enough, that does not explain it. The consonants precede the vowels in numerical code order, both in TIS-620 and in Unicode, which for Thai is just a shift of the TIS-620 encoding. Does Excel know the data is Thai, or does it just appear as Thai because of the font you have selected? Are the consonants sorted properly? It's conceivable that Excel thinks you have Latin-1 data.

Quote (Mr. Farang): "Would you like to virtually develop one (or two) sort algorithms here, in collaboration, to (perhaps) submit to Microsoft?"

The Unicode Consortium has already implicitly proposed a sort algorithm: the Unicode Collation Algorithm. Microsoft's response has been along the lines of 'Not invented here'.
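
The two rules Richard W describes can be sketched in a few lines. The example below uses Python rather than Excel VBA purely for illustration. The names `thai_sort_key`, `PREPOSED_VOWELS` and `TIE_BREAK_MARKS` are hypothetical, and the code is a simplified sketch of the dictionary rule, not the Unicode Collation Algorithm: it swaps each preposed vowel with the character after it, compares the result with the tone-type marks removed, and only then uses the marks to break ties. It also shows that a plain code-point sort puts the vowel-initial words last, not first, which is why code order alone does not explain what Bryan saw.

```python
# Simplified sketch of Thai dictionary ordering (hypothetical helper names).
PREPOSED_VOWELS = set("\u0e40\u0e41\u0e42\u0e43\u0e44")  # sara e, ae, o, ai maimuan, ai maimalai
# Marks used only as tie-breakers: maitaikhu, the four tone marks, thanthakhat.
TIE_BREAK_MARKS = set("\u0e47\u0e48\u0e49\u0e4a\u0e4b\u0e4c")

def thai_sort_key(word):
    chars = list(word)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in PREPOSED_VOWELS:
            # Rule 1: swap the preposed vowel with the following character.
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2
        else:
            i += 1
    reordered = "".join(chars)
    # Rule 2: marks are ignored at the first level and only break ties.
    primary = "".join(c for c in chars if c not in TIE_BREAK_MARKS)
    return (primary, reordered)

words = ["เวลา", "แท้", "กำหนด", "ระเบียบ", "ระดับ", "ประโยค", "ประกอบ"]
code_point_order = sorted(words)                     # raw Unicode code order
dictionary_order = sorted(words, key=thai_sort_key)  # sketch of dictionary order
print(code_point_order)
print(dictionary_order)
```

The key is a tuple so that the second element is consulted only when two words agree on consonants and vowels.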

Mr. Farang
Posted September 20, 2005

Quote (Richard W): "Does Excel know the data is Thai, or does it just appear as Thai because of the font you have selected? Are the consonants sorted properly? It's conceivable that Excel thinks you have Latin-1 data."

Dear Richard,

Those are good questions.

Dear Bryan,

Can you post an example of how Excel sorts Thai now (a screenshot, maybe)? Thanks.

Quote (Richard W): "The Unicode Consortium has already implicitly proposed a sort algorithm - the Unicode Collation Algorithm."

Dear Richard,

I searched the Unicode link above and could not find any reference to an algorithm to sort Thai (I also did not find the word "Thai" in the document).

The algorithm I was thinking about would be more specific to Thai, but implemented on top of Unicode. Unicode would (or could) be the basis, but the actual sort algorithm would be based on Thai grammar. In fact, there might be different ways to sort, based on different "views" of the grammar.

Another area to explore, though I have not thought about it yet, is user macros versus Excel's built-in sorts.

Yours sincerely,
Mr. Farang

Bryan in Isaan
Posted September 21, 2005

Good, well-informed answers. I will pore over them for a while.

It does appear that the sort routine sorts on the first character, then the second and so on, putting the vowels first for some reason. This results in several errors in the following list. I have been fixing these manually, but with a large list, say 1000 words, that could be time-consuming.

เครื่องหมาย
เวลา
แท้
ใจความ
กริยานุเคราะห์
กำหนด
ขยาย
ข้าราชการ
ครบบริบูรณ์
ดังนี้
ตัวอย่าง
ทั่วไป
ประโยค
ประกอบ
ประธาน
ระเบียบ
ระดับ
ราชาศัพท์
วิกรรตถกริยา
สมบูรณ์
สมุหนาม
สุภาษิต

I will look into this, and would be happy to share anything I come up with or to work with someone, though I am not any kind of computer programmer. Maybe some kind of macro could be written.

Thank you all,
Bryan

Richard W
Posted September 21, 2005

Quote (Mr. Farang): "I searched the Unicode link above and could not find any reference to an algorithm to sort Thai (also did not find the word "Thai" in the document)."

Section 3.1.3, line 1, word 12. The Default Unicode Collation Element Table (DUCET) (1.1 Mbyte) contains the data (numerical 'weights') defining how Thai is sorted, both in comparison with Thai and in comparison with other scripts.

Quote (Mr. Farang): "Unicode would (or could) be the basis, but the actual sort algorithm would be based on Thai grammer. In fact, there might be a different ways to sort, based on different "views" of the grammer."

As far as words are concerned, the default Unicode collation order for non-compound words seems to accord with Thai dictionaries unless the words have the same spelling. Dictionaries list compound words under one of their elements, so the order naturally differs, as with English dictionaries. The Unicode collation order is helpless when words are spelt the same but pronounced differently, and indeed dictionaries differ on the ordering of แหน and แหน.

The Unicode collation order is supposed to produce a reasonable global sort order for use in all languages and to sort text in mixtures of scripts. Imagine a directory listing with file names in a mixture of scripts! One may in principle customise the sort according to one's language (or even idiosyncratic preferences), but there is no reason to sort words in the Thai script in anything but the order appropriate for the Thai language. Other languages in the Thai script (even Pali) are not important enough to override the Thai rules.

The Thai deviations from this global order are documented in file th.xml in the core localisation data (1.0 Mbytes) from Unicode; unfortunately I can't find the file loose. The only deviation for real words that I can see is in the placement of yamakkan. I may be mistaken: what difference can it make to sorting whether ฤๅ is treated as one letter or two?

I deliberately made no claims about the sorting of Thai punctuation and symbols; global consistency would override any expectation of seeing Thai sorted in Thai order.

I concede that it is surprising (but generally convenient) that Thai dictionaries ignore consonant clustering in their sort order.

Edward B
Posted September 21, 2005

Quote (Richard W): "...I concede that it is surprising (but generally convenient) that Thai dictionaries ignore consonant clustering in their sort order."

Sorry Richard, but can you explain what you mean by this, perhaps with a couple of examples?

Mr. Farang
Posted September 21, 2005

Quote (Bryan in Isaan): "It does appear that the sort routine sorts on the first character, then the second and so on; putting the vowels first for some reason."

Dear Bryan,

I kindly suggest that one of the next steps in your analysis is to map the Unicode encoding for the Thai language to your list, to see if a pattern emerges.

Yours sincerely,
Mr. Farang

wolf5370
Posted September 21, 2005

Cheap and nasty way to sort only on the Thai consonants in a column (in this case column A). I knocked it up in 30 minutes, so it's not perfect, and it only sorts a single column, which must be A. It can be changed to accommodate multiple columns and/or prompt for the column, etc. It still uses Excel's sort (which is much faster than a bubble or ripple sort written in VBA!), but it ignores the vowels.

Anyway, cut and paste this into a VBA macro (XL 98+, though I used 2002, so it may need modifying for earlier versions):

    Option Explicit

    Sub ThaiSort_Click()
        Dim lngLoop As Long, lngStartRow As Long
        Dim lngHeader As Long   ' xlYes or xlNo, passed straight to Sort

        If MsgBox("Header Row?", vbYesNo + vbQuestion, "Thai Sort") = vbYes Then
            lngStartRow = 2
            lngHeader = xlYes
        Else
            lngStartRow = 1
            lngHeader = xlNo
        End If

        'Create a new sheet temporarily
        Dim tempWS As Worksheet
        Dim tempWSName As String, curWSName As String
        Dim intMissCount As Integer

        curWSName = ActiveSheet.Name
        'Format() avoids locale-dependent separators ending up in the sheet name
        tempWSName = "Wolf5370" & Format(Now, "hhnnss")
        Set tempWS = Worksheets.Add
        tempWS.Name = tempWSName

        'Copy over the column to be sorted (Col A) using C&P
        Sheets(curWSName).Select
        Columns("A:A").Select
        Selection.Copy
        Sheets(tempWSName).Select
        Columns("A:A").Select
        ActiveSheet.Paste

        'Remove all vowels and other marks and dump into column B on the temp sheet
        'Stop once more than 100 consecutive empty cells have been seen
        intMissCount = 0
        For lngLoop = lngStartRow To Rows.Count
            If Range("A" & lngLoop).Value = "" Then
                intMissCount = intMissCount + 1
            Else
                intMissCount = 0
            End If
            If intMissCount > 100 Then Exit For
            Range("B" & lngLoop).Value = ConstsOnly(Range("A" & lngLoop).Value)
        Next lngLoop

        'Now sort it by the de-vowelled column, using the original word as second key
        Columns("A:B").Select
        Selection.Sort Key1:=Range("B" & lngStartRow), Order1:=xlAscending, _
            Key2:=Range("A" & lngStartRow), Order2:=xlAscending, _
            Header:=lngHeader, OrderCustom:=1, MatchCase:=False, _
            Orientation:=xlTopToBottom, _
            DataOption1:=xlSortNormal, DataOption2:=xlSortNormal

        'Now copy it back
        Columns("A:A").Select
        Selection.Copy
        Sheets(curWSName).Select
        Columns("A:A").Select
        ActiveSheet.Paste

        'Now kill the temp sheet
        Application.DisplayAlerts = False
        Sheets(tempWSName).Delete
        Application.DisplayAlerts = True
    End Sub

    Function ConstsOnly(ByVal strWord As String) As String
        'Routine to remove Thai vowels
        Dim intInnerLoop As Integer
        Dim strTemp As String, strChar As String
        Dim lngUniCode As Long

        If Len(strWord) = 0 Then
            ConstsOnly = ""
            Exit Function
        End If

        strTemp = ""
        For intInnerLoop = 1 To Len(strWord)
            strChar = Mid(strWord, intInnerLoop, 1)
            'AscW returns a signed Integer; mask it so codes above 32767 stay positive
            lngUniCode = AscW(strChar) And &HFFFF&
            Select Case lngUniCode
                Case 161 To 206: strTemp = strTemp & strChar    'Thai consonants at their 8-bit (TIS-620) values
                Case 3585 To 3630: strTemp = strTemp & strChar  'Thai consonants in Unicode (U+0E01 to U+0E2E)
                Case 3663 To 3673: strTemp = strTemp & strChar  'Thai digits in Unicode (range also takes in U+0E4F)
                Case 63247: strTemp = strTemp & strChar         'Another Thai consonant
                Case 63232: strTemp = strTemp & strChar         'Another Thai consonant
            End Select
        Next intInnerLoop

        ConstsOnly = strTemp
    End Function
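
For readers who want to see what the de-vowelling step does without opening Excel, here is a minimal sketch in Python (used only for illustration; `consonants_only` is a hypothetical stand-in for the macro's `ConstsOnly` and keeps just the Unicode consonant range U+0E01 to U+0E2E). It also shows the limitation of the approach: words that differ only in their vowels get the same key, so their relative order falls back on the second sort key, which in the macro is Excel's own ordering of the original word.

```python
def consonants_only(word):
    """Keep only Thai consonants (U+0E01 KO KAI to U+0E2E HO NOKHUK)."""
    return "".join(ch for ch in word if "\u0e01" <= ch <= "\u0e2e")

words = ["ระเบียบ", "ระดับ", "ประโยค", "ประกอบ"]

# Sort on the consonant skeleton first, then on the original spelling.
by_consonants = sorted(words, key=lambda w: (consonants_only(w), w))
for w in by_consonants:
    print(w, consonants_only(w))

# Limitation: different vowels, identical key.
print(consonants_only("\u0e01\u0e32") == consonants_only("\u0e01\u0e35"))
```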

katana
Posted September 21, 2005

Quote (Bryan in Isaan): "Has anyone else had this problem and solved it? I have an early version of Office XP on WinMe."

Hi,

Are you using a Thai or English version of Windows ME? I found that some programs, e.g. Pirch, would only sort Thai in alphabetical order under Thai Windows.

Secondly, I seem to remember that getting Office XP to display Thai properly on English Windows ME was not straightforward and involved hacking the registry, installing extra files, etc.

Bryan in Isaan
Posted September 21, 2005

Quote (Mr. Farang): "I kindly suggest that one of the next steps in your analysis is to map the Unicode encoding for the Thai language to your list, to see if a pattern emerges."

I agree that would be a good place to start. I will do some googling to figure out how to do it. So far all I have found is an Excel function "code()", which returns something like an ASCII number for each character, but it's always the same, 63, for all the Thai characters. I will need to find a way to extract the real code number. As I recall from my little bit of programming experience in Fortran (or was it Pascal) in college, there were some "string" functions which would get it. Maybe then there would be a way to sort, or do a customized sort, on those numbers.

As an aside, I also used a program which typed Thai in an old ASCII font, and I sorted that. Unlike with Unicode, the consonants came before the vowels, and I could get the ASCII code for all the characters.

Thanks,
Bryan

Mr. Farang
Posted September 21, 2005

Quote (wolf5370): "Cheap and nasty way to sort only on Thai Consts in column (in this case column A)..."

Dear Khun Wolf,

Quite nice and very instructive as well. Thank you for this contribution!

Yours sincerely,
Mr. Farang

Bryan in Isaan
Posted September 21, 2005

Quote (katana): "Are you using a Thai or English version of Windows ME?"

I am using English WinMe and English Office XP. It has been generally well behaved and was easy to set up, but it is missing a few functions; for example, it won't read Thai filenames, and there is this sorting problem.

Quote (wolf5370): "Cheap and nasty way to sort only on Thai Consts in column (in this case column A)..."

This will be quite a learning experience for me as I try using this. It probably answers the questions raised in my previous post from a few minutes ago. Thank you for spending the time and expertise in writing the code. And I thank everyone else for all the other good answers.

Bryan

wolf5370
Posted September 21, 2005

Quote (Bryan in Isaan): "So far all I have found is an Excel function "code()", which returns someting like an ascii number for each character, but its always the same, 63, for all the Thai chars"

Yeah, it's better to address the Thai characters by their Unicode number. The way to do this is using AscW (W stands for Wide, I think; ASC for American Standard Code, as in ASCII). You can go the other way using ChrW with the Unicode value.

The Character Map (Start->Programs->Accessories->Character Map) will show the ASCII and Unicode values for any font set: select a Thai font, and clicking any character will give its values.
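
As a quick illustration of the same idea outside VBA, the Python built-ins `ord` and `chr` play the roles of AscW and ChrW. The sketch below also suggests a likely explanation for the constant 63 Bryan saw: 63 is the code of "?", which is what a character becomes when it cannot be represented in an 8-bit Western code page. That this is what Excel's CODE() does internally is an assumption, not something confirmed in this thread.

```python
ch = "\u0e01"  # THAI CHARACTER KO KAI

# Unicode value of a character, and back again (like AscW / ChrW).
code = ord(ch)
print(code, hex(code))          # 3585 0xe01
print(chr(0x0E01) == ch)        # True

# The same letter in the Thai 8-bit code page (Windows-874).
thai_byte = ch.encode("cp874")[0]
print(thai_byte)                # 161

# In a Western 8-bit code page the letter does not exist and degrades to "?".
fallback = ch.encode("cp1252", errors="replace")[0]
print(fallback, chr(fallback))  # 63 ?
```

The value 161 is the start of the "161 To 206" range in the macro above, which corresponds to the consonants in the 8-bit Thai encoding.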

Richard W
Posted September 21, 2005

Quote (Bryan in Isaan): "When I sort them it puts all the แ เ ไ and โ at the head of the list, rather than sorting according to the consonants as in a dictionary. [...] I have an early version of Office XP on WinMe."

Well, now you've asked, I have the 'problem' myself. I'm using Excel 2002 (Thai edition? The menus are all in Thai) under Windows XP Home Edition SP2. I write 'problem' because I use Word for tables of text, and Word 2002 sorts the posted list correctly.

I'm not sure if Windows XP installations have a language; if they do, mine is English. The only thing that could conceivably make a difference is that the default codepage is Windows-874 (i.e. Thai). I had to switch it back because otherwise the command interpreter would not handle Thai filenames.

So, unless you are using Excel as a spreadsheet (or chess-playing program, or whatever), one possible solution is to copy your table to Word, sort it there, and copy it back to Excel.

Mr. Farang
Posted September 21, 2005

Quote (wolf5370): "Yeah, its better addressing the Thai Characters by their Unicode number. The way to do this is using AscW [...] You can go the other way using ChrW with the Unicode value."

For starters, I kindly suggest someone create an Excel spreadsheet with the Thai alphabet in one column and the Unicode numeric identifiers in the next column, and post the XLS to this thread. We can use this file for "configuration control", so we can "sing" off the same "sheet of music", so to speak.

If we can't post an XLS in TV, I suggest that we rename the file FILENAME.XLS.PDF; then someone can upload it and we can download it and change the name back to FILENAME.XLS.

Anyone have time to do this? Volunteers?

Richard W
Posted September 21, 2005

Quote (wolf5370): "The Character Map (Start->Programs->Accessories->Character Map) will show the Ascii and Unicode characters for any font set - select a Thai font and clicking any character will give its values."

And Unicode publishes charts by script, partly broken up by code range.

There are some curious gaps in what the Character Map will show. It hasn't yet been updated to display Tamil letter sha (U+0BB6) (the equivalent of ศ), which was admitted this year. I'm not sure if any versions of the Windows renderer (Uniscribe) yet treat it as a Tamil letter.

Richard W
Posted September 21, 2005 (edited)

Quote (Mr. Farang): "For starters, I kindly suggest someone create an Excel spreadsheet with the Thai alphabet in one column and the Unicode numeric identifers in the next column and post the XLS to this thread..."

Why? The Thai code chart is readily available from Unicode, and the basic data for Thai Unicode characters (from UnicodeData.txt) follows:

0E01;THAI CHARACTER KO KAI;Lo;0;L;;;;;N;THAI LETTER KO KAI;;;;
0E02;THAI CHARACTER KHO KHAI;Lo;0;L;;;;;N;THAI LETTER KHO KHAI;;;;
0E03;THAI CHARACTER KHO KHUAT;Lo;0;L;;;;;N;THAI LETTER KHO KHUAT;;;;
0E04;THAI CHARACTER KHO KHWAI;Lo;0;L;;;;;N;THAI LETTER KHO KHWAI;;;;
0E05;THAI CHARACTER KHO KHON;Lo;0;L;;;;;N;THAI LETTER KHO KHON;;;;
0E06;THAI CHARACTER KHO RAKHANG;Lo;0;L;;;;;N;THAI LETTER KHO RAKHANG;;;;
0E07;THAI CHARACTER NGO NGU;Lo;0;L;;;;;N;THAI LETTER NGO NGU;;;;
0E08;THAI CHARACTER CHO CHAN;Lo;0;L;;;;;N;THAI LETTER CHO CHAN;;;;
0E09;THAI CHARACTER CHO CHING;Lo;0;L;;;;;N;THAI LETTER CHO CHING;;;;
0E0A;THAI CHARACTER CHO CHANG;Lo;0;L;;;;;N;THAI LETTER CHO CHANG;;;;
0E0B;THAI CHARACTER SO SO;Lo;0;L;;;;;N;THAI LETTER SO SO;;;;
0E0C;THAI CHARACTER CHO CHOE;Lo;0;L;;;;;N;THAI LETTER CHO CHOE;;;;
0E0D;THAI CHARACTER YO YING;Lo;0;L;;;;;N;THAI LETTER YO YING;;;;
0E0E;THAI CHARACTER DO CHADA;Lo;0;L;;;;;N;THAI LETTER DO CHADA;;;;
0E0F;THAI CHARACTER TO PATAK;Lo;0;L;;;;;N;THAI LETTER TO PATAK;;;;
0E10;THAI CHARACTER THO THAN;Lo;0;L;;;;;N;THAI LETTER THO THAN;;;;
0E11;THAI CHARACTER THO NANGMONTHO;Lo;0;L;;;;;N;THAI LETTER THO NANGMONTHO;;;;
0E12;THAI CHARACTER THO PHUTHAO;Lo;0;L;;;;;N;THAI LETTER THO PHUTHAO;;;;
0E13;THAI CHARACTER NO NEN;Lo;0;L;;;;;N;THAI LETTER NO NEN;;;;
0E14;THAI CHARACTER DO DEK;Lo;0;L;;;;;N;THAI LETTER DO DEK;;;;
0E15;THAI CHARACTER TO TAO;Lo;0;L;;;;;N;THAI LETTER TO TAO;;;;
0E16;THAI CHARACTER THO THUNG;Lo;0;L;;;;;N;THAI LETTER THO THUNG;;;;
0E17;THAI CHARACTER THO THAHAN;Lo;0;L;;;;;N;THAI LETTER THO THAHAN;;;;
0E18;THAI CHARACTER THO THONG;Lo;0;L;;;;;N;THAI LETTER THO THONG;;;;
0E19;THAI CHARACTER NO NU;Lo;0;L;;;;;N;THAI LETTER NO NU;;;;
0E1A;THAI CHARACTER BO BAIMAI;Lo;0;L;;;;;N;THAI LETTER BO BAIMAI;;;;
0E1B;THAI CHARACTER PO PLA;Lo;0;L;;;;;N;THAI LETTER PO PLA;;;;
0E1C;THAI CHARACTER PHO PHUNG;Lo;0;L;;;;;N;THAI LETTER PHO PHUNG;;;;
0E1D;THAI CHARACTER FO FA;Lo;0;L;;;;;N;THAI LETTER FO FA;;;;
0E1E;THAI CHARACTER PHO PHAN;Lo;0;L;;;;;N;THAI LETTER PHO PHAN;;;;
0E1F;THAI CHARACTER FO FAN;Lo;0;L;;;;;N;THAI LETTER FO FAN;;;;
0E20;THAI CHARACTER PHO SAMPHAO;Lo;0;L;;;;;N;THAI LETTER PHO SAMPHAO;;;;
0E21;THAI CHARACTER MO MA;Lo;0;L;;;;;N;THAI LETTER MO MA;;;;
0E22;THAI CHARACTER YO YAK;Lo;0;L;;;;;N;THAI LETTER YO YAK;;;;
0E23;THAI CHARACTER RO RUA;Lo;0;L;;;;;N;THAI LETTER RO RUA;;;;
0E24;THAI CHARACTER RU;Lo;0;L;;;;;N;THAI LETTER RU;;;;
0E25;THAI CHARACTER LO LING;Lo;0;L;;;;;N;THAI LETTER LO LING;;;;
0E26;THAI CHARACTER LU;Lo;0;L;;;;;N;THAI LETTER LU;;;;
0E27;THAI CHARACTER WO WAEN;Lo;0;L;;;;;N;THAI LETTER WO WAEN;;;;
0E28;THAI CHARACTER SO SALA;Lo;0;L;;;;;N;THAI LETTER SO SALA;;;;
0E29;THAI CHARACTER SO RUSI;Lo;0;L;;;;;N;THAI LETTER SO RUSI;;;;
0E2A;THAI CHARACTER SO SUA;Lo;0;L;;;;;N;THAI LETTER SO SUA;;;;
0E2B;THAI CHARACTER HO HIP;Lo;0;L;;;;;N;THAI LETTER HO HIP;;;;
0E2C;THAI CHARACTER LO CHULA;Lo;0;L;;;;;N;THAI LETTER LO CHULA;;;;
0E2D;THAI CHARACTER O ANG;Lo;0;L;;;;;N;THAI LETTER O ANG;;;;
0E2E;THAI CHARACTER HO NOKHUK;Lo;0;L;;;;;N;THAI LETTER HO NOK HUK;;;;
0E2F;THAI CHARACTER PAIYANNOI;Lo;0;L;;;;;N;THAI PAI YAN NOI;paiyan noi;;;
0E30;THAI CHARACTER SARA A;Lo;0;L;;;;;N;THAI VOWEL SIGN SARA A;;;;
0E31;THAI CHARACTER MAI HAN-AKAT;Mn;0;NSM;;;;;N;THAI VOWEL SIGN MAI HAN-AKAT;;;;
0E32;THAI CHARACTER SARA AA;Lo;0;L;;;;;N;THAI VOWEL SIGN SARA AA;;;;
0E33;THAI CHARACTER SARA AM;Lo;0;L;<compat> 0E4D 0E32;;;;N;THAI VOWEL SIGN SARA AM;;;;
0E34;THAI CHARACTER SARA I;Mn;0;NSM;;;;;N;THAI VOWEL SIGN SARA I;;;;
0E35;THAI CHARACTER SARA II;Mn;0;NSM;;;;;N;THAI VOWEL SIGN SARA II;;;;
0E36;THAI CHARACTER SARA UE;Mn;0;NSM;;;;;N;THAI VOWEL SIGN SARA UE;;;;
0E37;THAI CHARACTER SARA UEE;Mn;0;NSM;;;;;N;THAI VOWEL SIGN SARA UEE;sara uue;;;
0E38;THAI CHARACTER SARA U;Mn;103;NSM;;;;;N;THAI VOWEL SIGN SARA U;;;;
0E39;THAI CHARACTER SARA UU;Mn;103;NSM;;;;;N;THAI VOWEL SIGN SARA UU;;;;
0E3A;THAI CHARACTER PHINTHU;Mn;9;NSM;;;;;N;THAI VOWEL SIGN PHINTHU;;;;
0E3F;THAI CURRENCY SYMBOL BAHT;Sc;0;ET;;;;;N;THAI BAHT SIGN;;;;
0E40;THAI CHARACTER SARA E;Lo;0;L;;;;;N;THAI VOWEL SIGN SARA E;;;;
0E41;THAI CHARACTER SARA AE;Lo;0;L;;;;;N;THAI VOWEL SIGN SARA AE;;;;
0E42;THAI CHARACTER SARA O;Lo;0;L;;;;;N;THAI VOWEL SIGN SARA O;;;;
0E43;THAI CHARACTER SARA AI MAIMUAN;Lo;0;L;;;;;N;THAI VOWEL SIGN SARA MAI MUAN;sara ai mai muan;;;
0E44;THAI CHARACTER SARA AI MAIMALAI;Lo;0;L;;;;;N;THAI VOWEL SIGN SARA MAI MALAI;sara ai mai malai;;;
0E45;THAI CHARACTER LAKKHANGYAO;Lo;0;L;;;;;N;THAI LAK KHANG YAO;lakkhang yao;;;
0E46;THAI CHARACTER MAIYAMOK;Lm;0;L;;;;;N;THAI MAI YAMOK;mai yamok;;;
0E47;THAI CHARACTER MAITAIKHU;Mn;0;NSM;;;;;N;THAI VOWEL SIGN MAI TAI KHU;mai taikhu;;;
0E48;THAI CHARACTER MAI EK;Mn;107;NSM;;;;;N;THAI TONE MAI EK;;;;
0E49;THAI CHARACTER MAI THO;Mn;107;NSM;;;;;N;THAI TONE MAI THO;;;;
0E4A;THAI CHARACTER MAI TRI;Mn;107;NSM;;;;;N;THAI TONE MAI TRI;;;;
0E4B;THAI CHARACTER MAI CHATTAWA;Mn;107;NSM;;;;;N;THAI TONE MAI CHATTAWA;;;;
0E4C;THAI CHARACTER THANTHAKHAT;Mn;0;NSM;;;;;N;THAI THANTHAKHAT;;;;
0E4D;THAI CHARACTER NIKHAHIT;Mn;0;NSM;;;;;N;THAI NIKKHAHIT;nikkhahit;;;
0E4E;THAI CHARACTER YAMAKKAN;Mn;0;NSM;;;;;N;THAI YAMAKKAN;;;;
0E4F;THAI CHARACTER FONGMAN;Po;0;L;;;;;N;THAI FONGMAN;;;;
0E50;THAI DIGIT ZERO;Nd;0;L;;0;0;0;N;;;;;
0E51;THAI DIGIT ONE;Nd;0;L;;1;1;1;N;;;;;
0E52;THAI DIGIT TWO;Nd;0;L;;2;2;2;N;;;;;
0E53;THAI DIGIT THREE;Nd;0;L;;3;3;3;N;;;;;
0E54;THAI DIGIT FOUR;Nd;0;L;;4;4;4;N;;;;;
0E55;THAI DIGIT FIVE;Nd;0;L;;5;5;5;N;;;;;
0E56;THAI DIGIT SIX;Nd;0;L;;6;6;6;N;;;;;
0E57;THAI DIGIT SEVEN;Nd;0;L;;7;7;7;N;;;;;
0E58;THAI DIGIT EIGHT;Nd;0;L;;8;8;8;N;;;;;
0E59;THAI DIGIT NINE;Nd;0;L;;9;9;9;N;;;;;
0E5A;THAI CHARACTER ANGKHANKHU;Po;0;L;;;;;N;THAI ANGKHANKHU;;;;
0E5B;THAI CHARACTER KHOMUT;Po;0;L;;;;;N;THAI KHOMUT;;;;

The first few semicolon-separated fields are the code (in hex), the Unicode name, the type of character, the combining class, and the left-to-right property (for mixing with Arabic or Hebrew).
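
For anyone who would rather load these records into a program than into a spreadsheet, the sketch below splits one record into the five leading fields Richard W lists. It is written in Python for illustration; `UcdRecord` and `parse_record` are hypothetical names, and only the first five fields are interpreted.

```python
from typing import NamedTuple

class UcdRecord(NamedTuple):
    code_point: int         # field 0, hexadecimal in the file
    name: str               # field 1
    general_category: str   # field 2, the "type of character" (Lo, Mn, Nd, ...)
    combining_class: int    # field 3
    bidi_class: str         # field 4, the directionality property (L, NSM, ET, ...)

def parse_record(line):
    """Parse the leading fields of one semicolon-separated UnicodeData.txt record."""
    fields = line.strip().split(";")
    return UcdRecord(
        code_point=int(fields[0], 16),
        name=fields[1],
        general_category=fields[2],
        combining_class=int(fields[3]),
        bidi_class=fields[4],
    )

rec = parse_record("0E48;THAI CHARACTER MAI EK;Mn;107;NSM;;;;;N;THAI TONE MAI EK;;;;")
print(rec.code_point, rec.name, rec.general_category, rec.combining_class, rec.bidi_class)
```

Records whose general category is Mn (non-spacing mark) are the above- and below-line vowels and the tone marks, which is a convenient way to pick them out of the list.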

Mr. Farang
Posted September 21, 2005

Quote (Richard W): "Why?"

Because with an XLS spreadsheet, one can readily experiment with sorting and macros, and look at how the Unicode values relate to the sort order, etc. An experimenter cannot easily do that with what you posted, Khun Richard, because what you posted is not in an Excel spreadsheet. That is my answer to your question "why?"

Cheers!

Richard W
Posted September 21, 2005

Quote (Edward B): "Sorry Richard, but can you explain what you mean by this, perhaps with a couple of examples?"

The general rule in sorting Indic languages is that one sorts by the order of the phonetic elements, so เพลา [M]phlau 'axis' would be sorted as พ + ล + เ-า. However, in Thai, this would be very inconvenient if you don't know how the word is pronounced, because you could be looking at เพลา [M]phee[M]laa '(meal) time', which would be sorted as พ + เ- + ล + า. (This word is actually a doublet of เวสา 'time'.) The rule that is applied is to ignore clusters, so both words เพลา are sorted as พ + เ- + ล + า.

Similarly, แหน [RL]haen 'keep for oneself' and [RL]nae 'duckweed' are sorted as ห + แ + น, and appear next to one another in the dictionary.

As tone marks (and maitaikhu and karan) are used to sort only when the consonants and vowels are the same, the four words แหง [RL]ngae 'sheepishly', แหง่ [LL]ngae 'buffalo calf', แห่ง [LS]haeng 'place' and แห้ง [FL]haeng 'dry, hoarse' appear together and in that order in the dictionary, all sorted as ห + แ + ง and then ordered on the basis of the absence of a tone mark or the presence and type of one.

In most Indic languages the problem does not arise, because the absence of a vowel is marked. However, the vowel absence markers in the Thai script, phinthu and yamakkan, are rarely used.

Mr. Farang
Posted September 22, 2005 (edited)

Quote (Richard W): "(This word is actually a doublet of เวสา 'time'.)"

Dear Khun Richard,

Finally! I get a chance to improve your excellent posts - an honor, Sir.

Time == เวลา, not your accidental typo เวสา above. Just a tiny typo, ส (saw so) v. ล (law ling).

Khrap Phom!
Mr. Farang

Tywais
Posted September 22, 2005

Quote (Mr. Farang): "Just a tiny typo, ส (saw so) v. ล (law ling)"

Khun Mr. Farang,

Curious as to why you say "saw so" for ส. All my books, including Thai school books, use "saw seua" (tiger) as the example. Is there a standardized set of words used as the support words for the characters? However, my books are really old, like me.
Richard W Posted September 22, 2005

Quote: Finally! I get a chance to improve your excellent posts - an honor, Sir. Time == เวลา not your accidental typo เวสา above.

But you missed a real clanger! I said 'sorting Indic languages', but I should have said 'sorting in Indic scripts'. Thai is very definitely not an Indo-European language, let alone an Indic language!
Mr. Farang Posted September 22, 2005 (edited)

Quote: Khun Mr. Farang. Curious as to why you say "saw so" for ส. All my books including Thai school books use "saw seua" (tiger) as the example? Is there a standardized set of words used for the support words for the characters? However my books are really old, like me.

Dear Khun Tywais,

I made a mistake! (Was watching Hurricane Rita heading toward the coast of Texas - third "biggest" hurricane in the history of US hurricanes!!)

Yours sincerely,
Mr. Farang

Edited September 22, 2005 by Mr. Farang
wolf5370 Posted September 22, 2005

Quote: Is there a standardized set of words used for the support words for the characters?

Yep, there is. The words are fixed (although some do not fit the words anymore - Kor Khun for example, as it is becoming obsolete and the spelling for Khun has changed). Other than that (and maybe the extra couple of consonants - if your books say 46 instead of 44) your 'old' books should serve you just as well.

PS: To get a list in Excel as requested earlier, try pasting this into an empty workbook's VBA (Macro) editor and run it. It will list all the characters with their ASCII (pointless, but for information) and Unicode values in both hex and decimal. Some vowels will be preceded by Ohr Ang (Zero Char) as they will not print to the screen otherwise.

    Public Sub MakeList()
        Dim lngLoop As Long
        Dim lngRowNumber As Long
        Dim strChar As String

        'Set Initial Row Number
        lngRowNumber = 1

        'Clear down earlier run
        Cells.Select
        Selection.ClearContents

        'Put in headers
        Cells(1, 1).Value = "Char"
        Cells(1, 2).Value = "ASCII (Hex)"
        Cells(1, 3).Value = "ASCII (Dec)"
        Cells(1, 4).Value = "UniCode (Hex)"
        Cells(1, 5).Value = "UniCode (Dec)"

        'Loop through valid UniCode Chars
        For lngLoop = CLng(&HE01) To CLng(&HE5B)
            'Ignore the missing parts of the Unicode set (no characters defined)
            If lngLoop < &HE3B Or lngLoop > &HE3E Then
                lngRowNumber = lngRowNumber + 1
                Cells(lngRowNumber, 1).NumberFormat = "@" 'Make it text so no's will not show as digits
                strChar = ChrW(lngLoop) 'Get the character from the UniCode Value
                If (lngLoop = &HE31) Or (lngLoop > &HE33 And lngLoop < &HE3F) Or _
                   (lngLoop > &HE46 And lngLoop < &HE4F) Then
                    ' place zero character (Ohr Ang) for these vowels/marks so they will print to screen
                    Cells(lngRowNumber, 1).Value = ChrW(&HE2D) & strChar ' Ohr Ang + Char
                Else
                    Cells(lngRowNumber, 1).Value = strChar ' Character
                End If
                Cells(lngRowNumber, 2).Value = Hex(Asc(strChar)) ' Hex Ascii
                Cells(lngRowNumber, 3).Value = Asc(strChar) ' Decimal Ascii
                Cells(lngRowNumber, 4).Value = Hex(lngLoop) ' Hex UniCode
                Cells(lngRowNumber, 5).Value = lngLoop ' Decimal Unicode
            End If
        Next lngLoop
    End Sub
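For readers without Excel, the same listing can be approximated in Python. This is a rough sketch rather than a port of the macro: the function name `thai_block_rows` is hypothetical, and instead of hard-coding the ranges it asks the standard `unicodedata` module which code points are unassigned (category `Cn`) and which are combining marks (category `Mn`) that need an o ang (อ) to be visible.

```python
import unicodedata

O_ANG = "\u0e2d"  # carrier consonant so that combining marks are displayed


def thai_block_rows(first=0x0E01, last=0x0E5B):
    """List (display text, hex code point, decimal code point, name) rows."""
    rows = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        category = unicodedata.category(ch)
        if category == "Cn":
            # Unassigned code point (the gap U+0E3B..U+0E3E): skip it.
            continue
        # Combining vowels and marks are shown on top of an o ang.
        shown = O_ANG + ch if category == "Mn" else ch
        rows.append((shown, f"{cp:04X}", cp, unicodedata.name(ch)))
    return rows


for shown, hex_cp, dec_cp, name in thai_block_rows():
    print(shown, hex_cp, dec_cp, name)
```

The "ASCII" columns of the macro are left out here, since they carry no information for Thai characters.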
Mr. Farang Posted September 23, 2005 (edited)

Quote: PS: To get a list in Excel as requested earlier try pasting this into an empty workbook's VBA (Macro) and run it. It will list all the characters with their ASCII (pointless, but for information) and Unicode in both hex and decimal. Some vowels will be preceded by Ohr Ang (Zero Char) as they will not print to the screen otherwise.

Dear Khun Wolf,

Excellent, thank you. The macro worked great! You are a talented macro programmer - ging ging na krap... :-)

I experimented with the spreadsheet created by the macro and sorted on "Char". Sample partial results are attached in a screenshot. Note that the vowels without "Awe Ang" come before the consonants. Interesting. Any ideas why, anyone?

Yours sincerely,
Mr. Farang

Edited September 23, 2005 by Mr. Farang
Richard W Posted September 23, 2005

Quote: I experimented with the spreadsheet created by the macro and sorted on "Char". Sample partial results are attached in a screenshot. Note the vowels without "Awe Ang" come before the consonants. Interesting. Any ideas why anyone?

Yes! The sorting for Thai has been miscoded. Someone somewhere has misunderstood the Thai vowels.

In most Indic scripts, there are two types of vowel symbols. There are the 'independent vowels', which are used without consonants, and the 'dependent vowels', which are used with consonants. Roughly speaking, the independent vowels are the ones that appear at the start of a word. In Devanagari, the most important of the Indic alphabets, the independent vowels come before the consonants in alphabetic order. It may also be relevant that in Devanagari, only one of the vowels appears before its consonant.

Thai has ditched the independent vowels, except for o ang, which it combines with the dependent vowels to make up for the lack of independent vowels. (Burmese and Khmer are moving in the same direction - independent vowels are chiefly used for Pali/Sanskrit loans.)

I think someone thought that the Thai vowel symbols that can appear at the start of a word were independent vowels, and carefully ordered them between the digits and the consonants. Note that sara aa and sara am, which don't need a leading o ang, aren't misordered, but appear after the consonants.

Incidentally, ignore the 'ASCII' columns generated by the spreadsheet. What is happening is that Asc of a Thai string effectively returns a question mark - 3F is the ASCII code for '?'.
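Two points from the post above can be checked with a few lines of Python. This is only an illustration under stated assumptions: the helper `ascii_fallback` is hypothetical and merely mimics what VBA's Asc does on a system whose code page has no Thai characters, by encoding to ASCII with replacement. The first check shows that in raw Unicode order the preposed vowels come after the consonants, so their appearing before the consonants in the sorted sheet comes from the collation rules rather than from the code points.

```python
consonants = [chr(c) for c in range(0x0E01, 0x0E2F)]  # ko kai .. ho nokhuk
preposed = [chr(c) for c in range(0x0E40, 0x0E45)]    # the five preposed vowels
sara_aa, sara_am = "\u0e32", "\u0e33"

# Raw code-point order: consonants, then sara aa and sara am, then the
# preposed vowels.
in_code_point_order = max(consonants) < sara_aa < sara_am < min(preposed)
print(in_code_point_order)


def ascii_fallback(ch):
    """Byte produced when a character has no ASCII equivalent (assumed analogy to Asc)."""
    return ch.encode("ascii", errors="replace")[0]


# Every Thai character collapses to the same replacement byte, '?'.
print(hex(ascii_fallback("\u0e01")), chr(ascii_fallback("\u0e01")))
```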
Bryan in Isaan (Author) Posted September 23, 2005 (edited)

Quote: Hi, are you using a Thai or English version of Windows ME? I found that some programs, e.g. Pirch, would only sort Thai in alphabetical order under Thai Windows. Secondly, I seem to remember that getting Office XP to display Thai properly on English Windows ME was not straightforward and involved hacking the registry and installing extra files etc.

I got on my old computer with Thai version Win98 and Thai Office2000 and it sorted perfectly. Apparently the Thai version has an algorithm that will handle the vowels which occur before consonants. I don't have the latest version of Thai XP Windows and Office to try the sort. Hopefully it will work as well.

Again, thanks everyone for all the excellent posts.

Bryan

Edited September 23, 2005 by Bryan in Isaan
Richard W Posted September 23, 2005

One solution to the problem is to generate a sort key in another column from the text and sort on that. Normally you will want to keep this extra column hidden. The function key() defined by the following macro seems to work:

    Public Function key(raw As String) As String
        Dim pending As String, proc As String
        Dim follower As String, onch As String
        Dim pos As Long, code As Long

        pending = ""
        proc = ""
        follower = ChrW(&HE45) 'Character after last of preposed.
        For pos = 1 To Len(raw)
            onch = Mid(raw, pos, 1)
            code = AscW(onch)
            If onch = follower Then 'Compare characters, not code with character
                proc = proc & follower 'Double an original lakkhangyao
            End If
            If Len(pending) = 0 Then
                If &HE40 <= code And code <= &HE44 Then 'Preposed vowel
                    pending = onch
                Else
                    proc = proc & onch
                End If
            Else
                proc = proc & onch & follower & pending
                pending = ""
            End If
        Next pos
        If Len(pending) > 0 Then 'Flush a preposed vowel left at the end
            proc = proc & pending
        End If
        key = proc
    End Function

Swapping round preposed vowels and the following consonant does not quite work, as เลข would still sort before ลม. The trick I use to get round the preposed vowels' being misordered (between the digits and the consonants) is to insert a lakkhangyao (ๅ), the next character after the preposed vowels in the normal sequence, between the swapped characters, and for good measure to double any original lakkhangyaos.

I've tested the macro on Windows 2000 (Excel 2000? - I forgot to check the version number) and on Excel 2002 with Visual Basic 6.3 under Windows XP. It works with the original demonstration list, and it seems to work with tones.
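The effect of the lakkhangyao trick can be illustrated outside Excel with a small Python sketch. Everything here is an assumption made for illustration: `sort_key` mirrors the idea of the macro, and `faulty_weights` is a hypothetical stand-in for the miscoded collation that simply ranks the preposed vowels below all consonants. It is not Excel's actual sort routine.

```python
FOLLOWER = "\u0e45"  # lakkhangyao, the code point right after the preposed vowels
PREPOSED = {chr(c) for c in range(0x0E40, 0x0E45)}


def sort_key(raw, follower=FOLLOWER):
    """Swap each preposed vowel with the next character, inserting a filler."""
    out = []
    pending = ""
    for ch in raw:
        if follower and ch == follower:
            out.append(follower)  # double an original lakkhangyao
        if not pending:
            if ch in PREPOSED:
                pending = ch      # hold the vowel until the next character
            else:
                out.append(ch)
        else:
            out.append(ch + follower + pending)
            pending = ""
    return "".join(out) + pending  # flush a trailing preposed vowel


def faulty_weights(text):
    """Hypothetical collation: preposed vowels rank below every consonant."""
    return [ord(c) - 0x100 if c in PREPOSED else ord(c) for c in text]


words = ["เลข", "ลม"]
swap_only = sorted(words, key=lambda w: faulty_weights(sort_key(w, follower="")))
with_filler = sorted(words, key=lambda w: faulty_weights(sort_key(w)))
print(swap_only)    # เลข comes first: swapping alone is not enough
print(with_filler)  # ลม comes first: the dictionary order
```

Under the simulated collation, the inserted lakkhangyao is compared before the misordered vowel, so ล + ม sorts ahead of ล + ๅ + เ + ข.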
skraach Posted May 12, 2019

To sort using Thai dictionary alphabetical order in MS Word and Excel, you need to set Windows' format to Thai. This is how you do it in Windows 10:

Control Panel > Region > "Formats" tab > Format: Thai (Thailand) > "Apply" button

This setting will also change other things, such as the default currency to baht and the date to Thai language and Thai years. So after you have done the sorting, you might want to change back to your previous format setting.