
List Of 3000 Most Common Thai Words


Grover


In English there is a list like this, such as the one in the Oxford Advanced Learner's Dictionary, which has a careful selection of commonly used words. I've been searching for a list like this in Thai for years but never found one. Any pointers would be appreciated.


How should I be reading this list? What do the column headers Haas, Links, Orchid and Tax represent?

What do the numbers mean and why does each list have different numbers?

For example the first row, why do three lists have การ but one has เป็น?

Haas Links Orchid Tax

366 เป็น 15978 การ 11888 การ 9861 การ

I've put it in a slightly cleaner Excel Format attached here.

frequency.xls

Edited by wasabi

Grover, the best I can do is I have a list of the 1000 most common words according to four sources of language corpora. I've attached a spreadsheet that I converted to HTML.

The best one is the Mary Haas list. Not sure about Haas, but the other three I know are all computed automatically, so the digits 0 to 9, among other things, count as "words" in their list, as well as some other things that aren't common Thai at all, but appear frequently in their corpora because of a large number of technical texts.

Hope this is helpful.

This is interesting, thanks Rikker. I did a similar thing a while back using all the text that people paste into thai2english.com, and for comparison the top 100 results in order were :

ที่ , และ , จะ , การ , มี , ใน , ได้ , ของ , เป็น , ให้ , ไป , ก็ , ไม่ , ว่า , แล้ว , มา , กับ , คุณ , ใจ , คน , เรา , ฉัน , แต่ , นะ , นี้ , ครับ , อยู่ , เธอ , กัน , ผม , โดย , มัน , จาก , ต้อง , ด้วย , เลย , ยัง , หรือ , ทำ , ใช้ , คือ , เขา , มาก , ผู้ , บอก , พี่ , ดู , เมื่อ , วัน , อะไร , เรื่อง , ถ้า , ดี , เพราะ , อยาก , ค่ะ , ไม่ได้ , ปี , อีก , เพื่อ , พระ , รัก , นั้น , ตัว , ถึง , งาน , สามารถ , หน้า , เวลา , ใคร , ไทย , เพลง , แบบ , ซึ่ง , ไว้ , ขอ , ส่ง , ต่อ , ความ , ท่าน , อย่าง , ใหม่ , เล่น , ก่อน , หา , บ้าน , ตาม , ทาง , สำหรับ , หนึ่ง , เอา , เค้า , คะ , ทำให้ , ขึ้น , ไม่มี , อ่าน , บาท , ราย , ชื่อ

ที่ was the most common by miles (about twice the count of และ), whereas all the others were relatively close. :o
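For anyone wanting to repeat this kind of count, here is a minimal sketch in Python. It assumes the text has already been split into words (Thai is written without spaces between words, so real text would need a segmenter first), and the sample tokens are made up for illustration; they are not the thai2english.com data. The function name `top_words` is just a placeholder.

```python
from collections import Counter

def top_words(tokens, n):
    """Return the n most frequent tokens with their counts, most common first."""
    return Counter(tokens).most_common(n)

# Hypothetical, already-segmented sample (not real corpus data).
sample = "ที่ และ ที่ จะ การ ที่ และ มี".split()

for word, count in top_words(sample, 3):
    print(word, count)
```

How the text is segmented changes the counts, which is one reason different lists can disagree.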



Mike, that's interesting to see how commonly used they are.

cheers :o



Thanks for doing that. My original is in Excel, I just wanted to make sure everyone could access it.

The four columns are four different text collections/corpora. One from Mary Haas, another from NECTEC's Linguistics and Knowledge Science Laboratory (LINKS), Chula University's Orchid Corpus (appears to be offline right now), and the one labeled Tax I'm not clear on the exact source, but I think it might be the Thai tax code or a corpus of legal documents of some kind, given the high frequency of tax-related terms in their top 1000 words.

The number next to each word is the number of times that word appears in the corpus. The number at the top of each column is just a sum of the total number of occurrences of top 1000 words.

As for why the lists have different words in the top spots, well, that has to do with at least three things: [a] the size of the corpus, [b] the variety (or lack of it) of the subject matter collected in the corpus, [c] the method used to count occurrences.

The line you've quoted is the top word in each of the four corpora. You can see the Haas corpus is a much smaller corpus, with its top word only occurring 366 times. The other three, all much larger, agree that การ is more common. Orchid is largest at 416,000, but I don't know what constitutes a "word" for the purposes of counting in the Orchid corpus. While English "words" don't correspond to the collections of letters between spaces as much as we tend to think they do, splitting on spaces does give an easy, clear definition of "word" for the purpose of gathering corpora (and one that is easily countable via automatic means). Thai... a bit trickier. I know the corpora on thai.sealang.net are all counted via number of characters, not words.

Also, one telltale sign that Tax is a very narrow corpus subject-matter-wise is the fact that while it is 269,000 words large, it only has 2100 distinct words in it, while even in Haas there are 4000 distinct words out of 27000 total words.
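To put that comparison in numbers, here is a small sketch in Python that divides distinct words by total words (sometimes called a type/token ratio) using only the figures quoted above. The function name is just a placeholder. Bear in mind that larger corpora tend to score lower on this measure anyway, so treat it as a rough indicator only.

```python
def distinct_ratio(distinct_words, total_words):
    """Share of the corpus made up of distinct words (type/token ratio)."""
    return distinct_words / total_words

# Figures as quoted in the post above: (distinct words, total words).
corpora = {
    "Tax": (2100, 269000),
    "Haas": (4000, 27000),
}

for name, (distinct, total) in corpora.items():
    print(f"{name}: {distinct_ratio(distinct, total):.1%} distinct")
```

That works out to under 1% for Tax against roughly 15% for Haas.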

Edited by Rikker

Thanks for the detailed reply Rikker,

Where are you coming up with Tax having 2100 and Haas having 4000? I see each list having 1000 words? And can you further define what you mean by corpus? Is this some underlying body of work the statistics are based on? What is this body of work for each?


Where do I get the translation for the words?

Paste them into www.thai2english.com or buy a dictionary. The process of looking words up is good for the memorization process.

The most common words have many different functions. เป็น for example - while it often just means 'be' or 'is', as in

ผมเป็นหมอ I am a doctor,

...it can also be a grammatical function word indicating result or manner:

หั่นเนื้อเป็นชิ้น Cut beef into slices.

เป็นเวลาสองวัน ...for two days

...as well as ability:

เล่นกีตาร์ไม่เป็น cannot play guitar...



Not sure if this would help, but there's a great vocabulary builder from a company called Unforgettable Languages that uses easy memory aids for commonly used words. This is a great addition to your language learning IMHO. It is an easy way to pick up, in this case, about 230 commonly used words. I used it for Thai and Mandarin.

It can be found at: www.unforgettablelanguages.com



That is a good link indeed. A good system for vocab building.

E.g. imagine a fat GUY (gai) eating a large chicken, and so on.

Edited by Grover

  • 2 years later...

I realize this thread is getting old, but this has been really helpful to me. Why waste time learning a whole dictionary right away? Start with the 100 most common words and work up to the 1000 most common then perhaps 2500. That's a good vocabulary!

Using Rikker's lists along with thai2eng, and some others on ThailandQA.com, I have massaged the data trying to find consensus or at least trends. I will try to attach the spreadsheets I am using here, but I really don't spend much time using forums, so I may not succeed.

Two spreadsheets. All the lists were included, sorted, duplicates removed, then trimmed. Frequency table provided showing the degree of correlation between the lists.
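For readers who would rather script this than do it in a spreadsheet, here is a rough sketch in Python of the "how many lists agree on this word" step. The list names and three-word excerpts are hypothetical stand-ins, not the contents of the attached files, and `agreement` is just a placeholder name.

```python
from collections import Counter

def agreement(lists):
    """Map each word to the number of lists it appears in."""
    counts = Counter()
    for words in lists.values():
        counts.update(set(words))  # set() so a word counts once per list
    return counts

# Hypothetical three-word excerpts, for illustration only.
lists = {
    "A": ["การ", "ที่", "เป็น"],
    "B": ["การ", "และ", "ที่"],
    "C": ["เป็น", "การ", "มี"],
}

counts = agreement(lists)
# Words found in the most lists first; ties broken by the word itself.
for word, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(word, n)
```

Words that show up in every list are the safest candidates for a combined list; words found in only one list are worth a second look.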

As Rikker kindly pointed out, the Tax list seems to contain tax-related terms. Numbers and special characters were deleted.

All errors are mine alone and suggestions and corrections are gratefully accepted.

Happy new year.

100_Most_Common___Combined.xls

1000_Most_Common___Combined.xls


Two spreadsheets. All the lists were included, sorted, duplicates removed, then trimmed. Frequency table provided showing the degree of correlation between the lists.

Nice. I went at it from a slightly different angle. Also, I'm going for 3000 as (I might be wrong) 1000 doesn't have enough word combos. I didn't grab the whole frequency list as you did, only Mary's. Then I added from AUA, Byki (not all), AWL, Thai-language.com starred, etc. Mike from Thai2English.com also has a frequency list that will come in at some point. Then there's a dictionary with the supposedly top 3000 but I found what I believe are 3 mistakes just in the first couple of pages, so I backed off from seriously checking it against mine.

My eventual aim is to put each with phrases, as words on their own don't work with the way I learn. Then when I get to a certain point, I'll have someone in the know look at it, as there are sure to be a ton of iffy words. But right now, I'm just nibbling away and enjoying the finding of new words as I go.


I'd forgotten about pinning this topic. Great thing that you brought it back again.

My eventual aim is to put each with phrases as words on their own don't work with the way I learn.

It's not just you - it won't work for anyone who wants to speak anything resembling intelligible Thai. The example I posted above about how เป็น is used, is just a brief introduction to the word, and can be extended, not to mention the same thing could [should!] be repeated for most of the most common words.

In other words, these words can have completely different functions depending on the context.

If one doesn't learn grammatical patterns as well as idioms too, the words by themselves, with just one translation in English and no usage examples, won't do you much more good than getting 50 tons of bricks and mortar and an order to reconstruct Wat Benjamabophit, Suvarnabhumi airport, or Baiyoke 2.

A question for the advanced members, how well do you think this list relates to spoken Thai as opposed to the written Thai that provides the source ?

I would say that ก็ has to be number 1 in terms of spoken Thai surely !

For one, I think you'll find many more particles. Especially if you properly distinguish between ครับ อะ ฮะ ค่ะ จ้ะ วะ หว่า etc... You're right about ก็ too - it's a hesitation word.


  • 4 weeks later...
  • 1 month later...
  • 3 weeks later...
  • 2 months later...

I just wanted to say that I have not forgotten this exercise - the search to find the top 3000 Thai words. I do have a growing list in an Excel spreadsheet, but after throwing everything in, I realised a while back that it has to be tackled from a different angle. A more practical angle.

Some Thai words are created by joining two words together - compound nouns, compound verbs, compound noun + verb, compound noun + adjective, etc - so while knowing the top word frequencies is handy, word frequencies do not work on their own.

I personally believe that you need to start from the meaning and work backwards. I now know that not all English words have an exact duplicate in Thai, so there is that problem. Another problem: sentences to go with each word = a must...

So that's what I'm doing now. Last year, with three Thais, a vocabulary list from a generic book on learning languages was put together. This past month I dragged it out of the mothballs. In the coming months I'll sort it to suit. Right now it matches the book (page by page) so it's pretty basic. And like I mentioned, it's generic to learning all languages so words special to Thai are not listed (yet). So I plan on deleting what doesn't belong, tweaking, and then I'll start adding the must-know Thai words from my Excel spreadsheet until... well, until it is right. Or close to right. Or at least to the point where it generates less of an argument.

The file, such as it is, can be downloaded from this post. Rikker did a quick look at the file, so PLEASE read his disclaimer in the comments.

My dream, for each word, is to have a sentence for: proper Thai (the Thai we get in our course books), street Thai, a more polite Thai than street, and the Issan Thai we hear in taxis and up north.

Yes, I do realise the amount of work this will take, but I am not in any rush...

Btw - For what it's worth, I'm armed with a whole heap of Thai courses, dictionaries, grammar books, phrase books, online resources, etc. I have Rikker's top frequencies, as well as thai2english.com's and others. Seems to me, some sense can be made from all of the resources combined. Or at least I'll have fun trying :)


  • 3 weeks later...

I ran across a website that I like very much and have been using for a few weeks. You can try 15 or so free lessons and then subscribe to all the courses for $6.99 per month. It is at www.its4thai.com. If you know of a better one, I would appreciate knowing about it. I am also looking for a list of the 1000 most frequently used English words with their Thai meanings. Any help there?

Tom



its4thai.com is a decent program, as Stuart has put a lot of time and effort into making it easy to use.

As for knowing a better one... truthfully, it does not matter which method or program you use, only that you do.

http://learn-thai-podcast.com/

http://www.pimsleurapproach.com/learn-thai.asp

http://www.linguaphone.co.uk/language/thai.cfm

http://langhub.com/en-th/

http://www.byki.com/fls/free-thai-software...oad.html?l=thai

http://www.thaiforbeginners.com

And a new one - http://www.thai-flashcards.com/

Also, there are fantastic books out there for learning to read Thai (I have two favs).

And tons more in the free resource url I posted in an earlier comment...

They all work.

For the top 1000... I couldn't stop at just 1000... but there is a course proposing to teach with just a handful of Thai words -->> http://www.letstalkthai.com.au/


Here is my plan. This link (sealang) is most likely in here already somewhere, but I didn't see it and stumbled on it by myself. It lists words by category, and most of them have a commonality rating by web count. In my case, I think I should start learning some government terms if I am ever going to comprehend the news. There are categories such as politics, bureaucracy, international relations, and King and Royal family. I'll make a list of the most common words in those and start listening for them on newscasts. It will be boring, but it is something I think I must do. Until recently I thought Mon Dtree was the last name of a prominent politician. After government I can move on to crime words.

There was a football fan thread looking for sports terms; the same could be done with that. They have football, sports verbs, etc.

A nice feature, if someone were to do a spreadsheet of common words, would be to assign a category code letter to each word so they could be sorted that way if so desired.
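As a rough illustration of the category-code idea, here is a short Python sketch. The code letters ("G" for government, "S" for sports) and the counts are invented for the example; they are not taken from the sealang site or any of the lists in this thread.

```python
# Hypothetical rows: (word, category code, frequency count).
# Codes and counts are made up for illustration only.
rows = [
    ("รัฐมนตรี", "G", 80),   # minister
    ("ฟุตบอล", "S", 95),    # football
    ("รัฐบาล", "G", 120),   # government
]

# Sort by category code, then by count (highest first) within each category.
by_category = sorted(rows, key=lambda row: (row[1], -row[2]))

for word, code, count in by_category:
    print(code, word, count)
```

The same code column would work as a sort or filter key in a spreadsheet.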

JMO

http://www.sealang.net/thai/vocabulary/

