HowTo: Learning with Texts (LWT) on AWS


Atlan

Hello everyone.

 

I have wanted to learn to read and write Thai for a while now, but I always failed. I always liked the concept of LingQ, but unfortunately it does not support Thai. Then I stumbled upon Learning with Texts (LWT) and set up a small server on Amazon Web Services (AWS), where the smallest instance size can currently be used free of charge for 12 months. So here is a small tutorial for everyone who also wants to use it. I will also add a small Python script I created to separate the Thai text, as LWT requires spaces between the words to know where a word starts and where it ends. There is a setting which removes the spaces again for learning, so you can get used to the text without spaces in between. It is also possible to add 2 dictionaries to LWT, so it automatically looks up a word if you click on it. Sound files can be added as well. As of right now it is not as convenient as LingQ, but it works and is free of charge. I added the 22nd chapter of the Maanii books to it, so as long as no one deletes it you can try it. One short notice: all the pictures in the tutorial are black instead of white. I use a browser extension to change the colours and forgot to deactivate it when I created the pictures.

 

Main Installation:

 

Creating the Virtual Server in AWS

 

  • visit https://aws.amazon.com and sign in with your Amazon account or create a new one.

     

  • after signing in, just click on Start virtual machine (EC2)

     

  • Select: Ubuntu Server 18.04 LTS (x86 version)

 

 

image.png.730ef4d515f6dc22b9efb8db033fc900.png

 

 

  • Select: General purpose (t2.micro) which is free for the first 12 months

image.png.ad832c95ae0ae2b291848a7cdea74258.png

 

 

  • click on „Next: Review and Launch“ and on the 2nd page click on „Launch“

     

  • now you see a dialog to create a key pair, which is necessary to connect to the virtual server afterwards. Please make sure you never lose it, otherwise you will not be able to connect to the server.

 

image.png.4459bea64d458e00e3eb83d043084c2e.png

 

 

 

  • now the virtual server is starting up which can take a few minutes

     

  • now click on „View Instances“

     

  • if you plan to use more instances you can now name the instance by clicking on empty space in the first row

 

image.png.5aa8e66b26f60f6e5fc024ee9139fea7.png

 

 

  • mark the newly created virtual server so you see the configuration of it down below

     

  • at first we need to configure an inbound firewall rule so we will later be able to access the LWT website

     

  • click on launch-wizard in the Security groups entry

image.png.e5507b1d7f15596ff9afaea464d9ca52.png

 

 

 

  • now click on the Inbound tab and on Edit to add a new rule of type HTTP, with Anywhere as the source. Click on Save when you are done.

image.png.357101fd9c4f537294401bb5b471897a.png

 

 

  • now please select Instances on the left side and mark our LWT virtual server

     

  • Our last step on the AWS website is to copy our Public DNS (IPv4) address so we can finally connect to our server and start with the configuration.

image.png.9cd1498b5c6e935235e240dfe2bb4c2b.png
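
For anyone who prefers scripting over clicking, the inbound HTTP rule from the steps above can also be written down as data. This is only a rough sketch, assuming the boto3 library is installed and AWS credentials and a region are configured. The function names and the security group ID in the comment are hypothetical placeholders and not part of the original setup.

```python
def http_ingress_rule():
    """Describe an inbound rule: HTTP (TCP port 80) from anywhere."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 80,
        "ToPort": 80,
        "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "LWT website"}],
    }


def add_http_rule(group_id):
    """Apply the rule to a security group (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the rule can be inspected without boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[http_ingress_rule()],
    )


# Example call with a made-up group ID (copy the real one from the console):
# add_http_rule("sg-0123456789abcdef0")
```

The rule is the same one the console creates: type HTTP means TCP port 80, and source Anywhere means 0.0.0.0/0.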

 

 

 

Connecting to the Server

 

Now the previously created and securely saved key file becomes important, because without it we cannot connect to the server. If you no longer know where you saved it, you can search for „*.pem“. If you still cannot find it, you have to start again from the beginning.

You also need to add „ubuntu@“ in front of the DNS address. In our example it would be:

ubuntu@<Public DNS (IPv4)>

 

Linux:

Connecting from a Linux machine is fairly simple. We just need to enter the following commands:

chmod 600 /path/to/key/file/key.pem
ssh -i /path/to/key/file/key.pem ubuntu@<Public DNS (IPv4)>
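
If you want to script the connection, the two commands above can be sketched in Python like this. It is only an illustration: the function names are my own, and the key path and host name in the usage comment are placeholders, not real values.

```python
import os
import stat


def restrict_key(key_path):
    """Make the key file readable and writable by its owner only (chmod 600)."""
    os.chmod(key_path, stat.S_IRUSR | stat.S_IWUSR)


def ssh_command(key_path, public_dns, user="ubuntu"):
    """Build the ssh call as an argument list, e.g. for subprocess.run."""
    return ["ssh", "-i", key_path, f"{user}@{public_dns}"]


# Usage with placeholder values:
# restrict_key("/path/to/key/file/key.pem")
# subprocess.run(ssh_command("/path/to/key/file/key.pem", "<Public DNS (IPv4)>"))
```

ssh refuses a private key that other users can read, which is why the chmod step comes first.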


 

Windows:

On Windows we will use PuTTY to connect to our server. You can download it from here:

https://putty.org/

 

After the installation we first need to convert the key.pem file into a key.ppk file.

To do that you need to start the PuTTY key generator. Just type putty in the start menu and you should see a „PuTTYgen“ entry. Just open it.

image.png.a11ae8c7be15cc539ff3cfa74d8845b7.png

 

 

  • now you need to click on Load to load your key.pem

     

  • now select „All Files (*.*)“ so that you are able to select the „key.pem“ file

image.png.3c38ca1d4e9ac3811d74beea7ebbb60b.png

 

 

  • now you see a short notification that the file has been successfully imported and you can click on OK

     

  • now you can create the new file by clicking on „Save Private Key“. Confirm with Yes that you want to save the file without an additional passphrase and save it. Now you can close PuTTYgen.

     

  • Now we can start PuTTY

     

  • The first thing we have to do is to tell PuTTY where to find the „key.ppk“ file. To do so, please expand Connection > SSH > Auth on the left and click on Browse on the right side to locate our key.ppk file

image.png.d4fd3d6601d21ba6a92d6f6ac85d251d.png

 

 

  • Now click back on Session and enter the Server and the username

    ubuntu@<Public DNS (IPv4)>

     

  • now enter a Session name and save it for the future. For example „LWT“

image.png.7f7029f225dffceacced2a23ffe457c1.png

 

 

  • now you can just double click on the LWT entry and PuTTY will turn into a black console and connect to the server.

     

  • the first time you connect to the server you will get a warning message which you have to accept

image.png.c34b59b5c407ecc8ba7594f01009ca01.png

 

 

 

  • After connecting you will see something like this:

image.png.6b176e2098a325c3bf397cc28496e3de.png

 

 

 

 

LWT Server Configuration

 

You can find the original tutorial under „http://lwt.sourceforge.net/“ which was the basis for this tutorial. I just needed to adjust it a little bit to work with the AWS environment.

  • first we should update the system, as there are, in our example, 64 security updates missing right now. This can take some minutes

    sudo apt-get update
    sudo apt-get upgrade
  • Now we can install the web server and the database. Mind the caret (^) at the end of the first command. We also need to install the PHP mbstring extension, otherwise we would just get a blank window for most of the LWT website

     

    sudo apt-get install lamp-server^
    sudo apt-get install php-mbstring
     

    Here are now some additional steps compared to the original tutorial.

    We need to create a new MySQL user and a database for our new LWT environment. Please replace 'newuser' and 'password' with your desired values. Please keep the single quotes (') as they belong to the command. Please be aware that we grant full rights to the newly created database user, so it can edit everything, and we will also use this user to connect to the database. This is not really secure, but I do not think that we have a high risk for our small LWT server.

     

    sudo mysql
    CREATE USER 'newuser'@'localhost' IDENTIFIED BY 'password';
    CREATE DATABASE LWT;
    GRANT ALL PRIVILEGES ON *.* TO 'newuser'@'localhost' WITH GRANT OPTION;
    FLUSH PRIVILEGES;
    exit
  • Download the newest LWT version. You can find the newest version under:

    https://sourceforge.net/projects/lwt/files/

     

  • right click the current version (as of right now it is LWT version 1.6.2) and click on „Copy link location“ in the context menu. Then enter the following command on the server:

    wget https://sourceforge.net/projects/lwt/files/lwt_v_1_6_2.zip/download

     

  • To extract the files, we first need to install „unzip“ and then unzip the download right afterwards. The commands below create the folder lwt and remove the zip file.

    sudo apt-get install unzip
    unzip download -d lwt && rm download
  • Rename the file connect_xampp.inc.php to connect.inc.php.

     

    cd lwt
    mv connect_xampp.inc.php connect.inc.php
  • Now we have to edit the file connect.inc.php. We need to replace the entries for $userid, $passwd and $dbname with the ones created above. The entry $server stays as it is.

    nano connect.inc.php
    	$server = "localhost";
    	$userid = "newuser";
    	$passwd = "password";
    	$dbname = "LWT";

     

 

  • The last thing we need to do is to move the lwt folder into the Apache directory.

    sudo mv /home/ubuntu/lwt /var/www/html
    sudo chmod -R 755 /var/www/html/lwt
    sudo rm /var/www/html/index.html
    sudo /etc/init.d/apache2 restart
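
Looking back at the database step: the GRANT used above gives the new user rights on every database. A narrower variant would limit the user to the LWT database only. The sketch below just builds the SQL text so you can compare it with the commands above. It is an assumption on my side that LWT does not need more than this, so please test it before relying on it. The function name is made up, and the names are inserted without any escaping, so use simple values only.

```python
def lwt_sql(user, password, database="LWT"):
    """Return SQL statements for a user that is limited to one database."""
    # No quoting or escaping is done here: use plain names and passwords only.
    return [
        f"CREATE DATABASE IF NOT EXISTS `{database}`;",
        f"CREATE USER '{user}'@'localhost' IDENTIFIED BY '{password}';",
        f"GRANT ALL PRIVILEGES ON `{database}`.* TO '{user}'@'localhost';",
        "FLUSH PRIVILEGES;",
    ]


if __name__ == "__main__":
    # Print the statements so they can be pasted into the "sudo mysql" shell.
    for statement in lwt_sql("newuser", "password"):
        print(statement)
```

The only real difference to the original commands is `LWT`.* instead of *.* and the missing WITH GRANT OPTION.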

     

 

 

Configure LWT

 

image.png.da361757ea21ba46d07a4a74f6edb808.png
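
Before clicking through the setup, it can help to check that LWT actually answers over HTTP. Here is a minimal sketch using only the Python standard library. The address http://<Public DNS (IPv4)>/lwt/ is my assumption based on the folder we moved to /var/www/html/lwt, and the function name is made up.

```python
import urllib.error
import urllib.request


def lwt_is_reachable(url, timeout=5):
    """Return True if the URL answers with HTTP status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts and HTTP error codes.
        return False


# Usage with a placeholder host name:
# print(lwt_is_reachable("http://<Public DNS (IPv4)>/lwt/"))
```

If this returns False, the inbound HTTP rule of the security group and the Apache restart are the first things I would check.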

 

 

  • Open http://<Public DNS (IPv4)>/lwt in your browser. The first thing we do is to install the „LWT demo database“ just by clicking on the entry at the top of the website.

 

  • As we do not have a database yet, you can click on „Install LWT demo database“ on the next page

image.png.aa399e45548cc8f5d88fc435e289f98b.png

 

 

  • Now click on LWT on the top of the page and you are back at the main window

image.png.21892d5c8bfd82937cd727f5ecd70ae9.png

 

 

  • The default database already has Thai as a configured language, so we just need to delete the rest of the languages to free up our statistics window. To delete a language you first have to delete all texts for it, and then you can delete the language. So just click on „My Texts“; here you can select the language you want to delete (red rectangle, Fig1) and delete all the texts for it (blue rectangle). After you have deleted all texts for all languages but Thai, you can click on LWT at the top. Now you need to click on „My Terms (Words and Expressions)“ and delete all the words for these languages. This works nearly the same as before, but you should just use the „Mark All“ button (green rectangle) and then choose „Delete Marked Terms“ in the drop-down menu (yellow rectangle, Fig2)

 

 

 

 

Fig1

image.png.20c4dc614fe0fde4b3bea82c4aff13d0.png

 

 

Fig2

image.png.6d410f36a3f93be501361d44298e058a.png

 

 

  • Have fun with LWT on every device, everywhere!

 

 

 

 

 

Text separator

 

I am not a great programmer, but I also created this short Python script to insert spaces between the Thai words. It is not perfect and sometimes separates a bit too much, so it takes a bit of fixing afterwards, but it is a start when you have longer texts from a website. As of right now I have not created a website for it, so you can only use it with a Python shell, or I would suggest PyCharm, which in my opinion is very convenient.

You need to install the following:

  • Python 3.7

  • PyCharm, or the IDE of your choice

  • the following Python libraries:

    • tensorflow

    • deepcut

 

Installation:

Linux:

  • check your python version and make sure it is Python 3.7

    python3 --version

     

 

  • install PyCharm with the package manager of your distro

 

Windows:

 

  • Download PyCharm from https://www.jetbrains.com/pycharm/download/#section=windows

  • Make sure you download the Community version

  • Start the installation and follow the instructions; the default values should be fine

  • After the installation start it up.

  • Do not import anything

  • Accept the licence

  • Choose the theme you like and click on „Next: Featured plugins“

  • Optional: install BashSupport. It is not necessary but I like to have the shell

  • click on „Start using PyCharm“

     

 

Configuration:

 

  • Create a new Project

image.png.51624a64d0f0298a2757baa10c04a56c.png

 

 

  • Now choose the directory and name it. In this example I call it ThaiTextParser

  • expand „Project Interpreter: New Virtual environment“

  • make sure that you create a new virtual environment

  • click on „Create“

image.png.121574b0bf5c18f4854eefc9eed6951f.png

 

 

  • Now you see the PyCharm default window

  • on the left side you can expand the ThaiTextParser folder and create two new files by right clicking it

image.png.f9db70568fb5c77596ac12f95b230205.png

 

 

  • the first file we create is a Python file which will contain the source code

    • just call it Parser

    • you will get a new entry called Parser.py

  • the second file will be a normal text file which will contain the Thai text we want to separate

    • called ThaiSource

    • on the next window choose Text and click on OK

 

  • now make sure that the Parser.py Tab is activated

image.png.3d74fe90a06d667c5a4b3fe6dc4a3cb6.png

 

 

  • now we need to install the libraries we mentioned above. We can do that via the settings of PyCharm. Just click on File > Settings

  • now we expand „Project: ThaiTextParser“ and select „Project Interpreter“

  • To install packages we just click on the „+“ on the right side and install the following packages:

    • tensorflow

    • deepcut

image.png.338e5fb73d688f87d69548a9cd797fae.png

 

image.png.c5a6f9bb637ef46e535128f6c271247a.png
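
To double check that the two packages really ended up in the virtual environment of the project, a small check like the following can be run inside PyCharm. It is just a sketch with a made-up function name. It only looks the packages up and does not import them, so it does not tell you whether tensorflow actually starts.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found in the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["tensorflow", "deepcut"])
    print("missing:", missing if missing else "nothing")
```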

 

 

 

  • Now insert the following source code into the file Parser.py:

    import deepcut as dp
    
    
    # Read the Thai text. Thai usually uses a space as a sentence separator,
    # so every space is replaced with a dot followed by a space.
    with open(r"ThaiSource", encoding="utf8") as source:
        text = source.read()
    text = text.replace(' ', '. ')
    
    # Separate the text into words with deepcut (returns a list of tokens)
    separated = dp.tokenize(text)
    separated = str(separated)
    
    # Clean up the string form of the list so it can be pasted into LWT
    separated = separated.replace("'", "")
    separated = separated.replace(",", "")
    separated = separated.replace('\\n', '\n')
    separated = separated.replace('\\t', '\t')
    separated = separated.replace('   ', '')
    separated = separated.replace(' .', '. ')
    
    # Print the separated text
    print("------------")
    print(separated)
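
A possible alternative to turning the token list into a string and cleaning it up afterwards is to join the tokens directly. The sketch below is not what the script above does. It assumes that the tokenizer returns a plain list of strings (as dp.tokenize does), and the function name is my own. The sentence dots from the script above are left out here.

```python
def join_tokens(tokens):
    """Join word tokens with single spaces and keep the line breaks."""
    lines = [[]]
    for token in tokens:
        # A token may contain line breaks; split so the line structure survives.
        for index, part in enumerate(token.split("\n")):
            if index > 0:
                lines.append([])
            part = part.strip()
            if part:  # skip tokens that are only whitespace
                lines[-1].append(part)
    return "\n".join(" ".join(words) for words in lines)


# Usage with a hand-made token list instead of a real deepcut result:
print(join_tokens(["มานี", "มี", "ตา", "\n", "ตา", "ดี"]))
```

With this approach the brackets, quotes and commas of the list never get into the text, so apostrophes and commas in the source stay untouched.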

     

 

  • Now we can paste our Thai text (chapter 21 of Maanii) into the ThaiSource file and separate it by running the script.

    • File > Run

image.png.ce0e3735d2e08f2feb9857e285f9c1ec.png

 

 

 

  • Now you can copy your parsed text and paste it into LWT and start learning. Unfortunately the text starts with „\ufeff“, which needs to be left out when we copy the text. I did not find a way yet to remove it automatically; it is a byte order mark (BOM), so opening the file with encoding="utf-8-sig" instead of "utf8" may be worth trying. The errors in the output occur, I guess, because deepcut uses functions which are soon to be removed from tensorflow, but as of right now it works. Just don't update the virtual environment.

 

 

 

image.png.5163480641205874feb5c9938a35b6be.png
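
About the „\ufeff“ at the start of the text: this is a byte order mark (BOM) that some editors write at the beginning of a UTF-8 file. Python's "utf-8-sig" codec drops it while reading. A minimal sketch, with a function name of my own choosing:

```python
def read_without_bom(path):
    """Read a UTF-8 text file; "utf-8-sig" drops a leading BOM if present."""
    with open(path, encoding="utf-8-sig") as handle:
        return handle.read()


# Usage in the parser script, instead of the plain "utf8" read:
# text = read_without_bom("ThaiSource")
```

Files without a BOM are read exactly as before, so the change should be safe either way.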

 

 

 

 

 
