Commit

fix tables in Ziang's post (remove empty columns and rows)
hertelm committed Apr 26, 2021
1 parent 8df53c8 commit b9003fe
Showing 1 changed file with 8 additions and 9 deletions.
17 changes: 8 additions & 9 deletions content/post/project-android-keyboard.md
To train n-gram models, I used corpora of web text and tweets from <a>https:
<br>

||__number of characters__|__number of words__|__number of sentences__|__number of documents__|__topics__|
| --- | --- | --- | --- | --- | --- |
|corpus from tweets|1264807|223201|50070|3|negative, positive and politic tweets|
|corpus from web |1469355|255328|57425|4|firefox, overheard, singles, wine|
<br>

<center> Grams Info (after deleting some grams that appear only once or twice in the corpus) </center>
<br>

|__gram__|__amount__|
|---|---|
|unigram|9295|
|bigram |21561|
|trigram|10091|
<br>
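The pruning described in the caption above (dropping grams that occur only once or twice) can be sketched as follows. This is a minimal illustration, not the post's actual training code; the threshold of three occurrences is an assumption based on the caption, and the tiny corpus is invented for demonstration.

```python
from collections import Counter

def count_ngrams(sentences, n, min_count=3):
    """Count n-grams over tokenized sentences and drop rare ones.

    min_count=3 mirrors the pruning in the caption above (grams seen
    only once or twice are removed); the exact threshold used for the
    keyboard is an assumption.
    """
    counts = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}

# Toy corpus of pre-tokenized sentences (hypothetical data).
corpus = [["i", "love", "this", "wine"],
          ["i", "love", "this", "app"],
          ["i", "love", "this", "wine"]]
bigrams = count_ngrams(corpus, 2)  # only bigrams seen 3+ times survive
```

The same function covers unigrams, bigrams, and trigrams from the table above by varying `n`.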

Earlier we talked about the aim and construction of a **q-gram**. Given a word **w**, we need to find all words from a dictionary whose **PED** to **w** is within the threshold **delta**. To reduce the response time, we compute the **q-grams** of all dictionary words in advance; once a query is executed, we count the number of common grams between **w** and every word in the dictionary. <br> <br>
To minimize the app's internal storage and its startup delay, the total number of words in the dictionary has been limited to 10000. Hence, for this dictionary I used the 10000 most commonly used English words from <a> www.mit.edu/~ecprice/wordlist.10000 </a>. Another issue is that some words from the corpus may not be included in the dictionary. Therefore, after keeping the words that appear in both the dictionary and the corpus, I removed the 4400 dictionary words that never appeared in the corpus and added 4400 new words from the corpus, chosen by frequency.
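A minimal sketch of such a precomputed q-gram index, assuming q = 3 and `$`-padding so that word prefixes are represented; all names here are hypothetical and this is not the app's actual code. In the real pipeline the surviving candidates would then be checked with the exact PED computation, which is omitted here.

```python
from collections import defaultdict

def qgrams(word, q=3):
    """Split a padded word into overlapping q-grams (q=3 assumed)."""
    padded = "$" * (q - 1) + word  # padding so prefixes get their own grams
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]

def build_index(dictionary, q=3):
    """Inverted index computed in advance: q-gram -> words containing it."""
    index = defaultdict(list)
    for word in dictionary:
        for gram in set(qgrams(word, q)):
            index[gram].append(word)
    return index

def candidates(query, index, q=3, min_common=1):
    """Count common q-grams between the query and all indexed words;
    words sharing at least min_common grams become PED candidates."""
    common = defaultdict(int)
    for gram in set(qgrams(query, q)):
        for word in index.get(gram, []):
            common[word] += 1
    return {w for w, c in common.items() if c >= min_common}

index = build_index(["hello", "help", "world"])
cands = candidates("hell", index)  # words sharing grams with "hell"
```

Because the index is built once at startup, each query only touches the lists of its own q-grams instead of scanning the whole dictionary.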
Finally, I want to show how the training set from a corpus could adapt to different


| __ALPHA (punishment)__ | __Reduced steps in web (5%)__ |
| --- | --- |
| 0.0 |27.10%|
| 0.0005|36.04%|
| 0.005 |41.16%|
Earlier we talked about how the punishment value alpha could help with filtering


| |__Web(small)__ | __Web (5%)__ | __Tweets(small)__|__Tweets(5%)__|
| --- | --- | --- | --- | --- |
|__API30__ |43.19% | -- |40.93% |-- |
|__ZKeyboard__|43.00% |41.20% |43.72% |38.59% |

Test set: 5% of the content from the web, 5% of the content from tweets, 100 sentences from 5% o
<br>

| |__Web(small)__ | __Web (5%)__ | __Tweets(small)__|__Tweets(5%)__|
| --- | --- | --- | --- | --- |
|__API30__ | 21.00% | -- | 18.60% |-- |
|__ZKeyboard__| 21.35% | 23.62% | 24.06% | 20.71% |

For the evaluation of autocorrection, the first letter of every word whose lengt
<br>

| | __web (5%)__ |__tweets(5%)__|
| --- | --- | --- |
| __95% web__ |41.62% |31.33% |
| __95% tweets__ |33.83% |39.27% |
| __95% tweets + web__|41.20% |38.59% |
Finally, the keyboard should memorize the user's input so that the most commonly typed words
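One simple way to realize this memorization, sketched here as a hypothetical `UserHistory` class rather than the app's actual implementation: typed words are counted, and suggestions the user types often are ranked ahead of the default corpus order.

```python
from collections import Counter

class UserHistory:
    """Remember how often the user typed each word and boost those
    words when ranking suggestions (a hypothetical sketch)."""

    def __init__(self):
        self.counts = Counter()

    def record(self, word):
        self.counts[word.lower()] += 1

    def rank(self, suggestions):
        # Stable sort: most-typed words first, corpus order otherwise.
        return sorted(suggestions, key=lambda w: -self.counts[w.lower()])

history = UserHistory()
for w in ["ziang", "ziang", "keyboard"]:
    history.record(w)
top = history.rank(["zebra", "ziang"])  # user history reorders suggestions
```

A persistent version would store the counts on disk so the adaptation survives app restarts.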

# <a id="Summary"></a> Summary

With the help of the n-gram model, Prefix Edit Distance, and the q-gram index, we have developed a smart keyboard (ZKeyboard) that gives relatively accurate corrections and completions. Compared with the API30 keyboard, ZKeyboard performs respectably not only in completion but also in spelling correction. Still, many aspects need improvement, such as ignored grammar rules, storage limits, and the accuracy of the n-gram model. To make a keyboard give more accurate corrections and completions efficiently, we need more complex language models and every feasible optimization of the keyboard's performance.
