GH-22: update doc

undertheseanlp · Jan 15, 2019 · 36248ab
1 parent 7a0fd58
commit 36248ab

Showing 4 changed files with 58 additions and 17 deletions.
diff --git a/docs/acl2017.sty b/docs/acl2017.sty
@@ -402,14 +402,7 @@
   \vskip 0.3in plus 2fil minus 0.1in
 }}
 
-% margins for abstract
-\renewenvironment{abstract}%
-		 {\centerline{\large\bf Abstract}%
-		  \begin{list}{}%
-		     {\setlength{\rightmargin}{0.6cm}%
-		      \setlength{\leftmargin}{0.6cm}}%
-		   \item[]\ignorespaces}%
-		 {\unskip\end{list}}
+
 
 %\renewenvironment{abstract}{\centerline{\large\bf  
 % Abstract}\vspace{0.5ex}\begin{quote}}{\par\end{quote}\vskip 1ex}
@@ -444,7 +437,6 @@
  \sfcode`\.=1000\relax}
 \let\endthebibliography=\endlist
 
-
 % Allow for a bibliography of sources of attested examples
 \def\thesourcebibliography#1{\vskip\parskip%
 \vskip\baselineskip%

diff --git a/docs/technique_report.bib b/docs/technique_report.bib
@@ -1,3 +1,32 @@
+@inproceedings{DBLP:conf/lrec/NguyenNVDJ18,
+  author    = {Dat Quoc Nguyen and
+               Dai Quoc Nguyen and
+               Thanh Vu and
+               Mark Dras and
+               Mark Johnson},
+  title     = {A Fast and Accurate Vietnamese Word Segmenter},
+  booktitle = {Proceedings of the Eleventh International Conference on Language Resources
+               and Evaluation, {LREC} 2018, Miyazaki, Japan, May 7-12, 2018.},
+  year      = {2018},
+  crossref  = {DBLP:conf/lrec/2018},
+  timestamp = {Tue, 21 Aug 2018 17:09:46 +0200},
+  biburl    = {https://dblp.org/rec/bib/conf/lrec/NguyenNVDJ18},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
+}
+
+@INPROCEEDINGS{7800279,
+author={T. Nguyen and A. Le},
+booktitle={2016 IEEE RIVF International Conference on Computing Communication Technologies, Research, Innovation, and Vision for the Future (RIVF)},
+title={A hybrid approach to Vietnamese word segmentation},
+year={2016},
+volume={},
+number={},
+pages={114-119},
+keywords={linguistics;natural language processing;pattern classification;regression analysis;text analysis;word processing;hybrid approach;Vietnamese word segmentation;Vietnamese language processing;word-segmented text;NLP;Asian languages;word boundary;Vietnamese texts;logistic regression;binary classifier;2-syllable words;Dictionaries;White spaces;Logistics;Particle separators;Hidden Markov models;Classification algorithms;Prediction algorithms},
+doi={10.1109/RIVF.2016.7800279},
+ISSN={},
+month={Nov},}
+
 @inproceedings{Lafferty:2001:CRF:645530.655813,
  author = {Lafferty, John D. and McCallum, Andrew and Pereira, Fernando C. N.},
  title = {Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data},

diff --git a/docs/technique_report.pdf b/docs/technique_report.pdf
diff --git a/docs/technique_report.tex b/docs/technique_report.tex
@@ -9,6 +9,8 @@
 \usepackage{booktabs}
 \usepackage{amsmath}
 \usepackage[utf8]{vietnam}
+\usepackage{titlesec}
+
 
 \aclfinalcopy % Uncomment this line for the final submission
 
@@ -23,10 +25,27 @@
 }
 \date{}
 
+% margins for abstract
+\renewenvironment{abstract}%
+		 {\centerline{\large\bf Tóm tắt}%
+		  \begin{list}{}%
+		     {\setlength{\rightmargin}{0.6cm}%
+		      \setlength{\leftmargin}{0.6cm}}%
+		   \item[]\ignorespaces}%
+		 {\unskip\end{list}}
+
+
+
+
 \begin{document}
+
+
+
 \maketitle
 \begin{abstract}
+
 Trong báo cáo này, chúng tôi mô tả chương trình tách từ tiếng Việt, được tích hợp trong phiên bản underthesea phiên bản 1.1.12.
+Các công trình nghiên cứu trước đã rất thành công trong bài toán tách từ, chúng tôi muốn nghiên cứu lại sự hiệu quả của phương pháp Conditional Random Fields trong bài toán này. Sau đó xây dựng hệ thống tách từ hoàn chỉnh.
 Mã nguồn của chương trình được open source tại \href{https://github.com/undertheseanlp/word_tokenize}{github}.
 
 \end{abstract}
@@ -56,9 +75,9 @@ \subsection{Thuật toán Conditional Random Fields}
 trong đó, $f_k (s_{t-1},s_t,o,t)$ làm một hàm đặc trưng ứng với trọng số $\lambda_k$, được học thông qua quá trình huấn luyện.
 
 \subsection{Features}
-We propose conditional random fields for this problem.
 
-Our final features
+Các đặc trưng được đề xuất
+
 \begin{center}
 \begin{tabular}{ |c|c| }
  \hline
@@ -74,23 +93,24 @@ \subsection{Features}
 
 \section{Thực nghiệm}
 
-\subsection{Data sets}
+\subsection{Dữ liệu}
+
+Để so sánh độ chính xác của chương trình. Chúng tôi sử dụng sử dụng bộ dữ liệu đã được sử dụng trong \citet{DBLP:conf/lrec/NguyenNVDJ18} và \citet{7800279}. Dữ liệu huấn luyện gồm 75 nghìn câu được lấy từ dữ liệu huấn luyện của bài toán tách từ trong VLSP 2013. Dữ liệu kiểm thử gồm 2120 câu lấy từ bộ dữ liệu gán nhãn từ loại trong VLSP 2013.
 
-Dữ liệu huấn luyện gồm 75 nghìn câu được lấy từ dữ liệu huấn luyện của bài toán tách từ trong VLSP 2013. Dữ liệu kiểm thử gồm 2120 câu lấy từ bộ dữ liệu gán nhãn từ loại trong VLSP 2013.
+\subsection{Chỉ số đánh giá}
 
-\subsection{Evaluation Measures}
+Chúng tôi sử dụng precision, recall và f1 làm các chỉ số đánh giá.
 
-We used Precision, Recall, F1 score as evaluation measures.
 
 $$F_1 = \frac{2*P*R}{P + R}$$
 
-where P (Precision), and R (Recall) are determined as follows:
+trong đó P (Precision), và R (Recall) được định nghĩa như sau:
 
 $$P = \frac{{NE}_{true}}{NE_{sys}}$$
 
 $$R = \frac{{NE}_{true}}{NE_{ref}}$$
 
-where
+với
 
 $NE_{true}$: The number of NEs in gold data