% FILE: sentiment.tex Version 0.0.1
% AUTHOR: Uladzimir Sidarenka
% This is a modified version of the file main.tex developed by the
% University Duisburg-Essen, Duisburg, AG Prof. Dr. Günter Törner
% Verena Gondek, Andy Braune, Henning Kerstan Fachbereich Mathematik
% Lotharstr. 65., 47057 Duisburg entstanden im Rahmen des
% DFG-Projektes DissOnlineTutor in Zusammenarbeit mit der
% Humboldt-Universitaet zu Berlin AG Elektronisches Publizieren Joanna
% Rycko und der DNB - Deutsche Nationalbibliothek
% \part{Sentiment Analysis}
\chapter{Introduction to Sentiment Analysis}\label{chap:introduction}
Interpersonal communication is not only a way to share objective
information with other people but also a vibrant channel for conveying
one's subjective feelings, impressions, and attitudes. It is, in
fact, this latter use that lends a personal touch to our
conversations, making them more gripping, more entertaining, and more
lively. And it is often this use that significantly influences our
preferences, decisions, and choices in everyday life. Therefore,
high-quality mining of people's opinions is often as important as the
retrieval of objective facts.
The field of knowledge that deals with the analysis of emotions,
sentiments, evaluations, and attitudes is called \emph{sentiment
analysis} (SA) \citep{Liu:12}. The definition of this discipline,
however, much like the definition of the term \emph{sentiment} itself,
is neither complete nor universally accepted. The main reasons for
this are
\begin{inparaenum}[(i)]
\item a frequently blurred boundary between subjective and objective
parts of information and
\item the heterogeneity of the language system itself, to which the
SA methods are applied.
\end{inparaenum}
The first factor, for instance, makes it difficult to delimit which
statements actually fall within the purview of opinion mining and
which ones should be ignored by its systems. Prominent examples of
such borderline cases are the so-called subjective facts, such as
``terrorist attacks'' or ``anti-cancer drugs,'' which some people
consider polar terms, while others regard them as objective
expressions.
The second factor complicates a precise definition of sentiment
analysis because different language levels have their own notions of
subjectivity (\eg{} a positive word is not the same as a positive
text), which in turn necessitate different approaches. Depending on
the analyzed linguistic level, researchers typically distinguish three
main types of SA\@:
\begin{itemize}
\item\emph{subsentential}, whose task is to determine polarities of
single words and find opinions within a sentence,
\item\emph{sentential}, which tries to predict the semantic
orientation of a statement,
\item and, finally, \emph{suprasentential}, which analyzes the
polarity of the whole text.
\end{itemize}
Each of these types has its own specifics, and each of them needs to
be addressed with its own methods. Therefore, speaking of the general
difficulty of opinion mining for a specific domain is just as
misguided as judging that domain's amenability to natural language
processing as a whole: one needs to specify a particular task and
evaluate it with its own metrics. Thus, to estimate the complexity of
sentiment analysis for German Twitter, we will address all three
levels of SA\@: subsentential, sentential, and suprasentential.
\section{Prehistory of the Field}
But before delving into the depths of contemporary sentiment
research, let us first make a brief digression into the history of
this field in order to better understand its modern trends and
theories.
Like many other scientific disciplines, opinion mining has emerged
from several other areas of research including philosophy, psychology,
cognitive sciences, narratology, and linguistics. In
\emph{philosophy}, the questions about the nature of emotions, their
interaction with human consciousness, and the influence on people's
deeds have occupied the minds of many great scholars, starting from
Plato and Aristotle. Plato, for instance, argued that the human soul
consists of three fundamental parts: the rational, the appetitive, and
the passionate \citep[see][Book~IV]{Plato:91}. The last part (the one
by which we become angry or fly into a temper) determines our notion
of justice by favoring either the rational or the appetitive aspect.
\citet{Aristotle:54}, the most prominent student of Plato, extended
this idea by providing a precise taxonomy of feelings that, in his
opinion, constitute the passionate part of the mind.
As noted by \citet{Sousa:14}, the variety and complexity of phenomena
covered by the term ``emotion'' discouraged tidy philosophical
theories and daunted researchers for many centuries after antiquity.
A real renaissance of emotion studies happened in the late 19th
century in \emph{psychology} with the introduction of the James-Lange
theory~\cite{James:1884,Lange:1885}, which argued that biological
processes were the sole cause of people's subjective feelings. This
theory, however, was later criticized by
\citet{Schachter:62}, who objected that bodily means alone were
insufficient to express the full range of possible feelings, and that
cognitive factors were a key determinant of emotional states.
%% In the 1920s, this theory was severely criticized by Philip~Bard
%% and Walter~B.~Cannon \citep{Bard:28,Cannon:31}, who proposed an
%% alternative interpretation of emotions, refuting their connection
%% to the viscera and arguing that both bodily reactions and
%% subjective feelings were generated simultaneously in the talamic
%% region of the brain. Later on, their conception was in turn put
%% into question by Stanley Schachter and Jerome E. Singer
%% \citep{Schachter:62}, who claimed that bodily means alone were
%% insufficient to express the full range of possible feelings and
%% that cognitive factors could be major determinants of emotional
%% states. To prove this hypothesis, the authors conducted a series of
%% experiments in which they administered epinephrine to human
%% subjects and put them into a room with a specially instructed
%% stooge. Their study showed that the emotional reaction of the
%% participants crucially depended on the behavior of the stooge
%% during the stay.
These advances in psychology, reinforced by the nascent appraisal
theory \citep{Arnold:60}, have significantly influenced many other
scientific fields including \emph{literary studies} and
\emph{linguistics}. Among the most prominent representatives of this
direction in the former discipline were \citet{Rorty:80} and
\citet{Banfield:82}, who analyzed how opinions were expressed by
direct and indirect speech. \citet{Wiebe:90a,Wiebe:94} adapted
\citeauthor{Banfield:82}'s theory to the needs of \emph{computational
linguistics} by proposing an algorithm that identified subjective
sentences in text and inferred the main characters of a narrative from
such sentences. This work was arguably the first attempt to
automatically detect sentiments on the sentential level.
A real breakthrough in the opinion mining field, however, came with
the introduction of the first sufficiently large corpora.
Important contributions in this regard were made by
\citet{Pang:04,Pang:05}, who released a dataset of $\approx2,000$
movie reviews with their star ratings; \citet{Hu:04}, who presented a
manually labeled set of Amazon and C|Net product comments;
\citet{Thomas:06}, who automatically labeled a collection of
congressional debates; and, finally, \citet{Wiebe:05}, who developed a
manually annotated sentiment corpus of 535 news articles.
The availability of these resources has given rise to a plethora of
new methods for both subsentential and sentential SA, making opinion
mining one of the most challenging and competitive branches of
computational linguistics. Fundamental cornerstones in this field
have already been set by the works of \citet{Pang:02},
\citet{Wiebe:05}, \citet{Wilson:05}, \citet{Breck:07},
\citet{Choi:09,Choi:10}, and \citet{Socher:11, Socher:12}.
Nevertheless, many challenges of sentiment research, such as domain
adaptation or analysis of non-English texts, still pose considerable
difficulties.
\section{Sentiment Analysis of Social Media}
One of the main problems that people working on opinion mining are
usually confronted with in the first place is the choice of the domain
to deal with. Since sentiment analysis is a highly domain-dependent
task \citep[see][]{Aue:05,Blitzer:07,Li:08}---\ie{} systems trained on
one text genre can hardly generalize to other linguistic
variations---a natural question that arises in this context is which
of the domains should be addressed first in this case.
While earlier sentiment works were primarily concerned with narratives
\citep{Wiebe:90a,Wiebe:94} or newspaper texts
\citep{Wiebe:03,Wiebe:05,Bautin:08}, it soon became clear that social
media provide a much more fertile ground for mining people's
attitudes. Among the first who tried to extract users' opinions from
online forums were \citet{Das:01}. For this purpose, they manually
annotated a collection of 500 messages from an economic chat board
with three polarity classes (\emph{buy}, \emph{sell}, and \emph{null})
and then trained five different classifiers on these data. Using the
trained systems, the authors classified the semantic orientation of
the remaining 25,000 forum posts, trying to predict stock prices based
on the polarities of these snippets.
%% This study revealed that user chats strongly correlated with stock
%% developments in a reactive way: changes in stock prices had
%% typically lead to fluctuations of public opinions but not vica
%% versa.
Automatic business intelligence rapidly gained ground in the
opinion mining field, with further notable works introduced by
\citet{Glance:05}, who analyzed users' opinions on Usenet with
hand-crafted rules; \citet{Antweiler:04}, who investigated how
postings on message boards correlated with stock volatility;
\citet{Ghose:07}, who examined the effect of opinions on pricing in
online marketplaces; and, finally, \citet{Turney:02}, who classified
Epinions reviews into \emph{recommended} (thumbs up) and \emph{not
recommended} (thumbs down) ones based on the pointwise mutual
information of their adjectives.
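To give the reader a more concrete idea of the latter approach, the
semantic orientation (SO) of a phrase $p$ in
\citeauthor{Turney:02}'s system can be schematically reconstructed as
\[
\mathrm{SO}(p) = \mathrm{PMI}(p, \textrm{``excellent''}) -
\mathrm{PMI}(p, \textrm{``poor''}),
\qquad
\mathrm{PMI}(w_1, w_2) = \log_2\frac{\Pr(w_1, w_2)}{\Pr(w_1)\Pr(w_2)},
\]
where the co-occurrence probabilities were estimated from
search-engine hit counts; a review whose adjectival and adverbial
phrases have a non-negative total SO is classified as
\emph{recommended}, and as \emph{not recommended} otherwise. (This is
a simplified sketch; the original paper should be consulted for the
exact estimation details.)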
%% \citet{Glance:05} also used sentiment analysis for business
%% intelligence purposes by mining users' opinions on Usenet newsgroups
%% with a set of hand-crafted rules. Further notable works in this
%% direction include those of \citet{Antweiler:04}, who investigated how
%% postings on stock message boards correlated with stock volatility;
%% \citet{Gruhl:05}, who studied the cross-correlation between online
%% mentions of products and sales spikes for that commodities, also
%% proposing an algorithm for tracking such correlations and making
%% predictions about products' future sales automatically; finally,
%% \citet{Ghose:07} examined the effect of opinions on the pricing power
%% of merchants in online marketplaces, claiming that more positive
%% opinions on the Web entitled traders to increase their selling prices.
%% A separate line of research that is closely related to both commercial
%% intelligence and sentiment analysis of social media involves opinion
%% mining of product reviews. A pioneering work in this direction was
%% done by \citet{Turney:02}, who performed a two-class classification of
%% 410 Epinions comments, dividing customers' reviews of automobiles,
%% banks, movies, and travel destinations into \emph{recommended} (thumbs
%% up) and \emph{not recommended} (thumbs down) ones. In order to make
%% these decisions, the proposed system first computed the sum of the
%% pointwise mutual information (PMI) scores between the adjectival and
%% adverbial phrases occurring in a review text and the word
%% ``excellent,'' and then subtracted from this the PMIs of the found
%% phrases in conjunction with the word ``poor.'' Reviews with negative
%% results were subsequently considered as \emph{thumbs down}, and
%% comments with non-negative total scores were correspondingly
%% classified as \emph{thumbs up}.
%% According to \citet{Turney:02}, the most challenging type of reviews
%% to analyze in these experiments were the movie critiques (the accuracy
%% on this subset only reached 66\%, whereas, for banks and automobiles,
%% it attained 80 and 84\% respectively). This finding was also
%% confirmed by \citet{Pang:02} who tried out several machine learning
%% classifiers on a manually annotated corpus of $\approx2,000$ write-ups
%% from the Internet Movie Database (IMDb). The best results (82.9\%
%% accuracy) for the two-class classification task (\emph{positive}
%% versus \emph{negative}) were attained by an SVM system which used the
%% presence of unigrams as classification features.
Due to its high commercial impact, sentiment analysis of customer
feedback soon became one of the most popular topics in natural
language processing. \citet{Dave:03}, for example, classified users'
comments on Amazon as positive or negative with the help of SVM and
Na\"{\i}ve Bayes systems. \citet{Hu:04} developed a three-stage
application that produced concise summaries of positive and negative
opinions about each particular product feature. \citet{Funk:08}
proposed a supervised SVM classifier that predicted the polarity of
product reviews and then used these results in a business intelligence
application. Other important contributions were made by
\citet{Popescu:05}, \citet{Ding:09}, \citet{Wei:10}, and
\citet{Mukherjee:12}, among others.
%% The authors found that replacing certain entities, such as product
%% names, abbreviations, or numerals, with special meta-tokens and
%% using token n-grams of different lengths had a significant positive
%% effect on the accuracy of both classifiers.
Although opinion analysis of product reviews still plays an important
role in e-commerce, the increased popularity of the blogosphere and
social networks has motivated many sentiment researchers to shift the
focus of their work to these new Web genres. Among the first who
followed the new trend were \citet{Mishne:05} and \citet{Mishne:07},
who tried to predict the moods of LiveJournal blogs (\eg{}
\emph{amused}, \emph{tired}, \emph{happy}), achieving 58\% accuracy
with an SVM classifier.
%% The gold labels for their $\approx$~815,000 blog corpus were obtained
%% automatically from blog labels on the Livejournal website. On this
%% corpus, the authors trained a supervised SVM classifier, reaching an
%% average of 8\% improvement over the 50\% baseline.
Another SVM system was used by \citet{Chesley:06}, who classified
users' blogs into positive, negative, and objective ones,
outperforming the majority class baseline by 15\% with this method.
Drawing on these works, \citet{Gill:08} analyzed the agreement of
human experts on blogs' moods, finding that the annotators reached a
much better consensus on feelings described in longer blog posts.
%% This research direction was further continued by \citet{Godbole:07},
%% who applied a lexicon-based method to detect users' opinions in blogs;
Speaking of text length, we should certainly note that the inception
of the micro-blogging service Twitter in 2006 was a real game changer
for the whole opinion mining field. The sudden availability of huge
amounts of data and the presence of all possible social and national
groups, combined with the uniqueness of the language on this service,
have given rise to numerous scientific projects, studies, and
publications, which we will briefly summarize in the next section.
\section{Sentiment Analysis of Twitter}\label{snt:subsec:intro:saot}
One of the first attempts to automatically classify users' opinions on
Twitter was made by \citet{Go:09}, who collected a set of 1.6~M
microblogs containing emoticons. Considering these smileys as noisy
labels (positive or negative), the authors trained three different
classifiers (Na\"{\i}ve Bayes, Maximum Entropy, and SVM), obtaining
the best results (0.82~\F{}) with the SVM system. Similar approaches
were taken by \citet{Pak:10}, who performed a three-class prediction
with a Na\"{\i}ve Bayes system, and \citet{Davidov:10}, who trained a
$k$-NN--like classifier on weakly supervised data. Another way to
create a sentiment corpus was proposed by \citet{Barbosa:10}, who
analyzed a set of 200,000 microblogs with three publicly available
opinion mining services and then used the majority votes of these
systems as silver labels for their dataset.
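To give a concrete flavor of such distantly supervised pipelines, the
following minimal sketch illustrates how emoticons can serve as noisy
polarity labels for training a simple bag-of-words classifier. It is
an illustration only (not the exact setup of any of the cited works)
and assumes the \texttt{scikit-learn} library as well as toy seed
emoticon sets:
\begin{verbatim}
# Minimal sketch of emoticon-based distant supervision; the seed
# emoticon sets and the toy tweets are illustrative assumptions only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

POS_EMOTICONS = {":)", ":-)", ":D"}
NEG_EMOTICONS = {":(", ":-(", ":'("}

def noisy_label(tweet):
    # Use emoticons as noisy polarity labels; drop ambiguous tweets
    # that contain both (or neither) kinds of smileys.
    has_pos = any(e in tweet for e in POS_EMOTICONS)
    has_neg = any(e in tweet for e in NEG_EMOTICONS)
    if has_pos == has_neg:
        return None
    # Strip the label emoticons from the input (longest ones first,
    # so that ":-)" is removed before ":)").
    for emo in sorted(POS_EMOTICONS | NEG_EMOTICONS,
                      key=len, reverse=True):
        tweet = tweet.replace(emo, "")
    return tweet.strip(), "positive" if has_pos else "negative"

tweets = ["great game tonight :)", "missed my train :("]
pairs = [p for p in map(noisy_label, tweets) if p is not None]
texts, labels = zip(*pairs)

# Train a bag-of-ngrams Naive Bayes classifier on the noisy data.
vectorizer = CountVectorizer(ngram_range=(1, 2))
classifier = MultinomialNB().fit(vectorizer.fit_transform(texts),
                                 labels)
\end{verbatim}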
Some time later, \citet{Kouloumpis:11} experimented with the AdaBoost
classifier on the noisy collection of \citet{Go:09} and the Edinburgh
Twitter corpus of \citet{Petrovic:10}, concluding that
microblog-specific features (such as the presence of intensifiers,
abbreviations, or emoticons) were the most reliable attributes for
this classification. \citet{Agarwal:11}, however, questioned this
finding, arguing that POS-specific polarity features were a better
alternative.
As in the case of opinion mining of product reviews, sentiment
analysis of Twitter could not go unnoticed by the economic and
sociological communities. An attempt to address political issues
using this platform was made by~\citet{Tumasjan:10}, who analyzed
users' opinions about the 2009 German federal elections by
automatically translating 100,000 tweets into English and
subsequently classifying these messages with the proprietary LIWC
software~\cite{Pannebaker:07}. This study showed that not only the
sentiments but even the mere number of microblogs mentioning
political parties strongly correlated with the results of election
polls.
%% One of the first attempts to analyze tweets from these perspectives
%% was made by \citet{Jansen:09}, who studied microblogs' power as
%% online word of mouth, finding that 20\% of the tweets mentioning
%% renowned companies expressed some polar opinions about the products
%% of these brands. A manual inspection of messages mentioning
%% renowned company brands revealed that almost 20\% of these tweets
%% expressed some polar opinions about their products. More than a
%% half of these opinionated microblogs were positive judgements,
%% whereas 35\% of the posts represented negative sentiments. These
%% findings strongly correlated with the results of an automatic
%% analysis, suggesting that social media in general and microblogging
%% in particular could be a valuable source of economic evidence.
Nevertheless, the real rise of interest in this domain happened with
the release of the Sem\-Eval corpus~\cite{Nakov:13}, a collection of
15,000 tweets that were manually annotated by Amazon Mechanical Turk
workers with their message-level polarities and the contextual
semantic orientations of their polar terms. The best-performing
system in the first run of the SemEval competition was a supervised
SVM classifier
of \citet{Mohammad:13}, which won three out of four subtasks
(message-level classification of SMSs and tweets and contextual
polarity prediction of polar terms in Twitter). This solution relied
on an extensive set of hand-crafted features, powered by multiple
manually and automatically generated sentiment lexicons. The authors
emphasized the crucial importance of lexical resources, which
increased the classification scores by almost $8.5\%$. Other
competing submissions~\cite{Becker:13,Guenther:13,Kokciyan:13} took a
similar approach but used considerably fewer feature attributes.
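Although the complete feature set of this system was considerably
richer, the core idea behind such lexicon-based features can be
sketched as follows (a simplified illustration with a hypothetical
toy lexicon, not the authors' actual implementation):
\begin{verbatim}
# Simplified sketch of lexicon-derived features; the toy score
# lexicon below is a hypothetical stand-in for real resources.
LEXICON = {"good": 1.9, "awesome": 3.4, "bad": -2.5, "awful": -3.1}

def lexicon_features(tokens):
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    pos = [s for s in scores if s > 0]
    neg = [s for s in scores if s < 0]
    return {
        "num_pos": len(pos),                    # count of positive hits
        "num_neg": len(neg),                    # count of negative hits
        "sum_pos": sum(pos),                    # total positive mass
        "sum_neg": sum(neg),                    # total negative mass
        "max_score": max(scores, default=0.0),  # strongest single cue
        "last_score": scores[-1] if scores else 0.0,
    }

print(lexicon_features("the movie was awesome not bad".split()))
\end{verbatim}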
The success of this shared task, which had more than 40 participants,
motivated the organizers to continue the competition. With slight
modifications (addition of new tweets, inclusion of sarcastic
microblogs and LiveJournal sentences), they reran both subtasks in the
following four
years~\cite{Rosenthal:14,Rosenthal:15,Nakov:16,Rosenthal:17},
attracting more and more competitors every time.\footnote{A detailed
overview of these iterations is provided in
Chapter~\ref{chap:cgsa}.}
%% In \citeyear{Rosenthal:15}, three more subtasks were included in the
%% competition:
%% \begin{inparaenum}[(i)]
%% \item polarity classification of messages pertaining to specific
%% topics,
%% \item topic-based trend prediction, and
%% \item determining prior polarity of lexical terms found in tweets.
%% \end{inparaenum}
%% A much harder subtask in this run, on which only three of seven
%% participants could outperform the majority class baseline, turned out
%% to be the classification of message polarities towards specific topics
%% (track C). The winner of this subtask -- a system proposed by
%% \citet{Boag:15} -- attained a macro-averaged \F{}-score of 50.51\%.
%% This approach again relied on a supervised SVM classifier and used an
%% extensive set of hand-crafted features, including raw and normalized
%% bag-of-words (BOW), the number of positive and negative emoticons, the
%% number of question and exclamation marks, average and maximum
%% sentiment scores of single words occurring in a tweet, and many
%% others. The BOW features along with the lexicon-derived information
%% were found to be the most useful classification attributes in these
%% experiments. A dedicated preprocessing pipeline with spelling
%% correction and Twitter-specific tokenization also proved to have a
%% positive effect on the final results.
%% \todo[inline]{incorporate related work from corpus chapter}
%% This scarceness of suitable resources becomes even more aggravated
%% when speaking about Twitter corpora for non-English languages. A few
%% attempts at creating opinion datasets for microblogs that we are aware
%% of are the sentiment subset of the TWITA corpus \cite{Basile:13} and
%% the Senti-TUT corpus of \citet{Bosco:13} created for Italian, as well
%% as the TASS shared task data \cite{Villena-Roman:13} developed for
%% Spanish.
%% The TWITA collection was used to create a smaller subset of 2,000
%% tweets, which were subsequently manually labeled with their
%% message-level polarities by three human coders. The Senti-TUT
%% tweebank comprises 3,288 microblogs pertaining to the election of
%% Mario Monti and 1,159 messages obtained from the Twitter section of
%% the popular Italian web portal \url{http://www.spinoza.it}, which were
%% labeled by five annotators with the message-level polarity classes:
%% positive, negative, ironic, mixed, or none. A different set of
%% sentiment valences (viz., strong negative, negative, neutral,
%% positive, and strong positive) was used for 70,000 messages of the
%% Spanish TASS corpus.
%% To the best of our knowledge, the only sentiment corpus for German
%% Twitter that existed at the time of creating our dataset was the
%% collection of 100,000~messages pertaining the German federal elections
%% 2009 gathered by \citet{Tumasjan:10}. These tweets were automatically
%% translated into English and annotated with 12 psychological categories
%% (future orientation, past orientation, positive emotions, negative
%% emotions, sadness, anxiety, anger, etc.) using the LIWC program
%% \cite[Linguistic Inquiry and Word Count;][]{Pannebaker:07}.
As this overview shows, despite its relatively short history,
sentiment analysis of Twitter has already received much attention
from NLP researchers. But, with a few
exceptions~\cite[\eg{}][]{Basile:13,Bosco:13,Araque:15,Cesteros:15},
most existing works have been primarily concerned either with English
messages or with automatically translated microblogs. In the
following chapters, we will explore whether conclusions that have been
drawn from English data (the difficulty of the Twitter domain for
automatic SA, the utility of sentiment lexicons, and the need to
normalize the text input) apply to German tweets as well.
%% However, in order to prove or disprove these hypotheses, a vital
%% prerequisite that we need is the existence of a substantial labeled
%% corpus, on which these conjectures could be checked. Since there were
%% no moderately-sized manually annotated sentiment data for German that
%% we were aware of at the time of writing this thesis, we had to create
%% our own dataset, which we will introduce in the next section of this
%% chapter.
%% After describing the data gathering procedure used to compile our
%% corpus, detailing its annotation scheme, and analyzing the inter-rater
%% agreement, we turn our attention to the evaluation and analysis of
%% existing sentiment resources used for German. To do so, we first
%% evaluate three German polar lexica which have been semi-automatically
%% obtained from English sources. We estimate the precision and recall
%% of these dictionaries on our data and also check whether current
%% state-of-the-art approaches for generating such lexica without
%% translation would produce better results.
%% After determining the best-performing lexicon, we look how beneficial
%% this polarity information might be to fine-grained SA objectives, such
%% as determining sources, targets, and the actual spans of subjective
%% opinions in microblogs. We also evaluate the effects of text
%% preprocessing on this task. Finally, we summarize our findings and
%% draw further conclusions in Section \ref{sec:snt:concl} of this
%% chapter.
%% \citet{Sarker:15}, \citet{Dalmia:15}, \citet{Han:15},
%% \citet{Chikersal:15}, \citet{Hagen:15}, \citet{Fernandez:15}
%% Works attempting a more fine-grained sentiment analysis on Twitter
%% usually try to derive a common polarity class for each message with
%% respect to a particular target that is mentioned in that microblog.
%% \citet{Jiang:11}, for instance, tried to classify the polarity of
%% microblogs pertaining to a predefined set of specific topics, like
%% \emph{Obama}, \emph{Google}, \emph{iPad} etc. To this end, the
%% authors manually labeled a corpus of 1,939 messages and trained a
%% binary SVM model in order to predict the subjectivity and the polarity
%% of the tweets with respect to the given subjects.
%% This classifier could achieve an accuracy of 68.2\% for the
%% subjectivity classification and 85.6\% for the polarity prediction.
%% The $F$-score of this system for the latter task could further be
%% improved from 66\% to 68.3\% by incorporating the information about
%% the predicted polarity class of the re-tweets, replies, and other
%% microblogs posted by the same author.
%% \citet{Mitchell:13} broadened the set of possible targets by allowing
%% any named entities found in microblogs to be associated with a
%% specific polarity. For that purpose, the authors combined a CRF-based
%% NER system with a sentiment predicting CRF by considering three
%% different possibilities of such combination: a pipeline approach, a
%% joint multi-layer model, and a single classifier with a combined
%% tagset. The best scores on their corpora of 7,105 Spanish and 2,350
%% English tweets could be achieved with the joint and pipeline
%% approaches. The accuracy of recognizing the opinionated named
%% entities amounted to 31\% for Spanish and 30.4\% for English.
%% Other notable works in this direction include \citet{Chunping:14} who
%% first applied a Na\"{\i}ve Bayes classifier to predict the
%% subjectivity class of microblogs and then sequentially used two CRF
%% models to predict the particular type of subjectivity (such as anger,
%% fear, happiness etc.) for message sentences.
%% Other notable works in this direction include \citet{Dong:14} who used
%% a recurrent neural network to predict the polarity class associated
%% with the opinion targets. They, however, assumed the targets of
%% sentiments to be apriori known and only were interested whether a
%% positive or a negative judgement was made about them.
%% The \texttt{SentiStrength} system proposed by \cite{Thelwall:12} used
%% an extensive list of 763 polar terms in order to predict positive and
%% negative scores for MySpace comments. The manually assigned scores of
%% these terms were automatically fine-tuned during training using a
%% perceptron-like technique. In addition to the core lexicon, the final
%% implementation of this system also utilized a set of heuristic methods
%% and auxiliary modules such as spelling correction algorithm,
%% dictionaries of booster words and negations as well as special rules
%% for emoticons, repeated letters, and exclamation marks. It correctly
%% predicted positive emotions in 60.6~\% of the cases and attained
%% 73.5~\% accuracy at predicting negative sentiment scores. All
%% predictions were made at the level of complete messages.
%% From its very onset in 2006, Twitter has constantly attracted the
%% attention of many NLP-practitioners and scientists, with sentiment
%% researchers being arguably one of the most active of these groups.
%% This trend seems ever exapanding in recent years: For instance, in
%% 2015, a whole section of the international ACL Workshop on
%% Computational Approaches to Subjectivity, Sentiment and Social Media
%% Analysis (WASSA 2015) was solely dedicated to the peculiarities of the
%% opinion mining on Twitter. Sentiment detection on Twitter also
%% remains an active track of the SegEval Shared Task TODO: cite Nakov.
%% But despite its great popularity, most of this work on this topic
%% still concentrates on English data only.