source and data for Product Review Summarization by Exploiting Phrase Properties

atone/ReviewSummarizer

Product Review Summarization by Exploiting Phrase Properties

This repository contains the data and source code for the paper "Product Review Summarization by Exploiting Phrase Properties".

Aspect Keyword List

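In the experiments, we use the following list of aspect keywords (the English glosses in parentheses are approximate translations added for readability; the Chinese keywords are the actual data):

  • a1 (appearance, design): 外观 外形 设计 外型 外壳 外表
  • a2 (quality, material, workmanship): 质量 材质 手感 质感 作工 做工
  • a3 (screen): 屏幕 触摸屏 显示屏 分辨率 led 触摸板 液晶屏 电阻屏 显示 触屏
  • a4 (price, value for money): 性价比 价位 价钱 价格 售价
  • a5 (system, performance): 系统 稳定性 性能 速度 操作系统 兼容性
  • a6 (software): 软件 导航 wifi
  • a7 (controls, operation): 操控 操控性 操作性 操作 触控
  • a8 (battery): 电池 待机 电量 续航 耗电
  • a9 (keyboard, keys): 键盘 按键 功能键 按钮
  • a10 (signal, connectivity): 信号 网络 蓝牙 通话 天线 通信 通讯
  • a11 (messaging): 短信 彩信
  • a12 (interface, picture): 界面 画面 画质 ui
  • a13 (input method): 输入法 手写 输入
  • a14 (model, body style): 机型 机身 款式 样式
  • a15 (camera, recording): 照相 摄像 照像 相机 拍照 镜头 像素 闪光灯 摄像头 照相机 录音
  • a16 (audio): 音效 音色 音质 话筒 听筒 扬声器 喇叭 话音 音响 语音 立体声
  • a17 (storage, memory): 存储 内存 内存卡 存储卡 储存卡 扩展卡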

The aspect keyword list is also available in summarizer.model.Aspect.
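To illustrate how such a keyword list might be used, the following is a minimal sketch that maps a phrase to an aspect ID. The class AspectLookup and its methods are hypothetical and are not the repository's summarizer.model.Aspect. Matching by substring is an assumption, since this README does not state how keywords are matched. Only three of the 17 aspects are included to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not the repository's summarizer.model.Aspect.
final class AspectLookup {
    // Keyword -> aspect ID, kept in registration order.
    private static final Map<String, String> KEYWORD_TO_ASPECT = new LinkedHashMap<>();

    private static void register(String aspectId, String... keywords) {
        for (String keyword : keywords) {
            KEYWORD_TO_ASPECT.put(keyword, aspectId);
        }
    }

    static {
        // Three aspects copied from the list above; the rest are omitted for brevity.
        register("a4", "性价比", "价位", "价钱", "价格", "售价");
        register("a8", "电池", "待机", "电量", "续航", "耗电");
        register("a11", "短信", "彩信");
    }

    /**
     * Returns the aspect ID of the first registered keyword contained in the
     * phrase, or null if no keyword matches. Substring matching is an assumption.
     */
    static String aspectOf(String phrase) {
        for (Map.Entry<String, String> entry : KEYWORD_TO_ASPECT.entrySet()) {
            if (phrase.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(aspectOf("电池很耐用"));
        System.out.println(aspectOf("价格便宜"));
    }
}
```

A phrase that mentions keywords from several aspects is resolved by registration order in this sketch; a real system may need a different tie-breaking rule.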

Data

The original review data is available at data/all_reviews/. Each file corresponds to a cell phone.

The phrases annotated with sentiment polarities are available at data/phrases_new/. Each file corresponds to a cell phone.

The summaries generated by our system and the 3 baselines are available at data/summary/. xxxx_reviewSum_summary.txt is generated by our system, while xxxx_lexrank_summary.txt, xxxx_opinosis_summary.txt, and xxxx_basicSum_summary.txt are generated by the 3 baseline systems described in our paper.

Evaluation Data

Task 1 is a pairwise user preference evaluation, and Task 2 is a user scoring evaluation. In Task 1, we run the 6 pairwise comparisons among the 4 summaries generated by our system and the 3 baselines. In Task 2, we ask annotators to rate 4 aspects of each summary.
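The count of 6 comparisons follows from choosing 2 of the 4 summaries, since C(4, 2) = 6. The sketch below enumerates those unordered pairs. The class PairwiseComparisons is hypothetical and is not the repository's evaluationPairGenerator; the system names are taken from the summary file names in data/summary/.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not the repository's EvaluationDataGen.
final class PairwiseComparisons {
    /** Returns every unordered pair of distinct systems, in index order. */
    static List<String[]> allPairs(List<String> systems) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < systems.size(); i++) {
            for (int j = i + 1; j < systems.size(); j++) {
                pairs.add(new String[] {systems.get(i), systems.get(j)});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> systems =
                Arrays.asList("reviewSum", "lexrank", "opinosis", "basicSum");
        for (String[] pair : allPairs(systems)) {
            System.out.println(pair[0] + " vs " + pair[1]);
        }
    }
}
```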

We asked 20 annotators to perform the evaluation: 10 were assigned to Task 1 and 10 to Task 2. All annotators are native Chinese speakers with experience writing product reviews. We constructed the evaluation dataset from customer reviews of 10 cell phones. Each annotator annotated at least 5 products, and each product received at least 5 annotations per task.

The annotation data is available at data/evaluation_data/. The task1 subfolder contains the annotation data for Task 1, and the task2 subfolder contains the annotation data for Task 2. Because the name of the summarization algorithm is hidden from annotators, each task item is assigned a UUID. evaluation.log stores the mapping between task items and UUIDs.
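The blinding step can be pictured with the following minimal sketch, which assigns a random UUID to each task item and keeps the mapping so that results can be decoded later. The class BlindedItems and the item labels are hypothetical. The actual format of evaluation.log is not described in this README, and this sketch does not attempt to reproduce it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical illustration of blinding task items; not the repository's code.
final class BlindedItems {
    /** Maps a freshly generated random UUID string to each item label. */
    static Map<String, String> assignUuids(List<String> itemLabels) {
        Map<String, String> uuidToItem = new LinkedHashMap<>();
        for (String label : itemLabels) {
            uuidToItem.put(UUID.randomUUID().toString(), label);
        }
        return uuidToItem;
    }

    public static void main(String[] args) {
        // The labels below are made up for the example.
        Map<String, String> mapping =
                assignUuids(Arrays.asList("reviewSum vs lexrank", "opinosis vs basicSum"));
        for (Map.Entry<String, String> entry : mapping.entrySet()) {
            // Annotators would see only the UUID; the mapping stays with the organizers.
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```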

Code Explanation

summarizer.summarizer.ReviewSummarizer

public String getSummary()

Returns the summary generated by our system.

summarizer.evaluation.EvaluationDataGen

public static List<Pair> evaluationPairGenerator(int productID)

Generates the evaluation file for Task 1, where productID is the ID of the product in the original review data.

public static Map<String, String> evaluationGenTask2(int productID)

Generates the evaluation file for Task 2, where productID is the ID of the product in the original review data.

public void printTask1Statics(String task1ResultDir)

Prints the evaluation results of Task 1 to the console. task1ResultDir is the directory that contains the Task 1 annotation files, e.g., data/evaluation_data/task1/.

public void printTask2Statics(String task2ResultDir)

Prints the evaluation results of Task 2 to the console. task2ResultDir is the directory that contains the Task 2 annotation files, e.g., data/evaluation_data/task2/.
