[1] PENG Xiao-gang, MING Zhong, WANG Hai-tao, et al. WordNet based webpage classification system with category expansion[J]. Journal of Shenzhen University Science and Engineering, 2009, 26(2): 116-120.

WordNet based webpage classification system with category expansion

Journal of Shenzhen University Science and Engineering [ISSN: 1000-2618 / CN: 44-1401/N]

Volume:
26
Issue:
No. 2, 2009
Pages:
116-120
Section:
Electronics and Information Engineering
Publication date:
2009-04-30

Article Info

Title:
WordNet based webpage classification system with category expansion
Article ID:
1000-2618(2009)02-0116-05
Author(s):
PENG Xiao-gang, MING Zhong, WANG Hai-tao, and ZHOU Jing-zhou
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, P.R. China
Keywords:
information retrieval; webpage classification; WordNet; sense-based classification; category expansion
CLC number:
TP 319
Document code:
A
Abstract:
Because the same meaning can be expressed in text by several different words, sense-based text classification methods have been proposed as an alternative to keyword-based algorithms in order to improve classification accuracy. Based on an analysis of the synset structure in WordNet, a sense-based webpage classification system was developed that uses both the synsets themselves and the relations among them. The system also addresses the conflict between a fixed category structure and the growing number of documents added to it by providing category expansion through a category-based clustering algorithm. Experimental results show that the semantic hierarchy classification algorithm increases classification accuracy by 13% compared with existing sense-based classification algorithms. For category expansion, the category-based clustering algorithm yields higher-quality new categories than existing methods that rely on a similarity measure only.
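The motivation stated in the abstract, that different words can carry the same meaning, can be illustrated with a minimal sketch. This is not the authors' system: the tiny sense table, the hypernym table, the parent_weight parameter, and the function names are all hypothetical stand-ins for WordNet synsets and their relations, chosen only to show why sense features can match documents that share no keywords.

```python
from collections import Counter

# Hypothetical miniature sense inventory (NOT real WordNet data).
# Each word maps to an invented synset id; each synset may have an
# invented hypernym (parent sense).
WORD_TO_SYNSET = {
    "car": "automobile.n",
    "automobile": "automobile.n",
    "truck": "truck.n",
    "purchase": "buy.v",
    "buy": "buy.v",
}
HYPERNYM = {
    "automobile.n": "motor_vehicle.n",
    "truck.n": "motor_vehicle.n",
}


def keyword_features(text):
    """Bag of surface words (keyword-based representation)."""
    return Counter(text.lower().split())


def sense_features(text, parent_weight=0.5):
    """Bag of senses: each known word counts toward its synset and,
    with a smaller (assumed) weight, toward the synset's hypernym."""
    feats = Counter()
    for word in text.lower().split():
        synset = WORD_TO_SYNSET.get(word)
        if synset is None:
            continue  # words outside the toy inventory are ignored
        feats[synset] += 1.0
        parent = HYPERNYM.get(synset)
        if parent is not None:
            feats[parent] += parent_weight
    return feats


def overlap(a, b):
    """Sum of the smaller weight over features present in both bags."""
    return sum(min(a[k], b[k]) for k in a.keys() & b.keys())


doc_a = "buy automobile"
doc_b = "purchase car"
doc_c = "purchase truck"

print(overlap(keyword_features(doc_a), keyword_features(doc_b)))  # 0
print(overlap(sense_features(doc_a), sense_features(doc_b)))      # 2.5
print(overlap(sense_features(doc_a), sense_features(doc_c)))      # 1.5
```

In this toy setup doc_a and doc_b share no keywords yet match fully at the sense level, while doc_a and doc_c still match partially through the shared parent sense, which is the kind of relation between senses the abstract refers to.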

References:

[1] Choi B, PENG Xiao-gang. Dynamic and hierarchical classification of web pages[J]. Online Information Review, 2004, 28(2): 139-147.
[2] ZHANG Y, LIU B. Semantic text classification of disease reporting[C]//ACM SIGIR. NY: ACM, 2007: 747-748.
[3] Dayanik A, Lewis D, Madigan D, et al. Constructing informative prior distributions from domain knowledge in text classification[C]//ACM SIGIR. NY: ACM, 2006.
[4] Kehagias A, Petridis V, Kaburlasos V G, et al. A comparison of word- and sense-based text categorization using several classification algorithms[J]. Journal of Intelligent Information Systems, 2003, 21(3): 227-247.
[5] Scott S, Matwin S. Text classification using WordNet hypernyms[C]//Coling-ACL98 workshop: usage of WordNet in natural language processing systems. NY: ACM, 1998: 45-52.
[6] Miller G A. Nouns in WordNet: a lexical inheritance system[J]. International Journal of Lexicography, 1990, 3(4): 245-264.
[7] Rosso P, Ferretti E, Jiménez D, et al. Text categorization and information retrieval using WordNet senses[C]//Global WordNet Conference. NY: Springer-Verlag Press, 2004: 299-304.

Memo

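Received: 2009-02-20; Revised: 2009-03-13
Funding: National Natural Science Foundation of China (60673122); Shenzhen Science and Technology Foundation (200740)
About the author: PENG Xiao-gang (1976-), male (Han), from Shenzhen, Guangdong Province; lecturer at Shenzhen University, Ph.D. E-mail: patrickpeng@126.com
Corresponding author: MING Zhong (1967-), male (Han); professor at Shenzhen University, Ph.D. E-mail: mingz@szu.edu.cn
Last Update: 2009-05-14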