Ranking-Constrained Keyword Sequence Extraction from Web Documents

15 years 11 months ago


Given a large volume of Web documents, we consider the problem of finding, for each document, the shortest keyword sequence such that, when the sequence is submitted to a given search engine, the corresponding Web document is identified and ranked first in the results. We call this system an Inverse Search Engine (ISE). Whenever a shortest keyword sequence is found for a given Web document, the given search engine returns that document as its first result for that sequence. The resulting keyword sequence is search-engine dependent. The ISE can therefore be used as a tool to manage Web content in terms of the extracted shortest keyword sequences. In this way, a traditional keyword extraction process is constrained by the document ranking method adopted by a search engine. The significance is that all searchable documents on the World Wide Web can then be partitioned according to their keyword phrases. This paper discusses the...
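The problem statement in the abstract can be illustrated with a minimal brute-force sketch. This is not the authors' method, which the truncated abstract does not describe. The names `toy_search`, `shortest_keyword_sequence`, `max_len`, and the three-document `corpus` are all hypothetical, and the toy ranking (total term frequency over documents containing every query term, ignoring term order) is an assumed stand-in for a real search engine's ranking function.

```python
from collections import Counter
from itertools import combinations


def toy_search(query, corpus):
    """Hypothetical stand-in for a search engine.

    Ranks the documents that contain every query term by their total
    term frequency, highest first. Term order is ignored (an assumption).
    """
    scored = []
    for doc_id, text in corpus.items():
        counts = Counter(text.lower().split())
        if all(counts[term] > 0 for term in query):
            scored.append((sum(counts[term] for term in query), doc_id))
    # Descending score; ties are broken by doc_id to keep results deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


def shortest_keyword_sequence(doc_id, corpus, search=toy_search, max_len=3):
    """Return the shortest keyword tuple for which doc_id is ranked first.

    Candidates are drawn from the document's own distinct terms and tried
    in order of increasing length, so the first hit is a shortest one.
    Returns None if no candidate of length <= max_len works.
    """
    terms = sorted(set(corpus[doc_id].lower().split()))
    for length in range(1, max_len + 1):
        for query in combinations(terms, length):
            results = search(query, corpus)
            if results and results[0] == doc_id:
                return query
    return None


# Hypothetical toy corpus, for illustration only.
corpus = {
    "doc_a": "web search",
    "doc_b": "web web ranking",
    "doc_c": "search search ranking",
}
```

With this corpus, no single term ranks `doc_a` first, because `doc_b` outranks it for "web" and `doc_c` outranks it for "search"; the two-term query containing both is the shortest one that works. The result depends entirely on the ranking function passed as `search`, which mirrors the abstract's point that the extracted sequence is search-engine dependent. Exhaustive enumeration grows combinatorially with document length, so this sketch is only practical for very small inputs.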

Ding-Yi Chen, Xue Li, Jing Liu, Xia Chen
