Characterizing large text corpora using a maximum variation sampling genetic algorithm


An enormous amount of information is available via the Internet. Much of this data is in the form of text-based documents. These documents cover a variety of topics that are vitally important to the scientific, business, and defense/security communities. Currently, there are many techniques for processing and analyzing such data. However, quickly characterizing a large set of documents remains challenging. Previous work has successfully demonstrated the use of a genetic algorithm to provide a representative subset of text documents via adaptive sampling. In this work, we further expand and explore this approach on much larger data sets using a parallel Genetic Algorithm (GA) with adaptive parameter control. Experimental results are presented and discussed.

Categories and Subject Descriptors: I.7.0 [Document and Text Processing]: General

General Terms: Algorithms, Performance, Design, Experimentation

Keywords: Text analysis, parallel genetic algorithm, intellig...
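The abstract does not spell out the algorithm, so the following is only a minimal sketch of what selecting a maximally varied subset of documents with a genetic algorithm could look like. Everything in it is an assumption made for illustration: documents are modeled as token sets, variation is measured as the sum of pairwise Jaccard distances, and the names `jaccard_distance`, `fitness`, `crossover`, `mutate`, and `max_variation_sample` are hypothetical. It runs serially with fixed parameters, whereas the paper describes a parallel GA with adaptive parameter control.

```python
import random
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two token sets (assumed distance measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def fitness(sample, docs):
    """Sum of pairwise distances among the sampled documents (assumed objective)."""
    return sum(jaccard_distance(docs[i], docs[j]) for i, j in combinations(sample, 2))


def crossover(p1, p2, k, rng):
    """Build a child by drawing k distinct indices from the union of both parents."""
    pool = sorted(set(p1) | set(p2))
    return tuple(sorted(rng.sample(pool, k)))


def mutate(sample, n_docs, rate, rng):
    """With probability `rate`, swap one sampled index for one not in the sample."""
    if rng.random() >= rate:
        return sample
    outside = [i for i in range(n_docs) if i not in sample]
    if not outside:
        return sample
    child = list(sample)
    child[rng.randrange(len(child))] = rng.choice(outside)
    return tuple(sorted(child))


def max_variation_sample(docs, k, pop_size=30, generations=60,
                         mutation_rate=0.3, seed=0):
    """Serial GA sketch: each individual is a tuple of k distinct document indices."""
    rng = random.Random(seed)
    n = len(docs)
    population = [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fitter half unchanged (elitism) and refill with offspring.
        population.sort(key=lambda s: fitness(s, docs), reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(mutate(crossover(p1, p2, k, rng), n, mutation_rate, rng))
        population = elite + children
    return max(population, key=lambda s: fitness(s, docs))


if __name__ == "__main__":
    # Toy corpus: a few near-duplicate documents and a few distinct ones.
    corpus = [
        {"genetic", "algorithm", "search"},
        {"genetic", "algorithm", "optimization"},
        {"text", "corpus", "sampling"},
        {"network", "security", "traffic"},
        {"genetic", "algorithm", "search", "parallel"},
        {"protein", "folding", "energy"},
    ]
    chosen = max_variation_sample(corpus, k=3)
    print(chosen, fitness(chosen, corpus))
```

Representing each individual as a fixed-size set of indices keeps every candidate a valid sample, so crossover and mutation never need a repair step. A real corpus would likely use a vector-space distance rather than Jaccard distance on token sets.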

Robert M. Patton, Thomas E. Potok

Document | GECCO 2006 | Genetic Algorithm | Optimization | Parallel Genetic Algorithm |

Added	23 Aug 2010
Updated	23 Aug 2010
Type	Conference
Year	2006
Where	GECCO
Authors	Robert M. Patton, Thomas E. Potok
