Large linear classification when data cannot fit in memory

Download www.csie.ntu.edu.tw

Recent advances in linear classification have shown that for applications such as document classification, the training can be extremely efficient. However, most of the existing training methods are designed by assuming that data can be stored in the computer memory. These methods cannot be easily applied to data larger than the memory capacity due to the random access to the disk. We propose and analyze a block minimization framework for data larger than the memory size. At each step a block of data is loaded from the disk and handled by certain learning methods. We investigate two implementations of the proposed framework for primal and dual SVMs, respectively. As data cannot fit in memory, many design considerations are very different from those for traditional algorithms. Experiments using data sets 20 times larger than the memory demonstrate the effectiveness of the proposed method.

Categories and Subject Descriptors: I.5.2 [Pattern Recognition]: Design Methodology--Classifier des...
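
The block-by-block loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the file layout (one JSON file per block), the function names write_blocks and train_block_minimization, and the choice of dual coordinate descent for an L1-loss linear SVM as the per-block learner are all assumptions made for illustration. For brevity the dual variables are kept in memory, whereas a real out-of-core system would likely store them alongside each block.

```python
import json
import os
import tempfile


def write_blocks(samples, block_size, directory):
    """Split (x, y) samples into block files on disk; return the file paths.

    Hypothetical helper: the on-disk format (JSON) is an assumption.
    """
    paths = []
    for start in range(0, len(samples), block_size):
        path = os.path.join(directory, f"block_{len(paths)}.json")
        with open(path, "w") as f:
            json.dump(samples[start:start + block_size], f)
        paths.append(path)
    return paths


def train_block_minimization(paths, dim, C=1.0, outer_iters=10, inner_iters=5):
    """Sketch of block minimization for a dual linear SVM (no bias term).

    Each outer iteration visits every block once. Only one block of
    training data is held in memory at a time.
    """
    w = [0.0] * dim
    # Simplification: dual variables for all blocks stay in memory.
    alphas = {path: None for path in paths}
    for _ in range(outer_iters):
        for path in paths:
            with open(path) as f:
                block = json.load(f)  # sequential read of one block
            alpha = alphas[path]
            if alpha is None:
                alpha = [0.0] * len(block)
            # Approximately solve the sub-problem restricted to this block.
            for _ in range(inner_iters):
                for i, (x, y) in enumerate(block):
                    q = sum(v * v for v in x)
                    if q == 0.0:
                        continue
                    # Gradient of the dual objective w.r.t. alpha[i].
                    grad = y * sum(wj * xj for wj, xj in zip(w, x)) - 1.0
                    # Newton step on one coordinate, clipped to [0, C].
                    new_alpha = min(max(alpha[i] - grad / q, 0.0), C)
                    delta = new_alpha - alpha[i]
                    if delta != 0.0:
                        for j, xj in enumerate(x):
                            w[j] += delta * y * xj
                        alpha[i] = new_alpha
            alphas[path] = alpha
    return w


if __name__ == "__main__":
    # Toy, linearly separable data; labels are +1 / -1.
    data = [([1.0, 2.0], 1), ([2.0, 1.5], 1),
            ([-1.0, -1.5], -1), ([-2.0, -1.0], -1)]
    with tempfile.TemporaryDirectory() as tmp:
        block_paths = write_blocks(data, block_size=2, directory=tmp)
        print(train_block_minimization(block_paths, dim=2))
```

The point of the sketch is the access pattern: each block is read sequentially and used for several inner passes before the next block is loaded, so disk reads are amortized over more computation than a single pass would provide.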

Hsiang-Fu Yu, Cho-Jui Hsieh, Kai-Wei Chang, Chih-J

Block Minimization Framework | Certain Learning Methods | Data Mining | KDD 2010 | Memory Capacity |

» The distributed boosting algorithm

» pplacer linear time maximumlikelihood and Bayesian phylogenetic placement of sequences ont...

» IDRQR an incremental dimension reduction algorithm via QR decomposition

» Binary Searching with Nonuniform Costs and Its Application to Text Retrieval

» Scaling Up Inductive Logic Programming by Learning from Interpretations

Post Info

Added	13 Oct 2010
Updated	13 Oct 2010
Type	Conference
Year	2010
Where	KDD
Authors	Hsiang-Fu Yu, Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin
