3D models have many applications, but automatically building a 3D model from video remains challenging in practice. Many methods exist for outdoor scenes, but indoor scenes are more difficult: because movement is limited, the input is very often close to degenerate, and a model cannot be built from such input without careful processing. This paper presents our work toward a framework for modeling indoor scenes. We first analyze the video to segment it into general and degenerate parts. From there, specific auto-calibration methods can be applied to solve the problem effectively. Rather than ignoring degenerate segments, we develop a frame filtering method that preserves all the information in the input in order to achieve a more complete model. Results show that the remaining frames, although significantly fewer in number, are nearly as informative as the original input and are suitable for the later steps of the modeling process.