Publication Date



Latent topic models such as Latent Dirichlet Allocation (LDA) and probabilistic Latent Semantic Analysis (pLSA) have demonstrated success in computer vision tasks. Most existing approaches train LDA and pLSA in an unsupervised manner, where the training data does not include any class label information. However, the class labels in the training data are very important for classification. In this paper, we propose to train a pLSA model in a supervised manner for human motion analysis using the bag-of-words representation. Each frame in a video is treated as a word, and all the frames in the training videos are clustered to construct a codebook. The class label information is used to learn the pLSA model in a supervised manner, which not only makes the training more efficient but also significantly improves the overall recognition accuracy. In addition, we employ the pyramid Histogram of Oriented Gradients (HoG) to encode the human figure in each frame. The pyramid HoG descriptor does not require the extraction of silhouettes and is, to some extent, invariant to translations and rotations. The method is validated on two standard datasets. The experimental results show that our method can accurately recognize human motion in video sequences. Moreover, the overall recognition accuracy is rather stable with respect to the codebook size.
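The bag-of-words step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several things here are assumptions made for illustration only: the frame descriptors are toy 2-D tuples standing in for pyramid HoG vectors, the codebook is hand-picked rather than obtained by clustering, and the supervised step assumes one latent topic per action class, so that P(word | class) can be estimated by pooling word counts over each class's training videos. The function names are hypothetical.

```python
from collections import Counter
from math import dist


def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest (Euclidean) to a frame descriptor."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))


def video_histogram(frames, codebook):
    """Treat each frame as one word: quantize it, then count word occurrences."""
    counts = Counter(nearest_codeword(f, codebook) for f in frames)
    return [counts.get(k, 0) for k in range(len(codebook))]


def class_word_distributions(histograms, labels, smoothing=1.0):
    """Assumed supervised step: pool counts per class label to estimate P(word | class).

    Additive smoothing keeps unseen words from getting zero probability.
    """
    n_words = len(histograms[0])
    pooled = {}
    for hist, label in zip(histograms, labels):
        acc = pooled.setdefault(label, [0.0] * n_words)
        for k, count in enumerate(hist):
            acc[k] += count
    distributions = {}
    for label, acc in pooled.items():
        total = sum(acc) + smoothing * n_words
        distributions[label] = [(c + smoothing) / total for c in acc]
    return distributions


# Toy data: three codewords and two labelled training videos (all values invented).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
videos = {
    "walk_1": [(0.1, 0.0), (0.9, 1.1), (0.2, -0.1)],
    "run_1": [(4.8, 5.1), (5.2, 4.9), (1.1, 0.9)],
}
labels = ["walk", "run"]

histograms = [video_histogram(frames, codebook) for frames in videos.values()]
p_w_given_z = class_word_distributions(histograms, labels)
```

In this sketch each video becomes a histogram over codewords, and the class labels replace the EM estimation of the word-topic distributions that unsupervised pLSA would need during training. That is one plausible reading of why supervised training is described as more efficient.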


Institute for Learning Sciences and Teacher Education

Document Type

Journal Article

Access Rights

ERA Access

Access may be restricted.