Human Action Recognition Methods Based on Machine Learning and Computer Vision

The two programs, one for extracting training data and one for training and processing videos, build on machine learning, computer vision, and related domain knowledge, and use modern algorithms and frameworks to address common problems in human motion recognition. Specifically, they use the 'mediapipe' framework, which provides machine learning models for tasks such as pose estimation and gesture recognition. This framework has been widely applied in computer vision, especially in fields such as robotics, computer-aided diagnosis, and human-computer interaction.

The program for training and processing videos uses supervised learning: a model for action classification is trained on a dataset whose action labels are already known. In this program, the K-Nearest Neighbors (KNN) algorithm classifies each test sample by comparing it with the samples in the training dataset. KNN is a supervised machine learning algorithm that treats each data sample as a point in an n-dimensional space, on the assumption that the closer two samples are, the more likely they belong to the same class. For each test sample, the algorithm therefore finds the k training samples nearest to it (k is a constant chosen in advance) and classifies the test sample based on the labels of these neighbors, typically by majority vote. The distance between samples can be measured with various metrics, such as Euclidean distance, cosine distance, and Manhattan distance.
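The classification step described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the programs' actual code: the function names, the toy two-dimensional feature vectors, and the labels 'squat' and 'jump' are all assumptions made for the example, and ties are resolved by whichever label Counter happens to list first.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 minus cosine similarity; both vectors must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(train_X, train_y, query, k=3, metric=euclidean):
    """Return the most common label among the k training samples nearest to query."""
    # Sort (sample, label) pairs by distance to the query and keep the k closest.
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: metric(pair[0], query))[:k]
    # Majority vote over the neighbors' labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated clusters of 2-D feature vectors.
train_X = [(0.1, 0.1), (0.1, 0.2), (0.2, 0.1),
           (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["squat", "squat", "squat", "jump", "jump", "jump"]

print(knn_predict(train_X, train_y, (0.15, 0.1), k=3))
print(knn_predict(train_X, train_y, (1.0, 0.95), k=3, metric=manhattan))
```

Passing the metric as a parameter keeps the neighbor search independent of how distance is defined, so the three metrics named above can be swapped without touching the voting logic.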

In the program for extracting training data, the OpenCV library is used to read and process the video file stream frame by frame. OpenCV is an open-source library widely used for image and video processing in computer vision, providing many computer vision algorithms and tools. Each frame read by OpenCV is then passed to the 'mediapipe' Pose model, which detects the key points of the human body and so yields the human motion data.
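A minimal sketch of this extraction step follows, assuming the 'opencv-python' and 'mediapipe' packages are installed and that the legacy 'mp.solutions.pose' interface is available in the installed MediaPipe version. The function names and the choice to store x, y, z, and visibility for every landmark are assumptions for illustration; the original program may store the key points differently.

```python
def landmarks_to_row(landmarks):
    """Flatten landmarks (objects with x, y, z, visibility) into one flat list."""
    row = []
    for lm in landmarks:
        row.extend([lm.x, lm.y, lm.z, lm.visibility])
    return row


def extract_keypoints(video_path):
    """Return one flattened keypoint row per frame in which a pose was detected."""
    # Imported here so that landmarks_to_row stays usable without these packages.
    import cv2
    import mediapipe as mp

    rows = []
    cap = cv2.VideoCapture(video_path)
    try:
        with mp.solutions.pose.Pose(static_image_mode=False) as pose:
            while True:
                ok, frame = cap.read()
                if not ok:  # end of stream or read error
                    break
                # OpenCV decodes frames as BGR; MediaPipe expects RGB input.
                rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                results = pose.process(rgb)
                if results.pose_landmarks is not None:
                    rows.append(landmarks_to_row(results.pose_landmarks.landmark))
    finally:
        cap.release()
    return rows
```

Calling 'extract_keypoints' with the path to a video file would return a list of rows, one per frame with a detected pose; frames without a detection are simply skipped in this sketch.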

However, locating the key points is only the first step in extracting human motion; further analysis and processing build on these positions. This program derives additional quantities from them, such as the distance between two key points, the angle formed at a joint, and the time interval between key actions. These quantities capture higher-level information than raw coordinates and are combined into a complete feature vector that facilitates subsequent training and classification.
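The derived quantities mentioned above can be illustrated with a short sketch. It assumes 2-D key points given as (x, y) tuples and a known frame rate; the function names and the shoulder, elbow, and wrist example are hypothetical and are not taken from the original program.

```python
import math


def keypoint_distance(p, q):
    """Euclidean distance between two 2-D key points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    ba = (a[0] - b[0], a[1] - b[1])
    bc = (c[0] - b[0], c[1] - b[1])
    dot = ba[0] * bc[0] + ba[1] * bc[1]
    cross = ba[0] * bc[1] - ba[1] * bc[0]
    # atan2 of |cross| and dot gives an unsigned angle in the range [0, 180].
    return math.degrees(math.atan2(abs(cross), dot))


def seconds_between(frame_a, frame_b, fps):
    """Time interval between two frame indices at the given frame rate."""
    return abs(frame_b - frame_a) / fps


def arm_features(shoulder, elbow, wrist):
    """Small feature vector: shoulder-to-wrist distance and elbow angle."""
    return [keypoint_distance(shoulder, wrist),
            joint_angle(shoulder, elbow, wrist)]


# Hypothetical key points for one arm, in normalized image coordinates.
print(arm_features((0.5, 0.3), (0.5, 0.5), (0.7, 0.5)))
print(seconds_between(30, 90, fps=30.0))
```

Distances and angles of this kind depend less on where the person stands in the frame than raw coordinates do, which is one reason to prefer them as classifier inputs.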

In summary, the two programs combine pose estimation with 'mediapipe', video processing with OpenCV, feature extraction from key points, and KNN classification to address human motion recognition. The approach is particularly suited to expressing and recognizing the details of human posture changes, which helps improve recognition accuracy. In future research, we have reason to believe that these technical methods can be applied to broader fields such as video, health, and entertainment, promoting the development of intelligent technology.