Deep learning (DL) has been a primary security concern of the data leak. Several studies have reported that laptop keystrokes can be classified by DL and a microphone that records the keystroke acoustics. However, much uncertainty still exists about the practical deployment and cross-device classification. This paper attempts to develop a real-time DL model for various devices on the embedded system. Our results prove the concept is highly possible in practice yet we need further data to verify.
A. Dataset
We collected data from a 2021 13-inch Macbook Pro to gather the acoustics from the multiple configurations we desired. Fig. 1 shows the collection setup. Only the number keys shown in the red box are recorded to prove the leaking threats of confidential information (e.g. credit number and government ID number) without adding too much training effort. The acoustics of the number keystroke were collected by the built-in microphone from the inference Arduino to eliminate the microphone variation.
The data are collected, labeled, and split on the Edge Impulse platform. Following the collecting method from recent studies [1], we pressed 25-30 times with different typing methods [2], strength, and interval in a continuous 19-second recording for each target keystroke. The example recording After labeling the entire recording on the Egde Impulse, we identified the keystroke acoustics by the peak in the waveform and split it individually. The window size is 150 ms, which is the average acoustics interval by seeing the waveform manually.
B. Inference Pipline
udio MFE provided by Edge Impulse is used for the preprocessing block. It extracts non-vocal audio information by applying mel-scale on the frequency domain [3]. The frequency domain also separates the signal from the white noises.
The convolutional neural network (CNN) is used to provide high accuracy in limited size. Recent studies have demon- strated that many models can achieve accuracy rates exceeding 80% in this domain. The CNN, in particular, has shown consistently high and stable performance [4]. Additionally, several CNN models have been proven effective for this specific application [1], [5]. The model in this project is shown in Fig. 3. The dropout layers stop some intermediate features from dominating the result and overfitting by randomly drop- ping. The inference has 11 output features indicating the 10 numbers and no keypress.
C. Training
he computation limit on Edge Impulse results in a difficult problem finding proper hyperparameters for the training pro- cess. The 20-minute and CPU-only limit from Edge Impulse free user provides extremely low computation compared to other deep learning training. The epoch is set as high as possible and the learning rate is set to a proper constant to fully utilize the resource on the platform. The learning rate needs to be high enough to avoid underfitting and low enough to avoid divergence. We train 1400 epochs with a learning rate 5 ∗ 10−4 in this research.
The CNN model has high accuracy and great performance to classify the keystroke. Fig. 4 shows the performance of test data. The 88% average accuracy and 0.87 average F1 source indicate the strong potential for deep learning model as a keystroke classifier.
The physical distance between the keys determines the output inference. Fig. 5 shows the output distribution of the model. The black line connects all the classes in the same order as the physical keyboard from left to right. The classes are closer if their corresponding keys are closer on the keyboard layout.
For more information, please contact the authors:
- chengdel@andrew.cmu.edu
- tbana@andrew.cmu.edu