1. Introduction
The encryption-then-compression (ETC) technique is a kind of technique that performs compression on encrypted digital media such as images and videos. It adapts to scenarios of cloud computing, distributed processing, etc., and is in contrast to the conventional compression-then-encryption (ETC) system that conducts compression prior to encryption. In an ETC system, the content owner only encrypts digital media for secrecy but does not compress them because of computing resource limitations. After the encrypted digital media have been uploaded to the cloud side, the cloud side compresses, without access to the decryption key, the encrypted digital media in order to save storage space and communication bandwidth. After the compressed encrypted sequence is received, the receiver finally performs joint decompression and decryption to recover the original image in a lossless or lossy way.
With the content of digital media masked by encryption, it seems to be counter-intuitive to compress the encrypted digital media. By formulating this problem as distributed coding with side information at the decoder, however, Johnson et al. [
1] demonstrated via the information theory that, compared with the conventional CTE system, the lossless ETC system neither degrades security nor sacrifices compression efficiency, while the lossy ETC system only approximates to a CTE system. In addition, they designed two toy examples to illustrate the feasibility of an ETC system.
Since then, a number of ETC schemes have been developed in the literature, most of which mainly focus on the compression and reconstruction of encrypted images and a few concentrate on that of encrypted videos. According to the reconstruction quality, they can be categorized into lossless [
2,
3,
4,
5,
6,
7,
8,
9] and lossy [
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26] approaches. The lossless approaches reconstruct the original image or video perfectly, whereas the lossy ones recover a degraded version of the original image or video.
As the lossy ETC method achieves larger compression ratios at the cost of tolerable reconstruction distortion, it has been well investigated in the literature. According to the compression method, lossy compression approaches for encrypted images and videos can be roughly classified into three types, i.e., the compressive sensing (CS)- [
10,
11,
12,
13], scalar quantization-[
14,
15,
16,
17,
18,
19,
20,
21,
22,
23], and uniform downsampling-based ones [
24,
25,
26]. In implementing the encrypted image compression, they use a compressive sensing matrix [
27], a scalar quantizer with a fixed or adaptive quantization step, and a uniform downsampling technique, respectively.
As is well known, the key to improving the compression ratio of unencrypted images and videos is to remove redundant content by exploiting statistical, human visual, or audio characteristics, etc. This is similar for the compression of encrypted images and videos. As analyzed in [
20], in the ETC scenario, it is preferable to fully leverage the statistical characteristics of images and videos at the receiver. This is because the receiver has both the decryption key and sufficient computing resources, while the content owner has limited computing resources, and the cloud side cannot access the decryption key or may exploit statistical characteristics at the risk of statistical information leakage. For example, the two-dimensional Markovian property [
2,
3], resolution progressive strategy [
6], and Markov random field (MRF) [
9] are integrated in the reconstruction process of a lossless ETC system to improve the compression efficiency remarkably. In the situation of a lossy ETC system, the technique of content-adaptive interpolation (CAI) is generally incorporated at the receiver side to facilitate the reconstruction of missing pixels discarded in the compression process [
24,
25,
26].
Guided by these notions and analysis, in this paper, we develop a novel ETC scheme for the lossy compression of gray images using CAI and MRF. Specifically, to better exploit statistical correlations between neighboring pixels, we adopt both pixel- and bitplane-level downsampling for encrypted image compression, and deploy MRF and CAI to recover missing bitplanes and thus pixels. At the pixel level, we perform uniform downsampling with a scaling factor of two in the horizontal and vertical directions on a stream-ciphered gray image, yielding four sub-images, namely , , , and , each of which is a quarter of the size of the original image. If the given compression ratio is smaller than 1/4, some of the bitplanes of are selected to meet the given compression ratio and the compression process is finished; otherwise, is reserved and the bitplane-level downsampling is further conducted. At the bitplane level, partial bitplanes of these sub-images are preserved while the others are discarded. To achieve better reconstruction quality, we formulate the bitplane-level downsampling as an optimal problem, maximizing the peak signal-to-noise ratio (PSNR) on the condition of a given compression rate. As higher bitplanes (e.g., MSB) are more important than lower bitplanes (e.g., LSB) in image reconstruction, we are motivated to solve this optimization problem through a heuristic strategy, e.g., maintaining the first, most significant bitplane (MSB) of these three sub-images successively, then preserving the second MSBs, and so on. In other words, we exploit the asymmetric characteristics of different bitplanes in optimal bitplane selection.
As there exist clear statistical correlations both within any bitplane of the first three MSBs and between adjacent bitplanes, we further leverage an MRF to characterize these statistical correlations [
8,
9]. Based on this, we then use the low-density parity-check (LDPC) code according to the framework of Johnson et al. [
1] to compress the first three MSBs of sub-images
,
,
, and
. This would further enhance the compression efficiency of the encrypted images.
After obtaining the downsampled encrypted image, the receiver conducts de-compression and decryption to recover the original gray image in a lossy way, which is actually symmetric to the process of encrypted image compression. In particular, LDPC-coded bitplanes are first reconstructed perfectly via the joint LDPC decoding, decryption, and MRF exploitation [
8,
9], and CAI [
24] is then deployed to generate missing bitplanes and thus pixels discarded in the compression process.
With the stream cipher for encryption, the pixel- and bitplane-level downsampling for compression, and the MRF and CAI for lossy reconstruction, we thus propose a novel ETC scheme for the lossy compression of encrypted gray images. Experimental results show that the proposed scheme results in desirable visual quality for a reconstructed image and achieves a significant improvement over the state-of-the-art, indicating the feasibility and effectiveness of the proposed scheme.
In summary, the contributions of the paper are two-fold: (1) We formulate the task that maximizes the PSNR of an image reconstructed by a downsampling-based ETC system on the condition of a given compression ratio as an optimization problem, and develop a heuristic bitplane allocation strategy to solve this optimization problem; (2) We propose a novel ETC scheme for lossy compression on encrypted gray images through heuristic optimization of bitplane allocation, achieving a remarkable improvement over the state-of-the-art.
The remainder of the paper is arranged as follows. Related works are introduced in
Section 2. The proposed ETC scheme is described in
Section 3. Experimental results and conclusions are given in
Section 4 and
Section 5, respectively.
3. Proposed ETC Scheme
In this section, we present the proposed ETC scheme through heuristic optimization of bitplane allocation, as illustrated in
Figure 4. It includes gray image encryption via a stream cipher, encrypted image compression with heuristic bitplane allocation optimization, and lossy reconstruction using MRF and content-adaptive interpolation (CAI). Below are details for these parts.
3.1. Stream Cipher-Based Encryption
Suppose that
is a gray image of size
. Then, the
kth bitplane of image
, namely
, is obtained as
It is clear that and denote the most significant bitplane (MSB) and least significant bitplane (LSB), respectively.
Next, generate a pseudorandom bit sequence of length
, say
, via the
kth secret key
, where
is a one-time-pad initial secret key. Afterwards, encrypt each bitplane with
as
where
denotes the
kth bitplane of image
and ⊕ is an exclusive or operation.
After encryption bitplane-by-bitplane, all encrypted bitplanes are merged in the following way to yield the stream ciphered gray image
, i.e.,
where
represents the
kth bitplane of image
. The encrypted
is sent through a public channel to the cloud side, and the secret key
is transmitted via a secure channel to the receiver.
3.2. Compression via Heuristic Optimization of Bitplane Allocation
To save storage space and communication bandwidth, the cloud side chooses to compress the encrypted gray image
on the condition of no decryption key. As the encryption has masked the image content, the cloud side without access to the decryption key cannot exploit the statistical characteristics of the original unencrypted image to conduct compression. As introduced in
Section 2, to achieve feasible compression, the LDPC syndrome can be used for the lossless compression of a stream ciphered binary image, and the compressive sensing matrix, scalar quantizer, or uniform downsampling can be applied for the lossy compression of a stream ciphered binary or gray image.
As uniform downsampling generally achieves relatively large compression ratios at tolerable distortions and is rather suitable for the cloud side without the decryption key to conduct compression on images encrypted by random permutation, stream cipher, cryptography, etc., it is adopted in our work as the compression method. To facilitate CAI at the receiver, we employ the uniform downsampling with a scaling factor of two. In particular, for a stream ciphered image
, we divide it into four sub-images, namely
,
,
, and
, according to
Figure 5, i.e.,
Without loss of generality, we take
as the uniformly downsampled result. This leads to an initial compression ratio of
.
If the target compression ratio is larger than 0.25, then more bitplanes are needed to choose from , , and and sent to the receiver. If is less than 0.25, then more bitplanes are required to discard from . This thus gives rise to a key problem, i.e., how to select suitable bitplanes from sub-images so that the reconstruction performance in terms of PSNR can be maximized on the condition of a given compression ratio .
This problem is essentially a rate-distortion optimization problem. As the statistical characteristics of the original image generally cannot be exploited at the cloud side, this rate-distortion optimization cannot be directly implemented at the cloud side. In an alternative way, the cloud side can choose to design a preferable bitplane selection method, with which the receiver can well exploit the selected bitplanes to recover discarded parts. As a result, the reconstruction quality may be maximized at a given compression ratio.
For a given larger than 0.25, it is clear that the more important the bitplane is for reconstruction, the higher the priority in bitplane selection should be. As MSB is the most important bitplane in image reconstruction and LSB is the least important one, we choose to optimize bitplane allocation in a heuristic way. Specifically, we conduct the bitplane selection as follows.
- (1)
Select the MSB of and compute the compression ratio as . If holds, then bits are randomly discarded from the MSB of , where discarding locations can be generated through a secret key . Otherwise, go to Step 2.
- (2)
Choose the MSB of and calculate . Then, process it in a way similar to Step 1 to either discard some bits or go to Step 3.
- (3)
Extract the MSB of and obtain . Next, remove some of the bits or go to Step 4.
- (4)
Sample the second MSBs of , , and sequentially in a manner similar to Steps 1–3. Even sample their third MSBs, fourth MSBs, and so on. Repeat this process until is met.
For a given smaller than 0.25, bitplanes in should be discarded. In this situation, the LSB bitplane of is first removed, and the compression ratio is updated as . If holds, then bits in LSB are randomly retained. Otherwise, continue to remove the second MSB, the third one, and so on until is achieved.
Although this heuristic optimization of bitplane allocation can probably maximize PSNR at a given compression ratio, we may further improve the compression efficiency by exploiting the statistical correlations that exist within a bitplane and between adjacent bitplanes, where bitplanes belong to the first three MSBs. Specifically, by incorporating uniform downsampling and statistical correlation exploitation, we conduct the compression of an encrypted gray image as follows.
- (1)
Divide the stream ciphered gray image
into four sub-images
,
,
, and
via the method in
Figure 5.
- (2)
Compress the MSB of
through LDPC coding [
8], i.e.,
where
is a parity-check matrix,
is a one-dimensional (
D) vector converted from the 2-D matrix
, and
is the syndrome used as the compressed sequence for
. If the second and third MSBs of
are selected, they are then compressed similarly, yielding Syndrome
and
, respectively.
As there exist weak statistical correlations within and between the other five bitplanes, these bitplanes are not compressed via the LDPC coding and directly sent to the receiver. For notational convenience, these five bitplanes are also denoted .
- (3)
Compute the compression ratio,
, for sub-image
as
where
denotes a function counting the bit number. If the given compression ratio
is smaller than
, then determine the minimum
k via the following optimization, i.e.,
Next, transmit the syndrome sequence of to the receiver and terminate the compression process. Otherwise, send the syndrome sequence of to the receiver and go to the next step.
- (4)
Compress the MSB of by LDPC coding and generate syndrome . If holds, then further send to the receiver and end the compression. Otherwise, go to the next step.
- (5)
Impose the LDPC coding on the MSB of and yield syndrome . If holds, then transmit to the receiver and complete the compression. Otherwise, go to the next step.
- (6)
Conduct similar processing on the MSB of . If holds, then send to the receiver and finish the compression. Otherwise, go to the next step.
- (7)
Similar to Steps 4–7, condense the second and third MSBs of , , and successively until the given compression ratio is approximately satisfied.
- (8)
Similar to the compression of bitplanes , the other five bitplanes of , , and are not LDPC-coded and denoted , , and for notational convenience, respectively. If is sufficiently large, , , and are chosen sequentially from to until is nearly achieved.
It is noted that this compression procedure leads to a compression ratio approximating but not exactly the same as , aiming to trade-off the complexity in bitplane selection and the exact compression ratio. If an exact compression ratio is required, optimal bit selection is required, which needs much more investigation and thus can be considered an interesting topic for future work.
After the compression is completed, the selected syndrome sequence, say , is transmitted through a public channel to the receiver, while the secret key is sent via a secret channel. It is worth pointing out that the can be used to generate all related coordinates of randomly discarded or supplemented bits by yielding sub-keys such as , where denotes the label of a sub-image and stands for the number of bitplanes.
In addition, to facilitate LDPC decoding at the receiver, we need to tell the receiver about the number of sent bitplanes in each sub-image, and the number of syndrome bits for the three MSBs of each sub-image. As each sub-image contains eight bitplanes, three bits are enough to represent the bitplane number. As the size of each bitplane of a sub-image is
, a total of
bits is sufficient to record the number of syndrome bits of each bitplane of a sub-image. Therefore, by incorporating this auxiliary information, the final compression ratio
R is calculated as
where
is the number of LDPC-coded MSBs of all four sub-images. Moreover, as bitplanes other than the three MSBs are not compressed, their sizes are fixed to
and thus do not need to be recorded.
3.3. Reconstruction via MRF Exploitation and Interpolation
After receiving the compressed encrypted sequence , the auxiliary information indicating the transmitted bitplane number and the LDPC-coded sequence length, and secret keys and , the receiver performs decompression and decryption to reconstruct the original image in a lossy way.
First of all, from the received auxiliary information, the receiver determines the number of received bitplanes of each sub-image and the length of LDPC-coded bitplanes. Via these parameters, the syndrome sequence of each LDPC-coded bitplane is extracted from
, and the corresponding bitplane is then recovered in a lossless way via the joint LDPC decoding, decryption, and MRF exploitation (see also
Section 2.2.2) [
9]. Similarly, the non-compressed bitplanes (i.e., the first to fifth LSBs) are obtained from
and then recovered accordingly. For the other bitplanes that are not sent from the cloud side, all are set with zeros. After bitplanes have been recovered, they are merged to yield four reconstructed sub-images:
,
,
, and
.
As there exist clear statistical correlations between pixels in the neighborhood, we further employ CAI [
29] to reconstruct missing parts. In more detail, we use
to interpolate
,
, and
successively.
Figure 6 illustrates the interpolation in our work. Specifically, at the first stage, we deploy
to interpolate
. Suppose that
are neighboring pixels in the diagonal directions of pixel
, where
belongs to pixels in
whereas
is located in
. Then, we use
to predict
as
where
,
,
, and
are functions generating the mean, maximum, minimum, and median values of a given vector.
We proceed to improve the prediction accuracy for . Assume that MSBs of are recovered in a lossless way from the received sequence . Then, they can be exploited to improve the prediction result . In more detail, first, obtain the first N MSB bits from pixel magnitude and , and then converge them to decimal values and , respectively, where with . If meets , then the prediction value deviates from the original value significantly.
To tackle this issue, another two prediction values are generated as
where
sums up all elements of a given vector. The
N MSB bits of
and
are also extracted to yield the corresponding decimal values
and
, respectively. Value
leading to the minimum difference is taken as the optimum one,
, i.e.,
The value yielding is thus considered the practically optimum interpolation value.
Via the
, the difference with respect to
is calculated as
. Thus, the final interpolation result for
is obtained as
where
and
is a modulo function.
After generating
, we then use it as well as
to interpolate
and
successively. This is rather similar to the interpolation of
, as shown in
Figure 6.
When sub-images
,
, and
are reconstructed, they are merged according to
Figure 6 to yield the reconstructed gray image
, which is an inverse process of sub-image extraction, as illustrated in
Figure 5.
5. Conclusions
In this paper, we present a lossy compression scheme on encrypted gray images through heuristic optimization of bitplane allocation. For a stream ciphered gray image, we develop a pixel- and bitplane-level downsampling method to perform the compression. At the pixel level, the uniform downsampling technique is employed, which yields four sub-images with a quarter of the size and may discard part of the sub-images according to the given compression ratio. At the bitplane level, the task of bitplane selection-based compression is first formulated as an optimization problem that chooses suitable bitplanes to maximize the PSNR of the reconstructed image subject to a given compression ratio, and then a heuristic strategy is developed to approximately solve this optimization problem. For the LDPC-coded bitplanes among the first three MSBs, we further exploit an LDPC code to compress them. In the reconstruction stage, bitplanes among the first three MSBs are recovered in a lossless way via the joint LDPC decoding, decryption, and MRF exploitation. They are then merged to form sub-images, and content-adaptive interpolation is deployed to reconstruct missing bitplanes and thus discarded pixels. Experimental simulation results show that the proposed scheme outperforms the state-of-the-art ETC methods, indicating the feasibility and effectiveness of the proposed scheme.
Although the proposed ETC scheme is promising, it could be further improved in future investigation. For example, each sub-image generated in this work may be further downsampled in a multi-scale way, aiming to better exploit the statistical correlations between neighboring pixels to improve the compression efficiency. In addition, as some of the MSBs of the sub-images are sent to the receiver, they could be fully used to improve the accuracy of content-adaptive interpolation. Furthermore, the proposed scheme may be extended to the efficient compression of encrypted color images.