
Open access
Published: 05 December 2018

Application research of digital media image processing technology based on wavelet transform

Lina Zhang 1 ,
Lijuan Zhang 2 &
Liduo Zhang 3

EURASIP Journal on Image and Video Processing, volume 2018, Article number: 138 (2018)

7204 Accesses


With the development of information technology, people rely more and more on the network to access information, and more than 80% of the information on the network is carried by multimedia, chiefly images. Research on image processing technology is therefore very important, but most of it focuses on a single aspect, and unified models covering several aspects of image processing are still rare. To address this, this paper builds a unified model of image denoising, watermarking, encryption and decryption, and image compression, using the wavelet transform as the common method, and simulates it on 300 everyday photographs. The results show that the unified model performs well in every aspect of image processing examined.

1 Introduction

With the increase in computer processing power, the objects people process on computers have gradually shifted from characters to images. According to statistics, images account for more than 80% of the information transmitted and stored today, especially on the Internet. Image information is much more complex than character information, so processing images on a computer is more complicated than processing characters. To make the use of image information safer and more convenient, application research on digital media images is therefore particularly important. Digital media image processing mainly includes denoising, encryption, compression, storage, and many other aspects.

The purpose of image denoising is to remove noise from an image so that its meaningful content stands out. Image acquisition, transmission, and processing can all corrupt the original signal, and noise is a major factor degrading image clarity. Noise has many sources, chiefly the transmission and quantization processes. According to its relationship with the signal, noise can be divided into additive noise, multiplicative noise, and quantization noise. Commonly used denoising methods include mean filtering, adaptive Wiener filtering, median filtering, and the wavelet transform. For example, the neighborhood averaging method used in [1, 2, 3] is a mean filter, suitable for removing grain noise from scanned images. Neighborhood averaging suppresses noise strongly but also blurs the image, and the degree of blurring is proportional to the radius of the neighborhood. The Wiener filter adapts its output to the local variance of the image and works best on images corrupted by white noise; it was used in [4, 5] with good denoising results. The median filter is a widely used nonlinear smoothing filter that is very effective against salt-and-pepper noise. It removes noise while preserving image edges, giving a satisfactory restoration, and it needs no knowledge of the image's statistical characteristics, which makes it convenient in practice; [6, 7, 8] are successful examples of median-filter denoising. Wavelet analysis denoises an image using its multi-level wavelet coefficients, so image details are well preserved, as in [9, 10].
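To make the mean/median contrast concrete, here is a minimal Python sketch (not the paper's code): a 3 × 3 neighbourhood filter applied to a flat patch hit by a single salt impulse. The window size and the clipped-border handling are assumptions chosen for brevity.

```python
from statistics import median


def filter3x3(img, reducer):
    """Apply `reducer` to each pixel's 3x3 neighbourhood (clipped at the borders)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = reducer(window)
    return out


def mean_filter(img):
    """Neighbourhood averaging."""
    return filter3x3(img, lambda win: sum(win) / len(win))


def median_filter(img):
    """Nonlinear median smoothing."""
    return filter3x3(img, median)


# A flat 5x5 patch (gray value 100) hit by a single "salt" impulse (255).
noisy = [[100] * 5 for _ in range(5)]
noisy[2][2] = 255

print(median_filter(noisy)[2][2])  # 100: the impulse is removed
print(mean_filter(noisy)[2][2])    # (8 * 100 + 255) / 9: the impulse is smeared
```

The median discards the outlier entirely, while the mean spreads part of it over the neighbourhood, which is the blurring effect of averaging.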

Image security is another important application area of digital image processing, covering two main aspects: digital watermarking and image encryption. Digital watermarking embeds identifying information (the digital watermark) directly into a digital carrier (multimedia, documents, software, etc.) without affecting the carrier's usability and without being easily noticed by human perception (such as the visual or auditory system). The hidden information can be used to identify the content creator or the purchaser, to transmit secret information, or to determine whether the carrier has been tampered with. Digital watermarking is an important research direction in information hiding; [11, 12], for example, study image watermarking methods. Some researchers have applied wavelet methods to watermarking: A. H. Paquet et al. [13] used wavelet packets for watermark-based personal authentication in 2003, successfully introducing wavelet theory into watermarking research and opening up a new direction for image watermarking. To keep a digital image secret, in practice the two-dimensional image is generally converted into one-dimensional data and then encrypted with a conventional encryption algorithm. Unlike ordinary text, images and videos have temporal, spatial, and visual-perception characteristics and can tolerate lossy compression; these features make it possible to design more efficient and secure image encryption algorithms. For example, Z. Wen et al. [14] used a key to generate real-valued chaotic sequences and then encrypted the image by spatial scrambling; their experiments show that the technique is effective and secure. Y. Y. Wang et al. [15] proposed an optical image encryption method using a binary Fourier-transform computer-generated hologram (CGH) and pixel scrambling, in which the pixel-scrambling order and the encrypted image serve as the keys for decrypting the original image. X. Y. Zhang et al. [16] combined the mathematics of two-dimensional cellular automata (CA) with image encryption to propose a new image encryption algorithm that is easy to implement and secure, with a large key space, a good avalanche effect, a high degree of confusion and diffusion, simple operations, low computational complexity, and high speed.
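Key-driven chaotic scrambling of the kind attributed to [14] can be sketched as follows. The logistic map, its parameters, and the burn-in length are assumptions for illustration (the text only says "real-valued chaotic sequences"), and the key is the hypothetical pair (x0, r).

```python
def logistic_sequence(x0, r, n, burn_in=100):
    """Real-valued chaotic sequence from the logistic map x <- r * x * (1 - x)."""
    x = x0
    for _ in range(burn_in):  # discard the transient
        x = r * x * (1 - x)
    seq = []
    for _ in range(n):
        x = r * x * (1 - x)
        seq.append(x)
    return seq


def permutation_from_key(x0, r, n):
    """Rank the chaotic values to obtain a key-dependent permutation of 0..n-1."""
    seq = logistic_sequence(x0, r, n)
    return sorted(range(n), key=lambda i: seq[i])


def scramble(pixels, key):
    perm = permutation_from_key(*key, len(pixels))
    return [pixels[p] for p in perm]


def unscramble(scrambled, key):
    perm = permutation_from_key(*key, len(scrambled))
    restored = [0] * len(scrambled)
    for dst, src in enumerate(perm):
        restored[src] = scrambled[dst]
    return restored


# Convert the 2-D image to 1-D data, then scramble it with the key.
image = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
flat = [v for row in image for v in row]
key = (0.3141, 3.99)  # hypothetical key: initial value x0 and map parameter r
enc = scramble(flat, key)
dec = unscramble(enc, key)
```

A pure permutation leaves the gray-level histogram unchanged, so on its own it does not raise the information entropy; practical schemes usually add a diffusion step that also changes pixel values.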

To transmit image information quickly, image compression is another research direction in image applications. The information age has brought an "information explosion" in the amount of data, so data must be compressed effectively for both transmission and storage; in remote sensing, for example, space probes use compression coding to send huge amounts of information back to the ground. Image compression is the application of data compression to digital images: it reduces the redundant information in image data so that the data can be stored and transmitted in a more efficient format. Through the sustained efforts of researchers, image compression technology has matured. For example, A. S. Lewis [17] hierarchically encoded the transformed coefficients and designed a new image compression method based on a local estimate of the noise sensitivity of the human visual system (HVS); the algorithm maps easily onto the 2-D orthogonal wavelet transform, which decomposes the image into spatially and spectrally localized coefficients. R. A. DeVore [18] introduced a new theory for analyzing image compression methods based on wavelet decomposition. R. W. Buccigrossi [19] developed a probabilistic model of natural images based on empirical observations of their statistics in the wavelet transform domain. Wavelet coefficients at adjacent spatial locations, orientations, and scales were found to be non-Gaussian in both their marginal and joint statistics. The authors proposed a Markov model that explains these dependencies using a linear predictor for coefficient magnitude combined with both multiplicative and additive uncertainty, and showed that it accounts for the statistics of a wide variety of images, including photographic, graphic, and medical images. To demonstrate the efficacy of the model directly, they built an image coder called the Embedded Predictive Wavelet Image Coder (EPWIC). Its subband coefficients are encoded one bit plane at a time with a non-adaptive arithmetic coder. The encoder uses conditional probabilities computed from the model to order the bit-plane coding with a greedy algorithm based on the MSE reduction per coded bit, and the decoder uses the statistical model to predict coefficient values from the bits it has received. Although the model is simple, the rate-distortion performance of the coder is roughly comparable to that of the best image coders in the literature.

Existing research shows that digital image applications have achieved fruitful results. However, these results mainly focus on particular methods, such as deep learning [20, 21], genetic algorithms [22, 23], and fuzzy theory [24, 25], as well as wavelet analysis. The biggest problem is that digital multimedia processing is an organic whole: from denoising, compression, and storage to encryption, decryption, and retrieval, it should be treated as one system, yet current studies basically address only one part of it. A method that is superior in one stage is therefore not necessarily suitable for the others. To solve this problem, this thesis takes the digital image as its research object, builds a unified model covering three main steps of image processing (encryption, compression, and retrieval), and studies how well a single method handles multiple processing steps.

The wavelet transform is a commonly used digital signal processing method. Because most digital signals are composed of multiple frequency components, a signal typically contains noise, secondary components, and the main component. Many research teams have also used the wavelet transform in image processing with good results. Can the wavelet transform, then, serve as the single method for a model suited to a variety of image processing applications?

In this paper, the wavelet transform is used to build a denoising, encryption, and compression model for the image processing pipeline, and the model is simulated on captured images. The results show that the same wavelet transform parameters achieve good results across the different image processing applications.

2.1 Image binarization processing method

The gray value of each pixel ranges from 0 to 255. To facilitate further processing, the outline of the image is first highlighted by binarization, which maps each pixel's gray value from the range 0 to 255 onto either 0 or 255. Threshold selection is the key step in binarization; this paper uses the maximum between-class variance method (OTSU). For a segmentation threshold t between foreground and background, let the foreground pixels make up a proportion w0 of the image with mean gray value u0, and the background pixels a proportion w1 with mean gray value u1. The mean of the entire image is then:

\[ u = w_0 u_0 + w_1 u_1 \qquad (1) \]

From formula 1, the objective function (the between-class variance) is:

\[ g(t) = w_0 (u_0 - u)^2 + w_1 (u_1 - u)^2 \qquad (2) \]

The OTSU algorithm chooses the t at which g(t) attains its global maximum; this t is called the optimal threshold.
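A direct sketch of the OTSU search in Python: it evaluates the between-class variance g(t) for every candidate threshold and keeps the maximizer. Treating gray values ≤ t as the (w0, u0) class is an assumption about the split convention, and the toy input is illustrative only.

```python
def otsu_threshold(pixels, levels=256):
    """Return the t maximizing g(t) = w0*(u0-u)^2 + w1*(u1-u)^2.

    Assumption: class 0 (w0, u0) is the set of gray values <= t.
    """
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    u = sum(i * c for i, c in enumerate(hist)) / n  # global mean u = w0*u0 + w1*u1
    best_t, best_g = 0, -1.0
    count0, sum0 = 0, 0
    for t in range(levels):
        count0 += hist[t]
        sum0 += t * hist[t]
        count1 = n - count0
        if count0 == 0 or count1 == 0:
            continue  # one class is empty, so g(t) is undefined
        w0, w1 = count0 / n, count1 / n
        u0 = sum0 / count0
        u1 = (u * n - sum0) / count1
        g = w0 * (u0 - u) ** 2 + w1 * (u1 - u) ** 2
        if g > best_g:
            best_t, best_g = t, g
    return best_t


def binarize(pixels, t):
    """Map each gray value to 0 or 255 using threshold t."""
    return [255 if p > t else 0 for p in pixels]


pixels = [10] * 50 + [200] * 50  # toy bimodal image, flattened
t = otsu_threshold(pixels)
binary = binarize(pixels, t)
```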

2.2 Wavelet transform method

The wavelet transform (WT) grew out of Fourier analysis. The Fourier transform only decomposes a signal into different frequencies; the wavelet transform inherits the localization idea of the short-time Fourier transform while overcoming its drawback that the window size does not change with frequency. Compared with the Fourier transform, the wavelet transform is therefore better suited to time-frequency analysis. Its main strength is that it represents local features in both time and frequency, and its scale can be varied: dividing a signal into low-frequency and high-frequency parts makes its features more concentrated. This paper mainly uses the wavelet transform to analyze images in different frequency bands. The continuous wavelet transform of a signal f(t) can be expressed as:

\[ WT_f(a,\tau) = \frac{1}{\sqrt{a}}\int_{-\infty}^{+\infty} f(t)\,\psi\!\left(\frac{t-\tau}{a}\right)dt \]

Where ψ ( t ) is the mother wavelet, a is the scale factor, and τ is the translation factor.
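In practice images are processed with the discrete rather than the continuous transform. The sketch below, an illustration rather than the paper's MATLAB setup, performs one level of a discrete wavelet transform with the db3 filters used later in the experiments, and inverts it. Periodic boundary extension is an assumption chosen because it keeps the transform exactly orthogonal; the filter coefficients are the commonly tabulated db3 values.

```python
import math

# Commonly tabulated db3 low-pass (scaling) filter, normalized so sum = sqrt(2).
H = [0.3326705529509569, 0.8068915093133388, 0.4598775021193313,
     -0.13501102001039084, -0.08544127388224149, 0.035226291882100656]
# Quadrature-mirror high-pass filter: g[n] = (-1)^n h[L-1-n].
G = [(-1) ** n * H[len(H) - 1 - n] for n in range(len(H))]


def dwt_level(x):
    """One-level periodic DWT: returns (approximation, detail) halves."""
    n = len(x)
    assert n % 2 == 0, "signal length must be even"
    approx, detail = [], []
    for k in range(n // 2):
        approx.append(sum(H[i] * x[(2 * k + i) % n] for i in range(len(H))))
        detail.append(sum(G[i] * x[(2 * k + i) % n] for i in range(len(G))))
    return approx, detail


def idwt_level(approx, detail):
    """Inverse of dwt_level (the periodic analysis matrix is orthogonal)."""
    n = 2 * len(approx)
    x = [0.0] * n
    for k in range(len(approx)):
        for i in range(len(H)):
            x[(2 * k + i) % n] += H[i] * approx[k] + G[i] * detail[k]
    return x


signal = [math.sin(0.3 * t) + 0.1 * t for t in range(16)]
approx, detail = dwt_level(signal)      # low-frequency and high-frequency halves
restored = idwt_level(approx, detail)   # equals signal up to rounding
```

Repeating `dwt_level` on the approximation half yields a multi-level decomposition.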

Because an image is a two-dimensional signal, the wavelet transform must be generalized to two dimensions for image analysis. Let f(x, y) denote the image signal, ψ(x, y) a two-dimensional basic wavelet, and \( \psi_{a,b,c}(x,y) \) the basic wavelet scaled by a and shifted by b and c along x and y, respectively:

\[ \psi_{a,b,c}(x,y) = \frac{1}{a}\,\psi\!\left(\frac{x-b}{a},\frac{y-c}{a}\right) \]

Following this definition of the continuous wavelet, the two-dimensional continuous wavelet transform is:

\[ WT_f(a,b,c) = \frac{1}{a}\iint f(x,y)\,\overline{\psi\!\left(\frac{x-b}{a},\frac{y-c}{a}\right)}\,dx\,dy \]

where \( \overline{\psi \left(x,y\right)} \) is the complex conjugate of ψ(x, y).

2.3 Digital watermark

Depending on the embedding method, digital watermarking techniques can be divided into the following types:

Spatial domain approach: a typical algorithm of this type embeds information into the least significant bits (LSB) of randomly selected image pixels, which keeps the embedded watermark invisible. However, because these bits carry perceptually unimportant information, the algorithm is not robust: the watermark is easily destroyed by filtering, image quantization, and geometric distortion. Another common method uses the statistical characteristics of pixels to embed the information in their luminance values.

Transform domain approach: first compute the discrete cosine transform (DCT) of the image, then superimpose the watermark on the k coefficients with the largest magnitude in the DCT domain (excluding the DC component), which are usually the low-frequency components of the image. If these k largest coefficients are denoted \( D = \{d_i\} \), i = 1, ..., k, and the watermark is a real random sequence \( W = \{w_i\} \), i = 1, ..., k, drawn from a Gaussian distribution, the embedding rule is \( d_i' = d_i(1 + a w_i) \), where the constant a is a scale factor controlling the strength of the watermark. The watermarked image I* is then obtained by the inverse transform of the modified coefficients. For decoding, the DCTs of the original image I and the watermarked image I* are computed, the embedded watermark W* is extracted, and a correlation test decides whether the watermark is present.

Compressed domain approach: compressed-domain watermarking systems based on the JPEG and MPEG standards avoid much of the full decoding and re-encoding process and have great practical value in digital TV broadcasting and video on demand (VOD). Correspondingly, watermark detection and extraction can also be performed directly on the compressed-domain data.
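The first two families can be sketched in Python as follows. Assumptions: the LSB positions come from a seeded random generator, a random vector stands in for the DCT coefficients (no DCT is actually computed), and detection is non-blind, using the original coefficients as the text describes.

```python
import math
import random


# Spatial domain: least-significant-bit (LSB) embedding.
def lsb_embed(pixels, bits, positions):
    """Overwrite the LSB of each selected pixel with one watermark bit."""
    out = list(pixels)
    for pos, bit in zip(positions, bits):
        out[pos] = (out[pos] & ~1) | bit
    return out


def lsb_extract(pixels, positions):
    return [pixels[pos] & 1 for pos in positions]


# Transform domain: multiplicative rule d_i' = d_i * (1 + a * w_i).
def top_k_indices(coeffs, k):
    """Indices of the k largest-magnitude coefficients, skipping the DC term (index 0)."""
    ranked = sorted(range(1, len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    return ranked[:k]


def multiplicative_embed(coeffs, watermark, a, idx):
    out = list(coeffs)
    for i, w in zip(idx, watermark):
        out[i] = coeffs[i] * (1 + a * w)
    return out


def multiplicative_extract(original, marked, a, idx):
    """Recover W* by inverting the rule; needs the original coefficients."""
    return [(marked[i] / original[i] - 1) / a for i in idx]


def similarity(u, v):
    """Normalized correlation used for the presence test."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))


rng = random.Random(42)
pixels = [rng.randrange(256) for _ in range(64)]
positions = rng.sample(range(64), 8)  # "randomly selected image points"
bits = [1, 0, 1, 1, 0, 0, 1, 0]
stego = lsb_embed(pixels, bits, positions)

coeffs = [rng.gauss(0, 50) for _ in range(2000)]  # stand-in for DCT coefficients
k, a = 1000, 0.1
idx = top_k_indices(coeffs, k)
wm = [rng.gauss(0, 1) for _ in range(k)]
marked = multiplicative_embed(coeffs, wm, a, idx)
wm_star = multiplicative_extract(coeffs, marked, a, idx)
```

Each LSB change alters a pixel by at most one gray level, which is why it is invisible yet fragile; the multiplicative rule scales the change with the coefficient's own magnitude.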

The wavelet method used in this paper belongs to the transform-domain approach. Assume x(m, n) is an M × N grayscale image with 2^a gray levels, where M, N, and a are positive integers and 1 ≤ m ≤ M, 1 ≤ n ≤ N. An L-level wavelet decomposition of this image (L a positive integer) yields 3L high-frequency subbands and one low-frequency approximation subband. Let \( X_{K,L} \) denote the wavelet coefficients, where L is the decomposition level and K ∈ {H, V, D} denotes the horizontal, vertical, and diagonal subbands, respectively. Because modifying the low-frequency subband causes large distortion, the watermark is embedded only in the subbands other than the low-frequency one.
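The L-level decomposition and its 3L + 1 subbands can be sketched as follows. For brevity the orthonormal Haar wavelet stands in for the paper's db3, the image sides are assumed divisible by 2^L, and the H/V/D labels follow a common convention (rows filtered first) that may differ between tools.

```python
import math

S = 1 / math.sqrt(2)


def haar_1d(v):
    """One orthonormal Haar step on an even-length sequence: (low half, high half)."""
    low = [(v[2 * i] + v[2 * i + 1]) * S for i in range(len(v) // 2)]
    high = [(v[2 * i] - v[2 * i + 1]) * S for i in range(len(v) // 2)]
    return low, high


def transpose(m):
    return [list(col) for col in zip(*m)]


def filter_columns(m):
    """Apply haar_1d down every column of m; returns (low, high) matrices."""
    lows, highs = zip(*(haar_1d(list(col)) for col in zip(*m)))
    return transpose(lows), transpose(highs)


def haar_2d_level(img):
    """Rows first, then columns: returns the approximation and the (H, V, D) details."""
    row_low, row_high = zip(*(haar_1d(row) for row in img))
    approx, horizontal = filter_columns(row_low)   # low-pass along rows
    vertical, diagonal = filter_columns(row_high)  # high-pass along rows
    return approx, (horizontal, vertical, diagonal)


def decompose_2d(img, levels):
    """L-level decomposition: (approximation, [(H, V, D) for levels 1..L])."""
    approx, details = img, []
    for _ in range(levels):
        approx, bands = haar_2d_level(approx)
        details.append(bands)
    return approx, details


image = [[(3 * x + 5 * y) % 256 for x in range(16)] for y in range(16)]
approx, details = decompose_2d(image, levels=3)
subband_count = 3 * len(details) + 1  # 3L high-frequency maps + 1 low-frequency map
```

Because the transform is orthonormal, the total energy of all subbands equals that of the image, so where a watermark is placed determines how its energy shows up in the reconstructed picture.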

To embed the watermark, \( X_{K,L}(m_i, n_j) \) is first divided into blocks of a given size; B(s, t) denotes a coefficient block of size s × t in \( X_{K,L}(m_i, n_j) \). The average value of a block is then:

\[ AVG = \frac{\sum B(s,t)}{s \times t} \]

where ∑B(s, t) is the cumulative sum of the magnitudes of the coefficients within the block.

The embedding of the watermark sequence w is achieved by the quantization of AVG.

The quantization interval, denoted Δl, is chosen to balance robustness and imperceptibility. For the lowest-frequency (L-th) level, the coefficient amplitudes are large, so a larger interval can be used; for the other levels, starting from level L − 1, the interval is decreased successively.

According to w_i ∈ {0, 1}, AVG is quantized to the nearest odd or even point. D(i, j) denotes the wavelet coefficients in the block and D(i, j)' the quantized coefficients, where i = 1, 2, ..., s and j = 1, 2, ..., t. Let T = AVG/Δl and TD = rem(round(T), 2), where round(·) denotes rounding to the nearest integer and rem(·, 2) the remainder after division by 2.

According to whether TD and w i are the same, the calculation of the quantized wavelet coefficient D ( i , j ) ' can be as follows:

Using the same wavelet basis, the watermarked image is generated by the inverse wavelet transform. The wavelet basis, the number of decomposition levels, the selected coefficient region, the blocking scheme, the quantization intervals, and the parity mapping are recorded to form the key.

Watermark extraction is the inverse of the embedding procedure. The image under test is first wavelet-transformed, the positions of the embedded watermark are located using the key, and the inverse of the scrambling operation is applied to the extracted watermark.
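The text does not reproduce the formula for D(i, j)', so the embed/extract pair below is only a hedged sketch of block-average parity quantization. It rescales a whole block so that AVG lands on the nearest level q·Δl with the required parity (rem(q, 2) = w_i is an assumed mapping), and reads the bit back as rem(round(AVG/Δl), 2). The subband is random stand-in data, `delta` is a hypothetical interval, and scrambling and the key are omitted.

```python
import random


def block_average(block):
    """AVG: sum of coefficient magnitudes in the block divided by s * t."""
    mags = [abs(v) for row in block for v in row]
    return sum(mags) / len(mags)


def embed_bit(block, bit, delta):
    """Move AVG onto the nearest level q * delta with rem(q, 2) == bit.

    Assumption: the whole block is rescaled so its new AVG equals that level;
    this is not the paper's own update rule for D(i, j)'.
    """
    avg = block_average(block)
    assert avg > 0, "sketch assumes a block with non-zero coefficients"
    q = round(avg / delta)
    if q % 2 != bit:
        q = q + 1 if avg / delta > q else q - 1  # closer neighbour of the right parity
    q = max(q, 2 - bit)  # keep the target level positive (1 for bit 1, 2 for bit 0)
    factor = q * delta / avg
    return [[v * factor for v in row] for row in block]


def extract_bit(block, delta):
    """TD = rem(round(AVG / delta), 2)."""
    return round(block_average(block) / delta) % 2


def split_blocks(coeffs, s, t):
    """Cut a coefficient matrix into non-overlapping s x t blocks, row-major."""
    return [
        [row[c:c + t] for row in coeffs[r:r + s]]
        for r in range(0, len(coeffs) - s + 1, s)
        for c in range(0, len(coeffs[0]) - t + 1, t)
    ]


rng = random.Random(1)
# Stand-in for one high-frequency subband X_{K,L}; no wavelet transform is run here.
subband = [[rng.uniform(-40, 40) for _ in range(8)] for _ in range(8)]
watermark = [1, 0, 0, 1]
delta = 6.0  # hypothetical quantization interval for this level
blocks = split_blocks(subband, 4, 4)
marked = [embed_bit(b, w, delta) for b, w in zip(blocks, watermark)]
recovered = [extract_bit(b, delta) for b in marked]
```

A larger interval moves AVG further per bit (more robust, more visible), which is the trade-off behind choosing a larger Δl for the coarsest level.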

2.4 Evaluation method

Normalized mean square error of filtering

To measure the effect of filtering, this paper uses the normalized mean square error M, calculated as follows:

where N1 and N2 are the normalized pixel values before and after filtering.
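The formula for M is missing from the text. One common normalization, used here purely as an assumption, divides the summed squared pixel differences by the energy of the image before filtering:

```python
def nmse(before, after):
    """Assumed definition: sum (N1 - N2)^2 / sum N1^2 over all pixels."""
    num = sum((p - q) ** 2 for r1, r2 in zip(before, after) for p, q in zip(r1, r2))
    den = sum(p ** 2 for row in before for p in row)
    return num / den


before = [[100, 120], [80, 90]]
after = [[101, 119], [80, 92]]   # toy "filtered" image
print(nmse(before, after))       # 6 / 38900, a small value: little change
```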

Normalized cross-correlation function

The normalized cross-correlation function is a classic image-matching measure that can represent the similarity of two images. It is obtained by computing the cross-correlation between the reference image S and the template image T and is generally written NC(i, j); the larger the NC value, the more similar the two are. The cross-correlation is calculated as:

\[ R(i,j) = \sum_{m}\sum_{n} S^{i,j}(m,n)\,T(m,n) \]

where T(m, n) is the m-th pixel value in the n-th row of the template image, \( S^{i,j} \) is the part of S covered by the template, and (i, j) are the coordinates of the lower-left corner of this sub-image in the reference image S.

Normalizing the above formula gives NC:

\[ NC(i,j) = \frac{\sum_{m}\sum_{n} S^{i,j}(m,n)\,T(m,n)}{\sqrt{\sum_{m}\sum_{n}\left[S^{i,j}(m,n)\right]^2}\,\sqrt{\sum_{m}\sum_{n}\left[T(m,n)\right]^2}} \]
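A sketch of template matching with NC, assuming the non-mean-subtracted form (sum of products divided by the two root sums of squares); whether the paper subtracts the means is not stated in the text. The sketch indexes windows by their top-left corner rather than the lower-left corner, which only shifts the coordinates.

```python
import math


def nc(reference, template, i, j):
    """NC between the template and the sub-image of `reference` it covers at (i, j)."""
    num = ss = tt = 0.0
    for m, t_row in enumerate(template):
        for n, t in enumerate(t_row):
            s = reference[i + m][j + n]
            num += s * t
            ss += s * s
            tt += t * t
    return num / math.sqrt(ss * tt) if ss and tt else 0.0


def best_match(reference, template):
    """Slide the template over every valid position and keep the highest NC."""
    th, tw = len(template), len(template[0])
    positions = [
        (i, j)
        for i in range(len(reference) - th + 1)
        for j in range(len(reference[0]) - tw + 1)
    ]
    return max(positions, key=lambda p: nc(reference, template, *p))


template = [[9, 2], [4, 7]]
reference = [[1] * 8 for _ in range(8)]  # flat background
for m in range(2):
    for n in range(2):
        reference[3 + m][5 + n] = template[m][n]  # hide the template at (3, 5)

print(best_match(reference, template))  # (3, 5)
```

NC reaches 1 only where the covered sub-image is a positive multiple of the template, which is why the hidden copy is found uniquely here.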

Peak signal-to-noise ratio

Peak signal-to-noise ratio (PSNR) is often used to measure signal reconstruction quality in areas such as image compression and is usually defined via the mean square error (MSE). For two m × n monochrome images I and K, one of which is a noisy approximation of the other, the MSE is defined as:

\[ MSE = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[I(i,j)-K(i,j)\right]^2 \]

The PSNR (in dB) is then:

\[ PSNR = 10\log_{10}\left(\frac{\mathrm{Max}^2}{MSE}\right) \]

where Max is the maximum possible pixel value of the image (255 for 8-bit images).
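The MSE and PSNR definitions translate directly into code. `max_value` defaults to 255 (8-bit images), an assumption of this sketch:

```python
import math


def mse(img_i, img_k):
    m, n = len(img_i), len(img_i[0])
    return sum(
        (img_i[r][c] - img_k[r][c]) ** 2 for r in range(m) for c in range(n)
    ) / (m * n)


def psnr(img_i, img_k, max_value=255):
    """PSNR in dB; infinite when the images are identical (MSE = 0)."""
    e = mse(img_i, img_k)
    return math.inf if e == 0 else 10 * math.log10(max_value ** 2 / e)


I = [[0, 0], [0, 0]]
K = [[0, 0], [0, 10]]
print(psnr(I, K))  # about 34.15 dB
```

For this example MSE = 100 / 4 = 25, so PSNR = 10 log10(255² / 25) = 10 log10(2601), about 34.15 dB.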

Information entropy

In the digital signal of an image, different pixel values occur with different frequencies, so the image signal can be regarded as an uncertain signal. For image encryption, the higher the uncertainty of the image, the closer it is to random and the harder it is to crack; the lower the uncertainty, the more regular the image and the more easily it can be cracked. For a 256-level grayscale image the maximum information entropy is 8, so the closer the result is to 8, the better.

Information entropy is calculated as:

\[ H = -\sum_{i=0}^{255} p(i)\log_2 p(i) \]

where p(i) is the probability (relative frequency) of gray level i in the image.
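Information entropy over the gray-level histogram, as a short sketch; levels that never occur contribute nothing, following the usual convention 0 · log 0 = 0:

```python
import math
from collections import Counter


def entropy(pixels):
    """Shannon entropy in bits: H = -sum p(i) * log2 p(i) over the gray levels present."""
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())


print(entropy(list(range(256))))       # 8.0: all 256 levels equally likely
print(entropy([0] * 50 + [255] * 50))  # 1.0: two equally likely levels
```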

Correlation

Correlation is a parameter describing the relationship between two vectors; this paper uses it to describe the relationship between an image before and after encryption. Let x and y be the N pixel values of the two images, with means \( \bar{x} \) and \( \bar{y} \). Their correlation p(x, y) is calculated as:

\[ p(x,y) = \frac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i-\bar{y})^2}} \]
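A sketch of the correlation between two images, each flattened to a pixel sequence; the two toy transformations are illustrative only, and the result is undefined when either image is constant.

```python
import math


def flatten(img):
    return [v for row in img for v in row]


def correlation(x, y):
    """Pearson correlation p(x, y) of two equally long pixel sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


before = [[12, 40, 33], [90, 15, 70]]
after_brighter = [[v + 20 for v in row] for row in before]   # same structure
after_inverted = [[255 - v for v in row] for row in before]  # reversed structure
print(correlation(flatten(before), flatten(after_brighter)))  # 1.0 up to rounding
print(correlation(flatten(before), flatten(after_inverted)))  # -1.0 up to rounding
```

A well-encrypted image should instead give a value near 0 against its plaintext.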

3 Experiment

3.1 Image parameters

All images used in this paper are everyday photographs taken with a Huawei Mate 10, with an image size of 1440 × 1920, a resolution of 96 dpi, a bit depth of 24, and no flash. A total of 300 such photographs, none of them special-purpose images, were used as simulation images.

3.2 System environment

The simulations were run on Windows 10 using MATLAB R2014b.

3.3 Wavelet transform-related parameters

For unified modeling, a three-level wavelet decomposition is used throughout, with a Daubechies wavelet as the basis. Daubechies wavelets were constructed by the wavelet analyst Ingrid Daubechies and are generally written dbN, where N is the order of the wavelet. The support length of the wavelet function ψ(t) and the scaling function ϕ(t) is 2N − 1, and ψ(t) has N vanishing moments. dbN wavelets have good regularity: the smoothing error introduced when the wavelet is used as a sparse basis is hard to perceive, which makes signal reconstruction smoother. The number of vanishing moments grows with the order N, and more vanishing moments give better smoothness, stronger frequency-domain localization, and better band separation. However, the time-domain support becomes longer (weaker time localization), the amount of computation increases greatly, and real-time performance deteriorates. In addition, except for N = 1 (the Haar wavelet), dbN wavelets are not symmetric (i.e., they have nonlinear phase), so some phase distortion occurs when a signal is analyzed and reconstructed. This paper uses N = 3 (db3).
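The dbN properties listed in this subsection can be checked numerically for db3. The coefficients are the commonly tabulated db3 scaling-filter values (quoted, not derived here), and the high-pass filter comes from the standard quadrature-mirror relation; this is a sanity-check sketch, not part of the paper's pipeline.

```python
import math

# Commonly tabulated db3 scaling (low-pass) filter h[0..5], normalized so sum(h) = sqrt(2).
H = [0.3326705529509569, 0.8068915093133388, 0.4598775021193313,
     -0.13501102001039084, -0.08544127388224149, 0.035226291882100656]
# Wavelet (high-pass) filter from the quadrature-mirror relation g[n] = (-1)^n h[5 - n].
G = [(-1) ** n * H[len(H) - 1 - n] for n in range(len(H))]

N = 3
taps = len(H)                   # 2N coefficients, i.e. support [0, 2N - 1]
dc_gain = sum(H)                # sqrt(2) for an orthonormal scaling filter
energy = sum(h * h for h in H)  # 1 (unit norm)
# Orthogonality to even shifts of itself (k = 1, 2).
shift_products = [sum(H[n] * H[n + 2 * k] for n in range(taps - 2 * k)) for k in (1, 2)]
# Moments sum n^p g[n]: zero for p = 0 .. N-1 (N vanishing moments), non-zero at p = N.
moments = [sum(n ** p * g for n, g in enumerate(G)) for p in range(N + 1)]
symmetric = all(math.isclose(H[n], H[taps - 1 - n]) for n in range(taps))

print(taps, round(dc_gain, 6), round(energy, 6))
print([round(m, 6) for m in moments])  # first three ~0, last clearly non-zero
print("symmetric:", symmetric)         # False: db3 has nonlinear phase
```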

4 Results and discussion

4.1 Results 1: image filtering using wavelet transform

Image signals can be contaminated during recording, transmission, storage, and processing, and such contamination appears in the transmitted digital signal as noise. Noise data often take the form of isolated pixels. Although these isolated points do not destroy the overall structure of the image, they tend to be high-frequency and appear on the image as bright spots, which greatly degrades viewing quality; to ensure good processing results, the image must therefore be denoised. An effective way to denoise is to filter out the noise in certain frequency bands, but denoising must remove the noise data without damaging the image. Figure 1 shows the result of filtering an image with the wavelet transform method. To test the wavelet filtering effect, Gaussian white noise was added to the original image. Comparing the frequency distributions of the noisy and original images shows that, after the noise is added, the main frequency band of the original image is disturbed by the noise frequencies, whereas after wavelet filtering the frequency band of the image's main structure reappears. Moreover, the filtered image shows no significant change compared with the original image: the normalized mean square error M between the images before and after filtering is 0.0071. The wavelet transform therefore preserves image details well while removing the noise effectively (white noise level: 20%).
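The paper does not state its thresholding rule, so the following is only a sketch of one standard wavelet denoising scheme: a three-level decomposition (matching Section 3.3), soft thresholding of every detail band with the universal threshold σ√(2 ln n), and reconstruction. For brevity it uses a 1-D Haar transform on a synthetic piecewise-constant signal with known noise level σ; all of these choices are assumptions.

```python
import math
import random

S = 1 / math.sqrt(2)


def haar_fwd(x):
    """One orthonormal Haar level: (approximation, detail)."""
    a = [(x[2 * i] + x[2 * i + 1]) * S for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) * S for i in range(len(x) // 2)]
    return a, d


def haar_inv(a, d):
    x = []
    for ai, di in zip(a, d):
        x += [(ai + di) * S, (ai - di) * S]
    return x


def soft(v, lam):
    """Soft thresholding: shrink |v| by lam, zeroing coefficients below lam."""
    return math.copysign(max(abs(v) - lam, 0.0), v)


def wavelet_denoise(x, levels=3, sigma=1.0):
    """Decompose, soft-threshold every detail band, then reconstruct."""
    lam = sigma * math.sqrt(2 * math.log(len(x)))  # universal threshold (assumed rule)
    approx, details = x, []
    for _ in range(levels):
        approx, d = haar_fwd(approx)
        details.append([soft(v, lam) for v in d])
    for d in reversed(details):  # coarsest level first
        approx = haar_inv(approx, d)
    return approx


def mse(u, v):
    return sum((p - q) ** 2 for p, q in zip(u, v)) / len(u)


rng = random.Random(0)
clean = [0.0] * 64 + [10.0] * 64 + [4.0] * 64 + [8.0] * 64  # piecewise-constant "scan line"
noisy = [c + rng.gauss(0, 1.0) for c in clean]              # additive Gaussian white noise
denoised = wavelet_denoise(noisy, levels=3, sigma=1.0)
print(mse(noisy, clean), mse(denoised, clean))
```

Small detail coefficients are mostly noise and are zeroed, while the large coefficients at edges survive, which is how wavelet denoising removes noise without blurring structure.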

Fig. 1 Image denoising results comparison. (First row, left to right: original image, noisy image, and filtered image. Second row, left to right: frequency distributions of the original image, the noisy image, and the filtered image.)

4.2 Results 2: digital watermark encryption based on wavelet transform

Fig. 2 shows the wavelet-based watermarking process. Watermarking the image via the wavelet transform does not affect the structure of the original image. The added noise is salt-and-pepper noise with 40% density. The wavelet method extracts the watermark well from both the watermarked image and its noisy version.

Fig. 2 Comparison of the digital watermark before and after. (First row, left to right: original image, image with noise and watermark, and the image after noise removal. Second row: original watermark, watermark extracted from the noisy watermarked image, and watermark extracted after denoising.)

Using the evaluation methods described in this paper, the correlation coefficient and peak signal-to-noise ratio of the image after watermarking were calculated. The correlation coefficient between the original image and the watermarked image is 0.9871 (first and third columns of the first row in Fig. 2), so the watermark does not destroy the structure of the original image. The PSNR of the original image is 33.5 dB and that of the watermarked image is 31.58 dB, showing that the wavelet transform hides the watermark well. In the second row, the correlation coefficients between the original watermark and the watermarks extracted from the noisy image and from the denoised image are 0.9745 and 0.9652, respectively, which shows that the watermark can still be extracted well after being hidden by the wavelet transform.

4.3 Results 3: image encryption based on wavelet transform

In image transmission, the most common way to protect image content is to encrypt the image. Figure 3 shows the process of encrypting and decrypting an image using the wavelet transform. After encryption, the image bears no visible relationship to the original, while decrypting the encrypted image reproduces the original image.

Fig. 3 Image encryption and decryption. (Left: original image; middle: encrypted image; right: decrypted image.)

The information entropy of the images in Fig. 3 was calculated: 3.05 for the original image, 3.07 for the decrypted image, and 7.88 for the encrypted image. The entropy is therefore essentially unchanged between the original and decrypted images, whereas the encrypted image's entropy of 7.88, close to the maximum of 8, indicates that the encrypted image is close to a random signal and well protected.

4.4 Results 4: image compression

Image data can be compressed because they contain redundancy, mainly of three kinds: spatial redundancy caused by correlation between adjacent pixels in an image; temporal redundancy caused by correlation between frames in an image sequence; and spectral redundancy caused by correlation between color planes or spectral bands. Data compression reduces the number of bits needed to represent the data by removing these redundancies. Because the volume of image data is huge and difficult to store, transmit, and process, compressing it is very important. Figure 4 shows the result of compressing the original image twice: although the image is compressed, its main structure does not change, but its sharpness is noticeably reduced. Table 1 shows the properties of the compressed images.

Image comparison before and after compression. (Left: original image; middle: after the first compression; right: after the second compression)

Table 1 shows that each round of compression substantially reduces the image size. The original image requires 2,764,800 bytes. After the first compression it requires 703,009 bytes, a reduction of about 74.6%. After the second compression only 182,161 bytes remain, a further reduction of 74.1%. These results suggest that the wavelet transform achieves effective image compression.
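Each percentage reduction is 1 − new/old, computed from the byte counts quoted in the text. The short check below uses only those numbers. Rounded to one decimal place, the first step comes to 74.6%, and the overall reduction after both steps is about 93.4%.

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage decrease in size from `before` to `after` bytes."""
    return 100.0 * (1 - after / before)


sizes = [2_764_800, 703_009, 182_161]  # original, 1st and 2nd compression (bytes)
first = reduction_percent(sizes[0], sizes[1])
second = reduction_percent(sizes[1], sizes[2])
overall = reduction_percent(sizes[0], sizes[2])
print(f"{first:.1f}% {second:.1f}% {overall:.1f}%")  # 74.6% 74.1% 93.4%
```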

5 Conclusion

With the development of informatization, today's era is an era full of information. Images are the visual basis of human perception of the world, and an important means by which humans obtain, express, and transmit information. Digital image processing, that is, processing images with a computer, has a long history. It originated in the 1920s, when a photograph was transmitted from London, England, to New York over a submarine cable using digital compression technology. First of all, digital image processing technology can help people understand the world more objectively and accurately. The human visual system supplies more than three quarters of the information humans obtain from the outside world, and images and graphics are the carriers of all visual information. The human eye is a very powerful recognizer and can distinguish thousands of colors. Yet in many cases an image is blurred, or even invisible, to the human eye. Image enhancement technology can make such blurred or even invisible images clear and bright. Related studies [26, 27] show that research in this direction is feasible.

Because image processing technology is so important, many researchers have studied it and achieved fruitful results. As the research has matured, however, it has tended to go ever deeper into individual aspects of image processing. Applying image processing technology is a systems-engineering problem, so besides depth it also has systemic requirements. A unified model covering multiple aspects of image application would therefore help promote the application of image processing technology. The wavelet transform has been applied successfully in many areas of image processing, and this paper uses it to establish a unified model. Simulations of filtering, watermark hiding, encryption and decryption, and image compression show that the model achieves good results.

Abbreviations

CA: Cellular automata

CGH: Computer generated hologram

DCT: Discrete cosine transform

EPWIC: Embedded Prediction Wavelet Image Coder

HVS: Human visual system

LSB: Least significant bits

VOD: Video on demand

WT: Wavelet transform

References

H.W. Zhang, The research and implementation of image denoising method based on Matlab. Journal of Daqing Normal University 36(3), 1–4 (2016)

J.H. Hou, J.W. Tian, J. Liu, Analysis of the errors in locally adaptive wavelet domain Wiener filter and image denoising. Acta Photonica Sinica 36(1), 188–191 (2007)

M. Lebrun, An analysis and implementation of the BM3D image denoising method. Image Processing On Line 2(25), 175–213 (2012)

A. Fathi, A.R. Naghsh-Nilchi, Efficient image denoising method based on a new adaptive wavelet packet thresholding function. IEEE Transactions on Image Processing 21(9), 3981 (2012)

X. Zhang, X. Feng, W. Wang, et al., Gradient-based Wiener filter for image denoising. Comput. Electr. Eng. 39(3), 934–944 (2013)

T. Chen, K.K. Ma, L.H. Chen, Tri-state median filter for image denoising. IEEE Transactions on Image Processing 8(12), 1834 (1999)

S.M.M. Rahman, M.K. Hasan, Wavelet-domain iterative center weighted median filter for image denoising. Signal Process. 83(5), 1001–1012 (2003)

H.L. Eng, K.K. Ma, Noise adaptive soft-switching median filter for image denoising, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '00), vol. 4 (IEEE, 2000), pp. 2175–2178

S.G. Chang, B. Yu, M. Vetterli, Adaptive wavelet thresholding for image denoising and compression. IEEE Transactions on Image Processing 9(9), 1532 (2000)

M. Kivanc Mihcak, I. Kozintsev, K. Ramchandran, et al., Low-complexity image denoising based on statistical modeling of wavelet coefficients. IEEE Signal Processing Letters 6(12), 300–303 (1999)

J.H. Wu, F.Z. Lin, Image authentication based on digital watermarking. Chinese Journal of Computers 9, 1153–1161 (2004)

A. Wakatani, Digital watermarking for ROI medical images by using compressed signature image, in Proceedings of the Hawaii International Conference on System Sciences (IEEE, 2002), pp. 2043–2048

A.H. Paquet, R.K. Ward, I. Pitas, Wavelet packets-based digital watermarking for image verification and authentication. Signal Process. 83(10), 2117–2132 (2003)

Z. Wen, T. Li, Z. Zhang, An image encryption technology based on chaotic sequences. Comput. Eng. 31(10), 130–132 (2005)

Y.Y. Wang, Y.R. Wang, Y. Wang, et al., Optical image encryption based on binary Fourier transform computer-generated hologram and pixel scrambling technology. Optics & Lasers in Engineering 45(7), 761–765 (2007)

X.Y. Zhang, C. Wang, S.M. Li, et al., Image encryption technology on two-dimensional cellular automata. Journal of Optoelectronics Laser 19(2), 242–245 (2008)

A.S. Lewis, G. Knowles, Image compression using the 2-D wavelet transform. IEEE Trans. Image Process. 1(2), 244–250 (1992)

R.A. DeVore, B. Jawerth, B.J. Lucier, Image compression through wavelet transform coding. IEEE Trans. Inf. Theory 38(2), 719–746 (1992)

R.W. Buccigrossi, E.P. Simoncelli, Image compression via joint statistical characterization in the wavelet domain. IEEE Transactions on Image Processing 8(12), 1688–1701 (1999)

A.A. Cruz-Roa, J.E. Arevalo Ovalle, A. Madabhushi, et al., A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. Med Image Comput Comput Assist Interv. 16, 403–410 (2013)

S.P. Mohanty, D.P. Hughes, M. Salathé, Using deep learning for image-based plant disease detection. Front. Plant Sci. 7, 1419 (2016)

B. Sahiner, H. Chan, D. Wei, et al., Image feature selection by a genetic algorithm: application to classification of mass and normal breast tissue. Med. Phys. 23(10), 1671 (1996)

B. Bhanu, S. Lee, J. Ming, Adaptive image segmentation using a genetic algorithm. IEEE Transactions on Systems, Man, and Cybernetics 25(12), 1543–1567 (1995)

Y. Egusa, H. Akahori, A. Morimura, et al., An application of fuzzy set theory for an electronic video camera image stabilizer. IEEE Trans. Fuzzy Syst. 3(3), 351–356 (1995)

K. Hasikin, N.A.M. Isa, Enhancement of the low contrast image using fuzzy set theory, in UKSim International Conference on Computer Modelling and Simulation (IEEE, 2012), pp. 371–376

P. Yang, Q. Li, Wavelet transform-based feature extraction for ultrasonic flaw signal classification. Neural Comput. & Applic. 24(3–4), 817–826 (2014)

R.K. Lama, M.-R. Choi, G.-R. Kwon, Image interpolation for high-resolution display based on the complex dual-tree wavelet transform and hidden Markov model. Multimedia Tools Appl. 75(23), 16487–16498 (2016)


Acknowledgements

The authors thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.

This work was supported by

Shandong social science planning research project in 2018

Topic: The Application of Shandong Folk Culture in Animation in The View of Digital Media (No. 18CCYJ14).

Shandong education science 12th five-year plan 2015

Topic: Innovative Research on Stop-motion Animation in The Digital Media Age (No. YB15068).

Shandong education science 13th five-year plan 2016–2017

Approval of “Sports and Arts Education Special Fund”: BCA2017017.

Topic: Reform of Teaching Methods of Hand Drawn Presentation Techniques (No. BCA2017017).

National Research Youth Project of state ethnic affairs commission in 2018

Topic: Protection and Development of Villages with Ethnic Characteristics Under the Background of Rural Revitalization Strategy (No. 2018-GMC-020).

Availability of data and materials

The data are available from the authors on request.

About the authors

Zaozhuang University, No. 1 Beian Road, Shizhong District, Zaozhuang City, Shandong, P.R. China.

Lina Zhang was born in Jining, Shandong, P.R. China, in 1983. She received a Master's degree from Bohai University, P.R. China, and now works in the School of Media, Zaozhuang University, P.R. China. Her research interests include animation and digital media art.

Lijuan Zhang was born in Jining, Shandong, P.R. China, in 1983. She received a Master's degree from Jingdezhen Ceramic Institute, P.R. China, and now works in the School of Fine Arts and Design, Zaozhuang University, P.R. China. Her research interests include interior design and digital media art.

Liduo Zhang was born in Zaozhuang, Shandong, P.R. China, in 1982. He received a Master's degree from Monash University, Australia, and now works in the School of Economics and Management, Zaozhuang University. His research interests include Internet finance and digital media.

Author information

Authors and affiliations

School of Media, Zaozhuang University, Zaozhuang, Shandong, China

Lina Zhang

School of Fine Arts and Design, Zaozhuang University, Zaozhuang, Shandong, China

Lijuan Zhang

School of Economics and Management, Zaozhuang University, Zaozhuang, Shandong, China

Liduo Zhang


Contributions

All authors took part in the discussion of the work described in this paper. LZ wrote the first version of the paper, LZ and LZ performed part of the experiments, and LZ revised successive versions of the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Lijuan Zhang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article.

Zhang, L., Zhang, L. & Zhang, L. Application research of digital media image processing technology based on wavelet transform. J Image Video Proc. 2018 , 138 (2018). https://doi.org/10.1186/s13640-018-0383-6


Received : 28 September 2018

Accepted : 23 November 2018

Published : 05 December 2018

DOI : https://doi.org/10.1186/s13640-018-0383-6


Image processing
Digital watermark
Image denoising
Image encryption
Image compression

Image processing and pattern recognition in industrial engineering

Sensor Review

ISSN : 0260-2288

Article publication date: 29 March 2011

Du, Z. (2011), "Image processing and pattern recognition in industrial engineering", Sensor Review , Vol. 31 No. 2. https://doi.org/10.1108/sr.2011.08731baa.002

Emerald Group Publishing Limited

Article Type: Viewpoint From: Sensor Review, Volume 31, Issue 2

With the rise of the information superhighway, the introduction of the digital globe concept, and the widespread use of the internet, images have become an important source of information and an important means by which humans access it. As a result, demand for image processing and pattern recognition technology grows day by day.

Image processing and pattern recognition are now studied in areas such as engineering, computer science, information science, statistics, physics, biology, chemistry, medicine, and even the social sciences. Their use by other disciplines is therefore inevitably increasing.

Recently, there has been a growing demand for image processing and pattern recognition in various application areas, such as remote sensing, multimedia computing, secure image data communication, biomedical imaging, texture understanding, content-based image retrieval, and image compression. The challenge for scientists, engineers, and business people is to extract valuable information quickly from raw image data, and this is the primary purpose of image processing and pattern recognition.

In electrical engineering and computer science, image processing is any form of signal processing for which the input is an image, such as a photograph or video frame. The output of image processing may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. A digital image is composed of a grid of pixels and stored as an array. A single pixel represents a value of either light intensity or color. Images are processed to obtain information beyond what is apparent given the image’s initial pixel values.
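The sketch below illustrates what it means to treat an image as a two-dimensional signal. It applies a 3×3 mean (box) filter, one of the simplest linear smoothing filters, to a grid of pixel values. At the borders it averages only the neighbours that exist. That border rule is an arbitrary choice made here for brevity, and the function name is hypothetical.

```python
from typing import List


def mean_filter3(image: List[List[float]]) -> List[List[float]]:
    """Replace each pixel by the mean of its 3x3 neighbourhood (clipped at borders)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [image[y][x]
                      for y in range(max(0, i - 1), min(h, i + 2))
                      for x in range(max(0, j - 1), min(w, j + 2))]
            out[i][j] = sum(window) / len(window)
    return out
```

Applied to a single bright pixel on a dark background, the filter spreads that pixel's value over its neighbours. This is the blurring that suppresses isolated noise.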

Image-processing tasks can include any combination of the following: modifying the image view, adding dimensionality to image data, working with masks and calculating statistics, warping images, specifying regions of interest, manipulating images in various domains, enhancing contrast and filtering, extracting and analyzing shapes, and so on.

Pattern recognition techniques are concerned with the theory and algorithms for putting abstract objects, e.g. measurements made on physical objects, into categories. Methods of pattern recognition are useful in many applications such as information retrieval, data mining, document image analysis and recognition, computational linguistics, forensics, biometrics and bioinformatics.

Pattern recognition is the science and art of giving names to natural objects in the real world. It is often considered part of artificial intelligence. The problem is harder than symbolic reasoning, however, because the observations are not in symbolic form and often contain much variability and noise; another term for pattern recognition is artificial perception. Typical inputs to a pattern recognition system are images or sound signals, in which the relevant objects have to be found and identified. A pattern recognition solution involves several stages: making the measurements, processing and segmentation, finding a suitable numerical representation for the objects of interest, and finally classifying the objects based on these representations.
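The stages listed above can be sketched end to end on a toy problem: measurement, segmentation, numerical representation and classification. Everything in the sketch is a hypothetical illustration and not a method from this viewpoint. Segmentation is a fixed intensity threshold. The representation is a two-number feature vector: foreground area and mean foreground intensity. Classification assigns the label of the nearest labelled prototype.

```python
import math
from typing import Dict, List, Tuple

Image = List[List[float]]
Features = Tuple[float, float]


def segment(image: Image, threshold: float) -> List[List[bool]]:
    """Processing and segmentation: mark pixels brighter than `threshold` as foreground."""
    return [[p > threshold for p in row] for row in image]


def features(image: Image, mask: List[List[bool]]) -> Features:
    """Numerical representation: (foreground area, mean foreground intensity)."""
    values = [p for row, mrow in zip(image, mask) for p, m in zip(row, mrow) if m]
    area = float(len(values))
    mean = sum(values) / len(values) if values else 0.0
    return area, mean


def classify(x: Features, prototypes: Dict[str, Features]) -> str:
    """Classification: return the label of the nearest prototype (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(x, prototypes[label]))
```

In practice the prototypes might be the mean feature vectors of labelled training images. For a distance-based classifier like this one, the features should also be scaled to comparable ranges.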

Image processing and pattern recognition technology is also closely linked with the national economy and with science, and it has brought great economic and social benefits to humanity. In the near future, image processing and pattern recognition technology will not only develop more thoroughly in theory but will also become an indispensable and powerful tool in scientific research and in everyday life. In our information-based society, image processing and pattern recognition have huge potential, both in theory and in practice.

Zhenyu Du Professor at the Information Technology and Industrial Engineering Research Center (ITTE), Hong Kong, China


Artificial Intelligence (AI) for Image Processing


A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section " Computer Science & Engineering ".

Deadline for manuscript submissions: closed (28 February 2023) | Viewed by 27433


Dear Colleagues,

We launch this Special Issue on “Artificial Intelligence (AI) for Image Processing”. AI methods are widely adopted in academia and industry for many applications, including image processing tasks such as image segmentation, classification, and recognition. AI technologies include machine learning, metaheuristic optimization algorithms (including swarm intelligence (SI) and bio-inspired algorithms), and knowledge-based and expert systems.

This Special Issue presents a forum for the publication of articles describing the use of classical and modern artificial intelligence methods in image processing applications.

The main aim of this Special Issue is to capture recent contributions of high-quality papers focusing on advanced image processing and analysis applications, including medical images, remote sensing images, galaxy images, and others. We are pleased to invite our colleagues to contribute original research papers as well as review papers that focus on the applications of artificial intelligence methods, including traditional machine learning methods, advanced deep learning approaches, metaheuristic optimization algorithms, and other AI-based methods for solving image processing problems.

The topics of this Special Issue include (but are not limited to) the following:

Image classification and recognition
Machine learning for image processing
Metaheuristic optimization algorithms for image processing
Remote sensing image classification
Medical image classification
Neural computing for image processing
Evolutionary algorithms for image processing

Dr. Mohammed A. A. Al-qaness, Guest Editor

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Image processing
Deep learning
Metaheuristic algorithms
Swarm intelligence
Medical image processing
Image segmentation

Published Papers (8 papers)


Title: Vision Language Models Are Blind

Abstract: Large language models with vision capabilities (VLMs), e.g., GPT-4o and Gemini 1.5 Pro, are powering countless image-text applications and scoring high on many vision-understanding benchmarks. We propose BlindTest, a suite of 7 visual tasks that are absurdly easy for humans, such as identifying (a) whether two circles overlap; (b) whether two lines intersect; (c) which letter is being circled in a word; and (d) counting the number of circles in an Olympic-like logo. Surprisingly, four state-of-the-art VLMs are, on average, only 56.20% accurate on our benchmark, with Claude 3.5 Sonnet being the best (73.77% accuracy). On BlindTest, VLMs struggle with tasks that require precise spatial information and counting (from 0 to 10), sometimes giving the impression of a person with myopia who sees fine details as blurry and makes educated guesses. Code is available at: this https URL

Subjects:	Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Stable diffusion for high-quality image reconstruction in digital rock analysis

Digital rock analysis is a promising approach for visualizing geological microstructures and understanding transport mechanisms for the exploitation of underground geo-energy resources. Accurate image reconstruction methods are vital for capturing the diverse features and variability of digital rock samples. Stable diffusion, a cutting-edge artificial intelligence model, has revolutionized computer vision by creating realistic images, but its application in digital rock analysis is still emerging. This study explores applications of stable diffusion in digital rock analysis. These include:

- enhancing image resolution;
- improving quality through denoising and deblurring;
- segmenting images;
- filling missing sections;
- extending images through outpainting;
- reconstructing three-dimensional rocks from two-dimensional images.

The powerful image generation capability of diffusion models sheds light on digital rock analysis, showing potential in filling missing parts of rock images, lithologic discrimination, and generating network parameters. The study also discusses limitations of existing stable diffusion models, including the lack of real digital rock images and their inability to fully understand the mechanisms behind physical processes. It therefore suggests developing new models tailored to digital rock images. In sum, integrating stable diffusion into digital core analysis presents immense research opportunities and has the potential to transform the field with groundbreaking advances.

No abstract is available for this article. Click the button above to view the PDF directly.

Ahuja, V. R., Gupta, U., Rapole, S. R., et al. Siamese-SR: A siamese super-resolution model for boosting resolution of digital rock images for improved petrophysical property estimation. IEEE Transactions on Image Processing, 2022, 31: 3479-3493.

Brooks, T., Peebles, B., Homes, C., et al. Video generation models as world simulators. OpenAI, 2024.

Chang, C., Peng, J., Safari, M., et al. High-resolution MRI synthesis using a data-driven framework with denoising diffusion probabilistic modeling. Physics in Medicine & Biology, 2024, 69(4): 045001.

Chen, H., He, X., Teng, Q., et al. Super-resolution of real-world rock microcomputed tomography images using cycle-consistent generative adversarial networks. Physical Review E, 2020, 101(2): 023305.

Chi, P., Sun, J., Luo, X., et al. Reconstruction of 3D digital rocks with controllable porosity using CVAE-GAN. Geoenergy Science and Engineering, 2023, 230: 212264.

Croitoru, F., Hondru, V., Ionescu, R., et al. Diffusion models in vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(9): 10850-10869.

Feng, J., Teng, Q., Li, B., et al. An end-to-end three-dimensional reconstruction framework of porous media from a single two-dimensional image based on deep learning. Computer Methods in Applied Mechanics and Engineering, 2020, 368: 113043.

Gong, K., Johnson, K., El Fakhri, G., et al. PET image denoising based on denoising diffusion probabilistic model. European Journal of Nuclear Medicine and Molecular Imaging, 2023, 51: 358-368.

Goral, J., Panja, P., Deo, M., et al. Confinement effect on porosity and permeability of shales. Scientific Reports, 2020, 10(1): 49.

Ho, J., Jain, A., Abbeel, P. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 2020, 33: 6840-6851.

Imamverdiyev, Y., Sukhostat, L. Lithological facies classification using deep convolutional neural network. Journal of Petroleum Science and Engineering, 2019, 174: 216-228.

Kazerouni, A., Aghdam, E. K., Heidari, M., et al. Diffusion models in medical imaging: A comprehensive survey. Medical Image Analysis, 2023, 88: 102846.

Li, X., Li, B., Liu, F., et al. Advances in the application of deep learning methods to digital rock technology. Advances in Geo-Energy Research, 2023, 8(1): 5-18.

Li, Z., Deng, S., Hong, Y., et al. A novel hybrid CNN-SVM method for lithology identification in shale reservoirs based on logging measurements. Journal of Applied Geophysics, 2024, 223: 105346.

Liao, Q., Xue, L., Wang, B., Lei, G. A new upscaling method for microscopic fluid flow based on digital rocks. Advances in Geo-Energy Research, 2022, 6(4): 357-358.

Liu, X., Su, S., Gu, W., et al. Super-resolution reconstruction of ct images based on multi-scale information fused generative adversarial networks. Annals of Biomedical Engineering, 2024, 52(1): 57-70.

Lu, C., Zhou, Y., Bao, F., et al. DPM-solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. Advances in Neural Information Processing Systems, 2022, 35: 5775-5787.

Mao, S., He, Y., Chen, H., et al. High-quality and high-diversity conditionally generative ghost imaging based on denoising diffusion probabilistic model. Optics Express, 2023, 31(15): 25104-25116.

Pan, S., Wang, T., Qiu, R. L., et al. 2D medical image synthesis using transformer-based denoising diffusion probabilistic model. Physics in Medicine & Biology, 2023, 68(10): 105004.

Shan, L., Bai, X., Liu, C., et al. Super-resolution reconstruction of digital rock CT images based on residual attention mechanism. Advances in Geo-Energy Research, 2022, 6(2): 157-168.

Wang, Y. D., Armstrong, R. T., Mostaghimi, P. Enhancing resolution of digital rock images with super resolution convolutional neural networks. Journal of Petroleum Science and Engineering, 2019, 182: 106261.

Yan, Z., Zhou, C., Li, X. A survey on generative diffusion model. Computer Science, 2024, 51(1): 273-283. (in Chinese)

Zha, W., Li, X., Li, D., et al. Shale digital core image generation based on generative adversarial networks. Journal of Energy Resources Technology, 2021, 143(3): 033003.

Zhang, K., Li, Y., Liang, J., et al. Practical blind image denoising via Swin-Conv-UNet and data synthesis. Machine Intelligence Research, 2023a, 20(6): 822-836.

Zhao, J., Chen, H., Li, N., et al. Research advance of petrophysical application based on digital core technology. Progress in Geophysics, 2020, 35(3): 1099-1108.

Zhao, J., Wang, F., Cai, J. 3D tight sandstone digital rock reconstruction with deep learning. Journal of Petroleum Science and Engineering, 2021, 207: 109020.

Zheng, Q., Zhang, D. RockGPT: Reconstructing three-dimensional digital rocks from single two-dimensional slice with deep learning. Computational Geosciences, 2022, 26(3): 677-696.


This article is distributed under the terms and conditions of the Creative Commons Attribution (CC BY-NC-ND) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Enhanced speech emotion recognition using averaged valence arousal dominance mapping and deep neural networks

Original Paper
Published: 10 July 2024


Davit Rizhinashvili (1), Abdallah Hussein Sham (2) & Gholamreza Anbarjafari (3, 4, 5)

This study advances speech emotion recognition (SER) with a novel approach to emotion mapping and prediction based on the Valence-Arousal-Dominance (VAD) model. Central to this research is the creation of reliable emotion-to-VAD mappings, obtained by averaging the outputs of multiple pre-trained networks applied to the RAVDESS dataset. This approach resolves inconsistencies among earlier emotion-to-VAD mappings and provides a dependable framework for SER. The study also introduces a refined SER model that combines the pre-trained wav2vec 2.0 model with Long Short-Term Memory (LSTM) networks and linear layers, ending in an output layer that predicts valence, arousal, and dominance. The model achieves strong accuracy across several datasets, including RAVDESS, EMO-DB, CREMA-D, and TESS, demonstrating robustness and adaptability and improving on earlier models that were prone to dataset-specific overfitting. The research further presents a comprehensive speech analysis application that denoises speech, segments it, and profiles the emotions in each segment. The application offers interactive emotion tracking and sentiment reports, illustrating its practicality in diverse settings. The study acknowledges ongoing challenges in SER, especially the subjective nature of emotion perception and the integration of multimodal data. Although the research marks progress in SER technology, it underscores the need for continued research and careful consideration of ethical aspects when deploying such technologies. This work contributes to the SER domain a dependable method for emotion mapping, a robust model for emotion recognition, and a user-friendly application for practical use.
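The abstract describes building emotion-to-VAD mappings by averaging the outputs of several pre-trained networks. Below is a minimal sketch of that averaging step, assuming each network has already produced one (valence, arousal, dominance) triple per emotion label on a common 0-1 scale. The model names, the numbers, and the nearest-neighbour lookup that turns a predicted VAD point back into a label are hypothetical illustrations, not the authors' implementation.

```python
from statistics import mean

# A (valence, arousal, dominance) triple.
VAD = tuple[float, float, float]


def average_vad_mappings(per_model: dict[str, dict[str, VAD]]) -> dict[str, VAD]:
    """Average each emotion's VAD triple over every model that provides that emotion."""
    emotions = sorted({e for table in per_model.values() for e in table})
    averaged: dict[str, VAD] = {}
    for emotion in emotions:
        triples = [table[emotion] for table in per_model.values() if emotion in table]
        # zip(*triples) groups all valence values, then all arousal values, then all dominance values.
        v, a, d = (mean(values) for values in zip(*triples))
        averaged[emotion] = (v, a, d)
    return averaged


def nearest_emotion(vad: VAD, mapping: dict[str, VAD]) -> str:
    """Return the emotion whose averaged VAD triple is closest (Euclidean) to a predicted point."""
    return min(mapping, key=lambda e: sum((p - q) ** 2 for p, q in zip(vad, mapping[e])))


# Hypothetical per-model mappings; placeholder numbers, not results from the paper.
per_model = {
    "model_a": {"happy": (0.8, 0.6, 0.6), "sad": (0.2, 0.3, 0.3)},
    "model_b": {"happy": (0.6, 0.8, 0.4), "sad": (0.4, 0.1, 0.3), "angry": (0.2, 0.9, 0.8)},
}
averaged = average_vad_mappings(per_model)
print(averaged["happy"])                            # approximately (0.7, 0.7, 0.5)
print(nearest_emotion((0.75, 0.7, 0.5), averaged))  # happy
```

Averaging each emotion only over the models that report it lets networks with different label sets contribute; whether the paper handles missing labels this way is not stated in the abstract.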


Data availability

Not applicable.

https://github.com/coqui-ai/open-speech-corpora

https://huggingface.co/ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition


Author information

Authors and affiliations.

Institute of Bioengineering, University of Tartu, Tartu, Estonia

Davit Rizhinashvili

Tallinn University, Narva mnt 25, 10120, Tallinn, Estonia

Abdallah Hussein Sham

PwC Advisory, Helsinki, Finland

Gholamreza Anbarjafari

Yildiz Technical University, Istanbul, Turkey

iCV Lab, University of Tartu, Tartu, Estonia


Corresponding author

Correspondence to Abdallah Hussein Sham .


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Rizhinashvili, D., Sham, A.H. & Anbarjafari, G. Enhanced speech emotion recognition using averaged valence arousal dominance mapping and deep neural networks. SIViP (2024). https://doi.org/10.1007/s11760-024-03406-8


Received : 11 February 2024

Revised : 11 February 2024

Accepted : 25 June 2024

Published : 10 July 2024

DOI : https://doi.org/10.1007/s11760-024-03406-8


Speech emotion recognition
Deep neural networks
Speech analysis
Valence arousal dominance

NeurIPS 2024, the Thirty-eighth Annual Conference on Neural Information Processing Systems, will be held at the Vancouver Convention Center from Monday, December 9 through Sunday, December 15. Monday is an industry expo.


Our Hotel Reservation page is currently under construction and will be released shortly. NeurIPS has contracted hotel guest rooms for the conference at group pricing, and these rooms can be reserved only through this page. Please do not make room reservations through any other channel, as doing so impedes our ability to put on the best conference for you. We thank you for helping us protect the NeurIPS conference.

Announcements

The call for High School Projects has been released
The Call For Papers has been released
See the Visa Information page for changes to the visa process for 2024.


Important Dates

Mar 15 '24 11:46 AM PDT *
Apr 05 '24 (Anywhere on Earth)
Apr 21 '24 (Anywhere on Earth)
Main Conference Paper Submission Deadline: May 22 '24 01:00 PM PDT *
May 22 '24 01:00 PM PDT *
Jun 14 '24 (Anywhere on Earth)
Aug 02 '24 06:00 PM PDT *
Sep 05 '24 (Anywhere on Earth)
Main Conference Author Notification: Sep 25 '24 06:00 PM PDT *
Datasets and Benchmarks - Author Notification: Sep 26 '24 (Anywhere on Earth)
Workshop Accept/Reject Notification Date: Sep 29 '24 (Anywhere on Earth)
Oct 30 '24 (Anywhere on Earth)
Nov 15 '24 11:00 PM PST *

If you have questions about supporting the conference, please contact us.


Mission Statement

The Neural Information Processing Systems Foundation is a non-profit corporation whose purpose is to foster the exchange of research advances in Artificial Intelligence and Machine Learning, principally by hosting an annual interdisciplinary academic conference with the highest ethical standards for a diverse and inclusive community.

About the Conference

The conference was founded in 1987 and is now a multi-track interdisciplinary annual meeting that includes invited talks, demonstrations, symposia, and oral and poster presentations of refereed papers. Along with the conference is a professional exposition focusing on machine learning in practice, a series of tutorials, and topical workshops that provide a less formal setting for the exchange of ideas.


ZEISS is introducing a new integrated system for serial block-face imaging

Understanding cellular ultrastructure in 3D context with ZEISS Volutome

The newly introduced ZEISS Volutome is an in-chamber ultramicrotome for ZEISS field emission scanning electron microscopes (FE-SEM). It is designed to image the ultrastructure of biological, resin-embedded samples in 3D in life sciences. Scanning electron microscopy (SEM) in general can be used to explore intricate, ultrastructural 3D information with various methods collectively known as volume EM (vEM). Serial block-face SEM (SBF-SEM) is the vEM technology of choice for researchers who prefer easy sample preparation in combination with a highly automated imaging process enabling the acquisition of large volume datasets. ZEISS Volutome is an end-to-end solution for serial block-face imaging from hardware to software including image processing, segmentation, and visualization. The ultramicrotome can be easily replaced by a conventional SEM stage, converting the 3D FE-SEM into a standard FE-SEM, making the system adaptable to a multi-purpose environment.



COMMENTS

Image Processing: Research Opportunities and Challenges
Image Processing: Research Opportunities and Challenges. Ravindra S. Hegadi. Department of Computer Science. Karnatak University, Dharwad-580003. ravindrahegadi@rediffmail. Abstract. Interest in ...
Techniques and Applications of Image and Signal Processing: A
This paper comprehensively overviews image and signal processing, including their fundamentals, advanced techniques, and applications. Image processing involves analyzing and manipulating digital images, while signal processing focuses on analyzing and interpreting signals in various domains. The fundamentals encompass digital signal representation, Fourier analysis, wavelet transforms ...
Image processing
Image processing is manipulation of an image that has been digitised and uploaded into a computer. Software programs modify the image to make it more useful, and can for example be used to enable ...
(PDF) A Review on Image Processing
Abstract. Image processing involves changing the nature of an image in order to improve its pictorial information for human interpretation or for autonomous machine perception. Digital image ...
Deep learning models for digital image processing: a review
Within the domain of image processing, a wide array of methodologies is dedicated to tasks including denoising, enhancement, segmentation, feature extraction, and classification. These techniques collectively address the challenges and opportunities posed by different aspects of image analysis and manipulation, enabling applications across various fields. Each of these methodologies ...
(PDF) Advances in Artificial Intelligence for Image Processing
AI has had a substantial influence on image processing, enabling cutting-edge methods and applications. The foundations of image processing are covered in this chapter, along with representation, formats ...
J. Imaging
In this work, we discuss the main and more recent improvements, applications, and developments when targeting image processing applications, and we propose future research directions in this field of constant and fast evolution. ... there is a clear increase in published research papers targeting image processing and DL, over the last decades. ...
Image Processing Technology Based on Machine Learning
Machine learning is a relatively new field. As research in this field deepens, machine learning is being applied ever more widely. At the same time, with the advancement of science and technology, graphics have become an indispensable medium of information transmission, and image processing technology is also booming. However, the traditional image processing ...
Deep Learning-based Image Text Processing Research
Deep learning is a powerful multi-layer architecture that has important applications in image processing and text classification. This paper first introduces the development of deep learning and two important algorithms of deep learning: convolutional neural networks and recurrent neural networks. The paper then introduces three applications of deep learning for image recognition, image ...
Trends and Advancements of Image Processing and Its Applications
Includes applications of image processing in remote sensing, astronomy, and manufacturing; pertains to researchers, academics, students, and practitioners in image processing. ... He published more than 100 research papers in National and International Journals and Conferences. He has published Edited books in Elsevier and Springer. He has also ...
Digital Image Processing: Advanced Technologies and Applications
A Feature Paper should be a substantial original Article that involves several techniques or approaches, provides an outlook for future research directions and describes possible research applications. Feature papers are submitted upon individual invitation or recommendation by the scientific editors and must receive positive feedback from the ...
Advances in image processing using machine learning techniques
The paper 'Ship Images Detection and Classification Based on Convolutional Neural Network with Multiple Feature Regions', by Zhijing Xu, Jiuwu Sun, and Yuhao Huo (SPR-2021-10-0144.R2), presents an exciting application of image recognition and classification in the maritime industry to cope with significant challenges for intelligent ship ...
Recent trends in image processing and pattern recognition
The Call for Papers of the special issue was initially sent out to the participants of the 2018 conference (2nd International Conference on Recent Trends in Image Processing and Pattern Recognition). To attract high quality research articles, we also accepted papers for review from outside the conference event.
Image Processing
Image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it.
Application research of digital media image processing ...
With the development of information technology, people increasingly rely on the network to access information, and more than 80% of the information on the network now takes the form of multimedia, chiefly images. Research on image processing technology is therefore very important, but most of it focuses on a single aspect. The research ...
PDF Digital Image Processing Real Time Applications
The process of analysis using digital image processing can be divided into various phases. The block diagram of a digital image processing (DIP) system is shown in Fig 2. The general function of each block stage is briefly discussed as follows: 1. Image Acquisition: It is the first and fundamental step of digital image ...
PDF Applications of Image Processing in Different Fields- A Survey
Abstract. Image processing is one of the fastest growing technologies in the engineering field. It has applications in various fields. This paper is a survey of applications of image processing in different fields such as Automation and Robotics, Remote sensing, Biomedical, Defense, Hand gesture recognition, Biological analysis, Document ...
Image Multiplication Using Novel 4:1 Approximate Compressor*
Energy-efficient designs are the need of the hour for signal and image processing applications that are error-tolerant. The paper proposes a new 4:1 approximate arithmetic-based compressor developed by analyzing the probability-based tactics that have a higher degree of precision when implemented.
Image processing and pattern recognition in industrial engineering
Image processing and pattern recognition in industrial engineering. Article Type: Viewpoint. From: Sensor Review, Volume 31, Issue 2. With the rise of the information superhighway, the "digital globe" concept, and the widespread use of the internet, image information has become an important source, and an important means, of human access to information.
Artificial Intelligence (AI) for Image Processing
This Special Issue presents a forum for the publication of articles describing the use of classical and modern artificial intelligence methods in image processing applications. The main aim of this Special Issue is to capture recent contributions of high-quality papers focusing on advanced image processing and analysis applications, including ...
[2407.08509] Haar Nuclear Norms with Applications to Remote Sensing
Remote sensing image restoration aims to reconstruct missing or corrupted areas within images. To date, low-rank based models have garnered significant interest in this field. This paper proposes a novel low-rank regularization term, named the Haar nuclear norm (HNN), for efficient and effective remote sensing image restoration. It leverages the low-rank properties of wavelet coefficients ...
Digital Image Processing
In this paper we give a tutorial overview of the field of digital image processing. Following a brief discussion of some basic concepts in this area, image processing algorithms are presented with emphasis on fundamental techniques which are broadly applicable to a number of applications. In addition to several real-world examples of such techniques, we also discuss the applicability of ...
Applications of image processing algorithms on the modern digital image
Abstract. Digital image processing technology is one of the most vital areas of computer science discipline. Its application areas involve computer-aided design, Fourier transformation, three ...
[2407.06581] Vision language models are blind
Large language models with vision capabilities (VLMs), e.g., GPT-4o and Gemini 1.5 Pro are powering countless image-text applications and scoring high on many vision-understanding benchmarks. Yet, we find that VLMs fail on 7 visual tasks absurdly easy to humans such as identifying (a) whether two circles overlap; (b) whether two lines intersect; (c) which letter is being circled in a word; and ...
Engineering eco-friendly solvents: An AI approach for carbon ...
The paper is published in the Journal of Chemical Theory and Computation. The research targets a class of solvents known for being nontoxic, biodegradable, highly stable, cost-effective, and reusable.
Stable diffusion for high-quality image reconstruction in digital rock
Digital rock analysis is a promising approach for visualizing geological microstructures and understanding transport mechanisms for underground geo-energy resources exploitation. Accurate image reconstruction methods are vital for capturing the diverse features and variability in digital rock samples. Stable diffusion, a cutting-edge artificial intelligence model, has revolutionized computer ...
Enhanced speech emotion recognition using averaged valence ...
This study delves into advancements in speech emotion recognition (SER) by establishing a novel approach for emotion mapping and prediction using the Valence-Arousal-Dominance (VAD) model. Central to this research is the creation of reliable emotion-to-VAD mappings, achieved by averaging outcomes from multiple pre-trained networks applied to the RAVDESS dataset. This approach adeptly resolves ...
2024 Conference
The Neural Information Processing Systems Foundation is a non-profit corporation whose purpose is to foster the exchange of research advances in Artificial Intelligence and Machine Learning, principally by hosting an annual interdisciplinary academic conference with the highest ethical standards for a diverse and inclusive community.
(PDF) Studies on application of image processing in ...
1. Studies on application of image processing in various fields: An overview. T Prabaharan, P Periasamy, V Mugendiran, Ramanan. 1 Research Scholar, St. Peter's Institute of Higher Education and ...
ZEISS is introducing an integrated system for serial block-face imaging
ZEISS Volutome is an end-to-end solution for serial block-face imaging from hardware to software including image processing, segmentation, and visualization. The ultramicrotome can be easily replaced by a conventional SEM stage, converting the 3D FE-SEM into a standard FE-SEM, making the system adaptable to a multi-purpose environment.
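The "Image Processing" entry in the list above describes treating an image as a two-dimensional signal and applying standard signal-processing techniques to it. As a small illustration, here is a plain-Python 2-D convolution (valid mode, kernel flipped) applied with a low-pass and a high-pass kernel. The kernels, the test image, and the function name are illustrative choices; in practice a library routine such as SciPy's or OpenCV's filtering functions would normally be used.

```python
# An image is represented here as a list of rows of numeric gray levels.
Image = list[list[float]]


def convolve2d(image: Image, kernel: Image) -> Image:
    """'Valid'-mode 2-D convolution: the kernel is flipped in both directions and
    output values are computed only where the kernel lies fully inside the image."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    flipped = [row[::-1] for row in kernel[::-1]]
    out: Image = []
    for y in range(ih - kh + 1):
        out_row = []
        for x in range(iw - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[y + i][x + j] * flipped[i][j]
            out_row.append(acc)
        out.append(out_row)
    return out


# Test image: a vertical step edge from gray level 0 to gray level 10.
step = [[0, 0, 10, 10] for _ in range(4)]

# Low-pass example: a 3x3 box (mean) filter smooths the edge.
box = [[1 / 9] * 3 for _ in range(3)]
print(convolve2d(step, box))        # two rows of roughly [3.33, 6.67]

# High-pass example: a horizontal finite difference highlights where the gray level changes.
print(convolve2d(step, [[1, -1]]))  # four rows of [0.0, 10.0, 0.0]
```

The same loop structure covers smoothing, sharpening, and edge detection; only the kernel changes, which is what makes convolution the basic building block of linear image filtering.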


News and Comment


Where imaging and metrics meet

EfficientBioAI: making bioimaging AI models efficient in energy and latency


Application research of digital media image processing technology based on wavelet transform

1 Introduction

2.1 Image binarization processing method

2.2 Wavelet transform method
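Section 2.2 names a wavelet transform method, but this outline gives no details. As a minimal sketch, assuming the simplest wavelet (Haar) and a single decomposition level, the code below splits an image into one approximation subband and three detail subbands and then inverts the split. The article may use a different wavelet, number of levels, or normalization; the function and subband names are illustrative.

```python
Matrix = list[list[float]]


def haar2d(img: Matrix) -> tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of the orthonormal 2-D Haar transform.

    Each non-overlapping 2x2 block [[a, b], [c, d]] yields four coefficients:
    approximation (a+b+c+d)/2, column difference (a-b+c-d)/2 (responds to vertical edges),
    row difference (a+b-c-d)/2 (responds to horizontal edges) and diagonal (a-b-c+d)/2.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must both be even")
    approx, col_diff, row_diff, diag = [], [], [], []
    for y in range(0, h, 2):
        rows = ([], [], [], [])
        for x in range(0, w, 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            rows[0].append((a + b + c + d) / 2)
            rows[1].append((a - b + c - d) / 2)
            rows[2].append((a + b - c - d) / 2)
            rows[3].append((a - b - c + d) / 2)
        for band, row in zip((approx, col_diff, row_diff, diag), rows):
            band.append(row)
    return approx, col_diff, row_diff, diag


def ihaar2d(approx: Matrix, col_diff: Matrix, row_diff: Matrix, diag: Matrix) -> Matrix:
    """Invert haar2d by rebuilding each 2x2 block from its four coefficients."""
    out: Matrix = []
    for y in range(len(approx)):
        top, bottom = [], []
        for x in range(len(approx[0])):
            s, hc, vr, dg = approx[y][x], col_diff[y][x], row_diff[y][x], diag[y][x]
            top += [(s + hc + vr + dg) / 2, (s - hc + vr - dg) / 2]
            bottom += [(s + hc - vr - dg) / 2, (s - hc - vr + dg) / 2]
        out += [top, bottom]
    return out


img = [[4, 2, 1, 1], [4, 2, 1, 1], [0, 0, 3, 3], [0, 0, 3, 3]]
approx, col_diff, row_diff, diag = haar2d(img)
print(approx)    # [[6.0, 2.0], [0.0, 6.0]]
print(col_diff)  # [[2.0, 0.0], [0.0, 0.0]]
```

Because the transform is orthonormal, the total energy (sum of squared values) of the four subbands equals that of the input, and smooth regions produce near-zero detail coefficients, which is what wavelet-based compression and watermarking schemes typically exploit.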

2.3 Digital watermark

2.4 Evaluation method

Normalized cross-correlation function

Peak signal-to-noise ratio

Information entropy

Correlation
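Section 2.4 lists four evaluation measures, but their formulas are not given in this outline. Below is a hedged sketch in plain Python using common textbook definitions; this is an assumption, and the article's exact variants (for instance whether its NCC subtracts the mean, or which pixel pairs its correlation uses) may differ. It covers PSNR for 8-bit images, Shannon entropy of the gray-level histogram, NCC between an embedded and an extracted watermark, and the Pearson correlation of horizontally adjacent pixels often reported for image encryption.

```python
import math
from collections import Counter

Image = list[list[int]]


def _flatten(img: Image) -> list[int]:
    return [p for row in img for p in row]


def psnr(original: Image, processed: Image, max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB, 10*log10(MAX^2 / MSE); infinite for identical images."""
    a, b = _flatten(original), _flatten(processed)
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)


def information_entropy(img: Image) -> float:
    """Shannon entropy of the gray-level histogram, in bits per pixel (at most 8 for 8-bit images)."""
    pixels = _flatten(img)
    n = len(pixels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(pixels).values())


def ncc(watermark: Image, extracted: Image) -> float:
    """Normalized cross-correlation between an embedded and an extracted watermark
    (1.0 means identical up to a positive scale factor; undefined if either is all zeros)."""
    a, b = _flatten(watermark), _flatten(extracted)
    num = sum(x * y for x, y in zip(a, b))
    return num / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def correlation(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def horizontal_pairs(img: Image) -> tuple[list[int], list[int]]:
    """Horizontally adjacent pixel pairs, as used when reporting adjacent-pixel correlation."""
    xs = [row[i] for row in img for i in range(len(row) - 1)]
    ys = [row[i + 1] for row in img for i in range(len(row) - 1)]
    return xs, ys


original = [[52, 55, 61], [59, 79, 61], [85, 76, 62]]
noisy = [[52, 57, 61], [59, 79, 60], [85, 76, 62]]
print(round(psnr(original, noisy), 2))
print(information_entropy(original))
print(correlation(*horizontal_pairs(original)))
```

Higher PSNR and NCC closer to 1 indicate less distortion after watermarking or compression, whereas for an encrypted image the usual goal is entropy close to 8 bits and adjacent-pixel correlation close to 0.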

3 Experiment

3.2 System environment

3.3 Wavelet transform-related parameters

4 Results and discussion

4.2 Results 2: digital watermark encryption based on wavelet transform

4.3 Results 3: image encryption based on wavelet transform

4.4 Results 4: image compression

5 Conclusion


Image processing and pattern recognition in industrial engineering


Artificial Intelligence (AI) for Image Processing


Published Papers (8 papers)



Stable diffusion for high-quality image reconstruction in digital rock analysis
