1 Introduction

Human face analysis is one of the most significant tasks in computer vision because the face is a highly deformable object whose analysis must be automated. Characterizing attributes such as age, gender, facial features, expressions, clothing and personality is important in a variety of applications, including face tracking, behavior recognition, user identification and social interaction [1]. Gender and age are regarded as crucial biometric traits for identifying individuals. Biometric recognition is the process of gathering information about a person's unique physiological and behavioral characteristics for the purposes of identification and verification (security models) [2]. There are two types of biometrics: hard biometrics (physical, behavioral and biological) and soft biometrics (age, gender, ethnicity, height and face measurements) [3]. Soft-biometric characteristics, such as skin tone, hair type, the distance between nose and eye, and facial form, can be used to categorize unlabeled subjects into gender and age groups and to speed up data traversal.

Face, age and gender detection algorithms have a significant influence on the gender and age range that is reflected in the user photo validation process [4]. An essential first step in determining age and gender is detecting the face [5]. Numerous techniques, including Haar Cascade, Convolutional Neural Networks (CNN) [6], Histogram of Oriented Gradients (HoG) and Deep Neural Networks (DNN), can be used to accomplish this task [7]. Each algorithm has its own pros and cons. The CNN is a prominent deep learning architecture that can automatically learn to perform classification tasks from images alone [8]. Because it treats the prediction of gender and age range as a classification problem, this algorithm is well suited to our setting [9]: for gender there are two classes (male and female), and for age ranges there are several classes.

The first drawback is that, although CNNs are highly effective at representing images, it is challenging to adequately train a classic large-scale CNN with small datasets [10]. Moreover, for many reasons, obtaining a large collection of photos with age and gender labels is difficult [7]. The amount of training data for the gender and age prediction problem is typically small, and the face images of the same individual span only a small range of ages [8]. The models frequently overfit because it is difficult to learn the most general age characteristics from small data [9]. The second drawback is that current techniques take the entire image as network input, and the intricate background information significantly impedes feature extraction and further impairs prediction ability [10]. As a result, if the feature extractor uses the original image as input, the age prediction is affected. The third drawback is that most CNN-based techniques use the output of the network's final fully connected layer to represent the image [11]. To address these limitations, this paper proposes gender and age classification using ASMNet-based facial fiducial detection and a Jordan neural network. The major contributions of the designed model are:

  • A model for gender and age classification using ASMNet-based facial fiducial detection and a Jordan neural network is proposed.

  • Initially, the images are pre-processed using cropping, centre surround device normalization, an optimized Gabor filter and logarithmic transformation to improve the image quality.

  • Centre surround device normalization normalizes the pixels of the facial images with respect to the centre point in order to linearize them.

  • The optimized Gabor filter removes noise from the facial images, with its orientation value optimized using lyrebird optimization, and the logarithmic transformation enhances the contrast of dark images.

  • Facial fiducial points and pose are detected from the facial images using the Active Shape Model combined with a CNN (ASMNet), which learns the distinctive features of an image.

  • EfficientNetB7 and a Jordan neural network are used to extract features and to categorize the age and gender of the people.

The remainder of the paper is organized as follows: several articles on facial fiducial detection are reviewed in Sect. 2; the proposed method is explained in Sect. 3; the experimental results for the developed facial fiducial detection model are presented in Sect. 4; and the study is concluded in Sect. 5.

2 Literature review

Several publications related to gender and age classification and facial fiducial detection are studied in this section, and some of them are evaluated below along with their drawbacks.

Duan et al. [12] introduced a hybrid architecture that combines the strengths of two classifiers, the Convolutional Neural Network (CNN) and the Extreme Learning Machine (ELM), to handle the classification of age and gender. By using the CNN to extract features from the input images and the ELM to classify the intermediate outputs, the hybrid architecture exploits the respective benefits of both. The design is implemented effectively and several precautions are taken to reduce the probability of overfitting. This involves determining the layers and variables of the hybrid architecture and the back-propagation functions in this framework through iterations. The hybrid structure is then verified using two widely used datasets, namely MORPH-II and the Adience Benchmark.

Hassan et al. [13] employed a multiple-CNN method for classifying people based on their age and gender. The five stages of the described method are as follows: face alignment, multiple CNN, face detection, background removal, and voting systems. The multiple-CNN model uses three CNNs that differ in depth and structure so that each network extracts different features. After each network has been trained independently on the AGFW dataset, the predictions are combined using the voting mechanism to determine the outcome.

Khan et al. [14] designed a unified system for face image analysis based on end-to-end semantic face segmentation. The suggested framework includes a stack of components for face understanding, including head pose estimation, age categorization and gender recognition. The segmentation model, based on Conditional Random Fields (CRFs), is trained using a manually labelled face dataset. A face image is divided into six segments using a multi-class face segmentation framework built with CRFs. For every class, probability maps are created using the probabilistic classification technique. For each task (head pose, age, and gender), an RDF classifier is modelled using the probability maps as feature descriptors.

Nada et al. [15] developed an approach to confirm that a user's age range and gender are accurately reflected in the user's photo. A double-check validation layer based on deep learning is added by linking the user photo with the gender and date-of-birth form inputs. This is done by using a Convolutional Neural Network (CNN or ConvNet) to recognize the gender and estimate the age from a single person's photo. Furthermore, a web API is built to facilitate the validation process. The solution was assessed on images of University of Palestine students; it performed very well on gender prediction but had some issues with age prediction.

Haseena et al. [16] developed a system that recommends nourishing food to people according to their age and gender as inferred from their facial features. The presented methodology first pre-processes the input image and then extracts features using the deep convolutional neural network (DCNN) approach. After the neural network has extracted the features from the original facial image, attribute selection based on hybrid particle swarm optimisation (HPSO) is carried out to choose unique and recognisable facial components. A person's age and gender are then determined using support vector machines (SVM).

Most of the articles evaluated above are related to facial fiducial identification models. The ELM model may require a highly complex hidden layer because of the random initialization of its parameters (weights and biases) [12]. CNNs are known to be challenging to optimize and to require substantial amounts of training data and processing capacity [13, 15]. Random forests are biased in favour of attributes with more levels when the data include categorical variables with varying numbers of levels [14]. Training takes longer for big datasets, and the varying weights and individual influences in the final model make it difficult to comprehend and evaluate [16].

3 Proposed methodology

With the emergence of social platforms and social media, automatic age and gender classification has gained significance for a wider range of applications. However, the performance of current approaches on real-world images is still not satisfactory, particularly in light of the significant improvements recently observed in the related task of facial recognition. The process flow of the proposed model is illustrated in Fig. 1.

Fig. 1 Process flow of the proposed model

In the proposed model, facial images are taken as input data for gender and age prediction. The images are pre-processed using cropping, centre surround device normalization, an optimized Gabor filter and logarithmic transformation. Cropping separates the face region from the background. Centre surround device normalization then normalizes the face in a pixel-wise manner. The optimized Gabor filter removes noise, with its orientation value optimally selected using lyrebird optimization. Logarithmic transformation enhances the contrast of the image. From the pre-processed data, facial fiducial points and pose are detected using ASMNet (Active Shape Model combined with CNN), which locates the primary facial landmarks such as the eyes, mouth, nose tip and lips. Feature extraction is then performed using EfficientNetB7, and a Jordan neural network classifies age and gender.

3.1 Pre-processing

Pre-processing is a preliminary stage in which raw data are prepared for the main phase or for further analysis. In the designed model, the pre-processing approaches are cropping, centre surround device normalization, an optimized Gabor filter and logarithmic transformation.

3.1.1 Cropping

Cropping an image involves removing undesired outer regions. An image is said to be "cropped" when its outer edges are removed or modified in order to improve framing or composition, direct the viewer's attention to the subject, or alter the size or aspect ratio. In other words, cropping enhances a picture by deleting elements that are not needed.

3.1.2 Center surround device normalization

Land's Retinex theory was applied in the form of the single-scale Retinex (SSR) in [17], using its most recent version. To process the image, a class of centre surround functions is applied, each of which takes an input value (the centre) and its surrounding neighbourhood (the surround) to produce its output value. Every pixel value is treated as a centre, and a Gaussian function defines its surround. The mathematical form of the SSR is:

$$SSR(a,b) = \log\left(I(a,b)\right) - \log\left[F(a,b) * I(a,b)\right] \tag{1}$$

where $SSR(a,b)$ is the Retinex output, $I(a,b)$ is the input image, and $F(a,b) * I(a,b)$ is the convolution of $I(a,b)$ with $F(a,b)$. The surround function $F$ is a basic linear filter with a Gaussian kernel:

$$F(a,b) = k\,e^{-r^2/\sigma^2} \tag{2}$$

where $\sigma$ is the empirically established standard deviation of the filter, which controls how much spatial detail is retained pixel by pixel, $r = \sqrt{a^2 + b^2}$, and the normalizing factor $k$ keeps the area under the Gaussian curve equal to 1.
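Equations (1) and (2) can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation used in the paper: the function names, the kernel radius, the edge handling (repeating the nearest edge pixel) and the `offset` added before the logarithm to avoid log(0) are all assumptions made for the example.

```python
import math


def gaussian_surround(sigma, radius):
    """Sample F(a, b) = k * exp(-r^2 / sigma^2) on a (2*radius+1)^2 grid.

    k is chosen so that the kernel sums to 1 (unit area under the Gaussian).
    """
    offsets = range(-radius, radius + 1)
    raw = [[math.exp(-(a * a + b * b) / sigma ** 2) for b in offsets] for a in offsets]
    k = 1.0 / sum(sum(row) for row in raw)
    return [[k * v for v in row] for row in raw]


def convolve_clamped(image, kernel):
    """Same-size 2-D convolution; pixels outside the image repeat the nearest edge pixel."""
    height, width = len(image), len(image[0])
    radius = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for a in range(height):
        for b in range(width):
            acc = 0.0
            for da in range(-radius, radius + 1):
                for db in range(-radius, radius + 1):
                    aa = min(max(a - da, 0), height - 1)
                    bb = min(max(b - db, 0), width - 1)
                    acc += kernel[da + radius][db + radius] * image[aa][bb]
            out[a][b] = acc
    return out


def single_scale_retinex(image, sigma=2.0, radius=3, offset=1.0):
    """Eq. (1): log(I) - log(F * I). `offset` is an assumption that avoids log(0)."""
    surround = convolve_clamped(image, gaussian_surround(sigma, radius))
    return [[math.log(i + offset) - math.log(s + offset) for i, s in zip(irow, srow)]
            for irow, srow in zip(image, surround)]
```

On a uniformly lit image the surround equals the centre everywhere, so the output is zero; a pixel brighter than its neighbourhood gives a positive response and a pixel darker than its neighbourhood a negative one.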

3.1.3 Optimized gabor filter

The optimized Gabor filter is used for noise removal, with its orientation value optimally selected using lyrebird optimization. The Gabor filter offers good edge localization and high edge resolution, and it extracts complete and precise boundaries with high accuracy.

(i) Gabor filter

A complex plane wave and a Gaussian-shaped function make up the composite function known as a two-dimensional Gabor filter [18]. It is stated as follows:

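$$G(x,y) = \frac{1}{2\pi\sigma\beta} \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma^2} + \frac{y^2}{\beta^2}\right)\right] \exp\left[j 2\pi f\left(x\cos\theta + y\sin\theta\right)\right] \tag{3}$$

where $j = \sqrt{-1}$, $\theta$ is the orientation of the Gabor filter, $\sigma$ and $\beta$ are the standard deviations of the Gaussian envelope along the $x$ and $y$ axes, respectively, and $f$ is the centre frequency of the filter.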

By Euler's formula, the Gabor filter can be decomposed into a real part and an imaginary part. Because finger veins appear as dark ridges in images, the real part of the Gabor filter is well suited to extracting vein information from an image. The real-part Gabor filter is written as

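$$G_k(x,y) = \frac{1}{2\pi\sigma\beta} \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma^2} + \frac{y^2}{\beta^2}\right)\right] \cos\left[2\pi f\left(x\cos\theta_k + y\sin\theta_k\right)\right] \tag{4}$$

where $\theta_k = k\pi/8$ and $k = 1, 2, \ldots, 8$ are the orientation and the channel index, respectively, and $f$ is the centre frequency of the Gabor filter in the $k$th channel.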

Assuming that a finger-vein image is denoted by $I(x,y)$, the filtered image in the $k$th channel is denoted by $F_k(x,y)$ and is obtained as

$$F_k(x,y) = G_k(x,y) * I(x,y) \tag{5}$$

where $*$ denotes the two-dimensional convolution operation. As a result, the Gabor filter bank produces eight filtered images for a finger-vein image.
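The eight-channel filtering described above can be sketched as follows, using the real (cosine) part of the Gabor function as the kernel. The kernel radius, the default values of `f`, `sigma` and `beta`, the zero padding at the borders and all function names are assumptions chosen for illustration; the text does not specify them.

```python
import math


def gabor_real_kernel(theta, f=0.1, sigma=3.0, beta=3.0, radius=4):
    """Real part of the 2-D Gabor function sampled on a (2*radius+1)^2 grid."""
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            envelope = math.exp(-0.5 * (x * x / sigma ** 2 + y * y / beta ** 2))
            envelope /= 2.0 * math.pi * sigma * beta
            carrier = math.cos(2.0 * math.pi * f * (x * math.cos(theta) + y * math.sin(theta)))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_bank(channels=8, **params):
    """One kernel per orientation theta_k = k * pi / channels, k = 1..channels."""
    return [gabor_real_kernel(k * math.pi / channels, **params)
            for k in range(1, channels + 1)]


def convolve_same(image, kernel):
    """Same-size 2-D convolution with zero padding (an assumed border policy)."""
    height, width = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y - dy, x - dx
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += kernel[dy + r][dx + r] * image[yy][xx]
            out[y][x] = acc
    return out


def filter_all_channels(image, bank):
    """F_k = G_k * I for every kernel in the bank."""
    return [convolve_same(image, kernel) for kernel in bank]
```

Filtering a single bright pixel returns the kernel itself in each channel, which is a convenient way to check the orientation of every filter visually.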

(ii) Lyrebird optimization

The superb lyrebird and Albert's lyrebird are the two species of lyrebird, both native to Australia [19]. These birds belong to the family Menuridae. The males' large tails, which they fan out in courtship display, are a striking sight. Lyrebirds are also highly skilled at mimicking both artificial and natural sounds from their surroundings. The lyrebird, with its distinctive plume of neutral-coloured tail feathers, is one of the most recognizable native birds of Australia. Male and female superb lyrebirds measure 80–98 cm and 74–84 cm in length, respectively. Female Albert's lyrebirds grow to a maximum of 84 cm, while males grow to a maximum of 90 cm. Albert's lyrebird is similar to the superb lyrebird in all respects except that it has smaller, less spectacular lyrate feathers. Albert's lyrebirds weigh approximately 0.93 kg, whereas superb lyrebirds weigh approximately 0.97 kg.

Step 1: Initialization.

The optimized Gabor filter is used for noise removal, with the orientation value ($\theta$) optimally selected using lyrebird optimization. The orientation values are treated as the attributes $X_1, X_2, \ldots, X_4$:

$$F = (X_1, X_2, \ldots, X_4) \tag{6}$$

Step 2: Fitness Function.

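The fitness function is evaluated with the following equations, which maximize the accuracy obtained by k-fold cross-validation:

$$\text{fitness} = \text{maximize}\left(acc_{cv}\right) \tag{7}$$

$$acc_{cv} = \frac{1}{n} \sum_{\langle v_i, y_i \rangle \in D} \delta\left(I\left(D \setminus D_{(i)}, v_i\right), y_i\right) \tag{8}$$

In k-fold cross-validation, also known as rotation estimation, the dataset $D$ is randomly divided into $k$ mutually exclusive subsets (the folds) $D_1, D_2, \ldots, D_k$ of approximately equal size. The inducer is trained and tested $k$ times; for each $t \in \{1, 2, \ldots, k\}$, it is trained on $D \setminus D_t$ and tested on $D_t$. The cross-validation estimate of accuracy is the total number of correct classifications divided by the number of instances $n$ in the dataset. Here $D_{(i)}$ is the test set containing the instance $x_i = \langle v_i, y_i \rangle$, $I(D \setminus D_{(i)}, v_i)$ is the label that the inducer trained on $D \setminus D_{(i)}$ assigns to $v_i$, and $\delta(p, q)$ equals 1 when $p = q$ and 0 otherwise.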
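The cross-validation accuracy of Eq. (8) can be sketched as below. The nearest-mean inducer is only a stand-in so that the example runs on its own; it is not the classifier used in the paper, and the function names, fold construction and seed are assumptions.

```python
import random


def k_fold_accuracy(dataset, induce, k=5, seed=0):
    """Cross-validation accuracy: correct held-out predictions divided by n.

    dataset: list of (v, y) pairs; induce: callable(train) -> classifier(v) -> label.
    """
    indices = list(range(len(dataset)))
    random.Random(seed).shuffle(indices)
    folds = [indices[t::k] for t in range(k)]  # k disjoint, near-equal folds
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [dataset[i] for i in indices if i not in held_out]  # D \ D_t
        classifier = induce(train)
        correct += sum(1 for i in fold if classifier(dataset[i][0]) == dataset[i][1])
    return correct / len(dataset)


def nearest_mean_inducer(train):
    """Toy inducer for 1-D inputs (an assumption): predict the class with the closest mean."""
    sums, counts = {}, {}
    for v, y in train:
        sums[y] = sums.get(y, 0.0) + v
        counts[y] = counts.get(y, 0) + 1
    means = {y: sums[y] / counts[y] for y in sums}
    return lambda v: min(means, key=lambda y: abs(v - means[y]))
```

Every instance is tested exactly once, by a classifier that never saw it during training, which is why the sum in Eq. (8) runs over the whole dataset.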

Step 3: Updating.

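After the fitness evaluation, Eq. (9) is used to update each set of attributes. The updating equation follows the lyrebird optimization process:

$$X_i = \begin{cases} X_i^{P1}, & \text{if } F_i^{P1} \text{ improves on } F_i, \\ X_i, & \text{else}, \end{cases} \tag{9}$$

where $X_i^{P1}$ is the new candidate position of the $i$th member and $F_i^{P1}$ is its objective value, so a candidate replaces the current member only when it improves the objective.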

Step 4: Termination.

Once the hyperparameters of the Gabor filter that allow information to be learned optimally from the facial images have been obtained, the process terminates.
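Steps 1 to 4 can be sketched as a simple population loop. The sketch does not reproduce the position-update equations of lyrebird optimization, which are not given in the text; a bounded random perturbation is swapped in as the candidate generator, and only the acceptance rule of Eq. (9) is taken from the description above. The toy fitness, the population size, the step size and the orientation range [0, π] are likewise assumptions; in the described pipeline the fitness would be the cross-validated accuracy.

```python
import math
import random


def toy_fitness(thetas):
    """Hypothetical stand-in for acc_cv: highest when every orientation equals 0.8 rad."""
    return -sum((t - 0.8) ** 2 for t in thetas)


def optimise_orientations(fitness, n_members=6, n_attrs=4, iterations=50, step=0.3, seed=0):
    rng = random.Random(seed)
    # Step 1: initialise each member as a vector of n_attrs orientation values.
    population = [[rng.uniform(0.0, math.pi) for _ in range(n_attrs)]
                  for _ in range(n_members)]
    # Step 2: evaluate the fitness of every member.
    scores = [fitness(member) for member in population]
    for _ in range(iterations):
        for i, member in enumerate(population):
            # Candidate position (random perturbation, NOT the lyrebird update rule).
            candidate = [min(max(x + rng.uniform(-step, step), 0.0), math.pi)
                         for x in member]
            candidate_score = fitness(candidate)
            # Step 3: keep the candidate only if it does not worsen the objective.
            if candidate_score >= scores[i]:
                population[i], scores[i] = candidate, candidate_score
    # Step 4: stop after a fixed budget and return the best member found.
    best = max(range(n_members), key=scores.__getitem__)
    return population[best], scores[best]
```

Because a member is replaced only by a candidate with an equal or higher score, the best score in the population never decreases from one iteration to the next.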

3.1.4 Logarithmic transformation

Log transformations are one of the fundamental spatial image enhancement techniques that can be used to enhance the contrast of dark images. The gray levels of the image pixels are changed by the log transform, which is actually a gray level transform [20]. This transformation converts a limited range of low level gray values in the input image to a larger range of output levels. At greater input gray levels, the converse is true. As a result, the darker input values are dispersed into the higher gray level values, enhancing the image's overall brightness and contrast. Mathematically, the log transformation's general form is expressed as

$$s = c \log(1 + r) \tag{10}$$

where $s$ is the output grey level, $c$ is a constant and $r$ is the input grey level, with $r \ge 0$. In image processing, the general goal of logarithmic transformation is to improve the visual appearance of images by varying intensity values to increase contrast and bring out details that are hidden in the original image. For classification, the dataset should be pre-processed to obtain an accurate outcome, so the logarithmic transformation is used. Although this transformation is complex and time consuming, it makes the model more effective at enhancing the contrast of the input image and at accurately classifying age and gender with the JNN model.
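Equation (10) can be illustrated with a short sketch. The text leaves the constant $c$ unspecified; the example assumes the common choice $c = L/\log(1 + L)$ for a maximum grey level $L = 255$, so that the output stays in the same range as the input.

```python
import math


def log_transform(image, max_level=255):
    """Apply s = c * log(1 + r) to every pixel of a grey-level image.

    c is set so that r = max_level maps to s = max_level (an assumed convention).
    """
    c = max_level / math.log(1 + max_level)
    return [[c * math.log(1 + r) for r in row] for row in image]
```

With this choice of $c$, a dark input level of 10 is mapped to roughly 110, while levels 0 and 255 stay fixed, which shows how the low grey levels are spread over a wider output range.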

3.1.5 ASM network

The active shape model (ASM) is a statistical representation of shape objects. Each shape is described by $n$ points, $S_{set} = \{(S_x^1, S_y^1), \ldots, (S_x^n, S_y^n)\}$, which are aligned into a common coordinate system. To simplify the problem and identify the shape components, principal component analysis (PCA) is applied to the covariance matrix derived from a set of $K$ training shape samples. Once the model is constructed, any training sample $S$ is approximated using Eq. 11:

$$S \approx \bar{S} + Pb \tag{11}$$

where $\bar{S}$ is the sample mean, $P = (p_1, p_2, \ldots, p_t)$ contains $t$ eigenvectors of the covariance matrix, and $b$ is a $t$-dimensional vector given by Eq. 12:

$$b = P^T(S - \bar{S}) \tag{12}$$

Consequently, vector b defines a set of parameters for a deformable model, allowing the model's shape to be altered by adjusting the vector's constituent elements.

Let $\lambda_i$ be the statistical variance (eigenvalue) of the $i$th parameter of $b$. Generally, the parameter $b_i$ is restricted to $\pm 3\sqrt{\lambda_i}$ to ensure that the shape created by the ASM is reasonably comparable to the ground truth [7]. Owing to this restriction, the generated shape resembles those in the original training set. Using this restriction, a new shape $S_{New}$ is generated in accordance with Eq. 13:

$$S_{New} = \bar{S} + P\tilde{b} \tag{13}$$

where $\tilde{b}$ is the restricted $b$. The ASM operator is defined by Eq. 14:

$$ASM: (P_x^i, P_y^i) \mapsto (A_x^i, A_y^i) \tag{14}$$

Using Eqs. 11, 12, and 13, the ASM converts every given input point $(P_x^i, P_y^i)$ into a new point $(A_x^i, A_y^i)$. Based on this algorithm, the facial fiducial and pose points are detected.
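The ASM operator can be sketched as a projection onto the shape model followed by clamping and reconstruction. The sketch assumes that the mean shape, the eigenvectors and the eigenvalues have already been obtained by PCA; the function name and the toy model in the usage note are hypothetical.

```python
import math


def asm_project(shape, mean, eigvecs, eigvals):
    """Map a flattened shape (x1, y1, ..., xn, yn) onto the ASM shape space.

    eigvecs: list of t orthonormal eigenvectors (the columns of P), each of length 2n.
    eigvals: the t matching eigenvalues (variances).
    Returns the regularised shape and the clamped parameter vector.
    """
    diff = [s - m for s, m in zip(shape, mean)]
    # Eq. 12: b = P^T (S - mean)
    b = [sum(p_j * d_j for p_j, d_j in zip(p, diff)) for p in eigvecs]
    # Restrict every b_i to +/- 3 * sqrt(lambda_i)
    limits = [3.0 * math.sqrt(lam) for lam in eigvals]
    b_tilde = [min(max(b_i, -lim), lim) for b_i, lim in zip(b, limits)]
    # Eq. 13: S_new = mean + P * b_tilde
    new_shape = list(mean)
    for p, b_i in zip(eigvecs, b_tilde):
        for j in range(len(new_shape)):
            new_shape[j] += p[j] * b_i
    return new_shape, b_tilde
```

For a toy model with mean `[0, 0, 0, 0]`, eigenvectors `[1, 0, 0, 0]` and `[0, 1, 0, 0]` and eigenvalues 1 and 4, the input `[10, 2, 5, 5]` has parameters (10, 2); the first is clamped to 3, and the components outside the model's subspace are discarded, giving `[3, 2, 0, 0]`.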

3.2 Feature extraction

Feature extraction is part of the dimensionality reduction process, in which an initial set of pre-processed data is reduced to smaller, more manageable groups.

3.2.1 EfficientNetB7

EfficientNet [21] can be viewed as a family of CNN models; it is one of the most sophisticated models and achieves an accuracy of 84.4% with 66 M parameters on the ImageNet classification task. The family comprises eight models, B0 to B7; accuracy climbs noticeably as the model index increases, while the number of parameters does not grow as sharply. Instead of the Rectified Linear Unit (ReLU) activation function, EfficientNet uses the Swish activation function [22]. In contrast to other state-of-the-art models, EfficientNet achieves more efficient results by uniformly scaling depth, width and resolution while keeping the model small. The first step in using the compound scaling technique under a fixed resource constraint is a grid search that finds the relationships between the baseline network's scaling dimensions. EfficientNet uses the main building block introduced by MobileNet V2, the MBConv bottleneck, but uses it much more extensively than MobileNet V2 because of its larger "Floating point operations per second" (FLOPS) budget. Because MBConv blocks consist of a layer that first expands and then compresses the channels, direct connections are used between the bottlenecks, which have significantly fewer channels than the expansion layers. Because the layers are designed as separable convolutions, computation is reduced by a factor of about $k^2$, where $k$ is the kernel size, i.e. the width and height of the 2D convolution window.

EfficientNet is described mathematically in (Eq. (15)) as:

$$P = \bigodot_{x=1,2,\ldots,n} M_x^{T_x}\left(Y_{(A_x, B_x, D_x)}\right) \tag{15}$$

Here, $M_x^{T_x}$ indicates that layer $M_x$ is repeated $T_x$ times in stage $x$, and $(A_x, B_x, D_x)$ is the shape of the input tensor $Y$ of layer $x$. Input images are resized from 256×256×3 to 224×224×3. To increase the model's accuracy, the layers must be scaled with a proportionate ratio optimized using the following formulation:

$$\max_{x,y,z} Acc\left(P(x,y,z)\right) \tag{16}$$

$$P(x,y,z) = \bigodot_{s=1,2,\ldots,n} M_s^{x \cdot L_s}\left(Y_{(z \cdot A_s,\, z \cdot B_s,\, y \cdot D_s)}\right) \tag{17}$$

subject to

$$\text{FLOPS}(P) \le \text{target\_flops}, \qquad \text{Memory}(P) \le \text{target\_memory}.$$

In Eq. (16), the values $x$, $y$ and $z$ denote the scaling coefficients for depth, width and resolution. Equation (17) gives the number of layers employed in the model together with the parameter details. Figure 2 shows a schematic diagram of EfficientNet B7.
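The effect of scaling depth, width and resolution together can be sketched with the compound-scaling rule. The base coefficients α = 1.2, β = 1.1 and γ = 1.15 are the values reported in the original EfficientNet paper for the grid search on the baseline network; they are not stated in this text and are used here as assumptions, as are the function name and the single compound coefficient `phi`.

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return the depth (x), width (y) and resolution (z) multipliers for coefficient phi.

    The FLOPS of a convolutional network grow roughly with depth * width^2 * resolution^2,
    so flops_factor estimates the cost relative to the baseline (phi = 0).
    """
    depth = alpha ** phi
    width = beta ** phi
    resolution = gamma ** phi
    flops_factor = depth * width ** 2 * resolution ** 2
    return {"depth": depth, "width": width, "resolution": resolution,
            "flops_factor": flops_factor}
```

With these coefficients each unit increase of `phi` multiplies the estimated FLOPS by about 1.92, close to the factor of 2 targeted by the constraint α·β²·γ² ≈ 2, which is how the resource limits attached to Eq. (16) are respected while all three dimensions grow.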

Fig. 2 Schematic diagram of the EfficientNet B7 architecture

3.3 Classification

Jordan developed a novel recurrent neural network by combining distributed parallel processing theory with the storage concept of the Hopfield network [23]. A Jordan neural network consists of four components: an input layer, a hidden layer, an output layer and a context layer. The connection from the output layer to the context layer contains a first-order delay operator, which allows the context layer to hold the output layer's data. Two kinds of activation function are used within the network: nonlinear and linear. In this work, the output layer uses a linear function, whereas the hidden layer uses the sigmoid nonlinear activation function $f(x) = 1/(1 + e^{-x})$. Figure 3 displays the topology of the Jordan neural network.

Fig. 3 Topology of the Jordan neural network

For the Jordan neural network, $X(t) = (x(t), x(t-\tau), x(t-2\tau), \ldots, x(t-(m-1)\tau))^T$ is the input vector, where $t = n_1, n_1+1, \ldots, n$ and $n_1 = 1 + (m-1)\tau$. The output vector of the hidden layer is $h(t) = (h_1(t), h_2(t), \ldots, h_r(t))^T$. The weights from input layer neuron $j$ to hidden layer neuron $i$ are $W_{ij}^{(1)}$ $(i = 1, 2, \ldots, r;\ j = 1, 2, \ldots, m)$. The weights from the context layer neuron to hidden layer neuron $i$ are $W_i^{(2)}$ $(i = 1, 2, \ldots, r)$. The weights from hidden neuron $i$ to the output layer neuron are $W_i^{(3)}$ $(i = 1, 2, \ldots, r)$. The biases of hidden layer neuron $i$ are $b_i^{(1)}$ $(i = 1, 2, \ldots, r)$, and the biases of the output layer and the context layer are $b^{(2)}$ and $b^{(3)}$, respectively. The activation functions of the output layer, $f^{(2)}$, and of the context layer, $f^{(3)}$, are typically linear, while the activation function of hidden layer neuron $i$ is $f_i^{(1)}$ $(i = 1, 2, \ldots, r)$. The value of hidden layer neuron $i$ is $h_i(t)$ $(i = 1, 2, \ldots, r)$, and the values of the output layer and the context layer are $y(t)$ and $d(t)$.

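To simplify the presentation, the following notation is used:

$$W^{(1)} = \left(W_{ij}^{(1)}\right) = \begin{pmatrix} W_1^{(1)} \\ W_2^{(1)} \\ \vdots \\ W_r^{(1)} \end{pmatrix}, \quad i = 1, 2, \ldots, r;\ j = 1, 2, \ldots, m, \tag{18}$$

$$W^{(2)} = \left(W_i^{(2)}\right)_{i=1,2,\ldots,r} = \left(W_1^{(2)}, W_2^{(2)}, \ldots, W_r^{(2)}\right)^T, \tag{19}$$

$$W^{(3)} = \left(W_i^{(3)}\right)_{i=1,2,\ldots,r} = \left(W_1^{(3)}, W_2^{(3)}, \ldots, W_r^{(3)}\right)^T, \tag{20}$$

$$b^{(1)} = \left(b_1^{(1)}, b_2^{(1)}, \ldots, b_r^{(1)}\right)^T. \tag{21}$$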

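Thus, the Jordan neural network can be expressed as:

$$h_i(t) = f_i^{(1)}\left(W_i^{(1)} X(t) + W_i^{(2)} d(t) + b_i^{(1)}\right), \quad 1 \le t \le T,\ i = 1, 2, \ldots, r, \tag{22}$$

$$y(t) = f^{(2)}\left(\sum_{i=1}^{r} W_i^{(3)} h_i(t) + b^{(2)}\right), \quad 1 \le t \le T, \tag{23}$$

$$d(t) = f^{(3)}\left(\alpha\, d(t-1) + y(t-1) + b^{(3)}\right), \quad 1 \le t \le T,\ d(0) = d(1) = 0. \tag{24}$$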
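The recurrence in Eqs. (22) to (24) can be sketched as a forward pass with a single output neuron and a single context neuron. The sketch assumes a sigmoid hidden activation, identity (linear) output and context activations, and y(0) = 0 so that d(1) = 0 when the context bias is zero; the function names and default values are illustrative, and no training step is shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def jordan_forward(inputs, W1, W2, W3, b1, b2=0.0, b3=0.0, alpha=0.5):
    """Run a Jordan network over a sequence of input vectors X(t).

    W1: r x m input-to-hidden weights, W2: r context-to-hidden weights,
    W3: r hidden-to-output weights, b1: r hidden biases.
    Returns the list of outputs y(t).
    """
    d_prev, y_prev = 0.0, 0.0  # d(0) = 0; y(0) = 0 is an assumption
    outputs = []
    for X in inputs:
        # Eq. (24): the context holds a decayed memory plus the previous output.
        d = alpha * d_prev + y_prev + b3
        # Eq. (22): hidden neurons see the current input and the context value.
        h = [sigmoid(sum(w * x for w, x in zip(W1[i], X)) + W2[i] * d + b1[i])
             for i in range(len(W1))]
        # Eq. (23): linear output layer.
        y = sum(w * h_i for w, h_i in zip(W3, h)) + b2
        outputs.append(y)
        d_prev, y_prev = d, y
    return outputs
```

With the context weights set to zero the network gives the same output for the same input at every step; with non-zero context weights the previous output feeds back through d(t), so identical inputs produce different outputs over time.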

Figure 4 shows a detailed schematic of the first three time steps of the Jordan neural network to clarify Eq. (24).

Fig. 4 Specifics of recurrent connections in Jordan neural network

Takens' embedding theorem yields the following equation:

$$\hat{x}(t+1) \triangleq y(t) = g(X(t)), \tag{25}$$

where $\hat{x}(t+1)$ is the fitting value of the Jordan neural network at time $t$. Furthermore, the smooth mapping $g$ can be written as

$$g(X(t)) = f^{(2)}\left(\sum_{i=1}^{r}\left[W_i^{(3)} f_i^{(1)}\left(W_i^{(1)} X(t) + W_i^{(2)} f^{(3)}\left(\alpha\, d(t-1) + y(t-1) + b^{(3)}\right) + b_i^{(1)}\right)\right] + b^{(2)}\right) \tag{26}$$

Thus, the Jordan neural network is trained and tested to predict age and gender from the facial images.

4 Result and discussion

The proposed model, gender and age classification using ASMNet-based facial fiducial detection and a Jordan neural network, is evaluated using Python 3.8.8 on a 2.50 GHz Intel(R) Core(TM) i5-10300H processor with 32.0 GB of RAM (31.8 GB usable). The collected datasets are pre-processed using cropping, centre surround device normalization, an optimized Gabor filter and logarithmic transformation. From the pre-processed data, facial fiducial points and pose are detected using ASMNet (Active Shape Model combined with CNN). Feature extraction is then performed using EfficientNetB7, and the Jordan neural network classifies age and gender.

(i) Dataset description

Face images are taken as input from the dataset utilized in this framework [24]. UTKFace is a large face dataset with a wide age range (0–116 years old). More than 20,000 face images with ethnicity, gender, and age annotations make up the dataset. The photographs exhibit a wide range of variation in clarity, occlusion, lighting, facial expression, and pose. Many tasks, such as age estimation, face detection, landmark localization and age regression/progression, can be performed using this dataset. For the proposed model, 10,136 images with age and gender labels are used, of which 80% (8108) are employed for training and 20% (2028) for testing.
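The reported split sizes can be reproduced with a short sketch. It assumes that the training share is rounded down and that the images are shuffled before splitting; the text states neither, and the function name and seed are illustrative.

```python
import random


def train_test_split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into training and testing parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_fraction)  # rounds down (assumption)
    return indices[:n_train], indices[n_train:]
```

For 10,136 images, 80% is 8108.8, so rounding down gives 8108 training images and leaves 2028 for testing, matching the counts above.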

Figure 5 illustrates the stages of gender and age classification using ASMNet-based facial fiducial detection and a Jordan neural network. The first column of Fig. 5 displays the original images and the second column the cropped images. The third column shows the Gabor-filtered images, the fourth column the logarithmic transformation images, and the final column the segmented images from facial fiducial and pose detection.

Fig. 5 Face image transformation using pre-processing and segmentation methods

Table 1 lists the hyperparameters of the Jordan neural network used to detect age and gender: tanh activation function, Adam optimizer, mean squared error loss, 200 epochs and a batch size of 32.

Table 1 Hyperparameters of Jordan neural network

The confusion matrix of the proposed model is shown in Fig. 6. A confusion matrix helps visualise the outcomes of a classifier by compiling all of its predicted and actual values into a table. The total number of test samples is 2028, of which 1906 are predicted as their actual class while the remaining 122 are incorrectly predicted.

Fig. 6 Confusion matrix for the proposed model

The receiver operating characteristic (ROC) curve for the facial fiducial prediction of face images is shown in Fig. 7. An ROC curve is a graph showing the performance of a classification model at all classification thresholds.

Fig. 7 ROC plot for the facial fiducial prediction of face images

In Fig. 8, the detection rate of ASMNet is compared with that of the existing models. The detection rates of ASMNet, MTCNN and PKPCA are 86, 94, 89, 90; 80, 85, 86, 87; and 83, 81, 77, 73, respectively. ASMNet therefore achieves higher detection rates than the existing approaches. Figure 9 depicts the detection failure rate of ASMNet compared with the existing techniques. The detection failure rates of ASMNet, MTCNN and PKPCA are 14, 6, 11, 10; 20, 15, 14, 13; and 17, 19, 23, 27, respectively. Based on the obtained detection failure rates, ASMNet performs better than the existing models.

Fig. 8 Detection rate evaluation of ASMNet and existing approaches

Fig. 9 Detection failure rate evaluation of ASMNet and existing approaches

Figure 10 compares the proposed Optimized Gabor Filter with Logarithmic Transformation (OGF-LT) with existing techniques, namely the Weighted Gradient Filter (WGF), Wiener Filter (WF) and Non Local Mean (NLM) filter, in terms of peak signal-to-noise ratio (PSNR), signal-to-noise ratio (SNR) and structural content (SC). The PSNR values for the proposed OGF-LT and the existing WGF, WF and NLM are 69, 65, 57 and 52, respectively. Likewise, the SNR values are 67, 63, 58 and 55, and the SC values are 61, 54, 49 and 43, respectively. The results show that the proposed method improves image quality more than the other approaches.

Fig. 10 Comparison of proposed OGF-LT and existing techniques based on PSNR, SNR and SC

Figure 11 presents the accuracy, positive predictive value (PPV), hit rate, selectivity and negative predictive value (NPV) of the JNN and the existing techniques. The JNN is compared with existing methods including Extended Nearest Neighbor (ENN), Support Vector Machine (SVM) and Multilayer Perceptron (MLP). The accuracy values for the JNN and the existing methods are 93.0, 86.0, 83.0 and 80.0, and the PPV values are 87.0, 81.6, 84.0 and 76.0. The hit rates are 89.0, 85.6, 78.0 and 82.0, and the selectivity values are 94.82, 86.0, 81.39 and 88.0. The NPV values are 92.32, 89.3, 82.39 and 78.3. The proposed JNN thus achieves an accuracy, PPV, hit rate, selectivity and NPV of 93%, 87%, 89%, 94.82% and 92.32%, respectively. The JNN model is more accurate at detecting age and gender than the existing approaches in terms of accuracy, PPV, hit rate, selectivity and NPV.

Fig. 11 Performance metrics like accuracy, PPV, Hit Rate, selectivity and NPV comparison between proposed JNN and existing models

The fall-out, false omission rate (FOR), miss rate, false discovery rate (FDR) and error of the JNN and the existing approaches are compared in Fig. 12. For the JNN, ENN, SVM and MLP, the FOR values are 7.67, 10.70, 17.60 and 21.70, and the FDR values are 13.0, 18.40, 16.0 and 24.0, respectively. The fall-out values are 5.17, 14.0, 18.60 and 12.0, and the miss rates are 11.0, 14.40, 22.0 and 18.0. The error values are 7.0, 14.0, 17.0 and 20.0. The JNN therefore has a lower error, fall-out, FOR, miss rate and FDR than the existing methods.
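The rate metrics compared in Figs. 11 and 12 are linked by simple complements, which the following sketch makes explicit. It assumes a binary (or one-vs-rest) confusion table with counts of true positives, false positives, true negatives and false negatives; the text does not report these counts, so any numbers passed to the function are purely illustrative.

```python
def rate_metrics(tp, fp, tn, fn):
    """Compute the rate metrics of a binary confusion table as fractions in [0, 1]."""
    total = tp + fp + tn + fn
    metrics = {
        "accuracy": (tp + tn) / total,
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
        "hit_rate": tp / (tp + fn),     # sensitivity / recall
        "selectivity": tn / (tn + fp),  # specificity
        "npv": tn / (tn + fn),          # negative predictive value
    }
    # Each error-type metric is the complement of one of the metrics above.
    metrics["error"] = 1.0 - metrics["accuracy"]
    metrics["fdr"] = 1.0 - metrics["ppv"]
    metrics["miss_rate"] = 1.0 - metrics["hit_rate"]
    metrics["fall_out"] = 1.0 - metrics["selectivity"]
    metrics["for"] = 1.0 - metrics["npv"]
    return metrics
```

For example, a hit rate of 89% implies a miss rate of 11%, and a PPV of 87% implies an FDR of 13%, which is the relationship visible between the JNN values reported for the two figures.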

Fig. 12 Performance metrics like fall-out, FOR, Miss Rate, FDR and error comparison between proposed JNN and existing models
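Each error-type metric in Fig. 12 is, by definition, the complement of one of the metrics in Fig. 11 (error = 100 − accuracy, FDR = 100 − PPV, miss rate = 100 − hit rate, fall-out = 100 − selectivity, FOR = 100 − NPV). The short check below applies these identities to the values reported in the text; the dictionary layout and the helper name `max_complement_gap` are illustrative choices, not part of the original work.

```python
# Values as reported in the text for Figs. 11 and 12 (percent).
reported = {
    "JNN": {"accuracy": 93.0, "ppv": 87.0, "hit_rate": 89.0, "selectivity": 94.82, "npv": 92.32,
            "error": 7.0, "fdr": 13.0, "miss_rate": 11.0, "fall_out": 5.17, "for": 7.67},
    "ENN": {"accuracy": 86.0, "ppv": 81.6, "hit_rate": 85.6, "selectivity": 86.0, "npv": 89.3,
            "error": 14.0, "fdr": 18.40, "miss_rate": 14.40, "fall_out": 14.0, "for": 10.70},
    "SVM": {"accuracy": 83.0, "ppv": 84.0, "hit_rate": 78.0, "selectivity": 81.39, "npv": 82.39,
            "error": 17.0, "fdr": 16.0, "miss_rate": 22.0, "fall_out": 18.60, "for": 17.60},
    "MLP": {"accuracy": 80.0, "ppv": 76.0, "hit_rate": 82.0, "selectivity": 88.0, "npv": 78.3,
            "error": 20.0, "fdr": 24.0, "miss_rate": 18.0, "fall_out": 12.0, "for": 21.70},
}

# Each error-type rate paired with the metric it complements.
COMPLEMENTS = {"error": "accuracy", "fdr": "ppv", "miss_rate": "hit_rate",
               "fall_out": "selectivity", "for": "npv"}

def max_complement_gap(values):
    """Largest deviation from rate + complement = 100 for one model."""
    return max(abs(100.0 - values[base] - values[rate])
               for rate, base in COMPLEMENTS.items())

gaps = {model: max_complement_gap(values) for model, values in reported.items()}
```

The largest gap for any model is about 0.01 percentage points, which is consistent with truncation of the second decimal place, so the two figures agree with each other.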

Performance indicators such as F1_score, phi coefficient, kappa, markedness (MK) and the Fowlkes-Mallows (FM) index are compared between the proposed JNN and the existing models in Fig. 13. For JNN, ENN, SVM and MLP, respectively, the F1_score values are 87.98, 83.55, 80.88 and 78.88, and the phi coefficient values are 84.8, 83.7, 76.34 and 79.9. The kappa values are 88.3, 78.5, 85.6 and 73.6, the MK values are 91.3, 88.2, 81.39 and 85.7, and the FM values are 90.60, 82.199, 87.8 and 81.0. The JNN model therefore performs better than the existing models on all five indicators.

Fig. 13 Performance metrics like F1_score, phi coefficient, kappa, MK and FM comparison between proposed JNN and existing models
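The indicators in Fig. 13 can likewise be computed from binary confusion-matrix counts. The sketch below assumes the standard definitions: F1 as the harmonic mean of PPV and hit rate, phi as the Matthews correlation coefficient, Cohen's kappa, markedness as PPV + NPV − 1, and FM as the geometric mean of PPV and hit rate. The function name `agreement_scores` and the counts are hypothetical, results are returned as fractions rather than percentages, and this section does not confirm that the paper uses exactly these formulas.

```python
import math

def agreement_scores(tp, fp, tn, fn):
    """F1, phi (MCC), Cohen's kappa, markedness and Fowlkes-Mallows index
    from binary confusion-matrix counts, as fractions in standard form."""
    n = tp + fp + tn + fn
    ppv = tp / (tp + fp)
    tpr = tp / (tp + fn)
    npv = tn / (tn + fn)

    f1 = 2 * ppv * tpr / (ppv + tpr)
    phi = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

    # Kappa compares observed agreement with agreement expected by chance.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (observed - expected) / (1 - expected)

    return {
        "f1": f1,
        "phi": phi,
        "kappa": kappa,
        "markedness": ppv + npv - 1,
        "fm": math.sqrt(ppv * tpr),
    }

# Hypothetical counts for illustration only.
scores = agreement_scores(tp=90, fp=10, tn=85, fn=15)
```

As a consistency check on the reported numbers, the F1 scores in the text match the harmonic mean of the reported PPV and hit rate; for JNN, 2 × 87 × 89 / (87 + 89) ≈ 87.99, in line with the reported 87.98.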

Figure 14 compares the training time of the JNN with that of the existing models. The training times for JNN, ENN, SVM and MLP are 230.79, 269.32, 291.63 and 323.76 s, respectively, so the JNN training time is lower than that of the existing models. The testing and execution times of the JNN are compared with those of the existing methods in Figs. 15 and 16. For JNN, ENN, SVM and MLP, the testing times are 0.88, 1.38, 0.96 and 1.47 s and, in the same order, the overall execution times are 231.67, 270.7, 292.59 and 325.23 s. The JNN model therefore outperforms the existing models in both testing and execution time. Hence, based on this comparison, the proposed model shows superior performance across the evaluated metrics. Table 2 presents the accuracy of different CNN models for varying data sizes.

Fig. 14 Training time evaluation of JNN and existing approaches

Fig. 15 Testing time evaluation of JNN and existing approaches

Fig. 16 Execution time evaluation of JNN and existing approaches
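The reported execution times appear to be the sum of the corresponding training and testing times. The short check below verifies this on the values given in the text; the variable and function names are illustrative, and treating all values as seconds is an assumption based on the units given for testing time.

```python
# (training, testing, execution) times as reported in the text, assumed in seconds.
timings = {
    "JNN": (230.79, 0.88, 231.67),
    "ENN": (269.32, 1.38, 270.70),
    "SVM": (291.63, 0.96, 292.59),
    "MLP": (323.76, 1.47, 325.23),
}

def timing_gap(training, testing, execution):
    """Absolute difference between training + testing and reported execution time."""
    return abs(training + testing - execution)

timing_gaps = {model: timing_gap(*values) for model, values in timings.items()}

# Model with the shortest overall execution time.
fastest = min(timings, key=lambda model: timings[model][2])
```

All four gaps are zero up to floating-point rounding, and training dominates the total: for every model, testing accounts for well under 1% of the execution time.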

Table 2 Accuracy comparison of different CNN models with varying data size

This analysis shows that a high accuracy of around 91.5% is achieved with Inceptionv3 when the dataset is larger (1.5 Gb). Accuracy varies with the training process and with model hyperparameters such as the number of layers, the optimizer and the activation function. With a smaller dataset, accuracy is lower, but so is computational complexity. Thus, training on less data reduces both classification accuracy and computation time [29].

5 Conclusion

Gender and age prediction from face images is based on the unique features of each individual. These features can be used for a variety of purposes, including human–machine interaction, access control, forensic work, preventing identity theft or fraud, and identifying individuals in organizations. Earlier age estimation research relied on handcrafted features for encoding age-related patterns, and there are numerous approaches and a significant body of literature on the subject. However, biological variance and uncertainty will always be associated with age estimates because of the wide range of facial appearance and other intrinsic and extrinsic factors. The proposed model takes a facial image as input. Cropping, centre surround device normalization, an optimized Gabor filter and logarithmic transformation are the pre-processing techniques applied to the data sets. The face area is first separated from the background by cropping. Next, pixel-by-pixel face normalization is achieved using centre surround device normalization. To eliminate noise, an optimized Gabor filter is employed, with lyrebird optimization used to determine the optimal orientation value. A logarithmic transformation is then applied to the image to improve contrast. ASMNet (Active Shape Model paired with CNN) detects facial fiducial points and position from the preprocessed data; primary facial landmarks including the lips, nose tip, eyes and mouth are detected using this model. Next, EfficientNetB7 is used to extract features, and a Jordan neural network is used to classify age and gender. The performance metrics for the designed model are Accuracy, Positive predictive value, Hit rate, Selectivity, NPV, FOR, FDR, Fall-out, Miss-Rate, F1-Score, Error, Phi-coefficient, Kappa, MK, FM, Training time, Testing time and Execution time.
For these metrics, the proposed model achieves 93, 87, 89, 94.82, 92.32, 7.67, 13, 5.17, 11, 87.98, 7, 84.8, 88.3, 91.3 and 90.60, respectively, with training, testing and execution times of 230.79, 0.88 and 231.67 s. These values are compared with the results of existing methods, namely ENN, SVM and MLP. The proposed gender and age classification approach, using ASMNet-based facial fiducial detection and a Jordan neural network, outperforms the existing models; it also reduces the error rate and enables timely detection. Future work should focus on hybrid deep learning techniques to enhance the model, and on incorporating additional facial images and extracting local features under varied conditions, in order to extend the approach to research areas such as human emotion and race recognition.