How to use the NumPy mean function
Let's get started by first talking about what the NumPy mean function does.
NumPy mean computes the average of the values in a NumPy array
NumPy mean calculates the mean of the values within a NumPy array (or an array-like object). Let's take a look at a visual representation of this. Imagine we have a NumPy array with six values. We can use the NumPy mean function to compute the mean value.
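For reference, the arithmetic mean of n values is their sum divided by n. Here it is worked through for the six values used in the 1-d example later in this tutorial (0, 20, 40, 60, 80, 100):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\bar{x} = \frac{0 + 20 + 40 + 60 + 80 + 100}{6} = \frac{300}{6} = 50
```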
It's actually quite similar to some other NumPy functions like NumPy sum (which computes the sum of a NumPy array), NumPy median, and a few others. These are similar in that they compute summary statistics on NumPy arrays. Later in this tutorial, I'll show you exactly how the numpy.mean function works by walking you through concrete examples with real code. But before I do that, let's take a look at the syntax of the NumPy mean function so you know how it works in general.
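A minimal sketch comparing these summary statistics on the same six values used in the 1-d example later in this tutorial. It only uses np.sum, np.mean, and np.median, and shows that the mean is the sum divided by the number of elements:

```python
import numpy as np

# The same six values used in the 1-d example later in this tutorial
values = np.array([0, 20, 40, 60, 80, 100])

total = np.sum(values)      # sum of all elements
average = np.mean(values)   # sum divided by the element count
middle = np.median(values)  # average of the two middle values (40 and 60) here

print(total, average, middle)  # 300 50.0 50.0

# The mean is exactly the sum divided by the number of elements
assert average == total / values.size
```

For these evenly spaced values the mean and the median happen to coincide; for skewed data they generally would not.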
The syntax of numpy mean
Syntactically, the numpy.mean function is fairly simple. There's the name of the function, np.mean(), and then several parameters inside of the function that enable you to control it. In the image above, I've only shown 3 parameters: a, axis, and dtype. There are actually a few other parameters that you can use to control the np.mean function. Let's look at all of the parameters now to better understand how they work and what they do.
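As a minimal sketch of the general call pattern, here is np.mean called with just those three parameters. The input is a plain Python list of lists rather than an ndarray, because np.mean accepts array-like input; the data values are made up for illustration.

```python
import numpy as np

# Made-up example data: a nested Python list (array-like, not an ndarray)
data = [[1, 2, 3],
        [4, 5, 6]]

# The array goes first, followed by the optional axis and dtype parameters
col_means = np.mean(data, axis=0, dtype=np.float64)
print(col_means)  # [2.5 3.5 4.5]

# A tuple is also accepted as input
print(np.mean((1, 2, 3)))  # 2.0
```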
The parameters of numpy mean
The np.mean function has five main parameters:
a
axis
dtype
out
keepdims
Let's quickly discuss each parameter and what it does.
a
(required)
The a parameter enables you to specify the NumPy array that you want numpy.mean to operate on. This parameter is required; you need to give NumPy mean something to operate on. Having said that, it's actually a bit flexible. You can give it any array-like object. That means that you can pass the np.mean() function a proper NumPy array, but you can also give it things that are structurally similar to arrays, like Python lists, tuples, and other objects.
axis
(optional)
Technically, the axis is the dimension along which you perform the calculation. On the other hand, saying it that way confuses many beginners. So another way to think of this is that the axis parameter enables you to calculate the mean of the rows or the columns.
The reason for this is that NumPy arrays have axes. What is an axis? An "axis" is like a dimension along a NumPy array. Think of axes like the directions in a Cartesian coordinate system. In Cartesian coordinates, you can move in different directions. We typically call those directions "x" and "y." Similarly, you can move along a NumPy array in different directions. You can move down the rows and across the columns. In NumPy, we call these "directions" axes. Specifically, in a two-dimensional array, "axis 0" is the direction that points vertically down the rows and "axis 1" is the direction that points horizontally across the columns.
So how does this relate to NumPy mean? When you have a multi-dimensional NumPy array object, it's possible to compute the mean of a set of values down along the rows or across the columns. In these cases, NumPy produces a new array object that holds the computed means for the rows or the columns respectively. This probably sounds a little abstract and confusing, so I'll show you solid examples of how to do this later in this blog post. Additionally, if you're still a little confused about axes, you should read our tutorial that explains how to think about NumPy axes.
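The same idea carries over to arrays with more than two dimensions. As a quick sketch (the 3-d array here is only an illustration and is not used elsewhere in this tutorial), averaging along an axis removes that axis from the shape of the result:

```python
import numpy as np

# A 3-d array with shape (2, 3, 4): 2 blocks, each with 3 rows and 4 columns
arr3d = np.arange(24).reshape((2, 3, 4))

# Averaging along an axis collapses that axis, so it disappears from the shape
for ax in range(arr3d.ndim):
    print(ax, np.mean(arr3d, axis=ax).shape)
# 0 (3, 4)
# 1 (2, 4)
# 2 (2, 3)
```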
dtype
(optional)
The dtype parameter enables you to specify the exact data type that will be used when computing the mean. By default, if the values in the input array are integers, NumPy will actually treat them as floating point numbers (float64 to be exact). And if the numbers in the input are floats, it will keep them as the same kind of float; so if the inputs are float32, the output of np.mean will be float32. If the inputs are float64, the output will be float64.
Keep in mind that the data type can actually matter when you're calculating the mean; for floating point numbers, the output will have the same precision as the input. If the input is a data type with relatively lower precision (like float16 or float32), the output may be inaccurate due to the lower precision. To fix this, you can use the dtype parameter to specify that the output should be a higher precision float. (See the examples below.)
out
(optional)
The out parameter enables you to specify a NumPy array that will receive the output of np.mean(). If you use this parameter, the output array that you specify needs to have the same shape as the output that the mean function computes.
keepdims
(optional)
The keepdims parameter enables you to keep the dimensions of the output the same as the dimensions of the input. This confuses many people, so let me explain.
The NumPy mean function summarizes data. It takes a large number of values and summarizes them. So if you want to compute the mean of 5 numbers, the NumPy mean function will summarize those 5 values into a single value, the mean. When it does this, it is effectively reducing the dimensions. If we summarize a 1-dimensional array down to a single scalar value, the dimensions of the output (a scalar) are lower than the dimensions of the input (a 1-dimensional array). The same thing happens if we use the np.mean function on a 2-d array to calculate the mean of the rows or the mean of the columns. When we compute those means, the output will have a reduced number of dimensions.
Sometimes, we don't want that. There will be times where we want the output to have the exact same number of dimensions as the input. For example, a 2-d array goes in, and a 2-d array comes out. To make this happen, we need to use the keepdims parameter. By setting keepdims = True, we will cause the NumPy mean function to produce an output with the same number of dimensions as the input. There will be a concrete example below that shows you how this works.
Note that by default, keepdims is set to keepdims = False. So the natural behavior of the function is to reduce the number of dimensions when computing means on a NumPy array.
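Here is a minimal sketch combining the out and keepdims parameters, using the same 2-row, 3-column array that appears in the examples later on. The name buffer is just an illustrative choice. Because keepdims=True keeps the reduced axis with length 1, the array passed to out must have shape (1, 3) rather than (3,).

```python
import numpy as np

arr = np.arange(start=0, stop=21, step=4).reshape((2, 3))  # [[0, 4, 8], [12, 16, 20]]

# Column means with keepdims=True have shape (1, 3), so the out array must match
buffer = np.empty((1, 3))  # float64 array with the expected output shape

result = np.mean(arr, axis=0, keepdims=True, out=buffer)

print(result)            # [[ 6. 10. 14.]]
print(result is buffer)  # True: np.mean wrote into, and returned, the out array
```

If you passed a buffer of shape (3,) here instead, NumPy should reject it with an error, because its shape would not match the (1, 3) output.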
Examples: how to use the numpy mean function
Now that we've taken a look at the syntax and the parameters of the NumPy mean function, let's look at some examples of how to use the NumPy mean function to calculate averages. Before I show you these examples, I want to make note of an important learning principle. When you're trying to learn and master data science code, you should study and practice simple examples. Simple examples are examples that can help you intuitively understand how the syntax works. Simple examples are also things that you can practice and memorize. Mastering syntax (like mastering any skill) requires study, practice, and repetition. And by the way, before you run these examples, you need to make sure that you've imported NumPy properly into your Python environment. To do that, you'll need to run the following code:
import numpy as np
Ok, now let's move on to the code.
Compute the mean of the elements of a 1-d array with np.mean
Here, we'll start with something very simple. We're going to calculate the mean of the values in a single 1-dimensional array. To do this, we'll first create an array of six values by using the np.array function.
np_array_1d = np.array([0,20,40,60,80,100])
Let's quickly examine the contents of the array by using the print() function.
print(np_array_1d)
Which produces the following output:
[0 20 40 60 80 100]
As you can see, the new array, np_array_1d, contains six values between 0 and 100. Now, let's calculate the mean of the data. Here, we're just going to call the np.mean function. The only argument to the function will be the name of the array, np_array_1d.
np.mean(np_array_1d)
This code will produce the mean of the values:
50.0
Visually, we can think of this as follows. The NumPy mean function is taking the values in the NumPy array and computing the average. Keep in mind that the array itself is a 1-dimensional structure, but the result is a single scalar value. In a sense, the mean() function has reduced the number of dimensions: the output has fewer dimensions than the input. This will be important to understand when we start using the keepdims parameter later in this tutorial.
Compute the mean of the elements of a 2-d array with np.mean
Next, let's calculate the mean of the values in a two-dimensional NumPy array. To do this, we first need to create a 2-d array. We can do that by using the np.arange function. We'll also use the reshape method to reshape the array into a two-dimensional array object.
np_array_2x3 = np.arange(start = 0, stop = 21, step = 4).reshape((2,3))
Let's quickly look at the contents of the array by using the code print(np_array_2x3):
[[ 0  4  8]
 [12 16 20]]
As you can see, this is a two-dimensional object with six values: 0, 4, 8, 12, 16, 20. By using the reshape() method, these values have been re-arranged into an array with 2 rows and 3 columns. Now, let's calculate the mean of these values. To do this, we'll use the NumPy mean function just like we did in the prior example. We'll call the function, and the argument to the function will simply be the name of this 2-d array.
np.mean(np_array_2x3)
Which produces the following result:
10.0
Here, we're working with a two-dimensional array, but the mean() function has still produced a single value. When you use the NumPy mean function on a 2-d array (or an array of higher dimensions), the default behavior is to compute the mean of all of the values. Having said that, you can also use the NumPy mean function to compute the mean value in every row or the mean value in every column of a NumPy array. Let's take a look at how to do that.
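To make the default behavior concrete, here is a short sketch showing that, with no axis argument, the result matches both the mean of the flattened array and the total sum divided by the number of elements:

```python
import numpy as np

np_array_2x3 = np.arange(start=0, stop=21, step=4).reshape((2, 3))

overall = np.mean(np_array_2x3)                   # no axis: average every element
flattened = np.mean(np_array_2x3.ravel())         # same six values laid out in 1-d
by_hand = np_array_2x3.sum() / np_array_2x3.size  # 60 / 6

print(overall, flattened, by_hand)  # 10.0 10.0 10.0
```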
Compute the column means of a 2-d array
Here, we'll look at how to calculate the column means. To understand how to do this, you need to know how axes work in NumPy. Recall that earlier in this tutorial, I explained that NumPy arrays have what we call axes. Again, axes are like directions along the array. Axis 0 refers to the row direction. Axis 1 refers to the column direction. You really need to know this in order to use the axis parameter of NumPy mean. There's not really a good shortcut for learning this, so I recommend that you just memorize it … the row direction is axis 0 and the column direction is axis 1. Having explained axes again, let's take a look at how we can use this information in conjunction with the axis parameter.
The axis parameter specifies which axis you want to summarize
Using the axis parameter is confusing to many people, because the way that it is used is a little counterintuitive. With that in mind, let me explain this in a way that might improve your intuition. When we use the axis parameter, we are specifying which axis we want to summarize. Said differently, we are specifying which axis we want to collapse. So when we specify axis = 0, that means that we want to collapse axis 0. Remember, axis 0 is the row axis, so this means that we want to collapse or summarize the rows, but keep the columns intact. Let me show you an example to help this make sense.
Compute a mean with axis = 0
Let's first create a two-dimensional NumPy array. (Note: we used this code earlier in the tutorial, so if you've already run it, you don't need to run it again.)
np_array_2x3 = np.arange(start = 0, stop = 21, step = 4).reshape((2,3))
Ok. Let's quickly examine the contents by using the code print(np_array_2x3):
[[ 0  4  8]
 [12 16 20]]
As you can see, this is a two-dimensional array with 2 rows and 3 columns. Now that we have our NumPy array, let's calculate the mean and set axis = 0.
np.mean(np_array_2x3, axis = 0)
Which produces the following output:
array([ 6., 10., 14.])
What happened here? Essentially, the np.mean function has produced a new array. But notice what happened. Instead of calculating the mean of all of the values, it created a summary (the mean) along the "axis-0 direction." Said differently, it collapsed the data along the axis-0 direction, computing the mean of the values along that direction.
Why? Remember, axis 0 is the row axis. So when we set axis = 0 inside of the np.mean function, we're basically indicating that we want NumPy to calculate the mean down axis 0; calculate the mean down the row direction; calculate row-wise. This is a little confusing to beginners, so I think it's important to think of this in terms of directions. Along which direction should the mean function operate? When we set axis = 0, we're saying that the mean function should move along the 0th axis … the direction of axis 0. If that doesn't make sense, look again at the image immediately above and pay attention to the direction along which the mean is being calculated.
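One way to check this intuition: with axis = 0, each output value is the average of the two rows, taken column by column. The manual version below is an illustrative equivalent for this particular 2-row array, not a description of how NumPy computes it internally.

```python
import numpy as np

np_array_2x3 = np.arange(start=0, stop=21, step=4).reshape((2, 3))

# axis=0 collapses the rows: each output element averages one column
col_means = np.mean(np_array_2x3, axis=0)

# The same thing written out: add the two rows element-wise, divide by the row count
manual = (np_array_2x3[0] + np_array_2x3[1]) / np_array_2x3.shape[0]

print(col_means)  # [ 6. 10. 14.]
print(manual)     # [ 6. 10. 14.]
```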
Compute the row means of a 2-d array
Similarly, we can compute the row means of a NumPy array. In this case, we're going to use the NumPy array that we created earlier with the following code:
np_array_2x3 = np.arange(start = 0, stop = 21, step = 4).reshape((2,3))
This code creates the following array:
[[ 0  4  8]
 [12 16 20]]
It is a two-dimensional array. As you can see, it has 3 columns and 2 rows. Now, we're going to calculate the mean while setting axis = 1.
np.mean(np_array_2x3, axis = 1)
Which gives us the output:
array([ 4., 16.])
So let's talk about what happened here. First, remember that axis 1 is the column direction; the direction that sweeps across the columns. When we set axis = 1 inside of the NumPy mean function, we're telling np.mean that we want to calculate the mean such that we summarize the data in that direction. Again, said differently, we are collapsing the axis-1 direction and computing our summary statistic (i.e., the mean) in that direction. Do you see now? Axis 1 is the column direction; the direction that sweeps across the columns. When we set axis = 1, we are indicating that we want NumPy to operate along this direction. It will therefore compute the mean of the values along that direction (axis 1), and produce an array that contains those mean values: [4., 16.].
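Similarly, here is a sketch that computes each row's mean separately with a Python loop. This is only an illustration; the single np.mean call with axis = 1 is the idiomatic form, and both give the same values:

```python
import numpy as np

np_array_2x3 = np.arange(start=0, stop=21, step=4).reshape((2, 3))

# axis=1 collapses the columns: each output element averages one row
row_means = np.mean(np_array_2x3, axis=1)

# The same computation one row at a time
per_row = np.array([row.mean() for row in np_array_2x3])

print(row_means)  # [ 4. 16.]
print(per_row)    # [ 4. 16.]
```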
How to use the keepdims parameter with np.mean
Ok. Now that you've learned how to use the axis argument, let's talk about how to use the keepdims argument. The keepdims parameter of NumPy mean enables you to control the dimensions of the output. Specifically, it enables you to make the dimensions of the output exactly the same as the dimensions of the input array. To understand this, let's first take a look at a few of our prior examples. Earlier in this blog post, we calculated the mean of a 1-dimensional array with the code np.mean(np_array_1d), which produced the mean value, 50.0. There's something subtle here though that you might have missed. The dimensions of the output are not the same as the input. To see this, let's first take a look at the dimensions of the input array. We can do this by examining the ndim attribute, which tells us the number of dimensions:
np_array_1d.ndim
When you run this code, it will produce the following output: 1. The array np_array_1d is a 1-dimensional array. Now let's take a look at the number of dimensions of the output of np.mean() when we use it on np_array_1d. Again, we can do this by using the ndim attribute:
np.mean(np_array_1d).ndim
Which produces the following output: 0. So the input (np_array_1d) has 1 dimension, but the output of np.mean has 0 dimensions … the output is a scalar. In some sense, the output of np.mean has fewer dimensions than the input. This is relevant to the keepdims argument, so bear with me as we take a look at another example. Let's look at the dimensions of the 2-d array that we used earlier in this blog post:
np_array_2x3.ndim
When you run this code, the output will tell you that np_array_2x3 is a two-dimensional array. What about the output of np.mean? If we don't specify an axis, the output of np.mean() on this array will have 0 dimensions. You can check it with this code:
np.mean(np_array_2x3).ndim
Which produces the following output: 0. When we use np.mean on a 2-d array, it calculates the mean. The mean value is a scalar, which has 0 dimensions. In this case, the output of np.mean has a different number of dimensions than the input. What if we set an axis? Remember, if we use np.mean and set axis = 0, it will produce an array of means. Run this code:
np.mean(np_array_2x3, axis = 0)
Which produces the output array([ 6., 10., 14.]). And how many dimensions does this output have? We can check by using the ndim attribute:
np.mean(np_array_2x3, axis = 0).ndim
Which tells us that the output of np.mean in this case, when we set axis to 0, is a 1-dimensional object. The input had 2 dimensions and the output has 1 dimension. Again, the output has a different number of dimensions than the input. Ok, now that we've looked at some examples showing the number of dimensions of inputs vs. outputs, we're ready to talk about the keepdims argument.
The keepdims parameter keeps the dimensions of the output the same as the dimensions of the input
The keepdims parameter enables you to set the dimensions of the output to be the same as the dimensions of the input. keepdims takes a Boolean argument … meaning that you can set it to True or False. By default, the parameter is set as keepdims = False. This means that the mean() function will not keep the dimensions the same: by default, the dimensions of the output will not be the same as the dimensions of the input. And that's exactly what we just saw in the last few examples in this section! On the other hand, if we set keepdims = True, this will cause the number of dimensions of the output to be exactly the same as the number of dimensions of the input.
Set keepdims equal to true (and keep the same dimensions)
Let's take a look. Once again, we're going to operate on our NumPy array np_array_2x3. Remember, this is a two-dimensional object, which we saw by examining the ndim attribute. Now, let's once again examine the dimensions of the output of the np.mean function when we calculate with axis = 0.
np.mean(np_array_2x3, axis = 0).ndim
This code indicates that the output of np.mean in this case has 1 dimension. Why? Because we didn't assign anything for keepdims, so it defaulted to keepdims = False. This code does not keep the dimensions of the output the same as the dimensions of the input. Now, let's explicitly use the keepdims parameter and set keepdims = True.
np.mean(np_array_2x3, axis = 0, keepdims = True).ndim
Which produces the following output:
2
When we use np.mean on a 2-d array and set keepdims = True, the output will also be a 2-d array. When we set keepdims = True, the dimensions of the output will be the same as the dimensions of the input. I'm not going to explain when and why you might need to do this … just understand that when you need the dimensions of the output to be the same, you can force this behavior by setting keepdims = True.
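For a side-by-side comparison, this sketch prints the shape of the result with and without keepdims for each choice of axis. With keepdims = True, every reduced axis stays in the result with length 1 instead of disappearing.

```python
import numpy as np

np_array_2x3 = np.arange(start=0, stop=21, step=4).reshape((2, 3))

# Compare result shapes with and without keepdims for each choice of axis
for axis in (None, 0, 1):
    dropped = np.mean(np_array_2x3, axis=axis)
    kept = np.mean(np_array_2x3, axis=axis, keepdims=True)
    print(axis, np.shape(dropped), kept.shape)
# None () (1, 1)
# 0 (3,) (1, 3)
# 1 (2,) (2, 1)
```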
How to use the dtype parameter
Ok, one last example. Let's look at how to specify the output data type by using the dtype parameter. As I mentioned earlier, if the values in your input array are integers, the output will be of the float64 data type. If the values in the input array are floats, then the output will be the same type of float. So if the inputs are float32, the outputs will be float32, etc. But what if you want to specify another data type for the output? You can do this with the dtype parameter. Let's take a look at a simple example. Here, we'll create a simple 1-dimensional NumPy array of integers by using the np.array function.
np_array_1d_int = np.array([1,3,4,7,11])
And we can check the data type of the values in this array by using the dtype attribute:
np_array_1d_int.dtype
When you run that code, you'll find that the values are being stored as integers; int64 to be precise on most platforms (on Windows with NumPy versions before 2.0, the default integer type is int32). Now let's use numpy mean to calculate the mean of the numbers:
mean_output = np.mean(np_array_1d_int)
Now, we can check the data type of the output, mean_output.
mean_output.dtype
Which tells us that the data type is float64. This is exactly the behavior we should expect. As I mentioned earlier, by default, NumPy produces output with the float64 data type when the input values are integers.
Setting the data type with the dtype parameter
So now that we've looked at the default behavior, let's change it by explicitly setting the dtype argument.
mean_output_alternate = np.mean(np_array_1d_int, dtype = 'float32')
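The object mean_output_alternate contains the computed mean, which is approximately 5.2. Depending on your NumPy version, it may display as 5.2 or as 5.1999998, because 5.2 cannot be represented exactly as a float32. Now, let's check the data type of mean_output_alternate.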
mean_output_alternate.dtype
When you run this, you can see that mean_output_alternate contains a value of the float32 data type. This is exactly what we'd expect, because we set dtype = 'float32'.
Be careful when you use the dtype parameter
As I mentioned earlier, you need to be careful when you use the dtype parameter. If you need the output of np.mean to have high precision, you need to be sure to select a data type with high precision; for example, you might select float64. If you select a data type with low precision (like int), the result may be inaccurate or imprecise.
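A short sketch of why the choice matters, reusing the integer array from the example above (its true mean is 26 / 5 = 5.2). float64 represents 5.2 as closely as Python's own floats do, float32 only approximates it, and with an integer dtype the result comes back as an integer, so the fractional part is lost.

```python
import numpy as np

np_array_1d_int = np.array([1, 3, 4, 7, 11])  # sum is 26, so the true mean is 5.2

mean_f64 = np.mean(np_array_1d_int, dtype='float64')
mean_f32 = np.mean(np_array_1d_int, dtype='float32')
mean_int = np.mean(np_array_1d_int, dtype=int)

print(mean_f64 == 5.2)         # True
print(float(mean_f32) == 5.2)  # False: float32 only approximates 5.2
print(mean_int)                # 5: the integer dtype drops the fractional part
```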
If you want to learn data science in Python, learn NumPy
You've probably heard that 80% of data science work is just data manipulation. That's mostly true. If you want to be great at data science in Python, you need to know how to manipulate data in Python. And one of the primary toolkits for manipulating data in Python is the NumPy module. In this post, I've shown you how to use the NumPy mean function, but we also have several other tutorials about other NumPy topics, like how to create a NumPy array, how to reshape a NumPy array, how to create an array with all zeros, and many more. If you're interested in learning NumPy, definitely check those out.
For more Python data science tutorials, sign up for our email list
More broadly though, if you're interested in learning (and mastering) data science in Python, or data science generally, you should sign up for our email list right now. Here at the Sharp Sight blog, we regularly post tutorials about a variety of data science topics … in particular, about NumPy. If you want to learn NumPy and data science in Python, sign up for our email list. If you sign up, you'll receive Python data science tutorials delivered to your inbox. You'll get free tutorials on:
- NumPy
- Pandas
- Base Python
- Scikit learn
- Machine learning
- Deep learning
- … and more.
Want to learn data science in Python? Sign up now.