# How to use the NumPy mean function


Let's get started by first talking about what the NumPy mean function does.

## NumPy mean computes the average of the values in a NumPy array

NumPy mean calculates the mean of the values within a NumPy array (or an array-like object).

To picture this, imagine we have a NumPy array with six values. We can use the NumPy mean function to compute the mean value of those six numbers.

It's actually fairly similar to some other NumPy functions like NumPy sum (which computes the sum on a NumPy array), NumPy median, and a few others. These are similar in that they compute summary statistics on NumPy arrays.

Further down in this tutorial, I'll show you exactly how the numpy.mean function works by walking you through concrete examples with real code.

But before I do that, let's take a look at the syntax of the NumPy mean function so you know how it works in general.

## The syntax of numpy mean

Syntactically, the numpy.mean function is fairly simple. There's the name of the function, np.mean(), and then several parameters inside of the function that enable you to control it.

The three parameters you will see most often are `a`, `axis`, and `dtype`. There are actually a few other parameters that you can use to control the np.mean function.

Let's look at all of the parameters now to better understand how they work and what they do.

### The parameters of numpy mean

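The np.mean function has five parameters that we'll cover here:

- `a`
- `axis`
- `dtype`
- `out`
- `keepdims`

(Recent versions of NumPy also accept a `where` argument, which this tutorial does not cover.)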

Let's quickly discuss each parameter and what it does.

**`a`** (required)

The `a` parameter enables you to specify the exact NumPy array that you want numpy.mean to operate on. This parameter is required. You need to give the NumPy mean function something to operate on.

Having said that, it's actually a bit flexible. You can give it any array-like object. That means that you can pass the np.mean() function a proper NumPy array. But you can also give it things that are structurally similar to arrays, like Python lists, tuples, and other objects.
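To make the "array-like" point concrete, here is a minimal sketch. The variable names are mine, chosen only for illustration; the idea is simply that a list, a tuple, and a proper array holding the same six values should all give the same mean.

```python
import numpy as np

values_list = [0, 20, 40, 60, 80, 100]    # plain Python list
values_tuple = (0, 20, 40, 60, 80, 100)   # plain Python tuple
values_array = np.array(values_list)      # proper NumPy array

# np.mean converts array-like inputs to an array before averaging.
mean_from_list = np.mean(values_list)
mean_from_tuple = np.mean(values_tuple)
mean_from_array = np.mean(values_array)

print(mean_from_list, mean_from_tuple, mean_from_array)
```

All three calls should report the same value, because the underlying numbers are identical.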

**`axis`** (optional)

Technically, the `axis` is the dimension on which you perform the calculation. On the other hand, saying it that way confuses many beginners. So another way to think of this is that the `axis` parameter enables you to calculate the mean of the rows or columns.

The reason for this is that NumPy arrays have axes. What is an axis? An "axis" is like a dimension along a NumPy array.

Think of axes like the directions in a Cartesian coordinate system. In Cartesian coordinates, you can move in different directions. We typically call those directions "x" and "y." Similarly, you can move along a NumPy array in different directions. You can move down the rows and across the columns. In NumPy, we call these "directions" axes.

Specifically, in a 2-dimensional array, "axis 0" is the direction that points vertically down the rows and "axis 1" is the direction that points horizontally across the columns.

So how does this relate to NumPy mean? When you have a multi-dimensional NumPy array object, it's possible to compute the mean of a set of values down along the rows or across the columns. In these cases, NumPy produces a new array object that holds the computed means for the rows or the columns respectively.

This probably sounds a little abstract and confusing, so I'll show you solid examples of how to do this later in this blog post. Additionally, if you're still a little confused about axes, you should read our tutorial that explains how to think about NumPy axes.
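If the idea of axes still feels abstract, it may help to inspect an array's `ndim` and `shape` attributes. The sketch below uses a small hypothetical array (`demo` is my own name, not something from the examples later in this post): axis 0 is the first entry of the shape and axis 1 is the second.

```python
import numpy as np

# Hypothetical 2 x 3 array used only to illustrate axes.
demo = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(demo.ndim)       # number of axes
print(demo.shape)      # length of each axis, in order: (axis 0, axis 1)
print(demo.shape[0])   # steps you can take down the rows (axis 0)
print(demo.shape[1])   # steps you can take across the columns (axis 1)
```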

**`dtype`** (optional)

The `dtype` parameter enables you to specify the exact data type that will be used when computing the mean.

By default, if the values in the input array are integers, NumPy will actually treat them as floating point numbers (`float64` to be exact). And if the numbers in the input are floats, it will keep them as the same kind of float; so if the inputs are `float32`, the output of np.mean will be `float32`. If the inputs are `float64`, the output will be `float64`.

Keep in mind that the data type can actually matter when you're calculating the mean; for floating point numbers, the output will have the same precision as the input. If the input is a data type with relatively lower precision (like `float16` or `float32`) the output may be inaccurate due to the lower precision. To fix this, you can use the `dtype` parameter to specify that the output should be a higher precision float. (See the examples below.)
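Here is a rough sketch of that idea. The data is hypothetical (one million copies of 0.1 stored as `float32`), and how large the difference between the two results is will depend on your NumPy version and the data, so treat this as an illustration of the mechanics rather than a benchmark.

```python
import numpy as np

# Hypothetical data: one million copies of 0.1 stored as float32.
values32 = np.full(1_000_000, 0.1, dtype=np.float32)

mean_default = np.mean(values32)                    # result stays float32
mean_float64 = np.mean(values32, dtype=np.float64)  # computed and returned as float64

print(mean_default.dtype, mean_default)
print(mean_float64.dtype, mean_float64)
```

The key thing to notice is the data type of each result: without `dtype`, a `float32` input gives a `float32` mean; with `dtype=np.float64`, the mean is computed and returned at the higher precision.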

**`out`** (optional)

The `out` parameter enables you to specify a NumPy array that will accept the output of np.mean(). If you use this parameter, the output array that you specify needs to have the same shape as the output that the mean function computes.
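Since `out` is rarely shown in practice, here is a minimal sketch of how it might be used. The names `data` and `column_means` are mine; the important part is that the receiving array is created ahead of time with the shape the result will have (three column means, so shape `(3,)`).

```python
import numpy as np

data = np.arange(0, 21, 4).reshape((2, 3))   # [[0, 4, 8], [12, 16, 20]]

# Pre-allocate an array with the same shape as the expected result.
column_means = np.empty(3, dtype=np.float64)

# np.mean writes the result into column_means and also returns it.
returned = np.mean(data, axis=0, out=column_means)

print(column_means)
```

This is mostly useful when you want to reuse an existing buffer instead of allocating a new array on every call.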

**`keepdims`** (optional)

The `keepdims` parameter enables you to keep the number of dimensions of the output the same as the number of dimensions of the input.

This confuses many people, so let me explain.

The NumPy mean function summarizes data. It takes a large number of values and summarizes them. So if you want to compute the mean of 5 numbers, the NumPy mean function will summarize those 5 values into a single value, the mean.

When it does this, it is effectively reducing the dimensions. If we summarize a 1-dimensional array down to a single scalar value, the dimensions of the output (a scalar) are lower than the dimensions of the input (a 1-dimensional array). The same thing happens if we use the np.mean function on a 2-d array to calculate the mean of the rows or the mean of the columns. When we compute those means, the output will have a reduced number of dimensions.

Sometimes, we don't want that. There will be times where we want the output to have the exact same number of dimensions as the input. For example, a 2-d array goes in, and a 2-d array comes out.

To make this happen, we need to use the `keepdims` parameter. By setting `keepdims = True`, we will cause the NumPy mean function to produce an output that has the same number of dimensions as the input (the summarized axes are kept, with length 1). There will be a concrete example below that will show you how this works.

Note that by default, `keepdims` is set to `keepdims = False`. So the natural behavior of the function is to reduce the number of dimensions when computing means on a NumPy array.

## Examples: how to use the numpy mean function

Now that we've taken a look at the syntax and the parameters of the NumPy mean function, let's look at some examples of how to use the NumPy mean function to calculate averages.

Before I show you these examples, I want to make note of an important learning principle. When you're trying to learn and master data science code, you should study and practice simple examples. Simple examples are examples that can help you intuitively understand how the syntax works. Simple examples are also things that you can practice and memorize. Mastering syntax (like mastering any skill) requires study, practice, and repetition.

And by the way, before you run these examples, you need to make sure that you've imported NumPy properly into your Python environment. To do that, you'll need to run the following code:

import numpy as np

Ok, now let's move on to the code.

### Compute the mean of the elements of a 1-d array with np.mean

Here, we'll start with something very simple. We're going to calculate the mean of the values in a single 1-dimensional array. To do this, we'll first create an array of six values by using the np.array function.

np_array_1d = np.array([0,20,40,60,80,100])

Let's quickly examine the contents of the array by using the `print()` function.

print(np_array_1d)

Which produces the following output:

[  0  20  40  60  80 100]

As you can see, the new array, `np_array_1d`, contains six values between 0 and 100.

Now, let's calculate the mean of the data. Here, we're just going to call the np.mean function. The only argument to the function will be the name of the array, `np_array_1d`.

np.mean(np_array_1d)

This code will produce the mean of the values:

50.0

Conceptually, we can think of this as follows: the NumPy mean function is taking the values in the NumPy array and computing the average.

Keep in mind that the array itself is a 1-dimensional structure, but the result is a single scalar value. In a sense, the mean() function has reduced the number of dimensions. The output has a lower number of dimensions than the input.

This will be important to understand when we start using the `keepdims` parameter later in this tutorial.
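As a quick sanity check on the 1-d example, you can reproduce the result by hand: add up the values and divide by how many there are. This is just a sketch to confirm the arithmetic; `manual_mean` and `numpy_mean` are names I've made up for the comparison.

```python
import numpy as np

np_array_1d = np.array([0, 20, 40, 60, 80, 100])

# The mean is the sum of the values divided by the number of values.
manual_mean = np_array_1d.sum() / np_array_1d.size   # 300 / 6
numpy_mean = np.mean(np_array_1d)

print(manual_mean, numpy_mean)
```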

### Compute the mean of the elements of a 2-d array with np.mean

Next, let's calculate the mean of the values in a 2-dimensional NumPy array. To do this, we first need to create a 2-d array. We can do that by using the np.arange function. We'll also use the reshape method to reshape the array into a 2-dimensional array object.

np_array_2x3 = np.arange(start = 0, stop = 21, step = 4).reshape((2,3))

Let's quickly look at the contents of the array by using the code `print(np_array_2x3)`:

[[ 0  4  8]
 [12 16 20]]

As you can see, this is a 2-dimensional object with six values: 0, 4, 8, 12, 16, 20. By using the reshape() method, these values have been re-arranged into an array with 2 rows and 3 columns.

Now, let's calculate the mean of these values. To do this, we'll use the NumPy mean function just like we did in the prior example. We'll call the function and the argument to the function will simply be the name of this 2-d array.

np.mean(np_array_2x3)

Which produces the following result:

10.0

Here, we're working with a 2-dimensional array, but the mean() function has still produced a single value. When you use the NumPy mean function on a 2-d array (or an array of higher dimensions) the default behavior is to compute the mean of all of the values.

Having said that, you can also use the NumPy mean function to compute the mean value in every row or the mean value in every column of a NumPy array. Let's take a look at how to do that.
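Before moving on, one way to convince yourself that the default really does use every value is to flatten the array first and compare. This is only a sketch; `mean_all`, `mean_flat`, and `mean_axis_none` are hypothetical names, and `axis=None` is simply the default written out explicitly.

```python
import numpy as np

np_array_2x3 = np.arange(start=0, stop=21, step=4).reshape((2, 3))

mean_all = np.mean(np_array_2x3)                    # default behavior
mean_flat = np.mean(np_array_2x3.ravel())           # same six values as a 1-d array
mean_axis_none = np.mean(np_array_2x3, axis=None)   # the default, spelled out

print(mean_all, mean_flat, mean_axis_none)
```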

### Compute the column means of a 2-d array

Here, we'll look at how to calculate the column means. To understand how to do this, you need to know how axes work in NumPy. Recall that earlier in this tutorial, I explained that NumPy arrays have what we call axes. Again, axes are like directions along the array. Axis 0 refers to the row direction. Axis 1 refers to the column direction.

You really need to know this in order to use the `axis` parameter of NumPy mean. There's not really a great way to learn this, so I recommend that you just memorize it: the row direction is axis 0 and the column direction is axis 1.

Having explained axes again, let's take a look at how we can use this information in conjunction with the `axis` parameter.

##### The axis parameter specifies which axis you want to summarize

Using the `axis` parameter is confusing to many people, because the way that it is used is a little counterintuitive. With that in mind, let me explain this in a way that might improve your intuition.

When we use the axis parameter, we are specifying which axis we want to summarize. Said differently, we are specifying which axis we want to collapse. So when we specify `axis = 0`, that means that we want to collapse axis 0. Remember, axis 0 is the row axis, so this means that we want to collapse or summarize the rows, but keep the columns intact.

Let me show you an example to help this make sense.
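One quick way to see the "collapse" is to compare shapes before and after. The sketch below uses a hypothetical array of ones (`grid` is my own name) because only the shapes matter: whichever axis you name disappears from the shape of the result.

```python
import numpy as np

# Hypothetical 2 x 3 array of ones; only the shapes matter here.
grid = np.ones((2, 3))

collapsed_axis0 = np.mean(grid, axis=0)   # axis 0 (length 2) is summarized away
collapsed_axis1 = np.mean(grid, axis=1)   # axis 1 (length 3) is summarized away

print(grid.shape)              # (2, 3)
print(collapsed_axis0.shape)   # (3,)
print(collapsed_axis1.shape)   # (2,)
```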

##### Compute a mean with axis = 0

Let's first create a 2-dimensional NumPy array. (Note: we used this code earlier in the tutorial, so if you've already run it, you don't need to run it again.)

np_array_2x3 = np.arange(start = 0, stop = 21, step = 4).reshape((2,3))

Ok. Let's quickly examine the contents by using the code `print(np_array_2x3)`:

[[ 0  4  8]
 [12 16 20]]

As you can see, this is a 2-dimensional array with 2 rows and 3 columns.

Now that we have our NumPy array, let's calculate the mean and set `axis = 0`.

np.mean(np_array_2x3, axis = 0)

Which produces the following output:

array([ 6., 10., 14.])

What happened here? Essentially, the np.mean function has produced a new array. But notice what happened. Instead of calculating the mean of all of the values, it created a summary (the mean) along the "axis-0 direction." Said differently, it collapsed the data along the axis-0 direction, computing the mean of the values along that direction.

Why? Remember, axis 0 is the row axis. So when we set `axis = 0` inside of the np.mean function, we're basically indicating that we want NumPy to calculate the mean down axis 0; that is, calculate the mean down the row direction.

This is a little confusing to beginners, so I think it's important to think of this in terms of directions. Along which direction should the mean function operate? When we set `axis = 0`, we're indicating that the mean function should move along the 0th axis, the direction of axis 0. If that doesn't make sense, look again at the output above and pay attention to the direction along which the mean is being calculated: each value is the mean of one column, taken down the rows (for example, (0 + 12) / 2 = 6).
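If it helps, you can spell out what `axis = 0` is doing with an explicit loop. This is a sketch of the idea, not how NumPy computes it internally: for each column, take the values going down axis 0 and average them.

```python
import numpy as np

np_array_2x3 = np.arange(start=0, stop=21, step=4).reshape((2, 3))

# For each column index, average the values found going down the rows.
manual_column_means = [
    np_array_2x3[:, col].mean() for col in range(np_array_2x3.shape[1])
]

print(manual_column_means)
print(np.mean(np_array_2x3, axis=0))
```

Both lines should show the same three numbers, one per column.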

### Compute the row means of a 2-d array

Similarly, we can compute row means of a NumPy array. In this case, we're going to use the NumPy array that we created earlier with the following code:

np_array_2x3 = np.arange(start = 0, stop = 21, step = 4).reshape((2,3))

This code creates the following array:

[[ 0  4  8]
 [12 16 20]]

It is a 2-dimensional array. As you can see, it has 3 columns and 2 rows.

Now, we're going to calculate the mean while setting `axis = 1`.

np.mean(np_array_2x3, axis = 1)

Which gives us the output:

array([ 4., 16.])

So let's talk about what happened here. First remember that axis 1 is the column direction; the direction that sweeps across the columns. When we set `axis = 1` inside of the NumPy mean function, we're telling np.mean that we want to calculate the mean such that we summarize the data in that direction. Again, said differently, we are collapsing the axis-1 direction and computing our summary statistic in that direction (i.e., the mean).

Do you see now? When we set `axis = 1`, we are indicating that we want NumPy to operate along this direction. It will therefore compute the mean of the values along that direction (axis 1), and produce an array that contains those mean values: `[4., 16.]`.
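The same result can be sketched by looping over the rows directly. Iterating over a 2-d array yields one row at a time, so averaging each row sweeps across the columns, which is what `axis = 1` asks for. As before, `manual_row_means` is just an illustrative name.

```python
import numpy as np

np_array_2x3 = np.arange(start=0, stop=21, step=4).reshape((2, 3))

# Each iteration yields one row; its mean sweeps across the columns.
manual_row_means = [row.mean() for row in np_array_2x3]

print(manual_row_means)
print(np.mean(np_array_2x3, axis=1))
```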

### How to use the keepdims parameter with np.mean

Ok. Now that you've learned about how to use the `axis` argument, let's talk about how to use the `keepdims` argument.

The `keepdims` parameter of NumPy mean enables you to control the dimensions of the output. Specifically, it enables you to make the number of dimensions of the output exactly the same as that of the input array.

To understand this, let's first take a look at a few of our prior examples. Earlier in this blog post, we calculated the mean of a 1-dimensional array with the code `np.mean(np_array_1d)`, which produced the mean value, `50.0`.

There's something subtle here though that you might have missed. The dimensions of the output are not the same as the input. To see this, let's take a look first at the dimensions of the input array. We can do this by examining the `ndim` attribute, which tells us the number of dimensions:

np_array_1d.ndim

When you run this code, it will produce the following output: `1`. The array `np_array_1d` is a 1-dimensional array.

Now let's take a look at the number of dimensions of the output of np.mean() when we use it on `np_array_1d`. Again, we can do this by using the `ndim` attribute:

np.mean(np_array_1d).ndim

Which produces the following output: `0`. So the input (`np_array_1d`) has 1 dimension, but the output of np.mean has 0 dimensions; the output is a scalar. In some sense, the output of np.mean has a reduced number of dimensions compared to the input.

This is relevant to the `keepdims` argument, so bear with me as we take a look at another example. Let's look at the dimensions of the 2-d array that we used earlier in this blog post:

np_array_2x3.ndim

When you run this code, the output will tell you that `np_array_2x3` is a 2-dimensional array. What about the output of np.mean? If we don't specify an axis, the output of np.mean() on this array will have 0 dimensions. You can check it with this code:

np.mean(np_array_2x3).ndim

Which produces the following output: `0`. When we use np.mean on a 2-d array, it calculates the mean. The mean value is a scalar, which has 0 dimensions. In this case, the output of np.mean has a different number of dimensions than the input.

What if we set an axis? Remember, if we use np.mean and set `axis = 0`, it will produce an array of means. Run this code:

np.mean(np_array_2x3, axis = 0)

Which produces the output `array([ 6., 10., 14.])`. And how many dimensions does this output have? We can check by using the `ndim` attribute:

np.mean(np_array_2x3, axis = 0).ndim

Which tells us that the output of np.mean in this case, when we set `axis` to 0, is a 1-dimensional object. The input had 2 dimensions and the output has 1 dimension. Again, the output has a different number of dimensions than the input.

Ok, now that we've looked at some examples showing the number of dimensions of inputs vs. outputs, we're ready to talk about the `keepdims` argument.

###### The keepdims parameter keeps the dimensions of the output the same as the dimensions of the input

The `keepdims` parameter enables you to set the number of dimensions of the output to be the same as that of the input. `keepdims` takes a boolean argument, meaning that you can set it to `True` or `False`.

By default, the parameter is set as `keepdims = False`. This means that the mean() function will not keep the dimensions the same. By default, the dimensions of the output will not be the same as the dimensions of the input. And that's exactly what we just saw in the last few examples in this section!

On the other hand, if we set `keepdims = True`, this will cause the number of dimensions of the output to be exactly the same as the number of dimensions of the input.

##### Set keepdims equal to true (and keep the same dimensions)

Let's take a look. Once again, we're going to operate on our NumPy array `np_array_2x3`. Remember, this is a 2-dimensional object, which we saw by examining the `ndim` attribute.

Now, let's once again examine the dimensions of the output of np.mean when we calculate with `axis = 0`.

np.mean(np_array_2x3, axis = 0).ndim

This code indicates that the output of np.mean in this case has 1 dimension. Why? Because we didn't specify anything for `keepdims`, so it defaulted to `keepdims = False`. This code does not keep the dimensions of the output the same as the dimensions of the input.

Now, let's explicitly use the `keepdims` parameter and set `keepdims = True`.

np.mean(np_array_2x3, axis = 0, keepdims = True).ndim

Which produces the following output:

2

When we use np.mean on a 2-d array and set `keepdims = True`, the output will also be a 2-d array. When we set `keepdims = True`, the number of dimensions of the output will be the same as that of the input.

I'm not going to go deep into when and why you might need to do this. Just understand that when you need the dimensions of the output to be the same, you can force this behavior by setting `keepdims = True`.
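For a taste of where this matters, here is one illustration of my own (not a case covered earlier in this post): subtracting each row's mean from that row. With `keepdims = True` the row means have shape `(2, 1)`, which lines up with the `(2, 3)` input under NumPy's broadcasting rules; without it, the shape would be `(2,)`, which does not broadcast against `(2, 3)`.

```python
import numpy as np

np_array_2x3 = np.arange(start=0, stop=21, step=4).reshape((2, 3))

# Keep the summarized axis with length 1 so the result stays 2-dimensional.
row_means = np.mean(np_array_2x3, axis=1, keepdims=True)

# Shape (2, 3) minus shape (2, 1): each row has its own mean subtracted.
centered = np_array_2x3 - row_means

print(row_means.shape)
print(centered)
```

After centering, every row should average to zero.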

### How to use the dtype parameter

Ok, one last example. Let's look at how to specify the output data type by using the `dtype` parameter.

As I mentioned earlier, if the values in your input array are integers, the output will be of the `float64` data type. If the values in the input array are floats, then the output will be the same type of float. So if the inputs are `float32`, the outputs will be `float32`, etc.

But what if you want to specify another data type for the output? You can do this with the `dtype` parameter.

Let's take a look at a simple example. Here, we'll create a simple 1-dimensional NumPy array of integers by using the np.array function.

np_array_1d_int = np.array([1,3,4,7,11])

And we can check the data type of the values in this array by using the `dtype` attribute:

np_array_1d_int.dtype

When you run that code, you'll find that the values are being stored as integers; `int64` to be precise (on most platforms).

Now let's use numpy mean to calculate the mean of the numbers:

mean_output = np.mean(np_array_1d_int)

Now, we can check the data type of the output, `mean_output`.

mean_output.dtype

Which tells us that the data type is `float64`. This is exactly the behavior we should expect. As I mentioned earlier, by default, NumPy produces output with the `float64` data type when the input values are integers.

##### Setting the data type with the dtype parameter

So now that we've looked at the default behavior, let's change it by explicitly setting the `dtype` parameter.

mean_output_alternate = np.mean(np_array_1d_int, dtype = 'float32')

The object `mean_output_alternate` contains the calculated mean, which is `5.1999998` (the closest `float32` value to 5.2; newer NumPy versions may simply display it as `5.2`). Now, let's check the data type of `mean_output_alternate`.

mean_output_alternate.dtype

When you run this, you can see that `mean_output_alternate` is of the `float32` data type. This is exactly what we'd expect, because we set `dtype = 'float32'`.

##### Be careful when you use the dtype parameter

As I mentioned earlier, you need to be careful when you use the `dtype` parameter. If you need the output of np.mean to have high precision, you need to be sure to select a data type with high precision. For example, if you need the result to have high precision, you might select `float64`. If you select a data type with low precision (like `int`), the result may be inaccurate or imprecise.
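To see how an integer `dtype` can bite, here is a small sketch using the same integer array as above. The true mean is 26 / 5 = 5.2, but an integer result type cannot hold the fractional part. The behavior shown is what I'd expect from the NumPy versions I'm familiar with; treat it as an illustration rather than a guarantee.

```python
import numpy as np

np_array_1d_int = np.array([1, 3, 4, 7, 11])

mean_float64 = np.mean(np_array_1d_int)             # default for integer input
mean_as_int = np.mean(np_array_1d_int, dtype=int)   # integer result type

print(mean_float64)   # the full-precision mean
print(mean_as_int)    # the fractional part is lost
```

Unless you have a specific reason to do otherwise, leaving `dtype` alone (or choosing `float64`) is the safer option.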

## If you want to learn data science in Python, learn NumPy

You've probably heard that 80% of data science work is just data manipulation. That's mostly true. If you want to be great at data science in Python, you need to know how to manipulate data in Python. And one of the primary toolkits for manipulating data in Python is the NumPy module.

In this post, I've shown you how to use the NumPy mean function, but we also have several other tutorials about other NumPy topics, like how to create a numpy array, how to reshape a numpy array, how to create an array with all zeros, and many more. If you're interested in learning NumPy, definitely check those out.

## For more Python data science tutorials, sign up for our email list

More broadly though, if you're interested in learning (and mastering) data science in Python, or data science generally, you should sign up for our email list right now. Here at the Sharp Sight blog, we regularly post tutorials about a variety of data science topics; in particular, about NumPy. If you want to learn NumPy and data science in Python, sign up for our email list. If you sign up for our email list, you'll receive Python data science tutorials delivered to your inbox. You'll get free tutorials on:

- NumPy
- Pandas
- Base Python
- Scikit learn
- Machine learning
- Deep learning
- … and more.
