# Tips for Python: NumPy and Pandas Libraries

## Introduction

Saving time on cleaning, tidying, and processing data is valuable in data science: it leaves you more time to analyze the data and think about solutions.

When I do data science with Python, I usually use the Pandas and NumPy libraries. They are great libraries with a lot of smart functions. However, I sometimes forget that a function exists and write my own to do the calculation. That is good practice, but it costs time.

As a reference guide, I will write down some interesting built-in Pandas and NumPy functions to reinforce my memory, and maybe it will help someone else too.

## Pandas

### Function cumsum

Suppose we have a 10-day series of rainfall data:

    >>> import pandas as pd
    >>> df = pd.DataFrame({"DAY": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    ...                    "RAIN": [6, 0, 0, 20, 30, 40, 0, 0, 10, 15]})
    >>> df
       DAY  RAIN
    0    1     6
    1    2     0
    2    3     0
    3    4    20
    4    5    30
    5    6    40
    6    7     0
    7    8     0
    8    9    10
    9   10    15


For this data, we need to calculate the cumulative sum.

    >>> df['RAIN'].cumsum()
    0      6
    1      6
    2      6
    3     26
    4     56
    5     96
    6     96
    7     96
    8    106
    9    121
    Name: RAIN, dtype: int64
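
A small sketch of how the cumulative sum could be kept next to the original data. The column name `RAIN_ACC` is only an assumption made for this example; any name works.

```python
import pandas as pd

# Same rainfall data as in the example above.
df = pd.DataFrame({"DAY": list(range(1, 11)),
                   "RAIN": [6, 0, 0, 20, 30, 40, 0, 0, 10, 15]})

# RAIN_ACC is a hypothetical column name chosen for this sketch.
df["RAIN_ACC"] = df["RAIN"].cumsum()

# The last cumulative value equals the total rainfall of the period.
total = df["RAIN"].sum()
print(df["RAIN_ACC"].iloc[-1] == total)  # True
```

Storing the result as a column keeps each day's rainfall and the running total in the same row, which is handy for plotting or filtering later.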


### Function diff

Now we want to calculate the difference in rainfall between consecutive days.

    >>> df['RAIN'].diff()
    0     NaN
    1    -6.0
    2     0.0
    3    20.0
    4    10.0
    5    10.0
    6   -40.0
    7     0.0
    8    10.0
    9     5.0
    Name: RAIN, dtype: float64


Note the NaN value in the first row: the first day has no previous day to compare with. If you want to remove it, just use `df['RAIN'].diff().iloc[1:]` or `df['RAIN'].diff().dropna()`.
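
As a quick sketch of two variations on `diff`: dropping the leading NaN with `dropna()`, and comparing each day with the value two rows earlier through the `periods` parameter. The variable names are my own, chosen only for illustration.

```python
import pandas as pd

rain = pd.Series([6, 0, 0, 20, 30, 40, 0, 0, 10, 15], name="RAIN")

# Day-to-day change with the leading NaN removed.
daily_change = rain.diff().dropna()

# Change relative to the value two rows earlier; the first two rows are NaN.
two_day_change = rain.diff(periods=2)

print(daily_change.tolist())
print(two_day_change.tolist())
```

Note that `dropna()` keeps the original index labels, so `daily_change` starts at index 1.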

### Rolling function

One way to compute a value over a window of rows in a column is to write an explicit for loop:

    >>> for i in range(1, len(df)):
    ...     cc = df['RAIN'].iloc[i-1:i+1].mean()
    ...     print(cc)
    ...
    3.0
    0.0
    10.0
    25.0
    35.0
    20.0
    0.0
    5.0
    12.5


Instead of a loop, you can use the rolling method, which provides rolling window calculations. For each row, it takes a window of the given size that ends at that row; e.g. with a window size of 2, the calculation for row 6 uses the values from rows 5 and 6. The first row has no complete window, so its result is NaN.

    >>> df['RAIN'].rolling(2).mean()
    0     NaN
    1     3.0
    2     0.0
    3    10.0
    4    25.0
    5    35.0
    6    20.0
    7     0.0
    8     5.0
    9    12.5
    Name: RAIN, dtype: float64


You can replace mean() with sum() or another aggregation function.
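
For instance, a two-day rolling total could look like the sketch below. It also uses the `min_periods` parameter of `rolling`, which lets an incomplete window (here, the first row) return a result instead of NaN. The variable names are only illustrative.

```python
import pandas as pd

rain = pd.Series([6, 0, 0, 20, 30, 40, 0, 0, 10, 15], name="RAIN")

# Two-day rolling total; the first row is NaN because its window is incomplete.
two_day_sum = rain.rolling(2).sum()

# min_periods=1 allows a window with a single value to produce a result.
two_day_sum_filled = rain.rolling(2, min_periods=1).sum()

print(two_day_sum)
print(two_day_sum_filled)
```

Whether a partial window is meaningful depends on the analysis, so `min_periods` is a choice to make per case rather than a default to always change.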

## NumPy

### Create an empty array

Suppose you want to create an empty array with 3 rows and 5 columns. np.empty allocates the array without initializing its values, so we fill it with NaN:

    >>> import numpy as np
    >>> arr = np.empty(shape=(3, 5))
    >>> arr[:] = np.nan
    >>> arr
    array([[nan, nan, nan, nan, nan],
           [nan, nan, nan, nan, nan],
           [nan, nan, nan, nan, nan]])
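
As an alternative sketch, `np.full` creates the array and fills it with NaN in one step. This is a swapped-in shortcut, not the approach shown above.

```python
import numpy as np

# Create a 3x5 float array where every element is NaN.
arr = np.full((3, 5), np.nan)

print(arr.shape)            # (3, 5)
print(np.isnan(arr).all())  # True
```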


Now we will fill each row with random integers (the values are random, so your output will differ):

    >>> arr[0] = np.random.randint(0, 20, 5)
    >>> arr[1] = np.random.randint(0, 20, 5)
    >>> arr[2] = np.random.randint(0, 20, 5)
    >>> arr
    array([[ 7.,  8.,  5.,  5., 17.],
           [16., 14., 15., 19., 16.],
           [15.,  6.,  9.,  6.,  5.]])
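
If you want the same random values on every run, one possible sketch uses NumPy's seeded `Generator` and draws the whole array in a single call. The seed value 42 is an arbitrary assumption for this example.

```python
import numpy as np

# Seeded generator so the result is reproducible; the seed 42 is arbitrary.
rng = np.random.default_rng(42)

# Draw all 15 integers in [0, 20) at once and convert them to float,
# matching the float array used in the text.
arr = rng.integers(0, 20, size=(3, 5)).astype(float)

print(arr.shape)  # (3, 5)
```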


To get the average of each column (axis=0) or of each row (axis=1):

    >>> arr.mean(axis=0)
    array([12.66666667,  9.33333333,  9.66666667, 10.        , 12.66666667])
    >>> arr.mean(axis=1)
    array([ 8.4, 16. ,  8.2])
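
To make the meaning of `axis` easier to remember, here is a small sketch that rebuilds the array from the output above with fixed values and checks the shape of each result. The names `col_means` and `row_means` are only illustrative.

```python
import numpy as np

# Fixed copy of the example array shown above.
arr = np.array([[7., 8., 5., 5., 17.],
                [16., 14., 15., 19., 16.],
                [15., 6., 9., 6., 5.]])

# axis=0 collapses the rows: one mean per column.
col_means = arr.mean(axis=0)

# axis=1 collapses the columns: one mean per row.
row_means = arr.mean(axis=1)

print(col_means.shape)  # (5,)
print(row_means.shape)  # (3,)
```

A useful rule of thumb: the axis you pass is the one that disappears from the result's shape.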


https://pandas.pydata.org

https://www.python.org/