Conditional in Python: map vs loop

2021-12-02

python / tips / programming / map / loop / data science / R

I often need to classify data. A few weeks ago, I had a huge file with dates and precipitation, and one of the tasks was to classify each day into a ten-day period. Each month was divided into three sections as follows:

from day 01 up to 10 is *ten-day 01*
from day 11 up to 20 is *ten-day 02*
from day 21 up to 31 is *ten-day 03*

Most of the time I use loops in Python, but this time the process was a little slow. To speed it up, I tried the map() function.

According to the Real Python article Python’s map(): Processing Iterables Without a Loop, the function map() has two advantages:

Since map() is written in C and is highly optimized, its internal implied loop can be more efficient than a regular Python for loop. This is one advantage of using map().

A second advantage of using map() is related to memory consumption. With a for loop, you need to store the whole list in your system’s memory. With map(), you get items on demand, and only one item is in your system’s memory at a given time.
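
The on-demand behaviour described in the quote can be seen in a tiny sketch. It assumes Python 3, where map() returns an iterator; the helper `square` and the sample list are made up for illustration only.

```python
def square(x):
    """Hypothetical helper used only for this illustration."""
    return x * x


numbers = [1, 2, 3, 4]

# In Python 3, map() returns a lazy iterator: nothing is computed yet.
lazy = map(square, numbers)

# Items are produced one at a time, on demand.
first = next(lazy)   # computes square(1) only
second = next(lazy)  # computes square(2) only

# list() consumes whatever is left and stores all of it in memory.
rest = list(lazy)

print(first, second, rest)
```

Note that wrapping map() in list(), as the benchmark below does, materializes every result at once, so the memory advantage only applies when the items are consumed one by one.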

So, let’s try!

Hands On!

Load packages and create a data frame:

import pandas as pd
import numpy as np
import time

df_prob = pd.DataFrame({'day': np.random.uniform(1, 31, 50000).round(0)})

df_prob.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   day     50000 non-null  float64
dtypes: float64(1)
memory usage: 390.8 KB

Now, comparing Loop vs map()

# loop
start_time = time.time()
ls_desc = []
for idx in range(len(df_prob)):
    if df_prob['day'].iloc[idx] <= 10:
        dd = '01'
    elif df_prob['day'].iloc[idx] > 10 and \
            df_prob['day'].iloc[idx] <= 20:
        dd = '02'
    elif df_prob['day'].iloc[idx] > 20 and \
            df_prob['day'].iloc[idx] <= 31:
        dd = '03'
    else:
        dd = '-99'
    ls_desc.append(dd)
print("loop --- %s seconds ---" % (time.time() - start_time))

# map
start_time = time.time()
ls_desc = list(map(lambda x: '01' if x <= 10 else
                   ('02' if x > 10 and x <= 20 else(
                       '03' if x > 20 and x <= 31 else '-99')),
                   df_prob['day'].values))
print("map --- %s seconds ---" % (time.time() - start_time))

The results:

>>> loop --- 0.8795549869537354 seconds ---
>>> map --- 0.022134780883789062 seconds ---

The map() function was faster than the loop. For a small dataset like this one, maybe it does not matter. But for a large file …

Notice that a nested if / else expression is needed to do this inside the lambda passed to map().
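
A named function is one way around the nesting, since map() accepts any callable and a regular function can use plain if / elif. This is only a sketch: the name `ten_day` and the sample days are my own, and it was not part of the timing test above.

```python
def ten_day(day):
    """Return the ten-day label for a day of the month (hypothetical helper)."""
    if day <= 10:
        return '01'
    elif day <= 20:   # reached only when day > 10
        return '02'
    elif day <= 31:   # reached only when day > 20
        return '03'
    return '-99'      # anything above 31 (or NaN)


# Sample values covering each boundary, plus one invalid day.
days = [1.0, 10.0, 11.0, 20.0, 21.0, 31.0, 45.0]
labels = list(map(ten_day, days))
print(labels)
```

Because each elif is only reached when the previous test failed, the lower bounds (> 10, > 20) do not need to be repeated.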

Extra test

After all that, I thought about the R statistical software, because I use R for several analyses. Actually, R is my favorite choice for data analysis.

How did R (base) do:

df_prob = data.frame('day' = round(runif(50000,  1, 31)))

start_time = Sys.time()
ls_desc <- ifelse(
    df_prob$day <= 10, '01',
           ifelse(df_prob$day > 10 & df_prob$day <= 20, '02',
           ifelse(df_prob$day > 20 & df_prob$day <= 31, '03', '-99')))
final_time <- (Sys.time() - start_time)
print(paste(" --- %s seconds ---", final_time))

> > [1] "loop --- %s seconds --- 0.0353324413299561"

R was about as fast as the map() function in Python (ok, a little bit slower). Besides, the two have a similar style.
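
For comparison, the closest Python counterpart to R's vectorized ifelse is probably NumPy's np.select, which evaluates whole arrays at once and picks the first matching condition. The sketch below is an assumption on my side and was not timed in this test; the function name `ten_day_vectorized` is hypothetical.

```python
import numpy as np
import pandas as pd


def ten_day_vectorized(day):
    """Label an array of days with np.select (hypothetical helper)."""
    day = np.asarray(day)
    # Conditions are checked in order; the first True one wins,
    # so the lower bounds do not need to be repeated.
    conditions = [day <= 10, day <= 20, day <= 31]
    choices = ['01', '02', '03']
    return np.select(conditions, choices, default='-99')


# Same shape of data as in the test above: 50000 random days.
df_prob = pd.DataFrame({'day': np.random.uniform(1, 31, 50000).round(0)})
df_prob['ten_day'] = ten_day_vectorized(df_prob['day'].to_numpy())
print(df_prob['ten_day'].value_counts())
```

Vectorized operations like this usually avoid the per-element Python overhead, but the actual gain would need to be measured on the same machine.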

Hardware and software

The test was performed on a workstation:


rafatieppo@rt-av52a:~/Dropbox/emacs_dot$ screenfetch 
         _,met$$$$$gg.           rafatieppo@rt-av52a
      ,g$$$$$$$$$$$$$$$P.        OS: Debian 10 buster
    ,g$$P""       """Y$$.".      Kernel: x86_64 Linux 4.19.0-16-amd64
   ,$$P'              `$$$.      Uptime: 1h 13m
  ',$$P       ,ggs.     `$$b:    Packages: 2699
  `d$$'     ,$P"'   .    $$$     Shell: bash 5.0.3
   $$P      d$'     ,    $$P     Resolution: 3840x1080
   $$:      $$.   -    ,d$$'     DE: MATE 1.20.2
   $$\;      Y$b._   _,d$P'      WM: Metacity (Marco)
   Y$$.    `.`"Y$$$$P"'          GTK Theme: 'Adapta' [GTK2/3]
   `$$b      "-.__               Icon Theme: Numix
    `Y$$                         Font: Monaco 14
     `Y$$.                       CPU: Intel Core i5-9300H @ 8x 4.1GHz [44.0°C]
       `$$b.                     GPU: GeForce GTX 1050
         `Y$$b.                  RAM: 1862MiB / 15901MiB
            `"Y$b._             
                `""""

References

https://www.python.org/

https://cran.r-project.org/

https://realpython.com/python-map-function/
