I am an enthusiast programmer, not a professional. I want to get better, so I sometimes write notes to improve my English and my programming skills. You will certainly find some mistakes in the text and in the programming concepts in this post. I am sorry, I am trying.
Let's get to what matters now!
The word “change” is ambiguous in Python: there are two distinct types of change.
There is change through an assignment statement, which makes a name point to a different object, and change through a mutation, which modifies the object itself.
Let’s say we have a variable x pointing to the value 7.
>>> x = 7
>>> id(x)
9062816
If we point x to a new object (e.g. a list), its id will change (the exact numbers will differ on your machine):
>>> x = [7, 8, 9]
>>> id(x)
140573420844168
If we assign x to y, no copy is made; y simply points to the same object as x:
>>> y = x
>>> id(y)
140573420844168
If we change an item of x, the change also shows up in y, because both names point to the same list:
>>> x[2] = 51
>>> y
[7, 8, 51]
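As a small sketch of the difference between the two kinds of change (the names a and b are only illustrative), the `is` operator can confirm whether two names point to the same object. Mutating the shared list is visible through both names, while assigning a new list to one name leaves the other name untouched.

```python
a = [7, 8, 9]
b = a                      # assignment: b now points to the same list as a
same_before = a is b       # True, one object with two names

a[2] = 51                  # mutation: changes the one shared list
after_mutation = list(b)   # snapshot of what b shows after the mutation

a = [1, 2, 3]              # assignment: a now points to a brand-new list
same_after = a is b        # False, b still points to the old list

print(same_before, after_mutation, same_after, b)
# True [7, 8, 51] False [7, 8, 51]
```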
The same happens with a pandas DataFrame:
>>> import pandas as pd
>>> df_a = pd.DataFrame({'NAME': ['Joe', 'Mary', 'Paul'],
...                      'AGE': [25, 35, 46]})
>>> df_a
   NAME  AGE
0   Joe   25
1  Mary   35
2  Paul   46
If we assign df_a to df_b:
>>> df_b = df_a
>>> df_b
   NAME  AGE
0   Joe   25
1  Mary   35
2  Paul   46
Now if we change an item in df_a, we see the same change in df_b:
>>> df_a.iloc[2,1] = 99
>>> df_a
   NAME  AGE
0   Joe   25
1  Mary   35
2  Paul   99
>>> df_b
   NAME  AGE
0   Joe   25
1  Mary   35
2  Paul   99
To avoid that in pandas, we need to use the copy() method, which returns a new, independent DataFrame:
>>> df_a = pd.DataFrame({'NAME': ['Joe', 'Mary', 'Paul'],
...                      'AGE': [25, 35, 46]})
>>> df_b = df_a.copy()
>>> df_a
   NAME  AGE
0   Joe   25
1  Mary   35
2  Paul   46
>>> df_b
   NAME  AGE
0   Joe   25
1  Mary   35
2  Paul   46
>>> df_a.iloc[2,1] = 99
>>> df_a
   NAME  AGE
0   Joe   25
1  Mary   35
2  Paul   99
>>> df_b
   NAME  AGE
0   Joe   25
1  Mary   35
2  Paul   46
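To put the two cases side by side, here is a minimal sketch, assuming pandas is installed. The names df_alias and df_copy are only illustrative. It checks with `is` whether each name points to the original DataFrame, and then reads the changed cell through each name.

```python
import pandas as pd

df_a = pd.DataFrame({'NAME': ['Joe', 'Mary', 'Paul'],
                     'AGE': [25, 35, 46]})
df_alias = df_a          # plain assignment: same object, nothing is copied
df_copy = df_a.copy()    # independent copy (deep=True is the default)

df_a.iloc[2, 1] = 99     # mutate the original DataFrame

alias_is_same = df_alias is df_a        # the alias is the very same object
copy_is_same = df_copy is df_a          # the copy is a different object
alias_age = int(df_alias.iloc[2, 1])    # the alias sees the new value
copy_age = int(df_copy.iloc[2, 1])      # the copy keeps the old value

print(alias_is_same, copy_is_same, alias_age, copy_age)
# True False 99 46
```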
That is it. Be careful with other mutable objects too (arrays, lists, etc.).
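For plain lists, a rough sketch of the same idea could look like the code below. It uses only the standard library, and the variable names are made up for the example. One extra thing to watch: list.copy() only copies the outer list, so nested lists are still shared, while copy.deepcopy() copies everything.

```python
import copy

nested = [[1, 2], [3, 4]]
alias = nested                  # same object, no copy at all
shallow = nested.copy()         # new outer list, inner lists are still shared
deep = copy.deepcopy(nested)    # fully independent copy

nested[0][0] = 99               # mutate an inner list
nested.append([5, 6])           # mutate the outer list

print(alias)    # [[99, 2], [3, 4], [5, 6]]
print(shallow)  # [[99, 2], [3, 4]]
print(deep)     # [[1, 2], [3, 4]]
```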
References
https://www.pythonmorsels.com/topics/2-types-change/