API: how to create a "shallow copy" of a DataFrame?

How can you create a copy of a DataFrame without copying the actual data, such that the new DataFrame, when updated (not in place), does not modify the original ("shallow copy")? And how is this expected to behave?
I suppose that in technical terms, this would be a new BlockManager that references the same arrays?

I ran into the above questions and did not actually know a clear answer. The context was: I wanted to replace one column of a DataFrame without modifying the original one, and so was wondering whether I could do that without making a full copy of the DataFrame (in theory a full copy is not needed, as I just wanted to update one object column before serializing).

---

So you can do something like this with `copy(deep=False)`. Let's explore this somewhat:

Making a normal (deep) and shallow copy:

```
# assumes pandas has been imported as `pd`
In [1]: df = pd.DataFrame({'a': [1, 2, 3], 'b': [.1, .2, .3]})

In [2]: df_copy = df.copy() 

In [3]: df_shallow = df.copy(deep=False)
```
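To check the "same arrays" assumption directly, here is a minimal sketch that asks NumPy whether a column of each copy overlaps in memory with the original. The helper name `shares_column_memory` is made up for illustration. I would expect the shallow copy to report shared memory and the deep copy not to, but I have not verified this on every pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [0.1, 0.2, 0.3]})
df_copy = df.copy()                # deep copy (the default)
df_shallow = df.copy(deep=False)   # shallow copy


def shares_column_memory(left, right, column):
    """Hypothetical helper: True if `column` in both frames is backed by overlapping memory."""
    return np.shares_memory(left[column].to_numpy(), right[column].to_numpy())


shallow_shares = shares_column_memory(df, df_shallow, "a")
deep_shares = shares_column_memory(df, df_copy, "a")

print("shallow copy shares memory with df:", shallow_shares)
print("deep copy shares memory with df:", deep_shares)
```

This only inspects the underlying buffers; it does not say anything about what happens once one of the frames is written to.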

Modifying values in place works as expected: for the copy it does not change the original df, for the shallow copy it does:
```
In [4]: df_copy.iloc[0,0] = 10  

In [5]: df_shallow.iloc[1,0] = 20  

In [6]: df    
Out[6]: 
    a    b
0   1  0.1
1  20  0.2
2   3  0.3
```

Overwriting a full column, however, becomes more tricky (due to our BlockManager ...):

```
# this updates the original df
In [7]: df_shallow['a'] = [10, 20, 30] 

In [8]: df
Out[8]: 
    a    b
0  10  0.1
1  20  0.2
2  30  0.3

# this does not update the original
In [9]: df_shallow['b'] = [100, 200, 300]  

In [10]: df_shallow  
Out[10]: 
    a    b
0  10  100
1  20  200
2  30  300

In [11]: df  
Out[11]: 
    a    b
0  10  0.1
1  20  0.2
2  30  0.3
```

This is of course somewhat expected if you know the internals: if the new column has the same dtype, it seems to modify the array of the block in place, while if a new block has to be created (because the dtype changed on assignment), the reference to the old data is broken and the original DataFrame is not modified.
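Since this depends on internals, the result may well differ between pandas versions. A small sketch to check what a given installation does, rather than assuming it: the function `assignment_propagates` is a made-up helper that rebuilds the example frame, assigns a full column on a shallow copy, and reports whether the original changed.

```python
import pandas as pd


def assignment_propagates(new_values, column):
    """Hypothetical helper: does `shallow[column] = new_values` change the original?"""
    df = pd.DataFrame({"a": [1, 2, 3], "b": [0.1, 0.2, 0.3]})
    before = df.copy()                 # independent snapshot for comparison
    shallow = df.copy(deep=False)
    shallow[column] = new_values       # full-column assignment on the shallow copy
    return not df.equals(before)       # True if the original was modified


print("pandas version:", pd.__version__)
# int values into the int column 'a' (same dtype)
print("same dtype propagates:", assignment_propagates([10, 20, 30], "a"))
# int values into the float column 'b' (dtype changes)
print("different dtype propagates:", assignment_propagates([100, 200, 300], "b"))
```

With the version used in the session above, the first case propagates and the second does not; I would not rely on the first result staying the same elsewhere.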

While writing this down, I am realizing that my question is maybe more: *should assigning a column (`df['a'] = ..`) be seen as an in-place modification of your dataframe that has impact through shallow copies?*
Because in reality, `df['a'] = ..` cannot always happen in place (for example, if you are overwriting with a different dtype), this gives rather inconsistent and surprising behaviour depending on the dtypes.
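For the original use case (replace one column, leave the original untouched), one option that avoids in-place semantics altogether is `DataFrame.assign`, which returns a new DataFrame. A minimal sketch follows. Note that this does not necessarily avoid the copy: whether the untouched columns share memory with the original is an implementation detail that I am assuming may vary by pandas version, so the sketch only prints it instead of relying on it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [0.1, 0.2, 0.3]})

# assign returns a new DataFrame with column 'a' replaced; df itself is not touched
replaced = df.assign(a=[10, 20, 30])

print(df["a"].tolist())        # original values of 'a'
print(replaced["a"].tolist())  # replaced values of 'a'

# Whether the untouched column 'b' is shared or copied is version-dependent,
# so this is reported for information only.
b_shared = np.shares_memory(df["b"].to_numpy(), replaced["b"].to_numpy())
print("column 'b' shares memory with the original:", b_shared)
```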




API: how to create a "shallow copy" of a DataFrame? #29309

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

API: how to create a "shallow copy" of a DataFrame? #29309

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions