DataFrames Manipulations

In [1]:
import pandas as pd

Weather datasets

In [2]:
eu_weather_df = pd.DataFrame({
  "town": ["Atina", "Oslo", "London"],
  "temp":[35,21,25],
  "rain": [False, False, True ]
})
eu_weather_df
Out[2]:
rain temp town
0 False 35 Atina
1 False 21 Oslo
2 True 25 London
In [3]:
bg_weather_df = pd.DataFrame({
  "town": ["Sofia", "Sandanski", "Pleven"],
  "temp":[25,32,21],
  "rain": [False, False, True ]
})
bg_weather_df
Out[3]:
rain temp town
0 False 25 Sofia
1 False 32 Sandanski
2 True 21 Pleven

Re-arange columns

In [4]:
cols = ["town", "temp", "rain"]
In [5]:
eu_weather_df = eu_weather_df[cols]
eu_weather_df
Out[5]:
town temp rain
0 Atina 35 False
1 Oslo 21 False
2 London 25 True
In [6]:
bg_weather_df = bg_weather_df[cols]
bg_weather_df
Out[6]:
town temp rain
0 Sofia 25 False
1 Sandanski 32 False
2 Pleven 21 True

Add columns

The simplest, and most commmon way is with [] operator. But you can use .insert or .loc methods

In [7]:
eu_weather_df["wind"]= [1.5, 7.5, 4]
eu_weather_df
/data/projects/www/wwwcourses.github.io/ProgressBG-VC-Python/.venv/lib/python3.6/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.
Out[7]:
town temp rain wind
0 Atina 35 False 1.5
1 Oslo 21 False 7.5
2 London 25 True 4.0

Add columns from another dataframe

In [8]:
wind_df=pd.DataFrame([3.4, 2,6.5], columns=["wind"])
wind_df
Out[8]:
wind
0 3.4
1 2.0
2 6.5
In [9]:
bg_weather_df_copy = bg_weather_df.copy()
bg_weather_df_copy
Out[9]:
town temp rain
0 Sofia 25 False
1 Sandanski 32 False
2 Pleven 21 True
In [10]:
bg_weather_df_copy_conc = pd.concat([bg_weather_df_copy, wind_df],axis=1)
bg_weather_df_copy_conc
Out[10]:
town temp rain wind
0 Sofia 25 False 3.4
1 Sandanski 32 False 2.0
2 Pleven 21 True 6.5

Deleting columns

in place: with del or pop()

In [11]:
eu_weather_df

df_tmp = eu_weather_df.copy()
del df_tmp["wind"]
df_tmp
Out[11]:
town temp rain
0 Atina 35 False
1 Oslo 21 False
2 London 25 True

in new dataframe

In [12]:
df_tmp = eu_weather_df.drop(["wind","rain"], axis=1)
df_tmp
Out[12]:
town temp
0 Atina 35
1 Oslo 21
2 London 25

Append new rows

with append()

In [13]:
# keep indexes:
appended_weather = eu_weather_df.append(bg_weather_df)

#auto indexing
# appended_weather = eu_weather_df.append(bg_weather_df,ignore_index=True)

appended_weather
Out[13]:
rain temp town wind
0 False 35 Atina 1.5
1 False 21 Oslo 7.5
2 True 25 London 4.0
0 False 25 Sofia NaN
1 False 32 Sandanski NaN
2 True 21 Pleven NaN

with concat() - for rows, i.e. axis=0

In [14]:
# keep indexes
concat_weather = pd.concat([eu_weather_df, bg_weather_df])

# auto indexing:
# concat_weather = pd.concat([eu_weather_df, bg_weather_df], ignore_index=True)

# add keys for each DF
concat_weather = pd.concat([eu_weather_df, bg_weather_df],keys=["EU", "BG"])

# retrieve by index location:
# concat_weather.loc["BG"]

concat_weather
Out[14]:
rain temp town wind
EU 0 False 35 Atina 1.5
1 False 21 Oslo 7.5
2 True 25 London 4.0
BG 0 False 25 Sofia NaN
1 False 32 Sandanski NaN
2 True 21 Pleven NaN

Concat dataframe as new columns, i.e. axis=1

In [15]:
print(eu_weather_df)
print(bg_weather_df)
concat_weather_as_columns = pd.concat([eu_weather_df, bg_weather_df], axis=1)
concat_weather_as_columns
     town  temp   rain  wind
0   Atina    35  False   1.5
1    Oslo    21  False   7.5
2  London    25   True   4.0
        town  temp   rain
0      Sofia    25  False
1  Sandanski    32  False
2     Pleven    21   True
Out[15]:
town temp rain wind town temp rain
0 Atina 35 False 1.5 Sofia 25 False
1 Oslo 21 False 7.5 Sandanski 32 False
2 London 25 True 4.0 Pleven 21 True

DataFrame Object summary statistics

describe()

In [16]:
print(bg_weather_df)
bg_weather_df.describe()
        town  temp   rain
0      Sofia    25  False
1  Sandanski    32  False
2     Pleven    21   True
Out[16]:
temp
count 3.000000
mean 26.000000
std 5.567764
min 21.000000
25% 23.000000
50% 25.000000
75% 28.500000
max 32.000000

count(), max(), min(),mean(), std()

In [17]:
print(bg_weather_df)
print("***count:\n", bg_weather_df.count())
print("***min:\n", bg_weather_df.min())
print("***std:\n", bg_weather_df.std())
        town  temp   rain
0      Sofia    25  False
1  Sandanski    32  False
2     Pleven    21   True
***count:
 town    3
temp    3
rain    3
dtype: int64
***min:
 town    Pleven
temp        21
rain     False
dtype: object
***std:
 temp    5.567764
rain    0.577350
dtype: float64