DataFrames Manipulations¶

import pandas as pd

Weather datasets¶

eu_weather_df = pd.DataFrame({
  "town": ["Atina", "Oslo", "London"],
  "temp":[35,21,25],
  "rain": [False, False, True ]
})
eu_weather_df

bg_weather_df = pd.DataFrame({
  "town": ["Sofia", "Sandanski", "Pleven"],
  "temp":[25,32,21],
  "rain": [False, False, True ]
})
bg_weather_df

Re-arange columns¶

cols = ["town", "temp", "rain"]

eu_weather_df = eu_weather_df[cols]
eu_weather_df

bg_weather_df = bg_weather_df[cols]
bg_weather_df

Add columns¶

The simplest, and most commmon way is with [] operator. But you can use .insert or .loc methods

eu_weather_df["wind"]= [1.5, 7.5, 4]
eu_weather_df

/data/projects/www/wwwcourses.github.io/ProgressBG-VC-Python/.venv/lib/python3.6/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.

Add columns from another dataframe¶

wind_df=pd.DataFrame([3.4, 2,6.5], columns=["wind"])
wind_df

bg_weather_df_copy = bg_weather_df.copy()
bg_weather_df_copy

bg_weather_df_copy_conc = pd.concat([bg_weather_df_copy, wind_df],axis=1)
bg_weather_df_copy_conc

Deleting columns¶

in place: with del or pop()¶

eu_weather_df

df_tmp = eu_weather_df.copy()
del df_tmp["wind"]
df_tmp

in new dataframe¶

df_tmp = eu_weather_df.drop(["wind","rain"], axis=1)
df_tmp

Append new rows¶

with append()¶

# keep indexes:
appended_weather = eu_weather_df.append(bg_weather_df)

#auto indexing
# appended_weather = eu_weather_df.append(bg_weather_df,ignore_index=True)

appended_weather

with concat() - for rows, i.e. axis=0¶

# keep indexes
concat_weather = pd.concat([eu_weather_df, bg_weather_df])

# auto indexing:
# concat_weather = pd.concat([eu_weather_df, bg_weather_df], ignore_index=True)

# add keys for each DF
concat_weather = pd.concat([eu_weather_df, bg_weather_df],keys=["EU", "BG"])

# retrieve by index location:
# concat_weather.loc["BG"]

concat_weather

Concat dataframe as new columns, i.e. axis=1¶

print(eu_weather_df)
print(bg_weather_df)
concat_weather_as_columns = pd.concat([eu_weather_df, bg_weather_df], axis=1)
concat_weather_as_columns

     town  temp   rain  wind
0   Atina    35  False   1.5
1    Oslo    21  False   7.5
2  London    25   True   4.0
        town  temp   rain
0      Sofia    25  False
1  Sandanski    32  False
2     Pleven    21   True

DataFrame Object summary statistics¶

describe()¶

print(bg_weather_df)
bg_weather_df.describe()

        town  temp   rain
0      Sofia    25  False
1  Sandanski    32  False
2     Pleven    21   True

count(), max(), min(),mean(), std()¶

print(bg_weather_df)
print("***count:\n", bg_weather_df.count())
print("***min:\n", bg_weather_df.min())
print("***std:\n", bg_weather_df.std())

        town  temp   rain
0      Sofia    25  False
1  Sandanski    32  False
2     Pleven    21   True
***count:
 town    3
temp    3
rain    3
dtype: int64
***min:
 town    Pleven
temp        21
rain     False
dtype: object
***std:
 temp    5.567764
rain    0.577350
dtype: float64

	temp
count	3.000000
mean	26.000000
std	5.567764
min	21.000000
25%	23.000000
50%	25.000000
75%	28.500000
max	32.000000

	rain	temp	town	wind
0	False	35	Atina	1.5
1	False	21	Oslo	7.5
2	True	25	London	4.0
0	False	25	Sofia	NaN
1	False	32	Sandanski	NaN
2	True	21	Pleven	NaN

	wind
0	3.4
1	2.0
2	6.5