Pandas

Jérémie Decock (www.jdhp.org)

Import directives

In [1]:
%matplotlib inline
#%matplotlib notebook

from IPython.display import display

import matplotlib
matplotlib.rcParams['figure.figsize'] = (9, 9)

import pandas as pd
import numpy as np

Make data

Series (1D data)

With automatic indices

In [2]:
data_list = [1, 3, np.nan, 7]
serie = pd.Series(data_list)
serie
Out[2]:
0    1.0
1    3.0
2    NaN
3    7.0
dtype: float64
In [3]:
data_array = np.array(data_list)
serie = pd.Series(data_array)
serie
Out[3]:
0    1.0
1    3.0
2    NaN
3    7.0
dtype: float64
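Note that the presence of NaN forces the dtype to float64, since NaN is a floating-point value; a quick sketch of this behavior:

```python
import numpy as np
import pandas as pd

s_int = pd.Series([1, 3, 7])          # no NaN -> dtype int64
s_float = pd.Series([1, 3, np.nan])   # NaN forces dtype float64

print(s_int.dtype, s_float.dtype)
```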

With defined indices

In [4]:
indices = pd.Series([1, 3, 5, 7])
serie = pd.Series([10, 30, 50, 70], index=indices)
serie
Out[4]:
1    10
3    30
5    50
7    70
dtype: int64
In [5]:
indices = pd.Series(['A', 'B', 'C', 'D'])
serie = pd.Series([10, 30, 50, 70], index=indices)
serie
Out[5]:
A    10
B    30
C    50
D    70
dtype: int64

DataFrames (2D data)

With automatic indices and columns

In [6]:
data_array = np.array([[1, 2, 3], [4, 5, 6]])
df = pd.DataFrame(data_array)
df
Out[6]:
0 1 2
0 1 2 3
1 4 5 6

With defined indices and columns

Using lists:

In [7]:
data_array = np.array([[1, 2, 3], [4, 5, 6]])
df = pd.DataFrame(data_array, index=[10, 20], columns=[100, 200, 300])
df
Out[7]:
100 200 300
10 1 2 3
20 4 5 6

Using numpy arrays:

In [8]:
data_array = np.array([[1, 2, 3], [4, 5, 6]])
index_array = np.array([10, 20])
columns_array = np.array([100, 200, 300])
df = pd.DataFrame(data_array, index=index_array, columns=columns_array)
df
Out[8]:
100 200 300
10 1 2 3
20 4 5 6

Using Series:

In [9]:
data_array = np.array([[1, 2, 3], [4, 5, 6]])
index_series = pd.Series([10, 20])
columns_series = pd.Series([100, 200, 300])
df = pd.DataFrame(data_array, index=index_series, columns=columns_series)
df
Out[9]:
100 200 300
10 1 2 3
20 4 5 6

With columns from dict

Dictionary keys define the column labels.

In [10]:
data_dict = {'A': 'foo',
             'B': [10, 20, 30],
             'C': 3}
df = pd.DataFrame(data_dict)
df
Out[10]:
A B C
0 foo 10 3
1 foo 20 3
2 foo 30 3

To define the index as well:

In [11]:
data_dict = {'A': 'foo',
             'B': [10, 20, 30],
             'C': 3}
df = pd.DataFrame(data_dict, index=[10, 20, 30])
df
Out[11]:
A B C
10 foo 10 3
20 foo 20 3
30 foo 30 3

Panels (3D data)

Panels are deprecated.

Pandas now focuses on 1D (Series) and 2D (DataFrame) data structures.

The recommended alternative for working with 3-dimensional data is the xarray Python library.

Another workaround: a MultiIndex DataFrame makes it easy to work with higher-dimensional data.

See http://pandas.pydata.org/pandas-docs/stable/dsintro.html#deprecate-panel.
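As a minimal illustration of the MultiIndex workaround (the "panel" labels below are made up), a 3D dataset can be stored in a DataFrame whose index has two levels:

```python
import numpy as np
import pandas as pd

# Hypothetical 3D data: 2 "panels" x 2 rows x 3 columns
index = pd.MultiIndex.from_product([['panel1', 'panel2'], [10, 20]],
                                   names=['panel', 'row'])
df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  index=index,
                  columns=[100, 200, 300])

# Selecting one "panel" yields a regular 2D DataFrame
sub_df = df.loc['panel1']
```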

Panel4D and PanelND (ND data)

Panel4D and PanelND are deprecated.

Pandas now focuses on 1D (Series) and 2D (DataFrame) data structures.

The recommended alternative for working with n-dimensional data is the xarray Python library.

Another workaround: a MultiIndex DataFrame makes it easy to work with higher-dimensional data.

See http://pandas.pydata.org/pandas-docs/stable/dsintro.html#panel4d-and-panelnd-deprecated.

Export/import data (write/read files)

Reader functions are accessible from the top-level pd object.

Writer functions are accessible from data objects (i.e. Series or DataFrame objects).

In [12]:
data_array = np.array([[1, 2, 3], [4, 5, 6]])
df = pd.DataFrame(data_array, index=[10, 20], columns=[100, 200, 300])
df
Out[12]:
100 200 300
10 1 2 3
20 4 5 6

CSV files

Write CSV files

Simplest version:

In [13]:
df.to_csv(path_or_buf="python_pandas_io_test.csv")
In [14]:
!cat python_pandas_io_test.csv
,100,200,300
10,1,2,3
20,4,5,6

Setting more options:

In [15]:
# FYI, many other options are available
df.to_csv(path_or_buf="python_pandas_io_test.csv",
          sep=',',
          columns=None,
          header=True,
          index=True,
          index_label=None,
          compression=None,  # allowed values are 'gzip', 'bz2' or 'xz'
          date_format=None)
In [16]:
!cat python_pandas_io_test.csv
,100,200,300
10,1,2,3
20,4,5,6

Read CSV files

Simplest version:

In [17]:
df = pd.read_csv("python_pandas_io_test.csv")
df
Out[17]:
Unnamed: 0 100 200 300
0 10 1 2 3
1 20 4 5 6
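The unnamed first column above is the saved index, read back as ordinary data; passing index_col=0 restores it as the index instead. A minimal round-trip sketch (using an in-memory buffer rather than a file):

```python
import io
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=[10, 20], columns=[100, 200, 300])
buffer = io.StringIO(df.to_csv())  # to_csv() with no path returns the CSV as a string

# index_col=0 uses the first CSV column as the index instead of an 'Unnamed: 0' column
df2 = pd.read_csv(buffer, index_col=0)
```

Note that the column labels come back as strings ('100', '200', '300'), since CSV headers carry no type information.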

Setting more options:

In [18]:
df = pd.read_csv("python_pandas_io_test.csv",
                 sep=',',
                 delimiter=None,
                 header='infer',
                 names=None,
                 index_col=0,
                 usecols=None,
                 squeeze=False,
                 prefix=None,
                 mangle_dupe_cols=True,
                 dtype=None,
                 engine=None,
                 converters=None,
                 true_values=None,
                 false_values=None,
                 skipinitialspace=False,
                 skiprows=None,
                 nrows=None,
                 na_values=None,
                 keep_default_na=True,
                 na_filter=True,
                 verbose=False,
                 skip_blank_lines=True,
                 parse_dates=False,
                 infer_datetime_format=False,
                 keep_date_col=False,
                 date_parser=None,
                 dayfirst=False,
                 iterator=False,
                 chunksize=None,
                 compression='infer',
                 thousands=None,
                 decimal=b'.',
                 lineterminator=None,
                 quotechar='"',
                 quoting=0,
                 escapechar=None,
                 comment=None,
                 encoding=None,
                 dialect=None,
                 tupleize_cols=False,
                 error_bad_lines=True,
                 warn_bad_lines=True,
                 skipfooter=0,
                 doublequote=True,
                 delim_whitespace=False,
                 as_recarray=False,
                 compact_ints=False,
                 use_unsigned=False,
                 low_memory=True,
                 buffer_lines=None,
                 memory_map=False,
                 float_precision=None)
df
Out[18]:
100 200 300
10 1 2 3
20 4 5 6
In [19]:
!rm python_pandas_io_test.csv

JSON files

In [20]:
import io

Write JSON files

Simplest version
In [21]:
df.to_json(path_or_buf="python_pandas_io_test.json")
In [22]:
!cat python_pandas_io_test.json
{"100":{"10":1,"20":4},"200":{"10":2,"20":5},"300":{"10":3,"20":6}}
Setting orient="split"
In [23]:
df.to_json(path_or_buf="python_pandas_io_test_split.json",
           orient="split")
In [24]:
!cat python_pandas_io_test_split.json
{"columns":["100","200","300"],"index":[10,20],"data":[[1,2,3],[4,5,6]]}
Setting orient="records"
In [25]:
df.to_json(path_or_buf="python_pandas_io_test_records.json",
           orient="records")
In [26]:
!cat python_pandas_io_test_records.json
[{"100":1,"200":2,"300":3},{"100":4,"200":5,"300":6}]
Setting orient="index" (the default option for Series)
In [27]:
df.to_json(path_or_buf="python_pandas_io_test_index.json",
           orient="index")
In [28]:
!cat python_pandas_io_test_index.json
{"10":{"100":1,"200":2,"300":3},"20":{"100":4,"200":5,"300":6}}
Setting orient="columns" (the default for DataFrame; available for DataFrame only)
In [29]:
df.to_json(path_or_buf="python_pandas_io_test_columns.json",
           orient="columns")
In [30]:
!cat python_pandas_io_test_columns.json
{"100":{"10":1,"20":4},"200":{"10":2,"20":5},"300":{"10":3,"20":6}}
Setting orient="values" (for DataFrame only)
In [31]:
df.to_json(path_or_buf="python_pandas_io_test_values.json",
           orient="values")
In [32]:
!cat python_pandas_io_test_values.json
[[1,2,3],[4,5,6]]
Setting more options
In [33]:
# FYI, many other options are available
df.to_json(path_or_buf="python_pandas_io_test.json",
           orient='columns',     # For DataFrame: 'split','records','index','columns' or 'values'
           date_format=None,     # None, 'epoch' or 'iso'
           double_precision=10,
           force_ascii=True,
           date_unit='ms')
In [34]:
!cat python_pandas_io_test.json
{"100":{"10":1,"20":4},"200":{"10":2,"20":5},"300":{"10":3,"20":6}}

Read JSON files

Using orient="split"

Dict like data {index -> [index], columns -> [columns], data -> [values]}

In [35]:
!cat python_pandas_io_test_split.json
{"columns":["100","200","300"],"index":[10,20],"data":[[1,2,3],[4,5,6]]}
In [36]:
df = pd.read_json("python_pandas_io_test_split.json",
                  orient="split")
df
Out[36]:
100 200 300
10 1 2 3
20 4 5 6
Using orient="records"

List like [{column -> value}, ... , {column -> value}]

In [37]:
!cat python_pandas_io_test_records.json
[{"100":1,"200":2,"300":3},{"100":4,"200":5,"300":6}]
In [38]:
df = pd.read_json("python_pandas_io_test_records.json",
                  orient="records")
df
Out[38]:
100 200 300
0 1 2 3
1 4 5 6
Using orient="index"

Dict like {index -> {column -> value}}

In [39]:
!cat python_pandas_io_test_index.json
{"10":{"100":1,"200":2,"300":3},"20":{"100":4,"200":5,"300":6}}
In [40]:
df = pd.read_json("python_pandas_io_test_index.json",
                  orient="index")
df
Out[40]:
100 200 300
10 1 2 3
20 4 5 6
Using orient="columns"

Dict like {column -> {index -> value}}

In [41]:
!cat python_pandas_io_test_columns.json
{"100":{"10":1,"20":4},"200":{"10":2,"20":5},"300":{"10":3,"20":6}}
In [42]:
df = pd.read_json("python_pandas_io_test_columns.json",
                  orient="columns")
df
Out[42]:
100 200 300
10 1 2 3
20 4 5 6
Using orient="values" (for DataFrame only)

Just the values array

In [43]:
!cat python_pandas_io_test_values.json
[[1,2,3],[4,5,6]]
In [44]:
df = pd.read_json("python_pandas_io_test_values.json",
                  orient="values")
df
Out[44]:
0 1 2
0 1 2 3
1 4 5 6
Setting more options
In [45]:
df = pd.read_json("python_pandas_io_test.json",
                  orient=None,
                  typ='frame',
                  dtype=True,
                  convert_axes=True,
                  convert_dates=True,
                  keep_default_dates=True,
                  numpy=False,
                  precise_float=False,
                  date_unit=None,
                  encoding=None,
                  lines=False)
df
Out[45]:
100 200 300
10 1 2 3
20 4 5 6
In [46]:
!rm python_pandas_io_test*.json

Other file formats

Many other file formats can be used to import or export data with pandas.

See the following link for more information: http://pandas.pydata.org/pandas-docs/stable/io.html

Select columns

In [47]:
data_array = np.array([np.arange(1, 10, 1), np.arange(10, 100, 10), np.arange(100, 1000, 100)]).T
df = pd.DataFrame(data_array,
                  index=np.arange(1, 10, 1),
                  columns=['A', 'B', 'C'])
df
Out[47]:
A B C
1 1 10 100
2 2 20 200
3 3 30 300
4 4 40 400
5 5 50 500
6 6 60 600
7 7 70 700
8 8 80 800
9 9 90 900
In [48]:
df.B
Out[48]:
1    10
2    20
3    30
4    40
5    50
6    60
7    70
8    80
9    90
Name: B, dtype: int64
In [49]:
df["B"]
Out[49]:
1    10
2    20
3    30
4    40
5    50
6    60
7    70
8    80
9    90
Name: B, dtype: int64
In [50]:
df.loc[:,"B"]
Out[50]:
1    10
2    20
3    30
4    40
5    50
6    60
7    70
8    80
9    90
Name: B, dtype: int64
In [51]:
df.loc[:,['A','B']]
Out[51]:
A B
1 1 10
2 2 20
3 3 30
4 4 40
5 5 50
6 6 60
7 7 70
8 8 80
9 9 90

Select rows

In [52]:
data_array = np.array([np.arange(1, 10, 1), np.arange(10, 100, 10), np.arange(100, 1000, 100)]).T
df = pd.DataFrame(data_array,
                  index=np.arange(1, 10, 1),
                  columns=['A', 'B', 'C'])
df
Out[52]:
A B C
1 1 10 100
2 2 20 200
3 3 30 300
4 4 40 400
5 5 50 500
6 6 60 600
7 7 70 700
8 8 80 800
9 9 90 900
In [53]:
df.B < 50.
Out[53]:
1     True
2     True
3     True
4     True
5    False
6    False
7    False
8    False
9    False
Name: B, dtype: bool
In [54]:
df[df.B < 50.]
Out[54]:
A B C
1 1 10 100
2 2 20 200
3 3 30 300
4 4 40 400

Select over index: select the first 5 rows

In [55]:
df.iloc[:5]
Out[55]:
A B C
1 1 10 100
2 2 20 200
3 3 30 300
4 4 40 400
5 5 50 500
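Note that iloc slices by position and excludes the endpoint, while loc slices by index label and includes it; a short sketch on the same frame:

```python
import numpy as np
import pandas as pd

data_array = np.array([np.arange(1, 10, 1), np.arange(10, 100, 10), np.arange(100, 1000, 100)]).T
df = pd.DataFrame(data_array, index=np.arange(1, 10, 1), columns=['A', 'B', 'C'])

df_pos = df.iloc[:5]   # rows at positions 0..4 (index labels 1..5)
df_lab = df.loc[2:5]   # rows with index labels 2..5, endpoint included
```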

Select rows and columns

In [56]:
data_array = np.array([np.arange(1, 10, 1), np.arange(10, 100, 10), np.arange(100, 1000, 100)]).T
df = pd.DataFrame(data_array,
                  index=np.arange(1, 10, 1),
                  columns=['A', 'B', 'C'])
df
Out[56]:
A B C
1 1 10 100
2 2 20 200
3 3 30 300
4 4 40 400
5 5 50 500
6 6 60 600
7 7 70 700
8 8 80 800
9 9 90 900
In [57]:
df[df.B < 50][df.A >= 2].loc[:,['A','B']]
/Users/jdecock/anaconda/lib/python3.5/site-packages/ipykernel_launcher.py:1: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  """Entry point for launching an IPython kernel.
Out[57]:
A B
2 2 20
3 3 30
4 4 40
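The chained selection above triggers the UserWarning shown; the warning-free idiom combines both conditions into a single boolean mask inside one .loc call:

```python
import numpy as np
import pandas as pd

data_array = np.array([np.arange(1, 10, 1), np.arange(10, 100, 10), np.arange(100, 1000, 100)]).T
df = pd.DataFrame(data_array, index=np.arange(1, 10, 1), columns=['A', 'B', 'C'])

# Parentheses are required: & binds tighter than the comparison operators
result = df.loc[(df.B < 50) & (df.A >= 2), ['A', 'B']]
```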

Apply a function to selected column values

In [58]:
data_array = np.array([np.arange(1, 10, 1), np.arange(10, 100, 10), np.arange(100, 1000, 100)]).T
df = pd.DataFrame(data_array,
                  index=np.arange(1, 10, 1),
                  columns=['A', 'B', 'C'])
df
Out[58]:
A B C
1 1 10 100
2 2 20 200
3 3 30 300
4 4 40 400
5 5 50 500
6 6 60 600
7 7 70 700
8 8 80 800
9 9 90 900
In [59]:
df.B *= 2.
df
Out[59]:
A B C
1 1 20.0 100
2 2 40.0 200
3 3 60.0 300
4 4 80.0 400
5 5 100.0 500
6 6 120.0 600
7 7 140.0 700
8 8 160.0 800
9 9 180.0 900
In [60]:
df.B = pow(df.B, 2)
df
Out[60]:
A B C
1 1 400.0 100
2 2 1600.0 200
3 3 3600.0 300
4 4 6400.0 400
5 5 10000.0 500
6 6 14400.0 600
7 7 19600.0 700
8 8 25600.0 800
9 9 32400.0 900
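For transformations that have no vectorized form, Series.apply maps an arbitrary Python function over a column; a small sketch (the lambda below is purely illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# Apply an arbitrary function element-wise to column B
df['B'] = df['B'].apply(lambda v: v * 2 + 1)
```

Vectorized operations (as in the cells above) remain preferable when available, as apply falls back to a Python-level loop.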

Apply a function to selected row values

In [61]:
data_array = np.array([np.arange(1, 10, 1), np.arange(10, 100, 10), np.arange(100, 1000, 100)]).T
df = pd.DataFrame(data_array,
                  index=np.arange(1, 10, 1),
                  columns=['A', 'B', 'C'])
df
Out[61]:
A B C
1 1 10 100
2 2 20 200
3 3 30 300
4 4 40 400
5 5 50 500
6 6 60 600
7 7 70 700
8 8 80 800
9 9 90 900
In [62]:
df[df.B < 50.] *= -1.
df
Out[62]:
A B C
1 -1.0 -10.0 -100.0
2 -2.0 -20.0 -200.0
3 -3.0 -30.0 -300.0
4 -4.0 -40.0 -400.0
5 5.0 50.0 500.0
6 6.0 60.0 600.0
7 7.0 70.0 700.0
8 8.0 80.0 800.0
9 9.0 90.0 900.0
In [63]:
df[df.B < 50.] = pow(df[df.B < 50.], 2)
df
Out[63]:
A B C
1 1.0 100.0 10000.0
2 4.0 400.0 40000.0
3 9.0 900.0 90000.0
4 16.0 1600.0 160000.0
5 5.0 50.0 500.0
6 6.0 60.0 600.0
7 7.0 70.0 700.0
8 8.0 80.0 800.0
9 9.0 90.0 900.0
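The same row-wise update can be written with .loc and a boolean mask, which restricts the assignment to the matching rows and leaves the rest untouched:

```python
import numpy as np
import pandas as pd

data_array = np.array([np.arange(1, 10, 1), np.arange(10, 100, 10), np.arange(100, 1000, 100)]).T
df = pd.DataFrame(data_array, index=np.arange(1, 10, 1), columns=['A', 'B', 'C'])

# Negate only the rows where B < 50 (index labels 1 to 4)
df.loc[df.B < 50, :] *= -1
```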

Merge

In [64]:
a1 = np.array([np.arange(1, 5, 1), np.arange(10, 50, 10), np.arange(100, 500, 100)]).T
df1 = pd.DataFrame(a1,
                   columns=['ID', 'B', 'C'])

a2 = np.array([np.arange(1, 5, 1), np.arange(1000, 5000, 1000), np.arange(10000, 50000, 10000)]).T
df2 = pd.DataFrame(a2,
                   columns=['ID', 'B', 'C'])

display(df1)
display(df2)

df = pd.merge(df1, df2, on="ID", suffixes=('_1', '_2'))  #.dropna(how='any')

display(df)
ID B C
0 1 10 100
1 2 20 200
2 3 30 300
3 4 40 400
ID B C
0 1 1000 10000
1 2 2000 20000
2 3 3000 30000
3 4 4000 40000
ID B_1 C_1 B_2 C_2
0 1 10 100 1000 10000
1 2 20 200 2000 20000
2 3 30 300 3000 30000
3 4 40 400 4000 40000

Merge with NaN

In [65]:
a1 = np.array([np.arange(1, 5, 1), np.arange(10, 50, 10), np.arange(100, 500, 100)]).T
df1 = pd.DataFrame(a1,
                   columns=['ID', 'B', 'C'])

a2 = np.array([np.arange(1, 5, 1), np.arange(1000, 5000, 1000), np.arange(10000, 50000, 10000)]).T
df2 = pd.DataFrame(a2,
                   columns=['ID', 'B', 'C'])

df1.iloc[0,2] = np.nan
df1.iloc[1,1] = np.nan
df1.iloc[2,2] = np.nan
df1.iloc[3,1] = np.nan

df2.iloc[0,1] = np.nan
df2.iloc[1,2] = np.nan
df2.iloc[2,1] = np.nan
df2.iloc[3,2] = np.nan

df = pd.merge(df1, df2, on="ID", suffixes=('_1', '_2'))  #.dropna(how='any')

display(df1)
display(df2)
display(df)
ID B C
0 1 10.0 NaN
1 2 NaN 200.0
2 3 30.0 NaN
3 4 NaN 400.0
ID B C
0 1 NaN 10000.0
1 2 2000.0 NaN
2 3 NaN 30000.0
3 4 4000.0 NaN
ID B_1 C_1 B_2 C_2
0 1 10.0 NaN NaN 10000.0
1 2 NaN 200.0 2000.0 NaN
2 3 30.0 NaN NaN 30000.0
3 4 NaN 400.0 4000.0 NaN

Merge with missing rows

In [66]:
a1 = np.array([np.arange(1, 5, 1), np.arange(10, 50, 10), np.arange(100, 500, 100)]).T
df1 = pd.DataFrame(a1,
                   columns=['ID', 'B', 'C'])

a2 = np.array([np.arange(1, 3, 1), np.arange(1000, 3000, 1000), np.arange(10000, 30000, 10000)]).T
df2 = pd.DataFrame(a2,
                   columns=['ID', 'B', 'C'])

display(df1)
display(df2)

print("Left: use only keys from left frame (SQL: left outer join)")
df = pd.merge(df1, df2, on="ID", how="left", suffixes=('_1', '_2'))  #.dropna(how='any')
display(df)

print("Right: use only keys from right frame (SQL: right outer join)")
df = pd.merge(df1, df2, on="ID", how="right", suffixes=('_1', '_2'))  #.dropna(how='any')
display(df)

print("Inner: use intersection of keys from both frames (SQL: inner join) [DEFAULT]")
df = pd.merge(df1, df2, on="ID", how="inner", suffixes=('_1', '_2'))  #.dropna(how='any')
display(df)

print("Outer: use union of keys from both frames (SQL: full outer join)")
df = pd.merge(df1, df2, on="ID", how="outer", suffixes=('_1', '_2'))  #.dropna(how='any')
display(df)
ID B C
0 1 10 100
1 2 20 200
2 3 30 300
3 4 40 400
ID B C
0 1 1000 10000
1 2 2000 20000
Left: use only keys from left frame (SQL: left outer join)
ID B_1 C_1 B_2 C_2
0 1 10 100 1000.0 10000.0
1 2 20 200 2000.0 20000.0
2 3 30 300 NaN NaN
3 4 40 400 NaN NaN
Right: use only keys from right frame (SQL: right outer join)
ID B_1 C_1 B_2 C_2
0 1 10 100 1000 10000
1 2 20 200 2000 20000
Inner: use intersection of keys from both frames (SQL: inner join) [DEFAULT]
ID B_1 C_1 B_2 C_2
0 1 10 100 1000 10000
1 2 20 200 2000 20000
Outer: use union of keys from both frames (SQL: full outer join)
ID B_1 C_1 B_2 C_2
0 1 10 100 1000.0 10000.0
1 2 20 200 2000.0 20000.0
2 3 30 300 NaN NaN
3 4 40 400 NaN NaN
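When debugging an outer merge, the indicator=True option adds a _merge column recording which side each row came from ('left_only', 'right_only' or 'both'); a minimal sketch:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3, 4], 'B': [10, 20, 30, 40]})
df2 = pd.DataFrame({'ID': [1, 2], 'C': [1000, 2000]})

# _merge tells, for each row, whether the key was found in both frames or only one
df = pd.merge(df1, df2, on='ID', how='outer', indicator=True)
```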

GroupBy

In [67]:
a = np.array([[3, 5, 5, 5, 7, 7, 7, 7],
              [2, 4, 4, 3, 1, 3, 3, 2],
              [3, 4, 5, 6, 1, 8, 9, 8]]).T
df = pd.DataFrame(a,
                  columns=['A', 'B', 'C'])

df
Out[67]:
A B C
0 3 2 3
1 5 4 4
2 5 4 5
3 5 3 6
4 7 1 1
5 7 3 8
6 7 3 9
7 7 2 8

GroupBy with single key

In [68]:
df.groupby(["A"]).count()
Out[68]:
B C
A
3 1 1
5 3 3
7 4 4
In [69]:
df.groupby(["A"]).sum().B
Out[69]:
A
3     2
5    11
7     9
Name: B, dtype: int64
In [70]:
df.groupby(["A"]).mean().B
Out[70]:
A
3    2.000000
5    3.666667
7    2.250000
Name: B, dtype: float64

GroupBy with multiple keys

In [71]:
df.groupby(["A","B"]).count()
Out[71]:
C
A B
3 2 1
5 3 1
4 2
7 1 1
2 1
3 2
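Several aggregations can also be computed in a single pass with agg; a short sketch on the same frame:

```python
import numpy as np
import pandas as pd

a = np.array([[3, 5, 5, 5, 7, 7, 7, 7],
              [2, 4, 4, 3, 1, 3, 3, 2],
              [3, 4, 5, 6, 1, 8, 9, 8]]).T
df = pd.DataFrame(a, columns=['A', 'B', 'C'])

# One row per group, one column per aggregation function
stats = df.groupby('A').B.agg(['count', 'sum', 'mean'])
```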

Count the number of occurrences of each value in a column

In [72]:
df.A.value_counts()
Out[72]:
7    4
5    3
3    1
Name: A, dtype: int64
In [73]:
df.A.value_counts().plot.bar()
Out[73]:
<matplotlib.axes._subplots.AxesSubplot at 0x10ef8b940>

Count the number of NaN values in a column

In [74]:
a = np.array([[3, np.nan, 5, np.nan, 7, 7, 7, 7],
              [2, 4, 4, 3, 1, 3, 3, 2],
              [3, 4, 5, 6, 1, 8, 9, 8]]).T
df = pd.DataFrame(a,
                  columns=['A', 'B', 'C'])

df
Out[74]:
A B C
0 3.0 2.0 3.0
1 NaN 4.0 4.0
2 5.0 4.0 5.0
3 NaN 3.0 6.0
4 7.0 1.0 1.0
5 7.0 3.0 8.0
6 7.0 3.0 9.0
7 7.0 2.0 8.0
In [75]:
df.A.isnull().sum()
Out[75]:
2
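Beyond counting NaN in one column, isnull().sum() applied to the whole frame gives per-column counts, and dropna/fillna remove or replace the missing values; a minimal sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [3., np.nan, 5., np.nan],
                   'B': [2., 4., 4., 3.]})

nan_counts = df.isnull().sum()  # NaN count for each column
df_dropped = df.dropna()        # drop every row containing at least one NaN
df_filled = df.fillna(0.)       # replace each NaN with a constant
```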

Plot

In [76]:
#help(df.plot)

Line plot

In [77]:
x = np.arange(0, 6, 0.1)
y1 = np.cos(x)
y2 = np.sin(x)
Y = np.array([y1, y2]).T

df = pd.DataFrame(Y,
                  columns=['cos(x)', 'sin(x)'],
                  index=x)
df.iloc[:10]
Out[77]:
cos(x) sin(x)
0.0 1.000000 0.000000
0.1 0.995004 0.099833
0.2 0.980067 0.198669
0.3 0.955336 0.295520
0.4 0.921061 0.389418
0.5 0.877583 0.479426
0.6 0.825336 0.564642
0.7 0.764842 0.644218
0.8 0.696707 0.717356
0.9 0.621610 0.783327
In [78]:
df.plot(legend=True)
Out[78]:
<matplotlib.axes._subplots.AxesSubplot at 0x1115b21d0>

or

In [79]:
df.plot.line(legend=True)
Out[79]:
<matplotlib.axes._subplots.AxesSubplot at 0x111546f28>

Bar plot

In [80]:
x = np.arange(0, 6, 0.5)
y1 = np.cos(x)
y2 = np.sin(x)
Y = np.array([y1, y2]).T

df = pd.DataFrame(Y,
                  columns=['cos(x)', 'sin(x)'],
                  index=x)
df
Out[80]:
cos(x) sin(x)
0.0 1.000000 0.000000
0.5 0.877583 0.479426
1.0 0.540302 0.841471
1.5 0.070737 0.997495
2.0 -0.416147 0.909297
2.5 -0.801144 0.598472
3.0 -0.989992 0.141120
3.5 -0.936457 -0.350783
4.0 -0.653644 -0.756802
4.5 -0.210796 -0.977530
5.0 0.283662 -0.958924
5.5 0.708670 -0.705540

Vertical

In [81]:
df.plot.bar(legend=True)
Out[81]:
<matplotlib.axes._subplots.AxesSubplot at 0x11177ac88>
In [82]:
df.plot.bar(legend=True, stacked=True)
Out[82]:
<matplotlib.axes._subplots.AxesSubplot at 0x111b580f0>

Horizontal

In [83]:
df.plot.barh(legend=True)
Out[83]:
<matplotlib.axes._subplots.AxesSubplot at 0x111d10c18>

Histogram

In [84]:
x1 = np.random.normal(size=(10000))
x2 = np.random.normal(loc=3, scale=2, size=(10000))
X = np.array([x1, x2]).T

df = pd.DataFrame(X, columns=[r'$\mathcal{N}(0,1)$', r'$\mathcal{N}(3,2)$'])

df.plot.hist(alpha=0.2, bins=100, legend=True)
Out[84]:
<matplotlib.axes._subplots.AxesSubplot at 0x111f40da0>

Box plot

In [85]:
x1 = np.random.normal(size=(10000))
x2 = np.random.normal(loc=3, scale=2, size=(10000))
X = np.array([x1, x2]).T

df = pd.DataFrame(X, columns=[r'$\mathcal{N}(0,1)$', r'$\mathcal{N}(3,2)$'])

df.plot.box()
Out[85]:
<matplotlib.axes._subplots.AxesSubplot at 0x11288d2e8>

Hexbin plot

In [86]:
df = pd.DataFrame(np.random.randn(1000, 2), columns=['a', 'b'])
df['b'] = df['b'] + np.arange(1000)
df.plot.hexbin(x='a', y='b', gridsize=25)
Out[86]:
<matplotlib.axes._subplots.AxesSubplot at 0x1129e4518>

Kernel Density Estimation (KDE) plot

In [87]:
x1 = np.random.normal(size=(10000))
x2 = np.random.normal(loc=3, scale=2, size=(10000))
X = np.array([x1, x2]).T

df = pd.DataFrame(X, columns=[r'$\mathcal{N}(0,1)$', r'$\mathcal{N}(3,2)$'])

df.plot.kde()
Out[87]:
<matplotlib.axes._subplots.AxesSubplot at 0x11266e2e8>

Area plot

In [88]:
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])

df.plot.area()
Out[88]:
<matplotlib.axes._subplots.AxesSubplot at 0x1153eae80>

Pie chart

In [89]:
x = np.random.randint(low=0, high=6, size=(50))

df = pd.DataFrame(x, columns=["A"])
df.A.value_counts()
Out[89]:
3    11
0     9
5     8
4     8
1     8
2     6
Name: A, dtype: int64
In [90]:
df.A.value_counts().plot.pie(y="A")
Out[90]:
<matplotlib.axes._subplots.AxesSubplot at 0x114c86c88>

Scatter plot

In [91]:
x1 = np.random.normal(size=(10000))
x2 = np.random.normal(loc=3, scale=2, size=(10000))
X = np.array([x1, x2]).T

df = pd.DataFrame(X, columns=[r'$\mathcal{N}(0,1)$', r'$\mathcal{N}(3,2)$'])

df.plot.scatter(x=r'$\mathcal{N}(0,1)$',
                y=r'$\mathcal{N}(3,2)$',
                alpha=0.2)
Out[91]:
<matplotlib.axes._subplots.AxesSubplot at 0x115372668>