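Lesson 2: Plotting Time-Series Data¶

Contents¶

Introduction
Loading the data
Plotting the data
- Line graphs
- Scatter plots
- Polishing the figure
Looking up the details of a function
Further topics to study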

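Introduction¶

Now that you have a feel for plain Python, let's move straight on to data analysis. Data analysis takes many forms, but the basic flow is as follows.

1. Load the data into the program
2. Carry out the necessary processing and calculations on the loaded data
3. Output the results as text or graphs

In more complex cases, a step that converts the data into a form the program can read comes before step 1, or the calculations in step 2 become enormous. Here, as a first step, we learn the flow "load the data" → "draw a graph".

Python itself has no built-in plotting capability, so we rely on the matplotlib library. The statement below means "import the pyplot module inside matplotlib under the name plt".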

import matplotlib.pyplot as plt

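As shown here, a program that uses external functionality must import it first. In data analysis you will use libraries such as matplotlib.pyplot and NumPy almost every time, so it is a good idea to import them all together at the top of the program. NumPy is a very powerful library packed with functionality for all kinds of calculations; it is covered in detail from the next lesson onward. Here we use a small part of it to load large data sets.

import numpy as np # import numpy under the name np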

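Loading the data¶

Let's now load the data to be plotted into the program.

This time we use the following data, as one example that is easy to load and one that is more involved.

Number of typhoon landfalls per year from 2010 to 2021
- Data source: Japan Meteorological Agency (JMA) website
Daily air temperature, sea surface temperature, and wind speed observed in 2019 at the moored buoy "Kuroshio Extension Observatory (KEO)"
- A buoy moored at 32.3°N, 144.6°E (south of the Kuroshio Extension) that continuously collects meteorological and oceanographic data. Data source: NOAA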

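When there is little data: manual entry¶

When there are only a few values, you can simply type them in by hand. The following creates lists holding the number of typhoon landfalls per year from 2010 onward.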

num_typhoon = [2, 3, 2, 2, 4, 4, 6, 4, 5, 5, 0, 3]
year = list(range(2010, 2022)) 
print(num_typhoon)
print(year)

[2, 3, 2, 2, 4, 4, 6, 4, 5, 5, 0, 3]
[2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021]
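
When typing data by hand it is easy to drop or duplicate a value. The following is a minimal sketch of a sanity check, using the same two lists as above: it confirms that both lists have the same length and then prints each year next to its count with zip.

```python
num_typhoon = [2, 3, 2, 2, 4, 4, 6, 4, 5, 5, 0, 3]
year = list(range(2010, 2022))  # 2010, 2011, ..., 2021 (the end value 2022 is excluded)

# Both lists must have exactly one entry per year
assert len(num_typhoon) == len(year)

# zip pairs up the elements of the two lists in order
for y, n in zip(year, num_typhoon):
    print(y, n)
```

If the lengths differ, the assert line stops the program with an AssertionError, which points to a typing mistake before any graph is drawn.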

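When there is a lot of data: reading from a file¶

When there is a lot of data, on the other hand, typing it in by hand is not only tedious but also risks input errors. Since we are writing a program anyway, the smart approach is to let the computer do automatically whatever it can. Python itself and libraries such as NumPy and Pandas provide a wide range of loading functions, so let's use them to read the data from files.

First download the data from the links below and save it in a folder of your choice.

Daily air temperature observed at KEO in 2019 https://yasushifujiwara.github.io/dataanalysistutorial/data_KEO/KEOairt_2019daily.csv
Daily sea surface temperature observed at KEO in 2019 https://yasushifujiwara.github.io/dataanalysistutorial/data_KEO/KEOsst_2019daily.csv
Daily wind speed observed at KEO in 2019 https://yasushifujiwara.github.io/dataanalysistutorial/data_KEO/KEOwind_2019daily.csv

These files are in CSV (Comma-Separated Values) format. As the name suggests, a CSV file is a list of numbers separated by commas (,). In this case, however, each line holds only one number, so the contents look like this.

(Contents of KEOairt_2019daily.csv)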
15.1
15.8
13.0
13.8
17.3
...

Each line holds the air temperature for one day, and this continues for 365 lines. The other files have the same layout. Now let's load this file.

airt = np.loadtxt(r"C:\Users\yfuji\Desktop\lesson\data\data_KEO\KEOairt_2019daily.csv")

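We read the file with NumPy's loadtxt function (NumPy is currently imported under the name np), which loads numeric text files such as CSV, and gave the resulting array the name airt. Libraries have a hierarchical structure much like folders, so np.loadtxt means "the loadtxt function inside np".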
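
The files in this lesson hold one number per line, so np.loadtxt works with its default settings. A file with several comma-separated columns needs delimiter=",", because the default separator is whitespace. The sketch below illustrates both cases; io.StringIO stands in for a real file, and the two-column contents are a made-up example rather than one of the KEO files.

```python
from io import StringIO  # StringIO behaves like a file object held in memory

import numpy as np

# One value per line, like KEOairt_2019daily.csv (its first five values)
one_column = StringIO("15.1\n15.8\n13.0\n13.8\n17.3\n")
a = np.loadtxt(one_column)
print(a.shape)  # (5,)

# Hypothetical two-column file: comma-separated columns need delimiter=","
two_columns = StringIO("15.1,20.42\n15.8,20.29\n13.0,20.18\n")
b = np.loadtxt(two_columns, delimiter=",")
print(b.shape)  # (3, 2)
```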

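The string r"C:\Users\yfuji\Desktop\lesson\data\data_KEO\KEOairt_2019daily.csv" passed to np.loadtxt is the path that gives the location of the file to read. This is where the file sits on my Windows PC; rewrite it to match your own OS, user name, and file location. As a hint, the path of the current location (the folder containing this Jupyter notebook) can be obtained with the following command.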

%pwd

'C:\\Users\\yfuji\\Desktop\\lesson'

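On Windows, the folder hierarchy is written with the backslash \. Because the backslash is also used with a special meaning (escaping), it is shown doubled as \\ in the output above to avoid ambiguity. In the earlier example we put r in front of the string to tell Python that "within this string, \ has no other meaning and simply means \". On Mac and Linux the hierarchy is written with the forward slash /, so none of this is a concern.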
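
The following sketch compares the ways of writing the same Windows path. It uses pathlib.PureWindowsPath from the standard library so that it runs on any OS; the folder names are the ones from the example above and would need to be adapted to your own machine. Note that writing "C:\Users\..." in an ordinary (non-raw) string is a syntax error in Python 3, because \U starts a Unicode escape.

```python
from pathlib import PureWindowsPath

raw = r"C:\Users\yfuji\Desktop\lesson"         # raw string: backslashes are taken literally
escaped = "C:\\Users\\yfuji\\Desktop\\lesson"  # ordinary string: each backslash written twice
print(raw == escaped)  # True

# Forward slashes are also accepted when building a Windows path
p = PureWindowsPath("C:/Users/yfuji/Desktop/lesson")
print(str(p) == raw)  # True

# Build a file path from pieces with the / operator
csv_path = p / "data" / "data_KEO" / "KEOairt_2019daily.csv"
print(csv_path)
```

On a real machine, pathlib.Path can be used in the same way, and np.loadtxt accepts such a path object as its file name.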

So what does the loaded airt contain?

print(airt)

[15.1 15.8 13.  13.8 17.3 13.4 13.5 16.8 13.  15.8 15.  15.  14.1 12.8
 15.7 17.1 15.9 13.3 14.4 17.6 15.  13.9 15.6 13.7 13.6 12.2 10.7 14.4
 12.3 13.5 17.3 12.4 15.6 17.2 18.  15.9 17.7 17.9 15.2 16.1 12.8 15.9
 14.  15.3 12.4 15.6 16.4 13.3 14.3 17.3 18.5 14.8 15.5 13.9 14.2 17.2
 15.6 15.  18.6 15.6 14.9 16.3 18.2 14.6 17.  17.3 12.5 14.  17.2 18.7
 17.8 16.8 14.6 15.9 15.4 15.6 14.5 15.7 17.4 18.8 18.8 15.8 14.2 16.9
 16.9 16.6 16.8 16.2 17.2 17.7 14.8 12.8 12.  15.3 18.  18.4 18.7 16.3
 14.  15.5 15.8 15.  14.6 16.5 17.9 16.8 16.2 17.1 18.  17.2 18.2 19.
 18.3 18.7 19.7 19.8 17.9 15.8 16.8 17.7 20.5 18.7 17.8 17.6 18.6 18.5
 18.9 18.2 19.1 20.2 18.2 19.2 17.4 18.1 18.7 19.2 19.2 19.  19.1 20.
 20.4 20.8 20.4 20.3 20.7 20.9 21.4 21.7 21.2 20.9 21.4 20.8 20.3 21.1
 20.6 20.8 21.3 22.1 22.8 21.8 22.4 21.3 21.1 20.8 21.  22.7 23.2 22.4
 22.5 21.5 22.7 22.  23.8 24.2 24.  22.6 23.1 24.2 24.4 24.3 24.2 24.1
 24.2 24.6 25.  25.2 25.5 25.7 24.4 24.  24.1 23.8 24.9 25.1 25.8 26.
 26.  26.3 27.4 27.2 27.1 25.6 24.9 25.9 26.9 27.4 27.3 27.3 27.5 27.8
 27.9 28.1 28.  28.1 28.2 28.1 27.9 27.7 28.3 28.5 28.6 28.7 28.8 28.8
 28.8 28.7 28.6 28.5 28.4 28.6 28.7 28.4 29.1 29.3 29.3 29.2 27.8 27.6
 28.4 28.9 28.9 28.9 29.1 29.1 29.2 29.1 29.  29.  28.9 29.1 29.2 28.9
 28.7 29.  27.7 27.3 27.8 28.  28.7 28.5 28.4 27.2 25.9 27.2 28.1 28.1
 28.2 27.4 26.7 26.1 26.3 26.1 26.4 25.9 25.2 25.6 26.5 26.8 26.7 26.6
 26.9 25.9 27.  27.6 27.7 26.6 26.8 25.7 24.7 24.6 24.9 26.5 26.2 25.2
 26.7 24.2 23.5 25.8 25.5 23.6 23.6 25.5 24.2 24.  23.5 24.4 24.4 21.8
 20.6 21.1 23.6 22.6 21.2 20.8 23.2 22.3 22.2 23.4 22.2 21.7 20.9 22.8
 23.1 19.6 19.  20.5 21.1 22.9 23.1 20.8 22.6 20.  16.4 16.3 19.1 21.8
 19.8 19.  18.7 18.6 18.2 17.4 18.4 21.2 21.8 20.6 19.  19.5 17.3 17.3
 19.9 21.1 21.  19.8 18.9 20.  17.6 17.1 17.6 17.9 17.5 14.4 15.3 17.8
 19.2]

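As you can see, the contents of the file have been loaded as an array. Although it is not obvious from the printout, this array is not a list but a NumPy type called ndarray. An ndarray has extremely convenient features for handling large amounts of numbers, which we will look at in detail from the next lesson onward. For now, you can think of it as a list.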
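
A short sketch of how to tell the two types apart, and of what they have in common. The variable names airt_head and as_list are introduced here for illustration only; they hold the first five values of airt.

```python
import numpy as np

airt_head = np.array([15.1, 15.8, 13.0, 13.8, 17.3])  # first five values, as an ndarray
as_list = [15.1, 15.8, 13.0, 13.8, 17.3]              # the same values as a plain list

print(type(airt_head))  # <class 'numpy.ndarray'>
print(type(as_list))    # <class 'list'>

# Indexing and len() work the same way for both
print(airt_head[0], as_list[0])
print(len(airt_head), len(as_list))

# Printing differs: an ndarray is shown without commas between the values
print(airt_head)
print(as_list)
```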

In the same way, let's load the sea surface temperature as sst (SST = sea surface temperature) and the wind speed as wind.

sst = np.loadtxt(r"C:\Users\yfuji\Desktop\lesson\data\data_KEO\KEOsst_2019daily.csv")
wind = np.loadtxt(r"C:\Users\yfuji\Desktop\lesson\data\data_KEO\KEOwind_2019daily.csv")
print(sst)
print(wind)

[20.42 20.29 20.18 19.97 19.99 19.95 19.87 19.91 19.77 19.63 19.5  19.45
 19.3  19.05 19.12 19.15 18.71 18.52 18.49 18.68 18.76 18.61 18.69 18.78
 18.83 18.77 18.66 18.62 18.53 18.65 18.59 18.51 18.55 18.49 18.47 18.48
 18.45 18.52 18.41 18.49 18.55 18.55 18.43 18.41 18.33 18.22 18.25 18.18
 18.26 18.18 18.28 18.1  18.14 18.09 18.17 17.98 18.25 18.34 18.17 18.24
 18.18 18.14 18.11 18.14 18.16 18.26 18.2  18.19 18.08 18.11 18.15 18.45
 18.43 18.42 18.41 18.43 18.46 18.54 18.54 18.72 19.08 19.01 18.79 18.85
 18.92 18.68 18.54 18.52 18.5  18.5  18.29 18.12 18.05 18.13 18.29 18.52
 18.76 18.74 18.62 18.57 18.55 18.54 18.48 18.49 18.43 18.4  18.35 18.33
 18.62 18.49 18.57 19.   18.93 18.63 18.86 19.1  19.57 19.32 19.39 18.67
 18.77 18.78 18.73 18.85 18.9  18.94 19.41 19.41 19.4  20.5  19.81 19.9
 19.64 19.49 19.4  19.86 20.31 20.25 19.87 19.89 19.97 20.37 20.87 20.69
 21.2  21.64 21.83 21.82 21.07 21.12 21.37 21.61 21.66 21.37 21.4  21.5
 21.99 21.86 21.43 21.13 21.04 20.98 21.14 21.4  21.33 21.31 21.5  21.69
 22.01 21.81 21.68 22.02 22.81 23.24 23.64 23.48 23.9  24.19 23.75 23.7
 23.1  23.05 23.54 24.04 24.32 24.59 24.76 24.73 24.77 25.24 25.01 25.08
 25.08 25.07 25.26 25.33 25.64 26.18 26.98 26.94 27.03 26.94 26.76 26.37
 26.43 27.02 26.86 26.66 27.07 27.82 28.32 28.75 28.88 28.89 28.63 28.29
 28.25 27.78 28.51 28.68 28.64 28.54 28.5  28.46 28.45 28.5  28.53 28.41
 28.37 28.82 29.43 29.2  29.29 29.35 29.15 28.83 28.69 28.56 29.14 28.62
 28.53 28.73 28.93 29.48 29.61 30.   29.8  29.24 28.78 28.74 28.77 28.87
 29.33 29.69 29.12 28.78 28.45 28.21 28.02 28.12 28.18 28.15 27.88 27.58
 27.86 27.86 27.72 27.53 27.83 27.91 27.66 27.3  27.21 27.09 27.05 26.99
 27.09 27.1  27.27 27.53 27.18 27.33 27.28 27.2  26.97 26.88 27.01 27.24
 26.87 26.57 26.56 26.64 26.9  26.71 26.56 26.53 26.17 26.07 25.93 25.93
 25.91 25.83 25.69 25.81 25.7  25.68 25.73 25.61 25.45 25.29 25.15 25.04
 24.89 24.73 24.59 24.49 24.42 24.37 24.33 24.15 24.11 23.98 23.97 23.78
 23.61 23.47 23.5  23.53 23.44 23.48 23.22 23.2  22.91 22.53 22.46 22.35
 22.33 22.24 22.16 22.11 22.09 21.9  21.79 21.74 21.83 21.76 21.59 21.58
 21.61 21.55 21.4  21.42 21.52 21.52 21.5  21.52 21.41 21.33 21.02 20.9
 20.89 20.7  20.61 20.51 20.66]
[ 5.1  8.7  9.3  3.1  9.1  9.3  3.6  9.7 10.3  8.9  7.6  3.5 10.8  6.
  5.7 11.2 12.4  9.5  6.2 12.5 10.6  6.4  8.4  7.5  4.8 12.8 12.   9.1
 13.   3.4 12.2  7.   8.3  9.7  5.1 10.4  5.3  8.1  7.1  3.3  8.   3.5
  4.2  9.2  8.2  5.   9.2  4.8  4.   9.9  9.6  6.4  6.1 11.3  4.5  5.9
  6.5  8.6  9.5  5.7  6.1  8.5  7.4  7.5  9.   9.5  9.7  3.7  9.7 12.3
  8.6 12.3  7.6  5.8  7.8  8.   4.3  6.4  8.6  7.6  6.9  5.2  6.1  7.4
  5.3  6.6  4.5  6.6  9.7 11.5  9.1  8.9  7.7  8.8 12.1  7.1  7.4  4.9
 10.1  6.5  6.9  5.6  7.2  8.8  3.9  4.3 11.6  8.1  5.   4.   5.1  2.
  7.2  8.3  8.4  7.4  8.3  5.2  6.  10.4 12.1  2.8  3.   5.   7.8  4.9
  4.3  3.   5.8  1.4  2.7  5.6  5.7  6.5  9.3  9.4  8.6  9.8 11.3 11.5
 11.6  7.1  3.2  5.4  2.8  1.7  3.6  5.6  8.4  3.3  6.3  4.1  7.5  3.2
  5.7  2.5  0.5  8.5  8.7  6.4  7.2  5.3  4.4  6.6 10.1 10.1  9.5  1.6
  2.6  5.2  2.2  3.4  8.7  8.7  8.1  2.6  2.5  6.1  9.4  8.6  9.1  8.
  6.6  5.6  6.6  5.9  6.   6.8  3.5  3.8  6.1  4.2  7.1  4.4  7.8  6.6
  4.2  2.7  1.8  4.   4.2  3.7  2.7  2.8  5.8  3.7  6.   4.2  2.6  3.1
  2.8  2.3  2.1  3.7  7.1  7.8  4.9  3.3  2.3  3.2  5.6  6.5  8.5  8.6
  7.4  6.7  7.6  8.2  6.4  3.8  1.6  2.7  3.6  6.4  7.9  7.4  6.7  2.8
  1.6  7.6  7.2  5.3  4.   0.3  1.9  1.8  3.3  5.5  7.   8.7  6.5  4.2
  1.9  2.4  5.9 11.5 12.7 11.6 10.7  7.4  7.9  4.5 11.3  4.9  5.9  7.2
  6.4  1.9  9.4  8.6  7.6  8.   1.5  9.3  7.8  4.8  7.   5.3  1.3  1.8
  5.8  3.9  9.8 11.9 12.2  3.1  3.8  7.7 10.1 10.  10.2  6.4  1.6  8.1
  7.9  7.5 15.6  6.7  7.2  3.4  2.2  6.2  2.5  1.9  2.   4.7  3.8  8.2
  6.   6.8  4.3  4.4  3.8  7.9  5.3  5.9  4.8  5.5  5.3  4.7  6.1  8.8
  9.4  6.7  6.7  9.5 14.3 11.6  5.4  9.9  3.7  6.8 11.4  7.3  5.7  8.2
  7.6  9.4  9.3  3.9  3.7 11.9 11.   8.5  4.7  8.4  1.9  8.9  7.9  4.9
  8.3  6.4  9.   7.1  2.7  3.4  4.5  7.6  7.3  1.2 12.3  8.3  5.4  7.1
  5.6]

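Plotting the data¶

Line graphs¶

Now, at last, let's draw a graph. Since the data are the variations of air temperature and other quantities over one year, the basic choice is a line graph with time on the horizontal axis and the observed value on the vertical axis. Drawing figures is matplotlib's job. We have imported the pyplot module inside matplotlib (think of a module as a group of functions, or a small library) under the name plt.

Basic plotting is done with the plot function inside pyplot, in the form plt.plot(x, y). Here x and y are arrays holding, in order, the x and y coordinates of the points of the line graph. We use the day of the year for x and the air temperature for y.

date = list(range(1, 366))
print(date) # day of the year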

plt.plot(date, airt)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365]

[<matplotlib.lines.Line2D at 0x17ead3342e0>]
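
The last line of output above is the return value of plt.plot: a list containing one Line2D object per line drawn. The sketch below shows this with a made-up seasonal cycle in place of the KEO data (fake_airt is a hypothetical name), selects the Agg backend so that it also runs without a display, and shows that omitting x makes matplotlib use 0, 1, 2, ... as the x coordinates.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

date = np.arange(1, 366)                             # 1, 2, ..., 365 as an ndarray
fake_airt = 20 - 8 * np.cos(2 * np.pi * date / 365)  # made-up seasonal cycle, not KEO data

lines = plt.plot(date, fake_airt)  # returns a list with one Line2D object
print(len(lines), type(lines[0]))

# With only one array, the x coordinates default to 0, 1, 2, ...
plt.figure()
lines_no_x = plt.plot(fake_airt)
print(lines_no_x[0].get_xdata()[:3])
```

In a notebook, ending the last line of a cell with a semicolon suppresses the printed return value.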

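Calling plot again with another array draws several lines on the same graph. Here we draw the air temperature airt and the sea surface temperature sst. Because several quantities are plotted together, we also show a legend: pass the legend text to the optional argument label when calling plt.plot, and call plt.legend after all the plotting is done. In addition, stating what the vertical and horizontal axes represent is the most basic rule of drawing a graph. The horizontal axis label is set with plt.xlabel and the vertical axis label with plt.ylabel. Here the horizontal axis is the day within the year, so we use "Day of year", and the vertical axis is air and water temperature, so we use "Temperature [degree]". In JupyterLab a new figure is drawn for each cell, so we plot airt again from the start.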

plt.plot(date, airt, label="air temperature")
plt.plot(date, sst, label="SST")
plt.legend()
plt.xlabel("Day of year")
plt.ylabel("Temperature [degree]")

Text(0, 0.5, 'Temperature [degree]')

You can also specify the plotted range with plt.xlim and plt.ylim. When you want to focus on short-term variations, zoom in on the figure.

plt.plot(date, airt, label="air temperature")
plt.plot(date, sst, label="SST")
plt.legend()
plt.xlabel("Day of year")
plt.ylabel("Temperature [degree]")
plt.xlim(150, 210) # show x coordinates from 150 to 210
plt.ylim(18, 28) # show y coordinates from 18 to 28

(18.0, 28.0)

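The optional argument color of plt.plot sets the color of the line. Accepted values include "r" (red), "g" (green), "b" (blue), "k" (black), "gray", and "orange", hexadecimal codes such as "#a03261", and the default color sequence "C0", "C1", ... (in the end, this last option is what I recommend).

# Example of C0, C1, ...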
plt.plot([0, 1], [4, 4], color="C0", label="C0")
plt.plot([0, 1], [3, 3], color="C1", label="C1")
plt.plot([0, 1], [2, 2], color="C2", label="C2")
plt.plot([0, 1], [1, 1], color="C3", label="C3")
plt.plot([0, 1], [0, 0], color="C4", label="C4")
plt.legend()

<matplotlib.legend.Legend at 0x17eadae43d0>
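
To see that these different notations all name ordinary colors, they can be converted to a common form. The sketch below uses matplotlib.colors.to_hex to print each specification as a "#rrggbb" code; the values printed for "C0" and "C1" depend on the style currently in use.

```python
import matplotlib.colors as mcolors

# Convert several color specifications to "#rrggbb" hex codes
for name in ["r", "g", "b", "k", "gray", "orange", "#a03261", "C0", "C1"]:
    print(name, "->", mcolors.to_hex(name))
```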

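Exercise 2-1¶

Plot the variation of wind speed over the year as a line graph. Set appropriate labels for the horizontal and vertical axes (the wind speeds in wind are in units of [m/s]). Choose a line color of your own.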

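Scatter plots¶

As the figure above shows, air temperature and sea surface temperature have a strong positive correlation. This is hardly surprising, since both are high in summer and low in winter. A scatter plot is a convenient way to show the relationship between two quantities. Like a line graph, a scatter plot can be made simply by giving x and y coordinates to plt.plot, but we need to draw points instead of a line. Passing a third argument, as in plt.plot(x coordinates, y coordinates, line style), specifies the type of line or point. The strings that can be given as the line style include the following.

"-" solid line (default)
"--" dashed line
":" dotted line
"." point (medium size)
"o" point (large size)
"," point (small size)
".-" point + solid line

Here we use "." to draw a scatter plot with air temperature on the horizontal axis and sea surface temperature on the vertical axis.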

plt.plot(airt, sst, ".")
plt.plot([18, 30], [18, 30], "--") # guide line showing SST = airt; optional
plt.xlabel("air temperature [degree]")
plt.ylabel("SST [degree]")

Text(0, 0.5, 'SST [degree]')

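The distribution looks rather curious. Even a figure that simply plots the data without any processing raises many questions and discoveries.

Why do most points have sea surface temperature > air temperature (the sea water is warmer)?
Why is there a floor at a sea surface temperature of 18 degrees (the sea water does not get colder than that)?
Why does the difference between air temperature and sea surface temperature shrink as the temperature rises?

Do take some time to think about these.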
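
The strength of the correlation mentioned at the start of this section can also be expressed as a number, the correlation coefficient. The sketch below computes it with np.corrcoef on made-up stand-ins for airt and sst (a shared seasonal cycle plus random noise, not the KEO data); the names fake_airt and fake_sst and all the constants are assumptions chosen for illustration.

```python
import numpy as np

# Made-up stand-ins for airt and sst: a shared seasonal cycle plus independent noise
rng = np.random.default_rng(0)
day = np.arange(1, 366)
seasonal = -np.cos(2 * np.pi * (day - 40) / 365)
fake_airt = 20 + 8 * seasonal + rng.normal(0, 1.5, day.size)
fake_sst = 23 + 5 * seasonal + rng.normal(0, 0.3, day.size)

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
r = np.corrcoef(fake_airt, fake_sst)[0, 1]
print(r)
```

Replacing fake_airt and fake_sst with the loaded airt and sst gives the value for the real observations.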

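Exercise 2-2¶

Draw the scatter plot of air temperature and sea surface temperature with the points colored by season. For example, treat April through September as the heating season and October through March as the cooling season (April 1 is day 91 of the year and October 1 is day 274).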

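Polishing the figure¶

With the methods introduced so far, you should be able to draw most kinds of figures for one-dimensional data. These figures, however, are not yet good enough for presentation slides or papers. In particular, the figure size, the line width, and the font sizes of the axes and labels need adjusting. Here, using the time series of air temperature and sea surface temperature as an example, we walk through shaping a figure so that it is easy to read from a distance and then saving it.

plt.figure(figsize=(10,5))                          # size of the whole figure; slightly wide this time
plt.plot(date, airt, label="air temperature", lw=1.5) # lw means linewidth, i.e. the thickness of the line
plt.plot(date, sst, label="SST", lw=1.5)              # 1.5 is the same as the default; a thicker line is sometimes better
plt.legend(fontsize=16, loc="upper left")           # fontsize enlarges the legend text; loc places it at the upper left
plt.xlabel("Day of year", fontsize=18)              # fontsize enlarges the label text
plt.ylabel("Temperature [degree]", fontsize=18)     # same as above
plt.tick_params(labelsize=16)                       # enlarge the numbers on the x and y axes
plt.savefig(r"\Users\yfuji\Desktop\lesson\data\airt_sst.png") # path of the image to save; the format is determined from the extension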
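
When a figure is meant for slides or a paper, two more savefig options are often useful: dpi sets the resolution of raster formats such as PNG, and bbox_inches="tight" trims the surrounding margin. The sketch below is self-contained, uses placeholder data, and writes into a temporary folder (a stand-in for your own output folder) so that it runs on any machine.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no window is needed
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 5))
plt.plot([1, 2, 3], [15.1, 15.8, 13.0], lw=2)  # placeholder data
plt.xlabel("Day of year", fontsize=18)
plt.ylabel("Temperature [degree]", fontsize=18)

# Save into a temporary folder (replace with your own folder in practice)
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "example.png")
pdf_path = os.path.join(out_dir, "example.pdf")

plt.savefig(png_path, dpi=300, bbox_inches="tight")  # raster image at 300 dots per inch
plt.savefig(pdf_path, bbox_inches="tight")           # vector image; format chosen from ".pdf"
print(os.path.exists(png_path), os.path.exists(pdf_path))
```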

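Looking up the details of a function¶

This lesson introduced the basic usage of representative functions such as plt.plot, but these functions have many options, and making use of them gives you far more freedom in plotting. When you are unsure how to use NumPy or matplotlib, the official documentation is the most reliable reference.

matplotlib.pyplot official documentation https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.html
NumPy official documentation https://numpy.org/devdocs/reference/routines.html

In addition, in JupyterLab or an IPython environment, typing a function name followed by ? displays the help for that function (its explanatory text, called the docstring). This is convenient because you can look up the usage immediately, even without an internet connection. Below, we display the help text of np.loadtxt, which we used to read the CSV files.

These help texts, and the error messages shown when a program fails to run, are all in English. Python is a relatively popular language and plenty of information is available in other languages, but you still cannot avoid English when reading official information or deciphering error messages. Brace yourself and do your best to read them.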

np.loadtxt?

Signature:
np.loadtxt(
    fname,
    dtype=<class 'float'>,
    comments='#',
    delimiter=None,
    converters=None,
    skiprows=0,
    usecols=None,
    unpack=False,
    ndmin=0,
    encoding='bytes',
    max_rows=None,
    *,
    quotechar=None,
    like=None,
)
Docstring:
Load data from a text file.

Each row in the text file must have the same number of values.

Parameters
----------
fname : file, str, pathlib.Path, list of str, generator
    File, filename, list, or generator to read.  If the filename
    extension is ``.gz`` or ``.bz2``, the file is first decompressed. Note
    that generators must return bytes or strings. The strings
    in a list or produced by a generator are treated as lines.
dtype : data-type, optional
    Data-type of the resulting array; default: float.  If this is a
    structured data-type, the resulting array will be 1-dimensional, and
    each row will be interpreted as an element of the array.  In this
    case, the number of columns used must match the number of fields in
    the data-type.
comments : str or sequence of str or None, optional
    The characters or list of characters used to indicate the start of a
    comment. None implies no comments. For backwards compatibility, byte
    strings will be decoded as 'latin1'. The default is '#'.
delimiter : str, optional
    The string used to separate values. For backwards compatibility, byte
    strings will be decoded as 'latin1'. The default is whitespace.
converters : dict or callable, optional
    A function to parse all columns strings into the desired value, or
    a dictionary mapping column number to a parser function.
    E.g. if column 0 is a date string: ``converters = {0: datestr2num}``.
    Converters can also be used to provide a default value for missing
    data, e.g. ``converters = lambda s: float(s.strip() or 0)`` will
    convert empty fields to 0.
    Default: None.
skiprows : int, optional
    Skip the first `skiprows` lines, including comments; default: 0.
usecols : int or sequence, optional
    Which columns to read, with 0 being the first. For example,
    ``usecols = (1,4,5)`` will extract the 2nd, 5th and 6th columns.
    The default, None, results in all columns being read.

    .. versionchanged:: 1.11.0
        When a single column has to be read it is possible to use
        an integer instead of a tuple. E.g ``usecols = 3`` reads the
        fourth column the same way as ``usecols = (3,)`` would.
unpack : bool, optional
    If True, the returned array is transposed, so that arguments may be
    unpacked using ``x, y, z = loadtxt(...)``.  When used with a
    structured data-type, arrays are returned for each field.
    Default is False.
ndmin : int, optional
    The returned array will have at least `ndmin` dimensions.
    Otherwise mono-dimensional axes will be squeezed.
    Legal values: 0 (default), 1 or 2.

    .. versionadded:: 1.6.0
encoding : str, optional
    Encoding used to decode the inputfile. Does not apply to input streams.
    The special value 'bytes' enables backward compatibility workarounds
    that ensures you receive byte arrays as results if possible and passes
    'latin1' encoded strings to converters. Override this value to receive
    unicode arrays and pass strings as input to converters.  If set to None
    the system default is used. The default value is 'bytes'.

    .. versionadded:: 1.14.0
max_rows : int, optional
    Read `max_rows` rows of content after `skiprows` lines. The default is
    to read all the rows. Note that empty rows containing no data such as
    empty lines and comment lines are not counted towards `max_rows`,
    while such lines are counted in `skiprows`.

    .. versionadded:: 1.16.0
    
    .. versionchanged:: 1.23.0
        Lines containing no data, including comment lines (e.g., lines 
        starting with '#' or as specified via `comments`) are not counted 
        towards `max_rows`.
quotechar : unicode character or None, optional
    The character used to denote the start and end of a quoted item.
    Occurrences of the delimiter or comment characters are ignored within
    a quoted item. The default value is ``quotechar=None``, which means
    quoting support is disabled.

    If two consecutive instances of `quotechar` are found within a quoted
    field, the first is treated as an escape character. See examples.

    .. versionadded:: 1.23.0
like : array_like, optional
    Reference object to allow the creation of arrays which are not
    NumPy arrays. If an array-like passed in as ``like`` supports
    the ``__array_function__`` protocol, the result will be defined
    by it. In this case, it ensures the creation of an array object
    compatible with that passed in via this argument.

    .. versionadded:: 1.20.0

Returns
-------
out : ndarray
    Data read from the text file.

See Also
--------
load, fromstring, fromregex
genfromtxt : Load data with missing values handled as specified.
scipy.io.loadmat : reads MATLAB data files

Notes
-----
This function aims to be a fast reader for simply formatted files.  The
`genfromtxt` function provides more sophisticated handling of, e.g.,
lines with missing values.

.. versionadded:: 1.10.0

The strings produced by the Python float.hex method can be used as
input for floats.

Examples
--------
>>> from io import StringIO   # StringIO behaves like a file object
>>> c = StringIO("0 1\n2 3")
>>> np.loadtxt(c)
array([[0., 1.],
       [2., 3.]])

>>> d = StringIO("M 21 72\nF 35 58")
>>> np.loadtxt(d, dtype={'names': ('gender', 'age', 'weight'),
...                      'formats': ('S1', 'i4', 'f4')})
array([(b'M', 21, 72.), (b'F', 35, 58.)],
      dtype=[('gender', 'S1'), ('age', '<i4'), ('weight', '<f4')])

>>> c = StringIO("1,0,2\n3,0,4")
>>> x, y = np.loadtxt(c, delimiter=',', usecols=(0, 2), unpack=True)
>>> x
array([1., 3.])
>>> y
array([2., 4.])

The `converters` argument is used to specify functions to preprocess the
text prior to parsing. `converters` can be a dictionary that maps
preprocessing functions to each column:

>>> s = StringIO("1.618, 2.296\n3.141, 4.669\n")
>>> conv = {
...     0: lambda x: np.floor(float(x)),  # conversion fn for column 0
...     1: lambda x: np.ceil(float(x)),  # conversion fn for column 1
... }
>>> np.loadtxt(s, delimiter=",", converters=conv)
array([[1., 3.],
       [3., 5.]])

`converters` can be a callable instead of a dictionary, in which case it
is applied to all columns:

>>> s = StringIO("0xDE 0xAD\n0xC0 0xDE")
>>> import functools
>>> conv = functools.partial(int, base=16)
>>> np.loadtxt(s, converters=conv)
array([[222., 173.],
       [192., 222.]])

This example shows how `converters` can be used to convert a field
with a trailing minus sign into a negative number.

>>> s = StringIO('10.01 31.25-\n19.22 64.31\n17.57- 63.94')
>>> def conv(fld):
...     return -float(fld[:-1]) if fld.endswith(b'-') else float(fld)
...
>>> np.loadtxt(s, converters=conv)
array([[ 10.01, -31.25],
       [ 19.22,  64.31],
       [-17.57,  63.94]])

Using a callable as the converter can be particularly useful for handling
values with different formatting, e.g. floats with underscores:

>>> s = StringIO("1 2.7 100_000")
>>> np.loadtxt(s, converters=float)
array([1.e+00, 2.7e+00, 1.e+05])

This idea can be extended to automatically handle values specified in
many different formats:

>>> def conv(val):
...     try:
...         return float(val)
...     except ValueError:
...         return float.fromhex(val)
>>> s = StringIO("1, 2.5, 3_000, 0b4, 0x1.4000000000000p+2")
>>> np.loadtxt(s, delimiter=",", converters=conv, encoding=None)
array([1.0e+00, 2.5e+00, 3.0e+03, 1.8e+02, 5.0e+00])

Note that with the default ``encoding="bytes"``, the inputs to the
converter function are latin-1 encoded byte strings. To deactivate the
implicit encoding prior to conversion, use ``encoding=None``

>>> s = StringIO('10.01 31.25-\n19.22 64.31\n17.57- 63.94')
>>> conv = lambda x: -float(x[:-1]) if x.endswith('-') else float(x)
>>> np.loadtxt(s, converters=conv, encoding=None)
array([[ 10.01, -31.25],
       [ 19.22,  64.31],
       [-17.57,  63.94]])

Support for quoted fields is enabled with the `quotechar` parameter.
Comment and delimiter characters are ignored when they appear within a
quoted item delineated by `quotechar`:

>>> s = StringIO('"alpha, #42", 10.0\n"beta, #64", 2.0\n')
>>> dtype = np.dtype([("label", "U12"), ("value", float)])
>>> np.loadtxt(s, dtype=dtype, delimiter=",", quotechar='"')
array([('alpha, #42', 10.), ('beta, #64',  2.)],
      dtype=[('label', '<U12'), ('value', '<f8')])

Two consecutive quote characters within a quoted field are treated as a
single escaped character:

>>> s = StringIO('"Hello, my name is ""Monty""!"')
>>> np.loadtxt(s, dtype="U", delimiter=",", quotechar='"')
array('Hello, my name is "Monty"!', dtype='<U26')
File:      c:\users\yfuji\miniconda3\lib\site-packages\numpy\lib\npyio.py
Type:      function
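
The ? syntax only works in Jupyter and IPython. In a plain Python script or interpreter the same text is available through the built-in help function or the __doc__ attribute, as in this short sketch.

```python
import numpy as np

# The docstring is stored as an ordinary string in the __doc__ attribute
doc = np.loadtxt.__doc__
first_lines = doc.strip().splitlines()[:3]
for line in first_lines:
    print(line)

# help(np.loadtxt) prints the full text in a plain Python session
```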

Further topics to study¶

`plt.hist`¶

A function that draws a histogram.
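
A minimal sketch of plt.hist, assuming made-up wind speeds drawn from a gamma distribution in place of the real wind array (fake_wind and the distribution parameters are illustrative assumptions). plt.hist returns the counts in each bin, the bin edges, and the drawn bars.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
fake_wind = rng.gamma(shape=4.0, scale=1.7, size=365)  # made-up wind speeds [m/s]

# bins=10 splits the range from the minimum to the maximum into 10 equal bins
counts, edges, patches = plt.hist(fake_wind, bins=10)
plt.xlabel("Wind speed [m/s]")
plt.ylabel("Number of days")
print(counts)
```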

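`datetime` module¶

In particular datetime.datetime, datetime.timedelta, and datetime.timezone. In this lesson we used the date array, which simply counts the days of 2019 with January 1 as day 1, but handling the calendar properly requires datetime. Learning it on your own takes some effort, but once you know the basic usage, working with calendar dates becomes easy.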
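
As a small taste of the module, the sketch below converts between a day number and a calendar date for 2019, assuming the convention that January 1 is day 1. The helper name day_to_date is introduced here for illustration.

```python
import datetime

jan1 = datetime.date(2019, 1, 1)

def day_to_date(day_of_year):
    """Convert a day of year (January 1 = day 1) to a calendar date in 2019."""
    return jan1 + datetime.timedelta(days=day_of_year - 1)

print(day_to_date(91))   # 2019-04-01
print(day_to_date(274))  # 2019-10-01

# The reverse direction: calendar date -> day of year
print(datetime.date(2019, 10, 1).timetuple().tm_yday)  # 274
```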
