/pokutuna/seaborn - Scrapbox Reader

generated at 2/22/2025, 3:35:54 PM
seaborn
seaborn: statistical data visualization — seaborn 0.12.2 documentation
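Articles
[Python] Make beautiful graphs with seaborn! Done in a single line | Smart-Hint
Trying every API in the latest seaborn to level up data visualization skills [visualization, 2020/9/9-ver0.11.0] - Qiita
The two essential tools of seaborn: relplot and catplot | やましなぶろぐ
25 seaborn methods for Python data visualization, data analysis - Qiita (has good examples)
Don't give up on fine-tuning seaborn's appearance, Python - Qiita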

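Basic usage
Mostly just use the top-level (figure-level) functions and choose the plot type with  kind= 
relplot / displot / catplot
All of them take  kind=  for the plot type and  col=  to lay plots out side by side
 data, x, y, col, hue  are the parameters I set most often
Splitting along three dimensions,  col ,  row , and  hue , draws several plots at once
For relationships between numeric variables: relplot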
seaborn.relplot — seaborn 0.12.2 documentation
For categories vs. values: catplot
seaborn.catplot — seaborn 0.12.2 documentation
For distributions: displot
seaborn.displot — seaborn 0.12.2 documentation
I often want one histogram per column, so I reshape wide to long with melt → wide vs. long format
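facet_hist.py
import pandas as pd
import seaborn as sns

# df: a wide DataFrame with columns id, label, foo, bar, baz
sns.displot(
    pd.melt(df, id_vars=['id', 'label'], value_vars=['foo', 'bar', 'baz']),
    kind='hist',
    col='variable',  # one panel per original column
    hue='label',  # to put groups in separate rows instead of overlaying, use row='label'
    x='value',
    bins=30,
    kde=True,
    # log_scale=True,
    # facet_kws={'sharey': False, 'sharex': False},
    )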
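To see what the melt call above hands to displot, here is a tiny made-up wide table (the ids, labels, and values are hypothetical). Each (row, value column) pair becomes one row, with the old column name in `variable` and the number in `value`, which is why displot is called with `col='variable'` and `x='value'`.

```python
import pandas as pd

# Hypothetical wide table: one row per id, one column per measurement
df = pd.DataFrame({
    "id": [1, 2],
    "label": ["pos", "neg"],
    "foo": [0.1, 0.2],
    "bar": [1.0, 2.0],
    "baz": [10.0, 20.0],
})

# Wide -> long: 2 rows x 3 value columns = 6 rows
long_df = pd.melt(df, id_vars=["id", "label"], value_vars=["foo", "bar", "baz"])
print(long_df)
```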
These plus heatmap cover most cases
seaborn.heatmap

relational
 sns.lineplot 
 sns.scatterplot 
categorical
seaborn.histplot
Histogram
 hue='col'  overlays one histogram per value of the column col
 multiple='stack'  stacks them
 multiple='dodge'  puts the bars side by side
 multiple='fill'  stacks them normalized to 100%
seaborn.boxplot — seaborn 0.12.2 documentation
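 showfliers=False  hides outliers
 whis=1.5  sets how far the whiskers reach; points beyond them are outliers. The default is 1.5 × IQR
Passing a tuple specifies percentiles instead;  whis=(0, 100)  makes the whiskers cover all the data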
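To make the 1.5 × IQR rule concrete, here is the arithmetic done by hand with NumPy on made-up toy data. This is a sketch of the rule matplotlib documents for `whis`: the whiskers reach the furthest data points still inside `Q1 - whis*IQR` and `Q3 + whis*IQR`, and everything outside is an outlier.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])  # toy data with one extreme value
whis = 1.5

q1, q3 = np.percentile(data, [25, 75])  # 3.25 and 7.75 with NumPy's default interpolation
iqr = q3 - q1
lower_fence = q1 - whis * iqr
upper_fence = q3 + whis * iqr

# whiskers stop at the most extreme data points still inside the fences
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
# anything beyond the fences is drawn as an outlier ("flier")
outliers = data[(data < lower_fence) | (data > upper_fence)]
```

With `whis=(0, 100)` the whiskers would instead sit at the minimum and maximum, so 100 would no longer count as an outlier.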
seaborn.boxenplot — seaborn 0.12.2 documentation
Like boxplot but with more quantile boxes (a letter-value plot)
seaborn.violinplot — seaborn 0.12.2 documentation
Shows multiple peaks, which a boxplot can't
Quartiles + kernel density estimate
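A sketch where the difference matters, on made-up data in which one group is bimodal. A boxplot would show a single box around 0 for it, while the violin outline shows both peaks; `inner="quart"` draws the quartile lines inside each violin.

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
# Hypothetical data: "bimodal" has two peaks, "unimodal" has one wide peak
df = pd.DataFrame({
    "group": ["bimodal"] * 400 + ["unimodal"] * 400,
    "value": np.concatenate([
        rng.normal(-2, 0.5, 200), rng.normal(2, 0.5, 200),
        rng.normal(0, 1.5, 400),
    ]),
})

# KDE outline per group, with quartiles drawn as lines inside
ax = sns.violinplot(data=df, x="group", y="value", inner="quart")
```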
Suggestion: half violinplot half histogram · Issue 2152 · mwaskom/seaborn
Hmm, I think this would be a good feature though
Others
seaborn.heatmap
Heatmap; probably only meaningful when the DataFrame's values share a common range?
 sns.heatmap(df.corr()) 
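Correlations always lie in [-1, 1], which is why `df.corr()` is the typical case where a heatmap reads well. A sketch on hypothetical random columns, pinning `vmin`/`vmax` to that range and centering a diverging colormap on 0.

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
a = rng.normal(size=100)
# Hypothetical columns: b is correlated with a, c is independent noise
df = pd.DataFrame({"a": a, "b": a + rng.normal(scale=0.5, size=100), "c": rng.normal(size=100)})

corr = df.corr()
# fixed [-1, 1] color range so colors mean the same thing in every plot;
# center=0 puts the neutral color of the diverging map at zero correlation
ax = sns.heatmap(corr, vmin=-1, vmax=1, center=0, cmap="coolwarm", annot=True, fmt=".2f")
```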
seaborn.jointplot
Scatter plot + a distribution plot along each axis (marginals)
 sns.jointplot(data=df, x='x', y='y')  is the basic form
 hue='col'  colors the points by col
Maybe use this instead of relplot by default?
seaborn.regplot
Adds a regression line; I probably don't use it much
seaborn.lmplot
Overlays regplot per category; I'd probably use this one more
Just specify  hue= 
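A minimal lmplot sketch with `hue=`, on made-up data where two groups have different slopes. `ci=None` skips the bootstrapped confidence band, leaving one regression line per group.

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
label = rng.choice(["a", "b"], size=200)  # hypothetical categories
# made-up data: group "a" has slope 1, group "b" slope 3
y = np.where(label == "a", 1.0, 3.0) * x + rng.normal(size=200)
df = pd.DataFrame({"x": x, "y": y, "label": label})

g = sns.lmplot(data=df, x="x", y="y", hue="label", ci=None)
```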
seaborn.pairplot
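Plots every pair of columns to look at their relationships
A shorthand for the common uses of seaborn.PairGrid — seaborn 0.12.2 documentation
 sns.pairplot(df, corner=True)  is probably what I'd use most; the lower triangle is enough
 sns.pairplot(df, kind='reg')  overlays regression lines on the scatter plots, which simply adds information, so I tend to use it
 diag_kind='kde'  too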
ecdfplot - seaborn.ecdfplot — seaborn 0.13.2 documentation
Cumulative distribution
 sns.displot(kind="ecdf", ...)  works too

FacetGrid
seaborn.FacetGrid
Lays out multiple plots: declare the grouping (like a group by) with  FacetGrid , then specify the actual plot with map
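facetgrid.py
import seaborn as sns
tips = sns.load_dataset("tips")  # seaborn's example dataset (fetched online on first use)
g = sns.FacetGrid(tips, col="time", row="sex")  # one panel per (sex, time) combination
g.map(sns.scatterplot, "total_bill", "tip")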
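To stop sharing axes
 sharex=False ,  sharey=False 
To show axis labels on every subplot but keep the plotted range (value range) common
Turn sharing off with  sharex=False ,  sharey=False , then compute the range yourself and pass the same one to all; this is the easiest way
 sns.FacetGrid(..., xlim=(0, 10), ylim=(0, 100)) 
For heatmaps, compute  vmin  and  vmax  and pass them (as in the example below)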
Drawing heatmaps with FacetGrid
 pivot  inside the function passed to  map_dataframe 
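facet_heatmap.py
import seaborn as sns

# df: a long DataFrame with columns category, x, y, value
g = sns.FacetGrid(df, col="category")

def plot_heatmap(data, **kwargs):
    # map_dataframe also passes color=; heatmap colors cells via cmap, so drop it
    kwargs.pop("color", None)
    pivot = data.pivot(index="x", columns="y", values="value")
    sns.heatmap(pivot, **kwargs)

# map_dataframe calls the function once per facet, with that facet's subset as data=
g.map_dataframe(
    plot_heatmap,
    annot=True,  # print the values in the cells
    cbar=False,
    # sharing the global min/max is easier than hand-tuning the color scale per panel
    vmin=df['value'].min(),
    vmax=df['value'].max(),
)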
Widen the space between plots a little
matplotlib.pyplot.subplots_adjust — Matplotlib 3.1.0 documentation
 g.figure.subplots_adjust(wspace=0.3, hspace=0.3) 
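The units are fractions of the average axes width and height ( wspace=0.5  = half the average subplot width)
With  wspace=1 , the gap is as wide as a whole subplot
When axis labels or titles overlap, this is shorter and less error-prone than fiddling with  xticklabels  and the like
The urge to use  square=True  tends to wreck the layout, so giving up on it is often the better call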

Move the legend outside the plot
 plt.legend(loc="upper left", bbox_to_anchor=(1, 1)) 
Or
legend.py
import seaborn as sns
ax = sns.scatterplot(...)  # any axes-level plot that has a legend (e.g. hue=)
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))
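A sketch of moving an axes-level legend outside, on hypothetical data. Saving with `bbox_inches="tight"` is one way to keep a legend that sits outside the axes from being cropped out of the saved image.

```python
import io

import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=100),
    "y": rng.normal(size=100),
    "label": rng.choice(["a", "b", "c"], size=100),  # hypothetical grouping column
})

ax = sns.scatterplot(data=df, x="x", y="y", hue="label")
# anchor the legend's upper-left corner at the axes' upper-right corner
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))

# "tight" grows the saved area to include artists outside the axes
buf = io.BytesIO()
ax.figure.savefig(buf, format="png", bbox_inches="tight")
```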

Adjusting size
Reluctantly fall back to matplotlib; it's easy to tweak
size.py
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(8, 4.5))  # (width, height) in inches; affects axes-level functions
sns.scatterplot(...)
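`plt.figure(figsize=...)` only applies to axes-level functions; figure-level functions like relplot create their own figure. For those, the size is set per facet with `height=` (inches) and `aspect=` (width / height). A sketch on hypothetical data, sized to match the 8 × 4.5 figure above.

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=50), "y": rng.normal(size=50)})  # hypothetical data

# figure-level: size per facet; facet width = height * aspect
g = sns.relplot(data=df, x="x", y="y", height=4.5, aspect=8 / 4.5)
width, height = g.figure.get_size_inches()
```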
Draw only a specific value range
 g.set(xlim=(0, 500)) 

Change the tick spacing
tick.py
import seaborn as sns
g = sns.ecdfplot(...)  # axes-level functions return a matplotlib Axes
g.set_xticks(range(0, 500, 50))
g.grid(True, which="both")  # show the grid


Overlapping labels
Usually the x-axis labels
rotate_labels.py
import seaborn as sns
import matplotlib.pyplot as plt

# Axes-level plot: adjust through matplotlib
sns.barplot(x='category', y='value', data=df)
plt.xticks(rotation=90)

# Figure-level plot (returns a FacetGrid)
g = sns.relplot(data=df, x="day", y="total_bill", kind="line")
g.tick_params(axis="x", labelrotation=90)  # this seems the least intrusive way

Color palettes
seaborn.set_palette — seaborn 0.12.2 documentation
 sns.set_palette  sets it globally
Or pass  palette=  or  cmap=  to each plot
Choosing color palettes — seaborn 0.13.2 documentation
How to choose seaborn color palettes - Qiita
Sequential data
 rocket  
 mako 
 viridis 
Sometimes a plain single-hue palette is easier to read
 seagreen 
 Blues 
Correlations (diverging from the center toward both ends)
 coolwarm 
 Spectral 
 icefire 
Append  _r  to reverse
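A sketch of pulling the palettes above either as a list of colors or as a continuous colormap. The names come from the lists above; the color counts are arbitrary.

```python
import seaborn as sns

rocket = sns.color_palette("rocket", 5)      # 5 colors sampled from a sequential map
rocket_r = sns.color_palette("rocket_r", 5)  # the same map in reverse order
blues = sns.color_palette("Blues", 4)        # single-hue alternative
# as_cmap=True returns a continuous colormap, e.g. for sns.heatmap(cmap=...)
coolwarm = sns.color_palette("coolwarm", as_cmap=True)
icefire = sns.color_palette("icefire", as_cmap=True)
```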

Visualizing many classes
When the palette runs out of colors
Build a palette with one color per element using Collection of perceptually accurate colormaps — colorcet v3.1.0 and pass it in
How to make a color map with many unique colors in seaborn - Stack Overflow
colorcet.py
import colorcet as cc
import seaborn as sns
# one distinct color per label, taken from colorcet's Glasbey palette
palette = sns.color_palette(cc.glasbey, n_colors=len(set(labels)))
sns.scatterplot(x=embedding_2d[:, 0], y=embedding_2d[:, 1], hue=labels, palette=palette)
This is mostly for visualizing clustering results rather than looking at individual points...

Or cycle through color × marker combinations
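markers.py
import seaborn as sns

colors = sns.color_palette("tab10", 10)
markers = ["o", "^", "D", "X", "s", "H", "P", "h"]
# fix one order so each label gets a stable position i
unique_labels = sorted(set(labels))
# colors cycle with every label; the marker advances once per full cycle of colors
palette_map = {label: colors[i % len(colors)] for i, label in enumerate(unique_labels)}
markers_map = {label: markers[(i // len(colors)) % len(markers)] for i, label in enumerate(unique_labels)}

# Override only a specific error value (e.g. -1) with its own marker
# markers_map[-1] = "*"

sns.scatterplot(
   x=embedding_2d[:, 0],
   y=embedding_2d[:, 1],
   hue=labels,
   style=labels,
   markers=markers_map,
   palette=palette_map,
)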

Appearance
Properties of Mark objects — seaborn 0.12.2 documentation
seaborn.set_style — seaborn 0.12.2 documentation
 sns.set_style('whitegrid')  is probably what I use most?
It also affects plain matplotlib output
seaborn.set_theme — seaborn 0.12.2 documentation
markers
https://seaborn.pydata.org/tutorial/properties.html#marker
seaborn: characters you can pass to scatterplot's markers - Qiita
hue over multiple columns
Just concatenate them into a new column and pass that
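A sketch of the concatenation approach on hypothetical sex / smoker columns filled with random data. The combined `sex_smoker` column is then passed as hue like any other column.

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "sex": rng.choice(["M", "F"], size=n),        # hypothetical columns
    "smoker": rng.choice(["yes", "no"], size=n),
    "total": rng.normal(20, 5, size=n),
    "tip": rng.normal(3, 1, size=n),
})

# one combined key per row, e.g. "M / yes"
df["sex_smoker"] = df["sex"] + " / " + df["smoker"]
ax = sns.scatterplot(data=df, x="total", y="tip", hue="sex_smoker")
```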
python - Multiple Columns for HUE parameter in Seaborn violinplot - Stack Overflow

Time formatting tends to get tedious; need to look into how to handle it
Plain DataFrame.plot is sometimes the better option
histplot 
 discrete=True 
The x-axis labels land in the middle of each bar
 shrink=0.8  or similar leaves gaps between the bars
 stat='probability'  makes the bar heights sum to 1.0
seaborn.histplot — seaborn 0.13.0 documentation
python - Seaborn Plot Distribution with histogram with stat = density or probability? - Stack Overflow

#Python