import ibis
from ibis import _
import ibis.selectors as s
import seaborn.objects as so
con = ibis.duckdb.connect()
ibis + seaborn.objects: Data Exploration
Learning Goals
We are now ready to assemble what we learned from our initial exploration and bring together our skills with ibis and seaborn.objects.
base_url = "https://huggingface.co/datasets/cboettig/ram_fisheries/resolve/main/v4.65/"
stock = con.read_csv(base_url + "stock.csv", nullstr="NA")
timeseries = con.read_csv(base_url + "timeseries.csv", nullstr="NA")
Last time we reached the conclusion that we wanted to average across multiple assessments:
cod_stocks = (
    timeseries
    .join(stock, "stockid")
    .filter(_.tsid == "TCbest-MT")
    .filter(_.commonname == "Atlantic cod")
    .group_by(_.tsyear, _.stockid, _.primary_country, _.primary_FAOarea)
    .agg(catch = _.tsvalue.mean())
)
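To make the averaging step concrete, here is a minimal plain-Python sketch of what a group-by followed by a mean does. The rows, stock IDs, and values are made up for illustration and are not taken from the RAM database; only the column names `tsyear`, `stockid`, and `tsvalue` come from the query above.

```python
from collections import defaultdict
from statistics import mean

# Toy rows (made-up values): two assessments report a catch for STOCK_A in 2000.
rows = [
    {"tsyear": 2000, "stockid": "STOCK_A", "tsvalue": 100.0},
    {"tsyear": 2000, "stockid": "STOCK_A", "tsvalue": 120.0},
    {"tsyear": 2000, "stockid": "STOCK_B", "tsvalue": 50.0},
    {"tsyear": 2001, "stockid": "STOCK_A", "tsvalue": 90.0},
]

def mean_catch(rows):
    """Average tsvalue within each (tsyear, stockid) group."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["tsyear"], r["stockid"])].append(r["tsvalue"])
    return {key: mean(values) for key, values in groups.items()}

catch = mean_catch(rows)
```

Each (year, stock) pair ends up with exactly one catch value, which is what lets us draw a single line per stock in the plots that follow.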
This is great, but even after aggregating the assessments, we have a lot of individual cod stocks in various locations:
(
    so.Plot(cod_stocks,
            x = "tsyear",
            y = "catch",
            color = "stockid")
    .add(so.Lines(linewidth=3))
    .layout(size=(10, 6))
)
Our good friend, the COD2J3KL series, shows up with its remarkable declines, but what's going on with those highly variable but very large catches? This will obviously affect our assessment of whether or not the species as a whole has collapsed. This is too many stocks to explore easily, so let's try breaking this out by country:
(
    so.Plot(cod_stocks,
            x = "tsyear",
            y = "catch",
            color = "stockid",
            group = "primary_country")
    .add(so.Lines(linewidth=3))
    .facet("primary_country", wrap = 6)
    .layout(size=(16, 10))
)
Note that in the country-based graphs, several countries have multiple stocks. Norway and Canada stand out for the largest harvests. The 'grammar of graphics' in seaborn.objects makes it easy to explore the data visually along different dimensions. For instance, we can divide stocks by primary FAO area instead of country with a single change:
(
    so.Plot(cod_stocks,
            x = "tsyear",
            y = "catch",
            color = "stockid",
            group = "primary_FAOarea")
    .add(so.Lines(linewidth=3))
    .facet("primary_FAOarea")
    .layout(size=(12, 8))
)
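Before faceting on a column, it can help to know how many stocks fall under each of its values. The sketch below counts distinct stocks per group in plain Python, with the grouping column passed as an argument so that switching from country to FAO area is again a single change. The rows, stock IDs, and country names are hypothetical placeholders, not real records.

```python
from collections import defaultdict

# Toy stock metadata (hypothetical IDs and countries).
rows = [
    {"stockid": "STOCK_A", "primary_country": "Country X", "primary_FAOarea": 21},
    {"stockid": "STOCK_B", "primary_country": "Country X", "primary_FAOarea": 21},
    {"stockid": "STOCK_C", "primary_country": "Country Y", "primary_FAOarea": 27},
    {"stockid": "STOCK_C", "primary_country": "Country Y", "primary_FAOarea": 27},
]

def stocks_per_group(rows, key):
    """Count distinct stockid values for each value of the column `key`."""
    groups = defaultdict(set)
    for r in rows:
        groups[r[key]].add(r["stockid"])
    return {group: len(ids) for group, ids in groups.items()}

by_country = stocks_per_group(rows, "primary_country")
by_area = stocks_per_group(rows, "primary_FAOarea")
```

Using a set per group means repeated rows for the same stock (for example, one row per year) are counted only once.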
We have visually grouped the data into the two parts of the globe where Atlantic cod are found: “Western North-Atlantic” (FAO area 21) and “Eastern North-Atlantic” (area 27). While we see declines in some Eastern stocks, the pattern in the West is much more dramatic. If we want to consider the fate of Western Atlantic cod as a whole, we can add up all these individual stocks to get a picture for the entire FAO region. We call the resulting table cod_fao because it no longer reflects individual stocks: we summed them when we aggregated to the level of entire FAO regions.
cod_fao = (cod_stocks
    .group_by(_.tsyear, _.primary_FAOarea)
    .agg(catch = _.catch.sum())
)
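The order of the two aggregations matters: we average across assessments within a stock first, and only then sum across stocks. A small plain-Python sketch with made-up numbers shows why. If a stock has two assessments for the same year, summing the raw rows directly would count that stock twice. The function names here are hypothetical helpers, not part of ibis.

```python
from collections import defaultdict
from statistics import mean

# Toy rows (made-up values): STOCK_A has two assessments for the same year.
rows = [
    {"tsyear": 2000, "stockid": "STOCK_A", "primary_FAOarea": 21, "tsvalue": 100.0},
    {"tsyear": 2000, "stockid": "STOCK_A", "primary_FAOarea": 21, "tsvalue": 120.0},
    {"tsyear": 2000, "stockid": "STOCK_B", "primary_FAOarea": 21, "tsvalue": 50.0},
    {"tsyear": 2000, "stockid": "STOCK_C", "primary_FAOarea": 27, "tsvalue": 300.0},
]

def mean_then_sum(rows):
    """Average within each stock, then sum the stock means per (year, area)."""
    per_stock = defaultdict(list)
    for r in rows:
        key = (r["tsyear"], r["primary_FAOarea"], r["stockid"])
        per_stock[key].append(r["tsvalue"])
    totals = defaultdict(float)
    for (year, area, _stock), values in per_stock.items():
        totals[(year, area)] += mean(values)
    return dict(totals)

def naive_sum(rows):
    """Sum raw rows per (year, area), double counting repeated assessments."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["tsyear"], r["primary_FAOarea"])] += r["tsvalue"]
    return dict(totals)

area_catch = mean_then_sum(rows)
naive_catch = naive_sum(rows)
```

In this toy data the two approaches agree for area 27, which has a single assessment, but differ for area 21, where the naive total is inflated by the duplicate assessment of STOCK_A.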
Once again, we can get a visual sense of the resulting aggregations:
(
    so.Plot(cod_fao,
            x = "tsyear",
            y = "catch")
    .add(so.Lines(linewidth=3))
    .facet("primary_FAOarea")
)