stormwater_monitoring_datasheet_extraction.lib.schema package

Subpackages

Submodules

stormwater_monitoring_datasheet_extraction.lib.schema.schema module

Pandera schemas for ETL steps.

class stormwater_monitoring_datasheet_extraction.lib.schema.schema.Creek(*args, **kwargs)

Bases: DataFrameModel

Creek metadata.

Creek type.

Constraints:

PK: site_id. FK: site_id: Site.creek_site_id (unenforced). DEFERRABLE INITIALLY DEFERRED.

class Config

Bases: object

The configuration for the schema.

Strict schema enforcement.

multiindex_strict = True
multiindex_unique = ['site_id']
name = 'Creek'
strict = True
creek_type: Series[Annotated[CategoricalDtype]] = 'creek_type'

The creek type. constants.CreekType.

site_id: Index[str] = 'site_id'

The site ID.

class stormwater_monitoring_datasheet_extraction.lib.schema.schema.FormCleaned(*args, **kwargs)

Bases: FormVerified

Form metadata cleaned.

Constraints:

PK: form_id.

class Config

Bases: Config

name = 'FormCleaned'
class stormwater_monitoring_datasheet_extraction.lib.schema.schema.FormExtracted(*args, **kwargs)

Bases: DataFrameModel

Form metadata extracted from the datasheets.

Constraints:

PK: form_id.

class Config

Bases: object

The configuration for the schema.

Not a strict schema at this stage since it’s the “raw” extracted data.

We do enforce the primary key since it’s created by the extraction process.

multiindex_strict = False
name = 'FormExtracted'
strict = False
city: Series[str] = 'city'

The city of observations. Nullable. Unenforced constants.City.

date: Series[str] = 'date'

The date of observations. Nullable.

form_id: Index[str] = 'form_id'

The form ID.

form_type: Series[str] = 'form_type'

The form type. Nullable. Unenforced constants.FormType.

form_version: Series[str] = 'form_version'

The form version. Nullable.

notes: Series[str] = 'notes'

Investigator notes. Nullable.

past_24hr_rainfall: Series[float] = 'past_24hr_rainfall'

The past 24-hour rainfall. Nullable.

tide_height: Series[float] = 'tide_height'

The tide height at the time of observations. Nullable.

tide_time: Series[str] = 'tide_time'

The tide time at the time of observations. Nullable.

weather: Series[str] = 'weather'

The weather at the time of observations. Nullable. Unenforced constants.Weather.

class stormwater_monitoring_datasheet_extraction.lib.schema.schema.FormInvestigatorCleaned(*args, **kwargs)

Bases: FormInvestigatorVerified

Investigators on each form cleaned.

Constraints:

PK: form_id, investigator. FK: form_id: Form.form_id (unenforced).

class Config

Bases: Config

name = 'FormInvestigatorCleaned'
class stormwater_monitoring_datasheet_extraction.lib.schema.schema.FormInvestigatorExtracted(*args, **kwargs)

Bases: DataFrameModel

Investigators on each form extracted from the datasheets.

Constraints:

PK: form_id, investigator (unenforced). FK: form_id: Form.form_id (unenforced).

class Config

Bases: object

The configuration for the schema.

Not a strict schema at this stage since it’s the “raw” extracted data.

multiindex_strict = False
name = 'FormInvestigatorExtracted'
strict = False
end_time: Series[str] = 'end_time'

The end time of the investigation. Nullable.

form_id: Index[str] = 'form_id'

The form ID.

investigator: Index[str] = 'investigator'

The investigator, part of the primary key, but nullable at this stage.

start_time: Series[str] = 'start_time'

The start time of the investigation. Nullable.

class stormwater_monitoring_datasheet_extraction.lib.schema.schema.FormInvestigatorPrecleaned(*args, **kwargs)

Bases: FormInvestigatorExtracted

Investigators on each form precleaned.

Constraints:

PK: form_id, investigator (unenforced). FK: form_id: Form.form_id (unenforced).

class Config

Bases: object

The configuration for the schema.

Adds missing columns, drops extra columns.

add_missing_columns = True
multiindex_strict = 'filter'
name = 'FormInvestigatorPrecleaned'
strict = 'filter'
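
The combined effect of strict = 'filter' and add_missing_columns = True can be pictured with a plain-Python sketch. This is not how Pandera implements it; the helper name preclean_row and the sample row are hypothetical, and only the column names start_time and end_time come from the schema above.

```python
from typing import Any, Dict, List, Optional

# Columns declared by the schema (taken from FormInvestigatorExtracted).
SCHEMA_COLUMNS: List[str] = ["start_time", "end_time"]


def preclean_row(row: Dict[str, Any], schema_columns: List[str]) -> Dict[str, Optional[Any]]:
    """Keep only schema columns; fill columns missing from the row with None.

    Hypothetical helper that mirrors the intent of strict='filter'
    (drop extras) and add_missing_columns=True (add absent columns).
    """
    return {column: row.get(column) for column in schema_columns}


# A raw row with one extra column and one missing column.
raw_row = {"start_time": "08:15", "scribble": "illegible"}
precleaned = preclean_row(raw_row, SCHEMA_COLUMNS)
```

After this step every row has exactly the schema's columns, which is what lets the later verified schemas switch to strict = True.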
class stormwater_monitoring_datasheet_extraction.lib.schema.schema.FormInvestigatorVerified(*args, **kwargs)

Bases: FormInvestigatorPrecleaned

Investigators on each form verified by user.

Constraints:

PK: form_id, investigator. FK: form_id: Form.form_id (unenforced).

class Config

Bases: object

The configuration for the schema.

Strict schema, enforces the primary key.

add_missing_columns = False
multiindex_strict = True
multiindex_unique = ['form_id', 'investigator']
name = 'FormInvestigatorVerified'
strict = True
end_time: Series[str] = 'end_time'

The end time of the investigation. Must be “HH:MM”. start_time must be before end_time.

classmethod end_time_is_valid_time(end_time)

Every end_time parses with the given format.

Return type:

Series[bool]
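
A minimal sketch of the kind of parse check described here, written against plain strings rather than a pandas Series. The format string "%H:%M" is an assumption based on the “HH:MM” requirement, and the helper name is hypothetical; the package's own check may differ.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M"  # assumed strptime equivalent of "HH:MM"


def parses_as_time(value, fmt: str = TIME_FORMAT) -> bool:
    """Return True if value parses with fmt, False otherwise."""
    try:
        datetime.strptime(value, fmt)
    except (TypeError, ValueError):
        # TypeError covers non-string input such as None.
        return False
    return True


end_times = ["09:30", "17:05", "25:00", "9.30", None]
results = [parses_as_time(value) for value in end_times]
```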

investigator: Index[str] = 'investigator'

The investigator.

start_time: Series[str] = 'start_time'

The start time of the investigation. Must be “HH:MM”. start_time must be before end_time.

classmethod start_time_before_end_time(df)

Every start_time is before end_time.

Return type:

Series[bool]
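
The row-wise rule can be sketched as follows, assuming both values already pass the “HH:MM” check and that "before" means strictly earlier. The helper name and sample rows are hypothetical.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M"  # assumed strptime equivalent of "HH:MM"


def start_before_end(start_time: str, end_time: str) -> bool:
    """Return True if start_time is strictly earlier than end_time."""
    start = datetime.strptime(start_time, TIME_FORMAT)
    end = datetime.strptime(end_time, TIME_FORMAT)
    return start < end


rows = [("08:15", "09:00"), ("10:00", "10:00"), ("14:30", "13:45")]
flags = [start_before_end(start, end) for start, end in rows]
```

Parsing before comparing avoids relying on string ordering, which would misorder values such as "9:00" and "10:00".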

classmethod start_time_is_valid_time(start_time)

Every start_time parses with the given format.

Return type:

Series[bool]

class stormwater_monitoring_datasheet_extraction.lib.schema.schema.FormPrecleaned(*args, **kwargs)

Bases: FormExtracted

Form metadata precleaned.

Constraints:

PK: form_id.

class Config

Bases: object

The configuration for the schema.

Adds missing columns, drops extra columns, enforces primary key.

add_missing_columns = True
multiindex_strict = 'filter'
multiindex_unique = ['form_id']
name = 'FormPrecleaned'
strict = 'filter'
class stormwater_monitoring_datasheet_extraction.lib.schema.schema.FormVerified(*args, **kwargs)

Bases: FormPrecleaned

Form metadata verified by the user.

Constraints:

PK: form_id.

class Config

Bases: object

The configuration for the schema.

Strict schema, enforces primary key.

add_missing_columns = False
multiindex_strict = True
multiindex_unique = ['form_id']
name = 'FormVerified'
strict = True
city: Series[Annotated[CategoricalDtype]] = 'city'

The city of observations.

date: Series[str] = 'date'

The date of observations. Must be “YYYY-MM-DD”, on or before today. date and tide_time must be on or before now.

classmethod date_le_today(date)

Every date is on or before today.

Return type:

Series[bool]
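
As an illustration of this rule on plain strings, the sketch below parses a “YYYY-MM-DD” value and compares it to a reference date. The helper name and the injectable today parameter are assumptions made so the example is deterministic; they are not part of the package.

```python
from datetime import date, datetime
from typing import Optional

DATE_FORMAT = "%Y-%m-%d"  # assumed strptime equivalent of "YYYY-MM-DD"


def is_on_or_before_today(value: str, today: Optional[date] = None) -> bool:
    """Return True if value is a date on or before today."""
    reference = today if today is not None else date.today()
    return datetime.strptime(value, DATE_FORMAT).date() <= reference


fixed_today = date(2024, 5, 1)  # fixed reference date for a repeatable example
dates = ["2023-12-31", "2024-05-01", "2024-05-02"]
flags = [is_on_or_before_today(value, fixed_today) for value in dates]
```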

form_type: Series[Annotated[CategoricalDtype]] = 'form_type'

The form type.

form_version: Series[str] = 'form_version'

The form version.

classmethod is_valid_date(date)

Every date parses with the given format.

Return type:

Series[bool]

classmethod is_valid_time(tide_time)

Every tide_time parses with the given format.

Return type:

Series[bool]

notes: Series[str] = 'notes'

Investigator notes.

past_24hr_rainfall: Series[float] = 'past_24hr_rainfall'

The past 24-hour rainfall. Nullable.

classmethod tide_datetime_le_now(df)

Every combined date and tide_time is on or before now.

Return type:

Series[bool]
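
A sketch of the combined check, assuming date is “YYYY-MM-DD”, tide_time is “HH:MM”, and the comparison is inclusive as the method name suggests. The helper name and the injectable now parameter are hypothetical.

```python
from datetime import datetime
from typing import Optional

DATETIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed combined format


def tide_datetime_on_or_before(date_value: str, tide_time: str,
                               now: Optional[datetime] = None) -> bool:
    """Return True if the combined date and tide_time is on or before now."""
    reference = now if now is not None else datetime.now()
    observed = datetime.strptime(f"{date_value} {tide_time}", DATETIME_FORMAT)
    return observed <= reference


fixed_now = datetime(2024, 5, 1, 12, 0)  # fixed reference for a repeatable example
rows = [("2024-05-01", "11:59"), ("2024-05-01", "12:00"), ("2024-05-01", "12:01")]
flags = [tide_datetime_on_or_before(d, t, fixed_now) for d, t in rows]
```

Checking the combined value matters because a row dated today passes the date check alone even when its tide_time is still in the future.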

tide_height: Series[float] = 'tide_height'

The tide height at the time of observations.

tide_time: Series[str] = 'tide_time'

The tide time at the time of observations. Must be “HH:MM”. date and tide_time must be on or before now.

weather: Series[Annotated[CategoricalDtype]] = 'weather'

The weather at the time of observations. Nullable. constants.Weather.

class stormwater_monitoring_datasheet_extraction.lib.schema.schema.QualitativeObservationsCleaned(*args, **kwargs)

Bases: QualitativeObservationsVerified

Qualitative site observations cleaned.

Only wet outfalls, but not necessarily all visits.

Constraints:

PK: form_id, site_id, observation_type. FK: form_id, site_id: QuantitativeObservations(form_id, site_id) (unenforced).

class Config

Bases: Config

name = 'QualitativeObservationsCleaned'
class stormwater_monitoring_datasheet_extraction.lib.schema.schema.QualitativeObservationsExtracted(*args, **kwargs)

Bases: DataFrameModel

Qualitative site observations extracted from the datasheets.

Only wet outfalls, but not necessarily all visits.

Constraints:

PK: form_id, site_id, observation_type (unenforced). FK: form_id, site_id: QuantitativeObservations(form_id, site_id) (unenforced).

class Config

Bases: object

The configuration for the schema.

Not a strict schema at this stage since it’s the “raw” extracted data.

multiindex_strict = False
name = 'QualitativeObservationsExtracted'
strict = False
description: Series[str] = 'description'

The description of the observation. Nullable.

form_id: Index[str] = 'form_id'

The form ID.

observation_type: Index[str] = 'observation_type'

The observation type. Nullable. Unenforced constants.QualitativeSiteObservationTypes.

rank: Series[int] = 'rank'

The rank of the observation. Nullable. Unenforced constants.Rank.

site_id: Index[str] = 'site_id'

The site ID, part of the primary key, but nullable at this stage.

class stormwater_monitoring_datasheet_extraction.lib.schema.schema.QualitativeObservationsPrecleaned(*args, **kwargs)

Bases: QualitativeObservationsExtracted

Qualitative site observations precleaned.

Only wet outfalls, but not necessarily all visits.

Constraints:

PK: form_id, site_id, observation_type (unenforced). FK: form_id, site_id: QuantitativeObservations(form_id, site_id) (unenforced).

class Config

Bases: object

The configuration for the schema.

Adds missing columns, drops extra columns.

add_missing_columns = True
multiindex_strict = 'filter'
name = 'QualitativeObservationsPrecleaned'
strict = 'filter'
class stormwater_monitoring_datasheet_extraction.lib.schema.schema.QualitativeObservationsVerified(*args, **kwargs)

Bases: QualitativeObservationsPrecleaned

Qualitative site observations verified by user.

Only wet outfalls, but not necessarily all visits.

Constraints:

PK: form_id, site_id, observation_type. FK: form_id, site_id: QuantitativeObservations(form_id, site_id) (unenforced).

class Config

Bases: object

The configuration for the schema.

Enforces the primary key.

add_missing_columns = False
multiindex_strict = True
multiindex_unique = ['form_id', 'site_id', 'observation_type']
name = 'QualitativeObservationsVerified'
strict = True
description: Series[str] = 'description'

The description of the observation.

observation_type: Index[Annotated[CategoricalDtype]] = 'observation_type'

The observation type.

rank: Series[Annotated[CategoricalDtype]] = 'rank'

The rank of the observation.

site_id: Index[str] = 'site_id'

The site ID.
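
The composite primary key listed in multiindex_unique amounts to requiring that no (form_id, site_id, observation_type) tuple appears twice. A plain-Python sketch of finding violations, with a hypothetical helper name and made-up sample keys:

```python
from collections import Counter
from typing import List, Tuple

Key = Tuple[str, str, str]  # (form_id, site_id, observation_type)


def duplicate_keys(keys: List[Key]) -> List[Key]:
    """Return the sorted list of key tuples that occur more than once."""
    counts = Counter(keys)
    return sorted(key for key, count in counts.items() if count > 1)


keys = [
    ("F1", "S1", "color"),
    ("F1", "S1", "odor"),
    ("F1", "S2", "color"),
    ("F1", "S1", "color"),  # repeats the first key
]
duplicates = duplicate_keys(keys)
```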

class stormwater_monitoring_datasheet_extraction.lib.schema.schema.QuantitativeObservationsCleaned(*args, **kwargs)

Bases: QuantitativeObservationsVerified

Quantitative observations cleaned.

All site visits, excluding dry outfalls.

Constraints:

PK: form_id, site_id. FK: form_id, site_id: SiteVisit(form_id, site_id) (unenforced). Unique: form_id, bottle_no.

class Config

Bases: Config

name = 'QuantitativeObservationsCleaned'
class stormwater_monitoring_datasheet_extraction.lib.schema.schema.QuantitativeObservationsExtracted(*args, **kwargs)

Bases: DataFrameModel

Quantitative observations extracted.

All site visits, excluding dry outfalls.

Constraints:

PK: form_id, site_id (unenforced). FK: form_id, site_id: SiteVisit(form_id, site_id) (unenforced). Unique: form_id, bottle_no (unenforced).

class Config

Bases: object

The configuration for the schema.

Not a strict schema at this stage since it’s the “raw” extracted data.

multiindex_strict = False
name = 'QuantitativeObservationsExtracted'
strict = False
DO_mg_per_l: Series[float] = 'DO_mg_per_l'

The dissolved oxygen.

SPS_micro_S_per_cm: Series[float] = 'SPS_micro_S_per_cm'

The specific conductance.

air_temp: Series[float] = 'air_temp'

The air temperature.

bottle_no: Series[str] = 'bacteria_bottle_no'

The bottle number.

flow: Series[str] = 'flow'

The flow. Unenforced constants.Flow.

flow_compared_to_expected: Series[str] = 'flow_compared_to_expected'

The flow compared to expected. Unenforced constants.FlowComparedToExpected.

form_id: Index[str] = 'form_id'

The form ID.

pH: Series[float] = 'pH'

The pH. Nullable.

salinity_ppt: Series[float] = 'salinity_ppt'

The salinity. Nullable.

site_id: Index[str] = 'site_id'

The site ID, part of the primary key, but nullable at this stage.

water_temp: Series[float] = 'water_temp'

The water temperature.

class stormwater_monitoring_datasheet_extraction.lib.schema.schema.QuantitativeObservationsPrecleaned(*args, **kwargs)

Bases: QuantitativeObservationsExtracted

Quantitative observations precleaned.

All site visits, excluding dry outfalls.

Constraints:

PK: form_id, site_id (unenforced). FK: form_id, site_id: SiteVisit(form_id, site_id) (unenforced). Unique: form_id, bottle_no (unenforced).

class Config

Bases: object

The configuration for the schema.

Adds missing columns, drops extra columns.

add_missing_columns = True
multiindex_strict = 'filter'
name = 'QuantitativeObservationsPrecleaned'
strict = 'filter'
class stormwater_monitoring_datasheet_extraction.lib.schema.schema.QuantitativeObservationsVerified(*args, **kwargs)

Bases: QuantitativeObservationsPrecleaned

Quantitative observations verified by user.

All site visits, excluding dry outfalls.

Constraints:

PK: form_id, site_id. FK: form_id, site_id: SiteVisit(form_id, site_id) (unenforced). Unique: form_id, bottle_no.

class Config

Bases: object

The configuration for the schema.

Strict schema, enforces primary key.

add_missing_columns = False
multiindex_strict = True
multiindex_unique = ['form_id', 'site_id']
name = 'QuantitativeObservationsVerified'
strict = True
DO_mg_per_l: Series[float] = 'DO_mg_per_l'

The dissolved oxygen.

SPS_micro_S_per_cm: Series[float] = 'SPS_micro_S_per_cm'

The specific conductance.

air_temp: Series[float] = 'air_temp'

The air temperature.

bottle_no: Series[str] = 'bacteria_bottle_no'

The bottle number. Must be unique within each form_id.

classmethod bottle_no_unique_by_form_id(df)

Every bottle_no is unique within each form_id.

Return type:

Series[bool]
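
The uniqueness rule can be sketched on plain (form_id, bottle_no) pairs. The helper below flags each row, loosely mirroring the per-row boolean result; its name and the sample data are hypothetical, and null bottle numbers are not handled here.

```python
from collections import Counter
from typing import List, Tuple


def bottle_no_unique_flags(rows: List[Tuple[str, str]]) -> List[bool]:
    """Flag each (form_id, bottle_no) row as True if the pair occurs once."""
    counts = Counter(rows)
    return [counts[row] == 1 for row in rows]


rows = [("F1", "B01"), ("F1", "B02"), ("F2", "B01"), ("F2", "B03"), ("F2", "B03")]
flags = bottle_no_unique_flags(rows)
```

Note that "B01" appears under both F1 and F2 without failing, since the constraint is scoped to each form_id.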

flow: Series[Annotated[CategoricalDtype]] = 'flow'

The flow.

flow_compared_to_expected: Series[Annotated[CategoricalDtype]] = 'flow_compared_to_expected'

The flow compared to expected.

pH: Series[float] = 'pH'

The pH.

salinity_ppt: Series[float] = 'salinity_ppt'

The salinity.

site_id: Index[str] = 'site_id'

The site ID.

water_temp: Series[float] = 'water_temp'

The water temperature.

class stormwater_monitoring_datasheet_extraction.lib.schema.schema.Site(*args, **kwargs)

Bases: DataFrameModel

Site metadata.

All sites.

Constraints:

PK: site_id. FK: creek_site_id: Creek(site_id) (unenforced). DEFERRABLE INITIALLY DEFERRED

class Config

Bases: object

The configuration for the schema.

Strict schema enforcement.

multiindex_strict = True
multiindex_unique = ['site_id']
name = 'Site'
strict = True
classmethod check_creek_site_id_valid(df)

Check that creek_site_id is valid.

Return type:

Series[bool]

creek_site_id: Series[str] = 'creek_site_id'

The site's own site_id if the site is a creek, else null.
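
One plausible reading of this rule is that creek_site_id is either null or equal to the row's own site_id. The sketch below encodes only that reading; the actual check_creek_site_id_valid may also consult outfall_type or the Creek table, so treat the helper and sample rows as hypothetical.

```python
from typing import List, Optional, Tuple


def creek_site_id_is_valid(site_id: str, creek_site_id: Optional[str]) -> bool:
    """Return True if creek_site_id is null or equals the row's site_id."""
    return creek_site_id is None or creek_site_id == site_id


rows: List[Tuple[str, Optional[str]]] = [
    ("S1", "S1"),   # a creek: references itself
    ("S2", None),   # not a creek
    ("S3", "S1"),   # points at a different site: invalid under this reading
]
flags = [creek_site_id_is_valid(site_id, creek_id) for site_id, creek_id in rows]
```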

outfall_type: Series[Annotated[CategoricalDtype]] = 'outfall_type'

The outfall type. constants.OutfallType.

site_id: Index[str] = 'site_id'

The site ID.

class stormwater_monitoring_datasheet_extraction.lib.schema.schema.SiteVisitCleaned(*args, **kwargs)

Bases: SiteVisitVerified

Site visit cleaned.

All site visits, including dry outfalls with no observations.

Constraints:

PK: form_id, site_id. FK: form_id: Form.form_id (unenforced). FK: site_id: Site.site_id (unenforced).

class Config

Bases: Config

name = 'SiteVisitCleaned'
class stormwater_monitoring_datasheet_extraction.lib.schema.schema.SiteVisitExtracted(*args, **kwargs)

Bases: DataFrameModel

Site visit extracted.

All site visits, including dry outfalls with no observations.

Constraints:

PK: form_id, site_id (unenforced). FK: form_id: Form.form_id (unenforced). FK: site_id: Site.site_id (unenforced).

class Config

Bases: object

The configuration for the schema.

Not a strict schema at this stage since it’s the “raw” extracted data.

multiindex_strict = False
name = 'SiteVisitExtracted'
strict = False
arrival_time: Series[str] = 'arrival_time'

The arrival time of the investigation. Nullable.

form_id: Index[str] = 'form_id'

The form ID.

site_id: Index[str] = 'site_id'

The site ID, part of the primary key, but nullable at this stage.

class stormwater_monitoring_datasheet_extraction.lib.schema.schema.SiteVisitPrecleaned(*args, **kwargs)

Bases: SiteVisitExtracted

Site visit precleaned.

All site visits, including dry outfalls with no observations.

Constraints:

PK: form_id, site_id (unenforced). FK: form_id: Form.form_id (unenforced). FK: site_id: Site.site_id (unenforced).

class Config

Bases: object

The configuration for the schema.

Adds missing columns, drops extra columns.

add_missing_columns = True
multiindex_strict = 'filter'
name = 'SiteVisitPrecleaned'
strict = 'filter'
class stormwater_monitoring_datasheet_extraction.lib.schema.schema.SiteVisitVerified(*args, **kwargs)

Bases: SiteVisitPrecleaned

Site visit verified by user.

All site visits, including dry outfalls with no observations.

Constraints:

PK: form_id, site_id. FK: form_id: Form.form_id (unenforced). FK: site_id: Site.site_id (unenforced).

class Config

Bases: object

The configuration for the schema.

Strict schema, enforces primary key.

add_missing_columns = False
multiindex_strict = True
multiindex_unique = ['form_id', 'site_id']
name = 'SiteVisitVerified'
strict = True
arrival_time: Series[str] = 'arrival_time'

The arrival time of the investigation. Must be “HH:MM”.

classmethod arrival_time_is_valid_time(arrival_time)

Every arrival_time parses with the given format.

Return type:

Series[bool]

site_id: Index[str] = 'site_id'

The site ID.
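
Because the foreign keys above are marked unenforced, a caller who wants referential integrity has to check it separately. A minimal sketch of such a check, with a hypothetical helper name and made-up IDs:

```python
from typing import Iterable, List


def orphaned_keys(child_keys: Iterable[str], parent_keys: Iterable[str]) -> List[str]:
    """Return child keys, in order, that have no match among parent keys."""
    parents = set(parent_keys)
    return [key for key in child_keys if key not in parents]


form_ids = ["F1", "F2"]                  # Form.form_id values
site_visit_form_ids = ["F1", "F2", "F3"]  # SiteVisit.form_id values
orphans = orphaned_keys(site_visit_form_ids, form_ids)
```

The same helper applies to the site_id reference by passing Site.site_id values as the parent keys.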

Module contents

Pandera schemas and validations for the stormwater monitoring datasheet extraction.