stormwater_monitoring_datasheet_extraction.lib.schema package¶
Subpackages¶
- stormwater_monitoring_datasheet_extraction.lib.schema.checks package
Submodules¶
stormwater_monitoring_datasheet_extraction.lib.schema.schema module¶
Pandera schemas for ETL steps.
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.Creek(*args, **kwargs)¶
Bases: DataFrameModel
Creek metadata.
Creek type.
- Constraints:
PK: site_id. FK: site_id: Site.creek_site_id (unenforced). DEFERRABLE INITIALLY DEFERRED
- class Config¶
Bases: object
The configuration for the schema.
Strict schema enforcement.
- multiindex_strict = True¶
- multiindex_unique = ['site_id']¶
- name = 'Creek'¶
- strict = True¶
-
creek_type:
Series[Annotated[CategoricalDtype]] = 'creek_type'¶ The creek type. constants.CreekType.
-
site_id:
Index[str] = 'site_id'¶ The site ID.
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.FormCleaned(*args, **kwargs)¶
Bases: FormVerified
Form metadata cleaned.
- Constraints:
PK: form_id.
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.FormExtracted(*args, **kwargs)¶
Bases: DataFrameModel
Form metadata extracted from the datasheets.
- Constraints:
PK: form_id.
- class Config¶
Bases: object
The configuration for the schema.
Not a strict schema at this stage since it’s the “raw” extracted data.
We do enforce the primary key since it’s created by the extraction process.
- multiindex_strict = False¶
- name = 'FormExtracted'¶
- strict = False¶
-
city:
Series[str] = 'city'¶ The city of observations. Nullable. Unenforced constants.City.
-
date:
Series[str] = 'date'¶ The date of observations. Nullable.
-
form_id:
Index[str] = 'form_id'¶ The form ID.
-
form_type:
Series[str] = 'form_type'¶ The form type. Nullable. Unenforced constants.FormType.
-
form_version:
Series[str] = 'form_version'¶ The form version. Nullable.
-
notes:
Series[str] = 'notes'¶ Investigator notes. Nullable.
-
past_24hr_rainfall:
Series[float] = 'past_24hr_rainfall'¶ The past 24-hour rainfall. Nullable.
-
tide_height:
Series[float] = 'tide_height'¶ The tide height at the time of observations. Nullable.
-
tide_time:
Series[str] = 'tide_time'¶ The tide time at the time of observations. Nullable.
-
weather:
Series[str] = 'weather'¶ The weather at the time of observations. Nullable. Unenforced constants.Weather.
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.FormInvestigatorCleaned(*args, **kwargs)¶
Bases: FormInvestigatorVerified
Investigators on each form cleaned.
- Constraints:
PK: form_id, investigator. FK: form_id: Form.form_id (unenforced).
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.FormInvestigatorExtracted(*args, **kwargs)¶
Bases: DataFrameModel
Investigators on each form extracted from the datasheets.
- Constraints:
PK: form_id, investigator (unenforced). FK: form_id: Form.form_id (unenforced).
- class Config¶
Bases: object
The configuration for the schema.
Not a strict schema at this stage since it’s the “raw” extracted data.
- multiindex_strict = False¶
- name = 'FormInvestigatorExtracted'¶
- strict = False¶
-
end_time:
Series[str] = 'end_time'¶ The end time of the investigation. Nullable.
-
form_id:
Index[str] = 'form_id'¶ The form ID.
-
investigator:
Index[str] = 'investigator'¶ The investigator, part of the primary key, but nullable at this stage.
-
start_time:
Series[str] = 'start_time'¶ The start time of the investigation. Nullable.
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.FormInvestigatorPrecleaned(*args, **kwargs)¶
Bases: FormInvestigatorExtracted
Schema for the investigators precleaned.
- Constraints:
PK: form_id, investigator (unenforced). FK: form_id: Form.form_id (unenforced).
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.FormInvestigatorVerified(*args, **kwargs)¶
Bases: FormInvestigatorPrecleaned
Investigators on each form verified by user.
- Constraints:
PK: form_id, investigator. FK: form_id: Form.form_id (unenforced).
- class Config¶
Bases: object
The configuration for the schema.
Strict schema, enforces the primary key.
- add_missing_columns = False¶
- multiindex_strict = True¶
- multiindex_unique = ['form_id', 'investigator']¶
- name = 'FormInvestigatorVerified'¶
- strict = True¶
-
end_time:
Series[str] = 'end_time'¶ The end time of the investigation. Must be “HH:MM”. start_time must be before end_time.
- classmethod end_time_is_valid_time(end_time)¶
Every end_time parses with the given format.
- Return type:
Series[bool]
-
investigator:
Index[str] = 'investigator'¶ The investigator.
-
start_time:
Series[str] = 'start_time'¶ The start time of the investigation. Must be “HH:MM”. start_time must be before end_time.
- classmethod start_time_before_end_time(df)¶
Every start_time is before end_time.
- Return type:
Series[bool]
- classmethod start_time_is_valid_time(start_time)¶
Every start_time parses with the given format.
- Return type:
Series[bool]
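The time rules above can be illustrated with a plain-Python sketch. It is not the package's implementation: it assumes that “HH:MM” corresponds to the `strptime` format `%H:%M`, and the helper names `parses_as_time` and `start_is_before_end` are hypothetical.

```python
from datetime import datetime

# Assumption: "HH:MM" in the field descriptions maps to this strptime format.
TIME_FORMAT = "%H:%M"


def parses_as_time(value, fmt=TIME_FORMAT):
    """Return True when value is a string that parses with fmt."""
    try:
        datetime.strptime(value, fmt)
    except (TypeError, ValueError):
        # TypeError covers None and other non-strings, ValueError a bad format.
        return False
    return True


def start_is_before_end(start, end, fmt=TIME_FORMAT):
    """Return True when both times parse and start is strictly before end."""
    if not (parses_as_time(start, fmt) and parses_as_time(end, fmt)):
        return False
    return datetime.strptime(start, fmt) < datetime.strptime(end, fmt)
```

In the schema itself these rules are expressed as checks that return one boolean per row (Series[bool]); the sketch evaluates a single row at a time to keep the logic visible.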
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.FormPrecleaned(*args, **kwargs)¶
Bases: FormExtracted
Form metadata precleaned.
- Constraints:
PK: form_id.
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.FormVerified(*args, **kwargs)¶
Bases: FormPrecleaned
Form metadata verified by the user.
- Constraints:
PK: form_id.
- class Config¶
Bases: object
The configuration for the schema.
Strict schema, enforces primary key.
- add_missing_columns = False¶
- multiindex_strict = True¶
- multiindex_unique = ['form_id']¶
- name = 'FormVerified'¶
- strict = True¶
-
city:
Series[Annotated[CategoricalDtype]] = 'city'¶ The city of observations.
-
date:
Series[str] = 'date'¶ The date of observations. Must be “YYYY-MM-DD”, on or before today. date and tide_time must be on or before now.
- classmethod date_le_today(date)¶
Every date is on or before today.
- Return type:
Series[bool]
-
form_type:
Series[Annotated[CategoricalDtype]] = 'form_type'¶ The form type.
-
form_version:
Series[str] = 'form_version'¶ The form version.
- classmethod is_valid_date(date)¶
Every date parses with the given format.
- Return type:
Series[bool]
- classmethod is_valid_time(tide_time)¶
Every value parses with the given format.
- Return type:
Series[bool]
-
notes:
Series[str] = 'notes'¶ Investigator notes.
-
past_24hr_rainfall:
Series[float] = 'past_24hr_rainfall'¶ The past 24-hour rainfall. Nullable.
- classmethod tide_datetime_le_now(df)¶
Every combined date and tide_time is on or before now.
- Return type:
Series[bool]
-
tide_height:
Series[float] = 'tide_height'¶ The tide height at the time of observations.
-
tide_time:
Series[str] = 'tide_time'¶ The tide time at the time of observations. Must be “HH:MM”. date and tide_time must be on or before now.
-
weather:
Series[Annotated[CategoricalDtype]] = 'weather'¶ The weather at the time of observations. Nullable. Unenforced constants.Weather.
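The date rules for verified forms can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the package's code: the formats `%Y-%m-%d` and `%H:%M` are assumed equivalents of “YYYY-MM-DD” and “HH:MM”, the comparison is taken as "on or before" following the `_le_` in the check names, and the helper names and the explicit `today`/`now` parameters are hypothetical (they are passed in so the result is reproducible).

```python
from datetime import date, datetime

# Assumptions: strptime equivalents of the documented formats.
DATE_FORMAT = "%Y-%m-%d"
TIME_FORMAT = "%H:%M"


def date_on_or_before(value, today):
    """Return True when value parses as a date that is not after today."""
    try:
        parsed = datetime.strptime(value, DATE_FORMAT).date()
    except (TypeError, ValueError):
        return False
    return parsed <= today


def tide_datetime_on_or_before(date_value, time_value, now):
    """Return True when date and tide time combine to a moment not after now."""
    try:
        combined = datetime.strptime(
            f"{date_value} {time_value}", f"{DATE_FORMAT} {TIME_FORMAT}"
        )
    except ValueError:
        # Either part failed to parse (including None, which formats as "None").
        return False
    return combined <= now
```

Combining the two columns before comparing matters for forms dated today: the date alone passes, but a tide time later than the current time should still fail.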
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.QualitativeObservationsCleaned(*args, **kwargs)¶
Bases: QualitativeObservationsVerified
Qualitative site observations cleaned.
Only wet outfalls, but not necessarily all visits.
- Constraints:
PK: form_id, site_id, observation_type. FK: form_id, site_id: QuantitativeObservations(form_id, site_id) (unenforced).
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.QualitativeObservationsExtracted(*args, **kwargs)¶
Bases: DataFrameModel
Qualitative site observations extracted from the datasheets.
Only wet outfalls, but not necessarily all visits.
- Constraints:
PK: form_id, site_id, observation_type (unenforced). FK: form_id, site_id: QuantitativeObservations(form_id, site_id) (unenforced).
- class Config¶
Bases: object
The configuration for the schema.
Not a strict schema at this stage since it’s the “raw” extracted data.
- multiindex_strict = False¶
- name = 'QualitativeObservationsExtracted'¶
- strict = False¶
-
description:
Series[str] = 'description'¶ The description of the observation. Nullable.
-
form_id:
Index[str] = 'form_id'¶ The form ID.
-
observation_type:
Index[str] = 'observation_type'¶ The observation type. Nullable. Unenforced constants.QualitativeSiteObservationTypes.
-
rank:
Series[int] = 'rank'¶ The rank of the observation. Nullable. Unenforced constants.Rank.
-
site_id:
Index[str] = 'site_id'¶ The site ID, part of the primary key, but nullable at this stage.
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.QualitativeObservationsPrecleaned(*args, **kwargs)¶
Bases: QualitativeObservationsExtracted
Qualitative site observations precleaned.
Only wet outfalls, but not necessarily all visits.
- Constraints:
PK: form_id, site_id, observation_type (unenforced). FK: form_id, site_id: QuantitativeObservations(form_id, site_id) (unenforced).
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.QualitativeObservationsVerified(*args, **kwargs)¶
Bases: QualitativeObservationsPrecleaned
Qualitative site observations verified by user.
Only wet outfalls, but not necessarily all visits.
- Constraints:
PK: form_id, site_id, observation_type. FK: form_id, site_id: QuantitativeObservations(form_id, site_id) (unenforced).
- class Config¶
Bases: object
The configuration for the schema.
Enforces the primary key.
- add_missing_columns = False¶
- multiindex_strict = True¶
- multiindex_unique = ['form_id', 'site_id', 'observation_type']¶
- name = 'QualitativeObservationsVerified'¶
- strict = True¶
-
description:
Series[str] = 'description'¶ The description of the observation.
-
observation_type:
Index[Annotated[CategoricalDtype]] = 'observation_type'¶ The observation type.
-
rank:
Series[Annotated[CategoricalDtype]] = 'rank'¶ The rank of the observation.
-
site_id:
Index[str] = 'site_id'¶ The site ID.
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.QuantitativeObservationsCleaned(*args, **kwargs)¶
Bases: QuantitativeObservationsVerified
Quantitative observations cleaned.
All site visits except those to dry outfalls.
- Constraints:
PK: form_id, site_id. FK: form_id, site_id: SiteVisit(form_id, site_id) (unenforced). Unique: form_id, bottle_no.
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.QuantitativeObservationsExtracted(*args, **kwargs)¶
Bases: DataFrameModel
Quantitative observations extracted.
All site visits except those to dry outfalls.
- Constraints:
PK: form_id, site_id (unenforced). FK: form_id, site_id: SiteVisit(form_id, site_id) (unenforced). Unique: form_id, bottle_no (unenforced).
- class Config¶
Bases: object
The configuration for the schema.
Not a strict schema at this stage since it’s the “raw” extracted data.
- multiindex_strict = False¶
- name = 'QuantitativeObservationsExtracted'¶
- strict = False¶
-
DO_mg_per_l:
Series[float] = 'DO_mg_per_l'¶ The dissolved oxygen.
-
SPS_micro_S_per_cm:
Series[float] = 'SPS_micro_S_per_cm'¶ The specific conductance.
-
air_temp:
Series[float] = 'air_temp'¶ The air temperature.
-
bottle_no:
Series[str] = 'bacteria_bottle_no'¶ The bottle number.
-
flow:
Series[str] = 'flow'¶ The flow. Unenforced constants.Flow.
-
flow_compared_to_expected:
Series[str] = 'flow_compared_to_expected'¶ The flow compared to expected. Unenforced constants.FlowComparedToExpected.
-
form_id:
Index[str] = 'form_id'¶ The form ID.
-
pH:
Series[float] = 'pH'¶ The pH. Nullable.
-
salinity_ppt:
Series[float] = 'salinity_ppt'¶ The salinity. Nullable.
-
site_id:
Index[str] = 'site_id'¶ The site ID, part of the primary key, but nullable at this stage.
-
water_temp:
Series[float] = 'water_temp'¶ The water temperature.
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.QuantitativeObservationsPrecleaned(*args, **kwargs)¶
Bases: QuantitativeObservationsExtracted
Quantitative observations precleaned.
All site visits except those to dry outfalls.
- Constraints:
PK: form_id, site_id (unenforced). FK: form_id, site_id: SiteVisit(form_id, site_id) (unenforced). Unique: form_id, bottle_no (unenforced).
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.QuantitativeObservationsVerified(*args, **kwargs)¶
Bases: QuantitativeObservationsPrecleaned
Quantitative observations verified by user.
All site visits except those to dry outfalls.
- Constraints:
PK: form_id, site_id. FK: form_id, site_id: SiteVisit(form_id, site_id) (unenforced). Unique: form_id, bottle_no.
- class Config¶
Bases: object
The configuration for the schema.
Strict schema, enforces primary key.
- add_missing_columns = False¶
- multiindex_strict = True¶
- multiindex_unique = ['form_id', 'site_id']¶
- name = 'QuantitativeObservationsVerified'¶
- strict = True¶
-
DO_mg_per_l:
Series[float] = 'DO_mg_per_l'¶ The dissolved oxygen.
-
SPS_micro_S_per_cm:
Series[float] = 'SPS_micro_S_per_cm'¶ The specific conductance.
-
air_temp:
Series[float] = 'air_temp'¶ The air temperature.
-
bottle_no:
Series[str] = 'bacteria_bottle_no'¶ The bottle number. Must be unique within each form_id.
- classmethod bottle_no_unique_by_form_id(df)¶
Every bottle_no is unique within each form_id.
- Return type:
Series[bool]
-
flow:
Series[Annotated[CategoricalDtype]] = 'flow'¶ The flow.
-
flow_compared_to_expected:
Series[Annotated[CategoricalDtype]] = 'flow_compared_to_expected'¶ The flow compared to expected.
-
pH:
Series[float] = 'pH'¶ The pH.
-
salinity_ppt:
Series[float] = 'salinity_ppt'¶ The salinity.
-
site_id:
Index[str] = 'site_id'¶ The site ID.
-
water_temp:
Series[float] = 'water_temp'¶ The water temperature.
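The "Unique: form_id, bottle_no" constraint can be pictured with a small plain-Python sketch. It is an illustration only, not the package's check: the function name `bottle_no_unique_flags` and the list-of-tuples input are hypothetical stand-ins for the dataframe the schema validates.

```python
from collections import Counter


def bottle_no_unique_flags(rows):
    """Flag each (form_id, bottle_no) pair as True when it occurs exactly once.

    rows is a list of (form_id, bottle_no) tuples, one per observation row.
    The result has one boolean per input row, in the same order.
    """
    counts = Counter(rows)
    return [counts[row] == 1 for row in rows]
```

Every row of a duplicated pair is flagged, not only the later occurrences, so a reviewer sees all conflicting rows. The same bottle number on two different forms is allowed.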
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.Site(*args, **kwargs)¶
Bases: DataFrameModel
Site metadata.
All sites.
- Constraints:
PK: site_id. FK: creek_site_id: Creek(site_id) (unenforced). DEFERRABLE INITIALLY DEFERRED
- class Config¶
Bases: object
The configuration for the schema.
Strict schema enforcement.
- multiindex_strict = True¶
- multiindex_unique = ['site_id']¶
- name = 'Site'¶
- strict = True¶
- classmethod check_creek_site_id_valid(df)¶
Check that creek_site_id is valid.
- Return type:
Series[bool]
-
creek_site_id:
Series[str] = 'creek_site_id'¶ The site's own site_id if the site is a creek, else null.
-
outfall_type:
Series[Annotated[CategoricalDtype]] = 'outfall_type'¶ The outfall type. constants.OutfallType.
-
site_id:
Index[str] = 'site_id'¶ The site ID.
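One possible reading of the creek_site_id rule, sketched in plain Python. The documentation does not spell out what check_creek_site_id_valid tests, so everything here is an assumption: a null creek_site_id is accepted, and a non-null value must equal the row's own site_id and appear among the Creek site IDs. The function name and the dict/set inputs are hypothetical.

```python
from typing import Dict, Optional, Set


def creek_site_id_flags(
    sites: Dict[str, Optional[str]], creek_site_ids: Set[str]
) -> Dict[str, bool]:
    """Map each site_id to True when its creek_site_id looks valid.

    sites maps site_id -> creek_site_id (None for non-creek sites).
    creek_site_ids is the set of site_id values present in the Creek table.
    """
    flags = {}
    for site_id, creek_site_id in sites.items():
        if creek_site_id is None:
            # Assumption: a null creek_site_id always passes.
            flags[site_id] = True
        else:
            # Assumption: a creek row points at itself and exists in Creek.
            flags[site_id] = (
                creek_site_id == site_id and creek_site_id in creek_site_ids
            )
    return flags
```

Because Site and Creek reference each other, neither table can be validated strictly before the other exists, which is consistent with the constraint being marked DEFERRABLE INITIALLY DEFERRED.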
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.SiteVisitCleaned(*args, **kwargs)¶
Bases: SiteVisitVerified
Site visit cleaned.
All site visits, including dry outfalls with no observations.
- Constraints:
PK: form_id, site_id. FK: form_id: Form.form_id (unenforced). FK: site_id: Site.site_id (unenforced).
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.SiteVisitExtracted(*args, **kwargs)¶
Bases: DataFrameModel
Site visit extracted.
All site visits, including dry outfalls with no observations.
- Constraints:
PK: form_id, site_id (unenforced). FK: form_id: Form.form_id (unenforced). FK: site_id: Site.site_id (unenforced).
- class Config¶
Bases: object
The configuration for the schema.
Not a strict schema at this stage since it’s the “raw” extracted data.
- multiindex_strict = False¶
- name = 'SiteVisitExtracted'¶
- strict = False¶
-
arrival_time:
Series[str] = 'arrival_time'¶ The arrival time of the investigation. Nullable.
-
form_id:
Index[str] = 'form_id'¶ The form ID.
-
site_id:
Index[str] = 'site_id'¶ The site ID, part of the primary key, but nullable at this stage.
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.SiteVisitPrecleaned(*args, **kwargs)¶
Bases: SiteVisitExtracted
Site visit precleaned.
All site visits, including dry outfalls with no observations.
- Constraints:
PK: form_id, site_id (unenforced). FK: form_id: Form.form_id (unenforced). FK: site_id: Site.site_id (unenforced).
- class stormwater_monitoring_datasheet_extraction.lib.schema.schema.SiteVisitVerified(*args, **kwargs)¶
Bases: SiteVisitPrecleaned
Site visit verified by user.
All site visits, including dry outfalls with no observations.
- Constraints:
PK: form_id, site_id. FK: form_id: Form.form_id (unenforced). FK: site_id: Site.site_id (unenforced).
- class Config¶
Bases: object
The configuration for the schema.
Strict schema, enforces primary key.
- add_missing_columns = False¶
- multiindex_strict = True¶
- multiindex_unique = ['form_id', 'site_id']¶
- name = 'SiteVisitVerified'¶
- strict = True¶
-
arrival_time:
Series[str] = 'arrival_time'¶ The arrival time of the investigation. Must be “HH:MM”.
- classmethod arrival_time_is_valid_time(arrival_time)¶
Every arrival_time parses with the given format.
- Return type:
Series[bool]
-
site_id:
Index[str] = 'site_id'¶ The site ID.
Module contents¶
Pandera schemas and validations for the stormwater monitoring datasheet extraction.