stormwater_monitoring_datasheet_extraction.lib package

Subpackages

Submodules

stormwater_monitoring_datasheet_extraction.lib.constants module

Constants for the lib module.

class stormwater_monitoring_datasheet_extraction.lib.constants.CharLimits

Bases: object

Character limits for fields.

DESCRIPTION: Final[int] = 250
NOTES: Final[int] = 500

class stormwater_monitoring_datasheet_extraction.lib.constants.City(*values)

Bases: StrEnum

Options for the city field.

BELLINGHAM = 'Bellingham'

class stormwater_monitoring_datasheet_extraction.lib.constants.Columns

Bases: object

Column name constants.

AIR_TEMP: Final[str] = 'air_temp'
ARRIVAL_TIME: Final[str] = 'arrival_time'
BACTERIA_BOTTLE_NO: Final[str] = 'bacteria_bottle_no'
CITY: Final[str] = 'city'
COLOR: Final[str] = 'color'
CREEK_SITE_ID: Final[str] = 'creek_site_id'
CREEK_TYPE: Final[str] = 'creek_type'
DATA_TYPE: Final[str] = 'data_type'
DATE: Final[str] = 'date'
DESCRIPTION: Final[str] = 'description'
DO_MG_PER_L: Final[str] = 'DO_mg_per_l'
END_TIME: Final[str] = 'end_time'
FLOW: Final[str] = 'flow'
FLOW_COMPARED_TO_EXPECTED: Final[str] = 'flow_compared_to_expected'
FORMAT: Final[str] = 'format'
FORMS: Final[str] = 'forms'
FORM_ID: Final[str] = 'form_id'
FORM_TYPE: Final[str] = 'form_type'
FORM_VERSION: Final[str] = 'form_version'
HABITAT: Final[str] = 'habitat'
INCLUSIVE: Final[str] = 'inclusive'
INVESTIGATOR: Final[str] = 'investigator'
INVESTIGATORS: Final[str] = 'investigators'
LOWER: Final[str] = 'lower'
METADATA: Final[str] = 'metadata'
MIGRATE: Final[str] = 'migrate'
NOTES: Final[str] = 'notes'
OBSERVATIONS: Final[str] = 'observations'
OBSERVATION_TYPE: Final[str] = 'observation_type'
ODOR: Final[str] = 'odor'
OPTIONS: Final[str] = 'options'
OUTFALL_TYPE: Final[str] = 'outfall_type'
PAST_24HR_RAINFALL: Final[str] = 'past_24hr_rainfall'
PH: Final[str] = 'pH'
RANK: Final[str] = 'rank'
REAR: Final[str] = 'rear'
REFERENCE_VALUE: Final[str] = 'reference_value'
SALINITY_PPT: Final[str] = 'salinity_ppt'
SITE: Final[str] = 'site'
SITE_ID: Final[str] = 'site_id'
SPAWN: Final[str] = 'spawn'
SPS_MICRO_S_PER_CM: Final[str] = 'SPS_micro_S_per_cm'
START_TIME: Final[str] = 'start_time'
THRESHOLDS: Final[str] = 'thresholds'
TIDE_HEIGHT: Final[str] = 'tide_height'
TIDE_TIME: Final[str] = 'tide_time'
UNITS: Final[str] = 'units'
UPPER: Final[str] = 'upper'
VALUE: Final[str] = 'value'
VISUAL: Final[str] = 'visual'
WATER_TEMP: Final[str] = 'water_temp'
WEATHER: Final[str] = 'weather'

class stormwater_monitoring_datasheet_extraction.lib.constants.CreekType(*values)

Bases: StrEnum

Options for the creek type field.

HABITAT = 'habitat'
MIGRATE = 'migrate'
REAR = 'rear'
SPAWN = 'spawn'

class stormwater_monitoring_datasheet_extraction.lib.constants.DocStrings

Bases: object

Docstrings for top-level modules.

RUN_ETL: Final[DocString] = <comb_utils.lib.docs.DocString object>

class stormwater_monitoring_datasheet_extraction.lib.constants.Flow(*values)

Bases: StrEnum

Options for the flow field.

H = 'H'
M = 'M'
T = 'T'

class stormwater_monitoring_datasheet_extraction.lib.constants.FlowComparedToExpected(*values)

Bases: StrEnum

Options for the flow compared to expected field.

HIGHER = 'Higher'
LOWER = 'Lower'
NORMAL = 'Normal'

class stormwater_monitoring_datasheet_extraction.lib.constants.FormType(*values)

Bases: StrEnum

Options for the form type field.

FIELD_DATASHEET_FOSS = 'field_datasheet_FOSS'

class stormwater_monitoring_datasheet_extraction.lib.constants.OutfallType(*values)

Bases: StrEnum

Options for the outfall type field.

CREEK = 'creek'
OUTFALL = 'outfall'

class stormwater_monitoring_datasheet_extraction.lib.constants.QualitativeSiteObservationTypes(*values)

Bases: StrEnum

Options for the qualitative site observation types field.

COLOR = 'color'
ODOR = 'odor'
VISUAL = 'visual'

class stormwater_monitoring_datasheet_extraction.lib.constants.Rank(*values)

Bases: IntEnum

Options for the rank field.

ONE = 1
THREE = 3
TWO = 2
ZERO = 0

class stormwater_monitoring_datasheet_extraction.lib.constants.Units(*values)

Bases: StrEnum

Options for the units field.

CELSIUS = 'Celsius'
FEET = 'feet'
INCHES = 'inches'
MG_PER_L = 'mg/l'
MICRO_S_PER_CM = 'microS/cm'
PH = 'pH'
PPT = 'ppt'

class stormwater_monitoring_datasheet_extraction.lib.constants.Weather(*values)

Bases: StrEnum

Options for the weather field.

CLOUD_CLEAR = 'cloud_clear'
CLOUD_OVER = 'cloud_over'
CLOUD_PART = 'cloud_part'
PRECIP_RAIN_HEAVY = 'precip_rain_heavy'
PRECIP_RAIN_LIGHT = 'precip_rain_light'
PRECIP_RAIN_MOD = 'precip_rain_mod'
PRECIP_SNOW = 'precip_snow'
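
The enums above lend themselves to validating raw field values. The following is a minimal sketch, not the package's own code: `Flow` and `Rank` are re-declared locally as stand-ins so the example runs without the package installed, and `parse_flow` and `is_valid_rank` are hypothetical helper names.

```python
from enum import Enum, IntEnum

try:
    from enum import StrEnum  # available in Python 3.11+
except ImportError:  # fallback for older interpreters
    class StrEnum(str, Enum):
        """Minimal stand-in for enum.StrEnum."""


# Local stand-ins mirroring two of the documented enums.
class Flow(StrEnum):
    H = "H"
    M = "M"
    T = "T"


class Rank(IntEnum):
    ZERO = 0
    ONE = 1
    TWO = 2
    THREE = 3


def parse_flow(raw: str) -> Flow:
    """Return the matching Flow member; raise ValueError for unknown codes."""
    # Normalizing case and whitespace is an assumption about the raw input.
    return Flow(raw.strip().upper())


def is_valid_rank(value: int) -> bool:
    """Check whether an integer is one of the documented rank values."""
    return value in {member.value for member in Rank}
```

Looking a value up by calling the enum class, as in `Flow("H")`, raises `ValueError` for anything outside the documented options, which makes it a convenient validation step.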

stormwater_monitoring_datasheet_extraction.lib.load_datasheets module

Top-level module for stormwater monitoring datasheet ETL.

stormwater_monitoring_datasheet_extraction.lib.load_datasheets.clean(verified_form_metadata, verified_investigators, verified_site_visits, verified_quantitative_observations, verified_qualitative_observations, verified_site_type_map, verified_creek_type_map)

Clean the user-verified extraction.

Cleans and validates the user-verified extraction data, ensuring that it is in a consistent format, uses appropriate data types, falls within the specified ranges, and is ready to load.

Parameters:
  • verified_form_metadata (DataFrame[FormVerified]) – The user-verified metadata.

  • verified_investigators (DataFrame[FormInvestigatorVerified]) – The user-verified investigators.

  • verified_site_visits (DataFrame[SiteVisitVerified]) – The user-verified site visits.

  • verified_quantitative_observations (DataFrame[QuantitativeObservationsVerified]) – The user-verified quantitative site observations.

  • verified_qualitative_observations (DataFrame[QualitativeObservationsVerified]) – The user-verified qualitative site observations.

  • verified_site_type_map (DataFrame[Site]) – The user-verified site type map.

  • verified_creek_type_map (DataFrame[Creek]) – The user-verified creek type map.

Return type:

tuple[DataFrame[FormCleaned], DataFrame[FormInvestigatorCleaned], DataFrame[SiteVisitCleaned], DataFrame[QuantitativeObservationsCleaned], DataFrame[QualitativeObservationsCleaned], DataFrame[Site], DataFrame[Creek]]

Returns:

Cleaned relational tables, with full enforcement.
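
As an illustration of the kinds of checks described above (consistent format, data types, ranges), here is a minimal sketch that works on a single plain-dict record rather than the documented DataFrame schemas. The function name `clean_record` and the pH bounds are assumptions for illustration only; the 500-character limit matches `CharLimits.NOTES`.

```python
from typing import Any, Dict

NOTES_LIMIT = 500  # matches CharLimits.NOTES in the constants module
PH_RANGE = (0.0, 14.0)  # assumed bounds, for illustration only


def clean_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a cleaned copy of one record, or raise ValueError."""
    cleaned = dict(record)

    # Consistent format: trim surrounding whitespace on the text field.
    notes = str(cleaned.get("notes", "")).strip()
    if len(notes) > NOTES_LIMIT:
        raise ValueError(f"notes exceeds {NOTES_LIMIT} characters")
    cleaned["notes"] = notes

    # Appropriate data type: coerce the measurement to float.
    ph = float(cleaned["pH"])

    # Specified range: reject values outside the assumed bounds.
    lower, upper = PH_RANGE
    if not lower <= ph <= upper:
        raise ValueError(f"pH {ph} is outside [{lower}, {upper}]")
    cleaned["pH"] = ph

    return cleaned
```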

stormwater_monitoring_datasheet_extraction.lib.load_datasheets.extract(input_dir)

Extracts data from the images in the input directory.

Using computer vision, extracts data from datasheets.

Parameters:

input_dir (Path) – Path to the directory containing the datasheet images.

Return type:

tuple[DataFrame[FormExtracted], DataFrame[FormInvestigatorExtracted], DataFrame[SiteVisitExtracted], DataFrame[QuantitativeObservationsExtracted], DataFrame[QualitativeObservationsExtracted]]

Returns:

Raw extraction split into normalized relational tables, with no enforcement.

stormwater_monitoring_datasheet_extraction.lib.load_datasheets.load(restructured_json, output_dir)

Load the cleaned data into the output directory.

Saves the cleaned data to the specified output directory in a structured format. If the output directory does not exist, it will be created.

Parameters:
  • restructured_json (dict[str, Any]) – The cleaned data, restructured into the JSON schema.

  • output_dir (Path) – The directory where the cleaned data will be saved. If the path is empty, defaults to a dated directory in the current working directory.

Return type:

Path

Returns:

Path to the saved cleaned data file.
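
The documented behavior (create the output directory if needed, fall back to a dated directory for an empty path, return the saved file's path) could look roughly like the sketch below. The function name `save_json`, the file name `cleaned_data.json`, and the ISO date format of the default directory are assumptions, not taken from the package.

```python
import json
from datetime import date
from pathlib import Path
from typing import Any, Dict


def save_json(data: Dict[str, Any], output_dir: Path) -> Path:
    """Write data as JSON under output_dir and return the file path."""
    # Assumption: an "empty path" is Path(""), which pathlib treats as Path(".").
    if output_dir == Path(""):
        output_dir = Path.cwd() / date.today().isoformat()

    # Create the directory (and any parents) if it does not exist yet.
    output_dir.mkdir(parents=True, exist_ok=True)

    output_path = output_dir / "cleaned_data.json"  # hypothetical file name
    output_path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return output_path
```

Note that under this assumption, passing `Path(".")` also triggers the dated default, because pathlib does not distinguish it from `Path("")`.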

stormwater_monitoring_datasheet_extraction.lib.load_datasheets.preclean(raw_form_metadata, raw_investigators, raw_site_visits, raw_quantitative_observations, raw_qualitative_observations)

Preclean the raw extraction.

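Parameters:
  • raw_form_metadata – The raw extracted metadata.

  • raw_investigators – The raw extracted investigators.

  • raw_site_visits – The raw extracted site visits.

  • raw_quantitative_observations – The raw extracted quantitative site observations.

  • raw_qualitative_observations – The raw extracted qualitative site observations.
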
Return type:

tuple[DataFrame[FormPrecleaned], DataFrame[FormInvestigatorPrecleaned], DataFrame[SiteVisitPrecleaned], DataFrame[QuantitativeObservationsPrecleaned], DataFrame[QualitativeObservationsPrecleaned]]

Returns:

Precleaned relational tables, with no enforcement.

stormwater_monitoring_datasheet_extraction.lib.load_datasheets.restructure_extraction(cleaned_form_metadata, cleaned_investigators, cleaned_site_visits, cleaned_quantitative_observations, cleaned_qualitative_observations, cleaned_site_type_map, cleaned_creek_type_map)

Restructure the cleaned extraction into a JSON schema.

Parameters:
  • cleaned_form_metadata (DataFrame[FormCleaned]) – The cleaned metadata.

  • cleaned_investigators (DataFrame[FormInvestigatorCleaned]) – The cleaned investigators.

  • cleaned_site_visits (DataFrame[SiteVisitCleaned]) – The cleaned site visits.

  • cleaned_quantitative_observations (DataFrame[QuantitativeObservationsCleaned]) – The cleaned quantitative site observations.

  • cleaned_qualitative_observations (DataFrame[QualitativeObservationsCleaned]) – The cleaned qualitative site observations.

  • cleaned_site_type_map (DataFrame[Site]) – The cleaned site type map.

  • cleaned_creek_type_map (DataFrame[Creek]) – The cleaned creek type map.

Return type:

dict[str, Any]

Returns:

Cleaned relational tables restructured into JSON schema.
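
To illustrate what restructuring relational tables into a nested document can involve, here is a minimal sketch over plain lists of dicts. The nesting layout and the function name `nest_by_form` are hypothetical; only the key names (`forms`, `form_id`, `metadata`, `investigators`, `investigator`, `observations`) are borrowed from the `Columns` constants, and the actual JSON schema is not documented here.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def nest_by_form(
    forms: List[Row], investigators: List[Row], site_visits: List[Row]
) -> Dict[str, Any]:
    """Group child rows under their parent form, keyed by form_id."""
    investigators_by_form: Dict[Any, List[Any]] = defaultdict(list)
    for row in investigators:
        investigators_by_form[row["form_id"]].append(row["investigator"])

    visits_by_form: Dict[Any, List[Row]] = defaultdict(list)
    for row in site_visits:
        # Drop the foreign key; it is implied by the nesting.
        visit = {key: value for key, value in row.items() if key != "form_id"}
        visits_by_form[row["form_id"]].append(visit)

    return {
        "forms": {
            form["form_id"]: {
                "metadata": {k: v for k, v in form.items() if k != "form_id"},
                "investigators": investigators_by_form[form["form_id"]],
                "observations": visits_by_form[form["form_id"]],
            }
            for form in forms
        }
    }
```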

stormwater_monitoring_datasheet_extraction.lib.load_datasheets.run_etl(input_dir, output_dir)

Extracts, verifies, cleans, and loads datasheet images.

Extracts data from the images in the input directory, verifies the extraction with the user, cleans and validates the data, and loads it into the output directory.

Parameters:
  • input_dir (Path) – Path to the input directory containing datasheet images.

  • output_dir (Path) – Path to the output directory where processed data will be saved. If the path is empty, defaults to a dated directory in the current working directory.

Return type:

Path

Returns:

Path to the saved cleaned data file.
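
Judging by the signatures in this module, the stages appear to chain as extract, preclean, verify, clean, restructure_extraction, load, with each stage's returned tuple feeding the next stage's positional parameters. The sketch below shows that wiring under this assumption; `run_pipeline` is a hypothetical name, and the stages are passed in as callables so the example runs without the package.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Tuple

Tables = Tuple[Any, ...]


def run_pipeline(
    input_dir: Path,
    output_dir: Path,
    *,
    extract: Callable[[Path], Tables],
    preclean: Callable[..., Tables],
    verify: Callable[..., Tables],
    clean: Callable[..., Tables],
    restructure: Callable[..., Dict[str, Any]],
    load: Callable[[Dict[str, Any], Path], Path],
) -> Path:
    """Chain the ETL stages, unpacking each returned tuple into the next call."""
    raw_tables = extract(input_dir)                # 5 tables
    precleaned_tables = preclean(*raw_tables)      # 5 tables
    verified_tables = verify(*precleaned_tables)   # 7 tables (adds site and creek maps)
    cleaned_tables = clean(*verified_tables)       # 7 tables
    restructured_json = restructure(*cleaned_tables)
    return load(restructured_json, output_dir)
```

With the package installed, the documented functions could presumably be passed in directly, with `restructure_extraction` supplied as `restructure`.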

stormwater_monitoring_datasheet_extraction.lib.load_datasheets.verify(precleaned_form_metadata, precleaned_investigators, precleaned_site_visits, precleaned_quantitative_observations, precleaned_qualitative_observations)

Verifies the precleaned extraction with the user.

Prompts user to check each image against each extraction and edit as needed.

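Parameters:
  • precleaned_form_metadata – The precleaned metadata.

  • precleaned_investigators – The precleaned investigators.

  • precleaned_site_visits – The precleaned site visits.

  • precleaned_quantitative_observations – The precleaned quantitative site observations.

  • precleaned_qualitative_observations – The precleaned qualitative site observations.
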
Return type:

tuple[DataFrame[FormVerified], DataFrame[FormInvestigatorVerified], DataFrame[SiteVisitVerified], DataFrame[QuantitativeObservationsVerified], DataFrame[QualitativeObservationsVerified], DataFrame[Site], DataFrame[Creek]]

Returns:

User-verified relational tables, with some enforcement.

Module contents

Library for loading stormwater monitoring datasheets.