stormwater_monitoring_datasheet_extraction.lib package
Subpackages
- stormwater_monitoring_datasheet_extraction.lib.db package
- stormwater_monitoring_datasheet_extraction.lib.schema package
- Subpackages
- Submodules
- stormwater_monitoring_datasheet_extraction.lib.schema.schema module
  - Creek
  - FormCleaned
  - FormExtracted
  - FormInvestigatorCleaned
  - FormInvestigatorExtracted
  - FormInvestigatorPrecleaned
  - FormInvestigatorVerified
  - FormPrecleaned
  - FormVerified
    - FormVerified.Config
    - FormVerified.city
    - FormVerified.date
    - FormVerified.date_le_today()
    - FormVerified.form_type
    - FormVerified.form_version
    - FormVerified.is_valid_date()
    - FormVerified.is_valid_time()
    - FormVerified.notes
    - FormVerified.past_24hr_rainfall
    - FormVerified.tide_datetime_le_now()
    - FormVerified.tide_height
    - FormVerified.tide_time
    - FormVerified.weather
  - QualitativeObservationsCleaned
  - QualitativeObservationsExtracted
  - QualitativeObservationsPrecleaned
  - QualitativeObservationsVerified
  - QuantitativeObservationsCleaned
  - QuantitativeObservationsExtracted
    - QuantitativeObservationsExtracted.Config
    - QuantitativeObservationsExtracted.DO_mg_per_l
    - QuantitativeObservationsExtracted.SPS_micro_S_per_cm
    - QuantitativeObservationsExtracted.air_temp
    - QuantitativeObservationsExtracted.bottle_no
    - QuantitativeObservationsExtracted.flow
    - QuantitativeObservationsExtracted.flow_compared_to_expected
    - QuantitativeObservationsExtracted.form_id
    - QuantitativeObservationsExtracted.pH
    - QuantitativeObservationsExtracted.salinity_ppt
    - QuantitativeObservationsExtracted.site_id
    - QuantitativeObservationsExtracted.water_temp
  - QuantitativeObservationsPrecleaned
  - QuantitativeObservationsVerified
    - QuantitativeObservationsVerified.Config
    - QuantitativeObservationsVerified.DO_mg_per_l
    - QuantitativeObservationsVerified.SPS_micro_S_per_cm
    - QuantitativeObservationsVerified.air_temp
    - QuantitativeObservationsVerified.bottle_no
    - QuantitativeObservationsVerified.bottle_no_unique_by_form_id()
    - QuantitativeObservationsVerified.flow
    - QuantitativeObservationsVerified.flow_compared_to_expected
    - QuantitativeObservationsVerified.pH
    - QuantitativeObservationsVerified.salinity_ppt
    - QuantitativeObservationsVerified.site_id
    - QuantitativeObservationsVerified.water_temp
  - Site
  - SiteVisitCleaned
  - SiteVisitExtracted
  - SiteVisitPrecleaned
  - SiteVisitVerified
- Module contents
Submodules
stormwater_monitoring_datasheet_extraction.lib.constants module
Constants for the lib module.
- class stormwater_monitoring_datasheet_extraction.lib.constants.CharLimits
  Bases: object
  Character limits for fields.
  - DESCRIPTION: Final[int] = 250
  - NOTES: Final[int] = 500
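As a rough illustration of how limits like these might be applied, the sketch below re-declares the two constants locally, so it runs without the package installed, and checks free-text values against them. The helper check_length is hypothetical and is not part of the library.

```python
from typing import Final


class CharLimits:
    """Local stand-in mirroring the two documented constants."""

    DESCRIPTION: Final[int] = 250
    NOTES: Final[int] = 500


def check_length(text: str, limit: int) -> bool:
    """Return True when `text` fits within `limit` characters (hypothetical helper)."""
    return len(text) <= limit


# A short note fits; a 251-character description does not.
notes_ok = check_length("Light rain began at the second site.", CharLimits.NOTES)
description_too_long = not check_length("x" * 251, CharLimits.DESCRIPTION)
```

Whether the library treats the limits as inclusive is an assumption here; the documentation only gives the numeric values.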
- class stormwater_monitoring_datasheet_extraction.lib.constants.City(*values)
  Bases: StrEnum
  Options for the city field.
  - BELLINGHAM = 'Bellingham'
- class stormwater_monitoring_datasheet_extraction.lib.constants.Columns
  Bases: object
  Column name constants.
  - AIR_TEMP: Final[str] = 'air_temp'
  - ARRIVAL_TIME: Final[str] = 'arrival_time'
  - BACTERIA_BOTTLE_NO: Final[str] = 'bacteria_bottle_no'
  - CITY: Final[str] = 'city'
  - COLOR: Final[str] = 'color'
  - CREEK_SITE_ID: Final[str] = 'creek_site_id'
  - CREEK_TYPE: Final[str] = 'creek_type'
  - DATA_TYPE: Final[str] = 'data_type'
  - DATE: Final[str] = 'date'
  - DESCRIPTION: Final[str] = 'description'
  - DO_MG_PER_L: Final[str] = 'DO_mg_per_l'
  - END_TIME: Final[str] = 'end_time'
  - FLOW: Final[str] = 'flow'
  - FLOW_COMPARED_TO_EXPECTED: Final[str] = 'flow_compared_to_expected'
  - FORMAT: Final[str] = 'format'
  - FORMS: Final[str] = 'forms'
  - FORM_ID: Final[str] = 'form_id'
  - FORM_TYPE: Final[str] = 'form_type'
  - FORM_VERSION: Final[str] = 'form_version'
  - HABITAT: Final[str] = 'habitat'
  - INCLUSIVE: Final[str] = 'inclusive'
  - INVESTIGATOR: Final[str] = 'investigator'
  - INVESTIGATORS: Final[str] = 'investigators'
  - LOWER: Final[str] = 'lower'
  - METADATA: Final[str] = 'metadata'
  - MIGRATE: Final[str] = 'migrate'
  - NOTES: Final[str] = 'notes'
  - OBSERVATIONS: Final[str] = 'observations'
  - OBSERVATION_TYPE: Final[str] = 'observation_type'
  - ODOR: Final[str] = 'odor'
  - OPTIONS: Final[str] = 'options'
  - OUTFALL_TYPE: Final[str] = 'outfall_type'
  - PAST_24HR_RAINFALL: Final[str] = 'past_24hr_rainfall'
  - PH: Final[str] = 'pH'
  - RANK: Final[str] = 'rank'
  - REAR: Final[str] = 'rear'
  - REFERENCE_VALUE: Final[str] = 'reference_value'
  - SALINITY_PPT: Final[str] = 'salinity_ppt'
  - SITE: Final[str] = 'site'
  - SITE_ID: Final[str] = 'site_id'
  - SPAWN: Final[str] = 'spawn'
  - SPS_MICRO_S_PER_CM: Final[str] = 'SPS_micro_S_per_cm'
  - START_TIME: Final[str] = 'start_time'
  - THRESHOLDS: Final[str] = 'thresholds'
  - TIDE_HEIGHT: Final[str] = 'tide_height'
  - TIDE_TIME: Final[str] = 'tide_time'
  - UNITS: Final[str] = 'units'
  - UPPER: Final[str] = 'upper'
  - VALUE: Final[str] = 'value'
  - VISUAL: Final[str] = 'visual'
  - WATER_TEMP: Final[str] = 'water_temp'
  - WEATHER: Final[str] = 'weather'
- class stormwater_monitoring_datasheet_extraction.lib.constants.CreekType(*values)
  Bases: StrEnum
  Options for the creek type field.
  - HABITAT = 'habitat'
  - MIGRATE = 'migrate'
  - REAR = 'rear'
  - SPAWN = 'spawn'
- class stormwater_monitoring_datasheet_extraction.lib.constants.DocStrings
  Bases: object
  Docstrings for top-level modules.
  - RUN_ETL: Final[DocString] = <comb_utils.lib.docs.DocString object>
- class stormwater_monitoring_datasheet_extraction.lib.constants.Flow(*values)
  Bases: StrEnum
  Options for the flow field.
  - H = 'H'
  - M = 'M'
  - T = 'T'
- class stormwater_monitoring_datasheet_extraction.lib.constants.FlowComparedToExpected(*values)
  Bases: StrEnum
  Options for the flow compared to expected field.
  - HIGHER = 'Higher'
  - LOWER = 'Lower'
  - NORMAL = 'Normal'
- class stormwater_monitoring_datasheet_extraction.lib.constants.FormType(*values)
  Bases: StrEnum
  Options for the form type field.
  - FIELD_DATASHEET_FOSS = 'field_datasheet_FOSS'
- class stormwater_monitoring_datasheet_extraction.lib.constants.OutfallType(*values)
  Bases: StrEnum
  Options for the outfall type field.
  - CREEK = 'creek'
  - OUTFALL = 'outfall'
- class stormwater_monitoring_datasheet_extraction.lib.constants.QualitativeSiteObservationTypes(*values)
  Bases: StrEnum
  Options for the qualitative site observation types field.
  - COLOR = 'color'
  - ODOR = 'odor'
  - VISUAL = 'visual'
- class stormwater_monitoring_datasheet_extraction.lib.constants.Rank(*values)
  Bases: IntEnum
  Options for the rank field.
  - ONE = 1
  - THREE = 3
  - TWO = 2
  - ZERO = 0
- class stormwater_monitoring_datasheet_extraction.lib.constants.Units(*values)
  Bases: StrEnum
  Options for the units field.
  - CELSIUS = 'Celsius'
  - FEET = 'feet'
  - INCHES = 'inches'
  - MG_PER_L = 'mg/l'
  - MICRO_S_PER_CM = 'microS/cm'
  - PH = 'pH'
  - PPT = 'ppt'
- class stormwater_monitoring_datasheet_extraction.lib.constants.Weather(*values)
  Bases: StrEnum
  Options for the weather field.
  - CLOUD_CLEAR = 'cloud_clear'
  - CLOUD_OVER = 'cloud_over'
  - CLOUD_PART = 'cloud_part'
  - PRECIP_RAIN_HEAVY = 'precip_rain_heavy'
  - PRECIP_RAIN_LIGHT = 'precip_rain_light'
  - PRECIP_RAIN_MOD = 'precip_rain_mod'
  - PRECIP_SNOW = 'precip_snow'
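The option enums above lend themselves to validating extracted strings by value lookup. The sketch below shows one way this could look for the weather field. It defines a local stand-in built on `str` and `Enum` so that it also runs on Python versions older than 3.11, whereas the documented class derives from StrEnum. The function parse_weather is hypothetical and not part of the library.

```python
from enum import Enum
from typing import Optional


class Weather(str, Enum):
    """Local stand-in; the documented class derives from StrEnum."""

    CLOUD_CLEAR = "cloud_clear"
    CLOUD_OVER = "cloud_over"
    CLOUD_PART = "cloud_part"
    PRECIP_RAIN_HEAVY = "precip_rain_heavy"
    PRECIP_RAIN_LIGHT = "precip_rain_light"
    PRECIP_RAIN_MOD = "precip_rain_mod"
    PRECIP_SNOW = "precip_snow"


def parse_weather(raw: str) -> Optional[Weather]:
    """Return the matching member, or None if `raw` is not a documented option."""
    try:
        # Lookup by value; surrounding whitespace is stripped first.
        return Weather(raw.strip())
    except ValueError:
        return None


parsed = parse_weather(" precip_rain_mod ")
unknown = parse_weather("foggy")
```

Returning None for unknown values is a design choice of this sketch; a stricter pipeline might raise instead so that bad values cannot pass silently.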
stormwater_monitoring_datasheet_extraction.lib.load_datasheets module
Top-level module for stormwater monitoring datasheet ETL.
- stormwater_monitoring_datasheet_extraction.lib.load_datasheets.clean(verified_form_metadata, verified_investigators, verified_site_visits, verified_quantitative_observations, verified_qualitative_observations, verified_site_type_map, verified_creek_type_map)
  Clean the user-verified extraction.
  Cleans and validates the user-verified extraction data, ensuring that it is in a consistent format, uses appropriate data types, falls within specified ranges, etc., and is ready to load.
  - Parameters:
    - verified_form_metadata (DataFrame[FormVerified]) – The user-verified metadata.
    - verified_investigators (DataFrame[FormInvestigatorVerified]) – The user-verified investigators.
    - verified_site_visits (DataFrame[SiteVisitVerified]) – The user-verified site observations.
    - verified_quantitative_observations (DataFrame[QuantitativeObservationsVerified]) – The user-verified quantitative site observations.
    - verified_qualitative_observations (DataFrame[QualitativeObservationsVerified]) – The user-verified qualitative site observations.
    - verified_site_type_map (DataFrame[Site]) – The user-verified site type map.
    - verified_creek_type_map (DataFrame[Creek]) – The user-verified creek type map.
  - Return type: tuple[DataFrame[FormCleaned], DataFrame[FormInvestigatorCleaned], DataFrame[SiteVisitCleaned], DataFrame[QuantitativeObservationsCleaned], DataFrame[QualitativeObservationsCleaned], DataFrame[Site], DataFrame[Creek]]
  - Returns: Cleaned relational tables, with full enforcement.
- stormwater_monitoring_datasheet_extraction.lib.load_datasheets.extract(input_dir)
  Extracts data from the images in the input directory.
  Using computer vision, extracts data from datasheets.
  - Parameters:
    - input_dir (Path) – Path to the directory containing the datasheet images.
  - Return type: tuple[DataFrame[FormExtracted], DataFrame[FormInvestigatorExtracted], DataFrame[SiteVisitExtracted], DataFrame[QuantitativeObservationsExtracted], DataFrame[QualitativeObservationsExtracted]]
  - Returns: Raw extraction split into normalized relational tables, with no enforcement.
- stormwater_monitoring_datasheet_extraction.lib.load_datasheets.load(restructured_json, output_dir)
  Load the cleaned data into the output directory.
  Saves the cleaned data to the specified output directory in a structured format. If the output directory does not exist, it will be created.
  - Parameters:
    - restructured_json (dict[str, Any]) – The restructured JSON schema.
    - output_dir (Path) – The directory where the cleaned data will be saved. If the path is empty, defaults to a dated directory in the current working directory.
  - Return type: Path
  - Returns: Path to the saved cleaned data file.
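The default described for output_dir can be sketched as follows. This is only an illustration of the documented behavior, not the library's implementation: the helper resolve_output_dir, the explicit today and cwd parameters, and the ISO date used as the directory name are all assumptions, since the documentation does not state how the dated directory is named.

```python
from datetime import date
from pathlib import Path


def resolve_output_dir(output_dir: Path, today: date, cwd: Path) -> Path:
    """Hypothetical helper: pick the directory that output would be written to."""
    # Path("") normalizes to Path("."), so this comparison treats both as "empty".
    if output_dir == Path(""):
        # Assumed naming scheme: ISO date, for example 2024-03-15.
        return cwd / today.isoformat()
    return output_dir


default_dir = resolve_output_dir(Path(""), date(2024, 3, 15), Path("/data"))
explicit_dir = resolve_output_dir(Path("out"), date(2024, 3, 15), Path("/data"))
```

The sketch only computes the path. Creating a missing directory, which the description above says load does, could be done with `Path.mkdir(parents=True, exist_ok=True)`.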
- stormwater_monitoring_datasheet_extraction.lib.load_datasheets.preclean(raw_form_metadata, raw_investigators, raw_site_visits, raw_quantitative_observations, raw_qualitative_observations)
  Preclean the raw extraction.
  - Parameters:
    - raw_form_metadata (DataFrame[FormExtracted]) – Metadata extracted from the datasheets.
    - raw_investigators (DataFrame[FormInvestigatorExtracted]) – Investigators extracted from the datasheets.
    - raw_site_visits (DataFrame[SiteVisitExtracted]) – Site observations extracted from the datasheets.
    - raw_quantitative_observations (DataFrame[QuantitativeObservationsExtracted]) – Quantitative site observations extracted from the datasheets.
    - raw_qualitative_observations (DataFrame[QualitativeObservationsExtracted]) – Qualitative site observations extracted from the datasheets.
  - Return type: tuple[DataFrame[FormPrecleaned], DataFrame[FormInvestigatorPrecleaned], DataFrame[SiteVisitPrecleaned], DataFrame[QuantitativeObservationsPrecleaned], DataFrame[QualitativeObservationsPrecleaned]]
  - Returns: Precleaned relational tables, with no enforcement.
- stormwater_monitoring_datasheet_extraction.lib.load_datasheets.restructure_extraction(cleaned_form_metadata, cleaned_investigators, cleaned_site_visits, cleaned_quantitative_observations, cleaned_qualitative_observations, cleaned_site_type_map, cleaned_creek_type_map)
  Restructure the cleaned extraction into a JSON schema.
  - Parameters:
    - cleaned_form_metadata (DataFrame[FormCleaned]) – The cleaned metadata.
    - cleaned_investigators (DataFrame[FormInvestigatorCleaned]) – The cleaned investigators.
    - cleaned_site_visits (DataFrame[SiteVisitCleaned]) – The cleaned site observations.
    - cleaned_quantitative_observations (DataFrame[QuantitativeObservationsCleaned]) – The cleaned quantitative site observations.
    - cleaned_qualitative_observations (DataFrame[QualitativeObservationsCleaned]) – The cleaned qualitative site observations.
    - cleaned_site_type_map (DataFrame[Site]) – The cleaned site type map.
    - cleaned_creek_type_map (DataFrame[Creek]) – The cleaned creek type map.
  - Return type: dict[str, Any]
  - Returns: Cleaned relational tables restructured into JSON schema.
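To make the idea of turning relational tables into a nested JSON-like dictionary concrete, here is a minimal sketch that folds two flat tables into one document keyed by form_id. The target layout is an assumption: the actual schema is not shown on this page, and only the key names ('forms', 'metadata', 'investigators', 'form_id', 'investigator') are borrowed from the Columns constants. Plain lists of dictionaries stand in for the DataFrames, and nest_by_form is a hypothetical name.

```python
from typing import Any


def nest_by_form(
    forms: list[dict[str, Any]],
    investigators: list[dict[str, Any]],
) -> dict[str, Any]:
    """Hypothetical sketch: fold flat rows into one nested document keyed by form_id."""
    nested: dict[str, Any] = {"forms": {}}
    for row in forms:
        # Everything except the key column becomes the form's metadata.
        metadata = {k: v for k, v in row.items() if k != "form_id"}
        nested["forms"][row["form_id"]] = {"metadata": metadata, "investigators": []}
    for row in investigators:
        # Each investigator row attaches to its parent form via form_id.
        nested["forms"][row["form_id"]]["investigators"].append(row["investigator"])
    return nested


result = nest_by_form(
    forms=[{"form_id": "f1", "city": "Bellingham", "date": "2024-03-15"}],
    investigators=[
        {"form_id": "f1", "investigator": "A. Example"},
        {"form_id": "f1", "investigator": "B. Example"},
    ],
)
```

The sketch assumes every investigator row refers to a form_id that exists in the forms table; an investigator row with an unknown form_id would raise KeyError.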
- stormwater_monitoring_datasheet_extraction.lib.load_datasheets.run_etl(input_dir, output_dir)
  Extracts, verifies, cleans, and loads datasheet images.
  Extracts data from the images in the input directory, verifies the extraction with the user, cleans and validates the data, and loads it into the output directory.
  - Parameters:
    - input_dir (Path) – Path to the input directory containing datasheet images.
    - output_dir (Path) – Path to the output directory where processed data will be saved. If the path is empty, defaults to a dated directory in the current working directory.
  - Return type: Path
  - Returns: Path to the saved cleaned data file.
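The documented signatures in this module suggest how the stages chain together: extract returns five tables, preclean and verify each take five, verify returns seven (adding the site and creek type maps), clean takes and returns seven, restructure_extraction turns those into a dictionary, and load writes it out. The sketch below wires that order together with the stages passed in as callables, so it can run against stubs. The ordering is inferred from the signatures rather than taken from the run_etl source, and run_pipeline, _stage, and the file name cleaned.json are hypothetical.

```python
from pathlib import Path
from typing import Callable


def run_pipeline(
    input_dir: Path,
    output_dir: Path,
    extract: Callable[[Path], tuple],
    preclean: Callable[..., tuple],
    verify: Callable[..., tuple],
    clean: Callable[..., tuple],
    restructure_extraction: Callable[..., dict],
    load: Callable[[dict, Path], Path],
) -> Path:
    """Chain the stages in the order implied by their documented signatures."""
    raw_tables = extract(input_dir)  # 5 tables
    precleaned_tables = preclean(*raw_tables)  # 5 tables
    verified_tables = verify(*precleaned_tables)  # 7 tables, adds site and creek maps
    cleaned_tables = clean(*verified_tables)  # 7 tables
    restructured_json = restructure_extraction(*cleaned_tables)
    return load(restructured_json, output_dir)


# Stub stages standing in for the real functions, to show the data flow only.
calls: list[str] = []


def _stage(name: str, n_out: int) -> Callable[..., tuple]:
    def stage(*tables: object) -> tuple:
        # Record how many inputs the stage received, then emit n_out placeholders.
        calls.append(f"{name}:{len(tables)}")
        return tuple(f"{name}_{i}" for i in range(n_out))

    return stage


saved = run_pipeline(
    Path("in"),
    Path("out"),
    extract=_stage("extract", 5),
    preclean=_stage("preclean", 5),
    verify=_stage("verify", 7),
    clean=_stage("clean", 7),
    restructure_extraction=lambda *tables: {"n_tables": len(tables)},
    load=lambda doc, out: out / "cleaned.json",
)
```

Passing the stages as parameters keeps the sketch independent of the package; with the package installed, the real functions from this module could presumably be supplied instead.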
- stormwater_monitoring_datasheet_extraction.lib.load_datasheets.verify(precleaned_form_metadata, precleaned_investigators, precleaned_site_visits, precleaned_quantitative_observations, precleaned_qualitative_observations)
  Verifies the raw extraction with the user.
  Prompts user to check each image against each extraction and edit as needed.
  - Parameters:
    - precleaned_form_metadata (DataFrame[FormPrecleaned]) – The precleaned metadata.
    - precleaned_investigators (DataFrame[FormInvestigatorPrecleaned]) – The precleaned investigators.
    - precleaned_site_visits (DataFrame[SiteVisitPrecleaned]) – The precleaned site observations.
    - precleaned_quantitative_observations (DataFrame[QuantitativeObservationsPrecleaned]) – The precleaned quantitative site observations.
    - precleaned_qualitative_observations (DataFrame[QualitativeObservationsPrecleaned]) – The precleaned qualitative site observations.
  - Return type: tuple[DataFrame[FormVerified], DataFrame[FormInvestigatorVerified], DataFrame[SiteVisitVerified], DataFrame[QuantitativeObservationsVerified], DataFrame[QualitativeObservationsVerified], DataFrame[Site], DataFrame[Creek]]
  - Returns: User-verified relational tables, with some enforcement.
Module contents
Library for loading stormwater monitoring datasheets.