ctdam.parser package

Submodules

ctdam.parser.bottlefile module

class ctdam.parser.bottlefile.BottleFile(path_to_file, only_header=False)[source]

Bases: DataFile

Class that represents a Sea-Bird bottle file. Organizes the file's table information into a pandas dataframe. This allows the use of this powerful library for statistics, visualization, data manipulation, export, etc.

create_dataframe()[source]

Creates a dataframe out of the btl file. Manages the double data header correctly.

adding_timestamp_column()[source]

Creates a timestamp column that holds both date and time information.
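
The idea of merging the two columns can be sketched with the standard library alone. This is an illustration, not the library's implementation: the text layouts 'Mar 05 2021' and '12:34:56', the column names, and the helper name combine_date_and_time are all assumptions.

```python
from datetime import datetime


def combine_date_and_time(date_text: str, time_text: str) -> datetime:
    """Merge separate date and time strings into one datetime (hypothetical helper)."""
    # Assumed layouts: 'Mar 05 2021' for the date and '12:34:56' for the time.
    return datetime.strptime(f"{date_text} {time_text}", "%b %d %Y %H:%M:%S")


# Two made-up bottle rows with separate date and time entries.
rows = [
    {"Bottle": 1, "Date": "Mar 05 2021", "Time": "12:34:56"},
    {"Bottle": 2, "Date": "Mar 05 2021", "Time": "12:40:10"},
]

# Add one combined column per row.
for row in rows:
    row["Timestamp"] = combine_date_and_time(row["Date"], row["Time"])

print(rows[0]["Timestamp"].isoformat())
```

Keeping a single timestamp value per row makes later sorting and joining on time simpler than carrying two text columns.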

setting_dataframe_dtypes()[source]

Sets the types for the column values in the dataframe.

selecting_rows(df=None, statistic_of_interest=['avg'])[source]

Creates a dataframe with the given row identifier, using the statistics column. A single string or a list of strings can be processed.

Parameters:
  • df (pandas.DataFrame) – the file's pandas representation (Default value = self.df)

  • statistic_of_interest (list | str) –

    collection of values of the ‘statistics’ column in self.df

    (Default value = [‘avg’])
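
The "single string or list of strings" behaviour described above can be sketched as follows. This is a plain-Python illustration on a list of dictionaries, not the pandas-based method itself; the helper name select_rows and the sample values are assumptions.

```python
def select_rows(rows, statistic_of_interest=("avg",)):
    """Keep only rows whose 'statistics' entry is one of the requested values."""
    # Accept a single string as well as a collection of strings.
    if isinstance(statistic_of_interest, str):
        statistic_of_interest = [statistic_of_interest]
    wanted = set(statistic_of_interest)
    return [row for row in rows if row["statistics"] in wanted]


# Made-up table: each bottle has an average row and a standard deviation row.
table = [
    {"Bottle": 1, "statistics": "avg", "PrDM": 10.2},
    {"Bottle": 1, "statistics": "sdev", "PrDM": 0.1},
    {"Bottle": 2, "statistics": "avg", "PrDM": 20.4},
    {"Bottle": 2, "statistics": "sdev", "PrDM": 0.2},
]

averages = select_rows(table, "avg")
both = select_rows(table, ["avg", "sdev"])
print(len(averages), len(both))
```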

reading_data_header()[source]

Identifies and separately collects the rows that specify the data table's headers.

add_station_and_event_column()[source]

add_position_columns()[source]

ctdam.parser.bottlelogfile module

class ctdam.parser.bottlelogfile.BottleLogFile(path_to_file, create_dataframe=False)[source]

Bases: DataFile

Bottle log file representation that extracts three kinds of data from the file: the reset time, the bottle IDs, and the corresponding data ranges.

data_whitespace_removal()[source]

Strips the input from whitespace characters, in this case especially newline characters.

Return type:

list

obtaining_reset_time()[source]

Reading reset time with small input check.

Return type:

datetime

create_list()[source]

Creates a list of usable data from the list specified in self.data. The list consists of an array of IDs representing the bottles, the date and time of the data sample, and the lines of the cnv corresponding to the bottles.

Return type:

list

convert_date(date)[source]

Converts the dates of the .bl files to the ISO 8601 standard.

Return type:

a string with the date in the form of “yymmddThhmmss”
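
A minimal sketch of such a conversion, assuming the dates in a .bl file look like 'Mar 05 2021 12:34:56'. The input layout and the helper name to_compact_timestamp are assumptions; only the output form "yymmddThhmmss" is taken from the description above.

```python
from datetime import datetime


def to_compact_timestamp(date_text: str) -> str:
    """Convert an assumed 'Mar 05 2021 12:34:56' string to 'yymmddThhmmss'."""
    parsed = datetime.strptime(date_text, "%b %d %Y %H:%M:%S")
    # Two-digit year, month, day, a literal 'T', then hours, minutes, seconds.
    return parsed.strftime("%y%m%dT%H%M%S")


converted = to_compact_timestamp("Mar 05 2021 12:34:56")
print(converted)
```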

create_dataframe()[source]

Creates a dataframe from the list specified in self.data.

Return type:

DataFrame

ctdam.parser.cnvfile module

class ctdam.parser.cnvfile.CnvFile(path_to_file, only_header=False, create_dataframe=False, absolute_time_calculation=False, event_log_column=False, coordinate_columns=False)[source]

Bases: DataFile

A representation of a cnv-file as used by SeaBird.

This class intends to fully extract and organize the different types of data and metadata present inside of such a file. Downstream libraries shall be able to use this representation for all applications concerning cnv files, like data processing, transformation or visualization.

To achieve that, the metadata header is organized by the parent-class, DataFile, while the data table is extracted by this class. The data representation can be a numpy array or pandas dataframe. The handling of the data is mostly done inside parameters, a representation of the individual measurement parameter data and metadata.

This class is also able to parse the edited data and metadata back to the original .cnv file format, allowing for custom data processing using this representation while still being able to use Sea-Bird's original software on that output. It also keeps results comparable with other parsers or methods in general.

Parameters:
  • path_to_file (Path | str) – the path to the file

  • only_header (bool) – Whether to stop reading the file after the metadata header.

  • create_dataframe (bool) – Whether to create a pandas DataFrame from the data table.

  • absolute_time_calculation (bool) – whether to use a real timestamp instead of the second count

  • event_log_column (bool) – whether to add a station and device event column from DSHIP

  • coordinate_columns (bool) – whether to add longitude and latitude from the extra metadata header

create_dataframe()[source]

Plain dataframe creator.

Return type:

DataFrame

absolute_time_calculation()[source]

Replaces the basic cnv time representation, which counts relative to the cast's start point, with real UTC timestamps. This operation acts directly on the dataframe.

Return type:

bool
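
The underlying arithmetic is to add each elapsed-time value to the cast start time. The sketch below assumes the relative time is given in seconds; the start time and the sample values are made up, and the code does not use the library.

```python
from datetime import datetime, timedelta, timezone

# Assumed cast start time, as it might be read from the metadata header.
cast_start = datetime(2021, 3, 5, 12, 0, 0, tzinfo=timezone.utc)

# Assumed relative time column: seconds elapsed since the cast started.
elapsed_seconds = [0.0, 0.5, 1.0, 1.5]

# Replace every relative value with an absolute UTC timestamp.
timestamps = [cast_start + timedelta(seconds=s) for s in elapsed_seconds]

print(timestamps[-1].isoformat())
```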

add_start_time()[source]

Adds the cast start time to the dataframe. Necessary for joins on the time.

Return type:

bool

get_processing_step_infos()[source]

Collects the individual validation modules and their respective information, usually present in key-value pairs.

Return type:

CnvProcessingSteps

df2cnv(df=None)[source]

Parses a pandas dataframe into a list that represents the lines inside of a cnv data table.

Parameters:

df (DataFrame | None)

Return type:

list
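
To illustrate what "lines of a cnv data table" can look like, the sketch below formats numeric rows into fixed-width text lines. The column width of 11 characters and the four decimal places are assumptions about the file layout, and format_rows is a hypothetical helper, not the method's actual code.

```python
def format_rows(rows, width=11, decimals=4):
    """Turn rows of numbers into fixed-width text lines (assumed cnv-like layout)."""
    lines = []
    for row in rows:
        # Right-align every value in a column of the given width.
        lines.append("".join(f"{value:{width}.{decimals}f}" for value in row))
    return lines


data_rows = [
    [1.0, 10.5],
    [2.0, 11.25],
]
lines = format_rows(data_rows)
for line in lines:
    print(line)
```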

array2cnv()[source]
Return type:

list

to_cnv(file_name=None, use_dataframe=False)[source]

Writes the values inside of this instance as a new cnv file to disk.

Parameters:
  • file_name (Path | str | None) – the new file name to use for writing

  • use_current_df (bool) – whether to use the current dataframe as data table

  • use_current_validation_header (bool) – whether to use the current processing module list

  • header_list (list) – the data columns to use for the export

to_ctd_data()[source]

add_processing_metadata(module, key, value)[source]

Adds new processing lines to the list of processing module information

Parameters:
  • module (str) – the name of the processing module

  • key (str) – the description of the value

  • value (str) – the information

add_station_and_event_column()[source]

Adds a column with the DSHIP station and device event numbers to the dataframe. These must be present inside the extra metadata header.

Return type:

bool

add_position_columns()[source]

Adds a column with the longitude and latitude to the dataframe. These must be present inside the extra metadata header.

Return type:

bool

add_cast_number(number=None)[source]

Adds a column with the cast number to the dataframe.

Parameters:

number (int | None) – the cast number of this file's cast

Return type:

bool

ctdam.parser.ctddata module

class ctdam.parser.ctddata.CTDData(parameters, metadata_source, processing_steps=[])[source]

Bases: object

get_cast_borders_dict()[source]
Return type:

dict

update_salinity()[source]

array2cnv(parameters=None, bad_flag=-9.99e-29)[source]
Return type:

list

parse_output_sensor_info()[source]
Return type:

list

get_processing_info()[source]
Return type:

list

create_header(parameters=None, reduced_header=False)[source]

Re-creates the cnv header.

Return type:

list

extra_data_table_desc(data_table_description, system_utc)[source]
Return type:

list

drop_flagged_rows(parameters=None)[source]

pick_output_columns(parameters, mode='all')[source]

to_cnv(file_path='', remove_flags=True, output_parameters='all', reduced_header=False, bad_flag=-9.99e-29)[source]
Return type:

Tuple[Parameters, list]

ctdam.parser.datafiles module

class ctdam.parser.datafiles.DataFile(path_to_file, only_header=False)[source]

Bases: object

The base class for all Sea-Bird data files, which are .cnv, .btl, and .bl. One instance of this class, or its children, represents one data text file. The different pieces of information in such a file are structured into individual lists or dictionaries. The data table will be loaded as a numpy array and can be converted to a pandas DataFrame. Datatype-specific behavior is implemented in the subclasses.

Parameters:
  • path_to_file (Path | str) – The path to the data file.

  • only_header (bool) – Whether to stop reading the file after the metadata header.

read_event_information(regex_string='(?P<c>[a-z]{1,3}\\\\d{1,3})(-|_|\\\\/)?(?P<cn>1|2)?(-|_)(?P<s>\\\\d{1,4})(-|_)(?P<e>\\\\d{1,2})', leading_zeroes=False)[source]
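
The default pattern can be tried out on its own with Python's re module. The sketch assumes that the doubled backslashes in the rendered signature are escaping artifacts for a plain \d, and it uses a made-up file name. What the groups c, cn, s, and e stand for is not stated here; reading s and e as station and event is an assumption.

```python
import re

# The default pattern from the signature above, with single backslashes (assumption).
EVENT_PATTERN = re.compile(
    r"(?P<c>[a-z]{1,3}\d{1,3})(-|_|\/)?(?P<cn>1|2)?(-|_)(?P<s>\d{1,4})(-|_)(?P<e>\d{1,2})"
)

# Hypothetical file name used only for this illustration.
match = EVENT_PATTERN.search("emb331_12_3.cnv")

if match is not None:
    # The optional 'cn' group is None when the name carries no such part.
    print(match.group("c"), match.group("cn"), match.group("s"), match.group("e"))
```
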
read_file()[source]

Reads and structures all the different information present in the file. Lists and dictionaries are the data structures of choice. Uses basic prefix checking to distinguish different header information.

reading_start_time()[source]

Extracts the cast start time from the metadata header.

Return type:

datetime | None

sensor_xml_to_flattened_dict(sensor_data)[source]

Reads the raw XML sensor input and creates a multilevel dictionary, dropping the first two dictionaries, as they hold a single entry only.

Parameters:

sensor_data (str) – The raw xml sensor data.

Return type:

list[dict] | dict

structure_metadata(metadata_list)[source]

Creates a dictionary to store custom metadata, of which Sea-Bird allows 12 lines in each file.

Parameters:

metadata_list (list) – a list of the individual lines of metadata found in the file

Return type:

dict

define_output_path(file_path=None, file_name=None, file_type='.csv')[source]

Creates a Path object holding the desired output path.

Parameters:
  • file_path (Path | str | None) – directory the file sits in (Default value = self.file_dir)

  • file_name (str | None) – the original file name (Default value = self.file_name)

  • file_type (str) – the output file type (Default = ‘.csv’)

Return type:

Path
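
A sketch of how such a path could be assembled with pathlib. The helper name build_output_path is hypothetical, and whether the real method replaces an existing suffix in this way is an assumption.

```python
from pathlib import Path


def build_output_path(file_dir, file_name, file_type=".csv"):
    """Combine directory, file name, and output suffix into one Path (sketch)."""
    # Replace any existing suffix of the file name with the requested one.
    new_name = Path(file_name).with_suffix(file_type).name
    return Path(file_dir) / new_name


output_path = build_output_path("data/out", "cast_001.cnv")
print(output_path)
```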

to_csv(data, with_header=True, output_file_path=None, output_file_name=None)[source]

Writes a csv from the given data.

Parameters:
  • data (DataFrame | ndarray) – The source data to use.

  • with_header (bool) –

    indicating whether the header shall appear in the output

    (Default value = True)

  • output_file_path (Path | str | None) – file directory (Default value = None)

  • output_file_name (str | None) – original file name (Default value = None)

selecting_columns(list_of_columns, df)[source]

Alters the dataframe to only hold the given columns.

Parameters:
  • list_of_columns (list | str)

  • df (DataFrame) – Dataframe (Default value = None)

ctdam.parser.file_collection module

ctdam.parser.file_collection.get_collection(path_to_files, file_suffix='cnv', only_metadata=False, pattern='', sorting_key=None)[source]

Factory to create instances of FileCollection, depending on input type.

Parameters:
  • path_to_files (Path | str) – The path to the directory to search for files.

  • file_suffix (str) – The suffix to search for. (Default value = “cnv”)

  • only_metadata (bool) – Whether to read only metadata. (Default value = False)

  • pattern (str) – A filter for file selection. (Default value = ‘’)

  • sorting_key (Callable | None) – A callable that returns the filename-part to use to sort the collection. (Default value = None)

Return type:

Type[FileCollection]

class ctdam.parser.file_collection.FileCollection(path_to_files, file_suffix, only_metadata=False, pattern='', sorting_key=None)[source]

Bases: UserList

A representation of multiple files of the same kind. These files share the same suffix and are otherwise closely connected to each other. A common use case would be the collection of CNVs to allow for easier processing or integration of field calibration measurements.

Parameters:
  • path_to_files (str | Path) – The path to the directory to search for files.

  • file_suffix (str) – The suffix to search for. (Default value = “cnv”)

  • only_metadata (bool) – Whether to read only metadata. (Default value = False)

  • pattern (str) – A filter for file selection. (Default value = ‘’)

  • sorting_key (Callable | None) – A callable that returns the filename-part to use to sort the collection. (Default value = None)

extract_file_type(suffix)[source]

Determines the file type using the input suffix.

Parameters:

suffix (str) – The file suffix.

Return type:

Type[DataFile]

collect_files(pattern='', sorting_key=<function FileCollection.<lambda>>)[source]

Creates a list of target files, recursively from the given directory. These can be sorted with the help of the sorting_key parameter, which is a Callable that identifies the part of the filename that shall be used for sorting.

Parameters:
  • pattern (str) – A filter for file selection. Is given to rglob. (Default value = ‘’)

  • sorting_key (Callable | None) – The part of the filename to use in sorting. (Default value = lambda file: int(file.stem.split(“_”)[3]))

Return type:

list[Path]
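
The documented default sorting key can be tried on plain path objects without touching the file system. The file names below are made up; the lambda is the one shown as the default value above.

```python
from pathlib import PurePosixPath

# Hypothetical file names; the fourth underscore-separated part is a number.
files = [
    PurePosixPath("casts/emb_331_ctd_10.cnv"),
    PurePosixPath("casts/emb_331_ctd_2.cnv"),
    PurePosixPath("casts/deep/emb_331_ctd_1.cnv"),
]


def sorting_key(file):
    # Same logic as the documented default: int(file.stem.split("_")[3])
    return int(file.stem.split("_")[3])


ordered = sorted(files, key=sorting_key)
print([file.name for file in ordered])
```

Converting the name part to int gives a numeric order, so 10 sorts after 2 instead of before it as in a plain text sort. File names with fewer than four underscore-separated parts would raise an IndexError with this key, so a custom sorting_key is needed for other naming schemes.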

load_files(only_metadata=False)[source]

Creates python instances of each file.

Parameters:

only_metadata (bool) – Whether to load only file metadata. (Default value = False)

Return type:

list[DataFile]

get_dataframes(event_log=False, coordinates=False, time_correction=False, cast_identifier=False)[source]

Collects all individual dataframes and allows additional column creation.

Parameters:
  • event_log (bool) – (Default value = False)

  • coordinates (bool) – (Default value = False)

  • time_correction (bool) – (Default value = False)

  • cast_identifier (bool) – (Default value = False)

Return type:

list[DataFrame]

get_collection_dataframe(list_of_dfs=None)[source]

Creates one DataFrame from the individual ones, by concatenation.

Parameters:

list_of_dfs (list[DataFrame] | None) – A list of the individual DataFrames. (Default value = None)

Return type:

DataFrame

tidy_collection_dataframe(df)[source]

Apply the different dataframe edits to the given dataframe.

Parameters:

df (DataFrame) – A DataFrame to edit.

Return type:

DataFrame

use_bad_flag_for_nan(df)[source]

Replace all NaN values by the bad flag value, defined inside the files.

Parameters:

df (DataFrame) – The dataframe to edit.

Return type:

DataFrame
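
The replacement itself can be sketched on a plain list of floats. The value -9.99e-29 is the bad_flag default that appears in other signatures of this package; using it here, and the helper name replace_nan_with_bad_flag, are assumptions for illustration.

```python
import math

BAD_FLAG = -9.99e-29  # default bad_flag value seen in other signatures of this package


def replace_nan_with_bad_flag(values, bad_flag=BAD_FLAG):
    """Return a copy of values in which every NaN is replaced by the bad flag."""
    return [
        bad_flag if isinstance(value, float) and math.isnan(value) else value
        for value in values
    ]


cleaned = replace_nan_with_bad_flag([1.5, float("nan"), 3.0])
print(cleaned)
```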

set_dtype_to_float(df)[source]

Use the float-dtype for all DataFrame columns.

Parameters:

df (DataFrame) – The dataframe to edit.

Return type:

DataFrame

select_real_scan_data(df)[source]

Drop data rows that have no ‘Scan’ value, if that column exists.

Parameters:

df (DataFrame) – The dataframe to edit.

Return type:

DataFrame

to_csv(file_name)[source]

Writes a csv file with the given filename.

Parameters:

file_name – The new csv file name.

class ctdam.parser.file_collection.CnvCollection(*args, **kwargs)[source]

Bases: FileCollection

Specific methods to work with collections of .cnv files.

get_dataframes(event_log=False, coordinates=False, time_correction=False, cast_identifier=False)[source]

Collects all individual dataframes and allows additional column creation.

Parameters:
  • event_log (bool) – (Default value = False)

  • coordinates (bool) – (Default value = False)

  • time_correction (bool) – (Default value = False)

  • cast_identifier (bool) – (Default value = False)

Return type:

list[DataFrame]

get_data_table_meta_info()[source]

Ensures the same data description in all input cnv files and returns it. Acts as an early alarm when working on different kinds of files, which cannot be concatenated together.

Return type:

list[dict]

get_array()[source]

Creates a collection array of all individual file arrays.

Return type:

ndarray

get_processing_steps()[source]

Checks the processing steps in the different files for consistency. Returns the steps of the first file, which should be the same as for all other files.

Return type:

list

class ctdam.parser.file_collection.HexCollection(*args, xmlcon_pattern='', path_to_xmlcons='', **kwargs)[source]

Bases: FileCollection

Specific methods to work with collections of .hex files.

Especially concerned with the detection of corresponding .XMLCON files.

get_xmlcons()[source]

Returns all .xmlcon files found inside the root directory and its children, matching a given pattern.

Uses the global sorting_key to attempt to sort the xmlcons the same way. This is meant to be used in the future for a more specific hex-xmlcon matching.

Return type:

list[str]

ctdam.parser.geomar_ctd_file_parser module

class ctdam.parser.geomar_ctd_file_parser.GEOMARCTDFile(path_to_file, only_header=False, create_dataframe=True)[source]

Bases: object

A parser to read .ctd files created by the GEOMAR ctdam.proc software.

Goes through the file line by line and sorts the individual lines into corresponding lists. That way, data and different types of metadata are structured on a basic level. In general, this parser is meant to stick close to the way the Sea-Bird parsers are written.

read_file()[source]

create_dataframe()[source]

ctdam.parser.hexfile module

class ctdam.parser.hexfile.HexFile(path_to_file, path_to_xmlcon='', *args, **kwargs)[source]

Bases: DataFile

A representation of a .hex file as used by SeaBird.

Parameters:

path_to_file (Path | str) – the path to the file

get_corresponding_xmlcon(path_to_xmlcon='')[source]

Finds the best matching .xmlcon file inside the same directory.

Parameters:

path_to_xmlcon (Path | str) – A fixed path to an xmlcon file. Will be checked.

Return type:

XMLCONFile | None

ctdam.parser.parameter module

class ctdam.parser.parameter.Parameters(data, metadata, only_header=False, bad_flag=-9.99e-29)[source]

Bases: UserDict

A collection of all the parameters in a CnvFile.

Allows for a much cleaner handling of parameter data and their metadata. Will be heavily expanded.

Parameters:
  • data (list) – The raw data as extracted by DataFile

  • metadata (list) – The raw metadata as extracted by DataFile

get_param_types()[source]
Return type:

list[str]

get_data_length()[source]
Return type:

int

get_full_data_array()[source]
Return type:

ndarray

get_names()[source]
Return type:

list[str]

get_metadata()[source]
Return type:

dict[str, dict]

get_parameter_list()[source]
Return type:

list[Parameter]

set_sample_rate(rate, unit)[source]

get_sample_rate(raw_interval_info='')[source]
Return type:

float

create_full_ndarray(data_table=[])[source]

Builds a numpy array representing the data table in a cnv file.

Parameters:

data_table (list) –

The data to work with

(Default value = [])

Return type:

ndarray

sort_parameters(top=['depSM', 'prDM', 't090C', 't190C', 'sal00', 'sal11', 'sbox0Mm/Kg', 'sbox1Mm/Kg', 'flECO-AFL', 'turbWETntu0', 'par', 'spar'], bottom=['gsw_densityA0', 'gsw_densityA1', 'gsw_saA0', 'gsw_saA1', 'gsw_ctA0', 'gsw_ctA1', 'sbeox0ML/L', 'sbeox1ML/L', 'c0mS/cm', 'c1mS/cm', 'latitude', 'longitude', 'flag'])[source]
Return type:

dict

create_parameter_instances(array_data, metadata)[source]

Differentiates the individual parameter columns into separate parameter instances.

Parameters:

metadata (dict[str, dict]) –

The structured metadata dictionary

(Default value = {})

Return type:

dict[str, Parameter]

add_parameter(parameter, position='')[source]

Adds one parameter instance to the collection.

Parameters:

parameter (Parameter) – The new parameter

create_parameter(data, metadata={}, name='', position='')[source]

Creates a new parameter instance with the given data and metadata.

The input data is either a numpy array or a single value. The single value will be broadcast to the shape of the data table. A use case would be the addition of an ‘event’ or ‘cast’ column.

Parameters:
  • data (ndarray | int | float | str | None) – Data to use or expand

  • metadata (dict) –

    Metadata for the new parameter

    (Default value = {})

  • name (str) –

    Name to use for missing metadata values

    (Default value = “”)

Return type:

Parameter
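
The "single value or full column" behaviour can be sketched in plain Python, with lists standing in for numpy arrays. The helper name expand_to_table_length and the length check are assumptions, not the library's code.

```python
def expand_to_table_length(data, table_length):
    """Return a full column: repeat a single value, or pass a sequence through."""
    if isinstance(data, (list, tuple)):
        # A full column must already match the length of the data table.
        if len(data) != table_length:
            raise ValueError("data length does not match the data table")
        return list(data)
    # A single value is repeated once per row of the data table.
    return [data] * table_length


cast_column = expand_to_table_length("cast_07", 3)
depth_column = expand_to_table_length([1.0, 2.0, 3.0], 3)
print(cast_column, depth_column)
```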

add_default_metadata(name, metadata={}, list_of_keys=['shortname', 'longinfo', 'name', 'metainfo', 'unit'])[source]

Fills up missing metadata points with a default value.

Parameters:
  • name (str) – The value to use as default

  • metadata (dict) –

    The present metadata

    (Default value = {})

  • list_of_keys (list) – The expected metadata keys

Return type:

dict
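
A minimal sketch of the described filling logic, using the key list from the signature above. The function name fill_missing_metadata is hypothetical, and the sketch returns a copy rather than editing the input, which may differ from the real method.

```python
DEFAULT_KEYS = ("shortname", "longinfo", "name", "metainfo", "unit")


def fill_missing_metadata(name, metadata=None, list_of_keys=DEFAULT_KEYS):
    """Return a copy of metadata in which every missing key is set to name."""
    filled = dict(metadata or {})
    for key in list_of_keys:
        # Existing entries are kept, only absent keys receive the default.
        filled.setdefault(key, name)
    return filled


result = fill_missing_metadata("cast", {"unit": "1"})
print(result)
```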

update_spans()[source]

Updates all spans of the parameters.

get_spans()[source]

Returns all span tuples of the parameters.

Return type:

list[tuple[int, int]]

get_pandas_dataframe()[source]

Returns a pandas DataFrame of the current parameter data.

Return type:

DataFrame

with_name_type(name_type='shortname')[source]

Uses the given name_type as column descriptors.

Parameters:

name_type (str) –

The metadata name to use

(Default value = “shortname”)

reading_data_header(header_info=[])[source]

Reads the table's header data from the header.

Parameters:
  • header_info (list) – the header values from the file (Default value = [])

Return type:

Tuple[dict[str, dict], list[int]]

class ctdam.parser.parameter.Parameter(data, metadata, bad_flag=-9.99e-29)[source]

Bases: object

A representation of one parameter in a cnv file.

Consists of the values of the parameter as well as the metadata.

get_pandas_series()[source]

Returns a pandas Series of the current parameter data.

Return type:

Series

use_name(name_type='shortname')[source]

Uses the given name as parameter descriptor.

Parameters:

name_type (str) –

The metadata name to use

(Default value = “shortname”)

parse_to_float()[source]

Tries to parse the data array type to float.

update_span()[source]

Updates the data span.

Uses the first value if dtype is not numeric.
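
One way to read this rule is sketched below: numeric data gets a (minimum, maximum) span, and any other data gets the first value for both ends. That two-ended reading, the helper name compute_span, and the requirement of a non-empty input are assumptions.

```python
def compute_span(values):
    """Return (min, max) for numeric data, else the first value twice (sketch)."""
    numeric = all(
        isinstance(value, (int, float)) and not isinstance(value, bool)
        for value in values
    )
    if numeric:
        return (min(values), max(values))
    # Non-numeric data has no meaningful range, so fall back to the first value.
    return (values[0], values[0])


print(compute_span([3.5, 1.0, 2.25]))
print(compute_span(["station_a", "station_b"]))
```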

set_output_format()[source]

ctdam.parser.processing_steps module

class ctdam.parser.processing_steps.CnvProcessingSteps(raw_processing_info)[source]

Bases: UserList

A Python representation of the individual ctdam.proc steps conducted during the creation of a cnv file. These modules are stored in a dictionary structure, together with all the variables, metadata, etc. given in the header of a cnv file.

get_names()[source]
Return type:

list[str]

extract_individual_modules(raw_info)[source]
Return type:

list

create_step_instance(module, raw_info)[source]
Parameters:

module (str)

Return type:

ProcessingStep

get_step(step)[source]
Parameters:

module (str)

Return type:

ProcessingStep | None

add_info(module, key, value)[source]

Adds new processing lines to the list of processing module information

Parameters:
  • module (str) – the name of the processing module

  • key (str) – the description of the value

  • value (str) – the information

Return type:

ProcessingStep | None

class ctdam.parser.processing_steps.ProcessingStep(name, metadata)[source]

Bases: object

Class that represents one individual processing step that led to the current status of the cnv file. Can be a custom processing step or one of the original Sea-Bird ones.

ctdam.parser.xmlfiles module

class ctdam.parser.xmlfiles.XMLFile(path_to_file)[source]

Bases: UserDict

Parent class for XML and psa representation that loads XML as a python-internal tree and as a dict.

Parameters:

path_to_file (Path | str) – the path to the xml file

to_xml(file_name=None, file_path=None)[source]

Writes the dictionary to xml.

Parameters:
  • file_name (str) – the original file's name (Default value = self.file_name)

  • file_path (pathlib.Path) – the directory of the file (Default value = self.file_dir)

to_json(file_name=None, file_path=None)[source]

Writes the dictionary representation of the XML input to a json file.

Parameters:
  • file_name (str) – the original file's name (Default value = self.file_name)

  • file_path (pathlib.Path) – the directory of the file (Default value = self.file_dir)
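
The general route from XML to a dictionary and on to JSON can be sketched with the standard library. This is not the class's implementation: the helper element_to_dict is hypothetical, the XML snippet is made up, and attributes are ignored to keep the sketch short.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Recursively convert an XML element into nested dicts, lists, and strings."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tags are collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


xml_text = (
    "<Sensors>"
    "<Sensor><Name>Temperature</Name></Sensor>"
    "<Sensor><Name>Conductivity</Name></Sensor>"
    "</Sensors>"
)
root = ET.fromstring(xml_text)
as_dict = {root.tag: element_to_dict(root)}
as_json = json.dumps(as_dict, indent=2)
print(as_json)
```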

class ctdam.parser.xmlfiles.XMLCONFile(path_to_file)[source]

Bases: XMLFile

get_sensor_info()[source]

Creates a multilevel dictionary, dropping the first four dictionaries, to retrieve pure sensor information.

Return type:

list[dict]

class ctdam.parser.xmlfiles.PsaFile(path_to_file)[source]

Bases: XMLFile

Module contents