ctdam.parser package¶
Submodules¶
ctdam.parser.bottlefile module¶
- class ctdam.parser.bottlefile.BottleFile(path_to_file, only_header=False)[source]¶
Bases:
DataFile
Class that represents a Sea-Bird bottle file. Organizes the file's table information into a pandas DataFrame, which makes pandas available for statistics, visualization, data manipulation, export, etc.
- create_dataframe()[source]¶
Creates a dataframe out of the btl file. Manages the double data header correctly.
- adding_timestamp_column()[source]¶
Creates a timestamp column that holds both date and time information.
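As a rough illustration of combining the two fields, the sketch below joins a date string and a time string into one `datetime`. The function name and the `%b %d %Y` and `%H:%M:%S` formats are assumptions made for this example; they are not taken from the package source.

```python
from datetime import datetime


def combine_date_and_time(date_text: str, time_text: str) -> datetime:
    """Join separate date and time strings into one timestamp.

    The formats are assumptions for this sketch (e.g. "Mar 05 2021"
    and "12:34:56"); adjust them to the actual file content.
    """
    return datetime.strptime(f"{date_text} {time_text}", "%b %d %Y %H:%M:%S")


timestamp = combine_date_and_time("Mar 05 2021", "12:34:56")
print(timestamp.isoformat())
```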
- selecting_rows(df=None, statistic_of_interest=['avg'])[source]¶
Creates a dataframe with the given row identifier, using the statistics column. A single string or a list of strings can be processed.
- Parameters:
df (pandas.DataFrame) – the file's pandas representation (Default value = self.df)
statistic_of_interest (list|str) – collection of values of the ‘statistics’ column in self.df (Default value = [‘avg’])
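The row selection described above can be sketched without pandas. The example below is a simplified stand-in that works on a list of dictionaries; the function name `select_rows` and the sample rows are hypothetical, and only the idea of accepting either one string or a list of strings comes from the description.

```python
def select_rows(rows, statistic_of_interest="avg"):
    """Keep only rows whose 'statistics' entry is one of the requested values."""
    # Accept a single string as well as a list of strings.
    if isinstance(statistic_of_interest, str):
        statistic_of_interest = [statistic_of_interest]
    wanted = set(statistic_of_interest)
    return [row for row in rows if row["statistics"] in wanted]


# Hypothetical sample data, standing in for the parsed bottle table.
rows = [
    {"bottle": 1, "statistics": "avg", "pressure": 10.2},
    {"bottle": 1, "statistics": "sdev", "pressure": 0.1},
    {"bottle": 2, "statistics": "avg", "pressure": 20.4},
]
averages = select_rows(rows)
```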
ctdam.parser.bottlelogfile module¶
- class ctdam.parser.bottlelogfile.BottleLogFile(path_to_file, create_dataframe=False)[source]¶
Bases:
DataFile
Bottle log file representation that extracts the three different data types from the file: the reset time, and the table with bottle IDs and their corresponding data ranges.
- data_whitespace_removal()[source]¶
Strips the input from whitespace characters, in this case especially newline characters.
- Return type:
list
- create_list()[source]¶
Creates a list of usable data from the list specified in self.data. The list consists of an array of IDs representing the bottles, the date and time of the data sample, and the lines of the cnv corresponding to the bottles.
- Return type:
list
ctdam.parser.cnvfile module¶
- class ctdam.parser.cnvfile.CnvFile(path_to_file, only_header=False, create_dataframe=False, absolute_time_calculation=False, event_log_column=False, coordinate_columns=False)[source]¶
Bases:
DataFile
A representation of a cnv file as used by Sea-Bird.
This class intends to fully extract and organize the different types of data and metadata present inside of such a file. Downstream libraries shall be able to use this representation for all applications concerning cnv files, like data processing, transformation or visualization.
To achieve that, the metadata header is organized by the parent-class, DataFile, while the data table is extracted by this class. The data representation can be a numpy array or pandas dataframe. The handling of the data is mostly done inside parameters, a representation of the individual measurement parameter data and metadata.
This class can also write the edited data and metadata back to the original .cnv file format. That allows custom data processing on this representation while Sea-Bird's original software can still be used on the output. It also keeps results comparable with other parsers or methods in general.
- Parameters:
path_to_file (Path|str) – the path to the file
only_header (bool) – Whether to stop reading the file after the metadata header.
create_dataframe (bool) – Whether to create a pandas DataFrame from the data table.
absolute_time_calculation (bool) – whether to use a real timestamp instead of the second count
event_log_column (bool) – whether to add a station and device event column from DSHIP
coordinate_columns (bool) – whether to add longitude and latitude from the extra metadata header
- absolute_time_calculation()[source]¶
Replaces the basic cnv time representation, which counts relative to the cast's start point, with real UTC timestamps. This operation acts directly on the dataframe.
- Return type:
bool
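The conversion from a relative second count to absolute timestamps can be sketched as follows. This is a minimal stand-alone example using plain lists; the helper name `to_absolute_times` and the sample start time are assumptions, and the real method works on the instance's dataframe instead.

```python
from datetime import datetime, timedelta, timezone


def to_absolute_times(start_time, elapsed_seconds):
    """Convert seconds since cast start into absolute timestamps."""
    return [start_time + timedelta(seconds=s) for s in elapsed_seconds]


# Hypothetical cast start, expressed in UTC.
cast_start = datetime(2021, 3, 5, 12, 0, 0, tzinfo=timezone.utc)
timestamps = to_absolute_times(cast_start, [0.0, 0.5, 1.0])
```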
- add_start_time()[source]¶
Adds the Cast start time to the dataframe. Necessary for joins on the time.
- Return type:
bool
- get_processing_step_infos()[source]¶
Collects the individual validation modules and their respective information, usually present in key-value pairs.
- Return type:
- df2cnv(df=None)[source]¶
Parses a pandas dataframe into a list that represents the lines inside of a cnv data table.
- Parameters:
df (DataFrame|None)
- Return type:
list
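To give an idea of what turning table rows into text lines involves, the sketch below formats numeric rows as fixed-width columns. The column width of 11 characters and the four decimal places are assumptions for illustration, as is the function name; the formatting rules actually used by `df2cnv` are not documented here.

```python
def rows_to_cnv_lines(rows, width=11, precision=4):
    """Format numeric rows as fixed-width text lines.

    width and precision are assumptions made for this sketch.
    """
    return [
        "".join(f"{value:{width}.{precision}f}" for value in row)
        for row in rows
    ]


lines = rows_to_cnv_lines([[1.5, 2.0], [10.25, -3.0]])
```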
- to_cnv(file_name=None, use_dataframe=False)[source]¶
Writes the values of this instance to disk as a new cnv file.
- Parameters:
file_name (Path|str|None) – the new file name to use for writing
use_current_df (bool) – whether to use the current dataframe as data table
use_current_validation_header (bool) – whether to use the current processing module list
header_list (list) – the data columns to use for the export
- add_processing_metadata(module, key, value)[source]¶
Adds new processing lines to the list of processing module information
- Parameters:
module (str) – the name of the processing module
key (str) – the description of the value
value (str) – the information
- add_station_and_event_column()[source]¶
Adds a column with the DSHIP station and device event numbers to the dataframe. These must be present inside the extra metadata header.
- Return type:
bool
ctdam.parser.ctddata module¶
- class ctdam.parser.ctddata.CTDData(parameters, metadata_source, processing_steps=[])[source]¶
Bases:
object
- create_header(parameters=None, reduced_header=False)[source]¶
Re-creates the cnv header.
- Return type:
list
- to_cnv(file_path='', remove_flags=True, output_parameters='all', reduced_header=False, bad_flag=-9.99e-29)[source]¶
- Return type:
Tuple[Parameters,list]
ctdam.parser.datafiles module¶
- class ctdam.parser.datafiles.DataFile(path_to_file, only_header=False)[source]¶
Bases:
object
The base class for all Sea-Bird data files, which are .cnv, .btl, and .bl. One instance of this class, or of one of its children, represents one data text file. The different pieces of information in such a file are structured into individual lists or dictionaries. The data table is loaded as a numpy array and can be converted to a pandas DataFrame. Datatype-specific behavior is implemented in the subclasses.
- Parameters:
path_to_file (Path|str) – The path to the data file.
only_header (bool) – Whether to stop reading the file after the metadata header.
- read_event_information(regex_string='(?P<c>[a-z]{1,3}\\\\d{1,3})(-|_|\\\\/)?(?P<cn>1|2)?(-|_)(?P<s>\\\\d{1,4})(-|_)(?P<e>\\\\d{1,2})', leading_zeroes=False)[source]¶
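The default pattern in the signature can be tried out in isolation. The sketch below writes it as a raw string with single backslashes, which is an assumption about how the escaped default should be read. The helper name and the sample strings are hypothetical, and what the groups `c`, `cn`, `s` and `e` stand for is not stated here, so the example only shows which parts of a string they capture.

```python
import re

# Pattern from the signature, written as a raw string (an assumption
# about how the escaped default is meant to be read).
EVENT_PATTERN = re.compile(
    r"(?P<c>[a-z]{1,3}\d{1,3})(-|_|\/)?(?P<cn>1|2)?(-|_)"
    r"(?P<s>\d{1,4})(-|_)(?P<e>\d{1,2})"
)


def find_event_groups(text):
    """Return the named groups of the first match, or None if nothing matches."""
    match = EVENT_PATTERN.search(text)
    return match.groupdict() if match else None


print(find_event_groups("emb295_12_3"))
print(find_event_groups("emb295-1_12_3"))
```

Note that the pattern only matches lowercase letters unless it is compiled with `re.IGNORECASE`.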
- read_file()[source]¶
Reads and structures all the different information present in the file. Lists and dictionaries are the data structures of choice. Uses basic prefix checking to distinguish different header information.
- reading_start_time()[source]¶
Extracts the Cast start time from the metadata header.
- Return type:
datetime|None
- sensor_xml_to_flattened_dict(sensor_data)[source]¶
Reads the raw XML sensor input and creates a multilevel dictionary, dropping the first two dictionary levels because they hold only a single entry each.
- Parameters:
sensor_data (str) – The raw xml sensor data.
- Return type:
list[dict]|dict
- structure_metadata(metadata_list)[source]¶
Creates a dictionary to store custom metadata, of which Sea-Bird allows 12 lines in each file.
- Parameters:
metadata_list (list) – a list of the individual lines of metadata found in the file
- Return type:
dict
- define_output_path(file_path=None, file_name=None, file_type='.csv')[source]¶
Creates a Path object holding the desired output path.
- Parameters:
file_path (Path|str|None) – directory the file sits in (Default value = self.file_dir)
file_name (str|None) – the original file name (Default value = self.file_name)
file_type (str) – the output file type (Default value = ‘.csv’)
- Return type:
Path
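Building such an output path is straightforward with `pathlib`. The following is a minimal sketch under the assumption that the directory and file name are simply joined and the suffix swapped; the function name `build_output_path` is hypothetical, and the fallback to `self.file_dir` and `self.file_name` is left out.

```python
from pathlib import Path


def build_output_path(file_dir, file_name, file_type=".csv"):
    """Combine directory, file name and new suffix into one Path."""
    # with_suffix replaces an existing suffix or appends one if none is present.
    return Path(file_dir).joinpath(file_name).with_suffix(file_type)


print(build_output_path("data", "cast_001.cnv"))
```

One caveat of this approach: a file name that contains a dot but no real suffix, such as `cast.001`, would have the part after the dot replaced.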
- to_csv(data, with_header=True, output_file_path=None, output_file_name=None)[source]¶
Writes a csv from the given data.
- Parameters:
data (DataFrame|ndarray) – The source data to use.
with_header (bool) – indicating whether the header shall appear in the output (Default value = True)
output_file_path (Path|str|None) – file directory (Default value = None)
output_file_name (str|None) – original file name (Default value = None)
ctdam.parser.file_collection module¶
- ctdam.parser.file_collection.get_collection(path_to_files, file_suffix='cnv', only_metadata=False, pattern='', sorting_key=None)[source]¶
Factory to create instances of FileCollection, depending on input type.
- Parameters:
path_to_files (Path|str) – The path to the directory to search for files.
file_suffix (str) – The suffix to search for. (Default value = “cnv”)
only_metadata (bool) – Whether to read only metadata. (Default value = False)
pattern (str) – A filter for file selection. (Default value = ‘’)
sorting_key (Callable|None) – A callable that returns the filename-part to use to sort the collection. (Default value = None)
- Return type:
Type[FileCollection]
- class ctdam.parser.file_collection.FileCollection(path_to_files, file_suffix, only_metadata=False, pattern='', sorting_key=None)[source]¶
Bases:
UserList
A representation of multiple files of the same kind. These files share the same suffix and are otherwise closely connected to each other. A common use case would be the collection of CNVs to allow for easier processing or integration of field calibration measurements.
- Parameters:
path_to_files (str|Path) – The path to the directory to search for files.
file_suffix (str) – The suffix to search for. (Default value = “cnv”)
only_metadata (bool) – Whether to read only metadata. (Default value = False)
pattern (str) – A filter for file selection. (Default value = ‘’)
sorting_key (Callable|None) – A callable that returns the filename-part to use to sort the collection. (Default value = None)
- extract_file_type(suffix)[source]¶
Determines the file type using the input suffix.
- Parameters:
suffix (str) – The file suffix.
- Return type:
Type[DataFile]
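A suffix-to-class lookup of this kind is commonly written as a dictionary dispatch. The sketch below uses empty stand-in classes that merely share their names with the classes documented in this package. The mapping itself, the fallback to `DataFile`, and the function name `pick_file_type` are assumptions for illustration.

```python
# Empty stand-ins for the documented classes; not the real implementations.
class DataFile: ...
class CnvFile(DataFile): ...
class BottleFile(DataFile): ...
class BottleLogFile(DataFile): ...
class HexFile(DataFile): ...


# Assumed mapping from file suffix to file class.
FILE_TYPES = {
    "cnv": CnvFile,
    "btl": BottleFile,
    "bl": BottleLogFile,
    "hex": HexFile,
}


def pick_file_type(suffix):
    """Return the class registered for a suffix, falling back to DataFile."""
    key = suffix.lower().lstrip(".")
    return FILE_TYPES.get(key, DataFile)
```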
- collect_files(pattern='', sorting_key=<function FileCollection.<lambda>>)[source]¶
Creates a list of target files, recursively from the given directory. These can be sorted with the help of the sorting_key parameter, which is a Callable that identifies the part of the filename that shall be used for sorting.
- Parameters:
pattern (str) – A filter for file selection. Is given to rglob. (Default value = ‘’)
sorting_key (Callable|None) – The part of the filename to use in sorting. (Default value = lambda file: int(file.stem.split(“_”)[3]))
- Return type:
list[Path]
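The recursive collection and the effect of the default sorting key can be reproduced with `pathlib` alone. In this sketch the way `pattern` is combined with the suffix into a glob expression is an assumption, and the file names are made up. The sorting key is the default shown above, which sorts numerically by the fourth underscore-separated part of the file stem.

```python
import tempfile
from pathlib import Path


def collect_files(root, suffix="cnv", pattern="", sorting_key=None):
    """Recursively collect files below root and optionally sort them."""
    # Assumed combination of pattern and suffix; "**" is avoided inside a
    # single path component because pathlib can reject it there.
    glob = f"*{pattern}*.{suffix}" if pattern else f"*.{suffix}"
    files = list(Path(root).rglob(glob))
    return sorted(files, key=sorting_key) if sorting_key else sorted(files)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names with a running number as fourth part.
    for name in ("a_b_c_10.cnv", "a_b_c_2.cnv", "sub/a_b_c_1.cnv"):
        target = Path(tmp, name)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    ordered = collect_files(
        tmp, sorting_key=lambda file: int(file.stem.split("_")[3])
    )
    names = [file.name for file in ordered]
```

Sorting by the integer value rather than by the raw string keeps `a_b_c_10` after `a_b_c_2`.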
- load_files(only_metadata=False)[source]¶
Creates python instances of each file.
- Parameters:
only_metadata (bool) – Whether to load only file metadata. (Default value = False)
- Return type:
list[DataFile]
- get_dataframes(event_log=False, coordinates=False, time_correction=False, cast_identifier=False)[source]¶
Collects all individual dataframes and allows additional column creation.
- Parameters:
event_log (bool) – (Default value = False)
coordinates (bool) – (Default value = False)
time_correction (bool) – (Default value = False)
cast_identifier (bool) – (Default value = False)
- Return type:
list[DataFrame]
- get_collection_dataframe(list_of_dfs=None)[source]¶
Creates one DataFrame from the individual ones, by concatenation.
- Parameters:
list_of_dfs (list[DataFrame]|None) – A list of the individual DataFrames. (Default value = None)
- Return type:
DataFrame
- tidy_collection_dataframe(df)[source]¶
Apply the different dataframe edits to the given dataframe.
- Parameters:
df (DataFrame) – A DataFrame to edit.
- Return type:
DataFrame
- use_bad_flag_for_nan(df)[source]¶
Replace all NaN values with the bad flag value defined inside the files.
- Parameters:
df (DataFrame) – The dataframe to edit.
- Return type:
DataFrame
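The replacement itself is a simple substitution. The sketch below shows it on a plain list rather than a DataFrame; the function name is hypothetical, and the bad flag is hard-coded to the default that appears in other signatures of this package, whereas the real method reads it from the files.

```python
import math

# Default bad flag as it appears in signatures elsewhere in this package.
BAD_FLAG = -9.99e-29


def replace_nan_with_bad_flag(values, bad_flag=BAD_FLAG):
    """Return a copy of values with every NaN replaced by the bad flag."""
    return [
        bad_flag if isinstance(value, float) and math.isnan(value) else value
        for value in values
    ]


cleaned = replace_nan_with_bad_flag([1.0, float("nan"), 3.0])
```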
- set_dtype_to_float(df)[source]¶
Use the float-dtype for all DataFrame columns.
- Parameters:
df (DataFrame) – The dataframe to edit.
- Return type:
DataFrame
- class ctdam.parser.file_collection.CnvCollection(*args, **kwargs)[source]¶
Bases:
FileCollection
Specific methods to work with collections of .cnv files.
- get_dataframes(event_log=False, coordinates=False, time_correction=False, cast_identifier=False)[source]¶
Collects all individual dataframes and allows additional column creation.
- Parameters:
event_log (bool) – (Default value = False)
coordinates (bool) – (Default value = False)
time_correction (bool) – (Default value = False)
cast_identifier (bool) – (Default value = False)
- Return type:
list[DataFrame]
- class ctdam.parser.file_collection.HexCollection(*args, xmlcon_pattern='', path_to_xmlcons='', **kwargs)[source]¶
Bases:
FileCollection
Specific methods to work with collections of .hex files.
Especially concerned with the detection of corresponding .XMLCON files.
- get_xmlcons()[source]¶
Returns all .xmlcon files found inside the root directory and its children, matching a given pattern.
Uses the global sorting_key to try to sort the xmlcons in the same way. This is meant to be used in the future for a more specific hex-xmlcon matching.
- Return type:
list[str]
ctdam.parser.geomar_ctd_file_parser module¶
- class ctdam.parser.geomar_ctd_file_parser.GEOMARCTDFile(path_to_file, only_header=False, create_dataframe=True)[source]¶
Bases:
object
A parser to read .ctd files created by the GEOMAR ctdam.proc software.
Goes through the file line by line and sorts the individual lines into corresponding lists. That way, data and different types of metadata are structured on a basic level. In general, this parser is meant to stick close to the way the Sea-Bird parsers are written.
ctdam.parser.hexfile module¶
- class ctdam.parser.hexfile.HexFile(path_to_file, path_to_xmlcon='', *args, **kwargs)[source]¶
Bases:
DataFile
A representation of a .hex file as used by Sea-Bird.
- Parameters:
path_to_file (
Path|str) – the path to the file
- get_corresponding_xmlcon(path_to_xmlcon='')[source]¶
Finds the best matching .xmlcon file inside the same directory.
- Parameters:
path_to_xmlcon (Path|str) – A fixed path to a xmlcon file. Will be checked.
- Return type:
XMLCONFile|None
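How the "best matching" file is chosen is not described here. As one possible reading, the sketch below checks an explicitly given path first and otherwise looks for a file with the same stem and an `.xmlcon` suffix (in any letter case) next to the hex file. The function name and this matching rule are assumptions, and the sketch returns a plain `Path` rather than an `XMLCONFile`.

```python
import tempfile
from pathlib import Path


def find_xmlcon(hex_path, path_to_xmlcon=""):
    """Return a usable xmlcon path for a hex file, or None."""
    if path_to_xmlcon:
        # An explicit path is only accepted if the file exists.
        explicit = Path(path_to_xmlcon)
        return explicit if explicit.is_file() else None
    hex_path = Path(hex_path)
    # Assumed rule: same stem, .xmlcon suffix in any letter case.
    for candidate in sorted(hex_path.parent.iterdir()):
        if candidate.suffix.lower() == ".xmlcon" and candidate.stem == hex_path.stem:
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names for demonstration.
    Path(tmp, "cast_001.hex").touch()
    Path(tmp, "cast_001.XMLCON").touch()
    Path(tmp, "other.xmlcon").touch()
    found_name = find_xmlcon(Path(tmp, "cast_001.hex")).name
    missing = find_xmlcon(Path(tmp, "cast_002.hex"))
```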
ctdam.parser.parameter module¶
- class ctdam.parser.parameter.Parameters(data, metadata, only_header=False, bad_flag=-9.99e-29)[source]¶
Bases:
UserDict
A collection of all the parameters in a CnvFile.
Allows for a much cleaner handling of parameter data and their metadata. Will be heavily expanded.
- Parameters:
data (list) – The raw data as extracted by DataFile
metadata (list) – The raw metadata as extracted by DataFile
- create_full_ndarray(data_table=[])[source]¶
Builds a numpy array representing the data table in a cnv file.
- Parameters:
data_table (list) – The data to work with (Default value = [])
- Return type:
ndarray
- sort_parameters(top=['depSM', 'prDM', 't090C', 't190C', 'sal00', 'sal11', 'sbox0Mm/Kg', 'sbox1Mm/Kg', 'flECO-AFL', 'turbWETntu0', 'par', 'spar'], bottom=['gsw_densityA0', 'gsw_densityA1', 'gsw_saA0', 'gsw_saA1', 'gsw_ctA0', 'gsw_ctA1', 'sbeox0ML/L', 'sbeox1ML/L', 'c0mS/cm', 'c1mS/cm', 'latitude', 'longitude', 'flag'])[source]¶
- Return type:
dict
- create_parameter_instances(array_data, metadata)[source]¶
Differentiates the individual parameter columns into separate parameter instances.
- Parameters:
metadata (dict[str,dict]) – The structured metadata dictionary (Default value = {})
- Return type:
dict[str,Parameter]
- add_parameter(parameter, position='')[source]¶
Adds one parameter instance to the collection.
- Parameters:
parameter (
Parameter) – The new parameter
- create_parameter(data, metadata={}, name='', position='')[source]¶
Creates a new parameter instance with the given data and metadata.
The input data is either a numpy array or a single value. A single value is broadcast to the shape of the data table. A use case would be the addition of an ‘event’ or ‘cast’ column.
- Parameters:
data (ndarray|int|float|str|None) – Data to use or expand
metadata (dict) – Metadata for the new parameter (Default value = {})
name (str) – Name to use for missing metadata values (Default value = “”)
- Return type:
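The expansion of a single value into a full column can be illustrated with plain Python lists instead of numpy arrays. The helper name `expand_to_column` and the length check are assumptions made for this sketch.

```python
def expand_to_column(data, row_count):
    """Return a column of row_count values from a sequence or a single value."""
    if isinstance(data, (list, tuple)):
        if len(data) != row_count:
            raise ValueError("column length does not match the data table")
        return list(data)
    # A single value is repeated once per row of the data table.
    return [data] * row_count


# Hypothetical 'cast' column for a table with three rows.
cast_column = expand_to_column("cast_01", 3)
```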
- add_default_metadata(name, metadata={}, list_of_keys=['shortname', 'longinfo', 'name', 'metainfo', 'unit'])[source]¶
Fills up missing metadata points with a default value.
- Parameters:
name (str) – The value to use as default
metadata (dict) – The present metadata (Default value = {})
list_of_keys (list) – The expected metadata keys
- Return type:
dict
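Filling in the missing keys can be sketched with `dict.setdefault`. The key list is the default from the signature above; the function name `fill_missing_metadata` is hypothetical, and the sketch copies the input and uses `None` as the default argument to avoid sharing one mutable dictionary between calls.

```python
DEFAULT_KEYS = ["shortname", "longinfo", "name", "metainfo", "unit"]


def fill_missing_metadata(name, metadata=None, list_of_keys=DEFAULT_KEYS):
    """Return a copy of metadata in which every expected key is present."""
    filled = dict(metadata or {})
    for key in list_of_keys:
        # Existing entries are kept; only absent keys receive the default.
        filled.setdefault(key, name)
    return filled


event_metadata = fill_missing_metadata("event", {"unit": ""})
```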
- get_pandas_dataframe()[source]¶
Returns a pandas DataFrame of the current parameter data.
- Return type:
DataFrame
- class ctdam.parser.parameter.Parameter(data, metadata, bad_flag=-9.99e-29)[source]¶
Bases:
object
A representation of one parameter in a cnv file.
Consists of the values of the parameter as well as the metadata.
- get_pandas_series()[source]¶
Returns a pandas Series of the current parameter data.
- Return type:
Series
ctdam.parser.processing_steps module¶
- class ctdam.parser.processing_steps.CnvProcessingSteps(raw_processing_info)[source]¶
Bases:
UserList
A Python representation of the individual ctdam.proc steps conducted while creating a cnv file. These modules are stored in a dictionary structure, together with all the variables, metadata, etc. given in the header of a cnv file.
- get_step(step)[source]¶
- Parameters:
module (str)
- Return type:
ProcessingStep|None
- add_info(module, key, value)[source]¶
Adds new processing lines to the list of processing module information
- Parameters:
module (str) – the name of the processing module
key (str) – the description of the value
value (str) – the information
- Return type:
ProcessingStep|None
ctdam.parser.xmlfiles module¶
- class ctdam.parser.xmlfiles.XMLFile(path_to_file)[source]¶
Bases:
UserDict
Parent class for XML and psa representations that loads the XML as a Python-internal tree and as a dict.
- Parameters:
path_to_file (
Path|str) – the path to the xml file