Organization and Management of Scientific Survey Cruise Data

Abstract: [Objective] This paper presents a systematic approach to managing scientific survey cruise data that span different disciplines, data structures, and data processing methods. [Methods] Data are organized as datasets and metadata, and data management follows the common mode of combining electronic and paper files. [Results] Big data technology is gradually being introduced into the data management stage, with relational databases managing structured data and the file management mode handling unstructured data, so that data can be managed in a unified and efficient way. This addresses the main management difficulties of scientific survey cruise data: a wide range of specialties, multiple data types, and complex data structures. [Conclusions] This work provides a theoretical basis and builds a convenient retrieval channel for data sharing services. Future investigations can focus on extracting useful information from scientific survey cruise data, which is the basis of our activities and decision-making.

Keywords: data management; resource organization; metadata; electronic document management

Scientific survey cruises are comprehensive investigations that involve hundreds of research institutes and universities and cover physical oceanography, geologic oceanography, marine meteorology, marine remote sensing, marine ecology, marine chemistry, and other disciplines. These surveys are widely used in cross-disciplinary studies of the Bohai and Yellow Seas, East China Sea, Yangtze Estuary, Taiwan Strait, South China Sea, West Pacific Ocean, East Indian Ocean, and other sea areas. The resulting scientific data are complex in type, involving time series and varied survey elements [1]. Driven by the data collection of scientific survey cruises, the volume of data has increased sharply every year and now exceeds the terabyte (TB) level. The collected data resources include both electronic and paper documents. A key task of the Data Sharing Service Center is to organize and manage, in a systematic manner, raw data that differ in discipline, data structure, and processing method. This paper assesses and discusses the organization and management of scientific survey cruise data to provide a theoretical foundation for data resources and to build a convenient retrieval channel for sharing the data of later cruises. Sharing scientific survey cruise data is an important means of improving research efficiency and accelerating research output, and an important driver of scientific progress. Sharing realizes and adds to the scientific value of the data and gradually brings their collection and sharing into a virtuous cycle.

(1) Classification: Establish and distinguish datasets according to attributes and data groups.

(2) Recognition: Use attributes to describe individuals (such as voyage number, voyage name, survey sea, survey time, survey task, and data attributes).

(3) Organization: Use metadata to describe individual datasets and indicate relationships between datasets.

(4) Storage: Determine the physical location of each record according to a certain organizational structure to support fast retrieval, update, insertion, and deletion.

(5) Retrieval: Establish various indexes to determine eligible record sets rapidly.
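The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the data center's implementation: the `Dataset` fields follow the attributes named in step (2), while the `Catalog` class, the choice of indexed fields, and all sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """One dataset described by the attributes named in step (2)."""
    voyage_number: str
    voyage_name: str
    survey_sea: str
    survey_time: str  # kept as text (YYYY-MM-DD) for simplicity
    survey_task: str


class Catalog:
    """Hypothetical in-memory catalog: stores records and indexes them."""

    # Fields chosen for indexing are an assumption of this sketch.
    INDEXED = ("voyage_number", "survey_sea", "survey_task")

    def __init__(self):
        self._records = {}              # storage: dataset id -> record
        self._index = defaultdict(set)  # retrieval: (field, value) -> ids

    def add(self, dataset_id, dataset):
        self._records[dataset_id] = dataset
        for field in self.INDEXED:
            self._index[(field, getattr(dataset, field))].add(dataset_id)

    def find(self, **criteria):
        """Return the sorted ids of datasets that match every criterion."""
        if not criteria:
            return sorted(self._records)
        id_sets = [self._index.get((f, v), set()) for f, v in criteria.items()]
        return sorted(set.intersection(*id_sets))


catalog = Catalog()
catalog.add("ds1", Dataset("V001", "Spring cruise", "Yellow Sea", "2016-04-10", "CTD"))
catalog.add("ds2", Dataset("V002", "Autumn cruise", "South China Sea", "2016-10-02", "CTD"))
catalog.add("ds3", Dataset("V002", "Autumn cruise", "South China Sea", "2016-10-05", "biochemistry"))

print(catalog.find(survey_sea="South China Sea", survey_task="CTD"))  # prints ['ds2']
```

The index maps each (attribute, value) pair to a set of dataset ids, so a query intersects small sets instead of scanning every record.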

Scientific survey cruise data include experimental and analytical data, electronic image data, and related documents recorded by scientific research teams of different institutions.

2.1 Heterogeneity

The heterogeneity of scientific survey cruise data shows up as differences in data storage structure across computer environments. It is determined by the variety of objects described and of observation instruments, including instrument types, measurement parameters and methods, resolution, and accuracy [2]. Datasets differ even when the same instrument is used at different resolutions. For example, commonly used CTD instruments include models SBE 911 plus and SBE 37 from Seabird (United States), xr620 and xr420 from RBR (Canada), Concerto3, and the compact series from Alec (Japan). These instruments yield different data types and accuracies depending on their settings. CTD data are typically in con and hex formats. Biochemical data are mostly Excel or Word files, or chromatograms in jpg format produced by analytical instruments. Existing data formats primarily include dbf, mdb, xls, doc, txt, pdf, gif, jpg, tif, img, zip, and rar. In terms of data type, the data include not only numerical, text, spatial vector, and raster data but also video, image, picture, and graphic data.

Data heterogeneity must be taken into account when describing marine survey elements. We adopt metadata technology to meet the actual management needs of heterogeneous scientific survey cruise data. Metadata is used to manage heterogeneous data by establishing structured standards that do not depend on data storage formats [3–5].

2.2 Intensity

The scientific survey cruise program has implemented 79 cruises and funded more than 1500 projects related to marine disciplines. Subsequent cruises will continue to increase the funding intensity and the number of projects, gradually enhancing the density and breadth of survey stations and routes. Hence, the volume of survey data will also increase. Scientific survey cruise data are characterized by long time series, large spatial scale, and high data intensity.

2.3 Interdisciplinary data needs

The research content of scientific survey cruise projects covers many disciplines, such as hydrology, meteorology, surveying and mapping, remote sensing, geology, and biochemistry. A single scientific research method can only obtain observation data of specific objects in a particular space range and time period. For example, satellite remote sensing can monitor wide areas at and above the sea surface but has difficulty observing underwater elements. Conversely, single-point buoys and other in situ observations can compensate for the limited underwater detection capability of remote sensing but cannot meet the needs of large-scale observation. Therefore, on-board research projects generally present a mutual and multidirectional need for data support from other research projects.

Scientific survey cruise data should be classified according to their attributes to establish datasets. Once the classification standard is determined, the classification code is formulated and the data directory structure is organized so that users can query data by classification code. The construction of the classification coding should consider interoperability with the classification standards of the many nodes in the existing scientific data-sharing project, and an extension interface should be reserved for future integration. The following classification and coding rules should be followed:

(1) The purpose of data classification is to classify and index the dataset, and the basic unit of classification is the dataset.

(2) The purpose of data classification is to manage and organize data in the data center effectively and allow users to find data rapidly.

(3) Data classification is based on the planning of the scientific data-sharing project, combined with subject and industry classification standards.

(4) Data classification and coding standards will continue to expand and revise with the continuous advancement of data sharing.

(5) Classification codes are composed in order of subordination: category, subcategory, major category, and minor category. Each category that corresponds to the dataset content can be determined from the coding in the data classification and indexing process.

Scientific survey cruise data involve many subjects. According to the Classification and Coding Scheme of Engineering Data for Scientific Data Sharing [6], summarized in Table 1, the scientific survey cruise dataset belongs to “Resources and Environment Science” (category: R) -> “Marine Science” (subcategory: S) -> “Marine Environment” (major category: 13). The specific code therefore consists of a one-letter English alphabetic code for the category, a one-letter English alphabetic code for the subcategory, and a two-digit numeric code each for the major and minor categories. For example, the code “RS13xx” shown in Fig.1 is used as the name of the fifth-level folder in the scientific survey cruise data directory structure.
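The coding rule can be expressed as a small validator. The sketch below assumes only what the text states (one letter for the category, one letter for the subcategory, two digits each for the major and minor categories); the function names `build_code` and `parse_code` are hypothetical, and the minor category value 13 is taken from the example code "RS1313" without implying what that minor category denotes.

```python
import re

# Two uppercase letters followed by two two-digit numbers, e.g. "RS1313".
CODE_PATTERN = re.compile(r"([A-Z])([A-Z])(\d{2})(\d{2})")


def build_code(category, subcategory, major, minor):
    """Compose a six-character classification code from its four parts."""
    code = f"{category}{subcategory}{major:02d}{minor:02d}"
    if CODE_PATTERN.fullmatch(code) is None:
        raise ValueError(f"invalid classification code: {code!r}")
    return code


def parse_code(code):
    """Split a classification code back into its four parts."""
    match = CODE_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"invalid classification code: {code!r}")
    category, subcategory, major, minor = match.groups()
    return {
        "category": category,
        "subcategory": subcategory,
        "major": int(major),
        "minor": int(minor),
    }


print(build_code("R", "S", 13, 13))  # prints RS1313
print(parse_code("RS1313")["major"])  # prints 13
```

Validating on both construction and parsing keeps malformed codes from being used as folder names.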

Table 1 Part of the classification and coding table of scientific data [6]

Table 2 Metadata of data from scientific survey cruise

Fig.1 Coding rules of the scientific survey cruise data

The dataset is the basic unit or object of metadata-based scientific data management. Entity data of scientific survey cruises are expressed as datasets and metadata. After standardization and formatting, metadata is associated with the datasets and stored in the metadata database. The metadata database is both a collection of data and a search engine for datasets. This method solves the technical difficulties of managing scientific data that are varied in type, scattered, and complex in structure, and it provides a convenient retrieval channel for future data sharing and distribution [7].

Dataset partitioning must preserve the systematic logical relationships of data entities between and within datasets. Data entities are integrated into different datasets according to research purpose, content, and differences in data content to ensure the systematism and integrity of the data content. Scientific survey cruise datasets are divided by survey sea into hierarchical logical relationships. The dataset structure of survey data is illustrated in Fig.2.

Fig.2 The dataset of survey data from scientific survey cruise

| No. | Name | Data type | Remark |
|---|---|---|---|
| 16 | Start time | datetime | Start observation time. Format: YYYY-MM-DD (HH:MM:SS). (Please record the first station observation time during discontinuous observation) |
| 17 | End time | datetime | End observation time. Format: YYYY-MM-DD (HH:MM:SS). (Please record the last station observation time during discontinuous observation) |
| 18 | Sampling method | text | Description of instrument type/method of observation |
| 19 | Sampling elements | text | Temperature/F8.3, salinity/F7.4 |
| 20 | Data storage type | text | ASC, CNV, DOCX, GIF, HEX, JPEG, XLSX, TXT, etc. |
| 21 | Date of data submission | date | Date of data submission |
| 22 | Way of data submission | text | USB flash disk/hard disk/CD/Email/FTP, etc. |
| 23 | Content of data submission | text | The content of data submission |
| 24 | Directory structure of data submission | text | Text/screenshot |
| 25 | Data protection period | date | Date of data protection yyyy-mm-dd (restricted)/disclosure |
| 26 | Data storage location | text | Data storage institute/data storage location |
| 27 | The data contact | text | Data manager/exchange liaison man |
| 28 | Contact method | text | Contact phone/Email |
| 29 | Acknowledgement for data reference | text | Data used in this study was collected onboard of R/V xxxx implementing the open research cruise cyyyy-xx supported by Shiptime Sharing Project (project number: XXXX) |

Metadata is provided by the data producer according to rules set by the data center (type, length, and name) and submitted together with the data entities. Through metadata, users understand the characteristics of the measurement data, such as time, space, instruments and methods, resolution, and quality, and can then use the basic datasets correctly and efficiently. Metadata records form a metadata database according to database specifications. The metadata database is relational, and its elements are fields in the relational database. For example, the dataset title, names of the data collectors, collection places, starting and ending times of collection, and other metadata elements constitute a metadata database. Metadata of survey data from scientific survey cruises is presented in Table 2.
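A relational metadata database of this kind can be illustrated with SQLite from the Python standard library. The sketch is only an assumption-laden example: the table name, the column names (loosely modeled on a few Table 2 elements), the sample rows, and the helper `datasets_between` are all hypothetical, and the actual schema used by the data center is not given in this paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each metadata element becomes a field (column) of the relational table.
conn.execute(
    """
    CREATE TABLE dataset_metadata (
        dataset_id       INTEGER PRIMARY KEY,
        title            TEXT NOT NULL,
        start_time       TEXT NOT NULL,
        end_time         TEXT NOT NULL,
        sampling_method  TEXT,
        storage_type     TEXT,
        storage_location TEXT
    )
    """
)
# An index on the start time supports fast retrieval by observation period.
conn.execute("CREATE INDEX idx_metadata_start ON dataset_metadata (start_time)")

# Illustrative records only; times use the YYYY-MM-DD HH:MM:SS layout.
rows = [
    ("CTD profiles, hypothetical cruise A", "2016-04-10 08:00:00",
     "2016-04-20 18:00:00", "CTD", "HEX", "data center"),
    ("Nutrient analysis, hypothetical cruise B", "2016-10-02 06:30:00",
     "2016-10-15 12:00:00", "laboratory analysis", "XLSX", "data center"),
]
conn.executemany(
    "INSERT INTO dataset_metadata "
    "(title, start_time, end_time, sampling_method, storage_type, storage_location) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)


def datasets_between(connection, start, end):
    """Titles of datasets whose observation window overlaps [start, end]."""
    cursor = connection.execute(
        "SELECT title FROM dataset_metadata "
        "WHERE start_time <= ? AND end_time >= ? ORDER BY start_time",
        (end, start),
    )
    return [row[0] for row in cursor]


print(datasets_between(conn, "2016-04-15 00:00:00", "2016-05-01 00:00:00"))
```

Storing times as fixed-width text in year-first order lets plain string comparison select the correct time range.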

5.1 Considerations for the data management of scientific survey cruises

(1) Coexistence of electronic and paper documents

Scientific survey cruise data for submission comprise not only electronic data documents, such as field survey data, observation data, and laboratory test data, but also paper documents, including signed documents such as data submission lists, records, and receipts. Paper documents need to be digitized for long-term preservation, while the originals are physically stored and managed. Hence, storage conditions and management methods of both electronic and paper documents must be considered [8-9].

Electronic documents rely on storage devices because they require a large amount of storage, but they are separable from those devices. However, the media that store electronic documents, such as hard disks and CD-ROMs, place specific requirements on the storage environment, such as temperature, humidity, antimagnetic protection, and antivirus protection. Moisture-proof, antimagnetic, and fireproof file cabinets can be used to keep the media stable. Although paper documents are more reliable and visually easier to access, they require stricter storage conditions and offer limited storage capacity compared with their electronic counterparts.

Understanding the basic characteristics of electronic and paper documents is essential for the data management team to formulate specific management rules and implement effective management and technical measures. Consequently, the authenticity, integrity, and effectiveness of documents can be ensured.

(2) File-type dataset

Scientific survey cruise data in electronic form exist as file-type datasets. Such a dataset takes the data file as its basic unit for storing, describing, adding, deleting, displaying, and exchanging data. The format and content of data files are set according to recording requirements and take the form of data charts, text files, pictures, and videos. For example, biochemical experiment data form a file-type dataset of Excel tables. In contrast, the status of a voyage station is recorded in photos or videos, and its route is likewise stored as a file-type dataset. In scientific data management, the granularity of these datasets is determined by the convenience of data reference and acquisition as well as the integrity and systematism of the data content.

5.2 Organization form of scientific survey cruise data in the storage medium

Storage of scientific survey cruise data should ensure data security and sufficient space to meet the increasing demand for data storage management.

In this paper, a basic method of data storage is adopted initially, before a high-end mass storage system is implemented.

The naming rule for normalized files after dataset partitioning is illustrated in Fig.3. A six-level directory is designed to categorize files for easy retrieval and application. The organization of scientific survey cruise datasets in the storage medium is shown in Fig.4.

Fig.3 Naming rule for scientific survey cruise electronic documents

Fig.4 The organization form of scientific survey cruise data in storage medium

5.3 Rule for storage order of scientific survey cruise data

The preservation work of scientific survey cruise data should abide by national archive laws and regulations and conform to the actual needs of marine undertakings to ensure that scientific-research data entities and their information are complete, accurate, systematic, and safe [10]. Archived data include electronic and paper documents.

(1) Method of electronic document management

Electronic document filing is a new file management mode that has emerged with the development of information and network technologies in recent years. Electronic documents on magnetic media can theoretically be preserved for a long time because their information is read without contact or abrasion. However, because electronic documents have existed only for a short time and their long-term storage has not been verified in practice, long-term electronic documents and their off-site backups must be copied regularly to prevent information loss. At present, electronic documents of scientific survey cruise data are stored on CD-ROMs kept in different places.

A practical example of a CD number is “C2016_01_Q2015012_AU_JC_RS1313_20170905_1/2”. The meaning of this number is illustrated in Fig.5.

Fig.5 CD number of electronic documents
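The field meanings of the CD number are defined in Fig.5; the sketch below does not encode them and only splits a number of this shape into its underscore-separated segments. Two readings are assumptions made for illustration: that the final "1/2" means volume 1 of 2, and that a segment of two letters plus four digits (such as "RS1313") is the classification code. The function name `split_archive_number` is hypothetical.

```python
import re


def split_archive_number(number):
    """Split an archive number into its segments and trailing volume part."""
    *segments, volume = number.split("_")

    # Assumption: the last part has the form "index/total", e.g. "1/2".
    match = re.fullmatch(r"(\d+)/(\d+)", volume)
    if match is None:
        raise ValueError(f"missing 'index/total' suffix in {number!r}")
    index, total = (int(group) for group in match.groups())
    if not 1 <= index <= total:
        raise ValueError(f"volume index out of range in {number!r}")

    # Assumption: two letters plus four digits is a classification code.
    codes = [s for s in segments if re.fullmatch(r"[A-Z]{2}\d{4}", s)]

    return {
        "segments": segments,
        "volume_index": index,
        "volume_total": total,
        "classification_codes": codes,
    }


parts = split_archive_number("C2016_01_Q2015012_AU_JC_RS1313_20170905_1/2")
print(parts["segments"])
print(parts["volume_index"], parts["volume_total"])  # prints 1 2
print(parts["classification_codes"])  # prints ['RS1313']
```

A check of this kind could be run when labels are generated so that malformed numbers are caught before a disc is archived.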

(2) Method of paper document management

A series of working procedures, methods, and principles for the management of paper archives, such as the document filing system, has been established through extended trial-and-error learning. Paper documents are archived and numbered according to the existing management rules for standard archives [11-12].

The archive number takes the form category–file serial number. A practical example of a file number of paper documents is “C2016_01_20170905_2/3”. The meaning of this number is illustrated in Fig.6.

Fig.6 The file box number of paper documents

This paper implements online collaborative office work for the application and approval of scientific survey cruise data by designing and implementing function modules based on a browser/server (B/S) architecture [13]. Users can submit applications for data use through the network, and data managers can conduct online approval according to established approval principles. This work not only promotes the efficiency and effectiveness of scientific survey cruise data management and application but also improves the service level and work efficiency of the data management department.

The organization and management of scientific survey cruise data can be divided into a file management stage and a data management stage. Long-term data preservation is achieved in the file management stage. In the data management stage, big data technology is gradually being introduced, with relational databases managing structured data and the file management mode handling unstructured data, so that data can be managed in a unified and efficient way. The management mode of scientific survey cruise data is still in its infancy. Advanced management concepts, such as polyglot persistence and multi-model databases, will gradually be applied to data storage and management as data volumes grow and demands change.
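The split between the two storage modes can be pictured as a simple routing step. The following sketch is illustrative only: which formats count as structured is an assumption (tabular formats mentioned in this paper are used here), and the names `STRUCTURED_EXTENSIONS`, `storage_mode`, and `plan_storage` are hypothetical.

```python
from pathlib import PurePath

# Assumption: tabular formats are loaded into the relational database;
# every other format stays under the file management mode.
STRUCTURED_EXTENSIONS = {".dbf", ".mdb", ".xls", ".xlsx"}


def storage_mode(filename):
    """Choose a storage mode from the file extension (case-insensitive)."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in STRUCTURED_EXTENSIONS:
        return "relational_database"
    return "file_management"


def plan_storage(filenames):
    """Group file names by the storage mode selected for each of them."""
    plan = {"relational_database": [], "file_management": []}
    for name in filenames:
        plan[storage_mode(name)].append(name)
    return plan


plan = plan_storage(["station01.hex", "nutrients.XLSX", "route.jpg", "stations.dbf"])
print(plan["relational_database"])  # prints ['nutrients.XLSX', 'stations.dbf']
print(plan["file_management"])      # prints ['station01.hex', 'route.jpg']
```

Keeping the rule in one lookup set makes it easy to revise as more formats are brought under database management.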

Data processing comprises a series of activities, such as data acquisition, sorting, storage, classification, ranking, retrieval, maintenance, processing, statistics, and transmission. The first seven activities, now completed, give the organization and management of scientific survey cruise data rules to follow and normalize the organizational structure for hierarchical, classified data management during data sorting. These results provide a solid foundation for the remaining three activities, including the provision of data sharing and services, such as obtaining necessary information and extracting useful information from scientific survey cruise data, which are the basis of our activities and decision-making.

Conflict of interest statement

All authors declare that there is no conflict of interest.
