Data files: general conventions


Introduction

This document describes a few common attributes of data files used by programs available on this site.

Fortran Conventions

The software is developed in Fortran and uses Fortran sequential input and output files. As a consequence, input files can be read in either free or fixed format.

Fortran free format

Fortran fixed format

Operating system issues

Text encoding

Only ASCII characters are accepted in input, and the text encoding must be compatible with ASCII. The best choice on recent systems is UTF-8, but many older encodings are also compatible, such as Mac OS Roman, Windows-1252, and Latin-1. A compatible encoding is usually obtained when saving as a plain text file from a text editor, a word processor, or a spreadsheet, and it can be verified and changed in a text editor.
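
As an alternative to checking in a text editor, the ASCII requirement can be verified with a short script. The following Python sketch is only an illustration and is not part of the software; the function name find_non_ascii and the sample content are hypothetical. It reports the position of every byte outside the ASCII range.

```python
def find_non_ascii(data: bytes):
    """Return (line, column, byte value) for every byte outside the ASCII range.

    Hypothetical helper for illustration; columns are counted in bytes.
    """
    hits = []
    # bytes.splitlines() splits on LF, CR and CRLF, so any EOL coding works.
    for line_no, line in enumerate(data.splitlines(), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 127:
                hits.append((line_no, col, byte))
    return hits


# Made-up sample: the accented letter is stored as two non-ASCII bytes in UTF-8.
sample = "'Caf\u00e9 data'\n12000.6 2999.4567 245 'label'\n".encode("utf-8")
print(find_non_ascii(sample))
```

An empty list means the file contains only ASCII characters. To check a real file, read it in binary mode first, for example with open(path, "rb").read().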

End of line

The special character used to mark the end of line (EOL) in the input file must match the convention of the system on which the software is run (Table 1). If the end of line is not recognized, the whole input file may appear to the program as a single line. Problems tend to arise when the file is transferred from one operating system to another, or when it is exported from a word processor or spreadsheet. If ftp is used between systems, transferring data files in text mode rather than binary mode should translate the end of line. It is therefore recommended to use a text editor to check and, if necessary, correct the end of line coding of data files that have been exchanged between systems or exported from a word processor or a spreadsheet.

Table 1: End of line (EOL) coding in common operating systems.

Operating system   EOL symbol   EOL description
MacOS X            LF           Line Feed
Unix               LF           Line Feed
Windows            CRLF         Carriage Return + Line Feed
MacOS Classic      CR           Carriage Return
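
When a text editor is not at hand, the end of line coding can also be inspected and converted with a script. The Python sketch below is an illustration under simple assumptions and is not part of the software; the names count_eol and convert_eol are hypothetical. It works on raw bytes so that no automatic translation of line endings takes place.

```python
def count_eol(data: bytes) -> dict:
    """Count each EOL coding of Table 1 in the raw bytes of a file."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,   # LF not preceded by CR
        "CR": data.count(b"\r") - crlf,   # CR not followed by LF
    }


def convert_eol(data: bytes, eol: bytes = b"\n") -> bytes:
    """Rewrite every EOL (CRLF, CR or LF) as the requested coding."""
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", eol)


# Made-up sample with Windows line endings, converted to Unix line endings.
windows_file = b"'Data title'\r\n12000.6 2999.4567 245 'label_of_data1'\r\n"
print(count_eol(windows_file))
unix_file = convert_eol(windows_file)
print(count_eol(unix_file))
```

A file showing more than one non-zero count mixes several EOL codings. Real files should be read and written in binary mode ("rb" and "wb") so that Python does not alter the line endings itself.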

End of file

Creating input files

Input files can be created in a text editor or exported from a word processor or spreadsheet.

Text editors

Preparing an input file with a text editor has the advantage of directly creating a plain text file. Only two issues then remain to be dealt with when saving the file:
Here are a few text editors that can be used to verify and change the end of line and text encoding of text files:

Spreadsheets

Input files that are read in free format can be exported from a spreadsheet. There are then four issues to deal with:
  1. exporting the file as tab delimited text,
  2. making sure that character strings are enclosed within quotes,
  3. checking and, if necessary, correcting the end of line coding, and
  4. making sure the end of file is below the last data line.
The last three issues are best dealt with by importing the file into a text editor.
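
When the export is scripted rather than done by hand, the same issues can be handled at the time of writing. The Python sketch below is only an illustration; the names format_field and to_tab_delimited are hypothetical. It assumes that "end of file below the last data line" means that the last data line is itself terminated by an end of line. An apostrophe inside a string is doubled, which is how Fortran free format input represents a quote within a quoted string.

```python
def format_field(value) -> str:
    """Enclose character strings in single quotes; leave numbers bare."""
    if isinstance(value, str):
        # A doubled quote stands for one literal quote inside a quoted string.
        return "'" + value.replace("'", "''") + "'"
    return repr(value)


def to_tab_delimited(rows, eol: str = "\n") -> str:
    """Build tab delimited text in which every line, including the last, ends with eol."""
    lines = ["\t".join(format_field(v) for v in row) for row in rows]
    return eol.join(lines) + eol


# Made-up rows: a title line, a column header line, and one data line.
rows = [
    ["Data title"],
    ["Data_column1", "Data_column2", "Data_column3", "Data_column4"],
    [12000.6, 2999.4567, 245, "label_of_data1"],
]
text = to_tab_delimited(rows)
print(text, end="")
```

To save the result, opening the file with encoding="ascii" and newline="" makes Python reject non-ASCII characters and write the chosen end of line unchanged.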

Word processors

Finally, input files can also be exported from a word processor. Three issues must be dealt with:
  1. exporting the file in plain text (tab delimited columns recommended),
  2. checking and, if necessary, correcting the end of line coding, and
  3. making sure the end of file is below the last data line.
Again, the last two issues are best dealt with by importing the file into a text editor.

Software Conventions

Further conventions resulting from development choices are given below.

Standard file

A file format designed to be easily exchanged with spreadsheets is called 'standard file' in this document and is used by the software wherever possible. The standard file is made of a standard header followed by standard data lines.

Header

Standard header

The standard header is made of two lines:
  1. the first line contains the title, and
  2. the second line contains the column headers.
The standard header is read in Fortran free format: the title and the column headers are character strings and must be delimited by single quotes (') to be read properly.

Non standard header

The input module dialog often allows files with non standard headers to be read after adjusting

Data line

Each data record corresponds to one line.

Standard data lines

Other data lines

In some cases it is convenient to use

Standard file example

The first three lines of an example standard file with 2 reals, 1 integer, and 1 character string per data line:
  1. 'Data title'
  2. 'Data_column1' 'Data_column2' 'Data_column3' 'Data_column4'
  3. 12000.6 2999.4567 245 'label_of_data1'
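
To check a standard file outside the software, its layout can be parsed with a short script. The Python sketch below is a simplified illustration, not the reader used by the software; the names split_fields and parse_standard_file are hypothetical. It assumes fields separated by blanks or tabs, strings delimited by single quotes, and numbers written in a form that Python's float and int accept. Other features of Fortran free format input, such as comma separators or repeat counts, are not handled.

```python
import re

# A field is either a single-quoted string (with '' for a literal quote)
# or a run of non-blank characters.
_FIELD = re.compile(r"'((?:[^']|'')*)'|(\S+)")


def split_fields(line: str):
    """Split one line into fields, removing the delimiting quotes."""
    fields = []
    for quoted, bare in _FIELD.findall(line):
        fields.append(bare if bare else quoted.replace("''", "'"))
    return fields


def parse_standard_file(text: str, converters):
    """Return (title, column headers, data rows) from standard file text."""
    lines = text.splitlines()
    title = split_fields(lines[0])[0]
    headers = split_fields(lines[1])
    rows = [
        tuple(conv(field) for conv, field in zip(converters, split_fields(line)))
        for line in lines[2:]
        if line.strip()  # ignore blank lines after the data
    ]
    return title, headers, rows


example = (
    "'Data title'\n"
    "'Data_column1' 'Data_column2' 'Data_column3' 'Data_column4'\n"
    "12000.6 2999.4567 245 'label_of_data1'\n"
)
title, headers, rows = parse_standard_file(example, (float, float, int, str))
print(title)
print(headers)
print(rows)
```

The converters argument lists one conversion per column, here two reals, one integer and one character string as in the example above.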
