Data files: general conventions

Introduction

This document describes a few common attributes of data files used by programs available on this site.

Fortran Conventions

The software is developed in Fortran and uses Fortran sequential input and output files, which implies that

data files are ASCII text files;
if prepared in software other than a pure text editor, such as a word processor or a spreadsheet, they must be saved as text only files.

Fortran sequential input files can be read either in free or fixed format.

Fortran free format

Reading rules:
- Numerical data are separated by blanks or tabs (tabs are better suited to spreadsheets)
- Character strings MUST be delimited by quotes, ', to be read properly
- If all the data to read within a line are found, the rest of the line is not read. This implies that extra information can be added AFTER the required data without affecting the input.
- If not all the data are found within a line, the missing data are sought in the next line. This implies that an incomplete line will be completed with values taken from the next line, which are then probably misinterpreted.
Possible file preparation:
- in a text editor and saved as text only
- in a spreadsheet and saved as text only (tab separated text output is most convenient)
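
The reading rules above can be previewed outside Fortran. The following Python sketch only approximates free format input and is not the Fortran runtime itself: the helper name `read_record` is hypothetical, values are returned as strings without type conversion, and the standard `shlex` module stands in for the quote handling.

```python
import shlex


def read_record(lines, start, n_items):
    """Collect n_items values, beginning at lines[start].

    Returns the values (as strings) and the index of the next unread line.
    """
    items = []
    i = start
    while len(items) < n_items and i < len(lines):
        # Split on blanks and tabs; text inside '...' stays in one piece.
        tokens = shlex.split(lines[i])
        needed = n_items - len(items)
        # Anything after the required values on this line is ignored.
        items.extend(tokens[:needed])
        i += 1
    return items, i


lines = [
    "1.5 2.5 'site A' extra note",  # complete line followed by extra information
    "3.5",                          # incomplete line
    "4.5 'site B'",                 # consumed to complete the previous line
]
first, nxt = read_record(lines, 0, 3)
second, nxt = read_record(lines, nxt, 3)
print(first)
print(second)
```

The second call illustrates the hazard of an incomplete line: it silently borrows its missing values from the following line.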

Fortran fixed format

Reading rules:
- Numerical data and character strings are read within a specific column location.
  Column here means the number of characters (or spaces) counted from the beginning of the line
- No delimiters are necessary: character strings are not delimited by quotes but by their column location
- A column offset in the data file will result in misinterpreted data
Possible file preparation:
- in a text editor and saved as text only; each character position must be counted from the beginning of each line.
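
The effect of a column offset can be illustrated with a short Python sketch. The column layout below is hypothetical and chosen only for the example; the helper `read_fixed` simply cuts each line at fixed character positions, which is the idea behind fixed format reading, and leaves the fields as strings.

```python
def read_fixed(line, fields):
    """Cut a line into fields given as (first_column, width), columns counted from 1."""
    return [line[first - 1:first - 1 + width] for first, width in fields]


# Hypothetical layout: a real in columns 1-8, an integer in 9-14, a label in 15-24.
fields = [(1, 8), (9, 6), (15, 10)]
line = " 12000.6   245label_one "

good = read_fixed(line, fields)
shifted = read_fixed(" " + line, fields)  # same data, offset by one column
print(good)
print(shifted)
```

With the one-character offset, the last digit of each number slides into the next field, so every value on the line is misread.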

Operating system issues

Text encoding

Only ASCII characters are accepted in input, and the text encoding must be compatible with ASCII. The best choice on recent systems is UTF-8, but many older encodings are also compatible, such as Mac OS Roman, Windows-1252, and Latin-1. A compatible encoding is usually obtained when saving as a plain text file from a text editor, a word processor, or a spreadsheet, and can be verified and changed in a text editor.
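
As an alternative to checking by eye, a file can be scanned for characters outside ASCII. The Python sketch below is one possible way to do this; the function name `find_non_ascii` is hypothetical, and the data are assumed to have been read as raw bytes (for example with `open(path, "rb").read()`, where `path` is your own file).

```python
def find_non_ascii(data):
    """Return (line number, column, byte value) for every byte outside ASCII."""
    problems = []
    for line_no, line in enumerate(data.splitlines(), start=1):
        for column, byte in enumerate(line, start=1):
            if byte > 127:
                problems.append((line_no, column, byte))
    return problems


# Sample data: the accented character on the second line is not ASCII.
sample = "12.5 'cafe'\n13.5 'café'\n".encode("utf-8")
print(find_non_ascii(sample))
```

An empty list means the file contains only ASCII characters. In UTF-8 an accented character occupies two or more bytes, so it is reported more than once.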

End of line

The special character used to mark the end of line (EOL) in the input file must be consistent with the system used to run the software (Table 1). If the end of line is not recognized, the whole input file may appear as a single line to the program. Problems tend to arise when the file is transferred from one operating system to another, or when the file is exported from a word processor or spreadsheet. If ftp is used between systems, transferring data files in text mode rather than binary mode should translate the end of line. It is therefore recommended to use a text editor to check, and if necessary correct, the end of line coding of data files that have been exchanged between systems or exported from a word processor or a spreadsheet.

Table 1: End of line (EOL) coding in common operating systems.
Operating system	EOL symbol	EOL description
MacOS X	LF	Line Feed
Unix	LF	Line Feed
Windows	CRLF	Carriage Return + Line Feed
MacOS Classic	CR	Carriage Return
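
When no suitable text editor is at hand, the end of line coding can also be checked and converted with a few lines of Python. This is a minimal sketch working on the raw bytes of a file; the function names `count_eols` and `convert_eols` are hypothetical.

```python
import re


def count_eols(data):
    """Count each kind of end of line found in the raw bytes of a file."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,  # carriage returns not followed by LF
        "LF": data.count(b"\n") - crlf,  # line feeds not preceded by CR
    }


def convert_eols(data, eol=b"\n"):
    """Replace every CRLF, CR, or LF by the requested end of line."""
    return re.sub(rb"\r\n|\r|\n", eol, data)


mixed = b"1 2\r\n3 4\r5 6\n"
print(count_eols(mixed))
print(convert_eols(mixed))           # Unix and MacOS X style
print(convert_eols(mixed, b"\r\n"))  # Windows style
```

A file that reports more than one kind of end of line has probably been edited on several systems and is worth converting before use.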

End of file

Always terminate the file with an empty line (extra line with no space). This avoids putting the end of file (EOF) tag in a data line. In some systems such a situation can result in the last data line not being read.
Always check that the number of data read by the program is exactly the expected number. If one datum is missing, the last data line was most likely not read for the reason given above.
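
Both checks can be sketched in Python. The function names below are hypothetical, the file is handled as raw bytes, and `n_header` is an assumed parameter giving the number of header lines to leave out of the count.

```python
def ensure_final_eol(data, eol=b"\n"):
    """Append an end of line if the last line of the file lacks one."""
    if data and not data.endswith((b"\n", b"\r")):
        return data + eol
    return data


def count_data_lines(data, n_header=0):
    """Count the non-blank lines that follow the header lines."""
    lines = data.splitlines()[n_header:]
    return sum(1 for line in lines if line.strip())


unterminated = b"1 2\n3 4"  # the end of file sits on the last data line
fixed = ensure_final_eol(unterminated)
print(fixed)
print(count_data_lines(fixed))
```

The count returned by `count_data_lines` can then be compared with the number of data reported by the program.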

Creating input files

Input files can be created in a text editor or exported from a word processor or spreadsheet.

Text editors

Preparing an input file with a text editor has the advantage of directly creating a plain text file. Only two issues then remain when saving the file:

checking and, if necessary, correcting end of line coding, and
making sure the end of file is below the last data line.

Here are a few text editors that can verify and change the end of line coding and text encoding of text files:

under MacOS:
- TextWrangler,
- BBEdit,
- Smultron, or
- Plain Text Editor.
under Windows:
- ConTEXT.

Spreadsheets

Input files that are read in free format can be exported from a spreadsheet. There are then four issues to deal with:

exporting the file as tab delimited text,
making sure that character strings are enclosed within quotes,
checking and, if necessary, correcting end of line coding, and
making sure the end of file is below the last data line.

The last three issues are best dealt with by importing the file into a text editor.
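
Adding the quotes by hand is tedious for long files, so it may be worth scripting. The Python sketch below quotes every field of a tab delimited line that does not look like a number. It rests on assumptions: the function name `quote_strings` is hypothetical, and Python's `float()` is used as the test for "numeric", which does not match Fortran's rules in every case (for instance, `nan` is accepted and a `D` exponent is not).

```python
def quote_strings(line):
    """Quote the non-numeric fields of one tab delimited line."""
    out = []
    for field in line.split("\t"):
        try:
            float(field)           # numeric fields are kept unchanged
            out.append(field)
        except ValueError:
            if len(field) >= 2 and field[0] == "'" and field[-1] == "'":
                out.append(field)  # already quoted
            else:
                # A quote inside a Fortran character string is written twice.
                out.append("'" + field.replace("'", "''") + "'")
    return "\t".join(out)


print(quote_strings("12000.6\t245\tlabel one"))
```

Applied to each line of the exported file, this leaves the numbers untouched and makes the labels readable in free format.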

Word processors

Finally, input files can also be exported from a word processor. Three issues must be dealt with:

exporting the file in plain text (tab delimited columns recommended),
checking and, if necessary, correcting end of line coding, and
making sure the end of file is below the last data line.

Again, the last two issues are best dealt with by importing the file into a text editor.

Software Conventions

Further conventions resulting from development choices are given below.

Standard file

A file format designed to be easily exchanged with spreadsheets is called 'standard file' in this document and is used as much as possible by the software. The standard file is made of a standard header followed by standard data lines.

Header

Standard header

The standard header is made of two lines:

the first line contains the title, and
the second line contains the column headers.

The standard header is read in Fortran free format: title and column headers are character strings and must be delimited by single quotes (') to be read properly.

Non standard header

The input module dialog often allows files with non standard headers to be read after adjusting

the number of header lines to skip,
the number of the line to read title from (0 if none), and
the number of the line to read column headers from (0 if none).
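
The role of these three settings can be sketched in Python. This is an illustration only, not the software's input module: the function name `read_header` is hypothetical, line numbers are assumed to be counted from 1 at the top of the file, and the title is taken as the whole line.

```python
import shlex


def read_header(lines, n_skip, title_line=0, columns_line=0):
    """Return (title, column headers, data lines) for a non standard header.

    A line number of 0 means that the item is absent from the file.
    """
    title = None
    columns = None
    if title_line > 0:
        title = lines[title_line - 1].strip()
    if columns_line > 0:
        # Split on blanks and tabs; quoted headers stay in one piece.
        columns = shlex.split(lines[columns_line - 1])
    return title, columns, lines[n_skip:]


lines = [
    "exported from a spreadsheet",
    "Survey A",
    "",
    "x y label",
    "1.0 2.0 'p1'",
]
title, columns, data = read_header(lines, n_skip=4, title_line=2, columns_line=4)
print(title, columns, data)
```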

Data line

Each datum corresponds to one line.

Standard data lines

Standard data lines are read in Fortran free format.

Other data lines

In some cases it is convenient to use

Fortran fixed format: exact column location is required and there are no delimiters.

Standard file example

First three lines of a standard file example with 2 reals, 1 integer, and 1 character string per datum:

'Data title'
'Data_column1' 'Data_column2' 'Data_column3' 'Data_column4'
12000.6 2999.4567 245 'label_of_data1'
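
A file of this kind can also be produced by a script rather than by hand. The Python sketch below rebuilds the three lines above; the function names `format_value` and `standard_file` are hypothetical, blanks are used as separators to match the example, and every line, including the last, is terminated by an end of line.

```python
def format_value(value):
    """Format one value for free format reading; strings get single quotes."""
    if isinstance(value, str):
        # A quote inside a Fortran character string is written twice.
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def standard_file(title, columns, rows, sep=" ", eol="\n"):
    """Build the text of a standard file: title, column headers, data lines."""
    lines = [format_value(title), sep.join(format_value(c) for c in columns)]
    for row in rows:
        lines.append(sep.join(format_value(v) for v in row))
    return eol.join(lines) + eol


text = standard_file(
    "Data title",
    ["Data_column1", "Data_column2", "Data_column3", "Data_column4"],
    [(12000.6, 2999.4567, 245, "label_of_data1")],
)
print(text, end="")
```

Passing `sep="\t"` gives tab separated columns, which are more convenient if the file is to be opened in a spreadsheet.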
