Data files: general conventions
Introduction
This document describes a few common attributes of data files used by
programs available on this site.
Fortran Conventions
The software is developed in Fortran and
uses Fortran sequential input and output files.
Sequential input files can be read in either free or fixed format.
Fortran free format
- Reading rules:
- Numerical data are separated by blanks or tabs (tabs are better suited to spreadsheets)
- Character strings MUST be delimited by single quotes, ', to be read properly
- If all the data to be read are found within a line, the rest of the
line is not read. This implies that extra information can be added AFTER the required
data without affecting the input.
- If not all the data are found within a line, the missing data will
be sought in the next line. This implies that an incomplete line will be completed
with data from the next line, which will probably be misinterpreted.
- Possible file preparation:
- in a text editor and saved as text only
- in a spreadsheet and saved as text only
(tab separated text output is most convenient)
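The free format reading rules can be imitated outside Fortran to check a file before running a program. The following Python sketch is only an approximation of a Fortran free format read: the function name `read_free_format` and the sample lines are hypothetical, and `shlex.split` is used as a stand-in for the Fortran tokenizer (it splits on blanks and tabs and keeps quoted strings whole).

```python
import shlex

def read_free_format(lines, n_items):
    """Collect n_items values, roughly as a Fortran free format read would.

    Returns the values (as strings) and the number of lines consumed.
    """
    values = []
    consumed = 0
    for line in lines:
        consumed += 1
        # Split on blanks and tabs; 'quoted strings' stay in one piece.
        for token in shlex.split(line):
            values.append(token)
            if len(values) == n_items:
                # All requested data found: the rest of the line is ignored.
                return values, consumed
    raise ValueError("end of input reached before all items were found")

# Hypothetical sample lines.
lines = [
    "12000.6\t2999.4567 245 'label of data1'  extra comment",
    "13000.1 3100.25",        # incomplete line
    "250 'label of data2'",   # completes the previous line
]

first, used = read_free_format(lines, 4)
second, used_again = read_free_format(lines[used:], 4)
print(first, used)
print(second, used_again)
```

The first read ignores the trailing comment, while the second read silently takes its two missing values from the following line, which is the behaviour the rules above warn about.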
Fortran fixed format
- Reading rules:
- Numerical data and character strings are read within a specific column location.
Column here means the number of characters (or spaces) counted from the beginning of the line
- No delimiters are necessary: character strings are not delimited by quotes but by their column location
- A column offset in the data file will result in misinterpreted data
- Possible file preparation:
- in a text editor and saved as text only;
each character position must be counted from the beginning of each line.
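Reading by column location can be sketched in Python with string slicing. The column layout below is purely hypothetical (it is not the layout of any particular program on this site), and `read_fixed_format` is an illustrative helper, not part of the software.

```python
def read_fixed_format(line, fields):
    """Slice one line into fields given as (first_col, last_col, converter).

    Columns are counted from 1 at the beginning of the line, last_col inclusive.
    """
    return [conv(line[first - 1:last].strip()) for first, last, conv in fields]

# Hypothetical layout: reals in columns 1-10 and 11-20,
# an integer in columns 21-25, a character string in columns 26-40.
LAYOUT = [(1, 10, float), (11, 20, float), (21, 25, int), (26, 40, str)]

# Build a line that respects the layout exactly.
good = f"{12000.6:10.1f}{2999.4567:10.4f}{245:5d}{'label of data1':<15}"
record = read_fixed_format(good, LAYOUT)

# A one-column offset: the first real loses its last digit without any warning.
shifted = read_fixed_format(" " + good, LAYOUT[:1])

print(record)
print(shifted)
```

Note that the character string contains blanks and needs no quotes, since only its column location delimits it.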
Operating system issues
Text encoding
Only ASCII characters are accepted in input,
and the text encoding must be compatible with it.
The best choice on recent systems is UTF-8,
but many older encodings are also compatible,
such as Mac OS Roman, Windows-1252, and Latin-1.
Compatible encoding is usually achieved when saving as a plain text file from
a text editor,
a word processor,
or a spreadsheet,
and can be verified and modified in a text editor.
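When no suitable text editor is at hand, the ASCII requirement can also be checked with a short script. This is a minimal sketch, assuming the file content has been read as raw bytes; `find_non_ascii` and the sample content are hypothetical.

```python
def find_non_ascii(data):
    """Return (line number, byte column, byte value) for each non-ASCII byte."""
    hits = []
    for line_no, line in enumerate(data.splitlines(), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 127:  # ASCII covers byte values 0 to 127 only
                hits.append((line_no, col, byte))
    return hits

# Hypothetical content with one Latin-1 accented character (byte 0xE9).
sample = b"'Data title'\n12000.6 'caf\xe9'\n"
problems = find_non_ascii(sample)
print(problems)
```

For a real file, the bytes would be obtained with `open(path, "rb").read()`; an empty result means the file contains ASCII characters only.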
End of line
The special character used to mark the end of line (EOL) in the input file
must be consistent with the system used to run the software
(Table 1).
If the end of line is not recognized,
the whole input file may appear as a single line to the program.
Problems tend to arise when the file is transferred
from one operating system to another,
or when the file is exported from
a word processor
or spreadsheet.
If ftp is used between systems, transferring data files in text mode
rather than binary mode should translate the end of line.
It is therefore recommended to use a text editor
to check, and if necessary correct, the end of line coding of data files
that have been exchanged between systems
or that have been exported from
a word processor
or a spreadsheet.
Table 1: End of line (EOL) coding in common operating systems.
Operating system | EOL symbol | EOL description |
MacOS X | LF | Line Feed |
Unix | LF | Line Feed |
Windows | CRLF | Carriage Return + Line Feed |
MacOS Classic | CR | Carriage Return |
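The end of line coding of a file can also be inspected and converted with a few lines of Python. This is a sketch operating on raw bytes; the helper names `count_eol` and `convert_eol` and the sample content are hypothetical.

```python
def count_eol(data):
    """Count each kind of end of line marker found in the bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # LF not preceded by CR
        "CR": data.count(b"\r") - crlf,  # CR not followed by LF
    }

def convert_eol(data, eol=b"\n"):
    """Rewrite every end of line marker as the requested one."""
    # Unify to LF first (CRLF before lone CR), then apply the target coding.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", eol)

# Hypothetical content mixing Windows and MacOS Classic line ends.
sample = b"line 1\r\nline 2\r\nline 3\r"
counts = count_eol(sample)
unix_version = convert_eol(sample, b"\n")
print(counts)
print(unix_version)
```

A file showing more than one non-zero count has mixed line ends and is a likely candidate for the problems described above.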
End of file
- Always terminate the file with an empty line (extra line with no space).
This avoids putting the end of file (EOF) tag in a data line.
In some systems such a situation can result in the last data line not being read.
- Always check that the number of data read by the program
is exactly the expected number of data.
If one datum is missing, the situation described above is the most likely cause.
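Both checks can be approximated before running a program. The sketch below assumes a file with two header lines followed by one datum per line; the helper names and the sample content are hypothetical.

```python
def ends_with_eol(data):
    """True if the last byte of the file is an end of line marker."""
    return data.endswith((b"\n", b"\r"))

def ensure_final_eol(data, eol=b"\n"):
    """Append an end of line marker if the last line lacks one."""
    return data if ends_with_eol(data) else data + eol

def count_data_lines(data, n_header=2):
    """Count the non-blank lines that follow the header lines."""
    return sum(1 for line in data.splitlines()[n_header:] if line.strip())

# Hypothetical file whose last data line is not terminated.
sample = b"'Data title'\n'c1' 'c2'\n1.0 2.0\n3.0 4.0"
was_terminated = ends_with_eol(sample)
fixed = ensure_final_eol(sample)
n_data = count_data_lines(fixed)
print(was_terminated, n_data)
```

Comparing `n_data` with the number of data reported by the program shows whether the last line was dropped.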
Creating input files
Input files can be created in a text editor or exported from a word processor or spreadsheet.
Text editors
Preparing an input file with a text editor has the advantage
of directly creating a plain text file.
Only two issues then remain when saving the file: text encoding and end of line coding.
Here are a few text editors that can verify
and change the end of line coding and text encoding of text files:
- under MacOS:
- under Windows:
Spreadsheets
Input files that are read in
free format
can be exported from a spreadsheet.
There are then four issues to deal with:
- exporting the file as tab delimited text,
- making sure that character strings are enclosed within quotes,
- checking and, if necessary, correcting
end of line coding, and
- making sure the
end of file
is below the last data line.
The last three issues are best dealt with by importing the file
into a text editor.
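When the data are available to a script rather than a spreadsheet, the four issues can be handled in one step at export time. The following sketch uses Python's standard `csv` module; the function name `export_free_format` and the sample rows are hypothetical, and LF is assumed as the end of line coding of the target system.

```python
import csv
import io

def export_free_format(rows, eol="\n"):
    """Write rows as tab delimited text with strings in single quotes."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter="\t",                 # tab delimited text
        quotechar="'",                  # strings enclosed in single quotes
        quoting=csv.QUOTE_NONNUMERIC,   # quote strings, leave numbers bare
        lineterminator=eol,             # explicit end of line coding
    )
    writer.writerows(rows)
    return buffer.getvalue()            # the last line is terminated too

# Hypothetical rows: title, column headers, one data line.
rows = [
    ["Data title"],
    ["Data_column1", "Data_column2", "Data_column3", "Data_column4"],
    [12000.6, 2999.4567, 245, "label_of_data1"],
]
text = export_free_format(rows)
print(text)
```

To save the result, the file should be opened with `newline=""` so that Python does not translate the chosen end of line coding a second time.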
Word processors
Finally, input files can also be exported from a word processor.
Three issues must be dealt with:
- exporting the file in plain text (tab delimited columns recommended),
- checking and, if necessary, correcting
end of line coding, and
- making sure the
end of file
is below the last data line.
Again, the last two issues are best dealt with by importing the file
into a text editor.
Software Conventions
Further conventions resulting from development choices are given below.
Standard file
A file format designed to be easily exchanged with
spreadsheets
is called 'standard file' in this document
and used as much as possible by the software.
The standard file is made of a standard header
followed by standard data lines.
Header
Standard header
The standard header is made of two lines:
- the first line contains the title, and
- the second line contains the column headers.
The standard header is read in Fortran free format:
title and column headers are character strings and must be delimited
by single quotes ', so as to be read properly.
Non standard header
The input module dialog often allows files with non standard headers to be read
after adjusting
- the number of header lines to skip,
- the number of the line to read title from (0 if none), and
- the number of the line to read column headers from (0 if none).
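The role of the three settings can be illustrated with a small Python sketch. It is not the input module itself: `read_header` is a hypothetical helper, line numbers are assumed to count from 1 at the top of the file, the number of lines to skip is assumed to include the title and column header lines, and `shlex.split` stands in for the Fortran free format read.

```python
import shlex

def read_header(lines, n_skip, title_line=0, columns_line=0):
    """Return (title, column headers, data lines); 0 means 'none'."""
    title = shlex.split(lines[title_line - 1])[0] if title_line else None
    columns = shlex.split(lines[columns_line - 1]) if columns_line else None
    return title, columns, lines[n_skip:]

# Hypothetical file with four header lines, two of them free text.
lines = [
    "# exported from a spreadsheet",
    "'Data title'",
    "# units: m m - -",
    "'x' 'y' 'n' 'label'",
    "1.0 2.0 3 'a'",
]
title, columns, data = read_header(lines, n_skip=4, title_line=2, columns_line=4)
no_title, no_columns, same_data = read_header(lines, n_skip=4)
print(title, columns, data)
```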
Data line
Each datum corresponds to one line.
Standard data lines
Other data lines
In some cases it is convenient to use
Standard file example
First three lines of an example standard file
with 2 reals, 1 integer, and 1 character string per datum:
- 'Data title'
- 'Data_column1' 'Data_column2' 'Data_column3' 'Data_column4'
- 12000.6 2999.4567 245 'label_of_data1'
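As a cross-check, the example above can be read back with a few lines of Python. This is a sketch only: `read_standard_file` is a hypothetical helper, the list of converters (two reals, one integer, one character string) is supplied by the caller, and `shlex.split` approximates the Fortran free format read.

```python
import shlex

EXAMPLE = """'Data title'
'Data_column1' 'Data_column2' 'Data_column3' 'Data_column4'
12000.6 2999.4567 245 'label_of_data1'
"""

def read_standard_file(text, converters):
    """Read a standard file: title line, column header line, data lines."""
    lines = text.splitlines()
    title = shlex.split(lines[0])[0]
    columns = shlex.split(lines[1])
    records = []
    for line in lines[2:]:
        if not line.strip():
            continue  # skip blank lines such as the terminating one
        # Keep only the expected items; anything after them is ignored.
        tokens = shlex.split(line)[:len(converters)]
        records.append([conv(tok) for conv, tok in zip(converters, tokens)])
    return title, columns, records

title, columns, records = read_standard_file(EXAMPLE, [float, float, int, str])
print(title)
print(columns)
print(records)
```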