By Edward Sternin, Brock University
This is an introductory notebook to various ways of plotting and analyzing experimental data, as appropriate to second-year Physics majors.  Feel free to choose your own preferred package, but mastering one of the tools in this selection is strongly recommended, as the demands on the software will become more significant in later years of the program, and so other packages (notably, excel) will not have sufficient capabilities.  Choose a package that has "long legs".
A large number of software packages exist, and personal preferences play a great role in selecting which one will end up as your favourite go-to tool. Here we will restrict ourselves to a comparison of a few packages. The criteria for selecting them are simple:
extrema (or its predecessor, physica), gnuplot, and octave (a free alternative to Matlab). All of these are inherently interactive programs, and so they are normally used through a command-line interface (CLI) or their own native Graphics User Interface (GUI).  In this notebook, we will emulate that behaviour by using scripts, which is actually an advanced use mode for most programs. To get a true feel for the programs, please attempt to use their native interfaces, guided by the sample command selections below.
From a wide field of significant competitors not included here are Maple (a commercial package) and grace (a.k.a xmgrace), and Mac-world's Igor. These are reasonable alternatives so if you have significant skills in their scripting, they will be appropriate to continue to use, though they do not satisfy all of the selection criteria.
To begin with, configure jupyter's work directory and import some libraries that enable us to display graphics files in this notebook.
%%bash
WORKDIR=~/5P10/Lab2
# if the directory exists, remove it and all its contents
if [ -d $WORKDIR ]; then
  rm -rf $WORKDIR
fi
# and re-create it as a blank work directory
mkdir -p $WORKDIR
# now tell jupyter where to work
%cd ~/5P10/Lab2
%pwd
/home/esternin/5P10/Lab2
'/home/esternin/5P10/Lab2'
# import necessary libraries
from IPython.display import display
from PIL import Image
The two basic ways are: direct entry, where you embed the data in the script, and external file read. The former has the advantage of portability (one file to keep track of for any project), the latter has the advantage of compactness (a small script acting on a series of potentially large data files).
All three programs allow direct entry of data, with minor syntactic differences:
x=[1:8]
y=[0.05;0.10;0.14;0.19;0.25;0.30;0.34;0.40]
dy=[0.02;0.07;0.01;0.04;0.05;0.10;0.02;0.04]
set xlabel `Time, s'
set ylabel `Distance, m'
set pchar -12
graph x,y,dy
\$DATA << EOD
1 0.05 0.02
2 0.10 0.07
3 0.14 0.01
4 0.19 0.04
5 0.25 0.05
6 0.30 0.10
7 0.34 0.02
8 0.40 0.04
EOD
set xlabel 'Time, s'
set ylabel 'Distance, m'
plot \$DATA with errorbars pt 6
x=linspace(1,8,8);
y=[0.05 0.10 0.14 0.19 0.25 0.30 0.34 0.40];
dy=[0.02 0.07 0.01 0.04 0.05 0.10 0.02 0.04];
plot(x,y,'go');
xlabel("Time, s");
ylabel("Distance, m");
To demonstrate the plots within this notebook, we need to communicate to the external program, like extrema, or load on-the-fly a kernel magic that will enable execution of an external program directly embedded in this python-based notebook.
We will default all of the settings that control the appearance of the graphs, for simplicity. All of these programs can produce publication-quality plots through a series of adjustments to their appearance. This will be addressed in the following sections.
An experimental eXtrema kernel exists, but it is not yet installed system-wide, so each user must install it into their own file space. A separate jupyter notebook shows how to do it. That notebook needs to be run only once, and then this notebook's kernel needs to be restarted, to read in the newly added eXtrema kernel. If you have already done that at some point, the eXtrema kernel should included in the drop-down list of kernels under the Kernel menu above.
ls -l /work/extrema_kernel/
total 16
-rw-r--r-- 1 esternin LinuxEmployees 5406 Sep 12 10:17 extrema_kernel.py
-rw-r--r-- 1 esternin LinuxEmployees  234 Sep 12 10:17 kernel.json
drwxr-xr-x 2 esternin LinuxEmployees 4096 Oct 15 14:03 __pycache__/
import sys
sys.path.append('/work/extrema_kernel')
import extrema_kernel
Once extrema kernel is available in the drop-down menu of kernels, you do not need to repeat the above and the line magic %eXtrema and cell magic %%eXtrema should be available. Note that this kernel uses a simplified batch-mode version of extrema which may not have some commands implemented yet. The kernel returns an image file which we can then show within this notebook.
%%eXtrema
x=[1:8]
y=[0.05;0.10;0.14;0.19;0.25;0.30;0.34;0.40]
dy=[0.02;0.07;0.01;0.04;0.05;0.10;0.02;0.04]
set xlabel `Time, s'
set ylabel `Distance, m'
set plotsymbol -17
graph x,y,dy
Kernel magic for gnuplot exists:
# This loads the gnuplot kernel extension, for embedded gnuplot use
%load_ext gnuplot_kernel
%gnuplot inline pngcairo size 640,480 font "Palatino,16"
%%gnuplot
$DATA << EOD
1 0.05 0.02
2 0.10 0.07
3 0.14 0.01
4 0.19 0.04
5 0.25 0.05
6 0.30 0.10
7 0.34 0.02
8 0.40 0.04
EOD
set xlabel 'Time, s'
set ylabel 'Distance, m'
plot $DATA with errorbars pt 7
$DATA << EOD
Kernel magic for octave exists:
# This loads the octave kernel extension, for embedded octave use
%load_ext oct2py.ipython
%%octave
x=linspace(1,8,8);
y=[0.05 0.10 0.14 0.19 0.25 0.30 0.34 0.40];
dy=[0.02 0.07 0.01 0.04 0.05 0.10 0.02 0.04];
errorbar(x,y,dy,'bo');
xlabel("Time, s");
ylabel("Distance, m");
legend("data");
With only minor changes the same scripts can be applied to external data files. Scripts below all accomplish the same basic read from a data file, a plot, followed by a number of adjustments of settings that approach the publication-quality standards of appearance. The output is not shown, but saved into into a vector-graphics file, ready to be uploaded/included into overleaf. Encapsulated PostScript (.eps) or SVG (.svg) vector-graphics formats work best, as they scale without introducing "jaggies" and always display at the resolution of the device (typically, at about 100dpi on a monitor screen, and at 600dpi on paper).
As of 2019, overleaf has support for SVG inclusion (\usepackage{svg} and \includesvg{image.svg}).
First, let's create the file of data to be plotted by each of the programs in turn:
%%file exp1.dat
1 0.05 0.02
2 0.10 0.07
3 0.14 0.01
4 0.19 0.04
5 0.25 0.05
6 0.30 0.10
7 0.34 0.02
8 0.40 0.04
Writing exp1.dat
If you installed the experimental jupyter eXtrema kernel that introduces cell-level magic %%eXtrema, you can run eXtrema commands directly within this notebook.  However, some functionality is still missing, so a better way at the moment is to generate a macro file and then execute it from within eXtrema invoked outside of jupyter. This uses a full interactive version of eXtrema, so at the end when it processes your basic.pcm script that ends with a quit choose Yes when it offers to terminate the run. The resulting .eps file can be seen with a PDF viewer (e.g. evince) and is ready to be uploaded to overleaf.
%%file basic.pcm
clear
defaults
read exp1.dat x,y,dy
set  xlabel `Time, s'
set  ylabel `Distance, m'
set  yleadz 1
set  %xlaxis 16
set  legendon 1
set  legendframeon 0
set  %legendframe 18 65 50 85
set  legendsymbols 3
set  legendentrylineon 0
set  %textheight 3
set  plotsymbol -17
set  %plotsymbolsize 1.7
set  curvecolor red
set  box 1
scales 0 9 9 0 0.45 9
graph `data' x,y
scalar\vary a,b
fit\e2 y=a*x^2+b
generate xx x[1],,x[#] 1000
set plotsymbol 0
set  legendentrylineon 1
graph\overlay rchar(a,`y=%6.4ft<^>2<_>+')//rchar(b,`%6.4f') xx,a*xx^2+b
replot
hardcopy\POSTSCRIPT extrema.eps
hardcopy\PNG basic.png
quit
Writing basic.pcm
%%bash
extrema --script basic.pcm
ls -la extrema.eps basic.png
-rw-r--r-- 1 esternin LinuxEmployees 37470 Oct 17 16:22 basic.png -rw-r--r-- 1 esternin LinuxEmployees 39103 Oct 17 16:22 extrema.eps
Data importation is actually difficult to achieve in gnuplot because it's primarily a plotting program, and data manipulation is not really its strength. On the other hand, plotting data from external ASCII data files is particularly easy, as this is precisely what gnuplot was originally designed for.
Using the same data file as above:
%gnuplot postscript eps colour size 6.4,4.8 font "Palatino,32" 
%%gnuplot
reset
set output 'gnuplot.eps'
set encoding utf8
set style data points
set dummy x, y
f(x) = a*x**2+b
# gnuplot's fitting variables MUST be initialized to non-zero values!
a=1.0
b=1.0
fit f(x) 'exp1.dat' via a,b
set xlabel 'Time, s'
set ylabel 'Distance, m'
set xrange [0.5:8.5]
set yrange [0:0.5]
plot 'exp1.dat' with errorbars pt 7 ps 2 title "data from exp1.dat",\
     f(x) title sprintf("best fit: %.4f t^2 + %.4f",a,b)
fit f(x) 'exp1.dat' via a,b
iter      chisq       delta/lim  lambda   a             b            
   0 9.0568363000e+03   0.00e+00  2.34e+01    1.000000e+00   1.000000e+00
   1 3.3906186412e+01  -2.66e+07  2.34e+00    4.446168e-02   9.725188e-01
   2 1.0093949569e+00  -3.26e+06  2.34e-01   -7.587122e-03   6.416660e-01
   3 4.6402022173e-03  -2.17e+07  2.34e-02    5.086764e-03   9.528535e-02
   4 4.3637955261e-03  -6.33e+03  2.34e-03    5.301084e-03   8.607298e-02
   5 4.3637955182e-03  -1.80e-04  2.34e-04    5.301120e-03   8.607143e-02
iter      chisq       delta/lim  lambda   a             b            
After 5 iterations the fit converged.
final sum of squares of residuals : 0.0043638
rel. change during last iteration : -1.80068e-09
degrees of freedom    (FIT_NDF)                        : 6
rms of residuals      (FIT_STDFIT) = sqrt(WSSR/ndf)    : 0.0269685
variance of residuals (reduced chisquare) = WSSR/ndf   : 0.000727299
Final set of parameters            Asymptotic Standard Error
=======================            ==========================
a               = 0.00530112       +/- 0.0004514    (8.514%)
b               = 0.0860714        +/- 0.01495      (17.36%)
correlation matrix of the fit parameters:
                a      b      
a               1.000 
b              -0.770  1.000 
The use of comments makes the following script self-explanatory, hopefully. octave kernel actually will return a PNG graphic that will show up when you execute the script, in addition to writing out the EPS file.
%%octave
### read in the data from a file, 3 columns
[x,y,dy]=textread('exp1.dat',"%f %f %f");
### this is needed if the graph is to contain multiple plots
hold on
g=errorbar(x,y,dy,'go');
### refine: some properties belong to the plot, identified by its "handle" g
set(g, 'MarkerSize', 10, 'MarkerFaceColor', [.3 1 .3], 'MarkerEdgeColor', [0 .5 0]);
### some others belong to the Axes of all plots, gca = Get Current Axes handle
set(gca, "linewidth", 4, "fontsize", 18, "box", "on");
### fit to n-degree polynomial yields a vector of n+1 coefficients of fit
### a logical vector indicates whether to skip some terms, e.g. to n=2, skipping the linear term
cf = polyfit(x,y,[true,false,true]);
### which can be used to generate a polynomial function that can be plotted
#plot(V,polyval(cf,V),'r-');
### except there aren't enough points to define a smooth curve, 
### so apply the same fit coefficients to a much finer-grid vector
xx = linspace(min(x),max(x),1000);
plot(xx,polyval(cf,xx),'r-');
xlabel("Time, s");
ylabel("Distance, m");
legend({"data",sprintf("fit %.3dt^2+%.3d",cf(1),cf(3))},"location","northwest");
legend boxoff;
### in Octave v<4.2.2 there was a bug in the legend() command for errorbar() entries
### One could use title() as a workaround 
#title(sprintf("Best fit %.3dt^2+%.3d",cf(1),cf(3)));
hold off
print -deps -color octave.eps
Use an external PDF viewer to examine and compare the three EPS files that got created, physica.eps, gnuplot.eps and octave.eps.
%%bash
ls -la extrema.eps gnuplot.eps octave.eps
-rw-r--r-- 1 esternin LinuxEmployees 39103 Oct 17 16:22 extrema.eps -rw-r--r-- 1 esternin LinuxEmployees 26688 Oct 17 16:22 gnuplot.eps -rw-r--r-- 1 esternin LinuxEmployees 36770 Oct 17 16:22 octave.eps
This is an obvious question to ask, since we are already using a python-based jupyter notebook to run this discussion and comparison. python is a powerful programming environment, with a large number of packages available, of varying quality. python is fully capable of generating simple graphs. However, in my experience, extending this to publication-quality outputs is more difficult in python than in almost any other environment.
More importantly, its object-oriented style of coding is difficult to master on a casual basis, and is easily forgotten, if used rarely. As an illustration, I find that this
   data=numpy.loadtxt('exp1.dat')
   x = data[:,0]
   y = data[:,1]
   dy= data[:,2]
   def f(x, a, b): return a*x**2 + b
   scipy.optimize.curve_fit(f, x, y, p0=[1, 0])
a lot less readable than this (for gnuplot)
f(x) = a*x**2+b a=1 b=1 fit f(x) 'exp1.dat' via a,bor this (for physica)
read exp1.dat x,y,dy scalar\vary a,b fit y=a*x^2+b
In other courses of the Physics program, python may prove more suitable, especially if specific-purpose libraries are available to help, but for the everyday scientific tasks of analyzing and plotting data, procedurally-based languages offer an easier way to perform them.
For an interesting, if somewhat discouraging read, look at https://realpython.com/python-matplotlib-guide/ just to see how idiosyncratic python's interface can be, from stucture to terminology. Without comment, below is a brief example of a properly-syntaxed "simple" plot in python.
# multiple libraries for non-core functions need to be imported: 
#   plotting, numerical functions, least squares fits
import matplotlib.pyplot as plt
import numpy as np
from scipy import optimize
# inline declaration of x,y,dy values
#x=np.linspace(1,8,num=8)
#y=[0.05,0.10,0.14,0.19,0.25,0.30,0.34,0.40]
#dy=[0.02,0.07,0.01,0.04,0.05,0.10,0.02,0.04]
# reading data from a file
data=np.loadtxt('exp1.dat')
x = data[:,0]
y = data[:,1]
dy= data[:,2]
for i in range(len(x)): print('{:.2f} {:.2f} {:.2f}'.format(x[i], y[i], dy[i]))
fig, ax = plt.subplots(figsize=(6,4))
ax.set_ylabel('Distance, m')
ax.set_xlabel('Time, s')
ax.errorbar(x=x, y=y, yerr=dy, fmt='ro', label="data")
#define the fit equation
def f(x, a, b): return a*x**2 + b
#use it to perform the best fit to the data
params, params_covariance = optimize.curve_fit(f, x, y, p0=[1, 0])
# add the smooth curve on the data, using a much finer point grid across the same range
xx=np.linspace(x[0],x[-1],num=1000)
ax.plot(xx, f(xx, params[0], params[1]), label='Fit: {:.4f}$t^2$+{:.4f}'.format(params[0], params[1]))
ax.legend(loc='best')
plt.savefig('python.eps', format='eps', dpi=600)
1.00 0.05 0.02 2.00 0.10 0.07 3.00 0.14 0.01 4.00 0.19 0.04 5.00 0.25 0.05 6.00 0.30 0.10 7.00 0.34 0.02 8.00 0.40 0.04
The PostScript backend does not support transparency; partially transparent artists will be rendered opaque. The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.
%%bash
ls -sh gnuplot.eps octave.eps extrema.eps python.eps
file extrema.eps gnuplot.eps octave.eps python.eps
40K extrema.eps 28K gnuplot.eps 36K octave.eps 24K python.eps extrema.eps: PostScript document text conforming DSC level 3.0, type EPS, Level 2 gnuplot.eps: PostScript document text conforming DSC level 2.0, type EPS octave.eps: PostScript document text conforming DSC level 2.0, type EPS python.eps: PostScript document text conforming DSC level 3.0, type EPS
The output files are shown below as bitmap images, at the resolution of the web interface of jupyter, just for illustration. Their bitmap renderings may not do them justice. The images themselves are vector graphics and will scale properly, without "jaggies", when included in a LaTeX/overleaf document.
img1 = Image.open("gnuplot.eps")
img1.load(scale=2)
img2 = Image.open("octave.eps")
img2.load(scale=1)
img3 = Image.open("extrema.eps")
img3.load(scale=1)
#img3 = img.rotate(90, expand=-1)
img4 = Image.open("python.eps")
img4.load(scale=2)
display(img1,img2,img3,img4)