This page collects several scripts for creating publication-quality charts. They were developed mostly on Ubuntu and should run with Python 3.x. I wrote them for specific needs in my research, combining various tips and suggestions found on the internet. The prerequisite libraries matplotlib, pandas and numpy must be installed. Reading xlsx files additionally requires openpyxl, although most input files are in csv format. The input files can be downloaded for practice. The scripts include tweaks that add specific visual formatting or handle certain types of data. To change colors, I recommend taking inspiration from the xkcd color scheme or Coolors. Saving the output as svg is also highly recommended, since svg files can be further edited in tools such as Inkscape to obtain high-quality final images. Full or partial screenshots of the csv and xlsx data are provided for reference.
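As a small illustration of the color and svg advice, the sketch below plots a few made-up points with an xkcd color name and writes the figure as svg. It assumes that keeping labels editable in Inkscape is wanted, which is why it sets the `svg.fonttype` option to `none`; the data values and the in-memory buffer are placeholders, not part of the scripts on this page.

```python
import io

import matplotlib as mpl
mpl.use("Agg")  # headless backend for this sketch; drop this line to use plt.show()
import matplotlib.pyplot as plt

# Keep text as <text> elements instead of converting glyphs to paths,
# so labels stay editable in a vector editor.
mpl.rcParams["svg.fonttype"] = "none"

fig, ax = plt.subplots()
# "xkcd:..." names come from the xkcd color survey bundled with matplotlib.
ax.plot([0, 1, 2], [0.10, 0.15, 0.12], color="xkcd:teal", label="demo")
ax.set_xlabel("Time (ns)")
ax.set_ylabel("RMSD (nm)")
ax.legend()

# Written to an in-memory buffer here; pass a filename such as "demo.svg" instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="svg")
svg_text = buffer.getvalue().decode("utf-8")
```

With the default `svg.fonttype` setting the text is stored as outlines, which looks the same but cannot be retyped in the editor.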
Line chart
The following code creates a publication-quality line chart by importing data from two different csv files.


import sys
import os
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
mpl.rc('font',**{'family':'sans-serif','sans-serif':['Arial']})
valori1=pd.read_csv("RMSD_ligand.csv", sep=',')
valori2=pd.read_csv("RMSD_protein.csv", sep=',')
plt.plot(valori1.time, valori1.RMSD_ligand, valori2.time, valori2.RMSD_protein)
plt.xlabel('Time (ns)', fontsize=16)
plt.ylabel('RMSD (nm)', fontsize=16)
plt.xticks(fontsize=16)
plt.yticks(fontsize=16)
plt.axis([0, 1.5, 0, 0.3])
plt.savefig("line_chart.svg")
plt.show()
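
The script above draws both curves without a legend. A possible variant, sketched here with synthetic numbers in place of the two csv files (the values and the labels "Ligand" and "Protein" are assumptions), plots each series separately so that it can carry its own label.

```python
import matplotlib as mpl
mpl.use("Agg")  # headless backend for this sketch; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-ins for RMSD_ligand.csv and RMSD_protein.csv (made-up values).
time = np.linspace(0, 1.5, 16)
ligand = pd.DataFrame({"time": time, "RMSD_ligand": 0.10 + 0.02 * np.sin(4 * time)})
protein = pd.DataFrame({"time": time, "RMSD_protein": 0.20 + 0.03 * np.cos(3 * time)})

fig, ax = plt.subplots()
# One plot call per series, each with its own legend label.
ax.plot(ligand["time"], ligand["RMSD_ligand"], label="Ligand")
ax.plot(protein["time"], protein["RMSD_protein"], label="Protein")
ax.set_xlabel("Time (ns)", fontsize=16)
ax.set_ylabel("RMSD (nm)", fontsize=16)
ax.tick_params(labelsize=16)
ax.set_xlim(0, 1.5)
ax.set_ylim(0, 0.3)
legend = ax.legend(frameon=False)
fig.tight_layout()
```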

Line shaded standard deviation chart
In this case, the csv file includes six repetitions of a measurement and the standard deviation is calculated automatically.

import sys
import os
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
valori = pd.read_csv("val.csv")
valori['average'] = valori[['rmsf1', 'rmsf2', 'rmsf3', 'rmsf4', 'rmsf5', 'rmsf6']].mean(axis=1)
valori['STDEV'] = valori[['rmsf1', 'rmsf2', 'rmsf3', 'rmsf4', 'rmsf5', 'rmsf6']].std(axis=1)
valori['upper'] = valori['average'] + valori['STDEV']
valori['lower'] = valori['average'] - valori['STDEV']
valori['Residue'] = range(1, len(valori) + 1)
mpl.rc('font',**{'family':'sans-serif','sans-serif':['Arial']})
plt.figure()
plt.yticks(np.arange(1, 9, step=1))
plt.xticks(np.arange(1, 10, step=1))
plt.ylim(ymin=0, ymax=9)
x = valori['Residue']
y1 = valori['average']
y2 = valori['lower']
y3 = valori['upper']
plt.plot(x, y1)
plt.plot(x, y2, 'k--', linewidth = 0.5)
plt.plot(x, y3, 'k--', linewidth = 0.5)
plt.xlabel('Residue', fontsize=16)
plt.ylabel('RMSF (Å)', fontsize=16)
plt.fill_between(x, y2, y3,
                 facecolor="blue", # The fill color
                 color='blue', # The outline color
                 alpha=0.1)
plt.xticks(fontsize=16)
plt.yticks(fontsize=16)
plt.show()
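
If the number of repetitions changes, the column names need not be typed out. The sketch below uses a small made-up table in place of val.csv and assumes the replicate columns are named rmsf1, rmsf2 and so on, as in the script above. Note that pandas computes the sample standard deviation (ddof=1) by default.

```python
import numpy as np
import pandas as pd

# Hypothetical replicate table standing in for val.csv (three replicates, made-up values).
values = pd.DataFrame({
    "rmsf1": [1.0, 2.0, 4.0],
    "rmsf2": [2.0, 2.0, 5.0],
    "rmsf3": [3.0, 2.0, 6.0],
})

# Pick every column named "rmsf" followed by digits, however many there are.
replicates = values.filter(regex=r"^rmsf\d+$")

summary = pd.DataFrame({
    "average": replicates.mean(axis=1),
    "stdev": replicates.std(axis=1),  # sample standard deviation (ddof=1), the pandas default
})
summary["lower"] = summary["average"] - summary["stdev"]
summary["upper"] = summary["average"] + summary["stdev"]
summary.index = np.arange(1, len(summary) + 1)  # 1-based residue numbers
```

The `lower` and `upper` columns can then be passed to `plt.fill_between` exactly as in the script above.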

Bar and line chart
The script reads the example csv file and builds a combined bar and line chart with several visual aids, such as color-coded y axes.

import sys
import os
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
valori=pd.read_csv("bar_line_chart.csv")
date = valori.Date.tolist()
total = valori.Total.tolist()
freq = valori.Frequency.tolist()
x = np.arange(len(date))
width = 0.7
fig, ax = plt.subplots()
rects1 = ax.bar(x, total, width, label='total number of processed sequences', color='#305252')
ax2 = ax.twinx()
ax2.set_ylabel('% sequences processed', color='#F34213', fontsize=13)
ax.set_ylabel('total number of processed sequences', color='#305252', fontsize=13)
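ax2.plot(x, freq, color='#F34213')  # same numeric positions as the bars
ax.tick_params(axis='x', length = 0)
ax.set_xticks(x)  # fix the tick positions before assigning the date labels
ax.set_xticklabels(date, rotation = 70)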
ax2.set_yticks([0.02, 0.04, 0.06, 0.08, 0.10, 0.12])
ax2.set_yticklabels(['0.02%', '0.04%', '0.06%', '0.08%', '0.10%', '0.12%'])
# Color the tick labels of each y axis to match its data series
for ytick in ax.get_yticklabels():
    ytick.set_color('#305252')
for ytick in ax2.get_yticklabels():
    ytick.set_color('#F34213')
ax.yaxis.label.set_color('black')
ax2.yaxis.label.set_color('black')
ax.spines['top'].set_visible(False)
ax2.spines['top'].set_visible(False)
ax.tick_params(axis='y', color='#305252')
ax2.spines['left'].set_color('#305252')
ax2.tick_params(axis='y', color='#F34213')
ax2.spines['right'].set_color('#F34213')
plt.show()
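
The percent labels on the secondary axis are typed by hand in the script above. As an alternative sketch, matplotlib's `PercentFormatter` can generate them from the tick values; the frequency numbers below are made up, and the choice of two decimals is an assumption that matches the labels used above.

```python
import matplotlib as mpl
mpl.use("Agg")  # headless backend for this sketch; drop this line to use plt.show()
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

# Made-up values standing in for the Frequency column (already expressed in percent).
freq = [0.03, 0.05, 0.11, 0.08]

fig, ax = plt.subplots()
ax.plot(range(len(freq)), freq, color="#F34213")

# xmax=100 means "the data are already percentages"; decimals=2 gives labels like 0.02%.
formatter = PercentFormatter(xmax=100, decimals=2)
ax.yaxis.set_major_formatter(formatter)

# Format a single value directly to see what a tick label will look like.
example_label = formatter.format_pct(0.02, 0.1)
```

The labels then follow the ticks automatically if the axis range changes.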

Grouped bar chart
The following code creates a publication-quality bar chart from an xlsx input. The xlsx file contains the values without percent signs, so the script appends them to the bar labels and to the y-axis tick labels.

import sys
import os
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
mpl.rc('font',**{'family':'sans-serif','sans-serif':['Arial']})
valori=pd.read_excel("bar_chart.xlsx")
severity = valori.Severity.tolist()
before = valori.Before.tolist()
after = valori.After.tolist()
x = np.arange(len(severity))
width = 0.35
fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, before, width, label='Before', color='#003f5c')
rects2 = ax.bar(x + width/2, after, width, label='After', color='#bc5090')
for rect in rects1:
    height = rect.get_height()
    ax.text(rect.get_x() + rect.get_width()/2., 0.99*height,
            '%d' % int(height) + "%", ha='center', va='bottom')
for rect in rects2:
    height = rect.get_height()
    ax.text(rect.get_x() + rect.get_width()/2., 0.99*height,
            '%d' % int(height) + "%", ha='center', va='bottom')
plt.ylabel('Frequency', fontsize=13)
plt.xlabel('Severity', fontsize=13)
plt.xticks(x, severity)
plt.yticks(ticks=(10, 20, 30, 40),labels=('10%', '20%', '30%', '40%'))
leg = plt.legend()
leg.get_frame().set_edgecolor('black')
leg.get_frame().set_linewidth(0.5)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
plt.tight_layout()
plt.show()
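
The two labeling loops above can probably be shortened with `Axes.bar_label`, which is available in matplotlib 3.4 and later. The sketch below assumes such a version is installed and uses made-up categories and values in place of bar_chart.xlsx.

```python
import matplotlib as mpl
mpl.use("Agg")  # headless backend for this sketch; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Made-up data standing in for the Severity / Before / After columns.
severity = ["Mild", "Moderate", "Severe"]
before = [12, 35, 20]
after = [30, 25, 8]

x = np.arange(len(severity))
width = 0.35
fig, ax = plt.subplots()
rects_before = ax.bar(x - width / 2, before, width, label="Before", color="#003f5c")
rects_after = ax.bar(x + width / 2, after, width, label="After", color="#bc5090")

# bar_label places one annotation per bar; "%d%%" renders 12 as "12%".
labels_before = ax.bar_label(rects_before, fmt="%d%%", padding=2)
labels_after = ax.bar_label(rects_after, fmt="%d%%", padding=2)

ax.set_xticks(x)
ax.set_xticklabels(severity)
ax.legend()
```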

Grouped bar chart with error bars (pre-calculated values)
For this example we have a csv file with precalculated means and standard errors for three variables, each measured at five different patient visits. The script extracts the subset for glycogen and creates a grouped bar chart with standard-error bars.

import sys
import os
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
plt.rcParams["figure.figsize"]=(10, 5)
mpl.rc('font',**{'family':'sans-serif','sans-serif':['Arial']})
valori=pd.read_csv("grouped_error_precalculated.csv")
df = valori[valori['variable'] == "Glycogen"]
df_piv = df.pivot(index='Visit',columns='Treatment',values='mean')
df_err = df.pivot(index='Visit',columns='Treatment',values='se').values.T
# Colors are given as a list (not a set) so that their order is fixed
df_piv.plot(kind='bar', width = 0.6, yerr=df_err, capsize=2, color = ['#653700', '#e6daa6', '#ad8150'])
plt.title('Glycogen', fontsize=16)
plt.ylabel('mg/dl', fontsize=12)
plt.xlabel('Visit', fontsize=12)
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.xticks(rotation=0)
plt.savefig("glycogen.svg")
plt.show()
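
To make the reshaping step easier to follow, the sketch below applies the same filter and pivot to a tiny made-up table. The column names follow the script above, while the values, the second variable and the treatment names "Drug" and "Placebo" are invented for illustration. It also passes the pivoted error table directly as `yerr`, which pandas matches to the bars by column name.

```python
import matplotlib as mpl
mpl.use("Agg")  # headless backend for this sketch; drop this line to use plt.show()
import pandas as pd

# Made-up long-format table mimicking the columns used above.
long_df = pd.DataFrame({
    "variable": ["Glycogen"] * 4 + ["Glucose"] * 4,
    "Visit": [1, 1, 2, 2] * 2,
    "Treatment": ["Placebo", "Drug"] * 4,
    "mean": [80.0, 75.0, 82.0, 70.0, 95.0, 90.0, 96.0, 88.0],
    "se": [2.0, 2.5, 1.8, 2.2, 3.0, 2.9, 3.1, 2.7],
})

# Keep one variable, then turn treatments into columns and visits into rows.
subset = long_df[long_df["variable"] == "Glycogen"]
means = subset.pivot(index="Visit", columns="Treatment", values="mean")
errors = subset.pivot(index="Visit", columns="Treatment", values="se")

# A list keeps colors in a fixed order, matched to the (alphabetical) column order.
colors = ["#653700", "#e6daa6"]
ax = means.plot(kind="bar", yerr=errors, capsize=2, color=colors, rot=0)
```

After the pivot, each row is one visit and each column is one treatment, so every column becomes one bar series in the grouped chart.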

Grouped bar chart with error bars (values to be calculated)
In this example the csv file contains multiple measurements, which the script groups in order to calculate means and standard deviations automatically. The labels appear in alphabetical order because groupby sorts the group keys; specify the order manually if a different one is needed.

import sys
import os
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
mpl.rc('font',**{'family':'sans-serif','sans-serif':['Arial']})
valori=pd.read_csv("bigfile.csv")
valorigrupate=valori.groupby(['Residue']).agg({'Elec':['mean', 'std'],'VdW':['mean', 'std'], 'Total':['mean','std']})
valorigrupate.columns = ['Elecmean','Elecstd','VdWmean','VdWstd','Totalmean','Totalstd']
#valorisortate=valorigrupate.sort_values('Totalmean')
ind=np.arange(len(valorigrupate))
width=0.28
fig, ax = plt.subplots()  # create the figure first so that no stray empty figure is opened
ax.set_aspect('auto')
rects=ax.bar(ind - width, valorigrupate.Totalmean, width, color='blue', yerr=valorigrupate.Totalstd, capsize=2)
rects1=ax.bar(ind, valorigrupate.Elecmean, width, color='r', yerr=valorigrupate.Elecstd, capsize=2)
rects2=ax.bar(ind + width, valorigrupate.VdWmean, width, color='y', yerr=valorigrupate.VdWstd, capsize=2)
ax.set_xticks(ind)
ax.set_yticks((10, 0, -10, -20, -30, -40, -50))
ax.set_xticklabels(valorigrupate.index)
ax.hlines(0, -2, 21, linestyle='-', linewidth=0.5)
axes = plt.gca()
axes.set_xlim([-1,13])
axes.set_ylim([-53,13])
plt.xticks(rotation='vertical')
ax.legend((rects[0], rects1[0], rects2[0]), ('Total', 'Elec', 'VdW'))
plt.savefig("bar_calculated_values.svg")
plt.show()
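
Regarding the alphabetical order of the labels in this example, one way to impose a different order is to reindex the grouped table before plotting. The sketch below uses made-up residue names and energies in place of bigfile.csv and shows two options: an order chosen by hand, and the order of first appearance in the file.

```python
import pandas as pd

# Made-up per-residue energies standing in for bigfile.csv.
values = pd.DataFrame({
    "Residue": ["TYR12", "ALA5", "TYR12", "ALA5", "GLU30", "GLU30"],
    "Total": [-10.0, -4.0, -12.0, -6.0, -20.0, -22.0],
})

# groupby sorts the group keys, which gives the alphabetical order.
grouped = values.groupby("Residue").agg(Totalmean=("Total", "mean"),
                                        Totalstd=("Total", "std"))
alphabetical = list(grouped.index)

# Option 1: an explicit order chosen by hand.
custom = grouped.reindex(["ALA5", "TYR12", "GLU30"])

# Option 2: order of first appearance in the input file.
as_in_file = grouped.reindex(values["Residue"].drop_duplicates())
```

The reindexed table can be used in place of `valorigrupate` in the plotting calls, since the bar positions are taken from the row order.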
