Bioinformatics with Linux and Python

  • 11 - 18 March 2024
  • Mornings 9:00-12:30
  • Online
  • Methodology
  • 1.5 ECTS

Course description

Linux and Python (a dynamic, readable programming language) are a popular combination for all types of bioinformatics work, from simple one-off scripts to large, complex software projects. This workshop is aimed at complete beginners and assumes no prior programming experience. It gives an overview of the Linux command line and the Python language with an emphasis on practical problem-solving, using examples and exercises drawn from various aspects of bioinformatics work. The workshop is structured so that the tools and language features most useful for bioinformatics are introduced as early as possible, and so that students can start writing plausibly useful programs after the first few sessions. After completing the workshop, students should be in a position to (1) apply the skills they have learned to tackling problems in their own research and (2) continue their Linux and Python education in a self-directed way.

Course programme (subject to small changes):

  • Session 1 – connecting to the server and basic Linux commands

In the first session we briefly cover the design of Linux: how is it different from Windows/OSX and how is it best used? We’ll then jump straight onto the command line and learn about the layout of the Linux filesystem and how to navigate it. We’ll describe Linux’s file permission system (which often trips up beginners), how paths work, and how we actually run programs on the command line. We’ll learn a few tricks for using the command line more efficiently, and how to deal with programs that are misbehaving. We’ll finish this session by looking at the built-in help system and how to read and interpret manual pages.

  • Session 2 – assembling Linux commands into pipelines

Many data types we want to work with in bioinformatics are stored as tabular plain text files, and here we learn all about manipulating tabular data on the command line. We’ll start with simple things like extracting columns, filtering, sorting, and searching for text, before moving on to more complex tasks like searching for duplicated values, summarizing large files, and combining simple tools into long commands. Aliases, shell redirection, pipes, and shell scripting will all be introduced here.

  • Session 3 – introduction to bash scripting and variables

In this session we will introduce the idea of a script – a text file that combines commands to be run as a batch. We will get to grips with the basic idea by converting some of the complex command lines that we composed in the previous session into scripts. This gives us an opportunity to discuss the pros and cons of scripting. An important idea introduced in this session is that of a variable – a bit of information that can be passed into scripts. Sometimes variables can be files, or lists of files, which allows us to build our own custom command line tools.

  • Session 4 – biological pipelines and data formats

In this session we will apply the approaches that we learned in the previous three sessions to biology-specific tools, looking at Eutils for sequence retrieval and EMBOSS for biological data file manipulation. This requires a discussion of file formats, focussing on FASTA and GenBank.

  • Session 5 – introduction to Python, text and files

In this session students learn to write very simple programs that produce output to the terminal, and in doing so become comfortable with editing and running Python code. This session also introduces many of the technical terms that we’ll rely on in future sessions. I run through some examples of tools for working with text and show how they work in the context of biological sequence manipulation. We also cover different types of errors and error messages, and learn how to go about fixing them methodically. We’ll finish by looking at how to get data in and out of our programs using files.
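To give a flavour of this kind of program, here is a minimal sketch (not taken from the course materials) that manipulates a DNA sequence as text and then writes it to a file and reads it back. The sequence and the file name sequence_summary.txt are made up for illustration.

```python
import os
import tempfile

# A short, made-up DNA sequence used purely for illustration.
dna = "ATGCGATACGCTTGA"

# Count bases with str.count and work out the AT content as a fraction.
at_content = (dna.count("A") + dna.count("T")) / len(dna)

# Build the complement by replacing each base in turn. Lower case is used
# as a temporary marker so that replaced bases are not swapped back again.
complement = dna.replace("A", "t").replace("T", "a")
complement = complement.replace("C", "g").replace("G", "c").upper()

# Write both sequences to a file (hypothetical name, in the temp directory).
path = os.path.join(tempfile.gettempdir(), "sequence_summary.txt")
with open(path, "w") as output:
    output.write(dna + "\n")
    output.write(complement + "\n")

# Read the file back in, one list item per line.
with open(path) as source:
    lines = source.read().splitlines()

print("AT content:", round(at_content, 2))
print("Complement:", complement)
print("Read back:", lines)
```

Only built-in string methods and file handling are used here, which is roughly the toolkit a first Python session can be expected to cover.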

  • Session 6 – lists and loops in Python

A discussion of the limitations of the techniques learned in session 5 quickly reveals that flow control is required to write more sophisticated file-processing programs, and I introduce the concept of loops. We look at the way in which Python loops work, and how they can be used in a variety of contexts. We explore the use of loops and lists together to tackle some more difficult problems.
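As an illustration of loops and lists working together, the following sketch (an assumption about the style of exercise, not course material) processes a list of made-up sequences and splits one of them into codons.

```python
# Made-up sequences, for illustration only.
sequences = ["ATGCGT", "ATGAAATTT", "ATG"]

# Loop over the list and collect the length of each sequence.
lengths = []
for sequence in sequences:
    lengths.append(len(sequence))

# Loop over start positions 0, 3, 6, ... to split one sequence into codons.
codons = []
for start in range(0, len(sequences[1]), 3):
    codons.append(sequences[1][start:start + 3])

print(lengths)  # [6, 9, 3]
print(codons)   # ['ATG', 'AAA', 'TTT']
```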

  • Session 7 – conditions in Python

I use the idea of decision-making as a way to introduce conditional tests, and outline the different building-blocks of conditions before showing how conditions can be combined in an expressive way. We look at the different ways that we can use conditions to control program flow, and how we can structure conditions to keep programs readable.
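A small sketch of combined conditions controlling program flow might look like this. The sequences, the length cut-off of 5 and the AT-content threshold of 0.6 are arbitrary values chosen for illustration.

```python
# Made-up sequences, for illustration only.
sequences = ["ATGCGCGC", "ATATATTTAA", "ATGC"]

labels = []
for sequence in sequences:
    at_content = (sequence.count("A") + sequence.count("T")) / len(sequence)
    if len(sequence) < 5:
        # Checked first, so short sequences never reach the later tests.
        label = "too short"
    elif at_content > 0.6 and sequence.startswith("AT"):
        # Two simple tests combined with "and": both must be true.
        label = "AT-rich"
    else:
        label = "other"
    labels.append(label)

print(labels)  # ['other', 'AT-rich', 'too short']
```

Putting the cheapest, most restrictive test first and falling through to a final else keeps the branches easy to read.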

  • Session 8 – writing functions in Python

We discuss functions that we’d like to see in Python before considering how we can add to our computational toolbox by creating our own. We examine the nuts and bolts of writing functions before looking at best-practice ways of making them usable. We also look at a couple of advanced features of Python – named arguments and defaults.
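Named arguments and defaults can be sketched with a single small function. The function name at_content and its digits parameter are hypothetical, chosen for this example rather than taken from the course.

```python
def at_content(sequence, digits=2):
    """Return the AT content of a DNA sequence, rounded to `digits` places."""
    sequence = sequence.upper()  # accept lower-case input too
    fraction = (sequence.count("A") + sequence.count("T")) / len(sequence)
    return round(fraction, digits)


# Using the default number of digits.
print(at_content("ATGCGCA"))            # 0.43

# Overriding the default with a named argument.
print(at_content("atgcgca", digits=4))  # 0.4286
```

The default keeps the common call short, while the named argument makes the less common call self-explanatory.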

  • Session 9 – paired data and dicts in Python

We discuss a few examples of key-value data and see how the problem of storing them is a common one across bioinformatics and programming in general. We learn about the syntax for dictionary creation and manipulation before talking about the situations in which dictionaries are a better fit than the data structures we have learned about thus far.
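Codon tables are a typical example of key-value data. The sketch below (illustrative only) uses a deliberately tiny, three-entry excerpt of the standard genetic code to translate a made-up sequence and to count codons; unknown codons are shown as "X", which is a choice made for this example.

```python
# A made-up coding sequence and a three-codon excerpt of the genetic code.
dna = "ATGGCGTTTGCG"
codon_to_amino_acid = {"ATG": "M", "GCG": "A", "TTT": "F"}

protein = ""
codon_counts = {}
for start in range(0, len(dna), 3):
    codon = dna[start:start + 3]
    # Look up the amino acid; fall back to "X" if the codon is not in the dict.
    protein += codon_to_amino_acid.get(codon, "X")
    # Count how many times each codon has been seen.
    codon_counts[codon] = codon_counts.get(codon, 0) + 1

print(protein)       # MAFA
print(codon_counts)  # {'ATG': 1, 'GCG': 2, 'TTT': 1}
```

Doing the same lookup with two parallel lists would mean searching for each codon's position every time, which is where a dictionary becomes the better fit.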

  • Session 10 – programming workshop

Course organisation

Course coordinator (for more information)

Elackiya Sithamparanathan, e-mail: elackiya.sithamparanathan@wur.nl, tel: +31 (0) 6 8731 6681

Course fee

  • WUR PhDs with TSP: €310 (early bird) / €360 (regular)
  • SENSE PhDs with TSP: €620 (early bird) / €670 (regular)
  • All other PhD candidates: €700 (early bird) / €750 (regular)
  • Staff of WUR Graduate Schools including Postdocs / graduate schools mentioned above: €700 (early bird) / €750 (regular)
  • All others: €740 (early bird) / €790 (regular)

Registration

  • Early bird registration deadline: 11 February 2024
  • Regular registration deadline: 25 February 2024
  • Go to registration form

N.B.: This course gives priority to VLAG and WIMEK PhD candidates due to the limited space available.

  • WIMEK and VLAG, Wageningen University
  • Yearly
  • Dr. Martin Jones (founder of Python for Biologists)
  • 12-15 participants
  • Entry-level programming. No modelling skills are needed. However, this course is heavy on bioinformatics and programming, so an interest in improving your modelling and programming skills is a requirement.