BSPC Training

These pages are developed by NICHD’s Bioinformatics and Scientific Programming Core (BSPC) as a resource to help train NICHD staff and fellows in various computational and bioinformatics topics.

Note: the opinions on these pages do not reflect those of NICHD or NIH.

There’s a lot of excellent training material out there, so rather than repeating it, this website serves as a central location for curated resources. We’ve found the good stuff so you can get right to learning.

Start here

Bioinformatics and scientific programming make up a big field, and it can be difficult to know where to start.

If you are at NIH, you can email ryan.dale@nih.gov to schedule a meeting to discuss your training goals and expectations so we can help develop a customized training plan.

Ready? Head to First steps.

Changelog

If you haven’t been here in a while and want to know what’s new, see the Changelog.

Currently-written topics

First steps gives you an introduction to the content here and some context. If you’re just starting out, head there first.

Initial training

These sections help you learn the basics of programming, and include some examples of beginner/intermediate/advanced skills to help you figure out where you are in your learning and how to advance.

Next steps

Once you have the basics of programming, these sections will broaden your skills.

Genomics

These sections point to resources for learning about specific genomics topics.

Additional resources

Changelog

TODO

Scattered throughout the documentation are .. todo:: entries, which are collected here for reference. This list shows which parts of the documentation are still in progress and serves as a one-stop shop for which topics to write about next.

Todo

Identify what should stay on the rnaseq page and what should be moved here

(The original entry is located in /home/runner/work/training/training/source/deseq2.rst, line 6.)

Todo

Add info and links about Emacs, especially org-mode

(The original entry is located in /home/runner/work/training/training/source/emacs.rst, line 6.)

Todo

To tie everything together, add examples of figures from papers, and explain how all of these steps come together.

(The original entry is located in /home/runner/work/training/training/source/genomics-formats.rst, line 324.)

Todo

For genomics, write the following:

  • Aligners (Bowtie2, HISAT2, BWA, STAR)?

  • Links to example RNA-seq and ChIP-seq workflows (possibly from https://hbctraining.github.io/main/)

  • bedGraph, wig, bigBed, bigWig, chromsizes

  • example RNA-seq and ChIP-seq bash scripts, then scale those up to Snakemake workflows?

(The original entry is located in /home/runner/work/training/training/source/genomics-formats.rst, line 329.)

Todo

Describe our workflow in more detail (using issues, merge requests, etc.)

(The original entry is located in /home/runner/work/training/training/source/gitlab.rst, line 32.)

Todo

Write about lcdb-wf, why we use it, how to learn it

(The original entry is located in /home/runner/work/training/training/source/lcdb-wf.rst, line 6.)

Todo

Write data science learner profile

(The original entry is located in /home/runner/work/training/training/source/learner-profiles.rst, line 64.)

Todo

Write bioinformatician learner profile

(The original entry is located in /home/runner/work/training/training/source/learner-profiles.rst, line 71.)

Todo

The learner profiles link to the following pages, many of which still need writing:

  • DESeq2

  • Galaxy

  • Reproducibility

  • Installation

  • Collaborating with BSPC

  • Biowulf

  • lcdb-wf

  • RNA-seq

(The original entry is located in /home/runner/work/training/training/source/learner-profiles.rst, line 94.)

Todo

Needs more content!

(The original entry is located in /home/runner/work/training/training/source/programming.rst, line 21.)

Todo

Need to write on the following topics for reproducibility:

  • conda

  • git

  • requirements.txt

  • make sure you add anything you install to requirements.txt

  • conda create -p ./env for shared directories

(The original entry is located in /home/runner/work/training/training/source/reproducibility.rst, line 6.)

Todo

Topics needed for UCSC:

  • track hubs

  • udcTimeout=1

  • hosting files on datashare

  • custom tracks and tracklines

  • useful built-in tracks

(The original entry is located in /home/runner/work/training/training/source/ucsc.rst, line 6.)

Todo

This section could use some better organization

(The original entry is located in /home/runner/work/training/training/source/variant-calling.rst, line 177.)

Todo

Visualization topics needed:

  • genome browser screenshots

  • inkscape

  • Tufte-isms (chartjunk; data to ink ratio)

  • why not pie charts

  • colormaps (jet vs viridis)

  • colorblindness

(The original entry is located in /home/runner/work/training/training/source/visualization.rst, line 29.)
