Dataset Open Access

Pre-processed B cell receptor sequences from BioProject PRJNA349143

Gupta, Namita; Laserson, Uri; Vander Heiden, Jason

Processed sequencing data from BioProject PRJNA349143.

Study Design

Samples were collected from human volunteers as described in Laserson and Vigneault et al., 2014 (1). Briefly, blood samples were collected from three individuals before and after seasonal influenza vaccination, at ten time points relative to injection: -8 days, -2 days, -1 hour, +1 hour, +1 day, +3 days, +7 days, +14 days, +21 days and +28 days.

Library Preparation and Sequencing

The original samples from Laserson and Vigneault et al., 2014 (1) were re-sequenced as described in Gupta et al., 2017 (2). Briefly, sequencing libraries were prepared from mRNA using 5'RACE with addition of 17-nucleotide unique molecular identifiers (UMIs). Amplification was performed using constant region primers specific to IGHA, IGHD, IGHE, IGHG, IGHM, IGKC and IGLC. Sequencing was conducted on the Illumina MiSeq platform using the 600-cycle kit, with 325 cycles for read 1 and 275 cycles for read 2. A 10% PhiX spike-in was added for sequencing.

Data Processing

Sequences were processed using the pRESTO (3) and Change-O (4) toolkits as described in Gupta et al., 2017 (2).

Note that the provided data have been heavily filtered: sequences that failed V(D)J alignment and non-functional sequences have been removed.


Processed sequences are provided in FASTA format annotated using the pRESTO scheme.

Annotations included are as follows:

  • CONSCOUNT: Raw read count from which UMI consensus sequences were generated, summed over all UMIs for the given unique sequence.
  • DUPCOUNT: UMI count for the given unique sequence.
  • PRCONS: Constant region primer (isotype).
  • SUBJECT: Subject identifier.
  • TIME_POINT: Time point label.
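
As an illustration of how these annotations might be read, the following is a minimal Python sketch. It assumes the standard pRESTO header layout, in which the sequence identifier is followed by `|`-delimited `FIELD=value` pairs. The records in `EXAMPLE`, including the sequence identifiers and the SUBJECT, TIME_POINT and PRCONS values, are invented for illustration and are not taken from the dataset; the function names are likewise hypothetical.

```python
from collections import defaultdict
from io import StringIO


def parse_presto_header(header):
    """Split a pRESTO-style header into (sequence_id, annotations dict)."""
    fields = header.lstrip(">").strip().split("|")
    annotations = {}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        annotations[key] = value
    return fields[0], annotations


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def umi_counts_by_isotype(handle):
    """Sum DUPCOUNT (UMI count) over unique sequences, grouped by PRCONS."""
    totals = defaultdict(int)
    for header, _ in read_fasta(handle):
        _, annotations = parse_presto_header(header)
        totals[annotations["PRCONS"]] += int(annotations["DUPCOUNT"])
    return dict(totals)


# Invented records; all identifiers and values below are assumptions.
EXAMPLE = """\
>seq1|CONSCOUNT=12|DUPCOUNT=3|PRCONS=IGHM|SUBJECT=S1|TIME_POINT=t1
ACGT
ACGT
>seq2|CONSCOUNT=5|DUPCOUNT=2|PRCONS=IGHG|SUBJECT=S1|TIME_POINT=t1
GGCC
>seq3|CONSCOUNT=7|DUPCOUNT=4|PRCONS=IGHM|SUBJECT=S2|TIME_POINT=t2
TTAA
"""

if __name__ == "__main__":
    print(umi_counts_by_isotype(StringIO(EXAMPLE)))
```

To apply the same sketch to the provided data, pass an open file handle for the downloaded FASTA file in place of the `StringIO` object.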


  1. Laserson U and Vigneault F, et al. High-resolution antibody dynamics of vaccine-induced immune responses. Proc Natl Acad Sci USA 111, 4928–33 (2014).
  2. Gupta NT, et al. Hierarchical Clustering Can Identify B Cell Clones with High Confidence in Ig Repertoire Sequencing Data. J Immunol 1601850 (2017).
  3. Vander Heiden JA and Yaari G, et al. pRESTO: a toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires. Bioinformatics 30, 1930–2 (2014).
  4. Gupta NT and Vander Heiden JA, et al. Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 31, 3356–8 (2015).

Files (67.6 MB)