The three Nature papers on DNA structure published in 1953: the Watson-Crick paper is one of the best-written manuscripts I have ever read. Please read the delicate three lines on p. 737 by Watson & Crick starting with "It has not escaped our notice..."; rarely has one sentence hidden so much depth and comprehension.
Guides:
How to become a scientist by Pr. Yewdell: First, Second
All laboratory staff must complete Inserm's Néo : accueil et prévention training (allow 4x20 min).
Always wear lab coat and gloves when experimenting.
Make sure to treat waste correctly. Ask your supervisor or colleagues if in doubt. A. Liquid waste should be marked with your initials, U1135, the date and a description of the contents, and should be placed in the corridor on Wednesday afternoon. B. Solid waste should be marked with your initials, U1135 and the date.
Antibodies
Antibodies play a major role in our research, both directly as our target of interest and indirectly, because they represent essential tools for the detection of various biomolecules (flow cytometry, ELISA etc.). Antibody reagents used for research can be unconjugated or conjugated with biotin, HRP or various fluorochromes. They are generally very expensive (3-600 Euros per bottle/tube). How should you preserve these reagents?
1. Always work on ice (when you take a tube out of the fridge or freezer, keep it on ice at all times).
2. Avoid freeze-thaw cycles. For antibodies that require frozen storage, make aliquots. Once an aliquot is thawed (and only thaw one after verifying that a thawed aliquot is not already available in the fridge), do NOT re-freeze it; keep it in the fridge.
3. Antibodies are generally very stable and can be stored for quite a long time in the fridge (follow the manufacturer's recommendations).
4. For long-term storage antibodies can be frozen at -80°C (AVOID this for PE-conjugated antibodies). Make aliquots of a reasonable size, which can be left in the fridge after thawing for a few months (avoid freeze-thaw cycles). NEVER freeze a complete batch of antibody.
5. Rarely (and only after discussing with your supervisor) antibodies can be frozen at -20°C; in this case adding up to 50% glycerol is an advantage, as the antibody benefits from the low temperature while avoiding crystallization (even at -20°C the antibody-glycerol solution remains liquid).
6. Samples containing antibodies (serum, fecal water, breastmilk etc.) are generally stored at -80°C (long-term storage), but if they have been diluted or plated out for experimental use with no need for long-term storage, they can be stored at -20°C (please consult your supervisor).
Beyond the impact the above guidelines could have on the experimental quality of your work, antibodies are also a major part of our lab budget. Please take care of them.
Exhaustive boolean gating can be done in FlowJo using the "Combination gates" function. The exported data can then be analyzed with the FunkyCells software.
Of note, since my video presentation I have added a few slides (you can download them above). In particular, I realised that a ChatGPT detection tool exists (GPT-2 Output Detector). I tried the tool and it works well for texts completely generated by ChatGPT, but if you ask ChatGPT to improve a text it did not write, the tool does not seem to detect the improvements. Generally, it seems to work best with long texts. In conclusion, the tool works but suffers from a significant number of false positives and false negatives. I'm not convinced that we have the time to verify all texts, and handwritten text (copied from ChatGPT) would be difficult to test.
DADA2 Tutorial - Remy VILLETTE
Lab Guru Tutorial - Manon CHAUVIN
Microscopy - FIJI macros - Alice PASCAULT (script)
A ChatGPT detection tool exists (GPT-2 Output Detector), and probably many more will arrive over time. It works well for texts completely generated by ChatGPT, but if you ask ChatGPT to improve a text it did not write, the tool does not seem to detect the improvements, and it works best with long texts; overall it suffers from a significant number of false positives and false negatives. I'm not convinced that a teacher has the time to verify all texts, and handwritten text (copied from ChatGPT) would be difficult to test. Conversely, one could imagine that ChatGPT will one day be able to correct our students' work; that would take a heavy evaluation burden away from teachers and allow them to spend more time on their primary mission, to teach.
Copyright: interestingly, what derives from ChatGPT is owned by the person who asked the question that produced it. You therefore own the copyright to the material you produce with ChatGPT (link to OpenAI's website).
Can I use output from ChatGPT for commercial uses?
Subject to the Content Policy and Terms, you own the output you create with ChatGPT, including the right to reprint, sell, and merchandise – regardless of whether output was generated through a free or paid plan.
Steven Gee has a website with various examples of how to use ChatGPT for bioinformatics projects.
dbBact is a tool that can associate microbiota profiles or individual ASVs with a database of other studies and identify metadata features that may be relevant for your own study. It offers several types of output, including "word clouds".
MixOmics Workshop by Sebastien DEJEAN (08/09/2023)
Sebastien DEJEAN is a research engineer in the mathematics department at the University of Toulouse, France. He specialises in statistical models and was involved in the creation of the MixOmics R package. He created the presentation and associated R script presented below, and has agreed that the video and documents be made publicly available. We wish to thank Sebastien for supporting our team with his advice and expertise - THANK YOU.
NCBI made a very useful toolkit to quickly and accurately download large sequencing datasets stored on the SRA, ENA or DDBJ servers. You will need to install it, configure it and get the accession list for your samples. The toolkit downloads SRA-formatted files and then converts them to FASTQ.
To get started quickly, you can just run this in your terminal and choose the directory you want the SRA toolkit to download into. If you don't perform this step, the toolkit will download into the directory where you put it, usually in your root directory, so change this to a directory with a large storage capacity.
First, get the file names in .txt format from the Run Selector or the SRA Entrez search. Go to SRA Entrez on NCBI and use the accession number to find your samples: paste the accession number into the search box and press Enter. Then, at the top right of the panel, you'll find a "Send to" button; select "Run selector". SRA Entrez usually covers SRA, ENA, DDBJ and more.
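As a minimal sketch of this step in R (the helper name `get_accessions` is ours; "SraRunTable.txt" is the Run Selector's default metadata export), the run accessions can be pulled out and saved as a plain list for prefetch:

```r
# Hypothetical helper: read a Run Selector metadata export and
# return the run accessions (SRR/ERR/DRR identifiers) from its "Run" column
get_accessions <- function(run_table_path) {
  run_table <- read.csv(run_table_path)
  run_table$Run
}
# accessions <- get_accessions("SraRunTable.txt")
# writeLines(accessions, "accession_list.txt") # plain list usable with prefetch --option-file
```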
path <- "/home/remy/sratoolkit.current-centos_linux64/sratoolkit.2.10.7-centos_linux64/bin" # path to the toolkit's bin directory
func_fetch <- paste0(path, "/prefetch --option-file") # prepare the prefetch call
Here you can choose the directory where your SRA-formatted files will be stored. If you already set it in vdb-config -i you don't need to redo it. You can also use this code if you want to temporarily change the output directory.
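As a non-interactive alternative sketch (assumption: the `--set` flag and the `/repository/user/main/public/root` configuration key behave as in recent SRA Toolkit releases), the same setting can be changed from R:

```r
# Paths as defined earlier in this tutorial (adjust to your installation)
path <- "/home/remy/sratoolkit.current-centos_linux64/sratoolkit.2.10.7-centos_linux64/bin"
outdir <- "/path/with/large/storage" # placeholder: a directory with plenty of space
# Build the vdb-config call that points the toolkit's download root at outdir
cmd_cfg <- paste0(path, "/vdb-config --set /repository/user/main/public/root=", outdir)
# system(cmd_cfg) # uncomment to apply
```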
# This section of code is to make sure that files do not contain WGS data
library(tidyverse)
# list the per-project metadata files that were downloaded
f <- list.files("/home/remy/Documents/SOP_early_life_R/Datas", pattern = "SraRunTable.txt")
# flag columns whose name suggests shotgun/WGS data (search pattern reconstructed)
for (tmp in names(metadata)) {
  nb <- which(str_detect(colnames(metadata[[tmp]]), "WGS|[Ss]hotgun"))
  if (length(nb) > 0)
    cat(tmp, "\n", "column", paste0(colnames(metadata[[tmp]])[nb], collapse = " | "), "\n", "contains a pattern related to SHOTGUN sequencing, there is shotgun innit \n")
}
The “PRJEB26419” project contains hidden WGS and transcriptomic data. For this specific paper we need to subset the data manually.
You can now proceed with the function. It will download SRA-formatted files (not compressed, however) into the directory you chose. Give it the accession list you downloaded from NCBI. For this tutorial we will use the accession number PRJEB2079.
cmd1 <- paste(func_fetch, accession, "--output-directory", outdir, "--progress") # assemble the prefetch command
system(cmd1) # launch it; it will take some time depending on the number of samples you are downloading
# Troubleshooting: in case you need to re-download some files or your code stopped. If the utils function finds an existing fastq it will stop the loop, so we remove these fastqs from the list. For the files that stopped downloading:
library(tidyverse)
dir2 <- outdir # the directory where the files are stored
# re-run prefetch only for runs not already present (reconstructed loop;
# `accession_list` stands for the character vector of run accessions)
for (x in setdiff(accession_list, list.files(dir2))) {
  cmd1 <- paste(paste0(path, "/prefetch "), x, "--output-directory", outdir, "--progress") # rebuild the command per run
  system(cmd1)
}
Transform SRA-formatted data into FASTQ.
There is a subtlety here: be aware that some files will be uploaded as single-end sequence files, meaning there are no separate Forward and Reverse files. In that case you just need to adjust the gzip calls in the functions. At present fasterq-dump doesn't allow compression, so we need to use an external compression function.
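A sketch of the conversion for a single run (paths and the example accession are placeholders; `--split-files` writes `_1`/`_2` files for paired-end runs and a single file for single-end runs):

```r
# Paths as defined earlier in this tutorial (adjust to your installation)
path <- "/home/remy/sratoolkit.current-centos_linux64/sratoolkit.2.10.7-centos_linux64/bin"
outdir <- "/path/with/large/storage"
x <- "ERR000001" # placeholder run accession
# fasterq-dump writes uncompressed FASTQ, so compression is a separate step
cmd_dump <- paste0(path, "/fasterq-dump ", x, " --outdir ", outdir, " --split-files")
# system(cmd_dump)
# then compress every FASTQ with an external gzip call:
# for (fq in list.files(outdir, pattern = "\\.fastq$", full.names = TRUE)) system(paste("gzip", fq))
```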
# You may encounter a problem where the code stops: if a file has already been processed, the utils function will block the loop
# so we build a character vector to detect the files already done
files <- list.files(dir2) # make a list of the samples already converted
Finally, we want to remove the SRA-formatted files, as they are big and no longer needed.
outdir2 <- list.files(outdir)
outdir2 <- paste0(outdir, "/", outdir2)
for (dl in outdir2) {
  file.remove(dl)
}
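Note that `file.remove()` cannot delete directories, and recent prefetch versions write each run into its own subdirectory; a small sketch (the helper name is ours) that handles both cases:

```r
# Hypothetical helper: delete everything prefetch left in outdir,
# including per-run subdirectories (unlink, unlike file.remove, can be recursive)
remove_sra_files <- function(outdir) {
  unlink(list.files(outdir, full.names = TRUE), recursive = TRUE)
}
# remove_sra_files(outdir)
```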
Working with metadata from multiple SRA projects.
The challenge is now to homogenize the metadata in order to merge them later. Metadata are provided by the authors and can vary from 6 columns to 50, with different column names for the same information (e.g. "Host_age" and "Age"). For now we just want to import the metadata files into R to analyze the column names and their frequencies before considering any modifications.
# Get frequencies of column names - used for metadata harmonization
colname <- NULL
data_name <- NULL
dim_metadata <- NULL
for (i in names(new_meta)) {
  colname <- c(colname, colnames(new_meta[[i]]))
  data_name[[i]] <- colnames(new_meta[[i]])
  dim_metadata <- rbind(dim_metadata, data_frame(project = i, factors = dim(new_meta[[i]])[2], samples = dim(new_meta[[i]])[1]))
}
tmp <- as.data.frame(table(colname))
tmp <- tmp[order(-tmp$Freq), ]
sum(tmp$Freq == 49) # column names present in 49 projects
filter(tmp, Freq == 49) %>% View()
filter(tmp, Freq == 49) %>% write.csv("shared_col.csv")
filter(tmp, Freq == 1) %>% View() # column names unique to one project
sum(tmp$Freq == 1)
sum(dim_metadata$samples) # total number of samples across projects
Change the column names (work in progress).
metadat <- metadata
for (i in names(new_meta)) {
  nb <- str_detect(colnames(new_meta[[i]]), "Age|_age_|AGE")
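A minimal self-contained sketch of where this renaming is heading (base-R `grepl` stands in for `str_detect`; the helper name and the target name "Host_age" are assumptions):

```r
# Hypothetical helper: rename the first age-like column to one shared name
harmonize_age <- function(df) {
  hit <- grepl("Age|_age_|AGE", colnames(df)) # same pattern as the loop above
  if (any(hit)) colnames(df)[which(hit)[1]] <- "Host_age" # assumed target name
  df
}
# applied over all projects before merging:
# new_meta <- lapply(new_meta, harmonize_age)
```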