Categories
Uncategorized

DeepVariant: Genomics Meet Deep Learning

Yesterday I attended a talk on deep learning and genomics by Pi-Chuan Chang, a software engineer at Google. She gave a high-level overview of deep learning and of how her team formulates a genomics problem so that deep learning techniques can be applied to it successfully. She also discussed DeepVariant, software built by Google to support community efforts to advance genomic sequencing.

What is deep learning?

Deep learning is a subfield of machine learning concerned with artificial neural networks, algorithms inspired by the structure and function of the brain.

Deep learning is playing a major role in advances in genomic research, such as high-throughput sequencing techniques. In this information era, the continuing outpouring of data has truly begun to challenge the conventional methods used in genomics. Deep learning has succeeded in fields such as vision, speech, and text processing; it now faces the unique challenge of helping us interpret the genome and explore beyond our current knowledge.

Pi-Chuan Chang shared that genome sequencing is a core technology in biology. It lets us ask how we can personalize medicine based on a person's genome.

What is a genome?

A genome is an organism’s complete set of genetic instructions. Each genome contains all of the information needed to build that organism and allow it to grow and develop.

We inherit 23 chromosomes from each of our parents, for 46 in total.

Most of our DNA is shared: about 99.9% of it is the same from person to person, and this is what makes us human.

It's the remaining 0.1 percent that makes each of us unique.

The Human Genome Project was a milestone in genome sequencing. This massive international collaboration set out to map the complete human genome, and it produced a genome "dictionary" of roughly 3.2 billion characters.

A decade ago it was expensive to sequence a person's genome. Now it costs about $1,000 per individual, which creates a great deal of opportunity for precision medicine.

There is, however, a trade off.

The new sequencing technology makes errors! From a blood draw, computational biologists get raw data (strings of the characters A, C, T, and G) that are really short snippets of the whole genome, much like puzzle pieces. They try to fit the puzzle pieces together by mapping them to a reference genome, and then face the task of finding the variants.

Variant calling:

Variant calling is the process by which we identify variants from sequence data.

Typically, variant calling consists of a three-step process:

  1. Carry out whole genome or whole exome sequencing to create FASTQ files.
  2. Align the sequences to a reference genome, creating BAM or CRAM files.
  3. Identify where the aligned reads differ from the reference genome and write to a VCF file.
A CRAM file aligned to a reference genomic region as visualised in Ensembl. Differences are highlighted in red in the reads, and will be called as variants.
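
To make step 3 concrete, here is a toy sketch in plain Python. It is an illustration under heavy assumptions, not how a production variant caller works: the reads are assumed to be already aligned, there are no insertions or deletions, base qualities are ignored, and the function name, the `min_depth` and `min_fraction` thresholds, and the example data are all made up for this sketch.

```python
from collections import Counter


def call_variants(reference, aligned_reads, min_depth=2, min_fraction=0.5):
    """Return (position, ref_base, alt_base, alt_count, depth) tuples.

    aligned_reads is a list of (start, sequence) pairs, where start is the
    0-based reference position at which the read was aligned.
    The thresholds are hypothetical values chosen for this toy example.
    """
    # Build a pileup: for every reference position, count the bases observed.
    pileup = [Counter() for _ in reference]
    for start, sequence in aligned_reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if 0 <= position < len(reference):
                pileup[position][base] += 1

    variants = []
    for position, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads cover this position
        ref_base = reference[position]
        # Keep only the observed bases that differ from the reference.
        alt_counts = {b: n for b, n in counts.items() if b != ref_base}
        if not alt_counts:
            continue
        alt_base, alt_count = max(alt_counts.items(), key=lambda item: item[1])
        if alt_count / depth >= min_fraction:
            variants.append((position, ref_base, alt_base, alt_count, depth))
    return variants


reference = "ACGTACGTAC"
reads = [
    (0, "ACGTTCGT"),  # T instead of A at position 4
    (2, "GTTCGTAC"),  # same mismatch
    (3, "TACGTAC"),   # agrees with the reference at position 4
    (1, "CGTTCG"),    # same mismatch
]
print(call_variants(reference, reads))
```

Three of the four reads covering position 4 disagree with the reference, so that position is reported. A real pipeline would write such calls to a VCF file rather than print tuples.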

We were told that it is common for computational biologists to inspect genomic data like this manually.

The question at hand: can we teach machines to perform the same task? Can we teach a machine to detect the variants?

This is where deep learning steps in.

DeepVariant

DeepVariant is a deep learning technology that reconstructs the true genome sequence from high-throughput sequencing (HTS) data with significantly greater accuracy than previous classical methods. This reconstruction problem is known in genomics as variant calling, and DeepVariant transforms it into an image classification problem, which is well suited to Google's existing technology and expertise.
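
To give a feel for what "turning reads into an image" can mean, here is a small sketch in plain Python. This is not DeepVariant's actual encoding; it is a simplified illustration in which each base maps to an arbitrary intensity and the reads around a candidate site are stacked into a 2D grid. The function name, the intensity values, and the window size are all assumptions made for this example.

```python
# Arbitrary intensities chosen for illustration only.
BASE_TO_INTENSITY = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}


def pileup_to_image(reference, aligned_reads, center, half_width=2):
    """Encode the reads around `center` as a 2D grid of floats.

    Row 0 is the reference and each later row is one read. Columns cover
    positions center - half_width .. center + half_width. Cells that a
    sequence does not cover are set to 0.0.
    """
    window = range(center - half_width, center + half_width + 1)

    def encode_row(start, sequence):
        row = []
        for position in window:
            offset = position - start
            if 0 <= offset < len(sequence):
                row.append(BASE_TO_INTENSITY.get(sequence[offset], 0.0))
            else:
                row.append(0.0)  # no coverage at this position
        return row

    image = [encode_row(0, reference)]
    for start, sequence in aligned_reads:
        image.append(encode_row(start, sequence))
    return image


reference = "ACGTACGTAC"
reads = [(0, "ACGTTCGT"), (3, "TACGTAC")]
image = pileup_to_image(reference, reads, center=4)
for row in image:
    print(row)
```

In the middle column, the first read's value differs from the reference row, which is the kind of visual pattern an image classifier could learn to recognize as a candidate variant.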

DeepVariant is now open source software, to encourage collaboration and accelerate the use of this technology to solve real-world problems!

https://github.com/google/deepvariant

Grace Hopper PitcHer Contest 2018

Yesterday I attended Grace Hopper's inaugural PitcHer contest. The goal of this pitch competition is to support, encourage, and provide new funding opportunities to women entrepreneurs. The top ten finalists competed for a grand total of $65,000. I was elated to find that the first place prize went to my personal favorite, Shakeia Kegler. Her business idea, accompanied by her amazing stage presence, sealed the deal! After chatting with her once the event wrapped up, it was clear that she is a brilliant and down-to-earth woman with much to offer the startup community. I was even lucky enough to get a selfie with her! I'd love to invite her to Startup Milwaukee Week this year or next! Below are bios and business summaries of the winners.

Shakeia Kegler – First Place!

Shakeia Kegler is from Saint Petersburg, Florida, and is the eldest of five girls. After graduating from high school in 2011, she joined the U.S. Navy. While enlisted, she gained experience in purchasing, compliance, and quality assurance while earning a bachelor’s degree in business management and her Lean Six Sigma Certifications.

After her honorable discharge, Shakeia worked as a compliance and contract specialist in the government contracting department of a pharmaceutical company. Her experience in both the Navy and government led her to found GovLia in 2017. GovLia is a cloud-based platform that simplifies state and local government procurement processes to help increase small business participation in order to foster economic opportunity and growth for diverse companies and communities.

Hannah Meyer – Second Place

As COO of Pie for Providers, Hannah builds tools that aim to measurably and significantly strengthen small childcare businesses and empower the entrepreneurs that operate them. She is committed to not only building a profitable and scalable business, but doing so in a way that leads to better outcomes for women business owners, parents, children, and their communities.

Hannah holds an MBA from the University of Chicago Booth School of Business, and was awarded the Tarrson Fellowship for social entrepreneurs by the University of Chicago. She was also a Summer Associate at Ashoka in the Social Financial Services Department. Prior to coming to the University of Chicago, Hannah earned an MPPA from Northwestern University.

Charu Sharma – Third Place

Charu Sharma is the Founder & CEO of NextPlay.ai. While working at LinkedIn, Charu started a mentorship program for women at the company as a passion project. This eventually inspired her to start NextPlay and to create meaningful mentorship relationships, especially for women and underrepresented minorities. NextPlay's investors and advisors include 500 Startups, LinkedIn's SVP Engineering, TechCrunch's former CEO, and Microsoft's former Chief Design Officer.

Companies like Square, Lyft, Asurion, and Splunk use the NextPlay mobile app to build sticky and measurable mentorship programs. After six months of using NextPlay’s app, mentees felt that their preparedness to achieve their goals at their companies had doubled, and mentors reported that they significantly developed their critical leadership and coaching skills. Collectively, the number of employees who strongly recommended working for their companies increased by 25%.

Charu previously built two startups. She has educated one million women on how to start their own businesses through her nonprofit, books, and documentary film “Go Against the Flow.”

Samantha J. Letscher – Audience Favorite

Sam Letscher is the Co-founder and CEO of Bossy, a platform that connects feminist consumers with women-owned businesses to drive revenue to women entrepreneurs. She launched Bossy in Chicago in the spring of 2017 while pursuing her bachelor's degree in Integrated Engineering Studies at Northwestern University.

Sam is inspired by products, services, experiences, brands, and workplaces built by women, for women, and from which women profit. She is now a recent college graduate with a bachelor’s degree in human-centered design and entrepreneurship.

Sam lives in Chicago where she is building and bootstrapping Bossy while working part-time in local politics. She strives to always stay curious and optimistic.

https://ghc.anitab.org/2018-pitcher/2018-finalists/

Grace Hopper 2018 – Training generative adversarial networks: A challenge?

Today I had the pleasure of attending a very interesting workshop on generative adversarial networks. The goal of the workshop was to teach attendees about deep learning and generative adversarial networks (GANs). In the lab we used PyTorch, an open source deep learning framework, to demonstrate and explore this type of neural network architecture. The lab had two major parts: an introduction to both PyTorch and GANs, followed by text-to-image generation.

The first part of the lab began by importing torch modules, creating a simple linear transformation model, and creating a loss function to measure the difference between our model's output and the ground truth.
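
The lab itself used PyTorch, but the idea is small enough to sketch in plain Python with no dependencies. The snippet below is a stand-in and not the lab's code: a one-variable linear model, a mean squared error loss, and a few hundred steps of gradient descent. All names, the learning rate, and the toy data are assumptions made for this example. In PyTorch, the model and loss would correspond roughly to `torch.nn.Linear` and `torch.nn.MSELoss`.

```python
def predict(w, b, x):
    """Linear transformation: y = w * x + b."""
    return w * x + b


def mse_loss(w, b, xs, ys):
    """Mean squared error between the predictions and the ground truth."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, lr=0.1, steps=500):
    """Fit w and b by gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the MSE with respect to w and b.
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x + 1.0 for x in xs]  # ground truth generated by y = 2x + 1

initial_loss = mse_loss(0.0, 0.0, xs, ys)
w, b = train(xs, ys)
final_loss = mse_loss(w, b, xs, ys)
print(f"w={w:.3f}, b={b:.3f}, loss {initial_loss:.3f} -> {final_loss:.6f}")
```

The loss is what tells the optimizer how far the model is from the ground truth; each step nudges `w` and `b` in the direction that reduces it.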

Next we ran our model on a GPU! Earlier in the session we learned that GPUs work well for deep learning because they are inherently parallel.
With GPUs, training a neural network can be done in minutes.

We then began to focus more on GANs. The facilitators of the workshop shared that GANs are getting widespread attention in the deep learning community for their image generation and style transfer capabilities.
This deep learning technique uses two neural networks in an adversarial way to complete its objective.

One network is called the generator and the other the discriminator. The discriminator network is trained on a dataset made up of real data and output from the generator network, and its objective is to discriminate between the two. The generator network's objective is to fool the discriminator into classifying its output as real data. During training, the generator is updated so that its output mimics the real data and fools the discriminator.
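
The two competing objectives can be written down as loss functions. Here is a minimal sketch in plain Python, assuming the discriminator outputs a probability that its input is real and assuming the commonly used "non-saturating" generator loss. The workshop's exact loss functions are not given here, so treat the function names and formulas as one standard choice rather than the lab's implementation.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator.

    d_real: discriminator's probability that a real sample is real.
    d_fake: discriminator's probability that a generated sample is real.
    The loss is low when d_real is near 1 and d_fake is near 0.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Non-saturating generator loss (an assumed, common choice).

    The loss is low when the discriminator is fooled, i.e. d_fake near 1.
    """
    return -math.log(d_fake)


# A confident, correct discriminator: low discriminator loss, high generator loss.
print(discriminator_loss(0.9, 0.1), generator_loss(0.1))
# A fooled discriminator: the generator loss drops.
print(discriminator_loss(0.9, 0.9), generator_loss(0.9))
```

Training alternates between the two: one step lowers the discriminator loss, the next lowers the generator loss, and each improvement in one network makes the other's job harder.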

In this part of the lab, attendees were tasked to:

  • Feed data into PyTorch using NumPy
  • Create a multi-layer network
  • Configure the generator and discriminator networks
  • Learn how to update the generator network

In the second part of the lab, we built upon the popular Deep Convolutional Generative Adversarial Network (DCGAN) to enable text-to-image generation. This part of the lab was based on the paper Generative Adversarial Text to Image Synthesis by Reed et al. Captions of images were encoded and concatenated with the input noise vector before being passed to the generator. The encoded caption was then concatenated again with a feature map in the discriminator network after the fourth leaky Rectified Linear Unit (ReLU) layer. The goal of the second half of the lab was to create a text-to-image model using the GAN-CLS technique.
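
Two ideas from this part can be sketched in plain Python: conditioning the generator by concatenating the caption encoding onto the noise vector, and the matching-aware discriminator objective (called GAN-CLS in Reed et al.), in which the discriminator also sees real images paired with the wrong caption. This is a simplified sketch of that objective written as losses to minimize, not the lab's PyTorch code. The function names and example numbers are assumptions for illustration.

```python
import math


def condition_on_text(noise, text_embedding):
    """Concatenate the noise vector with the encoded caption."""
    return list(noise) + list(text_embedding)


def gan_cls_discriminator_loss(s_real_right, s_real_wrong, s_fake_right):
    """Matching-aware discriminator loss.

    s_real_right: score for a real image with its matching caption.
    s_real_wrong: score for a real image with a mismatched caption.
    s_fake_right: score for a generated image with the matching caption.
    Scores are probabilities in (0, 1). The two "should be rejected"
    cases are averaged so they carry the same weight as the real pair.
    """
    reject_term = 0.5 * (math.log(1.0 - s_real_wrong) + math.log(1.0 - s_fake_right))
    return -(math.log(s_real_right) + reject_term)


def gan_cls_generator_loss(s_fake_right):
    """Generator loss: low when the generated image passes as real and matching."""
    return -math.log(s_fake_right)


generator_input = condition_on_text([0.1, -0.2], [1.0, 0.0, 0.5])
print(generator_input)
print(gan_cls_discriminator_loss(0.9, 0.1, 0.1))
```

The mismatched-caption term is what pushes the discriminator to judge whether an image fits its text, and not only whether the image looks realistic.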

We demonstrated the capability of our model to generate plausible images of pizzas and broccoli from detailed text descriptions and captions. While this was just an exercise for learning purposes, it's clear that there are many powerful applications of this deep learning technique.