Research
Genomics
My research in genomics has focused on methods development to accelerate the translation of high-end genomics platforms from the lab to the clinic.
A novel neural network architecture for multi-modal data integration
I developed a novel Stacked Variational Autoencoder called maui to integrate data from different omics experiments, and used it to get state-of-the-art results in cancer sub-typing and patient survival prediction.
Read the paper
Using gene networks to increase the signal-to-noise ratio in noisy experiments
I used a random walk process on a protein-protein interaction network to smooth noisy data coming from single cell RNA sequencing experiments. The method is an alternative to commonly used imputation methods, which tend to introduce many artifacts into the data.
The netSmooth method improves identifiability of cell types from single cell RNA sequencing, but only when using real gene-gene networks (i.e. this is not a random result obtained by a lucky shuffle of the data).
While netSmooth was developed with single cell RNA sequencing in mind, it also works well on other omics data types. For instance, it further improves patient survival prediction (using maui) when applied to sparse mutation data.
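For the curious, the random walk idea can be sketched in a few lines of plain Python. This is only a minimal illustration under my own assumptions, not the netSmooth implementation (which is an R package): the function name `network_smooth`, the restart parameter `alpha`, the self-loop for genes without neighbours, and the toy network are all made up for the example.

```python
def network_smooth(expression, adjacency, alpha=0.5, n_iter=100):
    """Smooth a genes x cells matrix by a random walk with restart on a gene network.

    expression: list of rows, one per gene, each a list of per-cell values.
    adjacency:  symmetric genes x genes matrix of edge weights (0 = no edge).
    alpha:      probability of taking another step instead of restarting (0 <= alpha < 1).
    """
    n_genes = len(adjacency)
    n_cells = len(expression[0])

    # Column-normalise the adjacency matrix into transition probabilities.
    # Assumption: genes without neighbours get a self-loop, so their signal stays put.
    transition = [[float(v) for v in row] for row in adjacency]
    for j in range(n_genes):
        col_sum = sum(transition[i][j] for i in range(n_genes))
        if col_sum == 0:
            transition[j][j] = 1.0
            col_sum = 1.0
        for i in range(n_genes):
            transition[i][j] /= col_sum

    # Iterate F <- alpha * W F + (1 - alpha) * E, which converges to
    # (1 - alpha) * (I - alpha * W)^-1 E.
    smoothed = [[float(v) for v in row] for row in expression]
    for _ in range(n_iter):
        smoothed = [
            [
                alpha * sum(transition[i][k] * smoothed[k][c] for k in range(n_genes))
                + (1 - alpha) * expression[i][c]
                for c in range(n_cells)
            ]
            for i in range(n_genes)
        ]
    return smoothed


# Toy example: genes 0 and 1 interact, gene 2 has no known partners.
adjacency = [[0, 1, 0],
             [1, 0, 0],
             [0, 0, 0]]
expression = [[4.0, 0.0],   # gene 0: detected in cell 0 only
              [0.0, 0.0],   # gene 1: dropped out everywhere
              [1.0, 1.0]]   # gene 2
smoothed = network_smooth(expression, adjacency, alpha=0.5)
```

In the toy example, gene 1 picks up part of gene 0's signal in cell 0, the isolated gene 2 is left unchanged, and the total signal per cell is preserved. A larger `alpha` lets the signal diffuse further through the network.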
Read the paper
Political Science
My political science research has focused on political participation on social media.
Social Networks and Protest Participation: Evidence from 130 Million Twitter Users
I collected a dataset representing the social network of Twitter users who attended the 2015 Charlie Hebdo protest in Paris, as well as two plausible control sets (Twitter users who had shown interest in the march but did not attend, and French Twitter users in general).
Together with my collaborators at NYU, I analyzed the large-scale network, providing the first ever empirical evidence for social protest theory. The study was awarded the American Journal of Political Science's Best Article of 2019, as well as the 2020 American Political Science Association's Best Article in the APSA Information Technology and Politics Section.
Read the paper
Political Knowledge and Misinformation in the Era of Social Media: Evidence from the 2015 U.K. Election
In collaboration with YouGov, my co-authors and I collected information about voters' knowledge of political hot topics, using a panel survey design. We also used the survey respondents' Twitter data to measure their exposure to tweets by news media and political figures.
Using this design, we were able to see how voters' knowledge of political issues changed during the election campaign, and show that while Twitter use generally leads to increased knowledge about political issues, exposure to messages from political parties shifts voters' knowledge away from the truth, in a direction that benefits the parties' political agendas.
Read the paper
Production optimization
My M.Sc. thesis in Control Engineering tackled the problem of optimal production of oil from large gas-cap reservoirs under the gas coning phenomenon. This phenomenon happens when the gas-oil interface drifts towards the well in reservoirs with a thin layer of oil under a large gas cap. It was written under joint supervision with a co-advisor from Statoil (the Norwegian state petroleum giant).
Other than potentially saving Statoil a few million dollars on oil production from the Troll A field[1], this thesis has received several citations in the petroleum literature for proving the stability of the PDE model of the gas-oil (or gas-water) interface, under a positive flow assumption (for the math inclined, it's on page 19 of the thesis).
Open source software
maui - Multiomics autoencoder integration
Multi-omics Autoencoder Integration (maui) is a Python package for multi-omics data analysis. It is based on a Bayesian latent factor model, with inference done using artificial neural networks.
https://github.com/BIMSBbioinfo/maui
netSmooth - Network smoothing for omics data denoising
netSmooth is an R package for network smoothing of single cell RNA sequencing data. Using gene interaction networks such as protein-protein interactions as priors for gene co-expression, netSmooth improves cell type identification from noisy, sparse scRNA-seq data. The smoothing method is suitable for other gene-based omics data sets such as proteomics, copy-number variation, etc.
https://github.com/BIMSBbioinfo/netSmooth
PiGx - Reproducible pipelines in genomics using GNU Guix
PiGx is a collection of genomics pipelines. All pipelines are easily configured with a simple sample sheet and a descriptive settings file. The result is a set of comprehensive, interactive HTML reports with interesting findings about your samples.
http://bioinformatics.mdc-berlin.de/pigx/
More software
Check out my GitHub profile!
Selected publications
Peer-reviewed articles
- Evaluation of colorectal cancer subtypes and cell lines using deep learning. Life Science Alliance, 2(6), 2019
- netSmooth: Network-smoothing based imputation for single cell RNA-seq. F1000Research, 7, 2018
- Social networks and protest participation: Evidence from 130 million Twitter users. American Journal of Political Science, 63(3), pp.690-705, 2019
- PiGx: reproducible genomics analysis pipelines with GNU Guix. GigaScience, 7(12), p.giy123, 2018
- Functional interplay of Epstein-Barr virus oncoproteins in a mouse model of B cell lymphomagenesis. Proceedings of the National Academy of Sciences, 2020
- Munger, K., Egan, P., Nagler, J., Ronen, J. and Tucker, J.A., 2016. Learning (and unlearning) from the media and political parties: Evidence from the 2015 UK election. Forthcoming in BJPS.
Book chapters
- Multi-omics analysis, chapter in Computational Genomics with R, Altuna Akalin, 2020.
More publications
Check out my Google Scholar profile
Side projects
Some side-projects I've worked on. A little bit of political activism, and some whimsy.
www.lahadam.co.il - להד"ם - Neverland
www.lahadam.co.il was a website that kept track of Israeli politicians' edits and deletions on Facebook, and showed the diffs. It was retired after the cat-and-mouse game with Facebook, who did not think this was a legitimate use of their API, became too tiresome. It got a few thousand hits per day in the campaign period of the 2015 elections, and my wife and I went on Israeli national TV to talk about some of the worst edits.
www.holderdeord.no - Are they keeping their word?
www.holderdeord.no was a Norwegian political watchdog. It downloaded roll-call votes from the Norwegian parliament's API, and compared them with political parties' manifesto pledges. I did some web development and data work as part of a large team of volunteers. The project got some national attention in the 2013 election campaign and has gathered over $50k in donations over the years. It was retired in 2020 when most of the original team didn't have time to maintain it any longer.
Seinfeldvision - GPT-2 fine-tuned on Seinfeld scripts
I fine-tuned a GPT-2 language model (yes, that one) on Seinfeld scripts. Then, I wrapped it in a Runway model, to make it easily accessible to non-programmers. Turns out GPT-2 (the 345M parameter version) is very good at producing Seinfeld-esque dialogue, in spite of its other shortcomings!
Retirement calculator - How soon can you retire?
Ever wonder how soon you can retire and live off the returns on your savings?
Freelance
I have been freelancing as a data scientist and applied machine learning practitioner since 2012. Some of my high-profile clients include:
To have a chat, shoot me an e-mail at yona [at] jonathanronen.com
1. I don't know if they implemented it in the end; I chose not to pursue a career in petroleum for environmental reasons.