Jonathan Ronen

M.Sc. Electrical & Control Engineering
Ph.D. Computational Biology & Machine Learning
   

Berlin-based machine learning researcher and data scientist with 10+ years of experience in software engineering, computational social science, biomedical research and AI. Founder and CTO at Arcas. This is my personal website.

Research

Genomics

My research in genomics has focused on developing methods that accelerate the translation of high-end genomics platforms from the lab to the clinic.

A novel neural network architecture for multi-modal data integration

I developed a novel stacked variational autoencoder called maui to integrate data from different omics experiments, and used it to get state-of-the-art results in cancer subtyping and patient survival prediction.

Schematic of a stacked Variational Autoencoder (maui)
maui achieves state of the art performance on cancer subtyping and patient survival prediction
Read the paper

Using gene networks to increase the signal-to-noise ratio in noisy experiments

I used a random walk process on a protein-protein interaction network to smooth noisy data coming from single cell RNA sequencing experiments. The method is an alternative to commonly used imputation methods, which tend to introduce many artifacts into the data.

The netSmooth algorithm
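
For the curious, here is a minimal sketch of the idea in plain Python. It is an illustration, not the netSmooth implementation: it assumes the smoothing is a random walk with restart, and the function name `smooth_on_network`, the `alpha` and `n_iter` parameters, and the three-gene toy network are all made up for this example.

```python
def smooth_on_network(values, adjacency, alpha=0.5, n_iter=100):
    """Smooth per-gene values with a random walk with restart (illustrative).

    values:    one measurement per gene
    adjacency: square list of lists; adjacency[i][j] > 0 if genes i and j interact
    alpha:     probability of continuing the walk instead of restarting
    """
    n = len(values)
    # Column-normalise the network so each gene spreads its signal
    # evenly across its neighbours.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    smoothed = list(values)
    for _ in range(n_iter):
        # Keep a share of the original measurement (the restart) and
        # add the signal flowing in from neighbouring genes.
        smoothed = [
            (1 - alpha) * values[i]
            + alpha * sum(transition[i][j] * smoothed[j] for j in range(n))
            for i in range(n)
        ]
    return smoothed


# Toy network: three genes in a chain (0 - 1 - 2).
chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
# Gene 1 was missed in the experiment (a dropout), its neighbours were not.
observed = [1.0, 0.0, 1.0]
smoothed = smooth_on_network(observed, chain)
print(smoothed)
```

In this toy case the missing middle gene borrows signal from its two neighbours, while the total signal across the three genes stays the same.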

The netSmooth method improves identifiability of cell types from single cell RNA sequencing, but only when using real gene-gene networks (i.e., this is not a random result obtained by a lucky shuffle of the data).

The netSmooth algorithm works well - but only with real gene-gene networks

While netSmooth was developed with single cell RNA sequencing in mind, it also works well on other omics data types. For instance, it further improves patient survival prediction (using maui) when applied to sparse mutation data.

netSmooth, when used on sparse mutation data, improves patient survival prediction accuracy
Read the paper

Political Science

My political science research has focused on political participation on social media.

Social Networks and Protest Participation: Evidence from 130 Million Twitter Users

I collected a dataset representing the social network of Twitter users who attended the 2015 Charlie Hebdo protest in Paris, as well as two plausible control sets (Twitter users who had shown interest in the march but did not attend, and French Twitter users in general).

Together with my collaborators at NYU, I analyzed the large-scale network, providing the first-ever empirical evidence for social protest theory. The study was awarded the American Journal of Political Science's Best Article of 2019, as well as the 2020 American Political Science Association's Best Article in the APSA Information Technology and Politics Section.

Read the paper

Political Knowledge and Misinformation in the Era of Social Media: Evidence from the 2015 U.K. Election

In collaboration with YouGov, my co-authors and I collected information about voters' knowledge of political hot topics, using a panel survey design. We also used the survey respondents' Twitter data to measure their exposure to tweets by news media and political figures.

Using this design, we were able to see how voters' knowledge of political issues changed during the election campaign, and show that while Twitter use generally leads to increased knowledge about political issues, exposure to messages from political parties shifts voters' knowledge away from the truth, in a direction that benefits the parties' political agendas.

Read the paper

Production optimization

My M.Sc. thesis in Control Engineering tackled the problem of optimal production of oil from large gas-cap reservoirs under the gas coning phenomenon. This phenomenon happens when the gas-oil interface drifts towards the well in reservoirs with a thin layer of oil under a large gas cap. The thesis was written under joint supervision, with a co-advisor from Statoil (the Norwegian state petroleum giant).

The gas coning phenomenon in large gas cap reservoirs

Other than potentially saving Statoil a few million dollars on oil production from the Troll A field[1], this thesis has received several citations in the petroleum literature for proving the stability of the PDE model of the gas-oil (or gas-water) interface under a positive flow assumption (for the mathematically inclined, it's on page 19 of the thesis).

Open source software

maui - Multiomics autoencoder integration

Multi-omics Autoencoder Integration (maui) is a Python package for multi-omics data analysis. It is based on a Bayesian latent factor model, with inference done using artificial neural networks.

https://github.com/BIMSBbioinfo/maui

netSmooth - Network smoothing for omics data denoising

netSmooth is an R package for network smoothing of single cell RNA sequencing data. Using gene interaction networks such as protein-protein interactions as priors for gene co-expression, netSmooth improves cell type identification from noisy, sparse scRNA-seq data. The smoothing method is also suitable for other gene-based omics data sets such as proteomics, copy-number variation, etc.

https://github.com/BIMSBbioinfo/netSmooth

PiGx - Reproducible pipelines in genomics using GNU Guix

PiGx is a collection of genomics pipelines. All pipelines are easily configured with a simple sample sheet and a descriptive settings file. The result is a set of comprehensive, interactive HTML reports with interesting findings about your samples.

http://bioinformatics.mdc-berlin.de/pigx/

More software

Check out my GitHub profile!

Selected publications

Peer-reviewed articles

Book chapters

More publications

Check out my Google Scholar profile

Side projects

Some side projects I've worked on. A little bit of political activism, and some whimsy.

www.lahadam.co.il - להד"ם - Neverland

www.lahadam.co.il was a website that kept track of Israeli politicians' edits and deletions on Facebook, and showed the diffs. It was retired after the cat-and-mouse game with Facebook, who did not think this was a legitimate use of their API, became too tiresome. It got a few thousand hits per day during the campaign period of the 2015 elections, and my wife and I went on Israeli national TV to talk about some of the worst edits.

www.holderdeord.no - Are they keeping their word?

www.holderdeord.no was a Norwegian political watchdog. It downloaded roll-call votes from the Norwegian parliament's API, and compared them with political parties' manifesto pledges. I did some web development and data work as part of a large team of volunteers. The project got some national attention in the 2013 election campaign and has gathered over $50k in donations over the years. It was retired in 2020 when most of the original team no longer had time to maintain it.

Seinfeldvision - GPT-2 fine-tuned on Seinfeld scripts

I fine-tuned a GPT-2 language model (yes, that one) on Seinfeld scripts. Then, I wrapped it in a Runway model, to make it easily accessible to non-programmers. Turns out GPT-2 (the 345M parameter version) is very good at producing Seinfeld-esque dialogue, in spite of its other shortcomings!

Get the model

Retirement calculator - How soon can you retire?

Ever wonder how soon you can retire and live off the returns on your savings?

Play with the calculator

Freelance

I have been freelancing as a data scientist and applied machine learning practitioner since 2012. Some of my high-profile clients include:

To have a chat, shoot me an e-mail at yona [at] jonathanronen.com

1. I don't know if they implemented it in the end; I chose not to pursue a career in petroleum for environmental reasons.