DataPub #4: Cataloging the World - Medicine, Tweets & Beyond

Learn how our community guest speakers catalog the thousands of pharmaceutical drugs on the market and the never-ending stream of Twitter data, plus ways to explore these datasets for your projects.

At our latest DataPub event, we were lucky to have Tudor Oprea and Suhem Parack join us to share how they’re using open data to power their research and cataloging projects. One of the most exciting things about hosting this meetup is hearing about different datasets and how community members apply them to better understand the world around them.

...and this session was no exception: Tudor uses various public datasets from government agencies to catalog pharmaceutical drugs into one comprehensive database, while Suhem uses Twitter Developer Labs (and Twitter APIs) to analyze the vast amount of public information people share on an ongoing basis.

If you were able to join live, thank you! If you missed it, we have you covered: check out the recording and detailed recap below.

We hope to see you for DataPub #5 in September, where we'll focus on all things IoT 🎉.

RSVP to join us on Tuesday, September 22nd (1pm PT / 4pm ET / 8pm GMT)
If you are unable to attend, register anyway, and we’ll send you the recording and resources shortly after the session.

Guest speakers and session summary

Prashant Sridharan, Timescale’s very own open data enthusiast and VP of Marketing, kicks us off by welcoming everyone and briefly sharing Timescale’s mission: allow every developer to store, analyze, and build on top of their data. Our goal is to help teams measure what matters in their world, whether it’s IoT devices, IT systems, marketing analytics, financial metrics, or anything in between.

As an organization, we love combining TimescaleDB with public data to demonstrate the power of analytics (you can check out some of Prashant's open data work in Charting the Spread of COVID-19 and Public Dataset Tips & Tricks: How to weave together public datasets to make sense of the world).

From there, he introduces our (amazing) guest speakers.

Speaker #1 Tudor Oprea, Professor of Medicine at the University of New Mexico: “Intro to DrugCentral 2020”

Tudor Oprea is a professor of medicine at the University of New Mexico, with decades’ worth of drug discovery knowledge and data mining experience. In his session, Tudor walks us through his current project, DrugCentral 2020, a database that started as a way to answer the question “How many pharmaceutical drugs exist?” The project is funded by the Illuminating the Druggable Genome (IDG) program and now curates 4,600+ active drug ingredients, plus thousands of medicines, side effects, use cases, and more.

He starts his talk by posing a seemingly simple question, “What is a drug?”, and explains that there are two schools of thought: what patients, doctors, and consumers call “drugs” (pharmaceutical products), and what scientists call “drugs” (active pharmaceutical ingredients).

DrugCentral data structure

The DrugCentral data structure was created to help reconcile these two cultures, but mapping between the “green and blue” (see the diagram above) is an ongoing, labor-intensive process. As the team maps this information, they extract the relevant details about each drug, its drug targets, and associated diseases, in order to understand how these drugs work.
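The product-versus-ingredient distinction is essentially a many-to-many mapping, and the answer to “how many drugs exist?” depends on which side you count. Here is a minimal Python sketch, assuming a hypothetical link table; the product and ingredient names are placeholders, and this is not DrugCentral’s actual schema or data:

```python
from collections import defaultdict

# Hypothetical product -> active ingredient links (placeholder names, not real data).
# One product can contain several ingredients; one ingredient can appear in many products.
PRODUCT_INGREDIENTS = [
    ("product_a_tablet", "ingredient_x"),
    ("product_b_tablet", "ingredient_x"),
    ("product_c_syrup", "ingredient_x"),
    ("product_c_syrup", "ingredient_y"),
    ("product_d_cream", "ingredient_z"),
]


def count_drugs(links):
    """Return (distinct products, distinct active ingredients)."""
    products = {product for product, _ in links}
    ingredients = {ingredient for _, ingredient in links}
    return len(products), len(ingredients)


def products_by_ingredient(links):
    """Group products under each active ingredient (the 'scientist' view)."""
    grouped = defaultdict(set)
    for product, ingredient in links:
        grouped[ingredient].add(product)
    return dict(grouped)


if __name__ == "__main__":
    n_products, n_ingredients = count_drugs(PRODUCT_INGREDIENTS)
    print(f"{n_products} products, {n_ingredients} active ingredients")
    # -> 4 products, 3 active ingredients
```

Counting products (the patient's view) and counting active ingredients (the scientist's view) give different totals from the same links, which is why reconciling the two is the hard part.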

The DrugCentral team set out to map as many active drug ingredients as possible

Tudor then walks us through how the project uses external resources and classifications in its mapping model, as well as how the team breaks down the statuses of Active Pharmaceutical Ingredients (APIs). This is no small feat: there are ~4,600 active ingredients in total, about 1,800 of which are FDA approved and on the market. He also shares an example of how DrugCentral can help us understand the difference between over-the-counter and prescription medications, where minor differences or branding can play a major role in cost.

To wrap up, Tudor demonstrates how to use and navigate the DrugCentral database (~5 min demo), searching for a drug or agent and exploring its molecular structure, synonyms, dosage, approvals (date, company, etc.), and FDA-reported adverse side effects.

Reverse lookups work too, so you can search by side effects and/or diseases to see a list of active ingredients with X side effect(s), or ingredients known to treat Y disease.
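As a rough illustration of how a reverse lookup can work, the sketch below inverts hypothetical ingredient records into side-effect and disease indexes. It assumes a simple in-memory structure with placeholder ingredient and disease names; it is not how DrugCentral implements search.

```python
from collections import defaultdict

# Hypothetical ingredient records (placeholder names, not DrugCentral data or schema).
INGREDIENTS = {
    "ingredient_x": {"side_effects": {"nausea", "headache"}, "treats": {"disease_1"}},
    "ingredient_y": {"side_effects": {"nausea"}, "treats": {"disease_2"}},
    "ingredient_z": {"side_effects": {"rash"}, "treats": {"disease_1", "disease_3"}},
}


def build_reverse_index(records, field):
    """Invert ingredient -> values into value -> sorted list of ingredients."""
    index = defaultdict(set)
    for ingredient, attrs in records.items():
        for value in attrs[field]:
            index[value].add(ingredient)
    return {value: sorted(names) for value, names in index.items()}


def ingredients_with_all(index, values):
    """Ingredients linked to every value in `values` (e.g. several side effects)."""
    sets = [set(index.get(v, [])) for v in values]
    return sorted(set.intersection(*sets)) if sets else []


by_side_effect = build_reverse_index(INGREDIENTS, "side_effects")
by_disease = build_reverse_index(INGREDIENTS, "treats")

print(by_side_effect["nausea"])  # ['ingredient_x', 'ingredient_y']
print(by_disease["disease_1"])  # ['ingredient_x', 'ingredient_z']
print(ingredients_with_all(by_side_effect, ["nausea", "headache"]))  # ['ingredient_x']
```

Building the inverted index once makes each lookup a dictionary access, and intersecting the per-value sets answers “X and Y side effects” style queries.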

You can also download the data directly to explore and run your own analysis.

👉 DrugCentral Download

Link to Tudor’s slides

Speaker #2 Suhem Parack, Developer Relations for Academic Research at Twitter: “Getting started with Twitter Developer Labs”

Suhem is a Senior Developer Advocate at Twitter, where he focuses on helping the academic research community succeed on Twitter’s Developer Platform. Fittingly, his talk centers on Twitter Developer Labs, the new version of the Twitter API Platform. He shares what’s new with the various APIs and how they differ, then walks through code that demonstrates how to get up and running with your own Tweet sentiment analysis.

He details each API in his talk, but here’s a quick look at what’s new:

Suhem’s favorite? Context Annotations, which help you understand what a Tweet is about, including its topic (domain) and any named entities present in the text. In the past, you might have written custom code or used third-party services to handle topic modeling, but now Twitter provides this information for you.
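To show how context annotations might be consumed, here is a sketch that tallies topics across a batch of Tweets. It assumes a response shape in which each Tweet carries a `context_annotations` list of `domain`/`entity` objects with `id` and `name` fields, modeled on Twitter API v2; the payload is hand-written placeholder data, and no request is made.

```python
from collections import Counter

# Hypothetical response fragment. The field layout (context_annotations -> domain/entity,
# each with "id" and "name") is an assumption modeled on Twitter API v2 responses;
# all values are invented placeholders.
SAMPLE_TWEETS = [
    {
        "id": "1",
        "text": "placeholder tweet one",
        "context_annotations": [
            {"domain": {"id": "10", "name": "Domain A"}, "entity": {"id": "100", "name": "Entity P"}},
            {"domain": {"id": "11", "name": "Domain B"}, "entity": {"id": "101", "name": "Entity Q"}},
        ],
    },
    {
        "id": "2",
        "text": "placeholder tweet two",
        "context_annotations": [
            {"domain": {"id": "10", "name": "Domain A"}, "entity": {"id": "102", "name": "Entity R"}},
        ],
    },
    {"id": "3", "text": "placeholder tweet three"},  # not every Tweet carries annotations
]


def topic_counts(tweets):
    """Count Tweets per domain (once per Tweet) and annotations per (domain, entity) pair."""
    domains = Counter()
    entities = Counter()
    for tweet in tweets:
        seen_domains = set()
        for annotation in tweet.get("context_annotations", []):
            domain = annotation["domain"]["name"]
            entity = annotation["entity"]["name"]
            entities[(domain, entity)] += 1
            seen_domains.add(domain)
        domains.update(seen_domains)  # each domain counted once per Tweet
    return domains, entities


domains, entities = topic_counts(SAMPLE_TWEETS)
print(domains.most_common())
```

Using `.get(..., [])` keeps the tally working for Tweets that arrive without annotations.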

From there, Suhem breaks down all of the APIs available in Twitter Developer Labs and what they allow developers to accomplish, including:

Sample Stream (v1): stream about 1% of all new public Tweets as they happen.
Filtered Stream (v1): filter the real-time stream of public Tweets by applying a set of rules, and select the response format.
Recent Search (v2): query only public Tweets posted in the previous seven days (results are delivered in reverse-chronological order, starting with the most recent Tweets).
Hide Replies (v2): programmatically hide replies to Tweets, using criteria you define.
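The endpoints above do their filtering server-side. The local sketch below only mimics the ideas, under clearly simplified assumptions: a hypothetical “rule” is a set of keywords that must all appear in a Tweet (real Filtered Stream rules use Twitter’s own operator syntax), and the recent-search helper keeps Tweets from the last seven days, newest first. It does not call any Twitter API.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical, simplified stand-ins for what the endpoints do server-side.
# Here a "rule" is just a set of keywords that must all appear in the Tweet text.
RULES = {"data-rule": {"open", "data"}, "iot-rule": {"iot"}}


def matching_rules(text, rules):
    """Return the tags of every rule whose keywords all appear in the text."""
    words = set(text.lower().split())
    return sorted(tag for tag, keywords in rules.items() if keywords <= words)


def recent_search(tweets, now, days=7):
    """Keep Tweets from the last `days` days, newest first (reverse-chronological)."""
    cutoff = now - timedelta(days=days)
    recent = [t for t in tweets if t["created_at"] >= cutoff]
    return sorted(recent, key=lambda t: t["created_at"], reverse=True)


now = datetime(2020, 8, 1, 12, 0, tzinfo=timezone.utc)
tweets = [
    {"id": "a", "text": "Open data is fun", "created_at": now - timedelta(days=1)},
    {"id": "b", "text": "IoT sensors everywhere", "created_at": now - timedelta(hours=2)},
    {"id": "c", "text": "Old open data news", "created_at": now - timedelta(days=10)},
]

for t in recent_search(tweets, now):
    print(t["id"], matching_rules(t["text"], RULES))
# b ['iot-rule']
# a ['data-rule']
```

Tweet "c" matches the data rule but falls outside the seven-day window, so the search drops it before any rule is checked.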

...and, to bring it all together, Suhem takes us through a code demo, where we follow along as he registers for access and uses the APIs to query Tweets and analyze their sentiment (his demo uses Python, but the steps should apply to any language).
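Suhem’s walkthrough has its own implementation; as a stand-in, here is a minimal lexicon-based scorer that shows the basic idea of Tweet sentiment scoring. The word lists are hypothetical and deliberately tiny; a real analysis would typically rely on an established sentiment library or model.

```python
import re

# Tiny hypothetical lexicon for illustration only; a real analysis would use an
# established sentiment library or model rather than a hand-written word list.
POSITIVE = {"love", "great", "good", "happy", "amazing"}
NEGATIVE = {"hate", "bad", "awful", "sad", "terrible"}


def sentiment_score(text):
    """Return (#positive words - #negative words) / #words, a value in [-1, 1]."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def label(score, threshold=0.05):
    """Map a score to a coarse positive / negative / neutral label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


for text in ["I love this great meetup", "What an awful, sad day", "Tweets about databases"]:
    score = sentiment_score(text)
    print(f"{label(score):8} {score:+.2f}  {text}")
```

Normalizing by word count keeps long and short Tweets on the same scale; the threshold decides how strong a signal must be before a Tweet is labeled anything other than neutral.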

If you have feedback or need help along the way, you can reach out to Suhem directly (@suhemparack) or the Twitter Developer team (@twitterdev).

Explore Twitter Developer Labs for yourself 👉 Intro to Twitter Developer Labs

Download sample code from GitHub 👉 Twitter Developer Labs Sample Code

Link to Suhem’s slides

In closing

Thank you to everyone who helped make this meetup a success, from our speakers Tudor Oprea and Suhem Parack, to all the people who registered (and/or attended live), and the Timescale team members who played a role in bringing DataPub to life.

Our next meetup is on Tuesday, September 22nd (1pm PT / 4pm ET / 8pm GMT).

And, for this one, we'll focus on IoT use cases and projects.

Join us if you’d like to learn about IoT applications and architectures, are already part of the community, or are simply looking for a way to connect with new technical folks - everyone’s welcome and the more the merrier!

RSVP here.

See you in September!

If you have an open data or IoT project that you'd like to share with the rest of the community, please reach out to events@timescale.com.