Module 4 — Weeks 9-10
Stop making notebooks that live on your laptop. Build tools other scientists can actually use — dashboards, apps, and LLM-powered data extraction. This is what scientific data leads DO.
GitHub repo: drug-safety-dashboard
What to build: A Streamlit dashboard that:
Write a Python function that queries the FDA FAERS API for adverse events associated with a given drug name. Parse the JSON response to extract: reaction names, outcome counts, seriousness level, and reporting quarter. Return as a clean DataFrame. Handle pagination for drugs with many reports.Build a Streamlit dashboard with: (1) search bar for drug name, (2) bar chart of top 20 adverse events by frequency, (3) treemap showing events organized by MedDRA organ system class, (4) comparison tab that overlays AE profiles of two drugs side by side using plotly. Use st.tabs for organization. Cache API calls with @st.cache_data.Create a drug clustering analysis: build a feature matrix (rows = drugs, columns = adverse event types, values = normalized frequency). Run UMAP + KMeans(k=5). Visualize with plotly scatter, color by cluster, hover shows drug name. Add a sidebar to adjust k. Do drugs in the same therapeutic class cluster together?GitHub repo: biopaper-mining-tool
What to build:
Write a function using the NCBI E-utilities API that searches PubMed for a query string and returns the top N abstracts with PubMed IDs, titles, authors, publication date, journal, and full abstract text as a DataFrame. Include rate limiting (max 3 requests/second per NCBI guidelines).Using the Anthropic Python SDK, write a function that sends a PubMed abstract to Claude and extracts structured data. The system prompt should instruct Claude to return JSON with these fields: compound_name, modality (one of: small_molecule, antibody, peptide, cell_therapy, gene_therapy, other), target_gene, model_system, species, key_finding, toxicity_mentioned (bool), doses_tested. Handle cases where information isn't stated (return null, not a guess). Use Claude Haiku for cost efficiency since we're processing many abstracts.Build a Streamlit app for this literature mining tool. Layout: (1) sidebar with PubMed search query input, number of papers slider (10-100), and a "Mine Papers" button with a progress bar, (2) main area with tabs: "Results Table" (sortable, filterable DataFrame), "Analytics" (bar charts of top targets, compound types, model systems), "Download" (CSV export button). Cache results so re-running doesn't re-fetch.Startup potential: Automated literature mining + structured databases is a real market need. Pharma companies spend massive resources on manual literature review. A tool that reliably extracts and structures data from thousands of papers is genuinely valuable. Deploy this on Streamlit Cloud and link from your portfolio.
You should have 2 deployed Streamlit apps after this module.