About

About Me

In a small but important part of my professional life, I work with data: building models, running analyses, and trying to communicate results clearly.

I am concerned about the reproducibility crisis affecting published medical research: the implausibly large effect sizes, the curious clustering of p-values just below 0.05, the opaque methods, and the undisclosed data sets. In short, I believe we live in a system of bad incentives for career advancement that has directly caused a crisis of reliability, integrity, and credibility in biomedical research.

About This Blog

This blog is a space for documenting statistical methods and techniques I find useful and worth remembering. Topics include cookbook recipes, data visualization, and reproducible research workflows: things that come up in actual work, plus toy examples when they help. Packages I like, such as ggplot2, rms, boot, and bbmle, will get heavy play. I have always struggled with some of the vectorization aspects of R, such as the apply family, so I am very keen to learn the tidyverse dialect.

I am not a professional statistician though, and this blog is called “starting in stat[istics] [with] R” precisely because I am, eternally, a student of data science.

The first version of this repository was hosted on blogger.com from 2013 to around 2019, then moved to the current hosted site (vaszar.org) as a WordPress blog, which went through two major versions. The WP environment was unnecessarily complex and far too maintenance-intensive for such a simple blog. That led me to explore Hugo and Jekyll, both of which I rejected before settling on a static HTML site.

Posts are written as R Markdown documents: code, output, and prose together. Each post is rendered (“knitted”) in RStudio, then a small R build script assembles the index, archive sidebar, and tag pages. The design is deliberately plain: no JavaScript framework, no SQL database, no server-side processing, no fancy themes. Just static files uploaded to my hosted domain. The blog’s appearance is governed by a single, simple CSS file. Its initial color scheme followed Ethan Schoonover’s Solarized Light, but it has since moved to the standard Posit RStudio output; I am keeping the reference to Solarized in case I go back to it.
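To give a flavor of what such a build step can look like, here is a minimal sketch of the index-assembly part only. It is not the actual build script: the function name build_index, the flat directory of knitted .html files, and the bare-bones page markup are all assumptions made for illustration.

```r
# Hypothetical sketch: write a plain index.html that links to every
# knitted post found in one directory. Names and layout are assumed.
build_index <- function(post_dir,
                        out_file = file.path(post_dir, "index.html")) {
  # All rendered posts, excluding the index page itself
  posts <- list.files(post_dir, pattern = "\\.html$")
  posts <- setdiff(posts, basename(out_file))

  # One list item per post; the link text is the file name without extension
  links <- sprintf('<li><a href="%s">%s</a></li>',
                   posts, tools::file_path_sans_ext(posts))

  html <- c("<!DOCTYPE html>",
            "<html><body>",
            "<h1>Posts</h1>",
            "<ul>", links, "</ul>",
            "</body></html>")

  writeLines(html, out_file)
  invisible(out_file)
}
```

Calling build_index("path/to/knitted/posts") would then write index.html next to the posts. An archive sidebar and tag pages would need post metadata (dates, tags), which this sketch deliberately leaves out.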

Note: I am transferring, refreshing, and uploading older (pre-2026) posts from WordPress, assigning them arbitrary dates so they don’t all cluster in April 2026. Newer posts, and the re-dos of older ones, benefit heavily from AI systems like Claude and Grok; the direction is set by the human, the execution is by AI.

Why R Markdown

R Markdown keeps code, writing, and output in the same document. When the analysis changes, re-knitting the file updates both the prose and the output — no copying results between programs, no version drift.

A small example: generating a sequence and summarizing it.

x <- seq(1, 100, by = 5)
summary(x)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   24.75   48.50   48.50   72.25   96.00

The output above is produced live each time the document is knitted, so it always reflects the actual code.

Another feature of R Markdown is that it seamlessly integrates math notation and images. For example, here is the formula for the two-parameter Weibull PDF:

\[f(x; k, \lambda) = \frac{k}{\lambda} \left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}, \quad x \geq 0\] where:

| Parameter | Symbol | Role |
|-----------|--------|------|
| Shape | \(k > 0\) | Controls the shape of the distribution |
| Scale | \(\lambda > 0\) | Stretches or compresses the distribution along the x-axis |
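The formula can also be checked numerically. The sketch below codes the density directly from the expression above and compares it with base R's dweibull(), whose shape and scale arguments correspond to \(k\) and \(\lambda\). The helper name weibull_pdf and the test values are chosen only for illustration.

```r
# Two-parameter Weibull PDF coded directly from the formula (valid for x >= 0).
# weibull_pdf is an illustrative helper, not a function from any package.
weibull_pdf <- function(x, k, lambda) {
  (k / lambda) * (x / lambda)^(k - 1) * exp(-(x / lambda)^k)
}

x      <- c(0.5, 1, 2, 4)  # arbitrary evaluation points
k      <- 1.5              # shape
lambda <- 2                # scale

manual  <- weibull_pdf(x, k, lambda)
builtin <- dweibull(x, shape = k, scale = lambda)

# Compare the hand-coded density with the built-in one
isTRUE(all.equal(manual, builtin))
```

If the two agree to within numerical tolerance, the last line evaluates to TRUE.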

In summary, this blog consists of anything that strikes my fancy, presented simply. Less is more.