Install Hugo

Hugo is a static site generator.

A great tutorial in German

  • Install Hugo

brew install hugo

hugo new site blog

  • Install a theme, e.g. cocoa

git clone https://github.com/nishanths/cocoa-hugo-theme.git themes/cocoa

  • Activate the theme

hugo -t cocoa

  • Change the date format depending on the theme. For cocoa, look up the expected date format parameters in the theme's config file and set them under [params] in your own config.toml:

[params]

dateform = "Jan 2, 2006"

dateformfull = "Mon Jan 2 2006 15:04:05 MST"

  • Upload the generated files (the public/ directory) to a server or GitHub

Machine Learning in Economics

“How will machine learning impact economics?” - Quora, Susan Athey

in the longer run econometricians will modify the methods and tailor them so that they meet the needs of social scientists primarily interested in conducting inference about causal effects and estimating the impact of counterfactual policies

  • ML is a broad term; she uses it in a narrow sense

2 branches in ML

1. supervised learning

  • features/covariates/x to predict outcome y
  • methods: LASSO, random forest, regression trees, support vector machines
    • common feature of these off-the-shelf ML methods: cross-validation to select model complexity
    • the data are split into training and test sets
    • the model is repeatedly estimated on one part of the data and tested on another part
    • goal: find the “complexity penalty term” that fits the data best in terms of mean-squared error of the prediction (the squared difference between the model prediction and the actual outcome)
  • cross-sectional econometrics (traditional): one model specified, robustness checks by looking at 2 or 3 alternatives
  • but econometrics is not only about good prediction
    • prediction != causal effect
    • sometimes goodness-of-fit is reduced in order to estimate causal effect, e.g. changing prices
    • “Techniques like instrumental variables seek to use only some of the information that is in the data – the “clean” or “exogenous” or “experiment-like” variation in price—sacrificing predictive accuracy in the current environment to learn about a more fundamental relationship that will help make decisions about changing price. This type of model has not received almost any attention in ML.”
  • her research: take ML methods and apply them to causal inference
    • required: change the objective function, because the “ground truth of causal parameter not observed in any test set”

Statistical theory plays a bigger role, since we need a model of the unobserved thing we want to estimate (the causal effect) in order to define the target that the algorithms optimize for.

  • she is working on statistical theory for these methods, e.g. for random forests
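The cross-validation idea from the supervised-learning notes above can be sketched in base R. This is a minimal illustration, not Athey's method: the data are simulated, ridge regression stands in for the penalized methods named above (LASSO etc.), and the names ridge_fit, cv_mse and lambdas are made up for this example.

```r
# Hypothetical simulated data: 5 covariates, only the first two affect y
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 5), n, 5)
y <- 2 * x[, 1] - x[, 2] + rnorm(n)

# Ridge estimate for a given complexity penalty lambda (no intercept)
ridge_fit <- function(x, y, lambda) {
  solve(crossprod(x) + lambda * diag(ncol(x)), crossprod(x, y))
}

# Assign every observation to one of k folds, once, so all penalties
# are compared on the same splits
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# Estimate on k-1 folds, compute mean-squared error on the held-out fold
cv_mse <- function(lambda) {
  errs <- sapply(seq_len(k), function(i) {
    test <- fold == i
    beta <- ridge_fit(x[!test, , drop = FALSE], y[!test], lambda)
    mean((y[test] - x[test, , drop = FALSE] %*% beta)^2)
  })
  mean(errs)
}

# Try several penalties and keep the one with the lowest CV error
lambdas <- c(0, 0.1, 1, 10, 100, 1000)
mse <- sapply(lambdas, cv_mse)
best_lambda <- lambdas[which.min(mse)]
print(data.frame(lambda = lambdas, cv_mse = mse))
```

The penalty is chosen purely by out-of-sample prediction error, which is exactly why this procedure says nothing about causal effects.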

2. unsupervised learning

  • no outcome variable y
  • find clusters of similar objects
  • she did: find clusters of news articles on a similar topic

They are commonly used to group images or videos; if you say a computer scientist discovered cats on YouTube, it can mean that they used an unsupervised ML method to find a set of similar videos, and when you watch them, a human can see that all the videos in cluster 1572 are about cats, while all the videos in cluster 423 are about dogs. I see these tools as being very useful as an intermediate step in empirical work, as a data-driven way to find similar articles, reviews, products, user histories, etc.
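A minimal sketch of such clustering, using k-means from base R on simulated data. The two groups and the feature matrix are invented for illustration; real applications (articles, videos) would first need numeric features.

```r
# Hypothetical data: two groups of 50 points each in two dimensions
set.seed(2)
group_a <- matrix(rnorm(100, mean = 0), ncol = 2)
group_b <- matrix(rnorm(100, mean = 5), ncol = 2)
features <- rbind(group_a, group_b)

# No outcome variable y: k-means only sees the features
km <- kmeans(features, centers = 2, nstart = 10)

# Cluster sizes; the cluster numbers themselves carry no meaning,
# a human still has to look at each cluster to interpret it
print(table(km$cluster))
```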

Susan Athey on how economists can use machine learning to improve policy

“Supervised machine learning is basically about prediction”

“You have some features (x’s) and you try to predict (y’s). The innovations from machine learning have been to find really effective ways to do that, especially in an environment when there are lots of x’s and you don’t have a theory for exactly how the x’s should predict the y’s.”

ML: systematic way of selecting which factors matter and why

Applied to Policy problems

  • who should be detained in jail pending trial, and who should be released on bail?

What ML can’t do

But these techniques currently do not uncover how changing one factor affects others, which is at the heart of evaluating policy impacts.
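The gap between prediction and causal effect can be illustrated with a small simulation. This is a sketch under invented assumptions: prices respond to an unobserved demand shock, a cost shifter serves as the instrument, and the true price effect is set to -2. The second stage is run by hand with lm, so its standard errors should not be trusted.

```r
set.seed(42)
n <- 5000
cost   <- rnorm(n)   # instrument: moves price, unrelated to the demand shock
demand <- rnorm(n)   # unobserved demand shock
price  <- 1 + 0.8 * cost + 0.6 * demand + rnorm(n, sd = 0.5)
quantity <- 10 - 2 * price + 3 * demand + rnorm(n)   # true causal effect: -2

# Plain regression: useful for prediction, but the price coefficient is
# biased because price is correlated with the unobserved shock
ols <- coef(lm(quantity ~ price))[["price"]]

# Two-stage least squares by hand: use only the variation in price
# that comes from the cost shifter
price_hat <- fitted(lm(price ~ cost))
iv <- coef(lm(quantity ~ price_hat))[["price_hat"]]

print(c(ols = ols, iv = iv))
```

In this setup the instrumental-variables estimate lands close to -2, while the plain regression coefficient is pulled toward zero.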

Athey’s research

  • modify supervised ML so that it can answer this question: if I change one factor, what happens?

  • ensure the effects are valid and not a product of chance

“In some sense, these are methods that could open up the ability to discover what is in the data without the risk that you are going to end up with a false discovery”

Arguments against ML in Econometrics

link

  • econometricians want to explain observed phenomena

  • some ML techniques (neural networks, SVMs, ensembles) have a difficult time quantifying the impact of one variable on the observed phenomena

  • philosophical difference: econometricians start with theory, ML starts with data

  • time series have LOTS of problems that ML does not handle automatically and that need manual work:

    • Controlling multicollinearity (controlling VIF)
    • Finding co-integrated series
    • Controlling spurious relations
    • Controlling over-fitting
    • Controlling and solving autocorrelation problem
    • Controlling and solving heteroscedasticity problem
  • ML about prediction/optimal path; econometrics about causation

    • ML could help in forecasting

Import data from clipboard in RStudio

Copy your data to the clipboard, then run the snippet below. pbpaste is the macOS clipboard command, so this works on macOS only.

df <- read.table(pipe("pbpaste"), sep = "\t", header = TRUE)
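For other platforms, a wrapper along these lines might work. It is a sketch: read_clipboard is a made-up helper name, the Linux branch assumes the xclip tool is installed, and the Windows branch relies on R's special "clipboard" file name.

```r
# Hypothetical helper: read tab-separated clipboard contents on any platform
read_clipboard <- function(...) {
  os <- Sys.info()[["sysname"]]
  con <- switch(os,
    Darwin  = pipe("pbpaste"),                        # macOS
    Windows = "clipboard",                            # handled by R itself
    Linux   = pipe("xclip -selection clipboard -o"),  # assumes xclip is installed
    stop("unsupported platform: ", os)
  )
  read.table(con, sep = "\t", header = TRUE, ...)
}
```

Usage after copying a range from a spreadsheet: df <- read_clipboard()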

Upload dataframe to Google Spreadsheets in R

library(googlesheets)

# dataframe
df <- data.frame(x = c(1,2), y=c(2,3))

# access googlesheets
all_my_sheets_in_drive <- gs_ls()

# access spreadsheet
gs <- gs_title("my spreadsheet")

# create worksheet
gs_ws <- gs_ws_new(gs, ws_title = "new worksheet")

# upload dataframe to worksheet "new worksheet"
gs_edit_cells(gs, ws="new worksheet", input = df, trim = TRUE)

Delete n rows at top or bottom of a dataframe in R

Example 1: Delete the first 20 rows of a dataframe

tail(d, -20)

Example 2: Delete the last 20 rows of a dataframe

head(d, -20)
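A negative n tells head and tail to return everything except that many rows. A small check on a toy dataframe (d and n are made up here):

```r
d <- data.frame(id = 1:10)
n <- 3

# everything except the first n rows
without_first <- tail(d, -n)

# everything except the last n rows
without_last <- head(d, -n)

print(without_first$id)
print(without_last$id)
```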

Search/Filter with grepl with multiple patterns

Goal: Filter a dataframe for the rows whose strings contain any of the patterns stored in another dataframe.

# dataframe that contains strings
d1 <- c("halloauch", "hieronymus", "grüßdich", "hello", "hi", "hallo")
d1 <- data.frame(d1)
# dataframe that contains patterns
d2 <- c("hi", "hallo")
d2 <- data.frame(d2)

grepl can’t take a dataframe as its pattern. It needs a single pattern string, with the alternatives joined by the regex or-operator |.

Convert the patterns to a vector:

d2 <- as.vector(as.matrix(d2))

Combine the patterns into a single string:

d2 <- paste(d2, collapse = "|")

Check which strings contain one of the patterns and keep only those rows:

library(dplyr)
d1 %>% filter(grepl(d2, d1))
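The same idea in base R, as a self-contained sketch with plain vectors. The variable names strings, patterns and matches are chosen for this example only.

```r
strings  <- c("halloauch", "hieronymus", "hello", "hi", "hallo")
patterns <- c("hi", "hallo")

# join the patterns with the regex or-operator: "hi|hallo"
pattern <- paste(patterns, collapse = "|")

# keep every string that contains at least one of the patterns
matches <- strings[grepl(pattern, strings)]
print(matches)
```

Because the patterns are treated as regular expressions, patterns containing characters such as . or ( would need escaping first.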

Start x-axis and y-axis at 0

library(ggplot2) # the pipe below also needs dplyr or magrittr loaded
d %>% ggplot(aes(x = x, y = y)) + geom_line() + expand_limits(x = 0, y = 0)

Create Rank Variable with dplyr

library(dplyr)

df <- df %>% mutate(rank = dense_rank(desc(variable_to_be_ranked)))
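What dense_rank(desc(...)) produces can be seen with a base R stand-in: the largest value gets rank 1, ties share a rank, and no ranks are skipped. This sketch does not call dplyr; the factor trick is a substitute for illustration, and missing values may be treated differently.

```r
scores <- c(10, 30, 20, 30)

# negate so the largest score sorts first, then number the distinct values
rank_desc <- as.integer(factor(-scores))
print(rank_desc)
```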

Get min and max values with group_by in dplyr

library(dplyr)

# rows with the maximum value per group
df %>% group_by(group_variable) %>% filter(value == max(value))

# rows with the minimum value per group
df %>% group_by(group_variable) %>% filter(value == min(value))

via stackoverflow
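Without dplyr, the same group-wise filter can be sketched with ave from base R, which repeats each group's summary value on every row of that group. The toy dataframe is made up for this example.

```r
df <- data.frame(
  group_variable = c("a", "a", "b", "b"),
  value          = c(1, 5, 3, 2)
)

# per-row copy of each group's maximum and minimum
group_max <- ave(df$value, df$group_variable, FUN = max)
group_min <- ave(df$value, df$group_variable, FUN = min)

# keep the rows that hold the group maximum / minimum
max_rows <- df[df$value == group_max, ]
min_rows <- df[df$value == group_min, ]
print(max_rows)
print(min_rows)
```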

New column based on string pattern

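Check whether a column holds one of a set of strings and write a label into a new column for the matching rows. Note that %in% matches whole values exactly; to match substrings, use grepl instead.

df$new_column[df$column_with_pattern %in% c("abc", "def")] <- "string in new column"

# alternative with pattern vector
patterns <- c("abc", "def")

df$new_column[df$column_with_pattern %in% patterns] <- "string in new column"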
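The difference between exact matching with %in% and substring matching with grepl shows up on a toy dataframe. The column names text, exact and partial are invented for this sketch.

```r
df <- data.frame(text = c("abc", "abcdef", "xyz", "def"),
                 stringsAsFactors = FALSE)

# %in% only matches values that are exactly "abc" or "def"
df$exact <- NA_character_
df$exact[df$text %in% c("abc", "def")] <- "match"

# grepl matches any value that contains "abc" or "def"
df$partial <- NA_character_
df$partial[grepl("abc|def", df$text)] <- "match"

print(df)
```

Here "abcdef" is labelled only in the partial column, since it contains a pattern without being equal to one.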