class: center, middle, inverse, title-slide

.title[
# Policy Gradient Methods
]
.author[
### Lars Relund Nielsen
]

---
layout: true

<!-- Templates -->
<!-- .pull-left[] .pull-right[] -->
<!-- knitr::include_graphics("img/bandit.png") -->
<!-- .left-column-wide[] .right-column-small[] -->

---

## Learning outcomes

- Identify why policy gradient (PG) methods differ from value-based methods.
- Explain why differentiable, parameterized policies are needed for PG algorithms.
- Describe the softmax policy parameterization using action preferences.
- Understand the structure and meaning of the PG theorem.
- Explain the REINFORCE algorithm and understand why it is an unbiased MC estimator.
- Explain how baselines reduce variance without altering the expected gradient.
- Understand the conceptual and mathematical foundations of actor–critic methods.
- Understand how the TD error provides a lower-variance advantage signal for the actor.
- Explain how PG methods extend to continuing tasks via average reward.
- Understand how to parametrize policies for continuous action spaces.
- Recognize how mixed discrete–continuous action spaces can be handled.

---

# Policy Gradient Methods

- Up to this point we have approximated value functions.
- The best policy could then be found by selecting the action with the highest estimate.
- The policy used is derived from the estimates and hence depends on them.
- Now we focus on directly learning a parametrized policy `\(\pi(a|s, \theta)\)`.
- Actions can be selected without referring to a value function.
- The objective is to learn the policy that maximizes a performance measure `\(J(\theta)\)`.
- These methods are known as *policy gradient methods*.
- The value function may still be employed to assist in learning the policy parameters.
- If the method also learns a value function approximation, it is referred to as an *actor-critic* method.
- The actor is the agent that acts.
The critic is the one who criticises or evaluates the actor's performance by estimating the value function.

---

## Policy Approximation

- Let the policy be differentiable with respect to `\(\theta\)`:
`$$\pi(a|s, \theta) = \Pr(A_t = a|S_t = s, \theta_t = \theta).$$`
- In practice, to ensure exploration, `\(\pi(a|s,\theta) \in (0, 1)\)` for all `\(s, a\)`.
- Updates follow a *stochastic gradient-ascent* rule:
`$$\theta_{t+1} = \theta_t + \alpha \nabla J(\theta_t)$$`
- For *discrete actions*, we use a softmax function (*soft-max in action preferences*):
`$$\pi(a|s,\theta) = \frac{e^{h(s,a,\theta)}}{\sum_b e^{h(s,b,\theta)}},$$`
where `\(h(s, a, \theta)\)` is a numerical preference (can be parametrised arbitrarily).
- Guarantees continual exploration since no action ever receives zero probability.

---

## Policy Approximation and its Advantages

Compared to value-based methods, policy approximation offers several advantages.

- With a softmax parameterization, the resulting stochastic policy can approach a deterministic one: as the differences between preferences grow, the softmax distribution becomes increasingly peaked, and in the limit it becomes deterministic.
- Enables the selection of actions with arbitrary probabilities. In problems with significant function approximation, the best approximate policy may be stochastic.
- The policy may be a simpler function to approximate.
- The choice of policy parameterization is sometimes a good way of injecting prior knowledge about the desired form of the policy into the RL system (an important reason).
- Stronger convergence guarantees with continuous policy parameterization.
- The action probabilities change smoothly as a function of the learned parameters.
- In `\(\epsilon\)`-greedy selection, the action probabilities may change dramatically given a small change in action values.
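
---

## Softmax policy in code

The soft-max in action preferences can be sketched in a few lines. This is a minimal sketch, assuming linear preferences `\(h(s,a,\theta) = \theta^\top \textbf x(s,a)\)` with a hypothetical one-hot feature encoding; names such as `softmax_policy` and `feat` are illustrative only.

```python
import numpy as np

def softmax_policy(theta, feat, s, actions):
    """pi(a|s, theta) with linear preferences h(s, a, theta) = theta^T x(s, a)."""
    prefs = np.array([theta @ feat(s, a) for a in actions])
    prefs = prefs - prefs.max()        # subtract max for numerical stability
    e = np.exp(prefs)
    return e / e.sum()

def feat(s, a):
    """Hypothetical one-hot features for 2 states x 3 actions."""
    v = np.zeros(6)
    v[s * 3 + a] = 1.0
    return v

theta = np.array([1.0, 2.0, 0.5, 0.0, 0.0, 3.0])
probs = softmax_policy(theta, feat, s=0, actions=[0, 1, 2])
print(probs)   # probabilities sum to 1; every action has positive probability
```

All probabilities stay strictly positive, so continual exploration is guaranteed, yet as one preference grows the distribution becomes increasingly peaked.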
---

## The Policy Gradient Theorem

- To do stochastic gradient ascent, we need the gradient of the performance measure `\(J(\theta)\)` with respect to the policy parameters `\(\theta\)`.
- Episodic case: objective/performance `\(J(\theta) = v_{\pi_\theta}(s_0)\)` given `\(\pi_\theta\)`.
- Given `\(s\)` and `\(\pi_\theta\)` we can find the next action and reward.
- But how can we estimate the performance gradient when it depends on the unknown effect of policy changes on the state distribution?
- Policy gradient theorem: the gradient of `\(J(\theta)\)` can be written as
`$$\nabla J(\theta) \propto \sum_s \mu(s) \sum_a q_{\pi}(s,a) \nabla \pi(a|s,\theta)$$`
where `\(\mu(s)\)` is the on-policy distribution over states under `\(\pi\)`.
- The gradient can be expressed without involving the derivative of the state distribution.

---

## From policy gradient to eligibility vector

Using `\(\nabla \pi(a \mid s, \theta) = \pi(a \mid s, \theta)\,\nabla \ln \pi(a \mid s, \theta)\)`, we may rewrite the policy gradient theorem:

`$$\begin{align*} \nabla J(\theta) &\propto \sum_s \mu(s) \sum_a q_{\pi}(s,a) \nabla \pi(a|s,\theta) = \mathbb{E}_\pi\left[\sum_a q_\pi(S_t,a)\nabla\,\pi(a \mid S_t, \theta)\right] \\ &= \mathbb{E}_\pi\left[\sum_a q_\pi(S_t,a) \pi(a \mid S_t, \theta)\,\nabla \ln \pi(a \mid S_t, \theta)\right]\\ &= \mathbb{E}_\pi\left[q_\pi(S_t, A_t)\,\nabla \ln \pi(A_t|S_t, \theta)\right] = \mathbb{E}_\pi\left[G_t\,\nabla \ln \pi(A_t|S_t, \theta)\right] \end{align*}$$`

- The expectation is taken with respect to the trajectory distribution generated by the current policy.
- The policy parameters are adjusted in proportion to the product of the action value `\(q_\pi(S_t, A_t)\)` and the gradient of the log-probability.
- The gradient `\(\nabla \ln \pi(A_t|S_t, \theta)\)` is often called the *eligibility vector*.
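
---

## Sample update in code

For linear preferences the eligibility vector has the closed form `\(\nabla \ln \pi(a|s,\theta) = \textbf x(s,a) - \sum_b \pi(b|s,\theta)\,\textbf x(s,b)\)`, and a single sample update is `\(\theta \leftarrow \theta + \alpha\, G_t \nabla \ln \pi(A_t|S_t,\theta)\)`. A minimal sketch, assuming hypothetical one-hot features and illustrative function names:

```python
import numpy as np

def softmax(prefs):
    e = np.exp(prefs - prefs.max())
    return e / e.sum()

def feat(s, a):
    """Hypothetical one-hot features for 2 states x 3 actions."""
    v = np.zeros(6)
    v[s * 3 + a] = 1.0
    return v

def eligibility_vector(theta, s, a, actions):
    """grad ln pi(a|s,theta) = x(s,a) - sum_b pi(b|s,theta) x(s,b) (linear softmax)."""
    pi = softmax(np.array([theta @ feat(s, b) for b in actions]))
    return feat(s, a) - sum(p * feat(s, b) for p, b in zip(pi, actions))

def sample_update(theta, s, a, G, alpha, actions):
    """theta <- theta + alpha * G * grad ln pi(A_t|S_t, theta)."""
    return theta + alpha * G * eligibility_vector(theta, s, a, actions)

actions = [0, 1, 2]
theta = np.zeros(6)
pi_before = softmax(np.array([theta @ feat(0, b) for b in actions]))
theta = sample_update(theta, s=0, a=1, G=1.0, alpha=0.5, actions=actions)
pi_after = softmax(np.array([theta @ feat(0, b) for b in actions]))
```

After one update with a positive return, the probability of the chosen action increases, exactly as the update rule suggests.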
---

## REINFORCE: Monte Carlo Policy Gradient

<img src="img/1303_REINFORCE.png" width="100%" style="display: block; margin: auto;" />

Note a discount rate has been added here (we did not include it in the policy gradient theorem).

---

## REINFORCE with Baseline (1)

- The REINFORCE algorithm uses the full MC return, which often has very high variance.
- To reduce variance and improve stability, a baseline `\(b(s)\)` can be subtracted from the return.
- Replace the return `\(G_t\)` with `\(G_t - b(S_t)\)`. The new update rule becomes:
$$ \theta_{t+1} = \theta_t + \alpha\,(G_t - b(S_t))\,\nabla \ln \pi(A_t|S_t,\theta_t). $$
- The baseline may depend on the state but must not depend on the action. Hence
$$ \sum_a b(s)\,\nabla \pi(a|s,\theta) = b(s)\,\nabla \sum_a \pi(a|s,\theta) = b(s)\,\nabla 1 = 0. $$
- Subtracting `\(b(s)\)` alters only the variance, not the expectation.
- An effective choice is the approximate state value `\(b(s) = \hat v(s, w)\)` with updates
`$$w \leftarrow w + \alpha_w\,(G_t - \hat v(S_t,w))\,\nabla \hat v(S_t,w).$$`

---

## REINFORCE with Baseline (2)

- This produces a *critic* that approximates how good each state is on average.
- The policy update (the *actor*) then adjusts the probabilities in proportion to how much better or worse the return was compared to what is expected for the state.
- Still a Monte Carlo method.
- Still provides unbiased estimates of the policy gradient.
- The improvement is purely variance reduction to accelerate learning.
- Empirically, this leads to much faster convergence.
- We now have learning rules for both actor and critic:

$$
`\begin{aligned}
w &\leftarrow w + \alpha_w\,(G_t - \hat v(S_t,w))\,\nabla \hat v(S_t,w), \\
\theta &\leftarrow \theta + \alpha_\theta\,(G_t - \hat v(S_t,w))\,\nabla \ln \pi(A_t|S_t,\theta).
\end{aligned}`
$$

---

## REINFORCE with Baseline algorithm

<img src="img/1304_REINFORCE_With_Baseline.png" width="100%" style="display: block; margin: auto;" />

---

## Actor-Critic Methods

- Actor-critic methods replace the full MC return with a bootstrapped estimate.
- The policy is the *actor* and the value function is the *critic*. The critic evaluates the state value, and the actor adjusts the policy parameters.
- Now let the critic use TD updates (faster updates and less variance). TD error:
`$$\delta_t = R_{t+1} + \gamma \hat v(S_{t+1}, w) - \hat v(S_t, w).$$`
- The critic update becomes:
`$$w \leftarrow w + \alpha_w \,\delta_t\, \nabla \hat v(S_t, w).$$`
The actor update becomes (biased, but with lower variance):
`$$\theta \leftarrow \theta + \alpha_\theta\,\delta_t\,\nabla \ln \pi(A_t|S_t, \theta).$$`
- Actor-critic methods can be seen as the policy-gradient analogue of SARSA.

---

## Actor-Critic algorithm

<img src="img/1305a_One_Step_Actor_Critic.png" width="90%" style="display: block; margin: auto;" />

---

## Colab

Let us consider an example in the [Colab tutorial][colab-14-policy-gradient].

---

## Policy Gradient for Continuing Problems (1)

- New objective, the *average reward*:
`$$J(\theta) = r(\pi) = \sum_s \mu(s)\sum_a \pi(a|s)\sum_{s',r} p(s',r|s,a)\, r.$$`
- The policy gradient theorem still holds (now with equality instead of proportionality):
`$$\nabla r(\pi) = \sum_s \mu(s) \sum_a q_\pi(s,a)\,\nabla \pi(a|s,\theta).$$`
- Value functions are as before except that the return is defined as the *differential return*:
`$$G_t = R_{t+1} - r(\pi) + R_{t+2} - r(\pi) + \cdots.$$`

---

## Policy Gradient for Continuing Problems (2)

- The gradient with a baseline then becomes
$$\nabla r(\pi) \approx \mathbb{E}\left[(G_t-b(S_t))\,\nabla \ln \pi(A_t|S_t,\theta)\right].
$$
- If we use TD and let the baseline be the state value, then
`$$G_t - b(S_t) \approx \delta_t = (R_{t+1} - \hat r + \hat v(S_{t+1})) - \hat v(S_t)$$`
- The average reward `\(r(\pi)\)` must now also be estimated during learning (as `\(\hat r\)`).
- Policy gradient methods extend naturally to the continuing case, but the formulation shifts from episodic returns to average reward and differential values.

---

## Policy Gradient algorithm (continuing case)

<img src="img/1306_actor-critic-cont.png" width="80%" style="display: block; margin: auto;" />

---

## Policy Parameterisation for Continuous Actions (1)

- Consider *continuous action spaces*, meaning actions are real-valued (or vector-valued).
- Policies are *parameterised probability density functions* over continuous actions:
`$$\pi(a \mid s, \theta) = \text{a differentiable density over } a$$`
- A common parametrisation is the univariate Gaussian (normal) distribution:
$$ \pi(a \mid s, \theta) = \frac{1}{\sqrt{2\pi\sigma^2(s, \theta)}} \exp\left( -\frac{(a - \mu(s, \theta))^2}{2\sigma^2(s, \theta)} \right), $$
where both the mean `\(\mu(s, \theta)\)` and standard deviation `\(\sigma(s, \theta)\)` may depend on the state and are parametrised by separate sets of weights `\(\theta = (\theta_\mu, \theta_\sigma)\)`.
- The mean and standard deviation can be
`$$\mu(s, \theta) = {\theta_\mu}^\top \textbf x_\mu(s), \qquad \sigma(s, \theta) = \exp({\theta_\sigma}^\top \textbf x_\sigma(s)),$$`
where the exponential keeps the standard deviation positive.

---

## Policy Parameterisation for Continuous Actions (2)

- The eligibility vector `\(\nabla \ln \pi(A_t|S_t, \theta_t)\)` becomes:
`$$\nabla \ln \pi(a|s, \theta) = \frac{a-\mu(s, \theta)}{\sigma(s, \theta)^2}\, \textbf x_\mu(s) + \left(\frac{(a-\mu(s, \theta))^2}{\sigma(s, \theta)^2} - 1\right) \textbf x_\sigma(s).$$`
- The choice of parametrization has important effects.
- If the variance is too small, exploration collapses; if too large, gradient estimates become noisy.
- Learning both mean and variance enables adaptive exploration: the variance shrinks in well-understood regions and grows where uncertainty is higher.
- Once a differentiable density is available, all the previous policy gradient machinery applies unchanged.
- The policy gradient theorem still holds, as it does not depend on the cardinality of the action space.
- Actor-critic methods remain preferable because they reduce variance.

---

## Mixed Action Spaces

- The action includes both continuous and discrete components: `\(a = (a^{\text{disc}},\, a^{\text{cont}})\)`.
- The policy must represent a joint distribution over this mixed action space.
- Policy gradient methods handle this naturally as long as the policy is differentiable.
- A standard and convenient factorization is:
$$ \pi(a \mid s) = \pi(a^{\text{disc}} \mid s)\, \pi(a^{\text{cont}} \mid s, a^{\text{disc}}). $$
- First choose the discrete action component, then choose the continuous component conditioned on the discrete choice.
- The log-policy splits naturally:
`$$\ln \pi(a \mid s) = \ln \pi(a^{\text{disc}} \mid s) + \ln \pi(a^{\text{cont}} \mid s, a^{\text{disc}}).$$`
`$$\nabla_\theta \ln \pi(a \mid s) = \nabla_\theta \ln \pi(a^{\text{disc}} \mid s) + \nabla_\theta \ln \pi(a^{\text{cont}} \mid s, a^{\text{disc}}).$$`

---

## Colab

Let us consider an example in the [Colab tutorial][colab-14-policy-gradient].
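
---

## Gaussian policy in code

The Gaussian parameterisation can be sketched directly: a linear mean, an exponentiated linear standard deviation, and the two parts of the eligibility vector. This is a minimal sketch; the feature vectors and parameter values are hypothetical.

```python
import numpy as np

def mu(theta_mu, x_mu_s):
    """Mean: linear in the state features."""
    return theta_mu @ x_mu_s

def sigma(theta_sigma, x_sigma_s):
    """Standard deviation: exponential keeps it positive."""
    return np.exp(theta_sigma @ x_sigma_s)

def sample_action(rng, theta_mu, theta_sigma, x_mu_s, x_sigma_s):
    """Draw a continuous action from the Gaussian policy."""
    return rng.normal(mu(theta_mu, x_mu_s), sigma(theta_sigma, x_sigma_s))

def grad_log_pi(a, theta_mu, theta_sigma, x_mu_s, x_sigma_s):
    """Eligibility vector, split into its mean and std-dev parts."""
    m = mu(theta_mu, x_mu_s)
    s = sigma(theta_sigma, x_sigma_s)
    g_mu = (a - m) / s**2 * x_mu_s
    g_sigma = ((a - m)**2 / s**2 - 1.0) * x_sigma_s
    return g_mu, g_sigma

# Hypothetical features and weights for one state
x_mu_s = np.array([1.0, 2.0])
x_sigma_s = np.array([0.5, -1.0])
theta_mu = np.array([0.3, -0.2])
theta_sigma = np.array([0.1, 0.2])

rng = np.random.default_rng(0)
a = sample_action(rng, theta_mu, theta_sigma, x_mu_s, x_sigma_s)
g_mu, g_sigma = grad_log_pi(a, theta_mu, theta_sigma, x_mu_s, x_sigma_s)
```

The two gradient parts can be checked against finite differences of `\(\ln \pi(a \mid s, \theta)\)`, which is a useful sanity test when implementing policy gradients.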
<!-- # References --> <!-- ```{r, results='asis', echo=FALSE} --> <!-- PrintBibliography(bib) --> <!-- ``` --> [BSS]: https://bss.au.dk/en/ [bi-programme]: https://masters.au.dk/businessintelligence [course-help]: https://github.com/bss-osca/rl/issues [cran]: https://cloud.r-project.org [cheatsheet-readr]: https://rawgit.com/rstudio/cheatsheets/master/data-import.pdf [course-welcome-to-the-tidyverse]: https://github.com/rstudio-education/welcome-to-the-tidyverse [Colab]: https://colab.google/ [colab-01-intro-colab]: https://colab.research.google.com/drive/1o_Dk4FKTsDxPYxTXBRAUEsfPYU3dJhxg?usp=sharing [colab-03-rl-in-action]: https://colab.research.google.com/drive/18O9MruUBA-twpIDpc-9boXQw-cSjkRoD?usp=sharing [colab-03-rl-in-action-ex]: https://colab.research.google.com/drive/18O9MruUBA-twpIDpc-9boXQw-cSjkRoD#scrollTo=JUKOdK_UqKRJ&line=3&uniqifier=1 [colab-04-python]: https://colab.research.google.com/drive/1_TQoJVTJPiXbynegeUtzTWBgktpL5VQT?usp=sharing [colab-04-debug-python]: https://colab.research.google.com/drive/1JHVxbE89iJ8CGJuwY-A4aEEbWYXMH4dp?usp=sharing [colab-05-bandit]: https://colab.research.google.com/drive/19-tUda-gBb40NWHjpSQboqWq18jYpHPs?usp=sharing [colab-05-ex-bandit-adv]: https://colab.research.google.com/drive/19-tUda-gBb40NWHjpSQboqWq18jYpHPs#scrollTo=Df1pWZ-DZB7v&line=1 [colab-05-ex-bandit-coin]: https://colab.research.google.com/drive/19-tUda-gBb40NWHjpSQboqWq18jYpHPs#scrollTo=gRGiE26m3inM [colab-08-dp]: https://colab.research.google.com/drive/1PrLZ2vppqnq0xk0_Qu7UiW3fASftZUX6?usp=sharing [colab-08-dp-ex-storage]: https://colab.research.google.com/drive/1PrLZ2vppqnq0xk0_Qu7UiW3fASftZUX6#scrollTo=nY6zWiv_3ikg&line=21&uniqifier=1 [colab-08-dp-sec-dp-gambler]: https://colab.research.google.com/drive/1PrLZ2vppqnq0xk0_Qu7UiW3fASftZUX6#scrollTo=GweToDSPd5gj&line=1&uniqifier=1 [colab-08-dp-sec-dp-maintain]: https://colab.research.google.com/drive/1PrLZ2vppqnq0xk0_Qu7UiW3fASftZUX6#scrollTo=HQnlVuuufR_Q&line=1&uniqifier=1 [colab-08-dp-sec-dp-car]: 
https://colab.research.google.com/drive/1PrLZ2vppqnq0xk0_Qu7UiW3fASftZUX6#scrollTo=xERxGYQDkR87&line=1&uniqifier=1 [colab-09-mc]: https://colab.research.google.com/drive/1I4gBqDqYQAEPOVlMqTyBG1AKSHTgyDm-?usp=sharing [colab-09-mc-sec-mc-seasonal-ex]: https://colab.research.google.com/drive/1I4gBqDqYQAEPOVlMqTyBG1AKSHTgyDm-#scrollTo=1BzUCPQxstvQ&line=3&uniqifier=1 [colab-10-td-pred]: https://colab.research.google.com/drive/1JhLDAtc-5lJ3fzp7natjT_ea_JRiQS7d?usp=sharing [colab-10-td-pred-sec-ex-td-pred-random]: https://colab.research.google.com/drive/1JhLDAtc-5lJ3fzp7natjT_ea_JRiQS7d#scrollTo=1BzUCPQxstvQ&line=4&uniqifier=1 [colab-11-td-control]: https://colab.research.google.com/drive/1EC7qmhZqirQdfV1lDn5wqabGlgE49Ghw?usp=sharing [colab-11-td-control-sec-td-control-storage]: https://colab.research.google.com/drive/1EC7qmhZqirQdfV1lDn5wqabGlgE49Ghw#scrollTo=1BzUCPQxstvQ&line=3&uniqifier=1 [colab-11-td-control-sec-td-control-car]: https://colab.research.google.com/drive/1EC7qmhZqirQdfV1lDn5wqabGlgE49Ghw#scrollTo=5CcNmaUVXekC&line=1&uniqifier=1 [colab-12-approx-pred]: https://colab.research.google.com/drive/1-kh0SiNucJrzUUnIOLSidcA2RO5J1rvY?usp=sharing [colab-13-approx-control]: https://colab.research.google.com/drive/1aTPzgxC2_4O1TStfmiEAArf4kxhEVoFU?usp=sharing [colab-14-policy-gradient]: https://colab.research.google.com/drive/1noa3mzdi4sLyBB9GCzsV9__5ikOwwSn4?usp=sharing [DataCamp]: https://www.datacamp.com/ [datacamp-signup]: https://www.datacamp.com/groups/shared_links/45955e75eff4dd8ef9e8c3e7cbbfaff9e28e393b38fc25ce24cb525fb2155732 [datacamp-r-intro]: https://learn.datacamp.com/courses/free-introduction-to-r [datacamp-r-rmarkdown]: https://campus.datacamp.com/courses/reporting-with-rmarkdown [datacamp-r-communicating]: https://learn.datacamp.com/courses/communicating-with-data-in-the-tidyverse [datacamp-r-communicating-chap3]: https://campus.datacamp.com/courses/communicating-with-data-in-the-tidyverse/introduction-to-rmarkdown [datacamp-r-communicating-chap4]: 
https://campus.datacamp.com/courses/communicating-with-data-in-the-tidyverse/customizing-your-rmarkdown-report [datacamp-r-intermediate]: https://learn.datacamp.com/courses/intermediate-r [datacamp-r-intermediate-chap1]: https://campus.datacamp.com/courses/intermediate-r/chapter-1-conditionals-and-control-flow [datacamp-r-intermediate-chap2]: https://campus.datacamp.com/courses/intermediate-r/chapter-2-loops [datacamp-r-intermediate-chap3]: https://campus.datacamp.com/courses/intermediate-r/chapter-3-functions [datacamp-r-intermediate-chap4]: https://campus.datacamp.com/courses/intermediate-r/chapter-4-the-apply-family [datacamp-r-functions]: https://learn.datacamp.com/courses/introduction-to-writing-functions-in-r [datacamp-r-tidyverse]: https://learn.datacamp.com/courses/introduction-to-the-tidyverse [datacamp-r-strings]: https://learn.datacamp.com/courses/string-manipulation-with-stringr-in-r [datacamp-r-dplyr]: https://learn.datacamp.com/courses/data-manipulation-with-dplyr [datacamp-r-dplyr-bakeoff]: https://learn.datacamp.com/courses/working-with-data-in-the-tidyverse [datacamp-r-ggplot2-intro]: https://learn.datacamp.com/courses/introduction-to-data-visualization-with-ggplot2 [datacamp-r-ggplot2-intermediate]: https://learn.datacamp.com/courses/intermediate-data-visualization-with-ggplot2 [dplyr-cran]: https://CRAN.R-project.org/package=dplyr [google-form]: https://forms.gle/s39GeDGV9AzAXUo18 [google-grupper]: https://docs.google.com/spreadsheets/d/1DHxthd5AQywAU4Crb3hM9rnog2GqGQYZ2o175SQgn_0/edit?usp=sharing [GitHub]: https://github.com/ [git-install]: https://git-scm.com/downloads [github-actions]: https://github.com/features/actions [github-pages]: https://pages.github.com/ [gh-rl-student]: https://github.com/bss-osca/rl-student [gh-rl]: https://github.com/bss-osca/rl [happy-git]: https://happygitwithr.com [hg-install-git]: https://happygitwithr.com/install-git.html [hg-why]: https://happygitwithr.com/big-picture.html#big-picture [hg-github-reg]: 
https://happygitwithr.com/github-acct.html#github-acct [hg-git-install]: https://happygitwithr.com/install-git.html#install-git [hg-exist-github-first]: https://happygitwithr.com/existing-github-first.html [hg-exist-github-last]: https://happygitwithr.com/existing-github-last.html [hg-credential-helper]: https://happygitwithr.com/credential-caching.html [hypothes.is]: https://web.hypothes.is/ [Jupyter]: https://jupyter.org/ [osca-programme]: https://masters.au.dk/operationsandsupplychainanalytics [Peergrade]: https://peergrade.io [peergrade-signup]: https://app.peergrade.io/join [point-and-click]: https://en.wikipedia.org/wiki/Point_and_click [pkg-bookdown]: https://bookdown.org/yihui/bookdown/ [pkg-openxlsx]: https://ycphs.github.io/openxlsx/index.html [pkg-ropensci-writexl]: https://docs.ropensci.org/writexl/ [pkg-jsonlite]: https://cran.r-project.org/web/packages/jsonlite/index.html [Python]: https://www.python.org/ [Positron]: https://positron.posit.co/ [PyCharm]: https://www.jetbrains.com/pycharm/ [VSCode]: https://code.visualstudio.com/ [R]: https://www.r-project.org [RStudio]: https://rstudio.com [rstudio-cloud]: https://rstudio.cloud/spaces/176810/join?access_code=LSGnG2EXTuzSyeYaNXJE77vP33DZUoeMbC0xhfCz [r-cloud-mod12]: https://rstudio.cloud/spaces/176810/project/2963819 [r-cloud-mod13]: https://rstudio.cloud/spaces/176810/project/3020139 [r-cloud-mod14]: https://rstudio.cloud/spaces/176810/project/3020322 [r-cloud-mod15]: https://rstudio.cloud/spaces/176810/project/3020509 [r-cloud-mod16]: https://rstudio.cloud/spaces/176810/project/3026754 [r-cloud-mod17]: https://rstudio.cloud/spaces/176810/project/3034015 [r-cloud-mod18]: https://rstudio.cloud/spaces/176810/project/3130795 [r-cloud-mod19]: https://rstudio.cloud/spaces/176810/project/3266132 [rstudio-download]: https://rstudio.com/products/rstudio/download/#download [rstudio-customizing]: https://support.rstudio.com/hc/en-us/articles/200549016-Customizing-RStudio [rstudio-key-shortcuts]: 
https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts [rstudio-workbench]: https://www.rstudio.com/wp-content/uploads/2014/04/rstudio-workbench.png [r-markdown]: https://rmarkdown.rstudio.com/ [ropensci-writexl]: https://docs.ropensci.org/writexl/ [r4ds-pipes]: https://r4ds.had.co.nz/pipes.html [r4ds-factors]: https://r4ds.had.co.nz/factors.html [r4ds-strings]: https://r4ds.had.co.nz/strings.html [r4ds-iteration]: https://r4ds.had.co.nz/iteration.html [stat-545]: https://stat545.com [stat-545-functions-part1]: https://stat545.com/functions-part1.html [stat-545-functions-part2]: https://stat545.com/functions-part2.html [stat-545-functions-part3]: https://stat545.com/functions-part3.html [slides-welcome]: https://bss-osca.github.io/rl/slides/00-rl_welcome.html [slides-m1-3]: https://bss-osca.github.io/rl/slides/01-welcome_r_part.html [slides-m4-5]: https://bss-osca.github.io/rl/slides/02-programming.html [slides-m6-8]: https://bss-osca.github.io/rl/slides/03-transform.html [slides-m9]: https://bss-osca.github.io/rl/slides/04-plot.html [slides-m83]: https://bss-osca.github.io/rl/slides/05-joins.html [sutton-notation]: https://bss-osca.github.io/rl/misc/sutton-notation.pdf [tidyverse-main-page]: https://www.tidyverse.org [tidyverse-packages]: https://www.tidyverse.org/packages/ [tidyverse-core]: https://www.tidyverse.org/packages/#core-tidyverse [tidyverse-ggplot2]: https://ggplot2.tidyverse.org/ [tidyverse-dplyr]: https://dplyr.tidyverse.org/ [tidyverse-tidyr]: https://tidyr.tidyverse.org/ [tidyverse-readr]: https://readr.tidyverse.org/ [tidyverse-purrr]: https://purrr.tidyverse.org/ [tidyverse-tibble]: https://tibble.tidyverse.org/ [tidyverse-stringr]: https://stringr.tidyverse.org/ [tidyverse-forcats]: https://forcats.tidyverse.org/ [tidyverse-readxl]: https://readxl.tidyverse.org [tidyverse-googlesheets4]: https://googlesheets4.tidyverse.org/index.html [tutorial-markdown]: https://commonmark.org/help/tutorial/ [tfa-course]: 
https://bss-osca.github.io/tfa/ [video-install]: https://vimeo.com/415501284 [video-rstudio-intro]: https://vimeo.com/416391353 [video-packages]: https://vimeo.com/416743698 [video-projects]: https://vimeo.com/319318233 [video-r-intro-p1]: https://www.youtube.com/watch?v=vGY5i_J2c-c [video-r-intro-p2]: https://www.youtube.com/watch?v=w8_XdYI3reU [video-r-intro-p3]: https://www.youtube.com/watch?v=NuY6jY4qE7I [video-subsetting]: https://www.youtube.com/watch?v=hWbgqzsQJF0&list=PLjTlxb-wKvXPqyY3FZDO8GqIaWuEDy-Od&index=10&t=0s [video-datatypes]: https://www.youtube.com/watch?v=5AQM-yUX9zg&list=PLjTlxb-wKvXPqyY3FZDO8GqIaWuEDy-Od&index=10 [video-control-structures]: https://www.youtube.com/watch?v=s_h9ruNwI_0 [video-conditional-loops]: https://www.youtube.com/watch?v=2evtsnPaoDg [video-functions]: https://www.youtube.com/watch?v=ffPeac3BigM [video-tibble-vs-df]: https://www.youtube.com/watch?v=EBk6PnvE1R4 [video-dplyr]: https://www.youtube.com/watch?v=aywFompr1F4 [wiki-snake-case]: https://en.wikipedia.org/wiki/Snake_case [wiki-camel-case]: https://en.wikipedia.org/wiki/Camel_case [wiki-interpreted]: https://en.wikipedia.org/wiki/Interpreted_language [wiki-literate-programming]: https://en.wikipedia.org/wiki/Literate_programming [wiki-csv]: https://en.wikipedia.org/wiki/Comma-separated_values [wiki-json]: https://en.wikipedia.org/wiki/JSON