Path Optimization
In collaboration with Lachlan Chu
Introduction
Natural processes can usually be commended for their subtle efficiencies - how essential tasks are performed, say, as well or as quickly as possible. While there do exist bees whose transportation of pollen from flower to flower is slower than that of others, we cannot overlook that the most powerful systems in existence are not quantum computers but processes which have remained unedited for millions of years. Consider the process of photosynthesis: light enters a specific harvesting complex perfectly suited to take advantage of a quantum mechanical principle we discovered only (relatively) recently. Its energy travels through two photosystems, producing in that time all necessary ingredients for floral survival.
Upon investigating further the smaller, more subtle elements of nature, it becomes evident that fundamental systems always function at the extrema. If a ball is thrown directly upward, it will return to the ground - if unobstructed by any external force - along the same path. This is, indeed, the fastest route. A minimum.
Here, we will discuss paths and their optimizations. In order to effectively examine them, however, it is useful to attribute to each path - for instance, the parabolic path of a projectile under the influence of just gravity - a number which can determine its efficiency. Call this value the action, and let a path in space be viewed as a function \(\boldsymbol{q}\) of time \(\boldsymbol{t}\). Now consider this diagram, displaying some possible paths of a projectile:
Clearly, \(\boldsymbol{q}(\boldsymbol{t})\) seems the most reasonable - the most natural, perhaps. If we were to attribute to each of these an action value, how could we make sense of the resulting numbers? Of the process? First, we must define more intuitively what exactly the action represents. It's a value, of course, but what does it indicate? Here, we will consider the action to be a number, measured in Joule-seconds, assigned to each path, whose value inversely corresponds to the efficiency of that path or system; therefore, we should expect \(\boldsymbol{q}(\boldsymbol{t})\) to have the lowest action of all these options, given that it represents a minimum. In order to actually compute this minimum, given that we have a function of \(t\) as input, we will use the action functional, denoted \(S\). From this point on in the article, things may get a bit more technical, but do not lose track of the purpose: to investigate, in different instances, how nature chooses an optimal solution to its problems. Many students may be familiar with an example of such a process: a slime mold finding the fastest route between points in a maze. Although the 10th-grade-biology slime mold did have to extend itself in a number of fruitless directions, the final product did indicate an intelligence able to preserve only the efficient strip of body.
Returning to the action functional, which accepts a function and returns a single value, it is not difficult to compute when presented with the study of a path. Here, we can get an accurate interpretation of the action value by analogizing it to the length of a given path. Thus, our functional should be of the form
\[ S[q(t)]=\int_q d s, \]
which indicates that we are integrating over a path \(q\) with respect to a distance metric \(d s\). Because a natural process will choose a minimum, maximum, or saddle point depending on the situation, it seems logical to commit to the following process: determine which type of extreme will be selected, then evaluate the action of all events or classes of events to compare them. There is, however, a way to get around this.
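Before doing so, the "action as length" analogy can be made concrete. Below is a minimal numerical sketch, assuming numpy; the helper `action` and the sampled path are our own illustrative choices, not notation from the text. With the arc-length integrand as the Lagrangian, the action of a path is simply the length of its graph:

```python
import numpy as np

def action(lagrangian, q, t):
    """Approximate S[q] = integral of L(t, q, qdot) dt for a sampled path q(t)."""
    qdot = np.gradient(q, t)                 # numerical dq/dt
    f = lagrangian(t, q, qdot)
    # trapezoidal rule over the samples
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))

t = np.linspace(0.0, 1.0, 10001)
arclength = lambda t, q, qd: np.sqrt(1.0 + qd**2)   # L = sqrt(1 + qdot^2)

s = action(arclength, t.copy(), t)   # the line q(t) = t from (0,0) to (1,1)
print(s)                             # close to sqrt(2), the straight-line distance
```

The functional accepts a whole path and returns a single number, exactly as described above.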
Derivation of the Euler-Lagrange Equations
First, we will rewrite our action functional. Making use of the Lagrangian, we can rewrite the integral expression to produce the correct units:
\[ S[q(t)]=\int_{t_1}^{t_2} L(t, q, \dot{q}) d t \]
where \(\boldsymbol{L}\) has units of Joules, making its integral over time measured in Joule-seconds; here, \(\boldsymbol{t}_{\mathbf{1}}\) and \(\boldsymbol{t}_{\mathbf{2}}\) are the endpoints of the path \(q\). It is reasonable to think of the Lagrangian as measuring a small increment along the path with respect to some metric. We wish to demonstrate that \(\boldsymbol{q}\) does, indeed, minimize \(\boldsymbol{S}[\boldsymbol{q}]\), so we will reasonably consider infinitesimal perturbations to \(\boldsymbol{q}\). We will define this variation as
\[ \delta q(t):=\epsilon p(t) \]
for some small \(\epsilon\) and some arbitrary function \(p(t)\). Since we want to take perturbations of \(q\), we must insist that the function \(p\) possess the following properties. First, it must vanish at the boundary points \(t_1\) and \(t_2\) of \(q\), such that \(p\left(t_1\right)=p\left(t_2\right)=0\). This makes sense, because otherwise \(q+\epsilon p\) would not be a path that serves the same purpose of connecting the fixed endpoints. Secondly, it must be smooth for \(t_1 < t < t_2\). The action of the perturbed path is then \[
\begin{aligned}
S[q+\epsilon p] & =\int_{t_1}^{t_2} L(t,(q+\epsilon p), \frac{d}{d t}(q+\epsilon p)) d t \\
& =\int_{t_1}^{t_2} L(t, q+\epsilon p, \dot{q}+\epsilon \dot{p}) d t .
\end{aligned}
\] Now, having the action functional for the perturbation as well, we can examine what happens at the true path, regardless of our choice of \(\boldsymbol{\epsilon}\) or \(\boldsymbol{p}\). Say that \(q\) is the desired path, with the lowest action. Then, because it is a local minimum of the functional, the derivative of \(S[q+\epsilon p]\) with respect to \(\epsilon\) must vanish at \(\epsilon=0\). Using this property, we will be able to do the brunt of the derivation work. Assuming \[
\frac{\partial S[q+\epsilon p]}{\partial \epsilon}=0,
\] we will perform a quick Taylor expansion, noting all of this is centered around our variable of differentiation, \(\epsilon\), being set to zero. Recalling that the general form of a Taylor series for a function \(f(x, y, z)\) of three variables is: \[
\begin{aligned}
f\left(x+\delta_1, y+\delta_2, z+\delta_3\right) \approx & f(x, y, z)+\delta_1 \frac{\partial f(x, y, z)}{\partial x}+\delta_2 \frac{\partial f(x, y, z)}{\partial y}+\delta_3 \frac{\partial f(x, y, z)}{\partial z} \\
& +\frac{1}{2 !} \delta_1^2 \frac{\partial^2 f(x, y, z)}{\partial x^2}+\frac{1}{2 !} \delta_2^2 \frac{\partial^2 f(x, y, z)}{\partial y^2}+\frac{1}{2 !} \delta_3^2 \frac{\partial^2 f(x, y, z)}{\partial z^2} \\
& +\delta_1 \delta_2 \frac{\partial^2 f(x, y, z)}{\partial x \partial y}+\delta_2 \delta_3 \frac{\partial^2 f(x, y, z)}{\partial y \partial z}+\delta_1 \delta_3 \frac{\partial^2 f(x, y, z)}{\partial x \partial z}+\cdots
\end{aligned}
\] and that, for our purposes, terms of higher order are negligible (as they are multiplied by \(\epsilon^2\) or smaller) and will not affect the accuracy of the equation, the expanded expression is not so difficult. Our functional becomes: \[
S[q+\epsilon p]=\int_{t_1}^{t_2} L(t, q, \dot{q})+\epsilon p \frac{\partial L}{\partial q}+\epsilon \dot{p} \frac{\partial L}{\partial \dot{q}}+\mathcal{O}\left(\epsilon^2\right) d t .
\] Consider the middle two differential terms as an expanded form of \[
\epsilon \frac{\partial L}{\partial \epsilon}=\epsilon\left(\frac{\partial L}{\partial t} \frac{d t}{d \epsilon}+\frac{\partial L}{\partial q} \frac{d(q+\epsilon p)}{d \epsilon}+\frac{\partial L}{\partial \dot{q}} \frac{d(\dot{q}+\epsilon \dot{p})}{d \epsilon}\right),
\] sometimes called the total derivative, which yields the same result as that in (1.3). Finally, we are able to return to (1.2): \[
\begin{aligned}
\frac{\partial S[q+\epsilon p]}{\partial \epsilon} & =\frac{\partial}{\partial \epsilon} \int_{t_1}^{t_2} L+\epsilon\left(p \frac{\partial L}{\partial q}+\dot{p} \frac{\partial L}{\partial \dot{q}}\right) d t \\
& =\int_{t_1}^{t_2} p \frac{\partial L}{\partial q}+\dot{p} \frac{\partial L}{\partial \dot{q}} d t
\end{aligned}
\] which must equal zero by our earlier assumption. The transition between lines was done by first taking the derivative \(\frac{\partial}{\partial \epsilon}\) inside the integral (the factorization in the first line perhaps hints at this next step). Then, we could simply inspect the integrand for anything depending on \(\epsilon\), of which there was one term: the one carrying the factor \(\epsilon\) itself. Because \(L\) and the two derivative terms exist irrespective of \(\epsilon\), the derivative \(\frac{\partial}{\partial \epsilon}\) is easily computed. Finally, we will simplify (1.4) through integration by parts. To do this, we will consider only the second term in the integrand, imagining the expression in (1.4) split into two separate integrals with the same bounds, i.e. \[
\int_{t_1}^{t_2} p \frac{\partial L}{\partial q} d t+\int_{t_1}^{t_2} \dot{p} \frac{\partial L}{\partial \dot{q}} d t .
\] With partial integration on the latter, we will set \[
u=\frac{\partial L}{\partial \dot{q}}, \frac{d v}{d t}=\dot{p}
\] and \[
\frac{d u}{d t}=\frac{d}{d t} \frac{\partial L}{\partial \dot{q}}, v=p .
\] Note: I prefer to write derivatives in their full form instead of their more illogical split form, for example \(\frac{d v}{d t}=\dot{p}\) instead of \(d v=\dot{p} d t\). Now, noticing that the \(\left.u v\right|_{t_1}^{t_2}\) term goes to zero because \(p\left(t_1\right)=p\left(t_2\right)=0\), we have that \[
\int_{t_1}^{t_2} \dot{p} \frac{\partial L}{\partial \dot{q}} d t=-\int_{t_1}^{t_2} p \frac{d}{d t} \frac{\partial L}{\partial \dot{q}} d t
\] and therefore, after combining with the first half of the integral in (1.4), \[
\begin{aligned}
\frac{\partial S[q+\epsilon p]}{\partial \epsilon} & =\int_{t_1}^{t_2} p \frac{\partial L}{\partial q}+\dot{p} \frac{\partial L}{\partial \dot{q}} d t \\
& =\int_{t_1}^{t_2} p \frac{\partial L}{\partial q}-p \frac{d}{d t} \frac{\partial L}{\partial \dot{q}} d t \\
& =\int_{t_1}^{t_2} p\left(\frac{\partial L}{\partial q}-\frac{d}{d t} \frac{\partial L}{\partial \dot{q}}\right) d t .
\end{aligned}
\] We have arrived at the end of the derivation. Because we stipulate that the path is stationary - the original path \(\boldsymbol{q}\) being recovered at \(\boldsymbol{\epsilon}=\mathbf{0}\) - the derivative is zero as expressed in (1.2). Thus, the expression in (1.5) must be zero; and because \(p\) is arbitrary, the integral can vanish for every admissible choice of \(p\) only if the factor multiplying \(p\) vanishes identically, giving \[
\frac{\partial L}{\partial q}-\frac{d}{d t}\left(\frac{\partial L}{\partial \dot{q}}\right)=0 .
This is the Euler-Lagrange equation: the condition a stationary path must satisfy at every point in time.
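To see the equation in action, consider the standard projectile Lagrangian \(L=\frac{1}{2} m \dot{q}^2-m g q\) (kinetic minus potential energy - a form the article has not formally introduced, so treat this as a sketch assuming sympy is available). The Euler-Lagrange equation recovers \(\ddot{q}=-g\), whose solutions are exactly the parabolic paths from the introduction:

```python
import sympy as sp

t, m, g = sp.symbols('t m g', positive=True)
q = sp.Function('q')

# Assumed projectile Lagrangian: kinetic minus potential energy
L = sp.Rational(1, 2) * m * q(t).diff(t)**2 - m * g * q(t)

# Euler-Lagrange expression: dL/dq - d/dt(dL/dqdot)
el = sp.diff(L, q(t)) - sp.diff(sp.diff(L, q(t).diff(t)), t)
print(sp.simplify(el))   # proportional to -(g + q''), i.e. q'' = -g

# The general solution is the familiar downward parabola
sol = sp.dsolve(sp.Eq(el, 0), q(t))
print(sol)
```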
The Significance of The Euler-Lagrange Equation
The solutions to this equation are the natural motions for a given Lagrangian, and the machinery can be applied across geometries with different metrics. Let us consider the upper half plane model of hyperbolic geometry, but first the Euclidean plane with which we are familiar. To show that the geodesics are straight lines, we will write the Lagrangian as the integrand of an arc length integral:
\[ \int_{t_1}^{t_2} \sqrt{1+\dot{q}^2} d t \]
Thus, we have
\[ L(t, q, \dot{q})=\sqrt{1+\dot{q}^2}, \]
which is independent of \(\boldsymbol{q}\); note that \(\dot{\boldsymbol{q}}\) is still a function of \(\boldsymbol{t}\). Because of this, we need only evaluate the latter term in the E-L equation, because the first goes to zero. First finding the partial, we have:
\[ \frac{\partial L}{\partial \dot{q}}=\frac{\dot{q}}{\sqrt{1+\dot{q}^2}} \]
and then
\[ \frac{d}{d t}\left(\frac{\partial L}{\partial \dot{q}}\right)=\frac{\ddot{q}}{\left(1+\dot{q}^2\right)^{3 / 2}}=0 \]
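This time derivative can be verified symbolically (a sketch assuming sympy; the identity holds for any smooth \(q\)):

```python
import sympy as sp

t = sp.symbols('t')
q = sp.Function('q')
qd = q(t).diff(t)

# d/dt of dL/dqdot = qdot / sqrt(1 + qdot^2)
lhs = sp.diff(qd / sp.sqrt(1 + qd**2), t)
rhs = q(t).diff(t, 2) / (1 + qd**2)**sp.Rational(3, 2)
print(sp.simplify(lhs - rhs))   # the two expressions agree
```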
Now, we need only to solve the differential equation. This is a second-order nonlinear ODE, which can be solved with substitution. Let \(\dot{q}=v\), such that \(\ddot{q}=\dot{v}\). Integration will now simplify things:
\[ \int \frac{\dot{v}}{\left(1+v^2\right)^{3 / 2}} d t=\int 0 d t \]
and
\[ \frac{v}{\sqrt{v^2+1}}=c_1 . \]
Now, isolating for \(v\) will give us
\[ v= \pm \frac{c_1}{\sqrt{1-c_1^2}}=\dot{q} . \]
This differential equation is now trivial to solve:
\[ \begin{aligned} q & = \pm \int \frac{c_1}{\sqrt{1-c_1^2}} d t \\ & = \pm \frac{c_1}{\sqrt{1-c_1^2}} t+c_2 . \end{aligned} \]
And this is the geodesic on the Euclidean plane. Collecting all constant terms and renaming them, we get that
\[ \boldsymbol{q}(\boldsymbol{t})=\boldsymbol{\alpha} \boldsymbol{t}+\boldsymbol{\beta}, \]
a straight line with slope \(\boldsymbol{\alpha}\) and \(\boldsymbol{q}\)-intercept \(\boldsymbol{\beta}\).
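A quick numerical illustration of this minimality (a sketch assuming numpy; the endpoints, slope, and perturbation below are arbitrary choices): the straight line between two points is shorter than any perturbed variant of itself that keeps the same endpoints.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 10001)
dt = np.diff(t)

def length(q):
    """Euclidean length of the graph (t, q(t)), i.e. the integral of sqrt(1 + qdot^2)."""
    qd = np.gradient(q, t)
    f = np.sqrt(1.0 + qd**2)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * dt))

straight = 2.0 * t + 1.0        # q(t) = alpha*t + beta with alpha = 2, beta = 1
base = length(straight)         # should be sqrt(1 + alpha^2) = sqrt(5)

# Perturbations vanishing at the endpoints only make the path longer
for amp in (0.05, 0.2, 0.5):
    assert length(straight + amp * np.sin(np.pi * t)) > base

print(base)
```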
Now that we've completed the Euclidean case, let us move on to the hyperbolic upper half plane model. As is known, the geodesics here are vertical half-lines and half-circles perpendicular to the horizontal axis. Beginning with our distance metric, we evaluate the length of a path on this model with the following integral:
\[ \int_{t_1}^{t_2} d s=\int_{t_1}^{t_2} \sqrt{\frac{\dot{q}_1^2+\dot{q}_2^2}{q_2^2}} d t . \]
Consider \(\left(q_1, q_2\right)=(x, y)\). We will define, for simplicity - dropping the square root, which changes only the parametrization of the extremals, not their images:
\[ L(t, q, \dot{q})=\frac{\dot{q}_1^2+\dot{q}_2^2}{q_2^2} \text { in which } t \in \mathbb{R}, q \in D, \dot{q} \in \mathbb{R}^2 \]
for \(D=\left\{\left(q_1, q_2\right) \in \mathbb{R}^2: q_2>0\right\}\). This Lagrangian notation is analogous to \(L\left(t, q_1, q_2, \dot{q}_1, \dot{q}_2\right)\). Firstly, we will note that the system is autonomous, as \(\frac{\partial L}{\partial t}=0\); moreover, \(L\) does not depend on \(q_1\), so \(\frac{\partial L}{\partial q_1}=0\), and the E-L equation for \(q_1\) reduces to
\[ \frac{d}{d t}\left(\frac{\partial L}{\partial \dot{q}_1}\right)=0, \]
indicating that \(\frac{\partial L}{\partial \dot{q}_1}\) is constant. In this case, the first integrals \(\boldsymbol{E}\) and \(\boldsymbol{p}\) are
\[ \begin{aligned} & E=\frac{\dot{q}_1^2+\dot{q}_2^2}{q_2^2}=c_1 \\ & p=\frac{\dot{q}_1}{q_2^2}=c_2, \end{aligned} \]
respectively (perhaps it would be more accurate to write \(\boldsymbol{p}_{\boldsymbol{q}_1}\), as we are observing the conserved momentum associated with \(\frac{\partial L}{\partial \dot{q}_1}\) - up to an irrelevant factor of 2 - conserved precisely because \(\frac{\partial L}{\partial q_1}=0\) for our Lagrangian). They are called first integrals because we may decrease the order of the differential equation by one to achieve a constant, indicative of a quantity conserved by the flow of a dynamical system. Here, we have the first component of the momentum because of the Lagrangian's invariance under \(x\)-translations, and the energy by the formula:
\[ E=\frac{\partial \mathcal{L}}{\partial \dot{q}} \cdot \dot{q}-\mathcal{L} \]
where \(\mathcal{L}\) is a time-independent Lagrangian (e.g. \(\mathcal{L}(\boldsymbol{q}, \dot{q})\) ). Assuming \(\boldsymbol{c}_2 \neq \mathbf{0}\), we may divide (2.3) by the square of (2.4) with the intent of simplifying the fractional expression in the former. We arrive, then, at:
\[ \frac{E}{p^2}=q_2^2 \frac{\dot{q}_1^2+\dot{q}_2^2}{\dot{q}_1^2}=\frac{c_1}{c_2^2} \]
which is another constant. Letting \(\frac{c_1}{c_2^2}=r^2\) for some constant \(r\), we get after simplification:
\[ r^2=q_2^2\left(1+\left(\frac{d q_2}{d q_1}\right)^2\right), \]
a very familiar expression. Rewriting (2.5) in the coordinates \(\left(q_1, q_2\right)=(x, y)\), so that \(L=L(t, x, y, \dot{x}, \dot{y})\), we have:
\[ r^2=y^2\left(1+y'^2\right) . \]
Isolating the differential,
\[ \frac{d y}{d x}=\frac{\sqrt{r^2-y^2}}{y} \]
we will divide both sides by the RHS (equivalently, multiply by \(y / \sqrt{r^2-y^2}\)) and integrate with respect to \(x\):
\[ \int \frac{y\, y'}{\sqrt{r^2-y^2}} d x=\int d x \]
which gives us
\[ \int \frac{y}{\sqrt{r^2-y^2}} d y=x+C . \]
This integral is trivial to solve and can be done with a simple substitution of \(u=r^2-y^2\). The resulting equality is
\[ -\sqrt{r^2-y^2}=x+C . \]
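The substitution can be confirmed symbolically (a sketch assuming sympy; differentiating the computed antiderivative recovers the integrand):

```python
import sympy as sp

y, r = sp.symbols('y r', positive=True)
integrand = y / sp.sqrt(r**2 - y**2)

# An antiderivative; equals -sqrt(r**2 - y**2) up to an additive constant
F = sp.integrate(integrand, y)
print(F)

# Differentiating recovers the integrand exactly
assert sp.simplify(sp.diff(F, y) - integrand) == 0
```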
Squaring once and rearranging terms, we have finally solved for the semicircular geodesics of the upper half plane model of hyperbolic geometry:
\[ r^2=(x+C)^2+y^2 \]
or
\[ \left(x-x_0\right)^2+y^2=r^2, \]
defining a circle whose center is on the horizontal axis. Solving for \(y\) and taking the positive square root yields the upper half-circle.
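As a closing numerical check (a sketch assuming numpy; the sample points \((-1,1)\) and \((1,1)\) are arbitrary choices), the semicircular geodesic between two points of the upper half plane is indeed shorter, in the hyperbolic metric, than the straight Euclidean segment joining them; the exact geodesic distance here is \(\operatorname{arccosh}(3)\) by the standard half-plane distance formula.

```python
import numpy as np

def hyperbolic_length(x, y):
    """Approximate integral of sqrt(dx^2 + dy^2)/y along a sampled curve."""
    ds = np.sqrt(np.diff(x)**2 + np.diff(y)**2)   # Euclidean chord lengths
    y_mid = 0.5 * (y[1:] + y[:-1])                # midpoint heights
    return float(np.sum(ds / y_mid))

# Semicircle through (-1, 1) and (1, 1): center 0, radius sqrt(2)
theta = np.linspace(np.pi / 4, 3 * np.pi / 4, 20001)
r = np.sqrt(2.0)
semi = hyperbolic_length(r * np.cos(theta), r * np.sin(theta))

# Straight Euclidean segment between the same points (constant y = 1)
x = np.linspace(-1.0, 1.0, 20001)
seg = hyperbolic_length(x, np.ones_like(x))

print(semi, seg)   # the semicircle is hyperbolically shorter
```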