Solution to "Shine"


If the Sun shines on day 1, then the day which is most likely to also see Sun is day 17, with a chance of approximately 8.451\%.

This is how I found the solution. As usual, there could be other ways to obtain the solution.

First of all, we want to understand the problem. According to the limerick, if the Sun shines on a given day, then it has a one percent chance to shine again on the next day. If it doesn't shine the second day, then on the third day it has a chance of two percent. This repeats until it shines, and it will most certainly shine at least by day 101 on which it would have a 100\% chance to shine even if it failed to shine the previous 99 days. Whenever it does decide to shine, then the pattern will of course start over. Therefore, on any given day (past day two) there may be multiple ways in which the Sun will shine.

For instance, day three can shine by two ways:

(1) If the Sun shines on day two (one percent chance) and again on day three (again a one percent chance given that it shone on day two). The chance of this is then (0.01)^2=0.01\%.

(2) If the Sun does not shine on day two (99\% chance) then there is now a 2\% chance to shine on day three. In other words, this scenario has a 0.99\cdot 0.02=1.98\% chance of happening.

Then the chance that the Sun will shine on day three is obtained by adding case (1) and case (2) which gives us the total chance of (0.01+1.98)\%=1.99\%.

Notice that day 1 as given by the setup will always be true. And looking at the scenario of it shining on day n, then day n will always be true. Then the total number of scenarios which will result in day n shining will be 2^{n-2}, which gets huge very quickly. We can see this will get very complicated, as the further from day one we go, the more "scenarios" there are which will result in the Sun shining on that day. For instance, on day 4 there are the following scenarios (where T means the Sun shined and F means the Sun didn't shine):

\text{Day 1}
\text{Day 2}
\text{Day 3}
\text{Day 4}
T
T
T
T
T
T
F
T
T
F
T
T
T
F
F
T

So, to avoid having to perform these numerous calculations, let's take a different approach. Let's pretend that we have some function f which tells us the exact probability f(n) of the Sun shining on day n.

We know that f(n) will depend in some way on f(n-1), f(n-2), etc., down to f(1) if n\le 101 or f(n-100) if n>101.

Now pretend we know the chance f(n-1) of it shining on day n-1. Then day n can shine via this particular scenario with a chance of

f(n-1)\times \underbrace{(1/100)}_{\text{chance to shine first day after shining}}.

Moving next to f(n-2), which is the chance of day n-2 being true, we need the scenario n-1 false, n true, which is

f(n-2)\times (99/100)\times (2/100).

We don't need to look at the case (n-2,n-1,n)=(T,T,T) because all scenarios with (n-1)=T were included in f(n-1). Therefore, for each (n-i) we are only interested in (n-i,n)=(T,T) while all in between take F. Thus, we obtain

f(n)=f(n-1)\left(\frac{1}{100}\right) + f(n-2) \left(\frac{99}{100}\right)\left(\frac{2}{100}\right) +

f(n-3)\left(\frac{99}{100}\right)\left(\frac{98}{100}\right)\left(\frac{3}{100}\right) +f(n-4)\left(\frac{99}{100}\right)\left(\frac{98}{100}\right)\left(\frac{97}{100}\right)\left(\frac{4}{100}\right)+\cdots,

terminating at f(1) if n\le 101 or f(n-100) if n>101. We can write f a little more succinctly as

f(n)=\sum_{i=k}^{n-1}{f(i)\left( \frac{99!}{(100-n+i)!}\right) \left( \frac{n-i}{100^{n-i}}\right)}

where k=1 if n\le 101 or k=n-100 if n>101.

Since we know f(1)=1 as defined in the problem, then each f(n) can be built up starting with f(2), then f(3), and so on as high as we would like. These calculations may be cumbersome by hand, but it is simple to write a loop structure in a program which can handle very large (and very small) numbers. I just implemented a couple lines in PARI/GP to do the calculations for me, using the following code (Note: "?" is the beginning of a command line in PARI/GP) :

? f=vector(5000);
? f[1]=1.0;
? for(n=2,101,f[n]=sum(i=1,n-1,f[i]*(99!/((100-n+i)!))*(n-i)/(100.^(n-i))));
? for(n=102,5000,f[n]=sum(i=n-100,n-1,f[i]*(99!/((100-n+i)!))*(n-i)/(100.^(n-i))));
? for(i=2,5000,write("shine_probs.txt",{i," ",f[i]}));

This computes f(n) for all n\le 5000 and outputs them to a file called "shine_probs.txt" in the same directory you are in while running PARI/GP. By examining this file, you can see the limiting behavior of the probabilities over time as the chance to shine approaches the expected value of \approx 1/12.2 \text{ days} (another set of calculations not asked for in the statement of the problem). But there is a clear maximum at day 17 with a chance to shine of \approx 8.451\%, as you can see in this snippet of the output file:

2 0.010000000000000000000000000000000000000
3 0.019900000000000000000000000000000000000
4 0.029503000000000000000000000000000000000
5 0.038623870000000000000000000000000000000
6 0.047098557100000000000000000000000000000
7 0.054791952138999999999999999999999999999
8 0.061603835291470000000000000000000000000
9 0.067472483459970699999999999999999999998
10 0.072375725623108842999999999999999999998
11 0.076329454074908884749999999999999999998
12 0.079383816112694389540299999999999999997
13 0.081617499375789496299642999999999999997
14 0.083130665539032827220429549999999999997
15 0.084037168251357251499712578699999999997
16 0.084456706395836517722907309162999999997
17 0.084507515173309360387921132513869999997
18 0.084300094688822103007229782362676299996
19 0.083932333825561971488875645354793178996
20 0.083486224781764267400384344044402259626
21 0.083026200066502305891420278215012420796
22 0.082598976626996366200920282448253594185
23 0.082234674962842870122315077776895294629
24 0.081948903291329620204862399957767369783
25 0.081745461136546946713671878881667012704
26 0.081619320859135601056771908582577814413
27 0.081559583015775679234911544521974318831
28 0.081552162539795666561298686044850671013
29 0.081582036816964185114928930607826380337
30 0.081634963354108899257474604059547122470
31 0.081698645042514095653415406927112399269

To check my analytical solution, I implemented a Monte Carlo experiment in C++ with the following code:

#include<iostream>
#include<fstream>
#include<ctime>
#include<cstdlib>

using namespace std;

int main()
{
  srand(time(NULL));
  ofstream outfile, randcheck_file;
  unsigned long long num_days=50;
  unsigned long long num_trials=10000000;
  unsigned long long* day_count;
  unsigned long long rand_check[100]={};
  double rand_num;
  double chance;
  unsigned long long status=5;
  outfile.open("shinecheck.txt");
  randcheck_file.open("shine_rand_check.txt");
  day_count = new unsigned long long [num_days];

  for(unsigned long long i=0; i<num_trials; i++)
    {
      if((double)i/(double)num_trials>=(double)status/100.0)
	{
	  cout<<status<<"%-"<<flush;
	  status=status+5;
	}
      chance=1.0;
      for(int j=0; j<num_days; j++)
	{
	  rand_num=(double)rand();
	  while(rand_num==(double)RAND_MAX)
	    rand_num=(double)rand();
	  rand_num=rand_num/(double)RAND_MAX;

	  //checking random number spread
	  for(int m=0; m<100; m++)
	    {
	      if(rand_num<(double)(m+1)/100.0)
		rand_check[m]++;
	    }
	  //done checking random number spread

	  if(rand_num < chance/100.0)
	    {
	      chance=1.0;
	      day_count[j]++;
	    }
	  else
	    {
	      chance=chance+1.0;;
	    }
	}
    }

  for(int k=0; k<num_days; k++)
    {
      outfile << (k+2) << " " << (double)day_count[k]/(double)num_trials << endl;
    }

  for(int i=0; i<100; i++)
    randcheck_file << i+1 << " " << (double)(rand_check[i]*100)/((double)(num_days*num_trials)) << endl;

  cout<<endl;
  outfile.close();
  randcheck_file.close();
  delete[] day_count;

  return 0;
}

This program basically recreates the events using random numbers and stores the results over 10 million trials, finally outputting each day's percent "true"s to a file "shinecheck.txt" which should be very similar to the analytical solution. The above code also includes a random number counter to ensure that the random numbers were spread evenly across the interval. The analytical and experimental solutions agree very closely, as you can see in the graph below.

So we have rather long-windedly arrived at our confident solution.