But, What Should I Study?

Two different kinds of people/problems:

  1. What's the maximum GPA I can expect in the time remaining?
  2. What's the minimum amount of study I need to get a certain GPA?

These are equivalent problems, with the constraints and the objective function exchanged.

TL;DR: We formulate the problem with linear constraints and a non-differentiable objective function.

Problem Formulation

A general overview of the problem looks like this.
Maximize:

* Expected GPA  

Subject to:

* Timing Constraints (exam schedules, maximum hours per day, etc.)
* Course Credits
* Past Performance of Seniors (grades received by our seniors)

Timing Constraints

let t = (t1, t2, t3, ...) be the times allocated,
let d = (d1, d2, d3, ...) be the exam dates,
corresponding to courses c = (c1, c2, c3, ...).

Then t is subject to a linearly constrained system.

Why?

Assuming: one can only study one subject at a time.
Assuming: d1 < d2 < d3 < ..., which we can always make true by sorting the courses by exam date.

Let today's date be N. Then $$ t_1 \le d_1 - N $$ $$ t_2 \le d_2 - N - t_1 $$ $$ t_3 \le d_3 - N - t_1 - t_2 $$ $$ t_4 \le d_4 - N - t_1 - t_2 - t_3 $$

Equivalently, for every prefix: $$ \sum_{i=1}^{k} t_i \le d_k - N \qquad k = 1, \ldots, n $$

or, in matrix form, $$ Kt \le d - N $$

where K is the n×n lower-triangular matrix of ones: $$ K_{ij} = \begin{cases} 1 & i \ge j \\ 0 & \text{otherwise} \end{cases} $$
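
As a quick sanity check, here is a minimal sketch of the constraint system (the dates and the candidate allocation are made-up numbers):

In [ ]:
import numpy as np

n = 4
K = np.tril(np.ones((n, n)))  # row k encodes t_1 + ... + t_k <= d_k - N

d = np.array([3, 7, 10, 14])  # hypothetical exam dates, already sorted
N = 0                         # today's date
t = np.array([2, 3, 3, 4])    # a candidate allocation of study days

print(np.all(K @ t <= d - N))  # True iff the allocation is feasible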

Expected GPA

The expected GPA is the credit-weighted average of a student's expected grade points across their courses.

In [2]:
def Expected_GPA(user):
    # credit-weighted average of the expected grade points across all courses
    ans = sum(c.credits * user.expected_grade_point(c) for c in user.courses)
    ans /= sum(c.credits for c in user.courses)
    return ans
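
To see the weighting in action, a minimal sketch with hypothetical SimpleNamespace stand-ins (not the real student and course classes defined below):

In [ ]:
from types import SimpleNamespace

algo = SimpleNamespace(credits=4)
maths = SimpleNamespace(credits=3)
user = SimpleNamespace(
    courses=[algo, maths],
    expected_grade_point=lambda c: 9 if c is algo else 8,
)
print(Expected_GPA(user))  # (4*9 + 3*8) / 7 ≈ 8.57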

Expected Grade Point

The expected grade point of a student in a course depends on the grading scheme of the course and the student's percentile in it.

In [3]:
class student:
    def expected_grade_point(self, course):
        # feed this student's percentile in the course through the course's
        # learned percentile-to-grade mapping
        return course.grading_policy(self.percentile(course))

Course Grading Policy

This function is learned from our seniors' past data: we measure the percentile of the students who received each grade and build the mapping accordingly.

In [4]:
class course:
    def _learn_grading_policy(self, course_data):
        # course_data: the grade points earned by past students in this course
        self.grading_policy = learn_grading_policy(course_data)
In [5]:
def learn_grading_policy(course_data):
    # course_data: the integer grade points (0-10) earned by past students
    counts = [0] * 11
    for grade in course_data:
        counts[grade] += 1
    # cumulative fraction of students at or below each grade point
    cumulative = []
    running = 0
    for count in counts:
        running += count
        cumulative.append(running / len(course_data))

    def policy(percentile):
        # the grade point whose cumulative bucket contains this percentile
        for grade, fraction in enumerate(cumulative):
            if percentile < fraction:
                return grade
        return 10  # a percentile of exactly 1.0 maps to the top grade

    return policy
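
Hypothetical usage, with made-up grades for eight past students:

In [ ]:
policy = learn_grading_policy([10, 9, 9, 8, 8, 7, 6, 5])
print(policy(0.95))  # a top-percentile student expects a 10
print(policy(0.10))  # a bottom-percentile student expects a 5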
    

Assumptions

  • We use only historical grade data, which we can (hopefully) access directly from the academic office, instead of gathering information on a per-course basis.
  • The grading policy of a course doesn't vary too dramatically from year to year.
  • The grading policy assumes that there exists a mapping from percentile to grade.

Percentile in a course

Depends on

  • Expected Total Marks in the course
  • Expected Marks Distribution

Expected Total Marks

Is "marks received till now" + "expected marks from upcoming exams".

Expected upcoming marks

We need some way to measure how much we can affect our marks through the effort we put in, e.g. in some subjects it's better to invest time because the predictability is higher.

So how do we solve this problem?

God time

Is the maximum amount of time a user thinks they need to cover the syllabus completely.

An observation

Imagine a user is able to study for x% of their god time.
Does that mean they'll score x% of the marks? No.

Let's model marks as $$ \text{marks} = f(x) + \text{randomness} $$ and the magnitude of the randomness as $$ |\text{randomness}| = g(\text{subject}) $$

Now, a person can't get more than 100% of the marks or less than 0%, so marks get clipped at both ends; near 100% any positive deviation is wasted while negative deviations still hurt, which drags the expected marks down.

In [6]:
import numpy as np
import matplotlib.pyplot as plt

def plot(stds):

    def f(x):
        # deterministic part: marks grow linearly with the fraction of god time
        return x

    def g(r):
        # zero-mean Gaussian randomness with per-subject standard deviation r
        return np.random.normal(0, r, size=x.shape)

    def clip(A):
        # marks can't exceed 100% or fall below 0%
        A[A > 1] = 1
        A[A < 0] = 0
        return A

    def marks():
        return clip(x + g(r))

    def expected(fn, n=1000):
        # Monte Carlo estimate of the expected value of fn()
        return sum(fn() for _ in range(n)) / n

    plt.figure(figsize=(16, 8))
    plt.ylabel('Fraction Marks')
    plt.xlabel('Fraction of time spent')

    x = f(np.linspace(0, 1, 100))

    for r in stds:
        y = expected(marks)
        plt.plot(x, y, label=f"randomness: {r}")
    plt.legend(loc='upper left')
    plt.show()
In [7]:
plot(stds=[0, 0.5, 1])
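
With randomness 0 the expected marks follow the identity line; as the randomness grows, the clipping flattens the curve toward 0.5. The same hour of study therefore buys less expected improvement in a high-randomness subject, which is exactly the predictability effect described above.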

But how do you calculate randomness?

Problem

  • We don't know what fraction of their god time our seniors were able to give.

Possible solution

  • Assume x = CGPA/10 for the first iteration.
  • We'll be able to collect actual values of x from the current year.

Justification:

  • The CGPA is an average over many courses, so any per-course variation should not have a massive effect on it.

Expected marks distribution

Created using past data and the current year's marks data, e.g. the marks people got this year up to the mid-sem.

Scaling

Imagine we only have data up to the mid-semester; how do we get the final expected marks distribution? There are several options:

  1. Assuming the distribution should have the same "shape"

    • Naive scaling of the marks interval, distributing marks uniformly (a sketch follows below)
  2. Assuming each mark comes from a fixed probability distribution

    • the distribution is learnt using the marks up to now
    • so the total marks distribution is just repeated draws from this distribution
  3. Taking the HCF of the two ranges and assuming the marks come from that probability distribution.
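
A minimal sketch of option 1, with made-up numbers: if the mid-sem was out of 40 and the course total is 100, each observed mark is stretched by the ratio of the two ranges, on the assumption that the distribution keeps the same shape.

In [ ]:
def scale_distribution(midsem_marks, midsem_total, course_total):
    # naively stretch the marks interval by the ratio of the totals
    factor = course_total / midsem_total
    return [m * factor for m in midsem_marks]

print(scale_distribution([12, 25, 31, 38], midsem_total=40, course_total=100))
# [30.0, 62.5, 77.5, 95.0]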

Not fixed yet; we're looking for possible better solutions.

  • How would we incorporate last year's data into any of these models?

Conclusion

We model the problem of maximizing CGPA as a quadratic program:

  1. f: percentile -> grade is quadratic.
  2. g: marks -> percentile is linear.
  3. h: time -> marks is linear.

The parameters of f, g, and h are learned from last year's data.

Using off-the-shelf QP solvers, we get a system that makes better predictions as the number of users grows.
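
As a rough end-to-end sketch: the coefficients a and b below are hypothetical stand-ins for whatever the learned f, g, and h compose into per course, and cvxpy is just one off-the-shelf solver this could run on.

In [ ]:
import numpy as np
import cvxpy as cp

n = 4
d = np.array([3.0, 7.0, 10.0, 14.0])  # hypothetical exam dates (days from today)
N = 0                                 # today's date
K = np.tril(np.ones((n, n)))          # cumulative-time constraint matrix

# Hypothetical per-course model: course i contributes a_i*t_i^2 + b_i*t_i
# to the objective, with a_i < 0 so the maximization stays concave.
a = np.array([-0.05, -0.03, -0.04, -0.02])
b = np.array([0.9, 0.7, 0.8, 0.6])

t = cp.Variable(n, nonneg=True)
objective = cp.Maximize(cp.sum(cp.multiply(a, cp.square(t)) + cp.multiply(b, t)))
constraints = [K @ t <= d - N]
cp.Problem(objective, constraints).solve()

print(t.value)  # suggested study-time allocation per course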

Thank You!

@arjunbazinga & @ameykpatel