But, What Should I Study?

Two different kinds of people/problems:

  1. What's the maximum GPA I can expect in the time remaining?
  2. What's the minimum amount of study I need to get a certain GPA?

These are equivalent problems, with the constraints and the objective function exchanged.

TL;DR: We formulate the problem with linear constraints and a non-differentiable objective function.

Problem Formulation

A general overview of the problem looks like this.
Maximize:

* Expected GPA  

Subject to:

* Timing Constraints (exam schedules, maximum hours per day, etc.)
* Course Credits
* Past Performance of Seniors (grades received by our seniors)

Timing Constraints

let t = (t1, t2, t3, ...) be the times allocated,
let d = (d1, d2, d3, ...) be the exam dates,
corresponding to courses c = (c1, c2, c3, ...).

Then t is subject to a linearly constrained system.

Why?

Assuming: one can only study one subject at a time.
Assuming: d1 < d2 < d3 < ..., which we can always make true by sorting the courses by exam date.

Let today's date be N. Then $$ t_1 \le d_1 - N $$ $$ t_2 \le d_2 - N - t_1 $$ $$ t_3 \le d_3 - N - t_1 - t_2 $$ $$ t_4 \le d_4 - N - t_1 - t_2 - t_3 $$

Equivalently, for every prefix: $$ \sum_{i=1}^{k} t_i \le d_k - N \qquad k = 1, \ldots, n $$

or, in matrix form, $$ Kt \le d - N $$

where K is the n×n lower-triangular matrix of ones: $$ K_{ij} = \begin{cases} 1 & i \ge j \\ 0 & \text{otherwise} \end{cases} $$
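
As a quick sanity check, here is a minimal sketch of the constraint system (the dates and the candidate allocation are made-up numbers):

In [ ]:
import numpy as np

n = 4
K = np.tril(np.ones((n, n)))  # row k encodes t_1 + ... + t_k <= d_k - N

d = np.array([3, 7, 10, 14])  # hypothetical exam dates, already sorted
N = 0                         # today's date
t = np.array([2, 3, 3, 4])    # a candidate allocation of study days

print(np.all(K @ t <= d - N))  # True iff the allocation is feasible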

Expected GPA

The expected GPA is the credit-weighted average of a student's expected grade points across their courses.

In [2]:
def Expected_GPA(user):
    # credit-weighted average of the expected grade points across all courses
    ans = sum(c.credits * user.expected_grade_point(c) for c in user.courses)
    ans /= sum(c.credits for c in user.courses)
    return ans
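
To see the weighting in action, a minimal sketch with hypothetical SimpleNamespace stand-ins (not the real student and course classes defined below):

In [ ]:
from types import SimpleNamespace

algo = SimpleNamespace(credits=4)
maths = SimpleNamespace(credits=3)
user = SimpleNamespace(
    courses=[algo, maths],
    expected_grade_point=lambda c: 9 if c is algo else 8,
)
print(Expected_GPA(user))  # (4*9 + 3*8) / 7 ≈ 8.57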

Expected Grade Point

The expected grade point of a student in a course depends on the grading scheme of the course and the student's percentile in it.

In [3]:
class student:
    def expected_grade_point(self, course):
        # feed this student's percentile in the course through the course's
        # learned percentile-to-grade mapping
        return course.grading_policy(self.percentile(course))

Course Grading Policy

This function is learned from our seniors' past data: we measure the percentile of the students who received each grade and build the mapping accordingly.

In [4]:
class course:
    def _learn_grading_policy(self, course_data):
        # course_data: the grade points earned by past students in this course
        self.grading_policy = learn_grading_policy(course_data)
In [5]:
def learn_grading_policy(course_data):
    # course_data: the integer grade points (0-10) earned by past students
    counts = [0] * 11
    for grade in course_data:
        counts[grade] += 1
    # cumulative fraction of students at or below each grade point
    cumulative = []
    running = 0
    for count in counts:
        running += count
        cumulative.append(running / len(course_data))

    def policy(percentile):
        # the grade point whose cumulative bucket contains this percentile
        for grade, fraction in enumerate(cumulative):
            if percentile < fraction:
                return grade
        return 10  # a percentile of exactly 1.0 maps to the top grade

    return policy
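
Hypothetical usage, with made-up grades for eight past students:

In [ ]:
policy = learn_grading_policy([10, 9, 9, 8, 8, 7, 6, 5])
print(policy(0.95))  # a top-percentile student expects a 10
print(policy(0.10))  # a bottom-percentile student expects a 5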
    

Assumptions

  • We use only historical grade data, which we can (hopefully) access directly from the academic office, instead of gathering information on a per-course basis.
  • The grading policy of a course doesn't vary too dramatically from year to year.
  • The grading policy assumes that there exists a mapping from percentile to grade.

Percentile in a course

Depends on

  • Expected Total Marks in the course
  • Expected Marks Distribution

Expected Total Marks

Is "marks received till now" + "expected marks from upcoming exams".

Expected upcoming marks

We need some way to measure how much we can affect our marks through the effort we put in, e.g. in some subjects it's better to invest time because the predictability is higher.

So how do we solve this problem?

God time

Is the maximum amount of time a user thinks they need to cover the syllabus completely.

An observation

Imagine a user is able to study for x% of their god time.
Does that mean they'll score x% of the marks? No.

Let's model marks as $$ \text{marks} = f(x) + \text{randomness} $$ and the magnitude of the randomness as $$ |\text{randomness}| = g(\text{subject}) $$

Now, a person can't get more than 100% of the marks or less than 0%, so marks get clipped at both ends; near 100% any positive deviation is wasted while negative deviations still hurt, which drags the expected marks down.

In [6]:
import numpy as np
import matplotlib.pyplot as plt

def plot(stds):

    def f(x):
        # deterministic part: marks grow linearly with the fraction of god time
        return x

    def g(r):
        # zero-mean Gaussian randomness with per-subject standard deviation r
        return np.random.normal(0, r, size=x.shape)

    def clip(A):
        # marks can't exceed 100% or fall below 0%
        A[A > 1] = 1
        A[A < 0] = 0
        return A

    def marks():
        return clip(x + g(r))

    def expected(fn, n=1000):
        # Monte Carlo estimate of the expected value of fn()
        return sum(fn() for _ in range(n)) / n

    plt.figure(figsize=(16, 8))
    plt.ylabel('Fraction Marks')
    plt.xlabel('Fraction of time spent')

    x = f(np.linspace(0, 1, 100))

    for r in stds:
        y = expected(marks)
        plt.plot(x, y, label=f"randomness: {r}")
    plt.legend(loc='upper left')
    plt.show()
In [7]:
plot(stds=[0, 0.5, 1])
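
With randomness 0 the expected marks follow the identity line; as the randomness grows, the clipping flattens the curve toward 0.5. The same hour of study therefore buys less expected improvement in a high-randomness subject, which is exactly the predictability effect described above.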

But how do you calculate randomness?

Problem

  • We don't know what fraction of their god time our seniors were able to give.

Possible solution

  • Assume x = CGPA/10 for the first iteration.
  • We'll be able to collect actual values of x from the current year.

Justification:

  • The CGPA is an average over many courses, so any per-course variation should not have a massive effect on it.

Expected marks distribution

Created using past data and the current year's marks data, e.g. the marks people got this year up to the mid-sem.

Scaling

Imagine we only have data up to the mid-semester; how do we get the final expected marks distribution? There are several options:

  1. Assuming the distribution should have the same "shape"

    • Naive scaling of the marks interval, distributing marks uniformly (a sketch follows below)
  2. Assuming each mark comes from a fixed probability distribution

    • the distribution is learnt using the marks up to now
    • so the total marks distribution is just repeated draws from this distribution
  3. Taking the HCF of the two ranges and assuming the marks come from that probability distribution.
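
A minimal sketch of option 1, with made-up numbers: if the mid-sem was out of 40 and the course total is 100, each observed mark is stretched by the ratio of the two ranges, on the assumption that the distribution keeps the same shape.

In [ ]:
def scale_distribution(midsem_marks, midsem_total, course_total):
    # naively stretch the marks interval by the ratio of the totals
    factor = course_total / midsem_total
    return [m * factor for m in midsem_marks]

print(scale_distribution([12, 25, 31, 38], midsem_total=40, course_total=100))
# [30.0, 62.5, 77.5, 95.0]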

Not fixed yet; we're looking for possible better solutions.

  • How would we incorporate last year's data into any of these models?

Conclusion

We model the problem of maximizing CGPA as a quadratic program:

  1. f: percentile -> grade is quadratic.
  2. g: marks -> percentile is linear.
  3. h: time -> marks is linear.

The parameters of f, g, and h are learned from last year's data.

Using off-the-shelf QP solvers, we get a system that makes better predictions as the number of users grows.
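
As a rough end-to-end sketch: the coefficients a and b below are hypothetical stand-ins for whatever the learned f, g, and h compose into per course, and cvxpy is just one off-the-shelf solver this could run on.

In [ ]:
import numpy as np
import cvxpy as cp

n = 4
d = np.array([3.0, 7.0, 10.0, 14.0])  # hypothetical exam dates (days from today)
N = 0                                 # today's date
K = np.tril(np.ones((n, n)))          # cumulative-time constraint matrix

# Hypothetical per-course model: course i contributes a_i*t_i^2 + b_i*t_i
# to the objective, with a_i < 0 so the maximization stays concave.
a = np.array([-0.05, -0.03, -0.04, -0.02])
b = np.array([0.9, 0.7, 0.8, 0.6])

t = cp.Variable(n, nonneg=True)
objective = cp.Maximize(cp.sum(cp.multiply(a, cp.square(t)) + cp.multiply(b, t)))
constraints = [K @ t <= d - N]
cp.Problem(objective, constraints).solve()

print(t.value)  # suggested study-time allocation per course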

Thank You!

@arjunbazinga & @ameykpatel