Big Data in Education (Part 2 of 2)

The following is Part 2 of 2 on data in education. The first article introduces the concept and application of data in education. The second article looks at recent movements by the Department of Education in data mining, modeling and learning systems.

Big data analytics are coming to public education. In 2012, the US Department of Education (DOE) was one of several agencies sharing a $200 million initiative to begin applying big data analytics to their respective functions. The DOE targeted its $25 million share of the budget toward efforts to understand how students learn at an individualized level. This segment reviews the efforts enumerated in the draft paper released by the DOE on its big data analytics.

The ultimate goal of incorporating big data analytics in education is to improve student outcomes, as determined by common metrics like end-of-grade testing, attendance, and dropout rates. Currently, the education sector applies big data analytics to create “learning analytic systems”, here defined as a connected framework of data mining, modeling, and use-case applications.

The hope is that these systems will offer educators better, more accurate information to answer the “how” question in student learning. Is a student performing poorly because she is distracted by her environment? Does a failing mark on the end-of-year test mean that the student did not fully grasp the year’s material, or was she having an off day? Learning analytics can help educators answer some of these tough, real-world questions.

Data Mining to Answer Questions

Educational data mining is a major part of the move toward big data learning analytics. Recent trends in education have allowed researchers to amass large volumes of unstructured data. Structured data has been collected for years in the education sector, typically in the form of grades or attendance records. New methods of interactive learning have produced more unstructured data through intelligent tutoring systems, simulations, and learning games. This allows for the collection of richer data sets than previously possible, creating new opportunities for research into students’ learning environments.

Educational data has several unique characteristics. As the DOE summarizes:

…[E]ducational data is … hierarchical. Data at the keystroke level, the answer level, the session level, the student level, the classroom level, the teacher level, and the school level are nested inside one another. (DOE: Learning Analytics, pg. 18, 2012)

Thus, when a student answers a single question, several variables are being simultaneously analyzed.
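The nesting described in the DOE quote can be pictured as a set of containers inside containers. The following is only a minimal sketch in Python, not the DOE's data model: every class name, field name, and helper function here is a hypothetical chosen for illustration, and the teacher and school levels are reduced to a single `teacher_id` field to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Keystroke:
    key: str
    timestamp_ms: int  # hypothetical: milliseconds since the session started


@dataclass
class Answer:
    question_id: str
    correct: bool
    keystrokes: List[Keystroke] = field(default_factory=list)


@dataclass
class Session:
    session_id: str
    answers: List[Answer] = field(default_factory=list)


@dataclass
class Student:
    student_id: str
    sessions: List[Session] = field(default_factory=list)


@dataclass
class Classroom:
    teacher_id: str
    students: List[Student] = field(default_factory=list)


def count_keystrokes(classroom: Classroom) -> int:
    """Walk down every level of the hierarchy and count the keystrokes."""
    return sum(
        len(answer.keystrokes)
        for student in classroom.students
        for session in student.sessions
        for answer in session.answers
    )


def fraction_correct(student: Student) -> float:
    """Share of a student's answers, across all sessions, that were correct."""
    answers = [a for s in student.sessions for a in s.answers]
    if not answers:
        return 0.0
    return sum(a.correct for a in answers) / len(answers)
```

A single answered question therefore carries information at several levels at once: its own keystrokes, the session it belongs to, the student, and the classroom.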

Time is also an important factor. Do long gaps between answers translate into better answers? Does a student spend too much time on the first parts of an exam only to rush the latter parts?
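One way to put a number on the pacing question is to compare the average time spent per question in the first half of an exam with the average in the second half. The sketch below assumes per-question timings are already available as a list of seconds; the function name `pacing_ratio` and the half-and-half split are illustrative choices, not a method taken from the DOE paper.

```python
from statistics import mean
from typing import List


def pacing_ratio(seconds_per_question: List[float]) -> float:
    """Mean time on the first half of the exam divided by mean time on the second half.

    A value well above 1.0 suggests the student lingered early and rushed later.
    """
    if len(seconds_per_question) < 2:
        raise ValueError("need at least two questions to compare halves")
    midpoint = len(seconds_per_question) // 2
    first_half = seconds_per_question[:midpoint]
    second_half = seconds_per_question[midpoint:]
    second_mean = mean(second_half)
    if second_mean == 0:
        raise ValueError("second half has zero recorded time")
    return mean(first_half) / second_mean
```

For a student who spends 90, 80, 30, and 20 seconds on four questions, the ratio is 85 / 25 = 3.4, a sign of rushing toward the end. A student with even pacing would score close to 1.0.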

The order, sequence, and context in which the questions are answered provide even greater amounts of data researchers can use to uncover patterns in student learning. Students may perform better when asked a series of increasingly difficult but related questions rather than random selections of questions from a common pot. The move toward adaptive testing in the GRE (standardized testing for graduate school) reflects this trend.
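To make the idea of adaptive sequencing concrete, here is a deliberately simple "staircase" rule: step the difficulty up after a correct answer and down after a wrong one. This is an illustrative assumption only. It is not how the GRE or any particular product selects questions, and the 1-to-5 difficulty scale and function names are made up for the sketch.

```python
from typing import List


def next_difficulty(current: int, was_correct: bool,
                    lowest: int = 1, highest: int = 5) -> int:
    """Move one level up after a correct answer, one level down after a wrong one."""
    step = 1 if was_correct else -1
    # Clamp so the level never leaves the allowed range.
    return max(lowest, min(highest, current + step))


def run_sequence(responses: List[bool], start: int = 3) -> List[int]:
    """Return the difficulty level served before each response, plus the final level."""
    levels = [start]
    for was_correct in responses:
        levels.append(next_difficulty(levels[-1], was_correct))
    return levels
```

Starting at level 3, a student who answers correct, correct, wrong, correct would be served levels 3, 4, 5, 4 and end at level 5. Real adaptive systems typically use statistical models of question difficulty rather than a fixed step.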

Researchers can use all of this data to answer important questions about what makes the best learning environment for students. Answering these academic questions can help educators create models of student learning.

How the data is collected is important for its future usability. A challenge in receiving this influx of data will be standardizing it on the front end so it can be usefully dissected. This does not mean converting unstructured data to structured data, but rather adopting intuitive methods of categorizing incoming information, similar to how YouTube has users categorize their videos during an upload. The DOE would need to be a standard-bearer for organizing how this information is incorporated into databases for modeling purposes.

User Knowledge and Behavior Modeling

Monitoring “how” a student tests has enabled researchers to model student behavior effectively. Beyond simply getting the correct answer, how a student works toward that goal can be just as important:

• How long has the student taken between questions?

• What kinds of questions has the student previously gotten correct or wrong?

• What kind of hints does the student benefit from most?

Monitoring these interactions can help create a behavior profile for each student, giving educators a view of the specific processes that student goes through in order to grasp the material.
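The three questions above map naturally onto a small summary computed from an interaction log. The sketch below assumes a hypothetical log format (the `Interaction` fields and the `build_profile` function are inventions for illustration) and uses plain averages, which is far simpler than what a production tutoring system would do.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Interaction:
    question_type: str                # e.g. "fractions" (hypothetical label)
    correct: bool
    seconds_since_previous: float     # time elapsed since the prior question
    hint_type: Optional[str] = None   # None when no hint was requested


def build_profile(log: List[Interaction]) -> Dict[str, object]:
    """Summarize pacing, accuracy per question type, and the most helpful hint type."""
    if not log:
        raise ValueError("log is empty")

    results_by_type = defaultdict(list)
    results_by_hint = defaultdict(list)
    for event in log:
        results_by_type[event.question_type].append(event.correct)
        if event.hint_type is not None:
            results_by_hint[event.hint_type].append(event.correct)

    def rate(outcomes: List[bool]) -> float:
        return sum(outcomes) / len(outcomes)

    best_hint = None
    if results_by_hint:
        # The hint type followed most often by a correct answer.
        best_hint = max(results_by_hint, key=lambda h: rate(results_by_hint[h]))

    return {
        "mean_seconds_between_questions":
            sum(e.seconds_since_previous for e in log) / len(log),
        "accuracy_by_question_type":
            {t: rate(v) for t, v in results_by_type.items()},
        "most_helpful_hint": best_hint,
    }
```

A profile like this is only descriptive. It says which hint type tends to precede correct answers for one student, not that the hint caused the improvement.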

Creating adaptive learning systems using these student behavior profiles can enhance the effect. Armed with the information of “how” a student learns, developers can tailor future questions and hints to increase the retention and synthesis of information. Developers like DreamBox Learning and Knewton have created and released their own versions of an adaptive learning system. Their software provides millions of ways students can work through the program based on how they complete their assignments.

Education Use-Cases

Educators and researchers have developed five major techniques for extracting value from education’s big data.

• Prediction – understanding the likelihood of expected events. For example, knowing when a student intentionally misses a question despite having the ability to answer it.

• Clustering – discovering data points that naturally go together. Useful for grouping students of similar academic ability.

• Relationship Mining – discovering relationships between variables and encoding them for later use. Useful for detecting whether a student reliably gets the correct answer after seeking help.

• Distillation for human judgment – building visual models for human parsing to aid machine learning models.

• Discovery with models – meta-study using models developed through big data analytics.

Researchers believe these techniques will help educators more effectively guide students toward a more individualized learning process.
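Of the five techniques, clustering is the easiest to show in a few lines. The sketch below groups students by a single test score using a basic one-dimensional k-means loop. It is a toy under stated assumptions: real systems would cluster on many features and would likely use an established library, and the function name, the deterministic starting centroids, and the fixed iteration count are all choices made for this illustration.

```python
from typing import List


def kmeans_1d(scores: List[float], k: int, iterations: int = 20) -> List[int]:
    """Assign each score to one of k clusters and return one cluster index per score."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of scores")

    # Deterministic start: spread the initial centroids across the sorted scores.
    ordered = sorted(scores)
    centroids = [
        ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)
    ]

    labels = [0] * len(scores)
    for _ in range(iterations):
        # Assignment step: each score joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: abs(s - centroids[c])) for s in scores
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [s for s, label in zip(scores, labels) if label == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels
```

With scores of 52, 55, 58, 88, 91, and 95 and `k=2`, the first three students land in one group and the last three in another.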

What is striking is how these education use-cases overlap with other common uses of big data analytic systems. For example, commercial banks may use clustering algorithms to build profiles of purchases that allow them to more readily detect fraud. These uses provide a framework for the creation of useful learning analytic systems.

Learning Analytic Systems

Implementing all of these techniques leads to the creation of a learning analytics system, which holds the promise of improving the academic outcomes of students. While similar systems have been in place in the commercial sector for years, the education sector has many challenges ahead before it truly becomes a success story.

Acquiring the data presents its own set of challenges. For college-age and mature students, data collection is not a major issue. For school-age students, however, collectors must clear some hurdles to avoid potentially identifying individual students. Some hurdles are legal, while others are ethical. Regardless, this slows down the overall process of collection.

The number and skill of data collectors is also an issue. Websites’ use of cookies is a common method by which companies can uniformly gather information. The DOE, however, would have to rely on thousands of school districts and networks of researchers to refine and certify data.

Even with its innate challenges, learning analytics represents a quantum leap in creating a customized learning environment for each student: custom-fit learning curricula handed daily to each student, early detection systems designed to find the warning signs of potential disenrollment and dropouts, and multi-year learning plans designed to challenge rather than induce boredom. All of this is made possible through the use of big data analytics.
