the process

Methodology

Visualizations

Deliverables

Berkeley Unified School District

Exploring where Berkeley's public schools fall short in their performance

Timeline

5 months

Project Type

Research

Role

Data Analyst

Tools

Deepnote, Google Suite

overview

I transformed 5+ years of Berkeley Unified School District’s raw data into clear visual stories about student outcomes.

Through Berkeley’s Data Science Discovery Program, I worked on a local project where I coded Python visualizations and designed the final deliverables—a poster and webpage.

This project not only strengthened my coding confidence but also deepened my passion for using data to drive positive social impact.

results

Overall, there is no significant decrease in the achievement gap since 2014.

Although our data doesn't follow the exact same cohort of students throughout, and we're missing some data due to COVID, we see a similar trend of achievement gaps never falling below 20%.

21%

was the lowest achievement gap seen across socioeconomically disadvantaged and non-socioeconomically disadvantaged students.

57.1%

was the highest achievement gap seen across socioeconomically disadvantaged and non-socioeconomically disadvantaged students.

33.7% & 35.8%

were the final achievement gaps collected for the same class of students from 4th grade to 11th grade testing (for their English & Math results).

Interestingly, several achievement gaps between certain subgroups were decreasing in the years leading up to 2019. Unfortunately, by 2023 those gaps had widened again.

deliverables & results

I uncovered achievement gaps of up to 50%, and my findings were presented to the committee responsible for the following year's district education plans.

I developed 5+ static visualizations, showcased through a webpage and poster. The poster was featured at DS Discovery’s Fall 2023 Symposium, and I presented the webpage and visualizations to the district planning committee.

5+ static visualizations

But how did I get here?

background

Berkeley Unified School District (BUSD) strives for academic excellence but continues to grapple with longstanding achievement gaps.

BUSD is the local public school system in Berkeley. It has a history of being one of the highest-achieving school systems in Alameda County and the state of California. For decades, however, there has also been an underlying story of inequity. Despite significant investments in educational programs, the achievement gaps persist, indicating that current strategies may not be yielding the desired results.

A key effort to address this is the Local Control Accountability Plan (LCAP), where the Parent Advisory Committee (PAC) reviews student performance data and provides feedback to the district. However, interpreting this data remains a challenge, making it difficult to drive meaningful improvements.

our challenge

How can we effectively interpret the data to understand its real-world implications for schools, and then clearly present these insights to the PAC?

This project aimed to use evidence-based storytelling to make student performance data more accessible and actionable for stakeholders through 3 key goals.

Analyze data from the California Assessment of Student Performance and Progress (CAASPP) - Smarter Balanced Assessment (SBA) to identify patterns in achievement gaps.

Create visualizations to convey insights related to student performance in English Language Arts (ELA) and Mathematics, focusing on socioeconomic disparities.

Develop a reproducible workflow for ongoing analysis and future iterations.

students of interest

Rather than focus on race or ethnicity, we wanted to look at an intersectional population through an economic lens.

Since the data was collected by each school and could not be reduced to each student, we followed five cohorts.

1

Fourth grade cohort each year

2

Eighth grade cohort each year

3

Pseudo-cohort* each year

4

All 11 elementary schools

5

All 3 middle schools

*Because we could not track the same individual students as they advanced to each new grade, we followed one BUSD class of students, starting as 3rd graders in 2015 and ending as 11th graders in 2023.
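As a rough illustration of how a pseudo-cohort like this can be laid out, the sketch below pairs each test year with the grade that one class would be in. It is not the project's actual code: the function name, the set of tested grades, and the set of skipped years are assumptions made for the example.

```python
# Hypothetical helper; names and defaults are illustrative, not the project's code.
TESTED_GRADES = {3, 4, 5, 6, 7, 8, 11}  # assumption: grades that sit the state test
SKIPPED_YEARS = {2020, 2021}            # assumption: years without usable results


def pseudo_cohort(start_year=2015, start_grade=3, end_grade=11):
    """Return (year, grade) pairs for one class advancing one grade per year."""
    pairs = []
    for offset in range(end_grade - start_grade + 1):
        year = start_year + offset
        grade = start_grade + offset
        # Keep only the years in which this class would have test results.
        if grade in TESTED_GRADES and year not in SKIPPED_YEARS:
            pairs.append((year, grade))
    return pairs


print(pseudo_cohort())
```

Each pair can then be used to pull the matching grade-level results for that year, which approximates following one class without student-level records.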

From here, we considered whether each group fit into 1 of 2 subgroups:

Socioeconomically Disadvantaged (SED) students

The SED group includes students who meet at least one of two criteria:

  • Are eligible for free or reduced-price school meals.

  • Neither parent graduated from high school.

Non-Socioeconomically Disadvantaged (Non-SED) students

This group includes students who meet both criteria:

  • Have parents who have a high school diploma.

  • Are not eligible for free or reduced-price meals.

the data

I collected, explored, and cleaned 7 years of CAASPP data specific to BUSD.

All data was collected straight from CAASPP's website. I also dug through their website and spoke to a representative to confirm code terminology.

I turned lines of text into clean, easy-to-read datasets.
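To give a sense of what this kind of cleaning step can look like, here is a minimal sketch using only Python's standard library. The delimiter, the column names, the "*" marker for withheld values, and the sample rows are all placeholders invented for illustration; they are not the actual CAASPP file layout or real results.

```python
import csv
import io

# Placeholder sample: delimiter, column names, "*" marker, and values are
# assumptions for illustration only, not real CAASPP data.
RAW = """school^year^grade^group^pct_met_or_above
School A^2022^4^SED^35.5
School A^2022^4^Non-SED^*
"""


def clean_rows(text, delimiter="^", missing="*"):
    """Parse delimited text into dicts with typed fields and None for missing values."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        pct = row["pct_met_or_above"].strip()
        rows.append({
            "school": row["school"].strip(),
            "year": int(row["year"]),
            "grade": int(row["grade"]),
            "group": row["group"].strip(),
            # Withheld or empty percentages become None instead of a number.
            "pct_met_or_above": None if pct in ("", missing) else float(pct),
        })
    return rows


for record in clean_rows(RAW):
    print(record)
```

Keeping missing values as None, rather than 0, avoids accidentally pulling averages down when a result was simply not reported.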

After cleaning the data, I began creating visualizations with Python libraries (Matplotlib, Seaborn) to represent achievement gaps for each cohort.

Visualizations

Using Python's visualization libraries (Seaborn, Matplotlib), I visualized the percentage of students meeting or exceeding state standards.

Note: SED = Socioeconomically disadvantaged students; Non-SED = Non-socioeconomically disadvantaged students

4th graders each year (2015 - 2023)

Amongst fourth graders, there is no significant decrease in the achievement gap in the past 8 years. 

  • The highest achievement gaps in ELA and Math topics were both recorded in 2015: 52% (ELA) and 51% (Math).

  • The achievement gap had been decreasing before 2020, but its increase in 2022-2023 suggests that new BUSD programs, e.g., 2020 Vision, have not provided a long-term, tangible benefit.

    • However, because state testing was not done in 2020-2021, this is difficult to confirm from CAASPP data alone.

8th graders each year (2015 - 2023)

Amongst eighth graders, there is ultimately no significant decrease in the achievement gap as students leave the eighth grade and enter high school.

  • The highest achievement gaps in ELA and Math topics were both recorded in 2022: 46.12% (ELA) and 43.8% (Math).

    • This could be explained by students returning to school, and to this exam, after COVID-19 procedures.

  • While at least 70% of non-SED students consistently meet or exceed state standards, the share of SED students doing the same remains below 50%.

Pseudo-cohort each year (followed the same, one class from 2015 - 2022)

Achievement gaps decrease significantly as students progress through each grade level, but results for both SED and non-SED students decline.

  • This suggests that SED students are able to catch up to their non-SED counterparts as they each progress in school.

  • Unfortunately, both sets of students were affected by the Covid-19 crisis, indicated by a sharp decline in both ELA and Math results in 11th grade (2022).

Since newer programs were only introduced recently (after 2018/6th grade), it's difficult to say whether they contributed to the decreasing achievement gap.

All 11 elementary schools (2022 - 2023)

From 2022 to 2023, state test results did not indicate a significant decrease in the achievement gap.

  • Non-SED students meeting or exceeding the state standard averaged 78.6% (ELA) and 76.4% (Math), while SED students' results averaged 35.2% (ELA) and 33.7% (Math).

  • Berkeley Arts Magnet at Whittier and Rosa Parks continued to have 2 of the lowest achievement gaps, while Washington and Cragmont continued to have 2 of the higher achievement gaps.

All 3 middle schools (2022 - 2023)

From 2022 to 2023, state test results did not indicate a significant decrease in the achievement gap, with students performing in a similar range as the elementary schools.

  • Non-SED students meeting or exceeding the state standard averaged 78.6% (ELA) and 65.1% (Math), while SED students' results averaged 35.2% (ELA) and 28.4% (Math).
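The gap behind these averages can be worked out directly. The sketch below assumes the achievement gap is the non-SED rate minus the SED rate, in percentage points; the function name is hypothetical, and the inputs are the elementary and middle school averages reported above.

```python
def achievement_gap(non_sed_pct, sed_pct):
    """Gap in percentage points (assumed definition: non-SED rate minus SED rate)."""
    return round(non_sed_pct - sed_pct, 1)


# (non-SED %, SED %) averages as reported in the two sections above.
reported = {
    ("elementary", "ELA"): (78.6, 35.2),
    ("elementary", "Math"): (76.4, 33.7),
    ("middle", "ELA"): (78.6, 35.2),
    ("middle", "Math"): (65.1, 28.4),
}

gaps = {key: achievement_gap(*pair) for key, pair in reported.items()}

for (level, subject), gap in gaps.items():
    print(f"{level} {subject}: {gap} percentage points")
```

Under that assumption, the elementary gaps come to 43.4 (ELA) and 42.7 (Math) percentage points, and the middle school gaps to 43.4 (ELA) and 36.7 (Math).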

Final Deliverables

The final findings of this project were shared with peers and district stakeholders through an interactive webpage and a detailed poster presentation.

Wishes

Getting district-specific data to enhance our understanding

Unfortunately, challenges in obtaining district-wide data limited our ability to gain a more comprehensive understanding of student performance. Our initial goal was to compare state assessment results with district-wide exams to better evaluate the effectiveness of educational programs. Additionally, this limitation prevented us from accounting for differences in student representation across schools, making it difficult to fully assess variations in achievement gaps.

Final Mentions

Special thanks to my team and the UC Berkeley Discovery Program!

Each week, I had the opportunity to discuss the data and new findings with a PAC member leading the project, a UC Berkeley professor, and a fellow Data Science major who assisted with final statistics and interactive visualizations. This project was also made possible through UC Berkeley’s Discovery Program, which connects students with data science research opportunities.