The Self-Data Analysis Project (First Post)
If you had all of your data in one place (your purchase history, books you’ve read, food you’ve eaten, workout schedules, music playlists), what could you learn about yourself? What questions can be answered only by combining your Amazon data with your YouTube watch history, and then with your sleep history over the past 6 months?
What are the insights only you can learn about yourself, leveraging everything that’s been collected across digital products and apps you use?
Around 4 years ago, I had the idea to mine as much data about myself as possible. I wanted a sense of control over my own habits and personal information, largely as a response to my data being collected and used by any website or digital company I encounter. If I could have equal access to any data a browser/site is capable of collecting, along with the ability to interpret and analyze it— what would I learn about myself?
Over time, my thinking began to shift. Specifically, any data around things like browsing patterns, click behaviors, scroll pacing, etc. is a) anonymized by companies, meaning I can’t really know what is useful in the first place, b) likely analyzed with machine learning and AI to an extent I could never dream of approaching, and c) pretty boring. Instead, I’d like to see a dashboard of my data that really feels like me.
For further context, I also spent a year working on a data privacy team at NBCUniversal, where I helped establish and maintain user access request processes for the company. For the unfamiliar, California recently passed a landmark data privacy law called the California Consumer Privacy Act (CCPA), modeled after a similar law in the EU known as GDPR. As a result, large companies were forced to establish processes where consumers could request a copy of any information collected about them, ask that it be deleted, or receive a summary of what may be collected about them. While this is only a legal requirement for businesses that operate in California, and they only technically need to comply with requests submitted by California residents, businesses still had to build processes to handle these requests. As a result, almost any large tech company will fulfill your request even if you don’t live in California. Seeing how data privacy was managed at a large company (with a significant number of requests) was fascinating! It also reframed my approach to how I could self-analyze my personal data.
Now let’s get into the details.
Project Summary:
Create and maintain a database containing solely data about myself and use it to creatively understand and visualize my habits, interests, and online behaviors.
Topics I May Explore:
Basically, anything.
I anticipate approaching some topics with a question first, and then conducting analysis in search of the answer. What’s the dumbest purchase I’ve made this year? What types of gifts have I given people in the past, and how can I get better at it? Where should I travel next? What types of accounts do I follow on Instagram?
There’s also a lot of historical data at my disposal, where I may just play around and see what I can find. This data may include health, calorie intake/eating, sleep, entertainment, shopping, budgeting, etc. Mostly, this can be collected via CCPA requests.
For context, I’ve already downloaded the exports from numerous CCPA requests and started to sort through them. These include:
Spotify (music)
Letterboxd (movies)
Instagram (social media I use)
Facebook (social media I don’t use)
Amazon (a lot— it’s a bit too daunting to have sorted through yet)
Google/YouTube (also daunting)
Approach & Technologies:
I plan to collect data from 2 primary sources:
Data Exports provided by companies/websites I use (via CCPA requests)
Data I’ve collected myself via personal Excel docs
I will likely comb through all of the data manually to ensure it’s formatted in a way that can be joined with other sources (e.g. lowercasing text, converting time zones, etc.).
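As a rough sketch of what that cleanup could look like in Python, here are two small helpers for the examples mentioned above. The function names and sample values are hypothetical and not taken from any real export, and the fixed-offset fallback is a simplification that ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone


def normalize_text(value: str) -> str:
    """Collapse repeated whitespace and lowercase, so the same title
    matches across sources that format it differently."""
    return " ".join(value.split()).lower()


def to_utc_iso(timestamp: str, default_offset_hours: float = 0) -> str:
    """Convert an ISO-style timestamp to a UTC ISO 8601 string.

    If the timestamp carries its own UTC offset, that offset is used.
    If it is naive (no offset), it is assumed to sit at the fixed
    offset given by default_offset_hours. This assumption ignores
    daylight saving time.
    """
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        fallback = timezone(timedelta(hours=default_offset_hours))
        parsed = parsed.replace(tzinfo=fallback)
    return parsed.astimezone(timezone.utc).isoformat()


if __name__ == "__main__":
    print(normalize_text("  The   Godfather "))
    print(to_utc_iso("2021-03-01T20:15:00-08:00"))
    print(to_utc_iso("2021-03-01 20:15:00", default_offset_hours=-5))
```

Storing every timestamp in UTC and every join key in one normalized form is one way to keep later joins between sources from silently missing rows.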
I’ve decided to use SQLite databases to store and manage data. However, because SQLite doesn’t support multiple named schemas within a single database file, I’ll likely keep a separate database file for each data source. There’s a risk this solution won’t scale over time, but because I’ll be managing the data myself, I don’t anticipate working with enough data sources for this to become a major issue.
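One database per source doesn’t have to rule out cross-source queries: SQLite’s ATTACH DATABASE statement lets a single connection see several database files at once, each under its own name. Below is a minimal sketch using Python’s built-in sqlite3 module. The file names, table names, columns, and rows are all made up for illustration and don’t reflect any real export format.

```python
import sqlite3
import tempfile
from pathlib import Path


def build_demo_sources(folder: Path):
    """Create two tiny, hypothetical per-source database files."""
    spotify_path = folder / "spotify.db"
    letterboxd_path = folder / "letterboxd.db"

    conn = sqlite3.connect(str(spotify_path))
    conn.execute("CREATE TABLE plays (played_on TEXT, track TEXT)")
    conn.executemany(
        "INSERT INTO plays VALUES (?, ?)",
        [("2021-03-01", "song a"), ("2021-03-01", "song b"), ("2021-03-02", "song c")],
    )
    conn.commit()
    conn.close()

    conn = sqlite3.connect(str(letterboxd_path))
    conn.execute("CREATE TABLE diary (watched_on TEXT, film TEXT)")
    conn.execute("INSERT INTO diary VALUES (?, ?)", ("2021-03-01", "film x"))
    conn.commit()
    conn.close()

    return spotify_path, letterboxd_path


def activity_by_day(spotify_path: Path, letterboxd_path: Path):
    """Join two separate database files through one connection."""
    conn = sqlite3.connect(str(spotify_path))
    try:
        # The attached file becomes addressable as "letterboxd";
        # the file opened by connect() is always called "main".
        conn.execute("ATTACH DATABASE ? AS letterboxd", (str(letterboxd_path),))
        return conn.execute(
            """
            SELECT p.played_on,
                   COUNT(DISTINCT p.track) AS tracks_played,
                   COUNT(DISTINCT d.film)  AS films_watched
            FROM main.plays AS p
            JOIN letterboxd.diary AS d ON d.watched_on = p.played_on
            GROUP BY p.played_on
            ORDER BY p.played_on
            """
        ).fetchall()
    finally:
        conn.close()


def run_demo():
    """Build the demo files in a temporary folder and run the join."""
    with tempfile.TemporaryDirectory() as tmp:
        spotify_path, letterboxd_path = build_demo_sources(Path(tmp))
        return activity_by_day(spotify_path, letterboxd_path)


if __name__ == "__main__":
    print(run_demo())
```

The join only lines up because both tables store dates in the same text format, which is why consistent formatting across sources matters before anything is loaded.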
To create and edit database files I’ll be using DB Browser for SQLite. To query and conduct analysis, I’m currently leaning towards DataGrip (which I use at my current job, so the UI is already familiar). The data itself will be stored on an external hard drive, at least for now. It would make sense to store and manage data using AWS or another cloud service, but that seems unnecessary and expensive at the moment. A lot of my technologies may change over time.
As for data visualization… more to come. I have solid experience with Tableau but don’t currently have a license, so I plan to explore free/cheaper options. In a future post I’ll share some analysis I did around watching every Oscar nominee from 2019, but to create the charts and graphics I only used Microsoft PowerPoint. I’d like to take a more sophisticated approach moving forward, but it will take some time and exploration.
Skills I Hope to Learn:
Apart from the insight I hope to actually gain from undertaking this project, there are a lot of professional/personal reasons I want to get started.
Data Analysis. I want to better understand what types of skillsets are beneficial to analysts and get an opportunity for real-world practice. I’d also like to have a creative, fun approach to push this idea in entertaining and interesting directions.
Data Product Management. How can I establish and maintain a technology infrastructure that aids my analysis efforts? Spoiler alert— this is essentially my job at the moment, so any practice will have immediate benefits in my career.
SQL. I’ve taken basic SQL courses but am in need of more applications to practice and grow this skill.
Technical (and Very Non-Technical) Writing. After communicating in writing solely via email, text, or Slack for the past 3 years, I’d like to find my voice through writing again. This is partially my reason for blogging as the outlet for this project as opposed to video, etc. (though you never know!). I’m hoping to hone my writing and practice communicating ideas clearly and thoughtfully.
New Technology. This will hopefully be my excuse to play with and explore emerging tech.
There will be much more to come, but hopefully this serves as a sufficient outline to set expectations for the project. I’m not sure how often these posts will appear, or how quickly I’ll be able to make decent progress. I’ll start small, but at least it’s a start. I’m truly excited to get started and am grateful for you following along.