Welp, Pocket shuts down tomorrow despite our pleas for it to stay. While migrating[1] all of my saved articles, I noticed that I’ve got almost 900 saved articles spanning nearly 7 years. That’s a goldmine of stuff-I-like data! Some quick analysis using xsv[2]:
𝄢 unzip pocket.zip && xsv headers part_000000.csv
1 title
2 url
3 time_added
4 tags
5 status
𝄢 xsv sample 1 part_000000.csv | xsv flatten
title The Uncertain Future of American Libraries
url https://mek.fyi/posts/the-uncertain-future-of-american-libraries
time_added 1678243022
tags
status unread
𝄢 xsv count part_000000.csv
878
𝄢 xsv select time_added part_000000.csv |
xsv stats | xsv select min,max | xsv flatten |
while read header value; do
echo -n "$header "
date -I -d "@$value"
done
min 2018-07-12
max 2025-06-26
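The same time_added column can also be bucketed by calendar month to see when saving activity clusters. Below is a minimal sketch that was not part of the original session. It assumes GNU date (for the `@<epoch>` syntax and `-f -`, which reads one date per line from stdin), it bins in UTC rather than local time, and the function name saves_by_month is hypothetical.

```shell
# Tally saves per calendar month (01-12), pooled across all years.
# Input: Unix timestamps on stdin, one per line. Non-numeric lines, such as
# the "time_added" header that xsv select prints, are dropped by the sed step.
# Assumes GNU date: "@<epoch>" parses a Unix timestamp and "-f -" reads
# dates from stdin. -u bins in UTC rather than local time.
saves_by_month() {
  # Keep purely numeric lines and prefix each with "@" for GNU date.
  sed -n 's/^\([0-9][0-9]*\)$/@\1/p' |
    # Print the two-digit month of each timestamp.
    date -u -f - +%m |
    # Count identical months (sort first so uniq -c sees them adjacent).
    sort | uniq -c |
    # Reorder uniq's "count month" output into "month count".
    awk '{ print $2, $1 }'
}
```

Usage, in the same style as the session above: `xsv select time_added part_000000.csv | saves_by_month`. Because the sed filter discards the header line, no separate header-stripping step is needed.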
Experiment
Recalling Simon Willison’s recent geoguessing challenge for o3, I wondered: “What might o3 be able to tell me about myself based solely on a list of URLs I’ve chosen to save?” I opened a temporary chat with o3 (temporary chats don’t use ChatGPT’s memory feature) and prompted:
I’m going to share a list of articles that I’ve saved to Pocket, a read-it-later app. I want you to use that information to “profile” me. Tell me everything about myself that you can possibly infer from this data set. Here’s an example of possible dimensions of that profile, but feel free to include other things that you can infer: age, sex, geographic location, education level, industry/field of work, job level/seniority, income bracket, political orientation, risk tolerance, learning style, information diet, relationship status, parental status and kids’ ages, health concerns, major life transitions, seasonal patterns in interests, etc.
Results
o3’s response in full[3]:
Thought for 13 seconds
- The user has shared a broad range of article preferences, which could actually give me some good insight into their demographic and interests. Technical articles related to software, AI, cybersecurity, and finance suggest strong analytical and professional interests. The content about family life, homeschooling, and Catholic practices points to someone who’s likely a parent, possibly with a strong focus on family values. It makes sense to assume they might be male, based on the profiles aggregating to this, but I’m not completely sure yet. I’ll keep this in mind!
- I’m pulling together some details! The user’s mix of technical articles and Catholic family topics suggests they could either be a father interested in homemaking or a female parent with a tech background. Their location seems to be coastal VA, based on property listings, and their age looks to be around 30-40. With a career in software engineering, likely at a senior level, their income probably exceeds 150k. They seem to have a self-directed learning style and a moderate to high risk tolerance in investments. They’re likely married with multiple young kids.
Below is a “best-guess” portrait distilled from the themes, sources, and timing of the articles you save. None of these inferences is certain, but together they form a coherent narrative.
Core demographics
| Dimension | Likely inference | Key signals |
|---|---|---|
| Age | Mid-30s to early-40s | Staff-level career content, FIRE math, parenting toddlers |
| Sex | Male | Strong focus on engineering / hacking culture + “dad”-oriented parenting pieces |
| Location | Coastal VA | Repeated local property links and local-news saves |
| Education | BS/MS in Computer Science or similar | Deep dive security/AI papers, Unix history, academic PDFs |
| Industry & Role | Senior/staff software engineer – specializing in security & infra | PortSwigger, fuzzing, exploit-dev, “Ways Staff Engineers Get Stuck”, Honeycomb career ladders |
| Income bracket | ~$150-220 k household | Salary-band posts, six-figure FIRE calcs, high-earner tax strategies |
| Relationship & family | Married, 3-4 young kids (incl. at least one toddler) | Twin-mom sleep articles, Catholic “family table”, homeschooling & microschools pieces |
Mindset & preferences
| Dimension | Inference | Rationale |
|---|---|---|
| Politics | Fiscally conservative / civil-libertarian with traditionalist social leanings | Bogleheads & MMM frugality + Catholic/First Things pieces, EFF privacy, skepticism of Big Tech censorship |
| Risk tolerance | Career: high (security research, side projects); Financial: moderate (indexing, 4 % rule, self-sufficiency) | Mix of daring technical reading and ultra-conservative portfolio design |
| Learning style | Self-directed, text-heavy, long-form + auditory | Uses Pocket/RSS, collects PDFs, long podcasts & conference talks at 1× |
| Information diet | 60 % deep-tech, 20 % personal finance/FIRE, 10 % parenting/homemaking, 10 % faith & culture | Counts of saved links across themes |
| Major life arc | Phase shift: from pure IC hacker → technical leadership & family-centric life, aiming for FI | Saves on management paths, burnout, early-retirement reflections |
| Seasonal spikes | Late Dec–Jan: career-planning & goal-setting; May–Aug: homeschooling, beach/sea-level, property; Sep–Oct: security-conference binge (DEF CON, Black Hat) | Timestamp clusters in list |
Subtler clues
- Faith: Regular saves from Catholic & conservative Christian sites suggest active practice.
- Privacy/Self-hosting streak: Mail-in-a-box, “thrown in the towel on self-hosted email” links → you experiment, then pragmatically revert.
- Maker tendencies: Woodworking workbench plans, PVC-pipe acoustics, E-ink DIY frames — you like hands-on projects that complement screen work.
- Health: Zone-2 training, sleep hacks, Huberman podcasts — pragmatic bio-optimization, not fad-chasing.
- Writing impulse: Many “write more”, “blogging for the hell of it”, “personal branding” saves hint you intend to publish more of your own thinking.
Quick “user manual” for you
- Motivators: Autonomy, mastery, frugality, family time, living your faith.
- Blind spots: Can slip into infinite-research mode (huge unread queue); be mindful of analysis-paralysis.
- Best mediums: Long-form articles + podcasts during commutes/chores; interactive docs for code.
- Big goals on deck: Finish FIRE plan, ship a public-facing security tool, formalize homeschool curriculum, and blog consistently.
Take what resonates and discard the rest—but if it feels eerily accurate, that’s because your bookmarks tell quite a story.
Reaction
Yeah, pretty accurate. Perhaps even eerily so. Upon first reading, I was surprised when it nailed age range, location, family size, etc.—details that I didn’t expect to be represented in a data set that I thought mostly consisted of Hacker News posts. Shows you how much can be inferred by a simple trail of “likes.”
Worth noting that o3 tended to perform better (i.e., the response felt more accurate) when I pasted the CSV data directly into the prompt body. Sending the CSV as a file attachment caused o3 to fixate on using Python to sample and analyze the data rather than simply “taking it all in,” and in my experience it yielded a less compelling narrative at the end.
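Pasting instead of attaching just means the CSV text has to end up inside the message itself. A minimal sketch of assembling that message in the shell, assuming the profiling prompt has been saved to a plain-text file; the function name build_message and the file name prompt.txt are hypothetical, not part of the original workflow.

```shell
# Build one paste-able message: the profiling prompt, some blank-line
# separation, then the raw CSV export, so the data lands in the prompt body
# rather than in an attachment. Both arguments are placeholder file paths.
build_message() {
  prompt_file=$1   # text of the profiling prompt
  csv_file=$2      # e.g. the Pocket export part_000000.csv
  # Stop (and return non-zero) if either file can't be read.
  cat -- "$prompt_file" &&
    printf '\n\n' &&
    cat -- "$csv_file"
}
```

For example, `build_message prompt.txt part_000000.csv > message.txt`, then copy it with `pbcopy < message.txt` on macOS or `xclip -selection clipboard < message.txt` on X11. Concatenating in the shell keeps the prompt and the data in a single message instead of two separate pastes.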
Implications
Is there a lesson here? We already know that advertising companies profile us based on expressed interests, but up until recently it felt like only Google or Facebook had access to analysis capabilities strong enough to draw meaningful conclusions from disparate data points. I’d agree with Simon that the more interesting takeaway is “the fact that this technology is openly available to almost anyone.”
Case in point: I’ll be using this profile to power a personal content recommendation system.
[1] I’ve moved over to Wallabag, and also took the opportunity to switch from Inoreader to FreshRSS. Happy with both. Self-hosting these services is much easier in 2025 than it was for my last self-hosting initiative years ago; Caddy has been a huge part of that.
[2] xsv has been hands-down my favorite way to quickly explore CSV data. It looks like it’s no longer maintained as of 2 months ago, but it feels pretty feature-complete.
[3] I mean, mostly intact. Did you really expect me to leave in the part about “dandruff remedies”?