Two years ago, to make it easier to browse the articles I'd saved across various sites, I consolidated them all into Pinbox, and then promptly forgot about it. Recently the need to save articles came back, so I opened it up again: apart from a new paid tier, basically nothing had changed. Then I discovered that an invite code let me create nested collections, so I kept freeloading with a clear conscience. Not long after, they apparently caught on: even though I was nowhere near the collection limit, I could no longer create new collections. A year's membership isn't expensive, but given Pinbox's rather bare-bones interface and two years of near-zero feature changes, allow me to decline.
So began the search for a Pinbox replacement. At first I considered self-hosting a note-taking app, but the features were too thin and I don't have a server, so I dropped the idea. Then I found Notion. It's a note app, but it fits my needs perfectly. There are plenty of Notion feature write-ups online, so I'll skip the introduction; to me its biggest strengths are the generous free tier and the modular, block-based design.
## Notion API
Normally the story would end there: I just switched apps. But one day, lying in bed scrolling Douban, I stumbled on the fact that Notion has an API, and sat bolt upright. Ever since trying Notion I'd wanted to import my 花瓣网 (Huaban) images and my NetEase Cloud Music playlists into it, and once I saw there was an API, the tinkering journey began.
The Notion API can be driven from Python. I'd never written Python before, but it turned out to look a lot like a Node.js crawler, so relying on my meager JS knowledge and limited wits, I ~~copied~~ borrowed code from the article "Notion → 支付宝&微信 → 账单". After losing a few more hairs, I ~~grudgingly~~ happily set off on the crawling journey.
A problem showed up on the very first run, as problems do when writing code; `pip install requests` fixed it.
Then came a Notion API oddity: image URLs are required to have a file extension, while Huaban serves images from extension-less links. So much for playing nicely. Then I remembered Notion can import Markdown files. The workaround: first save each image link into a .md file, import those into Notion, then update the links inside via the API. After testing that this works, I started with the Huaban crawler.
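The first half of that workaround, one Markdown file per image, boils down to a tiny helper. A minimal sketch; the `_fw236/format/webp` suffix is Huaban's thumbnail variant (as used in the crawler below), and `"abc123"` is a made-up CDN key, not a real pin:

```python
def pin_to_markdown(key):
    """Markdown body for one pin: thumbnail line, then the full-size image."""
    base = "https://hbimg.huabanimg.com/" + key
    return "![](" + base + "_fw236/format/webp)\n![](" + base + ")\n"

# One .md file per pin; "abc123" is a placeholder key.
print(pin_to_markdown("abc123"))
```

Notion's importer accepts these because the `.md` extension sidesteps the URL check; the real image links get patched in later.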
## 花瓣网 (Huaban)
Huaban has tags, but no way to exclude keywords from a search, which makes finding images a chore.
### 1. Collecting image links and metadata
In short: save each image link into its own .md file, and write each image's tags, Huaban link, and source URL into a summary file, 花瓣汇总.csv.
Code:
```python
import requests
import re

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
    'X-Request': 'JSON',
    "cookie": "cookie"  # your Huaban cookie
}
req = requests.get(url="画板链接", headers=headers)  # board URL
htmlPage = req.content
for p in range(1, 25):  # crawl 24 batches of the board
    # The pin data is embedded in the page as a JSON-ish "pins" blob
    prog = re.compile(r'"pins".*')
    appPins = prog.findall(htmlPage.decode("utf-8"))
    # Map JSON literals to Python names so eval() can parse the blob
    null = None
    true = True
    false = False
    result = eval(appPins[0][7:-2])
    images = []
    for i in result:
        # Thumbnail plus full-size image, in Markdown image syntax
        info = ("![](https://hbimg.huabanimg.com/" + str(i["file"]["key"]) + "_fw236/format/webp)"
                + "\n"
                + "![](https://hbimg.huabanimg.com/" + str(i["file"]["key"]) + ")")
        with open("E:/office/py/爬虫/" + str(i['pin_id']) + ".md", "a", encoding="utf-8", newline="") as f:
            f.write(str(info) + "\n")
        # Pad/trim the tag string so every row has exactly five tag fields
        tagnull = str(i["tags"])
        if tagnull.count(",") > 4:
            tagnull = tagnull[0:24]
        while tagnull.count(",") < 4:
            tagnull = tagnull + ",null"
        with open("E:/office/py/爬虫/花瓣汇总.csv", "a", encoding="utf-8", newline="") as fo:
            linknull = str(i["link"])
            if linknull == "":
                linknull = "None"
            fo.write(str(i['pin_id']) + ","
                     + tagnull.replace("[]", "null").replace("[", "").replace("]", "").replace("'", "") + ","
                     + "https://huaban.com/pins/" + str(i['pin_id']) + ","
                     + linknull + "\n")
        images.append(i['pin_id'])
    # Next batch: the load-more URL is keyed on the last pin id seen
    htmlPage = requests.get(url="加载链接前缀" + str(images[-1]) + "&limit=20&wfl=1", headers=headers).content  # load-more URL prefix
```
With that written, run it.
If all goes well, the output files look like this:
### 2. Processing the data

After importing the .md files into Notion, each page's page-id is needed, and this is another Notion API quirk: there's no way to fetch every page-id through the API. No crawler needed, though; a bit of JS in the browser console does the job:
```javascript
function notion() {
    var a = 0;
    while (a < 206) {  // 206 = number of pages in the collection
        // Each collection item links to its page; slice(41) strips the
        // notion.so prefix, leaving just the page-id
        var links = document.getElementsByClassName('notion-selectable notion-page-block notion-collection-item')[a]
            .firstElementChild.href.slice(41);
        a += 1;
        console.log(links);
    }
}
```
Then match the page-ids against the summary file.
Code:
```python
import csv

filepath = "E:/office/py/date/花瓣汇总.csv"
filepath2 = "E:/office/py/date/notionlink.csv"
names1 = []
tags = []
names2 = []
links = []
fo = open("E:/office/py/date/test.txt", "w", encoding="utf-8")
# Summary file: pin id, then the five tag fields plus the two links
with open(filepath, "r", encoding="utf-8", newline="") as f:
    csvreader = csv.reader(f)
    for row1 in csvreader:
        names1.append(row1[0])
        tags.append(row1[1] + "," + row1[2] + "," + row1[3] + "," + row1[4] + ","
                    + row1[5] + "," + row1[6] + "," + row1[7])
# Page-id file: pin id, then the Notion page-id from the console script
with open(filepath2, "r", encoding="utf-8", newline="") as f:
    csvreader1 = csv.reader(f)
    for row2 in csvreader1:
        names2.append(row2[0])
        links.append(row2[1])
a = 0
while a < 数目:  # 数目 = total number of rows
    pageid = names2.index(names1[a])
    seq = [links[pageid], tags[a]]
    fo.write(str(seq).replace("[", "").replace("]", "").replace("'", "") + "\n")
    a += 1
fo.close()
```
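As an aside, `list.index()` rescans the whole page-id list for every row; building a dict first does the same matching in a single pass. A small sketch under the same assumptions (pin id in the first column of both files; the rows below are made up, not real pins):

```python
# Rows mimic the two CSVs: (pin_id, joined tags) and (pin_id, notion page-id).
huaban_rows = [("111", "cat,art"), ("222", "dog,photo")]
notion_rows = [("222", "page-bbb"), ("111", "page-aaa")]

page_by_pin = dict(notion_rows)  # pin_id -> page-id, built once
matched = [(page_by_pin[pin], tags) for pin, tags in huaban_rows]
print(matched)  # [('page-aaa', 'cat,art'), ('page-bbb', 'dog,photo')]
```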
After matching:
### 3. Updating the pages
After all the detours, the pages can finally be updated directly from 路径.csv.
Code:
```python
import csv
import requests

filepath2 = "E:/office/py/date/路径.csv"
pageid = []
tag1 = []
tag2 = []
tag3 = []
tag4 = []
tag5 = []
links = []
link2 = []
with open(filepath2, "r", encoding="utf-8", newline="") as f:
    csvreader1 = csv.reader(f)
    for row2 in csvreader1:
        pageid.append(row2[0])
        tag1.append(row2[1])
        tag2.append(row2[2])
        tag3.append(row2[3])
        tag4.append(row2[4])
        tag5.append(row2[5])
        links.append(row2[6])
        link2.append(row2[7])

class notionDemo():
    def add_bill(a):
        # Patch one page: five tags, the Huaban link, and the source URL
        body = {
            "properties": {
                "Tags": {"multi_select": [
                    {"name": tag1[a]},
                    {"name": tag2[a]},
                    {"name": tag3[a]},
                    {"name": tag4[a]},
                    {"name": tag5[a]},
                ]},
                "link": {"url": links[a]},
                "源站": {"url": link2[a]},
            },
        }
        r = requests.request(
            "PATCH",
            "https://api.notion.com/v1/pages/" + pageid[a],
            json=body,
            headers={"Authorization": "Bearer " + "自己token",  # your integration token
                     "Notion-Version": "2021-05-13"},
        )
        print(r.text)

a = 0
while a < 数目:  # 数目 = number of rows in 路径.csv
    notionDemo.add_bill(a)
    a += 1
```
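One caveat when looping over hundreds of pages: Notion's API rate-limits integrations (its docs cite an average of about three requests per second), so it pays to pause between calls. A sketch of a throttle helper; the 0.4-second delay is my own choice, not something from the original:

```python
import time

def throttled(calls, delay=0.4):
    """Run each zero-argument callable in order, sleeping between calls."""
    results = []
    for call in calls:
        results.append(call())
        time.sleep(delay)
    return results

# e.g. wrap the update loop:
# throttled([lambda a=a: notionDemo.add_bill(a) for a in range(rows)])
```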
## 网易云音乐 (NetEase Cloud Music)
Never mind that NetEase playlists can't be browsed by album cover; you can't tag individual songs either. For playlists like mine, mostly English and Japanese, finding a song means listening through them one by one.
### 1. Collecting the data
Cover image URLs do have extensions, so they can be added directly. The official NetEase API has aggressive anti-crawling measures, though, so it's better to self-host an API instance (GitHub repo).
Code:
```python
import json
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
    'X-Request': 'JSON',
    "cookie": "自己的cookies"  # your NetEase cookies
}
# Self-hosted API instance + playlist id
req = requests.get(url="https://自部署api/playlist/track/all?id=歌单链接", headers=headers).content.decode('utf-8')
response_dict = json.loads(req)
song = response_dict["songs"]
for i in song:
    with open("E:/office/py/爬虫/网易云/我喜欢的音乐.csv", "a", encoding="utf-8", newline="") as fo:
        # There's a bracket in here I couldn't figure out how to deal with;
        # this could probably just be written as i["ar"][0]["name"]
        ar = str(i['ar']).replace("[{'id': ", "").replace("'name':", "").replace("''':", "").replace(" 'tns': [], 'alias': []}]", "").replace("'", "").replace(" ", "")
        fo.write(str(i["name"]).replace(",", "-") + ","
                 + "https://music.163.com/#/song?id=" + str(i["id"]) + ","
                 + str(i['al']['name']).replace(",", "-") + ","
                 + "https://music.163.com/#/album?id=" + str(i['al']['id']) + ","
                 + str(i['al']['picUrl']) + ","
                 + ar + "\n")
```
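Incidentally, the string surgery on `i['ar']` can indeed be avoided: `ar` is a list of artist objects, so it can be indexed directly. A sketch with a made-up song dict shaped like the API's `songs` entries:

```python
# `song` mimics a subset of one element of the API's "songs" array.
song = {"ar": [{"id": 1, "name": "ArtistA"}, {"id": 2, "name": "ArtistB"}]}

# First artist only, as the comment in the crawler suggests:
first = song["ar"][0]["name"]
# Or every artist, joined:
everyone = ",".join(a["name"] for a in song["ar"])
print(first, everyone)  # ArtistA ArtistA,ArtistB
```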
That yields the content to be added:
### 2. Adding to Notion
One thing to note: the template has to be created first. If you want to add, say, an album field, you must first create a 专辑 property in the Notion database.
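Before inserting anything, you can sanity-check the schema: retrieving the database (`GET https://api.notion.com/v1/databases/{id}`, with the same `Authorization` and `Notion-Version` headers as the other calls here) returns its `properties` map, which can be compared against the fields you plan to write. A minimal sketch of the comparison step, with a made-up fragment of such a response:

```python
def missing_properties(schema, wanted):
    """Names from `wanted` that are absent from a database schema's properties."""
    props = schema.get("properties", {})
    return [name for name in wanted if name not in props]

# Made-up fragment of a "retrieve database" response:
sample = {"properties": {"Name": {}, "专辑": {}, "封面": {}}}
print(missing_properties(sample, ["Name", "专辑", "歌手", "封面"]))  # ['歌手']
```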
Code:
```python
import csv
import requests

class notionDemo():
    def add_bill(a, b, c, d, e, f, g):
        # a=song name, b=song URL, c=album name, d=album URL,
        # e=cover URL, f=artist id, g=artist name
        body = {
            "parent": {"database_id": "自己的database"},  # your database id
            "properties": {
                "Name": {"title": [{"type": "text", "text": {"content": b}}]},
                "专辑": {"type": "rich_text",
                         "rich_text": [{
                             "type": "text",
                             "text": {"content": c, "link": {"url": d}},
                         }]},
                "name": {"type": "rich_text",
                         "rich_text": [{
                             "type": "text",
                             "text": {"content": a, "link": {"url": b}},
                         }]},
                "歌手": {"type": "rich_text",
                         "rich_text": [{
                             "type": "text",
                             "text": {"content": g,
                                      "link": {"url": "https://music.163.com/#/artist?id=" + f}},
                         }]},
                "封面": {
                    "type": "files",
                    "files": [{
                        "name": e,
                        "type": "external",
                        "external": {"url": e},
                    }]
                },
            },
        }
        r = requests.request(
            "POST",
            "https://api.notion.com/v1/pages",
            json=body,
            headers={"Authorization": "Bearer " + "自己token",  # your integration token
                     "Notion-Version": "2021-05-13"},
        )
        print(r.text)

filepath = "E:/office/py/爬虫/网易云/我喜欢的音乐.csv"
with open(filepath, "r", encoding="utf-8", newline="") as f:
    csvreader1 = csv.reader(f)
    for row2 in csvreader1:
        a = row2[0]
        b = row2[1]
        c = row2[2]
        d = row2[3]
        e = row2[4]
        f = row2[5]
        g = row2[6]
        notionDemo.add_bill(a, b, c, d, e, f, g)
```
What follows is the familiar routine again.
The page after everything is added:
That about wraps it up, though it seems this could be hooked up to GitHub Actions so the pages update automatically. It isn't much use to me: for one, I listen on NetEase far less now, so there are no new songs to add; for another, Huaban image links still can't be added to pages directly. So I'll leave it here.
Many thanks to the following articles, whose code I ~~copied~~ learned from: