Adetunji Dahunsi

Tweetus Deletus, tweets and json sequences

This you?

TJ Dahunsi

Mar 07 2021 · 3 min read

Categories:
kotlin

I recently downloaded my Twitter archive, and was it a trip down memory lane; I’ve had a Twitter account for almost 10 years! I laughed, reminisced, and also cringed. I’ve changed quite a bit, and like to think of myself as more poised and refined now, so I needed to do a self-audit.

So here I am with an archive of 11K+ tweets in a JSON file some 65K lines long. I have the following options for parsing it:

  1. Read the entire file into memory as a JSON blob with Gson or something similar.

  2. Sequentially parse each line, manually keep track of where in the JSON tree I am, and lazily extract and process the information I need as I find it.
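The trade-off between these two options is the usual eager versus lazy one. Here is a minimal sketch of that difference using only the Kotlin standard library on plain lines of text; it does no JSON parsing, and both function names are hypothetical:

```kotlin
import java.io.File

// Option 1 in miniature: eager. Every line is loaded into a List before counting starts.
fun countMatchesEagerly(file: File, needle: String): Int =
    file.readLines().count { needle in it }

// Option 2 in miniature: lazy. Lines are streamed one at a time as a Sequence,
// and useLines closes the file once the block returns.
fun countMatchesLazily(file: File, needle: String): Int =
    file.useLines { lines -> lines.count { needle in it } }
```

Both return the same count; the difference is how much of the file sits in memory while they work.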

The solution, as with most things, turned out to be somewhere in the middle: use a library to sequentially parse the tweets and delete them as I go along. I opted to use Moshi and its excellent JsonReader class for parsing. Going through its docs, however, the example used is classically imperative:

    public User readUser(JsonReader reader) throws IOException {
      String username = null;
      int followersCount = -1;

      reader.beginObject();
      while (reader.hasNext()) {
        String name = reader.nextName();
        if (name.equals("name")) {
          username = reader.nextString();
        } else if (name.equals("followers_count")) {
          followersCount = reader.nextInt();
        } else {
          reader.skipValue();
        }
      }
      reader.endObject();
      return new User(username, followersCount);
    }

There’s nothing wrong with this, and as it turns out, it’s often the best way to go for strictly performance reasons. Unfortunately for performance, I have the luxuries of time, CPU, RAM and the slightly overzealous need to use functional programming constructs and Kotlin. With that said, what do JSON parsing and functional programming in Kotlin have in common? Glad you asked.

The expression while(reader.hasNext()) loops through entries in a json mapping, which is imperative speak for a sequence of tuples. The above then can be represented more f̶a̶n̶c̶i̶l̶y̶ functionally with:

    inline fun <reified T> JsonReader.jsonSequence(
        crossinline open: JsonReader.() -> Unit,
        crossinline close: JsonReader.() -> Unit,
        crossinline nextFunction: JsonReader.() -> T?,
    ): Sequence<T> = generateSequence { open(this); null }
        .plus(generateSequence { nextFunction(this) })
        .plus(generateSequence { close(this); null })
        .filterIsInstance<T>()

Notice both the open and close functions are lazily evaluated by representing them as empty sequences with the actual sequence sandwiched in the middle. Let’s set about actually parsing that json then, shall we?
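To see that laziness in action without pulling in Moshi, here is a minimal sketch of the same three-part shape over a plain Iterator. The names bracketed and demo are hypothetical stand-ins, and the log list exists only to record when each side effect fires:

```kotlin
// Hypothetical stand-in for jsonSequence: the same sandwich, but over any Iterator.
fun <T : Any> Iterator<T>.bracketed(log: MutableList<String>): Sequence<T> =
    // Empty sequence: logs "open" the first time the result is pulled from
    generateSequence<T> { log += "open"; null }
        // The actual elements; returning null ends this part
        .plus(generateSequence<T> { if (hasNext()) next() else null })
        // Empty sequence: logs "close" only once the middle part is exhausted
        .plus(generateSequence<T> { log += "close"; null })

fun demo(): List<String> {
    val log = mutableListOf<String>()
    val numbers = listOf(1, 2, 3).iterator().bracketed(log)
    log += "built"     // building the sequence has not opened anything yet
    numbers.toList()   // pulling everything runs open, the elements, then close
    return log         // [built, open, close]
}
```

One consequence of this design: the closing step only runs if the sequence is consumed to the end. Stopping early, for example with take(2), leaves only "open" in the log.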

    typealias TweetDetails = Map<String, String>
    val tweetFields = setOf(
        "id",
        "full_text",
        "created_at",
        "retweet_count",
        "favorite_count"
    )

    // The archive is one big array; each element yields one TweetDetails
    fun JsonReader.tweetDetails(): Sequence<TweetDetails> = jsonSequence(
        open = JsonReader::beginArray,
        close = JsonReader::endArray
    ) {
        if (hasNext()) nextTweetDetails()
        else null
    }

    // Each array element is an object wrapping the tweet under the "tweet" key.
    // skipValue() returns Unit, which keeps the sequence going and is filtered out below;
    // last() drains the sequence so that endObject is called
    fun JsonReader.nextTweetDetails(): TweetDetails = jsonSequence(
        open = JsonReader::beginObject,
        close = JsonReader::endObject,
    ) {
        if (hasNext()) when (nextName()) {
            "tweet" -> nextTweet()
            else -> skipValue()
        }
        else null
    }
        .filterIsInstance<TweetDetails>()
        .last()

    // Collect only the fields of interest as name to value pairs
    fun JsonReader.nextTweet(): TweetDetails = jsonSequence(
        open = JsonReader::beginObject,
        close = JsonReader::endObject,
    ) {
        if (hasNext()) when (val name = nextName()) {
            in tweetFields -> name to nextString()
            else -> skipValue()
        }
        else null
    }
        .filterIsInstance<Pair<String, String>>()
        .toMap()

Using the generic JsonReader.jsonSequence function, I can easily represent all the tweets I want to process as a lazy sequence. Deleting them then becomes as simple as:

    fun Path.tweetJsonSequence(): Sequence<TweetDetails> {
        // Buffer the file once; JsonReader reads straight from the buffered source
        val source = Okio.buffer(Okio.source(toFile().inputStream()))
        val reader = JsonReader.of(source)
        return reader.tweetDetails()
            // Close the reader lazily, after the last tweet has been consumed
            .plus(generateSequence(reader::close).take(1))
            .filterIsInstance<TweetDetails>()
    }

    val properties = Properties().apply { load(FileInputStream(configPath)) }
    val config = Config(properties)
    val deleter: (Int, TweetDetails) -> DeletionStatus = config.tweetDeleter()
    val tweetsToDelete: Sequence<TweetDetails> = config.tweetsToDeletePath.tweetJsonSequence()
    val statusWriter = CSVWriter(config.deletedTweetsPath.toFile().bufferedWriter())

    tweetsToDelete
        .filter(config::canDelete)
        .mapIndexed(deleter)
        .onEach(::println)
        .filter(DeletionStatus::deleted)
        .forEach { status ->
            statusWriter.writeNext(status.tweetDetails.csv)
        }

    statusWriter.close()

    println("DONE")
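Because everything here is a Sequence, each tweet travels through the whole pipeline before the next one is read from the file. The sketch below illustrates that ordering with the standard library only. The shape of DeletionStatus is an assumption (the code above only shows that it exposes deleted and tweetDetails), and stubDeleter and runPipeline are hypothetical; no Twitter API call is made:

```kotlin
typealias TweetDetails = Map<String, String>

// Assumed shape, for illustration only
data class DeletionStatus(val index: Int, val tweetDetails: TweetDetails, val deleted: Boolean)

// Hypothetical deleter: pretends only tweets with zero favorites get deleted
fun stubDeleter(index: Int, tweet: TweetDetails): DeletionStatus =
    DeletionStatus(index, tweet, deleted = tweet["favorite_count"] == "0")

// Runs a pipeline of the same shape as above, recording the order of events,
// and returns the ids of the tweets reported as deleted
fun runPipeline(tweets: Sequence<TweetDetails>, events: MutableList<String>): List<String> =
    tweets
        .onEach { events += "read ${it["id"]}" }
        .mapIndexed(::stubDeleter)
        .onEach { events += "processed ${it.tweetDetails["id"]}" }
        .filter(DeletionStatus::deleted)
        .map { it.tweetDetails.getValue("id") }
        .toList()
```

Fed two tweets with ids 1 and 2, the recorded events interleave as read 1, processed 1, read 2, processed 2, rather than reading both tweets first.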

Source for Tweetus Deletus can be found here:

A young impressionable me

Used to tweet

With reckless glee

Age however

Has stayed my feet

With the above

I am born anew

Less in fear of the dreaded

“This you?”
