
fauci-email: a json digest of Anthony Fauci's released emails

A collection of over 3000 pages of emails sent by Anthony Fauci and his staff was released in an effort to understand the United States government response to the COVID-19 pandemic:

Anthony Fauci's Emails Reveal The Pressure That Fell On One Man
Natalie Bettendorf and Jason Leopold
BuzzFeed News, June 2, 2021
https://www.buzzfeednews.com/article/nataliebettendorf/fauci-emails-covid-response

This repository hosts an easy-to-use JSON digest of these emails, suitable for many kinds of future studies. It also includes prepackaged datasets derived from the data (networks, graphs, hypergraphs, tensors) and simple analysis scripts that demonstrate the findings in our arXiv paper.

Citation

@article{Benson-2021-fauci-emails,
  author = {Austin Benson and Nate Veldt and David F. Gleich},
  title = {fauci-email: a json digest of Anthony Fauci's released emails},
  url = {http://arxiv.org/abs/2108.01239},
  journal = {arXiv},
  year = {2021},
  pages = {2108.01239},
  volume = {cs.SI},
  note = {Code and data available from \url{https://github.com/nveldt/fauci-email}}
}

@misc{Leopold-2021-fauci-emails,
  title = {Anthony Fauci’s Emails Reveal The Pressure That Fell On One Man},
  author = {Natalie Bettendorf and Jason Leopold},
  howpublished = {BuzzFeed News, \url{https://www.buzzfeednews.com/article/nataliebettendorf/fauci-emails-covid-response}},
  month = {June},
  year = {2021},
  url = {https://s3.documentcloud.org/documents/20793561/leopold-nih-foia-anthony-fauci-emails.pdf},
}

Findings

Optimal modularity partitions

We solve the NP-hard modularity maximization problem with Gurobi's integer programming software. This yields the following partition of the network into communities (computed on a graph with Fauci removed, following structural hole theory). The heads of various federal agencies and task forces appear in the partition (Birx, Redfield, Farrar, Kadlec).

A network layout of the tofrom graph showing the optimal modularity partition of the nodes into 15 groups
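
To make the objective concrete, the sketch below scores a given partition. It assumes the standard Newman-Girvan modularity for an undirected weighted graph, Q = sum over communities c of [ w_c / m - (d_c / 2m)^2 ], where m is the total edge weight, w_c the edge weight inside c, and d_c the total weighted degree of c. The function name `modularity` and the toy graph are illustrative only; the sketch evaluates a candidate partition and does not perform the integer-programming search.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity of a partition (assumed objective).

    edges: iterable of (u, v, weight); each undirected edge listed once, no self-loops.
    community: dict mapping node -> community label.
    """
    degree = defaultdict(float)     # weighted degree of each node
    internal = defaultdict(float)   # edge weight inside each community
    total = 0.0                     # m, the total edge weight
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
        total += w
        if community[u] == community[v]:
            internal[community[u]] += w
    comm_degree = defaultdict(float)  # summed degree of each community
    for node, d in degree.items():
        comm_degree[community[node]] += d
    return sum(internal[c] / total - (comm_degree[c] / (2 * total)) ** 2
               for c in comm_degree)

# Toy graph: two triangles joined by a single edge.
toy_edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
             (3, 4, 1), (4, 5, 1), (3, 5, 1),
             (2, 3, 1)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}
print(modularity(toy_edges, two_groups), modularity(toy_edges, one_group))
```

Splitting the toy graph into its two triangles scores higher than keeping everything in one group, which always scores zero under this definition.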

Temporal evolution

Following this link takes you to an animation of 100 days of emails to Fauci.

Datasets

fauci-email-data.json

The raw JSON digest parsed from the PDF file. Its "emails" field is an array of email threads, and people are referenced by 0-based index into the "names" array. We hope the following schema description helps, although it may seem overly complicated:

FILE <- { "emails": EMAILS,
          "names": [ Strings ], # names for people
          "clusters": [ Integer ], # 1-based index of organization ids for each person
          "cluster_names": [ Strings ] # names for each organization
         }

EMAILS <- [ THREAD ] # emails is an array of threads
THREAD <- [ EMAIL ] # a thread is an array of email
EMAIL <- { "sender": Integer, # the sender id in NAMES, 0-indexed
           "recipients": [ Integer ],  # indices into NAMES
           "cc": [ Integer ], # indices into NAMES
           "subject": String, # the subject field
           "body": String, # the body text of the email
           "time": TIMESTRING # the normalized time-string
         }

TIMESTRING <- String # An isoformat time from Python isoformat()

For example, in Python 3 the following prints the sender and recipient list for each email in each thread:

import json

with open('fauci-email-data.json') as f:
  data = json.load(f)  # parse the whole digest
names = data["names"]
for thread in data["emails"]:
  print("----")
  print("New Thread")
  for email in thread:
    print("--")
    # sender and recipients are 0-based indices into names
    print("From:", names[email["sender"]])
    print("To:", "; ".join(names[nid] for nid in email["recipients"]))
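
The "clusters" and "cluster_names" fields can be combined in the same way. The sketch below counts emails by the sender's organization. It assumes that the 1-based ids in "clusters" index directly into "cluster_names" once shifted by one; the helper name `sender_org_counts` and the tiny `sample` record are made up for illustration and are not taken from the data.

```python
from collections import Counter

def sender_org_counts(data):
    """Count emails by the organization of the sender."""
    # "clusters" is documented as 1-based, so shift by one for Python lists.
    org_of = [data["cluster_names"][c - 1] for c in data["clusters"]]
    counts = Counter()
    for thread in data["emails"]:
        for email in thread:
            counts[org_of[email["sender"]]] += 1
    return counts

# Made-up record in the documented shape (not real data).
sample = {
    "names": ["person a", "person b", "person c"],
    "clusters": [1, 2, 1],
    "cluster_names": ["org x", "org y"],
    "emails": [
        [{"sender": 0, "recipients": [1], "cc": [2], "subject": "s",
          "time": "2020-03-06T03:49:45+00:00"},
         {"sender": 1, "recipients": [0], "cc": [], "subject": "RE: s",
          "time": "2020-03-06T04:00:00+00:00"}],
        [{"sender": 2, "recipients": [0, 1], "cc": [], "subject": "t",
          "time": "2020-03-07T10:00:00+00:00"}],
    ],
}
print(sender_org_counts(sample))
```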

In Julia, the data can be loaded as follows:

using JSON
data = JSON.parsefile("data/fauci-email-data.json")

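The first email in the first thread is shown below. Julia arrays are 1-based while the stored ids stay 0-based, so a name lookup needs data["names"][id + 1].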

data["emails"][1][1] =
        Dict{String, Any} with 6 entries:
          "recipients" => Any[1]
          "body"       => "I do not understand why you are asking me to \"review\" this. Is this an FYI??"
          "time"       => "2020-03-06T03:49:45+00:00"
          "sender"     => 0
          "cc"         => Any[2, 3, 4]
          "subject"    => "RE: Please review: House Oversight Letter on Coronavirus Diagnostics"

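The "time" value in the record above is an ISO 8601 string, so Python's standard library can parse it directly. The following is a minimal sketch, assuming every email carries a well-formed "time" string like the one shown; the helper name `emails_per_day` and the `stamps_only` sample are hypothetical.

```python
from collections import Counter
from datetime import datetime

def emails_per_day(data):
    """Count emails per calendar date, using each timestamp's own UTC offset."""
    counts = Counter()
    for thread in data["emails"]:
        for email in thread:
            stamp = datetime.fromisoformat(email["time"])
            counts[stamp.date().isoformat()] += 1
    return counts

# Made-up threads that carry only the field this sketch needs.
stamps_only = {"emails": [
    [{"time": "2020-03-06T03:49:45+00:00"}, {"time": "2020-03-06T05:00:00+00:00"}],
    [{"time": "2020-03-07T01:00:00+00:00"}],
]}
print(emails_per_day(stamps_only))
```

Daily counts of this kind are one possible starting point for looking at how the volume of email changes over time.
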
Networks and Graphs

We derive five graphs from the data for our analysis, although our tools in methods.jl can produce additional variations.

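Simple graph statistics:

| Graph file | nodes | edges | max degree | mean degree | median degree | lambda2 |
|---|---|---|---|---|---|---|
| fauci-email-graph-repliedto-nofauci | 46 | 58 | 18 | 2.5 | 1 | 0.0167 |
| fauci-email-graph-hypergraph-projection-nocc | 372 | 2589 | 267 | 13.9 | 6 | 0.0536 |
| fauci-email-graph-hypergraph-projection-cc | 891 | 7250 | 697 | 16.3 | 7 | 0.0084 |
| fauci-email-graph-tofrom-nofauci-nocc-5 | 233 | 325 | 44 | 2.8 | 1 | 0.0331 |
| fauci-email-graph-tofrom-nofauci-cc-5 | 386 | 585 | 97 | 3.0 | 2 | 0.0438 |

Weighted graph statistics:

| Graph file | loops | volume | (unlabeled) | max weighted degree | mean weighted degree | median weighted degree |
|---|---|---|---|---|---|---|
| fauci-email-graph-repliedto-nofauci | 2 | 435 | 7 | 91 | 9.5 | 3 |
| fauci-email-graph-hypergraph-projection-nocc | 0 | 13120 | 0 | 1998 | 35.3 | 11 |
| fauci-email-graph-hypergraph-projection-cc | 0 | 76910 | 0 | 4524 | 86.3 | 11 |
| fauci-email-graph-tofrom-nofauci-nocc-5 | 2 | 1168 | 2 | 102 | 5.0 | 2 |
| fauci-email-graph-tofrom-nofauci-cc-5 | 9 | 2173 | 15 | 247 | 5.6 | 2 |

The rows carry one value between volume and max weighted degree that the header does not name (shown as "unlabeled"), and no value for the header's final weighted lambda2 column. The alignment used here is the one under which mean weighted degree equals volume divided by nodes in every row.
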
  • fauci-email-graph-repliedto-nofauci.json: This is a weighted network that enumerates replied-to relationships. There is an edge from u to v if u replied to v's email, and the edge is weighted by the larger of the interaction counts in the two directions. We remove Fauci from this view of the network to study it without his emails. This network is an instance of a temporal motif network using a "replied-to" temporal motif. We then remove everyone outside of the largest connected component.
  • fauci-email-graph-tofrom-nofauci-nocc-5.json: This is a weighted network that has an edge between the sender and each recipient of an email (excluding the CC list), weighted by the larger of the interaction counts in the two directions. In this network, we remove emails with more than 5 recipients to focus on work behavior instead of broadcast behavior. This omits, for instance, weekly emails detailing the spending of newly allocated pandemic funds, which were often sent to around 20 individuals. We also remove everyone outside the largest connected component.
  • fauci-email-graph-tofrom-nofauci-cc-5.json: This is the same network as above, but expanded to include the CC lists among the recipients. The same limit of 5 recipients applies.
  • fauci-email-graph-hypergraph-projection-nocc.json: This is a weighted network formed as a projection of the email hypergraph, where each email defines a hyperedge among the sender and recipients. We form the clique projection of the hypergraph, where each hyperedge induces a fully connected set of edges among all participants. The weight of an edge in the network is the number of hyperedges that contain both of its endpoints. The graph is naturally undirected. Because this omits CC lists from each hyperedge, the graph can easily be disconnected if an email arrived via a CC edge. To focus the data analysis, we remove any individual who has only a single edge in the graph (with any weight).
  • fauci-email-graph-hypergraph-projection-cc.json: This version of the network adds CCed recipients to the hyperedge for each email.
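
The clique projection described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the repository's methods.jl code: the function name `clique_projection`, the `include_cc` flag, and the `toy` record are hypothetical, and the sketch skips the later filtering step that removes individuals with a single edge.

```python
from collections import Counter
from itertools import combinations

def clique_projection(data, include_cc=False):
    """Weighted clique projection of the email hypergraph.

    Each email is one hyperedge over its sender and recipients (plus the CC
    list when include_cc is True). Returns a Counter mapping (u, v) with
    u < v to the number of hyperedges that contain both u and v.
    """
    weights = Counter()
    for thread in data["emails"]:
        for email in thread:
            members = {email["sender"], *email["recipients"]}
            if include_cc:
                members.update(email["cc"])
            # Every pair of participants gets one unit of weight per email.
            for u, v in combinations(sorted(members), 2):
                weights[(u, v)] += 1
    return weights

# Made-up threads in the documented shape (not real data).
toy = {"emails": [
    [{"sender": 0, "recipients": [1], "cc": [2]},
     {"sender": 1, "recipients": [0], "cc": []}],
    [{"sender": 2, "recipients": [0, 1], "cc": []}],
]}
print(clique_projection(toy))
print(clique_projection(toy, include_cc=True))
```

Storing each pair with the smaller id first keeps the result undirected, matching the description of the projected graph.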

We designed these graphs to be easy to read in a variety of software. They can be read as JSON files, but are also simple enough to parse without any JSON libraries.

{
  "vertices": <number of vertices>,
  "edges": <number of edges>,
  "edgedata": [
    <src1>, <dst1>, <weight1>,
    <src2>, <dst2>, <weight2>,
    ...
    <src_number_of_edges>, <dst_number_of_edges>, <weight_number_of_edges>
  ],
  "labels": [
    <list of labels, one per vertex>
  ],
  "orgs": [
    <list of organizations, one per vertex>
  ]
}
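
When a JSON library is available, the flat "edgedata" array can be regrouped into (source, destination, weight) triples. The sketch below assumes the layout shown above; the function name `parse_graph` and the three-vertex `example` document are made up for illustration.

```python
import json

def parse_graph(text):
    """Parse a graph JSON document into (edge triples, labels, orgs)."""
    graph = json.loads(text)
    flat = graph["edgedata"]
    # The schema stores three numbers per edge in one flat array.
    if len(flat) != 3 * graph["edges"]:
        raise ValueError("edgedata length does not match 3 * edges")
    triples = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
    return triples, graph["labels"], graph["orgs"]

# Made-up document in the documented shape (not one of the shipped files).
example = """{
  "vertices": 3,
  "edges": 2,
  "edgedata": [
    0, 1, 4,
    1, 2, 1
  ],
  "labels": ["a", "b", "c"],
  "orgs": [1, 1, 2]
}"""
triples, labels, orgs = parse_graph(example)
print(triples, labels, orgs)
```

To read one of the shipped files, pass the file's text to `parse_graph`, for instance the result of `open(path).read()`.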

For instance, to use them with the SNAP package, a few shell commands suffice:

$ tail -n +5 fauci-email-graph-tofrom-nofauci-nocc-5.json | sed -n '/],/q;p' | sed 's/,//g' | cut -f1,2 -d" " | less

Or, to read them without any JSON package in Python 3:

with open("fauci-email-graph-tofrom-nofauci-nocc-5.json", "r") as f:
  f.readline() # read the first '{'
  nverts = int(f.readline().split(':')[1].split(',')[0])
  nedges = int(f.readline().split(':')[1].split(',')[0])
  f.readline() # read "edgedata"
  src, dst, weights = [],[],[]
  for _ in range(nedges):
    einfo = f.readline().split(",")
    src.append(int(einfo[0]))
    dst.append(int(einfo[1]))
    weights.append(int(einfo[2]))
  f.readline() # read end array
  f.readline() # read label array start
  labels = []
  for _ in range(nverts):
    labels.append(f.readline().strip().strip(",").strip('"'))
  f.readline() # read label array end
  f.readline() # read org array start
  orgs = []
  for _ in range(nverts):
    orgs.append(int(f.readline().strip().strip(",")))
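
Once `src`, `dst`, and `weights` are loaded, simple degree statistics follow directly. The sketch below is illustrative and is not claimed to reproduce the conventions behind the statistics table: it assumes each undirected edge is listed once, it skips self-loops, and it ignores vertices that have no edges. The function name `degree_summary` is hypothetical.

```python
from collections import defaultdict
from statistics import median

def degree_summary(src, dst, weights):
    """Unweighted and weighted degree statistics from parallel edge lists."""
    neighbors = defaultdict(set)   # distinct neighbors of each vertex
    wdegree = defaultdict(float)   # summed edge weight at each vertex
    for u, v, w in zip(src, dst, weights):
        if u == v:
            continue  # self-loops are ignored in this sketch
        neighbors[u].add(v)
        neighbors[v].add(u)
        wdegree[u] += w
        wdegree[v] += w
    degrees = [len(n) for n in neighbors.values()]
    return {
        "max degree": max(degrees),
        "mean degree": sum(degrees) / len(degrees),
        "median degree": median(degrees),
        "max weighted degree": max(wdegree.values()),
    }

# Made-up star-like graph: vertex 1 is joined to vertices 0, 2, and 3.
summary = degree_summary([0, 1, 1], [1, 2, 3], [4, 1, 2])
print(summary)
```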