Dataset and Variables

Dataset

The dataset used in this project, MC1_graph.json, is a JSON file generated by the networkx function node_link_data(). It can be loaded back into a networkx graph with the corresponding node_link_graph() function. The root-level JSON object contains graph-level properties specifying that the graph is directed and a multigraph, a “nodes” key holding the list of nodes, and a “links” key holding the list of edges.
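
A minimal sketch of the node-link layout described above. It uses only Python's standard json module, so it runs without networkx. The toy document is invented for illustration (hypothetical ids, names, and attribute values) and mirrors only the top-level keys named in the paragraph, so the real MC1_graph.json may contain additional fields.

```python
import json

# A tiny, hypothetical document in node-link format (not real MC1 data).
RAW = """
{
  "directed": true,
  "multigraph": true,
  "graph": {},
  "nodes": [
    {"id": 0, "Node Type": "Person", "name": "Ada", "stage_name": "A-Dee"},
    {"id": 1, "Node Type": "Song", "genre": "Synthwave", "notable": true,
     "release_date": "2021", "single": true}
  ],
  "links": [
    {"source": 0, "target": 1, "key": 0, "Edge Type": "PerformerOf"},
    {"source": 0, "target": 1, "key": 1, "Edge Type": "ComposerOf"}
  ]
}
"""


def summarize(doc: dict) -> dict:
    """Report the graph-level flags and the sizes of the node and link lists."""
    return {
        "directed": doc["directed"],
        "multigraph": doc["multigraph"],
        "n_nodes": len(doc["nodes"]),
        "n_links": len(doc["links"]),
    }


data = json.loads(RAW)
print(summarize(data))
# {'directed': True, 'multigraph': True, 'n_nodes': 2, 'n_links': 2}
```

For the real file, calling json.load on an open MC1_graph.json yields a dict of the same shape. With networkx installed, networkx.node_link_graph(data) rebuilds the graph; because the dict sets both directed and multigraph to true, the result is a MultiDiGraph.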

Nodes

The nodes dataset contains 17,412 entries, each representing an entity in the music network and categorized by its Node Type field as “Person”, “MusicalGroup”, “RecordLabel”, “Song”, or “Album”. Each node carries attributes that depend on its type: for example, songs have fields such as single, release_date, genre, and notable, while people may have a stage_name. Please refer to the following table for more details.

Node Type Description Attributes
Person Anyone working in the music industry, such as singers, producers, instrumentalists, or composers.
  • Node Type (string) – the type of node

  • name (string) – the name of the person

  • stage_name (string) – if provided, the stage name of the musician

MusicalGroup Bands, quartets, small choirs, or other officially organized entities formed by musicians to make music.
  • Node Type (string) – the type of node

  • name (string) – the name of the group

RecordLabel These are organizations—professional, commercial, or otherwise institutional—involved in the recording, production, or distribution of the music.
  • Node Type (string) – the type of node

  • name (string) – the name of the organization

Song An individual song or track.
  • Node Type (string) – the type of node

  • single (boolean) – if provided, whether the song was released as a standalone single rather than as part of a larger album

  • genre (string) – the song’s genre

  • notable (boolean) – whether or not the song has appeared on a top record chart

  • release_date (string) – the year in which the song was released

  • notoriety_date (string) – if provided, the year in which the song first appeared on a top record chart

  • written_date (string) – if provided, the year in which the song was written

Album A music album, i.e., a collection of songs released together.
  • Node Type (string) – the type of node

  • genre (string) – the album’s genre

  • notable (boolean) – whether or not the album has appeared on a top record chart

  • release_date (string) – the year in which the album was released

  • notoriety_date (string) – if provided, the year in which the album first appeared on a top record chart

  • written_date (string) – if provided, the year in which the album was written
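
Because several attributes are marked “if provided”, code that reads them should not assume every node has them. The sketch below is a set of hypothetical helpers run on a small invented node list: it tallies nodes per Node Type (over the real file the counts should sum to the 17,412 entries reported above) and uses dict.get for optional fields such as stage_name. It assumes the attribute spellings shown in the table and that release_date holds a four-digit year string.

```python
from collections import Counter

# Hypothetical sample nodes; attribute names follow the table above.
NODES = [
    {"id": 0, "Node Type": "Person", "name": "Ada", "stage_name": "A-Dee"},
    {"id": 1, "Node Type": "Person", "name": "Ben"},  # no stage_name provided
    {"id": 2, "Node Type": "MusicalGroup", "name": "The Ferns"},
    {"id": 3, "Node Type": "Song", "genre": "Synthwave", "notable": True,
     "release_date": "2021", "notoriety_date": "2022", "single": True},
    {"id": 4, "Node Type": "Album", "genre": "Synthwave", "notable": False,
     "release_date": "2019"},
]


def count_by_type(nodes):
    """Tally nodes per 'Node Type' value."""
    return Counter(n["Node Type"] for n in nodes)


def display_name(node):
    """Prefer a person's stage_name, falling back to the name field."""
    return node.get("stage_name") or node.get("name")


def notable_works_after(nodes, year):
    """Ids of notable songs/albums released after `year` (release_date is a year string)."""
    return [
        n["id"]
        for n in nodes
        if n["Node Type"] in ("Song", "Album")
        and n.get("notable", False)
        and "release_date" in n
        and int(n["release_date"]) > year
    ]


print(count_by_type(NODES))
print([display_name(n) for n in NODES if n["Node Type"] == "Person"])  # ['A-Dee', 'Ben']
print(notable_works_after(NODES, 2020))  # [3]
```

Treating the year strings as integers only at comparison time keeps the loaded data untouched while still allowing temporal filters such as “notable since 2020”.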

Edges

The edges dataset contains 37,857 records with 4 fields that represent the relationships between entities in the network. Each edge stores the node IDs of its starting and ending points (source and target), an Edge Type drawn from 12 possible values describing the nature of the relationship (such as “PerformerOf”, “ComposerOf”, or “RecordedBy”), and a key field that distinguishes multiple edges between the same pair of nodes. Please refer to the following table for more details.

Edge Type Description
PerformerOf Indicates that the source node (Person or MusicalGroup) performed the destination node (Song or Album)
ComposerOf Indicates that the source node (Person) composed the destination node (Song or Album)
ProducerOf Indicates that the source node (Person or RecordLabel) participated in the production of the destination node’s work (Song, Album, Person, or MusicalGroup)
LyricistOf Indicates that the source node (Person) wrote lyrics for the destination node (Song or Album)
RecordedBy Indicates that the destination node (RecordLabel) aided in the recording process for the source node (Song or Album)
DistributedBy Indicates that the destination node (RecordLabel) aided in the distribution process for the source node (Song or Album)
InStyleOf Indicates that the source node (Song or Album) was performed at least partly in the style of the destination node (Song, Album, Person, or MusicalGroup)
InterpolatesFrom Indicates that the source node (Song or Album) interpolated a melody from the destination node (Song or Album)
CoverOf Indicates that the source node (Song or Album) is a cover of the destination node (Song or Album)
LyricalReferenceTo Indicates that the source node (Song or Album) makes a lyrical reference to the destination node (Song or Album)
DirectlySamples Indicates that the source node (Song or Album) directly reuses a portion of the audio recording of the destination node (Song or Album) via sampling
MemberOf Indicates that the source node (Person) is (or was) a member of the destination node (MusicalGroup)
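
The edge table doubles as a schema: each Edge Type constrains which Node Types may appear at the source and target. The sketch below encodes four of the twelve rows as a hypothetical ALLOWED mapping and flags links whose endpoint types fall outside it; it also groups links by (source, target) pair to show how the key field separates parallel edges. The sample nodes and links are invented for illustration.

```python
from collections import defaultdict

# Subset of the edge table: Edge Type -> (allowed source types, allowed target types).
ALLOWED = {
    "PerformerOf": ({"Person", "MusicalGroup"}, {"Song", "Album"}),
    "ComposerOf": ({"Person"}, {"Song", "Album"}),
    "RecordedBy": ({"Song", "Album"}, {"RecordLabel"}),
    "MemberOf": ({"Person"}, {"MusicalGroup"}),
}

# Hypothetical sample data: node id -> Node Type, plus a few links.
NODE_TYPES = {0: "Person", 1: "MusicalGroup", 2: "Song", 3: "RecordLabel"}
LINKS = [
    {"source": 0, "target": 1, "key": 0, "Edge Type": "MemberOf"},
    {"source": 0, "target": 2, "key": 0, "Edge Type": "PerformerOf"},
    {"source": 0, "target": 2, "key": 1, "Edge Type": "ComposerOf"},
    {"source": 2, "target": 3, "key": 0, "Edge Type": "RecordedBy"},
    {"source": 3, "target": 2, "key": 0, "Edge Type": "RecordedBy"},  # reversed on purpose
]


def schema_violations(links, node_types):
    """Return links whose endpoint types do not match ALLOWED; uncovered edge types are skipped."""
    bad = []
    for link in links:
        rule = ALLOWED.get(link["Edge Type"])
        if rule is None:
            continue  # edge type not covered by this partial schema
        src_ok = node_types[link["source"]] in rule[0]
        tgt_ok = node_types[link["target"]] in rule[1]
        if not (src_ok and tgt_ok):
            bad.append(link)
    return bad


def parallel_edges(links):
    """Group (key, Edge Type) pairs by (source, target), keeping pairs joined by more than one edge."""
    groups = defaultdict(list)
    for link in links:
        groups[(link["source"], link["target"])].append((link["key"], link["Edge Type"]))
    return {pair: edges for pair, edges in groups.items() if len(edges) > 1}


print(schema_violations(LINKS, NODE_TYPES))  # only the reversed RecordedBy link
print(parallel_edges(LINKS))  # {(0, 2): [(0, 'PerformerOf'), (1, 'ComposerOf')]}
```

Extending ALLOWED to all twelve rows would give a quick sanity check of the real link list before building a graph from it.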