read.graph {igraph}    R Documentation

Reading foreign file formats

Description

The read.graph function can read graphs in various representations from a file or from an HTTP connection. Currently some simple formats are supported.

Usage

read.graph(file, format = c("edgelist", "pajek", "ncol", "lgl",
        "graphml", "dimacs", "graphdb", "gml", "dl"), ...)

Arguments

file

The connection to read from. This can be a local file, or an HTTP or FTP connection. It can also be a character string with the file name or URI.

format

Character constant giving the file format. Right now edgelist, pajek, ncol, lgl, graphml, dimacs, graphdb, gml and dl are supported; the default is edgelist. As of igraph 0.4 this argument is case insensitive.

...

Additional arguments, see below.

Details

The read.graph function may have additional arguments depending on the file format (the format argument). See the details separately for each file format, below.

Value

A graph object.

Edge list format

This format is a simple text file with numeric vertex ids defining the edges. There is no need to have newline characters between the edges; a single space will also do.

Additional arguments:

n

The number of vertices in the graph. If it is smaller than or equal to the largest integer in the file, then it is ignored; so it is safe to set it to zero (the default).

directed

Logical scalar, whether to create a directed graph. The default value is TRUE.
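A minimal sketch of reading this format, assuming igraph is installed and that the vertex ids in the file start at zero (the text above does not state this explicitly). The file name and contents are made up for illustration.

```r
library(igraph)

# Hypothetical sample file: three edges. Ids may be separated by
# spaces or newlines, so two edges can share one line.
path <- tempfile(fileext = ".txt")
writeLines(c("0 1", "1 2 2 0"), path)

# n is left at its default (0), so the vertex count is taken from the
# largest id found in the file.
g <- read.graph(path, format = "edgelist", directed = FALSE)

# A larger n adds isolated vertices beyond the largest id in the file.
g10 <- read.graph(path, format = "edgelist", n = 10, directed = FALSE)

print(c(vcount(g), ecount(g), vcount(g10)))
```

Setting directed = FALSE is only needed because the default for this format is a directed graph.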

Pajek format

Pajek is a popular network analysis program for Windows. (See the Pajek homepage at http://vlado.fmf.uni-lj.si/pub/networks/pajek/.) It has a quite flexible but not very well documented file format; see the Pajek manual on the Pajek homepage for some information about the file format.

igraph implements only a subset of the Pajek format:

From version 0.6.1 igraph supports reading bipartite (two-mode) graphs from Pajek files and adds the type vertex attribute. A warning is given if invalid edges (edges connecting vertices of the same type) are present in the file.

Vertex and edge attributes defined in the Pajek file will also be read and assigned to the graph object to be created. These are mainly parameters for graph visualization, but not exclusively, e.g. the file might contain edge weights as well.

The following vertex attributes might be added:

igraph name                Description, Pajek attribute
id                         Vertex id
x, y, z                    The ‘x’, ‘y’ and ‘z’ coordinate of the vertex
vertexsize                 The size of the vertex when plotted (size in Pajek).
shape                      The shape of the vertex when plotted.
color                      Vertex color (ic in Pajek) if given with symbolic name
framecolor                 Border color (bc in Pajek) if given with symbolic name
labelcolor                 Label color (lc in Pajek) if given with symbolic name
xfact, yfact               The x_fact and y_fact Pajek attributes.
labeldist                  The distance of the label from the vertex. (lr in Pajek.)
labeldegree, labeldegree2  The la and lphi Pajek attributes
framewidth                 The width of the border (bw in Pajek).
fontsize                   Size of the label font (fos in Pajek.)
rotation                   The rotation of the vertex (phi in Pajek).
radius                     Radius, for some vertex shapes (r in Pajek).
diamondratio               For the diamond shape (q in Pajek).
type                       Vertex types in bipartite (two-mode) graphs.

These igraph attributes are only created if there is at least one vertex in the Pajek file which has the corresponding associated information. E.g., if there are vertex coordinates for at least one vertex, then the ‘x’, ‘y’ and possibly also ‘z’ vertex attributes will be created. For those vertices for which the attribute is not defined, NaN is assigned.

The following edge attributes might be added:

igraph name                         Description, Pajek attribute
weight                              Edge weights.
label                               l in Pajek.
color                               Edge color, if the color is given with a symbolic name, c in Pajek.
color-red, color-green, color-blue  Edge color if it was given in RGB notation, c in Pajek.
edgewidth                           w in Pajek.
arrowsize                           s in Pajek.
hook1, hook2                        h1 and h2 in Pajek.
angle1, angle2                      a1 and a2 in Pajek, Bezier curve parameters.
velocity1, velocity2                k1 and k2 in Pajek, Bezier curve parameter.
arrowpos                            ap in Pajek.
labelpos                            lp in Pajek.
labelangle, labelangle2             lr and lphi in Pajek.
labeldegree                         la in Pajek.
fontsize                            fos in Pajek.
arrowtype                           a in Pajek.
linepattern                         p in Pajek.
labelcolor                          lc in Pajek.

There are no additional arguments for this format.
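As an illustration of the attribute handling described above, the following sketch writes a small, made-up Pajek file with labelled vertices and weighted edges and reads it back. It assumes igraph is installed; the file contents are hypothetical.

```r
library(igraph)

# Hypothetical Pajek file: three labelled vertices and two undirected
# edges, with the edge weight given as the third field of each edge line.
path <- tempfile(fileext = ".net")
writeLines(c(
  "*Vertices 3",
  "1 \"a\"",
  "2 \"b\"",
  "3 \"c\"",
  "*Edges",
  "1 2 1.5",
  "2 3 2.5"
), path)

g <- read.graph(path, format = "pajek")

# Attributes are created only when the file supplies them; the summary
# lists whichever vertex and edge attributes were found.
summary(g)
print(E(g)$weight)
```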

GraphML file format

GraphML is an XML-based file format (an XML application in the XML terminology) to describe graphs. It is a modern format, and can store graphs with an extensible set of vertex and edge attributes, and generalized graphs which igraph cannot handle. Thus igraph supports only a subset of the GraphML language:

See the GraphML homepage at http://graphml.graphdrawing.org for more information about the GraphML format.

Additional arguments:

index

If the GraphML file contains more than one graph, this argument can be used to select the graph to read. By default the first graph is read (index 0).
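A short sketch of selecting a graph with index, assuming igraph is installed and was built with GraphML support. The file and the graph ids G0 and G1 are made up for illustration.

```r
library(igraph)

# Hypothetical GraphML file holding two graphs.
path <- tempfile(fileext = ".graphml")
writeLines(c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">',
  '  <graph id="G0" edgedefault="undirected">',
  '    <node id="a"/> <node id="b"/>',
  '    <edge source="a" target="b"/>',
  '  </graph>',
  '  <graph id="G1" edgedefault="directed">',
  '    <node id="x"/> <node id="y"/> <node id="z"/>',
  '    <edge source="x" target="y"/>',
  '    <edge source="y" target="z"/>',
  '  </graph>',
  '</graphml>'
), path)

# index counts from zero, so the default reads G0 and index = 1 reads G1.
first  <- read.graph(path, format = "graphml")
second <- read.graph(path, format = "graphml", index = 1)

print(c(vcount(first), vcount(second)))
```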

GML file format

GML is a simple textual format, see http://www.infosun.fim.uni-passau.de/Graphlet/GML/ for details.

Although all syntactically correct GML can be parsed, we implement only a subset of this format, and some attributes might be ignored. Here is a list of all the differences:

Please contact us if you cannot live with these limitations of the GML parser.

There are no additional arguments for this format.

DL file format

The DL format is a simple textual file format used by the UCINET software. See http://www.analytictech.com/networks/dataentry.htm for examples. All formats mentioned there are supported by igraph.

Note that the specification does not mention whether the format is case sensitive or not. For igraph, DL files are case sensitive, i.e. ‘Larry’ and ‘larry’ are not the same.

Additional arguments:

directed

Logical scalar, whether to create a directed graph. The default is to make the graph directed.

NCOL format

This format is used by the Large Graph Layout program (http://bioinformatics.icmb.utexas.edu/lgl), and it is simply a symbolic weighted edge list. It is a simple text file with one edge per line. An edge is defined by two symbolic vertex names separated by whitespace. (The symbolic vertex names themselves cannot contain whitespace.) They may be followed by an optional number, which will be the weight of the edge; the number can be negative and can be in scientific notation. If there is no weight specified for an edge, it is assumed to be zero.

By default the resulting graph is undirected. LGL cannot deal with files which contain multiple or loop edges; this is however not checked here, as igraph is happy with these.

Additional arguments:

names

Logical constant, whether to add the symbolic names as vertex attributes to the graph. If TRUE the name of the vertex attribute will be ‘name’.

weights

Character scalar, specifies whether edge weights should be added to the graph. Possible values and their meanings are: ‘no’, edge weights will not be added; ‘yes’, edge weights will be added, and if they are not present in the file, then all edges get zero weight; ‘auto’, edge weights will be added if they are present in the file, otherwise not. The default is ‘auto’.

directed

Logical constant, whether to create a directed graph. The default is undirected.
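A minimal sketch of reading an NCOL file with the arguments described above, assuming igraph is installed. The vertex names and weights are made up for illustration.

```r
library(igraph)

# Hypothetical NCOL file: one edge per line, two symbolic names and an
# optional weight (negative values and scientific notation are allowed).
path <- tempfile(fileext = ".ncol")
writeLines(c(
  "alice bob 1.5",
  "bob carol 2e-1",
  "carol alice -3"
), path)

# Keep the symbolic names as the 'name' vertex attribute and add weights
# only if the file contains them.
g <- read.graph(path, format = "ncol", names = TRUE,
                weights = "auto", directed = FALSE)

print(V(g)$name)
print(E(g)$weight)
```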

LGL file format

The lgl format is used by the Large Graph Layout visualization software (http://bioinformatics.icmb.utexas.edu/lgl); it can describe undirected, optionally weighted graphs. From the LGL manual: “The second format is the LGL file format (.lgl file suffix). This is yet another graph file format that tries to be as stingy as possible with space, yet keeping the edge file in a human readable (not binary) format. The format itself is like the following:

	 # vertex1name 
	vertex2name [optionalWeight]
	vertex3name [optionalWeight]
      
Here, the first vertex of an edge is preceded with a pound sign '#'. Then each vertex that shares an edge with that vertex is listed one per line on subsequent lines.”

LGL cannot handle loop and multiple edges or directed graphs, but in igraph it is not an error to have multiple and loop edges.

Additional arguments:

names

Logical constant, whether to add the symbolic names as vertex attributes to the graph. If TRUE the name of the vertex attribute will be ‘name’.

weights

Character scalar, specifies whether edge weights should be added to the graph. Possible values and their meanings are: ‘no’, edge weights will not be added; ‘yes’, edge weights will be added, and if they are not present in the file, then all edges get zero weight; ‘auto’, edge weights will be added if they are present in the file, otherwise not. The default is ‘auto’.
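The layout quoted from the LGL manual can be tried out with a sketch like the following, assuming igraph is installed. The vertex names and weights are made up for illustration.

```r
library(igraph)

# Hypothetical LGL file: a line starting with '#' names the first vertex
# of the edges that follow; each later line names a neighbour and an
# optional weight.
path <- tempfile(fileext = ".lgl")
writeLines(c(
  "# a",
  "b 1.0",
  "c 2.0",
  "# b",
  "c 0.5"
), path)

g <- read.graph(path, format = "lgl", names = TRUE, weights = "auto")

print(V(g)$name)
print(E(g)$weight)
```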

DIMACS file format

This is the DIMACS file format, more specifically the version for network flow problems; see the files at ftp://dimacs.rutgers.edu/pub/netflow/general-info/

This is a line-oriented text file (ASCII) format. The first character of each line defines the type of the line. If the first character is c, the line is a comment line and it is ignored. There is one problem line (p) in the file; it must appear before any node and arc descriptor lines. The problem line has three fields separated by spaces: the problem type (min, max or asn), the number of vertices and the number of edges in the graph. Exactly two node identification lines (n) are expected, one for the source and one for the target vertex. These have two fields: the id of the vertex and the type of the vertex, either s (=source) or t (=target). Arc lines start with a and have three fields: the source vertex, the target vertex and the edge capacity.

Vertex ids are numbered from 1.

The source vertex is assigned to the source, the target vertex to the target graph attribute. The edge capacities are assigned to the capacity edge attribute.

Additional arguments:

directed

Logical scalar, whether to create a directed graph. By default a directed graph is created.
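The line types described above can be put together in a small example. This is a sketch assuming igraph is installed; the problem instance (four vertices, five arcs) is made up for illustration.

```r
library(igraph)

# Hypothetical max-flow instance: comment line, problem line, the two
# node identification lines, then one arc line per edge
# (source vertex, target vertex, capacity). Vertex ids start at 1.
path <- tempfile(fileext = ".max")
writeLines(c(
  "c tiny max-flow problem",
  "p max 4 5",
  "n 1 s",
  "n 4 t",
  "a 1 2 3",
  "a 1 3 2",
  "a 2 3 1",
  "a 2 4 2",
  "a 3 4 3"
), path)

g <- read.graph(path, format = "dimacs", directed = TRUE)

# The source and target vertices are stored as graph attributes and the
# capacities as an edge attribute.
print(g$source)
print(g$target)
print(E(g)$capacity)
```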

GraphDB format

This is a binary format, used in the graph database for isomorphism testing (http://amalfi.dis.unina.it/graph/). From the graph database homepage (http://amalfi.dis.unina.it/graph/db/doc/graphdbat-2.html):

The graphs are stored in a compact binary format, one graph per file. The file is composed of 16 bit words, which are represented using the so-called little-endian convention, i.e. the least significant byte of the word is stored first.

Then, for each node, the file contains the list of edges coming out of the node itself. The list is represented by a word encoding its length, followed by a word for each edge, representing the destination node of the edge. Node numeration is 0-based, so the first node of the graph has index 0.

See also graph.graphdb.

Only unlabelled graphs are implemented.

Additional arguments:

directed

Logical scalar. Whether to create a directed graph.
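To make the binary layout concrete, the sketch below writes a three-node graph as 16-bit little-endian words and reads it back. It assumes igraph is installed and that the file begins with a word holding the number of nodes, which is how the graph database documentation describes the format although the excerpt above does not quote it. The graph itself is made up for illustration.

```r
library(igraph)

# Hypothetical graph with 3 nodes, encoded as 16-bit words.
words <- c(
  3L,          # number of nodes (assumed leading word, see above)
  2L, 1L, 2L,  # node 0: two out-edges, to nodes 1 and 2
  1L, 2L,      # node 1: one out-edge, to node 2
  0L           # node 2: no out-edges
)

path <- tempfile(fileext = ".A00")
con <- file(path, "wb")
writeBin(words, con, size = 2, endian = "little")
close(con)

g <- read.graph(path, format = "graphdb", directed = TRUE)
print(c(vcount(g), ecount(g)))
```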

Author(s)

Gabor Csardi csardi.gabor@gmail.com

See Also

write.graph


[Package igraph version 0.6.5-1 Index]