lalegpl.datasets.multitable.fetch_datasets module
- lalegpl.datasets.multitable.fetch_datasets.fetch_imdb_dataset(datatype='pandas')
Fetches the IMDB movie dataset from the Relational Dataset Repository. It contains information about the directors, actors, roles and genres of multiple movies in the form of 7 CSV files. This method downloads these 7 CSV files and stores them under the ‘lale/lale/datasets/multitable/imdb_data’ directory, creating the directory if it does not exist.
Dataset URL: https://relational.fit.cvut.cz/dataset/IMDb
- Parameters
datatype (string, optional, default 'pandas') –
If ‘pandas’, returns a list of singleton dictionaries (each element of the list is one table from the dataset) after reading the downloaded or already existing CSV files. The key of each dictionary is the name of the table and the value is a pandas dataframe containing that table's data.
If ‘spark’, returns a list of the same shape, except that the value of each dictionary is a Spark dataframe containing that table's data.
Any other value throws an error, since no other return type is supported.
- Returns
imdb_list
- Return type
list of singleton dictionaries of pandas / Spark dataframes
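Because the return value is a list of singleton dictionaries rather than a single mapping, looking up a table by name takes an extra step. The following is a minimal sketch of one way to flatten the list into a single dictionary. To stay runnable without downloading anything, it uses a small stand-in for `imdb_list`: the table names and columns shown are illustrative placeholders and not the actual IMDb schema, and `tables_by_name` is a hypothetical helper, not part of lalegpl.

```python
import pandas as pd

# In real use, the list would come from the function documented above:
#   from lalegpl.datasets.multitable.fetch_datasets import fetch_imdb_dataset
#   imdb_list = fetch_imdb_dataset(datatype="pandas")
# Stand-in with the same shape (a list of singleton {table_name: dataframe} dicts).
# Table names and columns below are placeholders, not the real IMDb schema.
imdb_list = [
    {"movies": pd.DataFrame({"id": [1, 2], "name": ["A", "B"]})},
    {"directors": pd.DataFrame({"id": [10], "last_name": ["X"]})},
]


def tables_by_name(table_list):
    """Hypothetical helper: merge singleton dicts into one {name: dataframe} dict."""
    merged = {}
    for entry in table_list:
        if len(entry) != 1:
            raise ValueError("expected a singleton dictionary")
        # Unpack the only (name, dataframe) pair of the singleton dictionary.
        ((name, df),) = entry.items()
        if name in merged:
            raise ValueError(f"duplicate table name: {name}")
        merged[name] = df
    return merged


tables = tables_by_name(imdb_list)
for name, df in tables.items():
    # Print each table name with its (rows, columns) shape.
    print(name, df.shape)
```

The helper raises on non-singleton entries and on duplicate names so that an unexpected list shape is reported rather than silently overwritten.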