
	---
	datasets:
	- McAuley-Lab/Amazon-Reviews-2023
	language:
	- en
	library_name: pytorch
	pipeline_tag: text-generation
	base_model: openai-community/gpt2-medium
	---

	# GPT-2 Medium - Review

	## Model Details

	Model Description: This model is a checkpoint of GPT-2 Medium, the 355M-parameter version of GPT-2, a transformer-based language model created and released by OpenAI. It was further pretrained with a causal language modeling (CLM) objective on English Amazon product reviews from the Fashion category.

	- Developed by: Students at the University of Konstanz
	- Model Type: Transformer-based language model
	- Language(s): English
	- Base Model: [GPT2-medium](https://huggingface.co/openai-community/gpt2-medium)
	- Resources for more information: [GitHub Repo](https://github.com/TomSOWI/DLSS-24-Synthetic-Product-Reviews-Generation)
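For readers unfamiliar with the CLM objective mentioned in the description, here is a minimal pure-Python sketch of the idea: each token is predicted from the tokens before it, so the labels are the inputs shifted by one position. The function name `clm_loss` and the toy probabilities are illustrative assumptions, not the training code used for this model.

```python
import math
from typing import List


def clm_loss(token_ids: List[int], next_token_probs: List[List[float]]) -> float:
    """Mean negative log-likelihood of each token given the tokens before it.

    next_token_probs[t] is a distribution over the vocabulary produced after
    reading token_ids[: t + 1]; its target is token_ids[t + 1].
    """
    targets = token_ids[1:]  # labels are the inputs shifted left by one
    predictions = next_token_probs[: len(targets)]  # last position has no target
    nll = [-math.log(p[t]) for p, t in zip(predictions, targets)]
    return sum(nll) / len(nll)


# Toy example with a 3-token vocabulary (made-up probabilities)
ids = [0, 2, 1]
probs = [
    [0.1, 0.2, 0.7],    # after token 0, the target is token 2
    [0.3, 0.6, 0.1],    # after tokens 0 and 2, the target is token 1
    [0.5, 0.25, 0.25],  # after the full sequence, unused
]
loss = clm_loss(ids, probs)
print(loss)
```

Further pretraining lowers this loss on review text, which is why the model's continuations tend to read like product reviews.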


	## How to Get Started with the Model

	You can use this model directly with a pipeline for text generation. Because generation involves random sampling, we
	set a seed for reproducibility:

	```python
	>>> from transformers import pipeline, set_seed
	>>> generator = pipeline('text-generation', model='TomData/GPT2-review')
	>>> set_seed(42)
	>>> generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)
	```
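The pipeline call above returns a list of dictionaries, one per sequence, each with a `generated_text` key. A small wrapper like the following sketch can turn that into plain strings. The helper name `generate_reviews` and its defaults are hypothetical; `max_new_tokens`, `num_return_sequences`, and `do_sample` are standard generation arguments.

```python
from typing import Callable, List


def generate_reviews(
    generator: Callable,
    prompt: str,
    n: int = 3,
    max_new_tokens: int = 40,
) -> List[str]:
    """Return n sampled continuations of `prompt` as plain strings."""
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,  # limits only the generated part
        num_return_sequences=n,
        do_sample=True,                 # sampling is required for n > 1 distinct outputs
    )
    # Each item looks like {"generated_text": "..."}; the text includes the prompt.
    return [item["generated_text"].strip() for item in outputs]
```

With the real model, pass `pipeline('text-generation', model='TomData/GPT2-review')` as `generator` and a review-like prompt such as `"These shoes are"`.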


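	Here is how to run a given text through the model in PyTorch. The returned `output.logits` holds the next-token scores; pass `output_hidden_states=True` to the model call if you also need hidden-state features: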

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM

	tokenizer = AutoTokenizer.from_pretrained("TomData/GPT2-review")
	model = AutoModelForCausalLM.from_pretrained("TomData/GPT2-review")
	text = "Replace me by any text you'd like."
	encoded_input = tokenizer(text, return_tensors='pt')  # PyTorch tensors
	output = model(**encoded_input)
	```


	and in TensorFlow:

	```python
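	from transformers import AutoTokenizer, TFAutoModelForCausalLM

	tokenizer = AutoTokenizer.from_pretrained("TomData/GPT2-review")
	# TensorFlow needs the TF model class. from_pt=True converts PyTorch weights;
	# it is an assumption here, drop it if the repository ships TensorFlow weights.
	model = TFAutoModelForCausalLM.from_pretrained("TomData/GPT2-review", from_pt=True)
	text = "Replace me by any text you'd like."
	encoded_input = tokenizer(text, return_tensors='tf')
	output = model(encoded_input)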
	```
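To make the model output easier to interpret, the sketch below shows how one row of logits (the scores for the next token) can be converted into probabilities with a softmax and ranked. It is pure Python with a toy vocabulary; the helper `top_k_next_tokens` and the example values are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple


def top_k_next_tokens(
    logits: List[float],
    id_to_token: Dict[int, str],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Softmax a single row of logits and return the k most likely tokens."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id_to_token[i], probs[i]) for i in ranked]


# Toy 4-token vocabulary (hypothetical; the real GPT-2 vocabulary has 50,257 entries)
vocab = {0: " shirt", 1: " dress", 2: " shoes", 3: " the"}
print(top_k_next_tokens([2.0, 1.0, 0.5, -1.0], vocab, k=2))
```

With the PyTorch example, the corresponding row would be `output.logits[0, -1].tolist()`, and token ids can be mapped back to text with `tokenizer.decode([token_id])`.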

	## Uses

	This model was further pretrained to generate artificial product reviews. This can be useful for:
	- Market research
	- Product analysis
	- Customer preferences
	- Fashion trends
	- Research


	## Training


	The model was further pretrained on the [Amazon Reviews 2023 dataset](https://huggingface.co/datasets/McAuley-Lab/Amazon-Reviews-2023) from McAuley-Lab.
	For training, only the reviews in the Amazon Fashion category were used. See:

	```python
	from datasets import load_dataset

	# Load only the raw reviews of the Amazon Fashion category
	dataset = load_dataset("McAuley-Lab/Amazon-Reviews-2023", "raw_review_Amazon_Fashion", trust_remote_code=True)
	```
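This card does not describe how the loaded reviews were turned into training text. The following is only a sketch of one plausible preprocessing step, assuming each record has `title` and `text` string fields as described on the dataset card. The helper `build_training_texts`, the title-plus-body format, and the `min_chars` threshold are all hypothetical.

```python
from typing import Dict, Iterable, List


def build_training_texts(records: Iterable[Dict], min_chars: int = 20) -> List[str]:
    """Join title and body of each review, skipping very short reviews."""
    texts = []
    for rec in records:
        title = (rec.get("title") or "").strip()
        body = (rec.get("text") or "").strip()
        if len(body) < min_chars:  # hypothetical filter for near-empty reviews
            continue
        texts.append(f"{title}\n{body}" if title else body)
    return texts


# Made-up records for illustration only
sample = [
    {"title": "Great fit", "text": "These jeans fit perfectly and feel sturdy.", "rating": 5.0},
    {"title": "Meh", "text": "ok", "rating": 3.0},
    {"title": "", "text": "The fabric is thinner than I expected it to be.", "rating": 2.0},
]
print(build_training_texts(sample))
```

The resulting strings could then be tokenized and fed to a causal language modeling training loop; the split to iterate over (for example `dataset["full"]`) should be checked against the dataset card.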